<div>
<blockquote>このページは<a href="http://www.vgleaks.com/durango-memory-system-overview/">http://www.vgleaks.com/durango-memory-system-overview/</a>からの引用です
<div class="gdl-navigation-wrapper"> </div>
</blockquote>
<blockquote>作業中・・・<br /></blockquote>
</div>
<div class="body-wrapper">
<div class="container-wrapper">
<div class="content-wrapper main container">
<div class="page-wrapper single-blog single-sidebar left-sidebar">
<div class="row">
<div class="gdl-page-left twelve columns">
<div class="row">
<div class="gdl-page-item mb20 gdl-blog-full eight columns">
<h1 class="blog-title"><a href="http://www.vgleaks.com/durango-memory-system-overview/">Durango Memory System
Overview</a></h1>
<div class="blog-content-wrapper">
<blockquote>
<div class="blog-content">We have read multiple replies and discussions
around <a title="Durango" href="http://www.vgleaks.com/durango/">Durango</a>’s
memory system throughout the internet, so we would like to share this
information with all of you. In this article we explain the different types of
memory that Durango has and how these memories work together with the rest of
the system.</div>
</blockquote>
(前半省略)この記事で我々はDurangoが持つ異なるタイプのメモリーについてと、これらのメモリーがシステムの他の部分といかに協調して動作するかについて公開する。
<blockquote>
<p>The central elements of the Durango memory system are the <i>north
bridge</i> and the <i>GPU memory system</i>. The memory system supports multiple
clients (for example, the CPU and the GPU), coherent and non-coherent memory
access, and two types of memory (DRAM and ESRAM).</p>
</blockquote>
Durangoのメモリーシステムの中心的な要素はNorth
BridgeとGPUメモリーシステムだ。メモリーシステムは複数クライアント(例えばCPUとGPU)対応や、コヒーレント/ノンコヒーレントメモリーアクセス、2つのタイプのメモリー(DRAMとESRAM)をサポートする。
<h4>Memory clients</h4>
<blockquote>
<p>The following diagram shows you the Durango memory clients with the maximum
available bandwidth in every path.</p>
</blockquote>
次の図はDurangoのメモリークライアントとそれぞれのパスにおける最大利用可能帯域を示している。
<h4><img width="474" height="521" class="size-full wp-image-2208 aligncenter" alt="durango_memory" src="http://www.vgleaks.com/wp-content/uploads/2013/03/durango_memory.jpg" /></h4>
<h4>Memory</h4>
<blockquote>
<p>As you can see on the right side of the diagram, the Durango console
has:</p>
<ul><li>8 GB of DRAM.</li>
<li>32 MB of ESRAM.</li>
</ul></blockquote>
図の右側にあるとおり、Durangoコンソールは次のメモリーを持つ。
<ul><li>DRAM: 8 GB</li>
<li>ESRAM: 32 MB</li>
</ul><h4>DRAM</h4>
<blockquote>
<p>The maximum combined read and write bandwidth to DRAM is 68 GB/s (gigabytes
per second). In other words, the sum of read and write bandwidth to DRAM cannot
exceed 68 GB/s. You can realistically expect that about 80 – 85% of that
bandwidth will be achievable (54.4 GB/s – 57.8 GB/s).</p>
</blockquote>
DRAMへの最大read/write帯域は68GB/sだ。言い換えるとDRAMへのreadとwriteの帯域の合計は68GB/sを超えることはできない。その帯域の80~85%ぐらい(54.4GB/s~57.8GB/s)が現実的に達成可能だろう。
<blockquote>
<p>DRAM bandwidth is shared between the following components:</p>
<ul><li>CPU</li>
<li>GPU</li>
<li>Display scan out</li>
<li>Move <a title="engines" href="http://www.vgleaks.com/engines/">engines</a></li>
<li>Audio system</li>
</ul></blockquote>
DRAM帯域は次のコンポーネントで共有される。
<ul><li>CPU</li>
<li>GPU</li>
<li>ディスプレイ出力</li>
<li>Move <a title="engines" href="http://www.vgleaks.com/engines/">engines</a></li>
<li>オーディオシステム</li>
</ul><h4>ESRAM</h4>
<blockquote>
<p>The maximum combined ESRAM read and write bandwidth is 102 GB/s. Having high
bandwidth and lower latency makes ESRAM a really valuable memory resource for
the GPU.</p>
</blockquote>
ESRAMの最大read/write帯域は102GB/sだ。高い帯域と低いレイテンシのためにGPUにとってESRAMはとても利用価値の高いメモリーだ。
<blockquote>
<p>ESRAM bandwidth is shared between the following components:</p>
<ul><li>GPU</li>
<li>Move engines</li>
</ul></blockquote>
ESRAM帯域は次のコンポーネントで共有される。
<ul><li>GPU</li>
<li>Move engines</li>
</ul><h4>Video encode/decode engine. System coherency</h4>
<blockquote>
<p>There are two types of coherency in the Durango memory system:</p>
<ul><li>Fully hardware coherent</li>
<li>I/O coherent</li>
</ul></blockquote>
Durangoメモリーシステムには2種類のコヒーレンシがある。
<ul><li>完全なハードウェアコヒーレント</li>
<li>I/O コヒーレント</li>
</ul><blockquote>
<p>The two CPU modules are <i>fully coherent</i>. The term <i>fully
coherent</i> means that the CPUs do not need to explicitly flush in order for
the latest copy of modified data to be available (except when using <i>Write
Combined</i> access).</p>
</blockquote>
二つのCPUモジュールは完全にコヒーレントだ。「完全にコヒーレント」とは、更新されたデータの最新コピーを利用可能にするためにCPUが明示的にフラッシュする必要はない、ということを意味する。(ただしWrite
Combinedアクセスを使う場合は例外)
<blockquote>
<p>The rest of the Durango infrastructure (the GPU and I/O devices such as
Audio and the Kinect Sensor) is <i>I/O coherent</i>. The term <i>I/O
coherent</i> means that those clients can access data in the CPU caches, but
that their own caches cannot be probed.</p>
</blockquote>
残りのDurangoのインフラ(GPUとオーディオやKinectセンサーのようなI/Oデバイス)はI/Oコヒーレントだ。「I/Oコヒーレント」とはこれらのクライアントはCPUキャッシュのデータにアクセスすることができるが、それらの自前のキャッシュはプルーブされない、ということだ。
<blockquote>
<p>When the CPU produces data, other system clients can choose to consume that
data without any extra synchronization work from the CPU.</p>
</blockquote>
CPUがデータを生成した場合、他のシステムクライアントは、CPU側の追加の同期処理なしにそのデータを利用することを選択できる。
<blockquote>
<p>The total coherent bandwidth through the north bridge is limited to about 30
GB/s.</p>
</blockquote>
North Bridgeを通るコヒーレント帯域の合計は約30GB/sに制限されている。
<blockquote>
<p>The CPU requests do not probe any other non-CPU clients, even if the clients
have caches. (For example, the GPU has its own cache hierarchy, but the GPU is
not probed by the CPU requests.) Therefore, I/O coherent clients must
explicitly flush modified data for any latest-modified copy to become visible
to the CPUs and to the other I/O coherent clients.</p>
</blockquote>
CPUリクエストはCPU以外のクライアントをプルーブしない。たとえクライアントがキャッシュを持っていたとしてもだ。(例えばGPUは自分のキャッシュ階層を持っているが、CPUリクエストによってプルーブされることはない。)
その結果、I/Oコヒーレントクライアントは、更新されたデータの最新コピーがCPUや他のI/Oコヒーレントクライアントから見えるようにするために、それを明示的にフラッシュしなければならない。
<blockquote>
<p>The GPU can perform both coherent and non-coherent memory access. Coherent
read-bandwidth of the GPU is limited to 30 GB/s when there is a cache miss, and
it’s limited to 10 – 15 GB/s when there is a hit. A GPU memory page attribute
determines the coherency of memory access.</p>
</blockquote>
GPUはコヒーレント・ノンコヒーレントの両方のメモリーアクセスを行える。GPUのコヒーレントread帯域はキャッシュミスした場合は30GB/s、キャッシュヒットした場合は10~15GB/sに制限されている。メモリーアクセスのコヒーレンシーはGPUメモリーページ属性によって決定される。
<h4>The CPU</h4>
<blockquote>
<p>The Durango console has two CPU modules, and each module has its own 2 MB L2
cache. Each module has four cores, and each of the four cores in each module
also has its own 32 KB L1 cache.</p>
</blockquote>
<blockquote>
<p>When a local L2 miss occurs, the Durango console probes the adjacent L2
cache via the north bridge. Since there is no fast path between the two L2
caches, to avoid cache thrashing, it’s important that you maximize the sharing
of data between cores in a module, and that you minimize the sharing between
the two CPU modules.</p>
</blockquote>
<blockquote>
<p>Typical latencies for local and remote cache hits are shown in this
table.</p>
<table cellspacing="0" cellpadding="0" border="1"><tbody><tr><td width="113" valign="top"><b>Remote L2 hit</b></td>
<td width="217" valign="top">approximately 100 cycles</td>
</tr><tr><td width="113" valign="top"><b>Remote L1 hit</b></td>
<td width="217" valign="top">approximately 120 cycles</td>
</tr><tr><td width="113" valign="top"><b>Local L1 Hit</b></td>
<td width="217" valign="top">3 cycles for 64-bit values<br />
5 cycles for 128-bit values</td>
</tr><tr><td width="113" valign="top"><b>Local L2 Hit</b></td>
<td width="217" valign="top">approximately 30 cycles</td>
</tr></tbody></table></blockquote>
<blockquote>
<p>Each of the two CPU modules connects to the north bridge by a bus that can
carry up to 20.8 GB/s in each direction.</p>
</blockquote>
<blockquote>
<p>From a program standpoint, normal x86 ordering applies to both reads and
writes. Stores are strongly ordered (becoming visible in program order with no
explicit memory barriers), and reads are out of order.</p>
</blockquote>
<blockquote>
<p>Keep in mind that if the CPU uses <i>Write Combined</i> memory writes, then a
memory synchronization instruction (SFENCE) must follow to ensure that the
writes are visible to the other client devices.</p>
</blockquote>
<h4>The GPU</h4>
<blockquote>
<p>The GPU can read at 170 GB/s and write at 102 GB/s through multiple
combinations of its clients. Examples of GPU clients are the <i>Color/Depth
Blocks</i> and the <i>GPU L2 cache</i>.</p>
</blockquote>
<blockquote>
<p>The GPU has a direct non-coherent connection to the DRAM memory controller
and to ESRAM. The GPU also has a coherent read/write path to the CPU’s L2
caches and to DRAM.</p>
</blockquote>
<blockquote>
<p>For each read and write request from the GPU, the request uses one path
depending on whether the accessed resource is located in “coherent” or
“non-coherent” memory.</p>
</blockquote>
<blockquote>
<p>Some GPU functions share a lower-bandwidth (25.6 GB/s), bidirectional
read/write path. Those GPU functions include:</p>
<ul><li>Command buffer and vertex index fetch</li>
<li>Move engines</li>
<li>Video encoding/decoding engines</li>
<li>Front buffer scan out</li>
</ul></blockquote>
<blockquote>
<p>As the GPU is I/O coherent, data in the GPU caches must be flushed before
that data is visible to other components of the system.</p>
</blockquote>
<blockquote>
<p>The available bandwidth and requirements of other memory clients limit the
total read and write bandwidth of the GPU.</p>
</blockquote>
<blockquote>
<p>This table shows an example of the maximum memory-bandwidths that the GPU
can attain with different types of memory transfers.</p>
<div align="center">
<table width="508" cellspacing="0" cellpadding="0" border="1"><tbody><tr><td width="74" valign="top"><b>Source memory</b></td>
<td width="94" valign="top"><b>Destination memory</b></td>
<td width="104" valign="top"><b>Maximum read bandwidth (GB/s)</b></td>
<td width="104" valign="top"><b>Maximum write bandwidth (GB/s)</b></td>
<td width="131" valign="top"><b>Maximum total bandwidth (GB/s)</b></td>
</tr><tr><td width="74" valign="top">ESRAM</td>
<td width="94" valign="top">ESRAM</td>
<td width="104" valign="top">51.2</td>
<td width="104" valign="top">51.2</td>
<td width="131" valign="top">102.4</td>
</tr><tr><td width="74" valign="top">ESRAM</td>
<td width="94" valign="top">DRAM</td>
<td width="104" valign="top">68.2*</td>
<td width="104" valign="top">68.2</td>
<td width="131" valign="top">136.4</td>
</tr><tr><td width="74" valign="top">DRAM</td>
<td width="94" valign="top">ESRAM</td>
<td width="104" valign="top">68.2</td>
<td width="104" valign="top">68.2*</td>
<td width="131" valign="top">136.4</td>
</tr><tr><td width="74" valign="top">DRAM</td>
<td width="94" valign="top">DRAM</td>
<td width="104" valign="top">34.1</td>
<td width="104" valign="top">34.1</td>
<td width="131" valign="top">68.2</td>
</tr></tbody></table></div>
</blockquote>
<blockquote>
<p>Although ESRAM has 102.4 GB/s of bandwidth available, in a transfer case,
the DRAM bandwidth limits the speed of the transfer.</p>
</blockquote>
<blockquote>
<p>ESRAM-to-DRAM and DRAM-to-ESRAM scenarios are symmetrical.</p>
</blockquote>
<h4>Move engines</h4>
<blockquote>
<p>The Durango console has 25.6 GB/s of read and 25.6 GB/s of write bandwidth
shared between:</p>
<ul><li>Four move engines</li>
<li>Display scan out and write-back</li>
<li>Video encoding and decoding</li>
</ul></blockquote>
<blockquote>
<p>The <i>display scan out</i> consumes a maximum of 3.9 GB/s of read bandwidth
(3 display planes × 4 bytes per pixel × HDMI limit of 300 megapixels
per second), and <i>display write-back</i> consumes a maximum of 1.1 GB/s of
write bandwidth (30 bits per pixel × 300 megapixels per second).</p>
</blockquote>
<blockquote>
<p>You may wonder what happens when the GPU is busy copying data and a move
engine is told to copy data from one type of memory to another. In this
situation, the memory system of the GPU shares bandwidth fairly between source
and destination clients. The maximum bandwidth can be calculated by using the
peak-bandwidth diagram at the start of this article.</p>
</blockquote>
<p> </p>
<blockquote>
<p>If you want to see how all of this works, just <a href="http://www.vgleaks.com/durango-memory-system-example">read the
example</a> we’ve written for all of you.</p>
</blockquote>
<p> </p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>