[Translation] PlayStation 4 includes hUMA technology


<div> <blockquote>This page is quoted from <a href="http://www.vgleaks.com/playstation-4-includes-huma-technology">http://www.vgleaks.com/playstation-4-includes-huma-technology</a>.</blockquote> </div> <div class="body-wrapper"> <div class="container-wrapper"> <div class="content-wrapper main container"> <div class="page-wrapper single-blog single-sidebar left-sidebar"> <div class="row"> <div class="gdl-page-left twelve columns"> <div class="row"> <div class="gdl-page-item mb20 gdl-blog-full eight columns"> <h1 class="blog-title"> </h1> <blockquote> <h1 class="blog-title"><a href="http://www.vgleaks.com/playstation-4-includes-huma-technology">PlayStation 4 includes hUMA technology</a></h1> </blockquote> <div class="blog-content-wrapper"> <div class="blog-content"> <blockquote> <p>There has been a lot of controversy about this matter in the last few days, but we will try to clarify that <strong>PlayStation 4</strong> supports <strong><a class="st_tag internal_tag" href="http://www.vgleaks.com/tag/huma" title="Posts tagged with hUMA">hUMA</a> technology</strong>, or at least implements a first revision of it. We have to remember that <strong><a class="st_tag internal_tag" href="http://www.vgleaks.com/tag/amd" title="Posts tagged with AMD">AMD</a> hasn’t released products with hUMA technology yet</strong>, so it is difficult to compare it with anything on the market. Besides, the specification is not finished yet, so the PS4 implementation may differ a bit from finished hUMA implementations.</p> </blockquote> <p>There has been a great deal of debate on this topic recently, so we will try to make clear that the PlayStation 4 supports hUMA. AMD has not yet released any product that includes hUMA, which makes it hard to compare against anything on the market. In addition, the final hUMA specification has not been settled, so the PS4 implementation may differ slightly from the hUMA specification.</p> <blockquote> <p>But first of all, <strong>what is hUMA</strong>? hUMA is the acronym for <strong>Heterogeneous Uniform Memory Access</strong>. With hUMA, the two processors no longer distinguish between CPU and GPU memory areas. 
Maybe this picture can explain the concept in an easy way:</p> </blockquote> <p>But first of all, what is hUMA? hUMA stands for Heterogeneous Uniform Memory Access. With hUMA, the CPU and GPU memory areas are no longer distinguished from each other. The following figure explains the concept concisely.</p> <p><img alt="huma" class="aligncenter size-large wp-image-5453" height="532" src="http://cdn.vgleaks.netdna-cdn.com/wp-content/uploads/2013/08/huma-1024x568.jpeg" width="960" /></p> <blockquote> <p>If you want to learn more about this tech, this <a href="http://mygaming.co.za/news/hardware/53750-amds-plan-for-the-future-huma-fully-detailed.html" target="_blank">article</a> explains how hUMA works.</p> </blockquote> <p>If you want to know more about this technology, this <a href="http://mygaming.co.za/news/hardware/53750-amds-plan-for-the-future-huma-fully-detailed.html">article</a> explains how hUMA works.</p> <blockquote> <p>PS4 has <strong>enhancements</strong> in the memory architecture that no other “retail” product has, as <strong>Mark Cerny</strong> pointed out in different interviews. We will try to show the new parts in PS4 components in the next pages.</p> </blockquote> <p>As Mark Cerny has explained in various interviews, the PS4 memory architecture has been modified in ways found in no other retail product. The following pages describe these new parts of the PS4.</p> <blockquote> <p>We need to show our diagram of the PS4 memory architecture to explain how it works.</p> </blockquote> <p>The following diagram of the PS4 memory architecture shows how it works.</p> <p> </p> <p><img alt="lvp2" class="aligncenter size-large wp-image-2572" height="606" src="http://cdn.vgleaks.netdna-cdn.com/wp-content/uploads/2013/03/lvp2-1024x647.jpg" width="960" /></p> <blockquote> <p><strong>Mapping of memory in Liverpool</strong></p> <p>– Addresses are 40 bits wide. 
This size allows pages of memory mapped on both CPU and GPU to have the same virtual address</p> <p>– Pages of memory are freely set up by the <a href="http://www.vgleaks.com/tag/application">application</a></p> <p>– Pages of memory do not need to be mapped on both CPU and GPU</p> <ul><li>If only the CPU will use a page, the GPU does not need to have it mapped</li> <li>If only the GPU will use a page, the GPU will access it via Garlic</li> </ul><p> </p> <p>– If both the CPU and GPU will access the memory page, a determination needs to be made whether the GPU should access it via Onion or Garlic</p> <ul><li>If the GPU needs very high bandwidth, the page should be accessed via Garlic; the CPU will need to access it as uncached memory</li> <li>If the CPU needs frequent access to the page, it should be mapped as cached memory on the CPU; the GPU will need to access it via Onion.</li> </ul></blockquote> <p><strong>Memory mapping in Liverpool</strong><br /> – Addresses are 40 bits wide, which lets pages mapped on both the CPU and the GPU share the same virtual address</p> <p>– Memory pages are set up freely by the application</p> <p>– A memory page does not have to be mapped on both the CPU and the GPU</p> <ul><li>If only the CPU uses the page, it does not need to be mapped on the GPU</li> <li>If only the GPU uses the page, it is accessed via Garlic</li> </ul><p>– If both the CPU and the GPU access a memory page, a choice must be made whether the GPU uses Onion or Garlic</p> <ul><li>If the GPU needs high bandwidth, the page should be accessed via Garlic; the CPU must then access it as uncached memory</li> <li>If the CPU needs to access the page frequently, it should be mapped as cached memory; the GPU then uses Onion.</li> </ul> <p><strong>Five Types of Buffers</strong></p> <p>– System memory buffers that the GPU uses are tagged as one of five memory types</p> <p>– The first three types have very limited CPU access; primary access is by the GPU</p> <p>– Read Only (RO)</p> <ul><li>An “RO” buffer is memory that is read by the CUs but never written to by them, e.g. a texture or vertex table</li> <li>Access to RO buffers can never cause L1 caches to lose coherency with each other, as it is <span style="text-decoration:underline;">write</span> operations that cause coherency problems.</li> 
</ul><p>– Private (PV)</p> <ul><li>A “PV” buffer is private memory read from and written to by a single threadgroup, e.g. a scratch buffer.</li> <li>Access to PV buffers can never cause L1 caches to lose coherency, because it is writes to <span style="text-decoration:underline;">shared</span> memory areas that cause the problems</li> </ul><p> </p> <p>– GPU coherent (GC)</p> <ul><li>A “GC” buffer is memory read from and written to by the CUs as a result of draw calls or dispatches, e.g. outputs from vertex shaders that are later read by geometry shaders. Depth buffers and render targets are not GC memory, as they are written not by the CUs but by dedicated hardware in the DBs and CBs.</li> <li>As writes are permitted to GC buffers, access to them can cause L1 caches to lose coherency with each other</li> </ul><p> </p> <p>– The last two types are accessible by both CPU and GPU</p> <p>– System coherent (SC)</p> <ul><li>An “SC” buffer is memory read from and written to by both CPU and GPU, e.g. CPU structures that the GPU reads, or structures used for CPU-GPU communication</li> <li>SC buffers present the largest coherency issues. Not only can L1 caches lose coherency with each other, but both L1 and L2 can lose coherency with system memory and the CPU caches.</li> </ul><p> </p> <p>– Uncached (UC)</p> <ul><li>A “UC” buffer is memory that is read from and written to by both CPU and GPU, just like an SC buffer</li> <li>UC buffers are never cached in the GPU L1 or L2, so they present no coherency issues</li> <li>UC accesses use the new Onion+ bus, a limited-bandwidth bus similar to the Onion bus</li> <li>UC accesses may have significant inefficiencies due to repeated reads of the same line, or incremental updates of lines</li> </ul><p> </p> <p>– The first three types (RO, PV, GC) may also be accessed by the CPU, but care must be taken. 
For example, when copying a texture to a new location:</p> <ul><li>The CPU can write the texture data in an uncached fashion, then manually flush the GPU caches. The GPU can then access the texture as RO memory through Garlic at high speed</li> <li>Two dangers are avoided here. As the CPU wrote the texture data using uncached writes, no data remains in the CPU caches and the GPU is free to use Garlic rather than Onion. As the CPU flushed the GPU caches after the texture setup, there is no possibility of stale data in the GPU L1 and L2.</li> </ul><p> </p> <p><strong>Tracking of Type in Memory Accesses</strong></p> <p>– Memory accesses are made via V# and T# definitions that contain the base address and other parameters of the buffer or texture</p> <p>– Three bits have been added to V# and T# to specify the memory type</p> <p>– An extra bit has been added to the L1 tags</p> <ul><li>It is set if the line was loaded from either GC or SC memory (as opposed to RO or PV memory)</li> <li>A new type of packet-based L1 invalidate has been added that only invalidates the GC and SC lines</li> <li>A simple strategy is for application code to use this invalidate before any draw call or dispatch that accesses GC or SC buffers</li> </ul><p> </p> <p>– An extra bit has been added to the L2 tags</p> <ul><li>It indicates if the line was loaded from SC memory</li> <li>A new L2 invalidate of just the SC lines has been added</li> <li>A new L2 writeback of just the SC lines has been added. 
Both of these are packet-based.</li> <li>A simple strategy is for application code to use the L2 invalidate before any draw call or dispatch that uses SC buffers, and use the L2 writeback after any draw call or dispatch that uses SC buffers</li> <li>The combination of these features allows for efficient acquisition and release of buffers by draw calls and dispatches</li> </ul><p> </p> <p><strong>Simple Example:</strong></p> <p>– Let’s take the case where most of the GPU is being used for graphics (vertex shaders, pixel shaders and so on)</p> <p>– Additionally, let’s say that we have an asynchronous compute dispatch that uses a buffer in SC memory for:</p> <ul><li>Dispatch inputs, which are created by the CPU and read by the GPU</li> <li>Dispatch outputs, which are created by the GPU and read by the CPU</li> </ul><p> </p> <p>– The GPU can:</p> <p>1) Acquire the SC buffer by performing an L1 invalidate (GC and SC lines) and an L2 invalidate (SC lines only). This eliminates the possibility of stale data in the caches. Any SC address encountered will properly go off-chip (to either system memory or CPU caches) to fetch the data.</p> <p>2) Run the compute shader</p> <p>3) Release the SC buffer by performing an L2 writeback (SC lines only). This writes all dirty bytes back to system memory where the CPU can see them</p> <p>– The graphics processing is much less impacted by this strategy</p> <ul><li>On the R10xx, the complete L2 was flushed, so any data in use by the graphics shaders (e.g. the current textures) would need to be reloaded</li> <li>On Liverpool, that RO data stays in place, as do PV and GC data</li> </ul><p><b> </b></p> <p>This technical information can be a bit overwhelming and confusing, so <strong>we will disclose more information and examples of use of this architecture in a new article this week</strong>.</p> <p> </p> </div> </div> </div> </div> </div> </div> </div> </div> </div> </div>
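<p>The page-mapping rules and the acquire / run / release sequence described above can be sketched as a toy model. This is only an illustration under stated assumptions: <code>MemType</code>, <code>choose_mapping</code>, <code>ToyCache</code> and <code>run_async_compute</code> are hypothetical names invented for this sketch, not part of any Sony or AMD API, and the "cached" default for CPU-only pages is an assumption. The caches are plain dictionaries, so the model shows which lines survive a selective invalidate, not how the hardware implements it.</p>

```python
from dataclasses import dataclass, field
from enum import Enum


class MemType(Enum):
    RO = "read only"
    PV = "private"
    GC = "GPU coherent"
    SC = "system coherent"
    UC = "uncached"


def choose_mapping(cpu_uses, gpu_uses, gpu_needs_high_bandwidth=False):
    """Return (cpu_mapping, gpu_bus) for one page, following the mapping rules."""
    if not gpu_uses:
        # CPU-only page: the GPU needs no mapping. "cached" is an assumed default.
        return ("cached" if cpu_uses else None, None)
    if not cpu_uses:
        return (None, "Garlic")
    if gpu_needs_high_bandwidth:
        return ("uncached", "Garlic")
    return ("cached", "Onion")


@dataclass
class ToyCache:
    """Toy GPU cache mapping address to [MemType, dirty]. Not real hardware."""
    tagged: frozenset  # memory types whose lines carry the extra tag bit
    lines: dict = field(default_factory=dict)

    def touch(self, addr, mem_type, dirty=False):
        if mem_type is not MemType.UC:  # UC lines are never cached in L1 or L2
            self.lines[addr] = [mem_type, dirty]

    def invalidate_tagged(self):
        """Drop only lines whose tag bit is set; everything else stays resident."""
        self.lines = {a: l for a, l in self.lines.items() if l[0] not in self.tagged}

    def writeback_tagged(self):
        """Mark dirty tagged lines clean and return their addresses."""
        flushed = [a for a, l in self.lines.items() if l[0] in self.tagged and l[1]]
        for addr in flushed:
            self.lines[addr][1] = False
        return flushed


def run_async_compute(l1, l2, sc_addr):
    """Acquire, run, release for a dispatch that uses one SC buffer."""
    l1.invalidate_tagged()  # 1) acquire: L1 invalidate (GC and SC lines)
    l2.invalidate_tagged()  #    plus L2 invalidate (SC lines only)
    l2.touch(sc_addr, MemType.SC, dirty=True)  # 2) shader writes its output
    return l2.writeback_tagged()  # 3) release: L2 writeback (SC lines only)


l1 = ToyCache(tagged=frozenset({MemType.GC, MemType.SC}))
l2 = ToyCache(tagged=frozenset({MemType.SC}))
for cache in (l1, l2):
    cache.touch(0x1000, MemType.RO)  # texture in use by graphics shaders
    cache.touch(0x2000, MemType.SC)  # possibly stale copy of the dispatch input
flushed = run_async_compute(l1, l2, 0x2000)
print(flushed, sorted(l2.lines))
```

<p>In this sketch the RO texture line at <code>0x1000</code> stays resident in both caches across the dispatch, while only the SC line is invalidated and later written back. That is the difference from the full L2 flush the article attributes to the R10xx.</p>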
