{"id":2753,"date":"2013-07-16T16:09:32","date_gmt":"2013-07-16T07:09:32","guid":{"rendered":"http:\/\/peta.okechan.net\/blog\/?p=2753"},"modified":"2013-07-17T13:37:32","modified_gmt":"2013-07-17T04:37:32","slug":"geforce-gt-520-%e3%81%a8-gt-640%ef%bc%88gk208%ef%bc%89-%e3%81%a7cuda%e3%81%ae%e3%82%b5%e3%83%b3%e3%83%97%e3%83%ab%e3%82%92%e5%b9%be%e3%81%a4%e3%81%8b%e5%ae%9f%e8%a1%8c%e3%81%97%e3%81%a6%e3%81%bf","status":"publish","type":"post","link":"https:\/\/peta.okechan.net\/blog\/archives\/2753","title":{"rendered":"Running a few CUDA samples on the GeForce GT 520 and GT 640 (GK208)"},"content":{"rendered":"<p>The two cards are in different grades, so a comparison may not mean much, but <a href=\"https:\/\/peta.okechan.net\/blog\/archives\/2745\" title=\"Got hold of a Compute Capability 3.5 compute unit\">as I wrote yesterday<\/a>, I replaced my GT 520 with a GT 640 (GK208), so I ran a few CUDA samples on each card.<\/p>\n<p>I ran only the following five samples.<br \/>\nThis is no more than a quick skim.<\/p>\n<ol>\n<li>deviceQuery<\/li>\n<li>bandwidthTest<\/li>\n<li>radixSortThrust<\/li>\n<li>smokeParticles<\/li>\n<li>simpleHyperQ<\/li>\n<\/ol>\n<h3>deviceQuery<\/h3>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>GT 520<\/h4>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n CUDA Device Query (Runtime API) version (CUDART static linking)\r\n\r\nDetected 1 CUDA Capable device(s)\r\n\r\nDevice 0: &quot;GeForce GT 520&quot;\r\n  CUDA Driver Version \/ Runtime Version          5.5 \/ 5.0\r\n  CUDA 
Capability Major\/Minor version number:    2.1\r\n  Total amount of global memory:                 1024 MBytes (1073741824 bytes)\r\n  ( 1) Multiprocessors x ( 48) CUDA Cores\/MP:    48 CUDA Cores\r\n  GPU Clock rate:                                1620 MHz (1.62 GHz)\r\n  Memory Clock rate:                             533 Mhz\r\n  Memory Bus Width:                              64-bit\r\n  L2 Cache Size:                                 65536 bytes\r\n  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)\r\n  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048\r\n  Total amount of constant memory:               65536 bytes\r\n  Total amount of shared memory per block:       49152 bytes\r\n  Total number of registers available per block: 32768\r\n  Warp size:                                     32\r\n  Maximum number of threads per multiprocessor:  1536\r\n  Maximum number of threads per block:           1024\r\n  Maximum sizes of each dimension of a block:    1024 x 1024 x 64\r\n  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535\r\n  Maximum memory pitch:                          2147483647 bytes\r\n  Texture alignment:                             512 bytes\r\n  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)\r\n  Run time limit on kernels:                     Yes\r\n  Integrated GPU sharing Host Memory:            No\r\n  Support host page-locked memory mapping:       Yes\r\n  Alignment requirement for Surfaces:            Yes\r\n  Device has ECC support:                        Disabled\r\n  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)\r\n  Device supports Unified Addressing (UVA):      Yes\r\n  Device PCI Bus ID \/ PCI location ID:           1 \/ 0\r\n  Compute Mode:\r\n     &lt; Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) &gt;\r\n\r\ndeviceQuery, CUDA Driver = 
CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GT 520\r\n<\/pre>\n<\/div>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>GT 640(GK208)<\/h4>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n CUDA Device Query (Runtime API) version (CUDART static linking)\r\n\r\nDetected 1 CUDA Capable device(s)\r\n\r\nDevice 0: &quot;GeForce GT 640&quot;\r\n  CUDA Driver Version \/ Runtime Version          5.5 \/ 5.0\r\n  CUDA Capability Major\/Minor version number:    3.5\r\n  Total amount of global memory:                 1024 MBytes (1073741824 bytes)\r\n  ( 2) Multiprocessors x (192) CUDA Cores\/MP:    384 CUDA Cores\r\n  GPU Clock rate:                                1046 MHz (1.05 GHz)\r\n  Memory Clock rate:                             2505 Mhz\r\n  Memory Bus Width:                              64-bit\r\n  L2 Cache Size:                                 524288 bytes\r\n  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)\r\n  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048\r\n  Total amount of constant memory:               65536 bytes\r\n  Total amount of shared memory per block:       49152 bytes\r\n  Total number of registers available per block: 65536\r\n  Warp size:                                     32\r\n  Maximum number of threads per multiprocessor:  2048\r\n  Maximum number of threads per block:           1024\r\n  Maximum sizes of each dimension of a block:    1024 x 1024 x 64\r\n  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535\r\n  Maximum memory pitch:                          2147483647 bytes\r\n  Texture alignment:                             512 bytes\r\n  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)\r\n  Run time limit on kernels:                     Yes\r\n  Integrated GPU sharing Host Memory:            No\r\n  Support host page-locked 
memory mapping:       Yes\r\n  Alignment requirement for Surfaces:            Yes\r\n  Device has ECC support:                        Disabled\r\n  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)\r\n  Device supports Unified Addressing (UVA):      Yes\r\n  Device PCI Bus ID \/ PCI location ID:           1 \/ 0\r\n  Compute Mode:\r\n     &lt; Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) &gt;\r\n\r\ndeviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GT 640\r\n<\/pre>\n<\/div>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>Comments<\/h4>\n<p>Differences<br \/>\nCompute Capability: 2.1 \u2192 3.5<br \/>\nCUDA Cores: 48 \u2192 384<br \/>\nGPU Clock rate: 1620 MHz \u2192 1046 MHz<br \/>\nMemory Clock rate: 533 Mhz \u2192 2505 Mhz<br \/>\nL2 Cache Size: 65536 bytes \u2192 524288 bytes<br \/>\nMax Texture Dimension Size(3D): 2048 x 2048 x 2048 \u2192 4096 x 4096 x 4096<br \/>\nTotal number of registers available per block: 32768 \u2192 65536<br \/>\nMaximum number of threads per multiprocessor: 1536 \u2192 2048<br \/>\nMaximum sizes of each dimension of a grid: 65535 x 65535 x 65535 \u2192 2147483647 x 65535 x 65535<\/p>\n<p>The GPU clock rate has dropped, but the CUDA core count has grown so much that performance should be considerably higher.<br \/>\nOn paper as well, the GT 520 is rated at 155.5 GFLOPS against 803.3 GFLOPS for the GT 640 (GK208), an overwhelming gap.<br 
\/>\nStill, it is rather striking that a GPU in this class is on the verge of 1 TFLOPS.<\/p>\n<p>The maximum texture size has increased, but 4096 x 4096 x 4096 texels at the size of a float4 (16 bytes) comes to 1 TB (2^40 bytes), so in practice that much will never be usable.<\/p>\n<p>The larger L2 cache and register count are very welcome.<br \/>\nThe larger x component of the maximum grid dimensions is nice too.\n<\/p><\/div>\n<h3>bandwidthTest<\/h3>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>GT 520<\/h4>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n&#x5B;CUDA Bandwidth Test] - Starting...\r\nRunning on...\r\n\r\n Device 0: GeForce GT 520\r\n Quick Mode\r\n\r\n Host to Device Bandwidth, 1 Device(s)\r\n PINNED Memory Transfers\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     6294.4\r\n\r\n Device to Host Bandwidth, 1 Device(s)\r\n PINNED Memory Transfers\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     6319.3\r\n\r\n Device to Device Bandwidth, 1 Device(s)\r\n PINNED Memory Transfers\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     6559.3\r\n<\/pre>\n<\/div>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>GT 640(GK208)<\/h4>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n&#x5B;CUDA Bandwidth Test] - Starting...\r\nRunning on...\r\n\r\n Device 0: GeForce GT 640\r\n Quick Mode\r\n\r\n Host to Device Bandwidth, 1 Device(s)\r\n PINNED Memory 
Transfers\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     3145.1\r\n\r\n Device to Host Bandwidth, 1 Device(s)\r\n PINNED Memory Transfers\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     3246.2\r\n\r\n Device to Device Bandwidth, 1 Device(s)\r\n PINNED Memory Transfers\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     31320.5\r\n<\/pre>\n<\/div>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>Comments<\/h4>\n<p>Host to Device and Device to Host bandwidth has been cut in half, because the GT 520 supports PCIe 2.0 x16 while the GT 640 (GK208) supports only PCIe 2.0 x8.<br \/>\nI suspect this is a deliberate design choice, on the judgment that for this class of card the power savings outweigh the performance penalty.<br \/>\nFor an application whose kernels are extremely light and whose bottleneck is device-host data transfer, the GT 520 might even come out ahead.<\/p>\n<p>As for Device to Device, the GT 520 seems a little too slow, but the values are broadly plausible.<br \/>\nOn paper, the video memory bandwidth is 14.4 GB\/s for the GT 520 and 40.1 GB\/s for the GT 640 (GK208).\n<\/p><\/div>\n<h3>radixSortThrust<\/h3>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>GT 520<\/h4>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nGPU Device 0: &quot;GeForce GT 520&quot; with compute capability 2.1\r\n\r\n\r\nSorting 1048576 32-bit unsigned int keys and values\r\n\r\nradixSort, Throughput = 34.3259 MElements\/s, Time = 0.03055 s, Size = 1048576 elements\r\nTest passed\r\n<\/pre>\n<\/div>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>GT 640(GK208)<\/h4>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nGPU Device 0: &quot;GeForce GT 640&quot; with compute capability 3.5\r\n\r\n\r\nSorting 1048576 32-bit unsigned int keys and values\r\n\r\nradixSort, Throughput = 98.2300 MElements\/s, Time = 0.01067 s, Size = 1048576 elements\r\nTest passed\r\n<\/pre>\n<\/div>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>Comments<\/h4>\n<p>The GT 640 (GK208) is about 3x faster.<br \/>\nI do not understand Thrust's sort algorithm well, so no further comment.\n<\/p><\/div>\n<h3>smokeParticles<\/h3>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>GT 520<\/h4>\n<p>65535 particles<br \/>\n15-30fps\n<\/p><\/div>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>GT 640(GK208)<\/h4>\n<p>65535 particles<br \/>\n65-94fps\n<\/p><\/div>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>Comments<\/h4>\n<p>This is the sample where a ball of smoke drifts around.<br \/>\nThe fps figures were read by eye from the on-screen display.<br 
\/>\nThe load is heavier when the smoke ball is close to the viewpoint and lighter when it is far away.<br \/>\nBesides the smoke simulation itself, I suspect that rendering the shadows (translucent shadows?) is also on the heavy side for a GPU in this class.<\/p>\n<p>The GT 640 (GK208) feels about 3 to 4 times faster.\n<\/p><\/div>\n<h3>simpleHyperQ<\/h3>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>GT 520<\/h4>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nGPU Device 0: &quot;GeForce GT 520&quot; with compute capability 2.1\r\n\r\n&gt; GPU does not support HyperQ\r\n  CUDA kernel runs will have limited concurrency\r\n&gt; Detected Compute SM 2.1 hardware with 1 multi-processors\r\nExpected time for serial execution of 32 sets of kernels = 0.640s\r\nExpected time for fully concurrent execution of 32 sets of kernels = 0.020s\r\nMeasured time for sample = 0.351s\r\n<\/pre>\n<\/div>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>GT 640(GK208)<\/h4>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nGPU Device 0: &quot;GeForce GT 640&quot; with compute capability 3.5\r\n\r\n&gt; Detected Compute SM 3.5 hardware with 2 multi-processors\r\nExpected time for serial execution of 32 sets of kernels = 0.640s\r\nExpected time for fully concurrent execution of 32 sets of kernels = 0.020s\r\nMeasured time for sample = 0.050s\r\n<\/pre>\n<\/div>\n<div style=\"padding-left:30px;padding-bottom:30px;\">\n<h4>Comments<\/h4>\n<p>A test of HyperQ.<br \/>\nFor background on HyperQ, <a href=\"http:\/\/news.mynavi.jp\/series\/nvidia_kepler_gpu\/002\/index.html\" target=\"_blank\">the explanation here<\/a> is easy to follow.<\/p>\n<p>I have not looked at the source, but since the expected times for serial execution and fully concurrent execution are identical on both cards, I suspect the kernel itself is written so that differences in GPU performance do not affect its speed.<br \/>\nWith that in mind, having HyperQ or not appears to make roughly a 7x difference in speed (0.351 s vs. 0.050 s).<\/p>\n<p>Even without HyperQ, the run is nearly twice as fast as serial execution (0.351 s vs. 0.640 s), which shows how much concurrent execution across multiple streams matters.\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>The two cards are in different grades, so a comparison may not mean much, but <a href=\"https:\/\/peta.okechan.net\/blog\/archives\/2745\" title=\"Got hold of a Compute Capability 3.5 compute unit\">as I wrote yesterday<\/a>, I replaced my GT 520 with a GT 640 (GK208), so I ran a few CUDA samples on each card.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32],"tags":[289,462,461],"class_list":["post-2753","post","type-post","status-publish","format-standard","hentry","category-tech","tag-cuda","tag-geforce-gt-520","tag-geforce-gt-640"],"_links":{"self":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts\/2753","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/comments?post=2753"}],"version-history":[{"count":0,"h
ref":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts\/2753\/revisions"}],"wp:attachment":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/media?parent=2753"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/categories?post=2753"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/tags?post=2753"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}