{"id":776,"date":"2010-08-13T19:36:58","date_gmt":"2010-08-13T10:36:58","guid":{"rendered":"http:\/\/peta.okechan.net\/blog\/?p=776"},"modified":"2010-08-13T19:36:58","modified_gmt":"2010-08-13T10:36:58","slug":"geforce-gtx-460-%e3%81%a8-gtx-260-%e3%81%ae-cuda-%e6%80%a7%e8%83%bd%e6%af%94%e8%bc%83","status":"publish","type":"post","link":"https:\/\/peta.okechan.net\/blog\/archives\/776","title":{"rendered":"CUDA Performance Comparison: GeForce GTX 460 vs. GTX 260"},"content":{"rendered":"<p>My birthday is coming up (sob), and I was given a GTX 460 as a present (yay), so I am going to compare its performance against the GTX 260 I had been using until now.<br \/>\nPlain 3D rendering benchmarks have been done to death elsewhere and would be boring (honestly, I just lack the patience to benchmark properly, lol), so instead I downloaded the CUDA Toolkit and sample code from NVIDIA's site and will casually go through the results.<\/p>\n<p>The test environment was as follows.<\/p>\n<ul>\n<li>CPU: Intel Core 2 Duo E8400<\/li>\n<li>Motherboard: Gigabyte Technology Co., Ltd. 
GA-E7AUM-DS2H (NVIDIA GeForce 9400, BIOS F4)<\/li>\n<li>Main memory: PC2-6400 2GB x 2<\/li>\n<li>HDD: model number forgotten (80GB 3.5-inch SATA drives pulled from an IBM 1U server) x 2 in software RAID 0<\/li>\n<li>OS: Windows 7 Ultimate 64bit<\/li>\n<li>Graphics driver: GeForce Driver 258.96<\/li>\n<li>CUDA Toolkit: version 3.1<\/li>\n<li>GTX 260: Galaxy GF PGTX260+\/896D3<\/li>\n<li>GTX 460: ASUS ENGTX460 DirectCU\/2DI\/1GD5<\/li>\n<\/ul>\n<p>The base system is pretty old (sweat). I do also have a Clarkdale Xeon + H57 machine, but setting it up is a hassle, so it is sitting idle for now.<\/p>\n<h5>Device Query<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nDevice 0: &quot;GeForce GTX 260&quot;\r\n  CUDA Driver Version:                           3.10\r\n  CUDA Runtime Version:                          3.10\r\n  CUDA Capability Major revision number:         1\r\n  CUDA Capability Minor revision number:         3\r\n  Total amount of global memory:                 922091520 bytes\r\n  Number of multiprocessors:                     27\r\n  Number of cores:                               216\r\n  Total amount of constant memory:               65536 bytes\r\n  Total amount of shared memory per block:       16384 bytes\r\n  Total number of registers available per block: 16384\r\n  Warp size:                                     32\r\n  Maximum number of threads per block:           512\r\n  Maximum sizes of each dimension of a block:    512 x 512 x 64\r\n  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1\r\n  Maximum memory pitch:                          2147483647 bytes\r\n  Texture alignment:                      
       256 bytes\r\n  Clock rate:                                    1.24 GHz\r\n  Concurrent copy and execution:                 Yes\r\n  Run time limit on kernels:                     Yes\r\n  Integrated:                                    No\r\n  Support host page-locked memory mapping:       Yes\r\n  Compute mode:                                  Default (multiple host threads\r\ncan use this device simultaneously)\r\n  Concurrent kernel execution:                   No\r\n  Device has ECC support enabled:                No\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nDevice 0: &quot;GeForce GTX 460&quot;\r\n  CUDA Driver Version:                           3.10\r\n  CUDA Runtime Version:                          3.10\r\n  CUDA Capability Major revision number:         2\r\n  CUDA Capability Minor revision number:         1\r\n  Total amount of global memory:                 1041694720 bytes\r\n  Number of multiprocessors:                     7\r\n  Number of cores:                               224\r\n  Total amount of constant memory:               65536 bytes\r\n  Total amount of shared memory per block:       49152 bytes\r\n  Total number of registers available per block: 32768\r\n  Warp size:                                     32\r\n  Maximum number of threads per block:           1024\r\n  Maximum sizes of each dimension of a block:    1024 x 1024 x 64\r\n  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1\r\n  Maximum memory pitch:                          2147483647 bytes\r\n  Texture alignment:                             512 bytes\r\n  Clock rate:                                    0.81 GHz\r\n  Concurrent copy and execution:                 Yes\r\n  Run time limit on kernels:                     Yes\r\n  Integrated:                                    No\r\n  Support host page-locked memory mapping:       Yes\r\n  Compute mode:                                  Default (multiple host 
threads\r\ncan use this device simultaneously)\r\n  Concurrent kernel execution:                   Yes\r\n  Device has ECC support enabled:                No\r\n<\/pre>\n<p>Comments<br \/>\nJudging by the increase in shared memory and registers, performance looks likely to have improved.<br \/>\nThe maximum block dimensions have grown too, which is great. However, existing programs hard-coded for 512 x 512 x 64 or less may run less efficiently.<br \/>\nI think the core count of 224 is a mistake for 336 (224 is 7 multiprocessors x 32 cores, whereas 7 x 48 gives 336), but GPU-Z 0.4.4 reports the same 224.<br \/>\nThe Clock rate of 0.81 GHz also caught my eye, but under load it does seem to rise to 1.35 GHz as expected.<\/p>\n<h5>Bandwidth Test<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n Host to Device Bandwidth, 1 Device(s), Paged memory\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     1892.9\r\n\r\n Device to Host Bandwidth, 1 Device(s), Paged memory\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     1647.0\r\n\r\n Device to Device Bandwidth, 1 Device(s)\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     91015.4\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n Host to Device Bandwidth, 1 Device(s), Paged memory\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432          
           1826.8\r\n\r\n Device to Host Bandwidth, 1 Device(s), Paged memory\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     1785.1\r\n\r\n Device to Device Bandwidth, 1 Device(s)\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     59815.6\r\n<\/pre>\n<p>Comments<br \/>\nCompared with the 260, the GTX 460 has a narrower memory bus, but its memory clock rose by more than enough to compensate, so by simple arithmetic the bandwidth should have increased slightly. Yet Device to Device is strangely slow. I wonder what is going on here.<\/p>\n<h5>Simple Multi Copy and Compute<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nMeasured timings (throughput):\r\n Memcpy host to device  : 3.138688 ms (5.345296 GB\/s)\r\n Memcpy device to host  : 3.127328 ms (5.364713 GB\/s)\r\n Kernel                 : 1.700704 ms (98.648655 GB\/s)\r\n\r\nTheoretical limits for overlaps (* capability of this device):\r\n          c &lt;  1.0      : 7.966720 ms (No overlap, fully serial)\r\n * 1.1 &lt;= c &lt;  2.0      : 6.266016 ms (Compute overlaps with one memcopy)\r\n          c &gt;= 2.0      : 3.138688 ms (Compute overlaps with two memcopies)\r\n\r\nAverage measured timings over 10 repetitions:\r\n Avg. time when execution fully serialized      : 8.540992 ms\r\n Avg. time when overlapped using 4 streams      : 6.622292 ms\r\n Avg. 
latency hidden (serialized - overlapped)  : 1.918700 ms\r\n\r\nMeasured throughput:\r\n Fully serialized execution             : 3.928634 GB\/s\r\n Overlapped using 4 streams             : 5.066891 GB\/s\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nMeasured timings (throughput):\r\n Memcpy host to device  : 3.040512 ms (5.517892 GB\/s)\r\n Memcpy device to host  : 2.953472 ms (5.680506 GB\/s)\r\n Kernel                 : 0.894176 ms (187.627669 GB\/s)\r\n\r\nTheoretical limits for overlaps (* capability of this device):\r\n          c &lt;  1.0      : 6.888160 ms (No overlap, fully serial)\r\n * 1.1 &lt;= c &lt;  2.0      : 5.993984 ms (Compute overlaps with one memcopy)\r\n          c &gt;= 2.0      : 3.040512 ms (Compute overlaps with two memcopies)\r\n\r\nAverage measured timings over 10 repetitions:\r\n Avg. time when execution fully serialized      : 6.694787 ms\r\n Avg. time when overlapped using 4 streams      : 5.950045 ms\r\n Avg. 
latency hidden (serialized - overlapped)  : 0.744742 ms\r\n\r\nMeasured throughput:\r\n Fully serialized execution             : 5.012024 GB\/s\r\n Overlapped using 4 streams             : 5.639358 GB\/s\r\n<\/pre>\n<p>Comments<br \/>\nUnlike the Bandwidth Test, this one seems to show a difference close to what the on-paper bandwidth calculation predicts.<\/p>\n<h5>Pitch Linear Texture<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nBandwidth (GB\/s) for pitch linear: 6.35e+001; for array: 6.42e+001\r\n\r\nTexture fetch rate (Mpix\/s) for pitch linear: 7.94e+003; for array: 8.02e+003\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nBandwidth (GB\/s) for pitch linear: 2.48e+001; for array: 2.48e+001\r\n\r\nTexture fetch rate (Mpix\/s) for pitch linear: 3.10e+003; for array: 3.10e+003\r\n<\/pre>\n<p>Comments<br \/>\nCould this large gap be because the GTX 460 has fewer texture units than the 260?<\/p>\n<h5>Simple Texture<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nLoaded 'lena_bw.pgm', 512 x 512 pixels\r\nProcessing time: 0.183976 (ms)\r\n1424.88 Mpixels\/sec\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nLoaded 'lena_bw.pgm', 512 x 512 pixels\r\nProcessing time: 0.300369 (ms)\r\n872.74 Mpixels\/sec\r\n<\/pre>\n<p>Comments<br \/>\nYep. Texture-related work really does look slow, lol.<\/p>\n<h5>Aligned Types<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nCUDA device &#x5B;GeForce GTX 260] has 27 
Multi-Processors\r\nSM scaling value = 1.00\r\n&gt; Memory Size = 49999872\r\nAllocating memory...\r\nGenerating host input data array...\r\nUploading input data to GPU memory...\r\nTesting misaligned types...\r\nuint8...\r\nAvg. time: 4.047190 ms \/ Copy throughput: 11.505764 GB\/s.\r\n        TEST OK\r\nuint16...\r\nAvg. time: 2.135780 ms \/ Copy throughput: 21.802813 GB\/s.\r\n        TEST OK\r\nRGBA8_misaligned...\r\nAvg. time: 5.493528 ms \/ Copy throughput: 8.476522 GB\/s.\r\n        TEST OK\r\nLA32_misaligned...\r\nAvg. time: 2.169678 ms \/ Copy throughput: 21.462173 GB\/s.\r\n        TEST OK\r\nRGB32_misaligned...\r\nAvg. time: 3.312490 ms \/ Copy throughput: 14.057707 GB\/s.\r\n        TEST OK\r\nRGBA32_misaligned...\r\nAvg. time: 4.400230 ms \/ Copy throughput: 10.582630 GB\/s.\r\n        TEST OK\r\nTesting aligned types...\r\nRGBA8...\r\nAvg. time: 1.295567 ms \/ Copy throughput: 35.942574 GB\/s.\r\n        TEST OK\r\nI32...\r\nAvg. time: 1.266223 ms \/ Copy throughput: 36.775511 GB\/s.\r\n        TEST OK\r\nLA32...\r\nAvg. time: 1.096871 ms \/ Copy throughput: 42.453501 GB\/s.\r\n        TEST OK\r\nRGB32...\r\nAvg. time: 1.322788 ms \/ Copy throughput: 35.202927 GB\/s.\r\n        TEST OK\r\nRGBA32...\r\nAvg. time: 1.295364 ms \/ Copy throughput: 35.948197 GB\/s.\r\n        TEST OK\r\nRGBA32_2...\r\nAvg. time: 2.611731 ms \/ Copy throughput: 17.829558 GB\/s.\r\n        TEST OK\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nCUDA device &#x5B;GeForce GTX 460] has 7 Multi-Processors\r\nSM scaling value = 3.43\r\n&gt; Memory Size = 14583296\r\nAllocating memory...\r\nGenerating host input data array...\r\nUploading input data to GPU memory...\r\nTesting misaligned types...\r\nuint8...\r\nAvg. time: 0.907252 ms \/ Copy throughput: 14.970209 GB\/s.\r\n        TEST OK\r\nuint16...\r\nAvg. time: 0.512793 ms \/ Copy throughput: 26.485826 GB\/s.\r\n        TEST OK\r\nRGBA8_misaligned...\r\nAvg. 
time: 0.612984 ms \/ Copy throughput: 22.156784 GB\/s.\r\n        TEST OK\r\nLA32_misaligned...\r\nAvg. time: 0.376848 ms \/ Copy throughput: 36.040365 GB\/s.\r\n        TEST OK\r\nRGB32_misaligned...\r\nAvg. time: 0.489722 ms \/ Copy throughput: 27.733607 GB\/s.\r\n        TEST OK\r\nRGBA32_misaligned...\r\nAvg. time: 0.654871 ms \/ Copy throughput: 20.739572 GB\/s.\r\n        TEST OK\r\nTesting aligned types...\r\nRGBA8...\r\nAvg. time: 0.364849 ms \/ Copy throughput: 37.225730 GB\/s.\r\n        TEST OK\r\nI32...\r\nAvg. time: 0.348742 ms \/ Copy throughput: 38.944976 GB\/s.\r\n        TEST OK\r\nLA32...\r\nAvg. time: 0.315452 ms \/ Copy throughput: 43.054914 GB\/s.\r\n        TEST OK\r\nRGB32...\r\nAvg. time: 0.318769 ms \/ Copy throughput: 42.606860 GB\/s.\r\n        TEST OK\r\nRGBA32...\r\nAvg. time: 0.319548 ms \/ Copy throughput: 42.503039 GB\/s.\r\n        TEST OK\r\nRGBA32_2...\r\nAvg. time: 0.518223 ms \/ Copy throughput: 26.208342 GB\/s.\r\n        TEST OK\r\n<\/pre>\n<p>Comments<br \/>\nIn this test the 460 is clearly faster.<\/p>\n<h5>Post-Process In OpenGL<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nradius=16\r\nCUDA GL Post Processing (512 x 512): 28.7 fps\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nradius=16\r\nCUDA GL Post Processing (512 x 512): 20.1 fps\r\n<\/pre>\n<p>Comments<br \/>\nThis is the one that applies a post-process blur to a spinning teapot. The GTX 260 comes out faster (sweat).<\/p>\n<h5>Fluids (Direct3D Version)<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n1404.2fps to 1442.9fps\r\n<\/pre>\n<p>GTX 
460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n1441.7fps to 1535.0fps\r\n<\/pre>\n<p>Comments<br \/>\nThis is the one with the wriggling green particles. Here the GTX 460 wins.<\/p>\n<h5>MersenneTwister<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nInitializing data for 24000000 samples...\r\nLoading CPU and GPU twisters configurations...\r\nGenerating random numbers on GPU...\r\n\r\nMersenneTwister, Throughput = 1.6773 GNumbers\/s, Time = 0.01431 s, Size = 24002560 Numbers, NumDevsUsed = 1, Workgroup = 128\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nInitializing data for 24000000 samples...\r\nLoading CPU and GPU twisters configurations...\r\nGenerating random numbers on GPU...\r\n\r\nMersenneTwister, Throughput = 2.2123 GNumbers\/s, Time = 0.01085 s, Size = 24002560 Numbers, NumDevsUsed = 1, Workgroup = 128\r\n<\/pre>\n<p>Comments<br \/>\nThe GTX 460 is faster here.<\/p>\n<h5>Mandelbrot<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nanimate colors on\r\nHardware Single Precision around 60fps\r\nEmulated Double-Single Precision around 32fps\r\nHardware Double Precision around 56fps\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nanimate colors on\r\nHardware Single Precision around 68fps\r\nEmulated Double-Single Precision around 26fps\r\nHardware Double Precision around 45fps\r\n<\/pre>\n<p>Comments<br \/>\nIt wins in Hardware Single Precision but loses in Hardware Double Precision. Is that because the GTX 460 (GF104) is designed with its peak double-precision throughput capped at 1\/12 of single precision?<\/p>\n<h5>Summary<\/h5>\n<p>Perhaps because it has fewer texture units than the 260, the GTX 460 appears slower at texture-related processing.<br \/>\nIn other respects, though, performance has clearly gone up and it seems better suited to general-purpose computing.<br \/>\nThat said, GF104 is a rebalancing toward gaming of the GF100, which had been boldly tilted toward general-purpose computing, so anyone who lives for GPU computing would be better off buying a GTX 480, or a Tesla if the wallet allows.<\/p>\n<p>Having written this far, it occurred to me that the GTX 460 (GF104) has increased the number of CUDA cores per SM from the previous 32 to 48 (the configuration has changed), so recompiling and optimizing for it might make things even faster.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My birthday is coming up (sob), and I was given a GTX 460 as a present (yay), so I am going to compare its performance against the GTX 260 I had been using until now.<br \/>\nPlain 3D rendering benchmarks have been done to death elsewhere and would be boring (honestly, I just lack the patience to benchmark properly, lol), so instead I downloaded the CUDA Toolkit and sample code from NVIDIA's site and will casually go through the results.<\/p>\n<p>The test environment was as follows.<\/p>\n<ul>\n<li>CPU: Intel Core 2 Duo E8400<\/li>\n<li>Motherboard: Gigabyte Technology Co., Ltd. 
GA-E7AUM-DS2H (NVIDIA GeForce 9400, BIOS F4)<\/li>\n<li>Main memory: PC2-6400 2GB x 2<\/li>\n<li>HDD: model number forgotten (80GB 3.5-inch SATA drives pulled from an IBM 1U server) x 2 in software RAID 0<\/li>\n<li>OS: Windows 7 Ultimate 64bit<\/li>\n<li>Graphics driver: GeForce Driver 258.96<\/li>\n<li>CUDA Toolkit: version 3.1<\/li>\n<li>GTX 260: Galaxy GF PGTX260+\/896D3<\/li>\n<li>GTX 460: ASUS ENGTX460 DirectCU\/2DI\/1GD5<\/li>\n<\/ul>\n<p>The base system is pretty old (sweat). I do also have a Clarkdale Xeon + H57 machine, but setting it up is a hassle, so it is sitting idle for now.<\/p>\n<h5>Device Query<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nDevice 0: &quot;GeForce GTX 260&quot;\r\n  CUDA Driver Version:                           3.10\r\n  CUDA Runtime Version:                          3.10\r\n  CUDA Capability Major revision number:         1\r\n  CUDA Capability Minor revision number:         3\r\n  Total amount of global memory:                 922091520 bytes\r\n  Number of multiprocessors:                     27\r\n  Number of cores:                               216\r\n  Total amount of constant memory:               65536 bytes\r\n  Total amount of shared memory per block:       16384 bytes\r\n  Total number of registers available per block: 16384\r\n  Warp size:                                     32\r\n  Maximum number of threads per block:           512\r\n  Maximum sizes of each dimension of a block:    512 x 512 x 64\r\n  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1\r\n  Maximum memory pitch:                          2147483647 bytes\r\n  Texture alignment:                      
       256 bytes\r\n  Clock rate:                                    1.24 GHz\r\n  Concurrent copy and execution:                 Yes\r\n  Run time limit on kernels:                     Yes\r\n  Integrated:                                    No\r\n  Support host page-locked memory mapping:       Yes\r\n  Compute mode:                                  Default (multiple host threads\r\ncan use this device simultaneously)\r\n  Concurrent kernel execution:                   No\r\n  Device has ECC support enabled:                No\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nDevice 0: &quot;GeForce GTX 460&quot;\r\n  CUDA Driver Version:                           3.10\r\n  CUDA Runtime Version:                          3.10\r\n  CUDA Capability Major revision number:         2\r\n  CUDA Capability Minor revision number:         1\r\n  Total amount of global memory:                 1041694720 bytes\r\n  Number of multiprocessors:                     7\r\n  Number of cores:                               224\r\n  Total amount of constant memory:               65536 bytes\r\n  Total amount of shared memory per block:       49152 bytes\r\n  Total number of registers available per block: 32768\r\n  Warp size:                                     32\r\n  Maximum number of threads per block:           1024\r\n  Maximum sizes of each dimension of a block:    1024 x 1024 x 64\r\n  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1\r\n  Maximum memory pitch:                          2147483647 bytes\r\n  Texture alignment:                             512 bytes\r\n  Clock rate:                                    0.81 GHz\r\n  Concurrent copy and execution:                 Yes\r\n  Run time limit on kernels:                     Yes\r\n  Integrated:                                    No\r\n  Support host page-locked memory mapping:       Yes\r\n  Compute mode:                                  Default (multiple host 
threads\r\ncan use this device simultaneously)\r\n  Concurrent kernel execution:                   Yes\r\n  Device has ECC support enabled:                No\r\n<\/pre>\n<p>Comments<br \/>\nJudging by the increase in shared memory and registers, performance looks likely to have improved.<br \/>\nThe maximum block dimensions have grown too, which is great. However, existing programs hard-coded for 512 x 512 x 64 or less may run less efficiently.<br \/>\nI think the core count of 224 is a mistake for 336 (224 is 7 multiprocessors x 32 cores, whereas 7 x 48 gives 336), but GPU-Z 0.4.4 reports the same 224.<br \/>\nThe Clock rate of 0.81 GHz also caught my eye, but under load it does seem to rise to 1.35 GHz as expected.<\/p>\n<h5>Bandwidth Test<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n Host to Device Bandwidth, 1 Device(s), Paged memory\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     1892.9\r\n\r\n Device to Host Bandwidth, 1 Device(s), Paged memory\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     1647.0\r\n\r\n Device to Device Bandwidth, 1 Device(s)\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     91015.4\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n Host to Device Bandwidth, 1 Device(s), Paged memory\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432          
           1826.8\r\n\r\n Device to Host Bandwidth, 1 Device(s), Paged memory\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     1785.1\r\n\r\n Device to Device Bandwidth, 1 Device(s)\r\n   Transfer Size (Bytes)        Bandwidth(MB\/s)\r\n   33554432                     59815.6\r\n<\/pre>\n<p>Comments<br \/>\nCompared with the 260, the GTX 460 has a narrower memory bus, but its memory clock rose by more than enough to compensate, so by simple arithmetic the bandwidth should have increased slightly. Yet Device to Device is strangely slow. I wonder what is going on here.<\/p>\n<h5>Simple Multi Copy and Compute<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nMeasured timings (throughput):\r\n Memcpy host to device  : 3.138688 ms (5.345296 GB\/s)\r\n Memcpy device to host  : 3.127328 ms (5.364713 GB\/s)\r\n Kernel                 : 1.700704 ms (98.648655 GB\/s)\r\n\r\nTheoretical limits for overlaps (* capability of this device):\r\n          c &lt;  1.0      : 7.966720 ms (No overlap, fully serial)\r\n * 1.1 &lt;= c &lt;  2.0      : 6.266016 ms (Compute overlaps with one memcopy)\r\n          c &gt;= 2.0      : 3.138688 ms (Compute overlaps with two memcopies)\r\n\r\nAverage measured timings over 10 repetitions:\r\n Avg. time when execution fully serialized      : 8.540992 ms\r\n Avg. time when overlapped using 4 streams      : 6.622292 ms\r\n Avg. 
latency hidden (serialized - overlapped)  : 1.918700 ms\r\n\r\nMeasured throughput:\r\n Fully serialized execution             : 3.928634 GB\/s\r\n Overlapped using 4 streams             : 5.066891 GB\/s\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nMeasured timings (throughput):\r\n Memcpy host to device  : 3.040512 ms (5.517892 GB\/s)\r\n Memcpy device to host  : 2.953472 ms (5.680506 GB\/s)\r\n Kernel                 : 0.894176 ms (187.627669 GB\/s)\r\n\r\nTheoretical limits for overlaps (* capability of this device):\r\n          c &lt;  1.0      : 6.888160 ms (No overlap, fully serial)\r\n * 1.1 &lt;= c &lt;  2.0      : 5.993984 ms (Compute overlaps with one memcopy)\r\n          c &gt;= 2.0      : 3.040512 ms (Compute overlaps with two memcopies)\r\n\r\nAverage measured timings over 10 repetitions:\r\n Avg. time when execution fully serialized      : 6.694787 ms\r\n Avg. time when overlapped using 4 streams      : 5.950045 ms\r\n Avg. 
latency hidden (serialized - overlapped)  : 0.744742 ms\r\n\r\nMeasured throughput:\r\n Fully serialized execution             : 5.012024 GB\/s\r\n Overlapped using 4 streams             : 5.639358 GB\/s\r\n<\/pre>\n<p>Comments<br \/>\nUnlike the Bandwidth Test, this one seems to show a difference close to what the on-paper bandwidth calculation predicts.<\/p>\n<h5>Pitch Linear Texture<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nBandwidth (GB\/s) for pitch linear: 6.35e+001; for array: 6.42e+001\r\n\r\nTexture fetch rate (Mpix\/s) for pitch linear: 7.94e+003; for array: 8.02e+003\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nBandwidth (GB\/s) for pitch linear: 2.48e+001; for array: 2.48e+001\r\n\r\nTexture fetch rate (Mpix\/s) for pitch linear: 3.10e+003; for array: 3.10e+003\r\n<\/pre>\n<p>Comments<br \/>\nCould this large gap be because the GTX 460 has fewer texture units than the 260?<\/p>\n<h5>Simple Texture<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nLoaded 'lena_bw.pgm', 512 x 512 pixels\r\nProcessing time: 0.183976 (ms)\r\n1424.88 Mpixels\/sec\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nLoaded 'lena_bw.pgm', 512 x 512 pixels\r\nProcessing time: 0.300369 (ms)\r\n872.74 Mpixels\/sec\r\n<\/pre>\n<p>Comments<br \/>\nYep. Texture-related work really does look slow, lol.<\/p>\n<h5>Aligned Types<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nCUDA device &#x5B;GeForce GTX 260] has 27 
Multi-Processors\r\nSM scaling value = 1.00\r\n&gt; Memory Size = 49999872\r\nAllocating memory...\r\nGenerating host input data array...\r\nUploading input data to GPU memory...\r\nTesting misaligned types...\r\nuint8...\r\nAvg. time: 4.047190 ms \/ Copy throughput: 11.505764 GB\/s.\r\n        TEST OK\r\nuint16...\r\nAvg. time: 2.135780 ms \/ Copy throughput: 21.802813 GB\/s.\r\n        TEST OK\r\nRGBA8_misaligned...\r\nAvg. time: 5.493528 ms \/ Copy throughput: 8.476522 GB\/s.\r\n        TEST OK\r\nLA32_misaligned...\r\nAvg. time: 2.169678 ms \/ Copy throughput: 21.462173 GB\/s.\r\n        TEST OK\r\nRGB32_misaligned...\r\nAvg. time: 3.312490 ms \/ Copy throughput: 14.057707 GB\/s.\r\n        TEST OK\r\nRGBA32_misaligned...\r\nAvg. time: 4.400230 ms \/ Copy throughput: 10.582630 GB\/s.\r\n        TEST OK\r\nTesting aligned types...\r\nRGBA8...\r\nAvg. time: 1.295567 ms \/ Copy throughput: 35.942574 GB\/s.\r\n        TEST OK\r\nI32...\r\nAvg. time: 1.266223 ms \/ Copy throughput: 36.775511 GB\/s.\r\n        TEST OK\r\nLA32...\r\nAvg. time: 1.096871 ms \/ Copy throughput: 42.453501 GB\/s.\r\n        TEST OK\r\nRGB32...\r\nAvg. time: 1.322788 ms \/ Copy throughput: 35.202927 GB\/s.\r\n        TEST OK\r\nRGBA32...\r\nAvg. time: 1.295364 ms \/ Copy throughput: 35.948197 GB\/s.\r\n        TEST OK\r\nRGBA32_2...\r\nAvg. time: 2.611731 ms \/ Copy throughput: 17.829558 GB\/s.\r\n        TEST OK\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nCUDA device &#x5B;GeForce GTX 460] has 7 Multi-Processors\r\nSM scaling value = 3.43\r\n&gt; Memory Size = 14583296\r\nAllocating memory...\r\nGenerating host input data array...\r\nUploading input data to GPU memory...\r\nTesting misaligned types...\r\nuint8...\r\nAvg. time: 0.907252 ms \/ Copy throughput: 14.970209 GB\/s.\r\n        TEST OK\r\nuint16...\r\nAvg. time: 0.512793 ms \/ Copy throughput: 26.485826 GB\/s.\r\n        TEST OK\r\nRGBA8_misaligned...\r\nAvg. 
time: 0.612984 ms \/ Copy throughput: 22.156784 GB\/s.\r\n        TEST OK\r\nLA32_misaligned...\r\nAvg. time: 0.376848 ms \/ Copy throughput: 36.040365 GB\/s.\r\n        TEST OK\r\nRGB32_misaligned...\r\nAvg. time: 0.489722 ms \/ Copy throughput: 27.733607 GB\/s.\r\n        TEST OK\r\nRGBA32_misaligned...\r\nAvg. time: 0.654871 ms \/ Copy throughput: 20.739572 GB\/s.\r\n        TEST OK\r\nTesting aligned types...\r\nRGBA8...\r\nAvg. time: 0.364849 ms \/ Copy throughput: 37.225730 GB\/s.\r\n        TEST OK\r\nI32...\r\nAvg. time: 0.348742 ms \/ Copy throughput: 38.944976 GB\/s.\r\n        TEST OK\r\nLA32...\r\nAvg. time: 0.315452 ms \/ Copy throughput: 43.054914 GB\/s.\r\n        TEST OK\r\nRGB32...\r\nAvg. time: 0.318769 ms \/ Copy throughput: 42.606860 GB\/s.\r\n        TEST OK\r\nRGBA32...\r\nAvg. time: 0.319548 ms \/ Copy throughput: 42.503039 GB\/s.\r\n        TEST OK\r\nRGBA32_2...\r\nAvg. time: 0.518223 ms \/ Copy throughput: 26.208342 GB\/s.\r\n        TEST OK\r\n<\/pre>\n<p>Comment<br \/>\nIn this test the 460 is clearly the faster card.<\/p>\n<h5>Post-Process In OpenGL<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nradius=16\r\nCUDA GL Post Processing (512 x 512): 28.7 fps\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nradius=16\r\nCUDA GL Post Processing (512 x 512): 20.1 fps\r\n<\/pre>\n<p>Comment<br \/>\nThis is the sample that applies a post-process blur to a spinning teapot. Awkwardly, the GTX 260 comes out faster here.<\/p>\n<h5>Fluids (Direct3D Version)<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n1404.2 fps to 1442.9 fps\r\n<\/pre>\n<p>GTX 
460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n1441.7 fps to 1535.0 fps\r\n<\/pre>\n<p>Comment<br \/>\nThis is the sample with the squirming green particles. The GTX 460 wins this one.<\/p>\n<h5>MersenneTwister<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nInitializing data for 24000000 samples...\r\nLoading CPU and GPU twisters configurations...\r\nGenerating random numbers on GPU...\r\n\r\nMersenneTwister, Throughput = 1.6773 GNumbers\/s, Time = 0.01431 s, Size = 24002560 Numbers, NumDevsUsed = 1, Workgroup = 128\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nInitializing data for 24000000 samples...\r\nLoading CPU and GPU twisters configurations...\r\nGenerating random numbers on GPU...\r\n\r\nMersenneTwister, Throughput = 2.2123 GNumbers\/s, Time = 0.01085 s, Size = 24002560 Numbers, NumDevsUsed = 1, Workgroup = 128\r\n<\/pre>\n<p>Comment<br \/>\nThe GTX 460 is faster here.<\/p>\n<h5>Mandelbrot<\/h5>\n<p>GTX 260<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nanimate colors on\r\nHardware Single Precision around 60 fps\r\nEmulated Double-Single Precision around 32 fps\r\nHardware Double Precision around 56 fps\r\n<\/pre>\n<p>GTX 460<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nanimate colors on\r\nHardware Single Precision around 68 fps\r\nEmulated Double-Single Precision around 26 fps\r\nHardware Double Precision around 45 fps\r\n<\/pre>\n<p>Comment<br \/>\nThe GTX 460 wins at Hardware Single Precision but loses at Hardware Double Precision, presumably because the GTX 
460 (GF104) is designed with its peak double-precision throughput capped at 1\/12 of single precision.<\/p>\n<h5>Summary<\/h5>\n<p>The GTX 460 seems slower than the 260 at texture-heavy work, perhaps because it has fewer texture units.<br \/>\nIn other respects, though, performance is clearly up, and the card seems better suited to general-purpose computing.<br \/>\nThat said, GF104 is a rebalancing toward gaming of GF100, which had gone all-in on general-purpose computing, so anyone who lives for GPU computing is probably better off buying a GTX 480, or a Tesla if the wallet allows.<\/p>\n<p>Having written this far, it occurs to me that the GTX 460 (GF104) has 48 CUDA 
cores per SM, up from the previous 32 (the configuration has changed), so recompiling and optimizing the samples for it might make them faster still.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32],"tags":[289,343,347,348,346,345,344,288],"class_list":["post-776","post","type-post","status-publish","format-standard","hentry","category-tech","tag-cuda","tag-geforce","tag-gf104","tag-gpgpu","tag-gt200","tag-gtx-260","tag-gtx-460","tag-nvidia"],"_links":{"self":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts\/776","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/comments?post=776"}],"version-history":[{"count":0,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts\/776\/revisions"}],"wp:attachment":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/media?parent=776"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/categories?post=776"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/tags?post=776"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}