{"id":2596,"date":"2013-05-26T00:58:38","date_gmt":"2013-05-25T15:58:38","guid":{"rendered":"http:\/\/peta.okechan.net\/blog\/?p=2596"},"modified":"2016-03-03T16:53:37","modified_gmt":"2016-03-03T07:53:37","slug":"2596","status":"publish","type":"post","link":"https:\/\/peta.okechan.net\/blog\/archives\/2596","title":{"rendered":"Speeding up CUDA by reducing atomicAdd calls"},"content":{"rendered":"<p><ins datetime=\"2013-05-26T04:56:52+00:00\">This article contains an error. A correction is appended near the end.<\/ins><\/p>\n<p>While tinkering with the kernel of <a href=\"https:\/\/peta.okechan.net\/blog\/archives\/2499\" title=\"I tried writing code that uses CUDA to test rays for intersection with the polygons that make up a scene\">that GPU raycaster<\/a>, I cut the time to process 1 million rays on a Geforce 9400M from about 1.7 seconds to about 0.9 seconds.<\/p>\n<p>The kernel that traverses the BVH for each ray called atomicAdd on global memory whenever it found a leaf node, and that appears to have been the slow part.<\/p>\n<p>I allocated an int array in shared memory with one warp's worth of elements (that is, 32), had each thread write 1 to its own slot at the point where it used to call atomicAdd, and finally had thread 0 sum the slots so that atomicAdd runs only once per warp. That made it nearly 2x faster.<br \/>\nIt is a basic optimization, but I had put it off because I did not expect it to help this much.<br \/>\nThe intersection-test kernel is unchanged, so the traversal kernel on its own may have become more than 2x faster.<\/p>\n<p>The changed parts are as follows.<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\/\/ 
Macro definitions outside the kernel\r\n#define WARP_SIZE 32\r\n#define WARP_INDEX (threadIdx.x % WARP_SIZE)\r\n\r\n\u2026omitted\u2026\r\n\r\n\/\/ Near the start of the kernel\r\n__shared__ int staged&#x5B;WARP_SIZE];\r\nstaged&#x5B;WARP_INDEX] = 0;\r\n\r\n\u2026omitted\u2026\r\n\r\n\/\/ Where atomicAdd used to be called directly on global memory\r\nstaged&#x5B;WARP_INDEX] = 1;\r\n\r\n\u2026omitted\u2026\r\n\r\n\/\/ Near the end of the kernel\r\n__syncthreads();\r\n\u2026omitted\u2026\r\nif (WARP_INDEX == 0) {\r\n    int sum = 0;\r\n    for (int i = 0; i &lt; WARP_SIZE; i++) sum += staged&#x5B;i];\r\n    if (sum &gt; 0) atomicAdd(stagedCount, sum);\r\n}<\/pre>\n<p>The Geforce 9400M only goes up to Compute Capability 1.1, so the code is a bit roundabout; with 1.2 or later, atomicAdd can be called directly on shared memory, so I think it could be written more cleanly. (Strictly speaking, that would not do exactly the same thing.)<\/p>\n<p>GTX 
460 results were more modest: the time only went from 0.06 seconds to 0.055 seconds, perhaps because its device memory bandwidth is wider than the 9400M's, or because atomicAdd on global memory is more efficient there.<\/p>\n<p>With the current algorithm, stagedCount only needs to end up at 1 or more overall whenever an intersection test is required; it does not need to be exact, so the code above can be made more efficient.<br \/>\nFor example, instead of __shared__ int staged[WARP_SIZE];, declare __shared__ bool staged; (no longer an array), have each thread write true to staged instead of calling atomicAdd, and finally have thread 0 actually call atomicAdd only if staged is true.<br \/>\n<ins datetime=\"2013-06-11T07:07:20+00:00\">(That makes stagedCount an unsuitable variable name, but I cannot think of a good replacement.)<\/ins><\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\/\/ Macro definitions outside the kernel\r\n#define WARP_SIZE 32\r\n#define WARP_INDEX (threadIdx.x % 
WARP_SIZE)\r\n\r\n\u2026omitted\u2026\r\n\r\n\/\/ Near the start of the kernel\r\n__shared__ bool staged;\r\nstaged = false;\r\n\r\n\u2026omitted\u2026\r\n\r\n\/\/ Where atomicAdd used to be called directly on global memory\r\nstaged = true;\r\n\r\n\u2026omitted\u2026\r\n\r\n\/\/ Near the end of the kernel\r\n__syncthreads();\r\n\u2026omitted\u2026\r\nif (WARP_INDEX == 0) {\r\n    if (staged) atomicAdd(stagedCount, 1);\r\n}<\/pre>\n<p>However, with this approach multiple threads may write to the same shared memory location at the same time. I think that is fine as long as they all write the same value, but it is still a slightly worrying approach.<br \/>\nIn fact, the overall stagedCount value differed between the 9400M (Mac) and the GTX 460 (Ubuntu).<br \/>\nThe final output image was the same, but the processing time was almost no different from the array version.<br \/>\nStill, it should save shared memory, so I do not think going this far is pointless.<\/p>\n<div class=\"ins\" datetime=\"2013-05-26T04:56:52+00:00\">Addendum<br 
\/>\nThe two examples above are incorrect, because they were based on the mistaken belief that shared memory is shared per warp.<br \/>\nIn fact, shared memory is shared per block.<br \/>\nIn the first example the problem happens not to surface, but in the second it shows up as differing stagedCount values.<br \/>\nThe number of threads (warps) that run concurrently within one SM differs from GPU to GPU, so with the approach above the final result may also be affected.<br \/>\nTo handle this correctly, simply do the following.<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\/\/ No macros needed\r\n\r\n\u2026omitted\u2026\r\n\r\n\/\/ Near the start of the kernel\r\n__shared__ bool staged;\r\nstaged = false;\r\n\r\n\u2026omitted\u2026\r\n\r\n\/\/ Where atomicAdd used to be called directly on global memory\r\nstaged = true;\r\n\r\n\u2026omitted\u2026\r\n\r\n\/\/ Near the end of the kernel\r\n__syncthreads();\r\n\u2026omitted\u2026\r\nif (threadIdx.x == 0) {\r\n    if (staged) atomicAdd(stagedCount, 
1);\r\n}<\/pre>\n<p>With this change, stagedCount no longer differs between GPUs.<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32],"tags":[289],"class_list":["post-2596","post","type-post","status-publish","format-standard","hentry","category-tech","tag-cuda"],"_links":{"self":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts\/2596","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/comments?post=2596"}],"version-history":[{"count":0,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts\/2596\/revisions"}],"wp:attachment":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/media?parent=2596"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/categories?post=2596"},{"taxonomy":"post_tag","embeddabl
e":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/tags?post=2596"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}