{"id":2668,"date":"2013-06-27T19:32:00","date_gmt":"2013-06-27T10:32:00","guid":{"rendered":"http:\/\/peta.okechan.net\/blog\/?p=2668"},"modified":"2013-06-27T19:45:53","modified_gmt":"2013-06-27T10:45:53","slug":"%e3%83%90%e3%82%a4%e3%83%88%e3%83%8b%e3%83%83%e3%82%af%e3%82%bd%e3%83%bc%e3%83%88","status":"publish","type":"post","link":"https:\/\/peta.okechan.net\/blog\/archives\/2668","title":{"rendered":"\u30d0\u30a4\u30c8\u30cb\u30c3\u30af\u30bd\u30fc\u30c8"},"content":{"rendered":"<p>GPU\u3067\u30bd\u30fc\u30c8\u3092\u884c\u3046\u5fc5\u8981\u304c\u3042\u308a\u3001GPU\u30bd\u30fc\u30c8\u306b\u304a\u3051\u308b\u4ee3\u8868\u683c\u3067\u3042\u308b\u30d0\u30a4\u30c8\u30cb\u30c3\u30af\u30bd\u30fc\u30c8\u306b\u3064\u3044\u3066\u8abf\u3079\u305f\u3002<\/p>\n<p>\u30b0\u30b0\u3063\u3066\u307f\u308b\u3068\u8272\u3093\u306a\u30b5\u30a4\u30c8\u306b\u30bd\u30fc\u30b9\u3084\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u306e\u56f3\u304c\u8f09\u3063\u3066\u308b\u3051\u3069\u3001\u3044\u307e\u3044\u3061\u3088\u304f\u89e3\u3089\u306a\u3044\u3002<br \/>\n\u56f3\u3092\u898b\u308b\u3068\u6982\u5ff5\u30ec\u30d9\u30eb\u306e\u4e8b\u306f\u3059\u3050\u5206\u304b\u308b\u3051\u3069\u3001\u56f3\u3068\u3044\u3046\u3082\u306e\u306e\u5bbf\u547d\u3068\u3057\u3066\u305d\u306e\u30b9\u30da\u30fc\u30b9\u304c\u9650\u3089\u308c\u3066\u308b\u305f\u3081\u304b\u3001\u3069\u3053\u306e\u56f3\u3082\u8981\u7d20\u65708\u500b\u3068\u304b\u3067\u5c11\u306a\u3059\u304e\u3066\u3001\u8981\u7d20\u6570\u30922^n\u306b\u62bd\u8c61\u5316\u3057\u305f\u969b\u306e\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u3092\u30a4\u30e1\u30fc\u30b8\u3057\u3065\u3089\u3044\u3002<br \/>\n\u3042\u3068\u3001\u51e6\u7406\u306e\u6d41\u308c\u306e\u56f3\u304c\u3001\u524d\u534a\u3068\u5f8c\u534a\u3067\u5206\u3051\u3066\u3042\u3063\u3066\u3001\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u304c\u5927\u304d\u304f\u5206\u3051\u30662\u30d5\u30a7\u30fc\u30ba\u304b\u3089\u306a\u308b\u3068\u52d8\u9055\u3044\u3057\u3084\u3059\u3044\u56f3\u3082\u591a\u3044\u3002<br \/>\n\u3044\u3084\u307e\u3041\u3053\u308c\u306f\u3042\u304f\u307e\u3067\u3082\u79c1\u306e\u982d\u306e\u554f\u984c\u3067\u3059\u3051\u3069\u306d\u3002<\/p>\n<p>\u500b\u4eba\u7684\u306b\u4e00\u756a\u5206\u304b\u308a\u3084\u3059\u304b\u3063\u305f\u306e\u306f<a href=\"http:\/\/en.wikipedia.org\/wiki\/Bitonic_sorter\" target=\"_blank\">\u7c73Wikipedia\u306e\u30d0\u30a4\u30c8\u30cb\u30c3\u30af\u30bd\u30fc\u30c8\u306e\u30da\u30fc\u30b8<\/a>\u3002<br \/>\n\u3053\u3053\u306b\u3082\u56f3\u304c\u8f09\u3063\u3066\u308b\u3051\u3069\u3001\u5206\u304b\u308a\u3084\u3059\u304b\u3063\u305f\u306e\u306f\u56f3\u3067\u306f\u306a\u304f\u3066\u3001\u305d\u306e\u4e0b\u306b\u8f09\u3063\u3066\u308bPython\u306e\u30bd\u30fc\u30b9\u3002<\/p>\n<p>\u3057\u304b\u3057\u305d\u306e\u30bd\u30fc\u30b9\u3001\u7d14\u7c8b\u306a\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u306e\u7406\u89e3\u306b\u306f\u304b\u306a\u308a\u5f79\u7acb\u3064\u3068\u306f\u3044\u3048\u3001\u30d0\u30ea\u30d0\u30ea\u518d\u5e30\u3064\u304b\u3063\u3066\u308b\u3057\u3001\u30ea\u30b9\u30c8\u3092\u9023\u7d50\u3057\u305f\u308a\u3057\u3066\u308b\u304b\u3089\u3001GPU\u3060\u3068\u3069\u3093\u306a\u6d41\u308c\u3067\u51e6\u7406\u3059\u308c\u3070\u3044\u3044\u306e\u304b\u3068\u3044\u3046\u3053\u3068\u304c\u5206\u304b\u308a\u306b\u304f\u3044\u3002<\/p>\n<p>\u3068\u3044\u3046\u3053\u3068\u3067\u3001\u307e\u305a\u518d\u5e30\u3068\u30ea\u30b9\u30c8\u306e\u9023\u7d50\u3092\u53d6\u308a\u9664\u3044\u305f\u30d0\u30fc\u30b8\u30e7\u30f3\u3092\u66f8\u3044\u3066\u307f\u3066\u3001\u3055\u3089\u306b\u305d\u3053\u304b\u3089GPU\u3060\u3068\u3053\u3093\u306a\u611f\u3058\u306b\u306a\u308b\u3088\u306d\u3068\u3044\u3046\u30d0\u30fc\u30b8\u30e7\u30f3\u3092\u66f8\u3044\u3066\u307f\u305f\u3002<br \/>\n\uff08\u30bd\u30fc\u30b9\u306f\u672c\u6587\u306e\u6700\u5f8c\u306b\u8f09\u305b\u3066\u307e\u3059\u3002\uff09<\/p>\n<p>\u4ee5\u4e0b\u30bd\u30fc\u30b9\u306e\u89e3\u8aac\u3002<\/p>\n<p>WikipediaBitonicSorter \u304cWikipedia\u306b\u8f09\u3063\u3066\u305f\u30bd\u30fc\u30b9\u3092\u30af\u30e9\u30b9\u5316\u3057\u305f\u3082\u306e\u3067\u3001 MyFirstBitonicSorter \u304c\u305d\u306e\u540d\u306e\u3068\u304a\u308a\u79c1\u304c\u6700\u521d\u306b\u66f8\u3044\u305f\u30d0\u30a4\u30c8\u30cb\u30c3\u30af\u30bd\u30fc\u30c8\uff08\u518d\u5e30\u3068\u30ea\u30b9\u30c8\u306e\u9023\u7d50\u306a\u3057\uff09\u3067\u3001 GPULikeBitonicSorter \u304cGPU\u3060\u3068\u3053\u3093\u306a\u51e6\u7406\u306e\u6d41\u308c\u306b\u306a\u308b\u3088\u306d\u30d0\u30fc\u30b8\u30e7\u30f3\u306e\u30d0\u30a4\u30c8\u30cb\u30c3\u30af\u30bd\u30fc\u30c8\u3067\u3042\u308b\u3002<\/p>\n<p>\u3082\u3061\u308d\u3093\u300cGPU Like\u300d\u3068\u3044\u3046\u4e8b\u3067\u5b9f\u969b\u306fCPU\u3067\u51e6\u7406\u3092\u884c\u3046\u3051\u3069\u3001\u540c\u3058\u3088\u3046\u306a\u51e6\u7406\u3092CUDA\u3084\u3089OpenCL\u3084\u3089OpenGL\u5411\u3051\u306b\u66f8\u3051\u3070\u3061\u3083\u3093\u3068\u52d5\u304f\u3068\u601d\u3046\u3002<br \/>\n\u305f\u3060\u3057\u3001\u5171\u6709\u30e1\u30e2\u30ea\u3092\u4f7f\u3063\u305f\u308a\u3001\u30e1\u30e2\u30ea\u30a2\u30af\u30bb\u30b9\u3092\u30b3\u30a2\u30ec\u30c3\u30b7\u30f3\u30b0\u3057\u305f\u308a\u3001\u30b9\u30ec\u30c3\u30c9\u306e\u5206\u5c90\u65b9\u5411\u3092\u306a\u308b\u3079\u304f\u63c3\u3048\u305f\u308a\u3059\u308b\u3088\u3046\u306a\u5de5\u592b\u306f\u3057\u3066\u306a\u3044\u306e\u3067\u3001\u3053\u306e\u307e\u307eGPU\u306b\u6301\u3063\u3066\u884c\u3063\u3066\u3082\u6027\u80fd\u306f\u51fa\u3057\u5207\u308c\u306a\u3044\u3068\u601d\u3046\u3002<br \/>\n\u305d\u3046\u3044\u3046\u6700\u9069\u5316\u306f\u5b9f\u969b\u306bGPU\u306b\u6301\u3063\u3066\u3044\u3063\u305f\u5f8c\u3057\u304b\u3067\u304d\u306a\u3044\u3002<\/p>\n<p>\u4eca\u56de\u306e\u30bd\u30fc\u30b9\u306fCPU\u3067\u52d5\u4f5c\u3059\u308b\u306e\u3067\u901f\u5ea6\u3092\u6e2c\u3063\u3066\u3082\u4ed5\u65b9\u306a\u3044\u3093\u3060\u3051\u3069\u3001\u4e00\u5fdc\u901f\u5ea6\u3092\u6e2c\u308b\u3088\u3046\u306b\u3057\u305f\u3002<br \/>\n\u3046\u3061\u306e\u74b0\u5883\u3067\u306f\u3001\u8981\u7d20\u6570 1048576 (1024 * 1024) \u3067\u3069\u308c\u30823\u5206\u524d\u5f8c\u304b\u304b\u308b\u4e2d\u3001\u7d44\u307f\u8fbc\u307f\u95a2\u6570\u306esorted\uff08\u4e2d\u8eab\u306fC\u3067\u66f8\u304b\u308c\u305f\u30af\u30a4\u30c3\u30af\u30bd\u30fc\u30c8\u304b\u306a\uff1f\uff09\u306f2\u79d2\u5f31\u3068\u306a\u304b\u306a\u304b\u306e\u30bf\u30a4\u30e0\u3060\u3063\u305f\u3002<br \/>\n\u5b9f\u969b\u306bGPU\u306b\u6301\u3063\u3066\u884c\u3063\u3066\u3069\u306e\u304f\u3089\u3044\u306e\u30bf\u30a4\u30e0\u304c\u51fa\u308b\u304b\u305d\u306e\u3046\u3061\u8a66\u3057\u305f\u3044\u3002<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">import random\r\nimport timeit\r\n\r\n\r\nclass WikipediaBitonicSorter(object):\r\n    def __init__(self):\r\n        pass\r\n    \r\n    def bitonic_sort(self, data):\r\n        self.comparison_count = 0\r\n        self.swap_count = 0\r\n        return self.__bitonic_sort(True, data)\r\n    \r\n    def __bitonic_sort(self, up, x):\r\n        if len(x)&lt;=1:\r\n            return x\r\n        else: \r\n            first = self.__bitonic_sort(True,x&#x5B;:len(x)\/2])\r\n            second = self.__bitonic_sort(False,x&#x5B;len(x)\/2:])\r\n            return self.bitonic_merge(up,first+second)\r\n \r\n    def bitonic_merge(self, up, x): \r\n        # assume input x is bitonic, and sorted list is returned \r\n        if len(x) == 1:\r\n            return x\r\n        else:\r\n            self.bitonic_compare(up,x)\r\n            first = self.bitonic_merge(up,x&#x5B;:len(x)\/2])\r\n            second = self.bitonic_merge(up,x&#x5B;len(x)\/2:])\r\n            return first + second\r\n \r\n    def bitonic_compare(self, up, x):\r\n        dist = len(x)\/2\r\n        for i in xrange(dist):\r\n            self.comparison_count += 1\r\n            if (x&#x5B;i] &gt; x&#x5B;i+dist]) == up:\r\n                self.swap_count += 1\r\n                x&#x5B;i], x&#x5B;i+dist] = x&#x5B;i+dist], x&#x5B;i] #swap\r\n\r\n\r\nclass MyFirstBitonicSorter(object):\r\n    def __init__(self):\r\n        pass\r\n    \r\n    def bitonic_sort(self, data):\r\n        self.comparison_count = 0\r\n        self.swap_count = 0\r\n        \r\n        self.data = data\r\n        chunk_size = 2\r\n        while chunk_size &lt;= len(self.data):\r\n            for chunk_index in xrange(len(self.data) \/ chunk_size):\r\n                _from = chunk_size * chunk_index\r\n                _to = _from + chunk_size - 1\r\n                self.sort(chunk_index % 2 == 0, _from, _to)\r\n            chunk_size *= 2\r\n        return self.data\r\n    \r\n    def sort(self, up, _from, _to):\r\n        chunk_size = _to - _from + 1\r\n        if chunk_size == 2:\r\n            self.compare(up, _from, _to)\r\n        else:\r\n            split = 1\r\n            while split &lt; chunk_size:\r\n                sub_chunk_size = chunk_size \/ split\r\n                for sub_chunk_index in xrange(split):\r\n                    for i in xrange(sub_chunk_size \/ 2):\r\n                        a = sub_chunk_size * sub_chunk_index + i + _from\r\n                        b = a + sub_chunk_size \/ 2\r\n                        self.compare(up, a, b)\r\n                split *= 2\r\n            \r\n    def compare(self, up, a, b):\r\n        self.comparison_count += 1\r\n        if (self.data&#x5B;a] &gt; self.data&#x5B;b]) == up:\r\n            self.swap_count += 1\r\n            self.data&#x5B;a], self.data&#x5B;b] = self.data&#x5B;b], self.data&#x5B;a]\r\n        \r\n\r\nclass GPULikeBitonicSorter(object):\r\n    def __init__(self):\r\n        pass\r\n    \r\n    def bitonic_sort(self, data):\r\n        self.comparison_count = 0\r\n        self.swap_count = 0\r\n        self.launch_kernel_count = 0\r\n        \r\n        chunk_size = 2\r\n        while chunk_size &lt;= len(data):\r\n            sub_chunk_size = chunk_size\r\n            while sub_chunk_size &gt; 1:\r\n                self.launch_kernel(data, chunk_size, sub_chunk_size)\r\n                sub_chunk_size \/= 2\r\n            chunk_size *= 2\r\n        return data\r\n    \r\n    def launch_kernel(self, data, chunk_size, sub_chunk_size):\r\n        self.launch_kernel_count += 1\r\n        for i in xrange(len(data) \/ 2):\r\n            self.kernel(data, len(data), chunk_size, sub_chunk_size, i)\r\n        \r\n    def kernel(self, data, data_size, chunk_size, sub_chunk_size, thread_index):\r\n        half_chunk_size = chunk_size \/ 2\r\n        chunk_index = thread_index \/ half_chunk_size\r\n        \r\n        half_sub_chunk_size = sub_chunk_size \/ 2\r\n        sub_chunk_index = thread_index \/ half_sub_chunk_size\r\n        \r\n        up = chunk_index % 2 == 0\r\n        a = sub_chunk_size * sub_chunk_index + thread_index % half_sub_chunk_size\r\n        b = a + half_sub_chunk_size\r\n        \r\n        self.comparison_count += 1\r\n        if (data&#x5B;a] &gt; data&#x5B;b]) == up:\r\n            self.swap_count += 1\r\n            data&#x5B;a], data&#x5B;b] = data&#x5B;b], data&#x5B;a]\r\n            \r\n        \r\nif __name__ == &quot;__main__&quot;:\r\n    #random.seed(12345)\r\n    \r\n    data_size = 128 * 128\r\n    trial_count = 3\r\n    \r\n    data = &#x5B;random.randint(0, data_size) for i in xrange(data_size)]\r\n    results = &#x5B;]\r\n    times = &#x5B;]\r\n    \r\n    wbs = WikipediaBitonicSorter()\r\n    times.append(\r\n        timeit.timeit(\r\n            setup = 'from __main__ import results, data, wbs',\r\n            stmt = 'results.append(wbs.bitonic_sort(data&#x5B;:]))',\r\n            number = trial_count\r\n        )\r\n    )\r\n    print 'wbs.comparison_count = %d' % wbs.comparison_count\r\n    print 'wbs.swap_count = %d' % wbs.swap_count\r\n    print 'wbs time: %f sec' % times&#x5B;-1]\r\n    print\r\n\r\n    mfbs = MyFirstBitonicSorter()\r\n    times.append(\r\n        timeit.timeit(\r\n            setup = 'from __main__ import results, data, mfbs',\r\n            stmt = 'results.append(mfbs.bitonic_sort(data&#x5B;:]))',\r\n            number = trial_count\r\n        )\r\n    )\r\n    print 'mfbs.comparison_count = %d' % mfbs.comparison_count\r\n    print 'mfbs.swap_count = %d' % mfbs.swap_count\r\n    print 'mfbs time: %f sec' % times&#x5B;-1]\r\n    print\r\n    \r\n    glbs = GPULikeBitonicSorter()\r\n    times.append(\r\n        timeit.timeit(\r\n            setup = 'from __main__ import results, data, glbs',\r\n            stmt = 'results.append(glbs.bitonic_sort(data&#x5B;:]))',\r\n            number = trial_count\r\n        )\r\n    )\r\n    print 'glbs.comparison_count = %d' % glbs.comparison_count\r\n    print 'glbs.swap_count = %d' % glbs.swap_count\r\n    print 'glbs.launch_kernel_count = %d' % glbs.launch_kernel_count\r\n    print 'glbs time: %f sec' % times&#x5B;-1]\r\n    print\r\n    \r\n    times.append(\r\n        timeit.timeit(\r\n            setup = 'from __main__ import results, data',\r\n            stmt = 'results.append(sorted(data))',\r\n            number = trial_count\r\n        )\r\n    )\r\n    print 'builtin function sorted time: %f sec' % times&#x5B;-1]\r\n    print\r\n    \r\n    for i in xrange(len(results) - 1):\r\n        if results&#x5B;i] != results&#x5B;i+1]:\r\n            print 'Result has a difference.'<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>GPU\u3067\u30bd\u30fc\u30c8\u3092\u884c\u3046\u5fc5\u8981\u304c\u3042\u308a\u3001GPU\u30bd\u30fc\u30c8\u306b\u304a\u3051\u308b\u4ee3\u8868\u683c\u3067\u3042\u308b\u30d0\u30a4\u30c8\u30cb\u30c3\u30af\u30bd\u30fc\u30c8\u306b\u3064\u3044\u3066\u8abf\u3079\u305f\u3002<\/p>\n<p>\u30b0\u30b0\u3063\u3066\u307f\u308b\u3068\u8272\u3093\u306a\u30b5\u30a4\u30c8\u306b\u30bd\u30fc\u30b9\u3084\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u306e\u56f3\u304c\u8f09\u3063\u3066\u308b\u3051\u3069\u3001\u3044\u307e\u3044\u3061\u3088\u304f\u89e3\u3089\u306a\u3044\u3002<br \/>\n\u56f3\u3092\u898b\u308b\u3068\u6982\u5ff5\u30ec\u30d9\u30eb\u306e\u4e8b\u306f\u3059\u3050\u5206\u304b\u308b\u3051\u3069\u3001\u56f3\u3068\u3044\u3046\u3082\u306e\u306e\u5bbf\u547d\u3068\u3057\u3066\u305d\u306e\u30b9\u30da\u30fc\u30b9\u304c\u9650\u3089\u308c\u3066\u308b\u305f\u3081\u304b\u3001\u3069\u3053\u306e\u56f3\u3082\u8981\u7d20\u65708\u500b\u3068\u304b\u3067\u5c11\u306a\u3059\u304e\u3066\u3001\u8981\u7d20\u6570\u30922^n\u306b\u62bd\u8c61\u5316\u3057\u305f\u969b\u306e\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u3092\u30a4\u30e1\u30fc\u30b8\u3057\u3065\u3089\u3044\u3002<br \/>\n\u3042\u3068\u3001\u51e6\u7406\u306e\u6d41\u308c\u306e\u56f3\u304c\u3001\u524d\u534a\u3068\u5f8c\u534a\u3067\u5206\u3051\u3066\u3042\u3063\u3066\u3001\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u304c\u5927\u304d\u304f\u5206\u3051\u30662\u30d5\u30a7\u30fc\u30ba\u304b\u3089\u306a\u308b\u3068\u52d8\u9055\u3044\u3057\u3084\u3059\u3044\u56f3\u3082\u591a\u3044\u3002<br \/>\n\u3044\u3084\u307e\u3041\u3053\u308c\u306f\u3042\u304f\u307e\u3067\u3082\u79c1\u306e\u982d\u306e\u554f\u984c\u3067\u3059\u3051\u3069\u306d\u3002<\/p>\n<p>\u500b\u4eba\u7684\u306b\u4e00\u756a\u5206\u304b\u308a\u3084\u3059\u304b\u3063\u305f\u306e\u306f<a href=\"http:\/\/en.wikipedia.org\/wiki\/Bitonic_sorter\" target=\"_blank\">\u7c73Wikipedia\u306e\u30d0\u30a4\u30c8\u30cb\u30c3\u30af\u30bd\u30fc\u30c8\u306e\u30da\u30fc\u30b8<\/a>\u3002<br \/>\n\u3053\u3053\u306b\u3082\u56f3\u304c\u8f09\u3063\u3066\u308b\u3051\u3069\u3001\u5206\u304b\u308a\u3084\u3059\u304b\u3063\u305f\u306e\u306f\u56f3\u3067\u306f\u306a\u304f\u3066\u3001\u305d\u306e\u4e0b\u306b\u8f09\u3063\u3066\u308bPython\u306e\u30bd\u30fc\u30b9\u3002<\/p>\n<p>\u3057\u304b\u3057\u305d\u306e\u30bd\u30fc\u30b9\u3001\u7d14\u7c8b\u306a\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u306e\u7406\u89e3\u306b\u306f\u304b\u306a\u308a\u5f79\u7acb\u3064\u3068\u306f\u3044\u3048\u3001\u30d0\u30ea\u30d0\u30ea\u518d\u5e30\u3064\u304b\u3063\u3066\u308b\u3057\u3001\u30ea\u30b9\u30c8\u3092\u9023\u7d50\u3057\u305f\u308a\u3057\u3066\u308b\u304b\u3089\u3001GPU\u3060\u3068\u3069\u3093\u306a\u6d41\u308c\u3067\u51e6\u7406\u3059\u308c\u3070\u3044\u3044\u306e\u304b\u3068\u3044\u3046\u3053\u3068\u304c\u5206\u304b\u308a\u306b\u304f\u3044\u3002<\/p>\n<p>\u3068\u3044\u3046\u3053\u3068\u3067\u3001\u307e\u305a\u518d\u5e30\u3068\u30ea\u30b9\u30c8\u306e\u9023\u7d50\u3092\u53d6\u308a\u9664\u3044\u305f\u30d0\u30fc\u30b8\u30e7\u30f3\u3092\u66f8\u3044\u3066\u307f\u3066\u3001\u3055\u3089\u306b\u305d\u3053\u304b\u3089GPU\u3060\u3068\u3053\u3093\u306a\u611f\u3058\u306b\u306a\u308b\u3088\u306d\u3068\u3044\u3046\u30d0\u30fc\u30b8\u30e7\u30f3\u3092\u66f8\u3044\u3066\u307f\u305f\u3002<br \/>\n\uff08\u30bd\u30fc\u30b9\u306f\u672c\u6587\u306e\u6700\u5f8c\u306b\u8f09\u305b\u3066\u307e\u3059\u3002\uff09<\/p>\n<p>\u4ee5\u4e0b\u30bd\u30fc\u30b9\u306e\u89e3\u8aac\u3002<\/p>\n<p>WikipediaBitonicSorter \u304cWikipedia\u306b\u8f09\u3063\u3066\u305f\u30bd\u30fc\u30b9\u3092\u30af\u30e9\u30b9\u5316\u3057\u305f\u3082\u306e\u3067\u3001 MyFirstBitonicSorter \u304c\u305d\u306e\u540d\u306e\u3068\u304a\u308a\u79c1\u304c\u6700\u521d\u306b\u66f8\u3044\u305f\u30d0\u30a4\u30c8\u30cb\u30c3\u30af\u30bd\u30fc\u30c8\uff08\u518d\u5e30\u3068\u30ea\u30b9\u30c8\u306e\u9023\u7d50\u306a\u3057\uff09\u3067\u3001 GPULikeBitonicSorter \u304cGPU\u3060\u3068\u3053\u3093\u306a\u51e6\u7406\u306e\u6d41\u308c\u306b\u306a\u308b\u3088\u306d\u30d0\u30fc\u30b8\u30e7\u30f3\u306e\u30d0\u30a4\u30c8\u30cb\u30c3\u30af\u30bd\u30fc\u30c8\u3067\u3042\u308b\u3002<\/p>\n<p>\u3082\u3061\u308d\u3093\u300cGPU Like\u300d\u3068\u3044\u3046\u4e8b\u3067\u5b9f\u969b\u306fCPU\u3067\u51e6\u7406\u3092\u884c\u3046\u3051\u3069\u3001\u540c\u3058\u3088\u3046\u306a\u51e6\u7406\u3092CUDA\u3084\u3089OpenCL\u3084\u3089OpenGL\u5411\u3051\u306b\u66f8\u3051\u3070\u3061\u3083\u3093\u3068\u52d5\u304f\u3068\u601d\u3046\u3002<br \/>\n\u305f\u3060\u3057\u3001\u5171\u6709\u30e1\u30e2\u30ea\u3092\u4f7f\u3063\u305f\u308a\u3001\u30e1\u30e2\u30ea\u30a2\u30af\u30bb\u30b9\u3092\u30b3\u30a2\u30ec\u30c3\u30b7\u30f3\u30b0\u3057\u305f\u308a\u3001\u30b9\u30ec\u30c3\u30c9\u306e\u5206\u5c90\u65b9\u5411\u3092\u306a\u308b\u3079\u304f\u63c3\u3048\u305f\u308a\u3059\u308b\u3088\u3046\u306a\u5de5\u592b\u306f\u3057\u3066\u306a\u3044\u306e\u3067\u3001\u3053\u306e\u307e\u307eGPU\u306b\u6301\u3063\u3066\u884c\u3063\u3066\u3082\u6027\u80fd\u306f\u51fa\u3057\u5207\u308c\u306a\u3044\u3068\u601d\u3046\u3002<br \/>\n\u305d\u3046\u3044\u3046\u6700\u9069\u5316\u306f\u5b9f\u969b\u306bGPU\u306b\u6301\u3063\u3066\u3044\u3063\u305f\u5f8c\u3057\u304b\u3067\u304d\u306a\u3044\u3002<\/p>\n<p>\u4eca\u56de\u306e\u30bd\u30fc\u30b9\u306fCPU\u3067\u52d5\u4f5c\u3059\u308b\u306e\u3067\u901f\u5ea6\u3092\u6e2c\u3063\u3066\u3082\u4ed5\u65b9\u306a\u3044\u3093\u3060\u3051\u3069\u3001\u4e00\u5fdc\u901f\u5ea6\u3092\u6e2c\u308b\u3088\u3046\u306b\u3057\u305f\u3002<br \/>\n\u3046\u3061\u306e\u74b0\u5883\u3067\u306f\u3001\u8981\u7d20\u6570 1048576 (1024 * 1024) \u3067\u3069\u308c\u30823\u5206\u524d\u5f8c\u304b\u304b\u308b\u4e2d\u3001\u7d44\u307f\u8fbc\u307f\u95a2\u6570\u306esorted\uff08\u4e2d\u8eab\u306fC\u3067\u66f8\u304b\u308c\u305f\u30af\u30a4\u30c3\u30af\u30bd\u30fc\u30c8\u304b\u306a\uff1f\uff09\u306f2\u79d2\u5f31\u3068\u306a\u304b\u306a\u304b\u306e\u30bf\u30a4\u30e0\u3060\u3063\u305f\u3002<br \/>\n\u5b9f\u969b\u306bGPU\u306b\u6301\u3063\u3066\u884c\u3063\u3066\u3069\u306e\u304f\u3089\u3044\u306e\u30bf\u30a4\u30e0\u304c\u51fa\u308b\u304b\u305d\u306e\u3046\u3061\u8a66\u3057\u305f\u3044\u3002<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">import random\r\nimport timeit\r\n\r\n\r\nclass WikipediaBitonicSorter(object):\r\n    def __init__(self):\r\n        pass\r\n    \r\n    def bitonic_sort(self, data):\r\n        self.comparison_count = 0\r\n        self.swap_count = 0\r\n        return self.__bitonic_sort(True, data)\r\n    \r\n    def __bitonic_sort(self, up, x):\r\n        if len(x)&lt;=1:\r\n            return x\r\n        else: \r\n            first = self.__bitonic_sort(True,x&#x5B;:len(x)\/2])\r\n            second = self.__bitonic_sort(False,x&#x5B;len(x)\/2:])\r\n            return self.bitonic_merge(up,first+second)\r\n \r\n    def bitonic_merge(self, up, x): \r\n        # assume input x is bitonic, and sorted list is returned \r\n        if len(x) == 1:\r\n            return x\r\n        else:\r\n            self.bitonic_compare(up,x)\r\n            first = self.bitonic_merge(up,x&#x5B;:len(x)\/2])\r\n            second = self.bitonic_merge(up,x&#x5B;len(x)\/2:])\r\n            return first + second\r\n \r\n    def bitonic_compare(self, up, x):\r\n        dist = len(x)\/2\r\n        for i in xrange(dist):\r\n            self.comparison_count += 1\r\n            if (x&#x5B;i] &gt; x&#x5B;i+dist]) == up:\r\n                self.swap_count += 1\r\n                x&#x5B;i], x&#x5B;i+dist] = x&#x5B;i+dist], x&#x5B;i] #swap\r\n\r\n\r\nclass MyFirstBitonicSorter(object):\r\n    def __init__(self):\r\n        pass\r\n    \r\n    def bitonic_sort(self, data):\r\n        self.comparison_count = 0\r\n        self.swap_count = 0\r\n        \r\n        self.data = data\r\n        chunk_size = 2\r\n        while chunk_size &lt;= len(self.data):\r\n            for chunk_index in xrange(len(self.data) \/ chunk_size):\r\n                _from = chunk_size * chunk_index\r\n                _to = _from + chunk_size - 1\r\n                self.sort(chunk_index % 2 == 0, _from, _to)\r\n            chunk_size *= 2\r\n        return self.data\r\n    \r\n    def sort(self, up, _from, _to):\r\n        chunk_size = _to - _from + 1\r\n        if chunk_size == 2:\r\n            self.compare(up, _from, _to)\r\n        else:\r\n            split = 1\r\n            while split &lt; chunk_size:\r\n                sub_chunk_size = chunk_size \/ split\r\n                for sub_chunk_index in xrange(split):\r\n                    for i in xrange(sub_chunk_size \/ 2):\r\n                        a = sub_chunk_size * sub_chunk_index + i + _from\r\n                        b = a + sub_chunk_size \/ 2\r\n                        self.compare(up, a, b)\r\n                split *= 2\r\n            \r\n    def compare(self, up, a, b):\r\n        self.comparison_count += 1\r\n        if (self.data&#x5B;a] &gt; self.data&#x5B;b]) == up:\r\n            self.swap_count += 1\r\n            self.data&#x5B;a], self.data&#x5B;b] = self.data&#x5B;b], self.data&#x5B;a]\r\n        \r\n\r\nclass GPULikeBitonicSorter(object):\r\n    def __init__(self):\r\n        pass\r\n    \r\n    def bitonic_sort(self, data):\r\n        self.comparison_count = 0\r\n        self.swap_count = 0\r\n        self.launch_kernel_count = 0\r\n        \r\n        chunk_size = 2\r\n        while chunk_size &lt;= len(data):\r\n            sub_chunk_size = chunk_size\r\n            while sub_chunk_size &gt; 1:\r\n                self.launch_kernel(data, chunk_size, sub_chunk_size)\r\n                sub_chunk_size \/= 2\r\n            chunk_size *= 2\r\n        return data\r\n    \r\n    def launch_kernel(self, data, chunk_size, sub_chunk_size):\r\n        self.launch_kernel_count += 1\r\n        for i in xrange(len(data) \/ 2):\r\n            self.kernel(data, len(data), chunk_size, sub_chunk_size, i)\r\n        \r\n    def kernel(self, data, data_size, chunk_size, sub_chunk_size, thread_index):\r\n        half_chunk_size = chunk_size \/ 2\r\n        chunk_index = thread_index \/ half_chunk_size\r\n        \r\n        half_sub_chunk_size = sub_chunk_size \/ 2\r\n        sub_chunk_index = thread_index \/ half_sub_chunk_size\r\n        \r\n        up = chunk_index % 2 == 0\r\n        a = sub_chunk_size * sub_chunk_index + thread_index % half_sub_chunk_size\r\n        b = a + half_sub_chunk_size\r\n        \r\n        self.comparison_count += 1\r\n        if (data&#x5B;a] &gt; data&#x5B;b]) == up:\r\n            self.swap_count += 1\r\n            data&#x5B;a], data&#x5B;b] = data&#x5B;b], data&#x5B;a]\r\n            \r\n        \r\nif __name__ == &quot;__main__&quot;:\r\n    #random.seed(12345)\r\n    \r\n    data_size = 128 * 128\r\n    trial_count = 3\r\n    \r\n    data = &#x5B;random.randint(0, data_size) for i in xrange(data_size)]\r\n    results = &#x5B;]\r\n    times = &#x5B;]\r\n    \r\n    wbs = WikipediaBitonicSorter()\r\n    times.append(\r\n        timeit.timeit(\r\n            setup = 'from __main__ import results, data, wbs',\r\n            stmt = 'results.append(wbs.bitonic_sort(data&#x5B;:]))',\r\n            number = trial_count\r\n        )\r\n    )\r\n    print 'wbs.comparison_count = %d' % wbs.comparison_count\r\n    print 'wbs.swap_count = %d' % wbs.swap_count\r\n    print 'wbs time: %f sec' % times&#x5B;-1]\r\n    print\r\n\r\n    mfbs = MyFirstBitonicSorter()\r\n    times.append(\r\n        timeit.timeit(\r\n            setup = 'from __main__ import results, data, mfbs',\r\n            stmt = 'results.append(mfbs.bitonic_sort(data&#x5B;:]))',\r\n            number = trial_count\r\n        )\r\n    )\r\n    print 'mfbs.comparison_count = %d' % mfbs.comparison_count\r\n    print 'mfbs.swap_count = %d' % mfbs.swap_count\r\n    print 'mfbs time: %f sec' % times&#x5B;-1]\r\n    print\r\n    \r\n    glbs = GPULikeBitonicSorter()\r\n    times.append(\r\n        timeit.timeit(\r\n            setup = 'from __main__ import results, data, glbs',\r\n            stmt = 'results.append(glbs.bitonic_sort(data&#x5B;:]))',\r\n            number = trial_count\r\n        )\r\n    )\r\n    print 'glbs.comparison_count = %d' % glbs.comparison_count\r\n    print 'glbs.swap_count = %d' % glbs.swap_count\r\n    print 'glbs.launch_kernel_count = %d' % glbs.launch_kernel_count\r\n    print 'glbs time: %f sec' % times&#x5B;-1]\r\n    print\r\n    \r\n    times.append(\r\n        timeit.timeit(\r\n            setup = 'from __main__ import results, data',\r\n            stmt = 'results.append(sorted(data))',\r\n            number = trial_count\r\n        )\r\n    )\r\n    print 'builtin function sorted time: %f sec' % times&#x5B;-1]\r\n    print\r\n    \r\n    for i in xrange(len(results) - 1):\r\n        if results&#x5B;i] != results&#x5B;i+1]:\r\n            print 'Result has a difference.'<\/pre>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32],"tags":[458,60],"class_list":["post-2668","post","type-post","status-publish","format-standard","hentry","category-tech","tag-458","tag-python"],"_links":{"self":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts\/2668","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/comments?post=2668"}],"version-history":[{"count":0,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/posts\/2668\/revisions"}],"wp:attachment":[{"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/media?parent=2668"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/categories?post=2668"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/peta.okechan.net\/blog\/wp-json\/wp\/v2\/tags?post=2668"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}