Parallelize recursion in Python
I am working with a big dataset. The input is 4 different datasets, and I have to apply a particular function to each of them. I have already done the part where I read the 4 datasets and apply the function to each one in parallel using pool.map, so I have a parent and 4 child processes. That part works fine.
Q1. What happens inside each process? In the function I apply to each dataset, I compare each tuple with every other tuple, a kind of recursion. Is there a way to make this parallel as well, since the comparison may take a long time on a big dataset? How do I do that from inside a child process? Is it possible to parallelize again within a child process? I have more processors available and I want to utilize them.
Q2. What I have in mind for parallelizing the recursive task is this: if I am comparing tuple x with tuple y (every tuple with every other tuple), I can split x into chunks and compare each chunk against y in parallel. I guess it can be done with two 'for' loops, as in the sketch below. Any suggestions on how to do this?
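Roughly, this is the idea as a sketch; the names compare and chunks are made up, and compare stands in for my real comparison function:

import itertools

def compare(a, b):
    # placeholder for my real tuple comparison
    return a == b

def chunks(seq, size):
    # split x into fixed-size chunks
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def compare_chunk(chunk, y):
    for a in chunk:       # loop 1: tuples in this chunk of x
        for b in y:       # loop 2: every tuple in y
            compare(a, b)

x = [(0, 3), (1, 4), (2, 5)]
y = [(6, 9), (7, 10), (8, 11)]
for chunk in chunks(x, 2):
    compare_chunk(chunk, y)  # each call could become one parallel task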
Re: Q1, if you're creating the child processes using a multiprocessing.Pool, then no, the worker processes cannot have children of their own. Attempting to create one raises an exception:

AssertionError: daemonic processes are not allowed to have children

The reason is stated pretty clearly in the message: processes in a Pool are daemonic, and daemonic processes can't have children. The reason for that restriction is that terminating the parent process will terminate its daemonic children, but the daemonic children would not be able to terminate their own children, which would leave orphaned processes behind. As stated in the documentation:

Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits.
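Here's a minimal sketch that reproduces the error (worker and the toy workload are just for illustration):

import multiprocessing

def worker(x):
    # Pool workers are daemonic, so creating a nested Pool here fails with
    # AssertionError: daemonic processes are not allowed to have children
    inner = multiprocessing.Pool(2)
    return inner.map(abs, range(2))

if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        pool.map(worker, range(2))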
You can get around this by having the parent create a set of non-daemonic Process objects, rather than using a Pool. Then each child can create its own multiprocessing.Pool:
import multiprocessing

def subf(x):
    print("in subf")

def f(x):
    print("in f")
    p = multiprocessing.Pool(2)
    p.map(subf, range(2))

if __name__ == "__main__":
    processes = []
    for i in range(2):
        proc = multiprocessing.Process(target=f, args=(i,))
        proc.start()
        processes.append(proc)
Output:

in f
in f
in subf
in subf
in subf
in subf
This approach seems like it would work well for you, since your initial dataset contains only 4 items. You can create one Process per item in the dataset and still have free CPUs to spare for each sub-process to use in a small Pool.
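For instance, here is a sketch along those lines, with join calls so the parent waits for all four children; square and the toy datasets are stand-ins for your function and data:

import multiprocessing

def square(n):
    # stand-in for the real per-item work
    return n * n

def process_dataset(dataset):
    # runs in a non-daemonic child, so it may create its own Pool
    with multiprocessing.Pool(2) as inner:
        print(inner.map(square, dataset))

if __name__ == "__main__":
    datasets = [range(4), range(4, 8), range(8, 12), range(12, 16)]
    children = [multiprocessing.Process(target=process_dataset, args=(d,))
                for d in datasets]
    for child in children:
        child.start()
    for child in children:
        child.join()  # wait for all four dataset workers to finish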
Re: Q2, it sounds like you could use itertools.product to create one large iterable containing every pair of tuples you want to compare. You can then use pool.map to parallelize comparing each pair. Here's an example showing how it works:
import itertools
import multiprocessing

def f(x):
    print(x)

if __name__ == "__main__":
    # create two lists of tuples, like your use-case
    x = list(zip(range(3), range(3, 6)))
    y = list(zip(range(6, 9), range(9, 12)))
    pool = multiprocessing.Pool()
    pool.map(f, itertools.product(x, y))
Output:

((0, 3), (6, 9))
((0, 3), (7, 10))
((0, 3), (8, 11))
((1, 4), (6, 9))
((1, 4), (7, 10))
((1, 4), (8, 11))
((2, 5), (6, 9))
((2, 5), (8, 11))
((2, 5), (7, 10))
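The out-of-order lines above hint that completion order is not guaranteed when the work runs in parallel. If the full cross product is too large to process comfortably with map, one variation worth considering is pool.imap_unordered with a chunksize, which streams the pairs and batches them per task; compare here is a hypothetical stand-in for your comparison function:

import itertools
import multiprocessing

def compare(pair):
    # hypothetical per-pair comparison; replace with the real logic
    a, b = pair
    return a == b

if __name__ == "__main__":
    x = [(0, 3), (1, 4), (2, 5)]
    y = [(6, 9), (7, 10), (8, 11)]
    with multiprocessing.Pool() as pool:
        # chunksize batches pairs per task to cut IPC overhead;
        # results arrive in completion order, not input order
        for result in pool.imap_unordered(
                compare, itertools.product(x, y), chunksize=3):
            print(result)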