Parallelize recursion in Python
I am working with a big dataset. The input is 4 different datasets, and I have to apply a particular function to each of them. I have already done the part where I read the 4 datasets and apply the function to each one in parallel using pool.map, so I have a parent and 4 child processes. That part works fine.
Q1. What happens inside each process? In the function I apply to each dataset, I compare each tuple with every other tuple, a kind of recursion. Is there a way to make this parallel as well, since the comparison may take a long time on a big dataset? How do I do that from inside a child process? Is it possible to parallelize again within a child process? I have more processors available and I want to utilize them.
Q2. What I have in mind for parallelizing the recursive task is this: if I am comparing tuple x with tuple y (every tuple with every other tuple), I can split x into chunks and compare each chunk against y in parallel. I guess it can be done with two 'for' loops, as in the sketch below. Any suggestions on how to do this?
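Roughly, this is the idea as a sketch; the names compare and chunks are made up, and compare stands in for my real comparison function:

import itertools

def compare(a, b):
    # placeholder for my real tuple comparison
    return a == b

def chunks(seq, size):
    # split x into fixed-size chunks
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def compare_chunk(chunk, y):
    for a in chunk:       # loop 1: tuples in this chunk of x
        for b in y:       # loop 2: every tuple in y
            compare(a, b)

x = [(0, 3), (1, 4), (2, 5)]
y = [(6, 9), (7, 10), (8, 11)]
for chunk in chunks(x, 2):
    compare_chunk(chunk, y)  # each call could become one parallel task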
Re: Q1, if you're creating the child processes using a multiprocessing.Pool, then no, the worker processes cannot have children of their own. Attempting to create one raises an exception:

AssertionError: daemonic processes are not allowed to have children

The reason is stated pretty clearly in the message: processes in a Pool are daemonic, and daemonic processes can't have children. The reason for that restriction is that terminating the parent process will terminate its daemonic children, but the daemonic children would not be able to terminate their own children, which would leave orphaned processes behind. As stated in the documentation:

Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits.
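Here's a minimal sketch that reproduces the error (worker and the toy workload are just for illustration):

import multiprocessing

def worker(x):
    # Pool workers are daemonic, so creating a nested Pool here fails with
    # AssertionError: daemonic processes are not allowed to have children
    inner = multiprocessing.Pool(2)
    return inner.map(abs, range(2))

if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        pool.map(worker, range(2))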
You can get around this by having the parent create a set of non-daemonic Process objects, rather than using a Pool. Then each child can create its own multiprocessing.Pool:
import multiprocessing

def subf(x):
    print("in subf")

def f(x):
    print("in f")
    p = multiprocessing.Pool(2)
    p.map(subf, range(2))

if __name__ == "__main__":
    processes = []
    for i in range(2):
        proc = multiprocessing.Process(target=f, args=(i,))
        proc.start()
        processes.append(proc)
Output:

in f
in f
in subf
in subf
in subf
in subf
This approach seems like it would work well for you, since your initial dataset contains only 4 items. You can create one Process per item in the dataset and still have free CPUs to spare for each sub-process to use in a small Pool.
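For instance, here is a sketch along those lines, with join calls so the parent waits for all four children; square and the toy datasets are stand-ins for your function and data:

import multiprocessing

def square(n):
    # stand-in for the real per-item work
    return n * n

def process_dataset(dataset):
    # runs in a non-daemonic child, so it may create its own Pool
    with multiprocessing.Pool(2) as inner:
        print(inner.map(square, dataset))

if __name__ == "__main__":
    datasets = [range(4), range(4, 8), range(8, 12), range(12, 16)]
    children = [multiprocessing.Process(target=process_dataset, args=(d,))
                for d in datasets]
    for child in children:
        child.start()
    for child in children:
        child.join()  # wait for all four dataset workers to finish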
Re: Q2, it sounds like you could use itertools.product to create one large iterable containing every pair of tuples you want to compare. You can then use pool.map to parallelize comparing each pair. Here's an example showing how it works:
import itertools
import multiprocessing

def f(x):
    print(x)

if __name__ == "__main__":
    # create two lists of tuples, like your use-case
    x = list(zip(range(3), range(3, 6)))
    y = list(zip(range(6, 9), range(9, 12)))
    pool = multiprocessing.Pool()
    pool.map(f, itertools.product(x, y))
Output:

((0, 3), (6, 9))
((0, 3), (7, 10))
((0, 3), (8, 11))
((1, 4), (6, 9))
((1, 4), (7, 10))
((1, 4), (8, 11))
((2, 5), (6, 9))
((2, 5), (8, 11))
((2, 5), (7, 10))
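The out-of-order lines above hint that completion order is not guaranteed when the work runs in parallel. If the full cross product is too large to process comfortably with map, one variation worth considering is pool.imap_unordered with a chunksize, which streams the pairs and batches them per task; compare here is a hypothetical stand-in for your comparison function:

import itertools
import multiprocessing

def compare(pair):
    # hypothetical per-pair comparison; replace with the real logic
    a, b = pair
    return a == b

if __name__ == "__main__":
    x = [(0, 3), (1, 4), (2, 5)]
    y = [(6, 9), (7, 10), (8, 11)]
    with multiprocessing.Pool() as pool:
        # chunksize batches pairs per task to cut IPC overhead;
        # results arrive in completion order, not input order
        for result in pool.imap_unordered(
                compare, itertools.product(x, y), chunksize=3):
            print(result)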