multiprocessing - Fastest way to extract tar files using Python
I have to extract hundreds of tar.bz files, each 5 GB in size. I tried the following code:
    import glob
    import tarfile
    from multiprocessing import Pool

    files = glob.glob('d:\\*.tar.bz')  # all the files in d:
    for f in files:
        tar = tarfile.open(f, 'r:bz2')
        pool = Pool(processes=5)
        pool.map(tar.extractall('e:\\'))  # i want to extract them into e:
        tar.close()
but the code raises a type error:

    TypeError: map() takes at least 3 arguments (2 given)
How can I solve it? Any further ideas to accelerate the extraction?
You need to change

    pool.map(tar.extractall('e:\\'))

so that map() gets a function and an iterable, for example:

    pool.map(read_files, list_of_all_files)

(read_files is defined in the edit below.)
Note that map() takes 2 arguments: the first is a function and the second is an iterable; it applies the function to every item of the iterable and returns a list of the results.
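A minimal sketch of that signature (square() is just a hypothetical stand-in for any picklable, top-level function):

    from multiprocessing import Pool

    def square(x):
        # the first argument to map() must be the function itself,
        # not the result of calling it
        return x * x

    if __name__ == '__main__':
        with Pool(processes=5) as pool:
            # applies square() to every item and collects the results
            print(pool.map(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]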
Edit: you need to pass the file name to the other process, not the open tarfile object (which cannot be pickled):
    import glob
    import tarfile
    from multiprocessing import Pool

    def test_multiproc():
        files = glob.glob('d:\\*.tar.bz2')
        pool = Pool(processes=5)
        result = pool.map(read_files, files)

    def read_files(name):
        # each worker opens its own archive and extracts it to e:
        t = tarfile.open(name, 'r:bz2')
        t.extractall('e:\\')
        t.close()

    >>> test_multiproc()
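Since the d:\ and e:\ paths suggest Windows, the entry point should also be guarded with if __name__ == '__main__':, or each spawned worker will re-import the module and try to start its own pool. A slightly more defensive sketch along the same lines (the helper name extract_one is arbitrary, and 5 workers is just carried over from above):

    import glob
    import tarfile
    from multiprocessing import Pool

    def extract_one(name):
        # bz2 decompression is CPU-bound, so separate processes can
        # decompress different archives in parallel
        with tarfile.open(name, 'r:bz2') as tar:
            tar.extractall('e:\\')
        return name

    if __name__ == '__main__':
        files = glob.glob('d:\\*.tar.bz2')
        with Pool(processes=5) as pool:
            # imap_unordered yields results as workers finish
            for done in pool.imap_unordered(extract_one, files):
                print('extracted', done)

With 5 GB archives, disk throughput may become the bottleneck before the CPU does, so adding more workers than the disk can feed will not speed things up further.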