python - How to optimize converting a DataFrame to a dict?
I have a pd.DataFrame that needs to be converted to a dictionary. Here's an example DataFrame (call it mydf):
   user_id  colors
0     1000     red
1     1000  yellow
2     1000    blue
3     2000  yellow
4     2000   green
I want the keys of the dictionary to be the distinct user_id values (in this case 1000 and 2000), and each value to be the subset of the DataFrame corresponding to that key. What's the fastest way to convert to a dictionary so that when I call
mydict[1000]
it returns
   user_id  colors
0     1000     red
1     1000  yellow
2     1000    blue
?
I'm seeking an alternative to calling
mydf[mydf['user_id']==1000]
because the .csv is super large and I think a dictionary lookup would be faster. Other suggestions are appreciated!
My current solution is below, but I'm looking for alternatives because it takes 40 minutes to build from a 1.1 GB .csv.
mydict = {}
for idx, row in mydf.iterrows():
    if row['user_id'] not in mydict:
        mydict[row['user_id']] = [mydf.loc[idx]]
    else:
        mydict[row['user_id']].append(mydf.loc[idx])
Here you go:
In [6]: {k: v.colors.tolist() for k, v in df.groupby('user_id')}
Out[6]: {1000: ['red', 'yellow', 'blue'], 2000: ['yellow', 'green']}
No performance guarantees though; tolist may be a bit slow.
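Since the question asks for the full DataFrame subset per key (not just the colors column), here is a minimal sketch of the same groupby idea, using the example mydf data from the question. groupby yields (key, sub-DataFrame) pairs, so building the dict is a single pass over the data instead of row-wise appends:

```python
import pandas as pd

# Example frame matching the question's data
mydf = pd.DataFrame({
    'user_id': [1000, 1000, 1000, 2000, 2000],
    'colors': ['red', 'yellow', 'blue', 'yellow', 'green'],
})

# Each group is already the subset of rows for that user_id,
# so mydict[1000] returns a DataFrame, as the question requires.
mydict = {uid: group for uid, group in mydf.groupby('user_id')}

print(mydict[1000])
```

This should be much faster than the iterrows loop, since pandas does the grouping in optimized code rather than one Python-level lookup-and-append per row.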