python - How to optimize converting DataFrame to dict? -


I have a pd.DataFrame that I need to convert to a dictionary. Here's an example DataFrame (call it mydf):

   user_id    colors
0  1000       red
1  1000       yellow
2  1000       blue
3  2000       yellow
4  2000       green

I want the keys of the dictionary to be the distinct user_id values (in this case 1000 and 2000), and each value to be the subset of the DataFrame corresponding to that key. What's the fastest way to build such a dictionary, so that when I call

mydict[1000] 

it returns

   user_id    colors
0  1000       red
1  1000       yellow
2  1000       blue

?

I'm seeking an alternative to calling

mydf[mydf['user_id']==1000] 

because the .csv is super large and I think the lookup can be optimized. Other suggestions are appreciated!
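One alternative to the boolean mask, sketched here as a suggestion rather than a benchmarked fix: set user_id as the index once, so repeated lookups go through the index instead of scanning the whole column each time. The small mydf below is a stand-in for the real CSV data.

```python
import pandas as pd

# Stand-in for the large CSV from the question
mydf = pd.DataFrame({
    'user_id': [1000, 1000, 1000, 2000, 2000],
    'colors': ['red', 'yellow', 'blue', 'yellow', 'green'],
})

# Index on user_id once (sorted for faster repeated lookups);
# .loc then selects all rows for a key without a full-column comparison
indexed = mydf.set_index('user_id').sort_index()
rows_for_1000 = indexed.loc[1000]
print(rows_for_1000)
```

Since 1000 appears in multiple rows, `indexed.loc[1000]` returns a sub-DataFrame containing all of that user's rows.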

My current solution is below, but I'm looking for alternatives because it takes 40 minutes to build on a 1.1GB .csv:

mydict = {}
for idx, row in mydf.iterrows():
    if row['user_id'] not in mydict:
        mydict[row['user_id']] = [mydf.loc[idx]]
    else:
        mydict[row['user_id']].append(mydf.loc[idx])

Here you go:

In [6]: {k: v.colors.tolist() for k, v in df.groupby('user_id')}
Out[6]: {1000: ['red', 'yellow', 'blue'], 2000: ['yellow', 'green']}

No performance guarantees, though; tolist may be a bit slow.

