python - How to optimize converting a DataFrame to a dict?
I have a pd.DataFrame that needs to be converted to a dictionary. Here's an example DataFrame (call it mydf):
   user_id  colors
0     1000     red
1     1000  yellow
2     1000    blue
3     2000  yellow
4     2000   green
I want the keys of the dictionary to be the distinct user_id values (in this case 1000 and 2000), and each value to be the subset of the DataFrame corresponding to that key. What's the fastest way to convert to a dictionary so that when I call
mydict[1000]
it returns
   user_id  colors
0     1000     red
1     1000  yellow
2     1000    blue
?
I'm seeking an alternative to calling
mydf[mydf['user_id']==1000]
because the .csv is super large and I think a dictionary lookup would be faster. Other suggestions are appreciated!
My current solution is below, but I'm looking for alternatives because it takes 40 minutes to build from a 1.1 GB .csv.
mydict = {}
for idx, row in mydf.iterrows():
    if row['user_id'] not in mydict:
        mydict[row['user_id']] = [mydf.loc[idx]]
    else:
        mydict[row['user_id']].append(mydf.loc[idx])
Here you go:
In [6]: {k: v.colors.tolist() for k, v in df.groupby('user_id')}
Out[6]: {1000: ['red', 'yellow', 'blue'], 2000: ['yellow', 'green']}
No performance guarantees though; tolist may be a bit slow.
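Since the question asks for the full DataFrame subset per key (not just the colors column), here is a minimal sketch of the same groupby idea, using the example mydf data from the question. groupby yields (key, sub-DataFrame) pairs, so building the dict is a single pass over the data instead of row-wise appends:

```python
import pandas as pd

# Example frame matching the question's data
mydf = pd.DataFrame({
    'user_id': [1000, 1000, 1000, 2000, 2000],
    'colors': ['red', 'yellow', 'blue', 'yellow', 'green'],
})

# Each group is already the subset of rows for that user_id,
# so mydict[1000] returns a DataFrame, as the question requires.
mydict = {uid: group for uid, group in mydf.groupby('user_id')}

print(mydict[1000])
```

This should be much faster than the iterrows loop, since pandas does the grouping in optimized code rather than one Python-level lookup-and-append per row.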