python - How do CPython and PyPy decide when to resize a set? -


when adding elements sets on cpython , pypy, when resized, , sizes of underlying container?

this question similar in principle max_load_factor, as c++ describes unordered_map.

cpython uses this check decide when resize:

if (!(so->used > n_used && so->fill*3 >= (so->mask+1)*2)) 

this means when 2/3 full, container resize.

the resizing doubles amount of space large sets , quadruples small ones:

return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4); 

armin rigo points out in comments pypy implements sets dictionaries, relevant resizing code is:

jit.conditional_call(d.resize_counter <= x * 3,                      _ll_dict_resize_to, d, num_extra) 

this same strategy, resize_counter empty space left in dictionary.


before pointed out, resorted benchmarking. can detect resizes looking small pauses. random small sizes, have careful remove noise. here's script that:

from collections import counter import time  iterations = 100 internal_iterations = 100000  def main():     resizes = counter()      in range(iterations):         print("completion: [{:3.0%}]\r".format(i/iterations), end="")          s = set()         maxtime_so_far = 0         j in range(internal_iterations):             start = time.time()             s.add(j)             end = time.time()              if end-start > maxtime_so_far:                 maxtime_so_far = end-start                 resizes[j] += 1      print()      format_string = "{0:<6} = 0b{0:<18b} [found %: {1:2.0%}]"      index in sorted(resizes):         chance = resizes[index] / iterations          if chance >= 0.05:             print(format_string.format(index, chance))  main() 

and here's output cpython:

completion: [99%] 0      = 0b0                  [found %: 100%] 5      = 0b101                [found %: 83%] 21     = 0b10101              [found %: 12%] 85     = 0b1010101            [found %: 94%] 341    = 0b101010101          [found %: 97%] 1365   = 0b10101010101        [found %: 100%] 5461   = 0b1010101010101      [found %: 100%] 21845  = 0b101010101010101    [found %: 100%] 87381  = 0b10101010101010101  [found %: 100%] 

you can see 10101...₂ pattern, power of 2 divided three, when object resize. (it's resizes 1 after that, because script 0-indexed).

on pypy3, changing number of iterations greater (iterations = 1000; internal_iterations = 100000), get

completion: [100%] 0      = 0b0                  [found %: 78%] 5      = 0b101                [found %: 6%] 21     = 0b10101              [found %: 5%] 341    = 0b101010101          [found %: 24%] 1365   = 0b10101010101        [found %: 66%] 5461   = 0b1010101010101      [found %: 100%] 21845  = 0b101010101010101    [found %: 100%] 87381  = 0b10101010101010101  [found %: 71%] 

this implies strategy same pypy.

strangely, , possibly due jit or gc, more like:

completion: [100%] 0      = 0b0                  [found %: 13%] 5      = 0b101                [found %: 11%] 21     = 0b10101              [found %: 22%] 22     = 0b10110              [found %: 6%] 23     = 0b10111              [found %: 5%] 24     = 0b11000              [found %: 5%] 341    = 0b101010101          [found %: 30%] 1365   = 0b10101010101        [found %: 66%] 5461   = 0b1010101010101      [found %: 98%] 

depending on iteration counts. imagine due relatively small number of iterations around point, , doesn't mean much. if gc collection happens around item 20, it's liable cause sort of stress.


Comments

Popular posts from this blog

php - Submit Form Data without Reloading page -

linux - Rails running on virtual machine in Windows -