python - How do CPython and PyPy decide when to resize a set?
When adding elements to sets on CPython and PyPy, when do they get resized, and what are the sizes of the underlying containers?
This question is similar in principle to the one about max_load_factor, as C++ describes it for unordered_map.
CPython uses this check to decide when to resize:
if (!(so->used > n_used && so->fill*3 >= (so->mask+1)*2))
This means the container is resized when it is 2/3 full.
The resizing doubles the amount of space for large sets and quadruples it for small ones:
return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4);
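Putting the two rules together lets you predict exactly where the slow inserts should land. The sketch below is my own model of the policy, not CPython's code: the minimum table size of 8 and the round-up-to-a-power-of-two step mirror what set_table_resize does in setobject.c, and since nothing is ever deleted here, fill equals used.

```python
def resize_points(n):
    """Model CPython's set growth: return the 0-indexed insert
    positions at which a resize is triggered."""
    size = 8            # PySet_MINSIZE in setobject.c
    fill = 0            # no deletions, so fill == used
    points = []
    for j in range(n):
        fill += 1
        # Resize once the table passes 2/3 full.
        if fill * 3 >= size * 2:
            points.append(j)
            # Double large sets, quadruple small ones.
            minused = fill * 2 if fill > 50000 else fill * 4
            size = 8
            while size <= minused:   # round up to a power of two
                size <<= 1
    return points

print(resize_points(100000))
# -> [5, 21, 85, 341, 1365, 5461, 21845, 87381]
```

These predicted positions match the slow adds the benchmark further down actually detects.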
Armin Rigo points out in the comments that PyPy implements sets with dictionaries, so the relevant resizing code is:
jit.conditional_call(d.resize_counter <= x * 3, _ll_dict_resize_to, d, num_extra)
This is the same strategy, since resize_counter is the amount of empty space left in the dictionary.
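If resize_counter is initialized to size * 2 and decremented by 3 for each key stored (my reading of PyPy's dict implementation; the helper names below are mine, not PyPy's), then the PyPy check with num_extra = 1 is algebraically identical to CPython's 2/3 test:

```python
def pypy_style_trigger(size, fill, num_extra=1):
    # Model: the counter starts at size * 2 and loses 3 per stored key.
    resize_counter = size * 2 - fill * 3
    return resize_counter <= num_extra * 3

def cpython_style_trigger(size, fill):
    # Would the next insert push the table past 2/3 full?
    return (fill + 1) * 3 >= size * 2

# The two predicates agree at every fill level of every table size.
assert all(
    pypy_style_trigger(size, fill) == cpython_style_trigger(size, fill)
    for size in (8, 32, 128, 512, 2048)
    for fill in range(size)
)
```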
Before this was pointed out, I had resorted to benchmarking. You can detect resizes by looking for small pauses. This is somewhat random for small sizes, so you have to be careful to remove the noise. Here's a script that does that:
from collections import Counter
import time

iterations = 100
internal_iterations = 100000

def main():
    resizes = Counter()

    for i in range(iterations):
        print("Completion: [{:3.0%}]\r".format(i/iterations), end="")

        s = set()
        maxtime_so_far = 0

        for j in range(internal_iterations):
            start = time.time()
            s.add(j)
            end = time.time()

            if end-start > maxtime_so_far:
                maxtime_so_far = end-start
                resizes[j] += 1

    print()

    format_string = "{0:<6} = 0b{0:<18b} [found %: {1:2.0%}]"
    for index in sorted(resizes):
        chance = resizes[index] / iterations
        if chance >= 0.05:
            print(format_string.format(index, chance))

main()
and here's output cpython:
Completion: [99%]
0      = 0b0                  [found %: 100%]
5      = 0b101                [found %: 83%]
21     = 0b10101              [found %: 12%]
85     = 0b1010101            [found %: 94%]
341    = 0b101010101          [found %: 97%]
1365   = 0b10101010101        [found %: 100%]
5461   = 0b1010101010101      [found %: 100%]
21845  = 0b101010101010101    [found %: 100%]
87381  = 0b10101010101010101  [found %: 100%]
You can see the 10101…₂ pattern, a power of 2 divided by three, which is where the object resizes. (It actually resizes 1 element after that, because the script is 0-indexed.)
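A quick way to see why the alternating-bit pattern appears: each detected index is ⌊2·size/3⌋ for a table size of 8·4ᵏ, and two-thirds of a power of two written in binary is exactly a run of alternating bits. (The table sizes here assume the ×4 growth from a minimum size of 8 quoted above.)

```python
sizes = [8 * 4**k for k in range(8)]        # 8, 32, 128, ..., 131072
points = [2 * size // 3 for size in sizes]  # floor of two-thirds of each

for p in points:
    print(p, bin(p))
# 5 0b101
# 21 0b10101
# ...
# 87381 0b10101010101010101
```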
On PyPy3, after increasing the number of iterations (iterations = 1000; internal_iterations = 100000), I get
Completion: [100%]
0      = 0b0                  [found %: 78%]
5      = 0b101                [found %: 6%]
21     = 0b10101              [found %: 5%]
341    = 0b101010101          [found %: 24%]
1365   = 0b10101010101        [found %: 66%]
5461   = 0b1010101010101      [found %: 100%]
21845  = 0b101010101010101    [found %: 100%]
87381  = 0b10101010101010101  [found %: 71%]
This implies the strategy is the same on PyPy.
Strangely, and possibly due to the JIT or the GC, sometimes the output looks more like:
Completion: [100%]
0      = 0b0                  [found %: 13%]
5      = 0b101                [found %: 11%]
21     = 0b10101              [found %: 22%]
22     = 0b10110              [found %: 6%]
23     = 0b10111              [found %: 5%]
24     = 0b11000              [found %: 5%]
341    = 0b101010101          [found %: 30%]
1365   = 0b10101010101        [found %: 66%]
5461   = 0b1010101010101      [found %: 98%]
depending on the iteration counts. I imagine this is due to the relatively small number of iterations around that point, and it doesn't mean much. If a GC collection happens around item 20, it's liable to cause that sort of stress.