python - Why does this take so long to match? Is it a bug? -

July 15, 2015

i need match urls in web application, i.e. /123,456,789, , wrote regex match pattern:

r'(\d+(,)?)+/$'

i noticed not seem evaluate, after several minutes when testing pattern:

re.findall(r'(\d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523')

the expected result there no matches.

this expression, however, executes (note trailing slash):

re.findall(r'(\d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523/')

is bug?

there catastrophic backtracking going on cause exponential amount of processing depending on how long non-match string is. has nested repetitions , optional comma (even though regex engines can determine wouldn't match attempting of extraneous repetition). solved optimizing expression.

the easiest way accomplish 1+ digits or commas followed slash , end of string: [\d,]+/$. however, not perfect since allow ,123,,4,5/.

for can use optimized version of initial try: (?:\d,?)+/$. first, made repeating group non-capturing ((?:...)) isn't necessary provides "cleaner match". next, , crucial step, stopped repeating \d inside of group since group repeating. finally, removed unnecessary group around optional , since ? affects last character. pretty 1 digit, maybe comma, repeat, , followed trailing /.

this can still match odd string 1,2,3,/, heck of improved original regex negative lookbehind: (?:\d,?)+(?<!,)/$. assert there no comma directly before trailing /.

Search This Blog

UIO

python - Why does this take so long to match? Is it a bug? -

Comments

Post a Comment

Popular posts from this blog

How to dequeue messages from RabbitMQ in a scheduled time -

php - Submit Form Data without Reloading page -

linux - Rails running on virtual machine in Windows -