python - Why does this take so long to match? Is it a bug? -
i need match urls in web application, i.e. /123,456,789
, , wrote regex match pattern:
r'(\d+(,)?)+/$'
i noticed not seem evaluate, after several minutes when testing pattern:
re.findall(r'(\d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523')
the expected result there no matches.
this expression, however, executes (note trailing slash):
re.findall(r'(\d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523/')
is bug?
there catastrophic backtracking going on cause exponential amount of processing depending on how long non-match string is. has nested repetitions , optional comma (even though regex engines can determine wouldn't match attempting of extraneous repetition). solved optimizing expression.
the easiest way accomplish 1+ digits or commas followed slash , end of string: [\d,]+/$
. however, not perfect since allow ,123,,4,5/
.
for can use optimized version of initial try: (?:\d,?)+/$
. first, made repeating group non-capturing ((?:...)
) isn't necessary provides "cleaner match". next, , crucial step, stopped repeating \d
inside of group since group repeating. finally, removed unnecessary group around optional ,
since ?
affects last character. pretty 1 digit, maybe comma, repeat, , followed trailing /
.
this can still match odd string 1,2,3,/
, heck of improved original regex negative lookbehind: (?:\d,?)+(?<!,)/$
. assert there no comma directly before trailing /
.
Comments
Post a Comment