python - Why does this take so long to match? Is it a bug? -
i need match urls in web application, i.e. /123,456,789, , wrote regex match pattern:
r'(\d+(,)?)+/$' i noticed not seem evaluate, after several minutes when testing pattern:
re.findall(r'(\d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523') the expected result there no matches.
this expression, however, executes (note trailing slash):
re.findall(r'(\d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523/') is bug?
there catastrophic backtracking going on cause exponential amount of processing depending on how long non-match string is. has nested repetitions , optional comma (even though regex engines can determine wouldn't match attempting of extraneous repetition). solved optimizing expression.
the easiest way accomplish 1+ digits or commas followed slash , end of string: [\d,]+/$. however, not perfect since allow ,123,,4,5/.
for can use optimized version of initial try: (?:\d,?)+/$. first, made repeating group non-capturing ((?:...)) isn't necessary provides "cleaner match". next, , crucial step, stopped repeating \d inside of group since group repeating. finally, removed unnecessary group around optional , since ? affects last character. pretty 1 digit, maybe comma, repeat, , followed trailing /.
this can still match odd string 1,2,3,/, heck of improved original regex negative lookbehind: (?:\d,?)+(?<!,)/$. assert there no comma directly before trailing /.
Comments
Post a Comment