python - Why does this take so long to match? Is it a bug? -


i need match urls in web application, i.e. /123,456,789, , wrote regex match pattern:

r'(\d+(,)?)+/$' 

i noticed not seem evaluate, after several minutes when testing pattern:

re.findall(r'(\d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523') 

the expected result there no matches.

this expression, however, executes (note trailing slash):

re.findall(r'(\d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523/') 

is bug?

there catastrophic backtracking going on cause exponential amount of processing depending on how long non-match string is. has nested repetitions , optional comma (even though regex engines can determine wouldn't match attempting of extraneous repetition). solved optimizing expression.


the easiest way accomplish 1+ digits or commas followed slash , end of string: [\d,]+/$. however, not perfect since allow ,123,,4,5/.

for can use optimized version of initial try: (?:\d,?)+/$. first, made repeating group non-capturing ((?:...)) isn't necessary provides "cleaner match". next, , crucial step, stopped repeating \d inside of group since group repeating. finally, removed unnecessary group around optional , since ? affects last character. pretty 1 digit, maybe comma, repeat, , followed trailing /.


this can still match odd string 1,2,3,/, heck of improved original regex negative lookbehind: (?:\d,?)+(?<!,)/$. assert there no comma directly before trailing /.


Comments

Popular posts from this blog

php - Submit Form Data without Reloading page -

linux - Rails running on virtual machine in Windows -