Python - Modifying a group within a Regular Expression match
So I have a function that is part of a Django (v 1.5) model. It takes a body of text, finds all of its tags, such as <note nickname>...</note>, converts the correct ones for the user, and removes all of the others.
The function below works, but it requires me to switch note_tags to '<note .*?>.*?</span>\r\n', because tag group 0 finds all of the tags regardless of whether the user's nickname is in there. So I am curious how I can use groups so that I can remove all of the un-useful tags without having to modify the regex.
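    import re

    def format_for_user(self, user):
        body = self.body
        # Default pattern: every note, whoever it is addressed to.
        note_tags = '<note .*?>.*?</note>\r\n'
        user_msg = False
        if user is not None:
            user_tags = '(<note %s>).*?</note>' % user.nickname
            user_tags = re.compile(user_tags)
            for tag in user_tags.finditer(body):
                if tag.group(1):
                    # Turn the user's opening tag into <span>.
                    replacement = str(tag.group(1))
                    body = body.replace(replacement, '<span>')
                    # The last 7 characters of the match are '</note>'.
                    # This replaces every '</note>' in body, not just this one.
                    replacement = str(tag.group(0)[-7:])
                    body = body.replace(replacement, '</span>')
                    user_msg = True
                    # The remaining notes now end in </span>, so the
                    # removal pattern has to change as well.
                    note_tags = '<note .*?>.*?</span>\r\n'
        note_tags = re.compile(note_tags)
        for tag in note_tags.finditer(body):
            body = body.replace(tag.group(0), '')
        return (body, user_msg)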
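For what it's worth, one way the group-based idea could look is a single pattern plus a replacement callback passed to re.sub, so the regex never has to change. This is only a rough sketch, not the code from the post: format_notes and NOTE_RE are hypothetical names, it works on a plain string instead of the model, and it assumes notes are never nested and that the trailing \r\n after a note is optional.

```python
import re

# One pattern for every note. Group 1 is the name in the opening tag,
# group 2 is the note text, group 3 is the optional trailing line break.
NOTE_RE = re.compile(r'<note (.*?)>(.*?)</note>(\r\n)?', re.DOTALL)


def format_notes(body, nickname):
    """Hypothetical helper: return (new_body, user_msg)."""
    found = []

    def replace(match):
        # Keep notes addressed to this user, rewritten as a <span>.
        if nickname is not None and match.group(1) == nickname:
            found.append(True)
            return '<span>%s</span>%s' % (match.group(2), match.group(3) or '')
        # Drop every other note together with its line break.
        return ''

    new_body = NOTE_RE.sub(replace, body)
    return new_body, bool(found)
```

Because the callback sees each match's groups, it can decide per note whether to convert or delete it, which avoids the second pass with a modified pattern. It is still regex over markup, so it breaks on nested or malformed tags.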
So abarnert was correct: I shouldn't be using regex to parse HTML, and should instead use something along the lines of BeautifulSoup.
So I used BeautifulSoup, and this is the resulting code. It solves a lot of the problems the regex version was having.
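    # Assumption: BeautifulSoup 4; the original import is not shown.
    from bs4 import BeautifulSoup

    def format_for_user(self, user):
        body = self.body
        soup = BeautifulSoup(body)
        user_msg = False
        if user is not None:
            # Notes whose class is the user's nickname become <span> tags.
            user_tags = soup.findAll('note', {"class": "%s" % user.nickname})
            for tag in user_tags:
                tag.name = 'span'
        # Whatever is still a <note> belongs to someone else; remove it.
        all_tags = soup.findAll('note')
        for tag in all_tags:
            tag.decompose()
        soup = soup.prettify()
        return (soup, user_msg)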