graphai.core.utils.text.clean module
- class graphai.core.utils.text.clean.HTMLCleaner
Bases: HTMLParser
Parser class that strips HTML tags from raw text and keeps the text content.
- handle_starttag(tag, attrs)
- handle_endtag(tag)
- handle_data(d)
- get_data()
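The four methods listed above follow the usual pattern for subclassing Python's standard `html.parser.HTMLParser`: the `handle_*` callbacks fire while the input is fed to the parser, and `get_data()` returns whatever text was collected. The sketch below illustrates that pattern only. `TagStripper`, `strip_tags`, and the choice to treat `<br>` and `<p>` as word boundaries are assumptions made for illustration, not the actual `HTMLCleaner` implementation.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical stand-in for HTMLCleaner; not the graphai code."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Assumption: block-level and break tags act as word boundaries
        if tag in ("br", "p"):
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag == "p":
            self._chunks.append(" ")

    def handle_data(self, d):
        # Text between tags is the only content that is kept
        self._chunks.append(d)

    def get_data(self):
        # Join the collected text and collapse runs of whitespace
        return " ".join("".join(self._chunks).split())


def strip_tags(html):
    """Feed the markup to a fresh parser and return the collected text."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return parser.get_data()
```

A fresh parser instance per call keeps the collected chunks from leaking between inputs.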
- graphai.core.utils.text.clean.normalize(text)
Normalizes the given text by fixing encoding problems, removing URLs and email addresses, stripping HTML tags, and converting the text to lowercase.
- Parameters:
text (str) – Text to be normalized.
- Returns:
Normalized text.
- Return type:
str
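The steps in the description of `normalize` can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the graphai implementation: `normalize_sketch`, the two regular expressions, and the use of Unicode NFKC normalization as a stand-in for "fixing encoding problems" are all hypothetical. It strips HTML before removing URLs so that a greedy URL pattern cannot swallow link text inside an attribute; the real function may order the steps differently.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Hypothetical patterns; the real module may use different ones
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")


class _TextOnly(HTMLParser):
    """Collects only the text found between tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, d):
        self.parts.append(d)


def normalize_sketch(text):
    # Step 1 (assumption): NFKC folds compatibility characters,
    # standing in for whatever encoding repair the real function does
    text = unicodedata.normalize("NFKC", text)

    # Step 2: drop HTML tags, keeping the text content
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    text = " ".join(parser.parts)

    # Step 3: remove URLs, then email addresses
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)

    # Step 4: lowercase and collapse leftover whitespace
    return " ".join(text.lower().split())
```

For example, `normalize_sketch("<p>Visit https://example.com or mail Bob@Example.org NOW</p>")` returns `"visit or mail now"`.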