graphai.core.text.keywords module
- graphai.core.text.keywords.word_tokens(text)
Generates all possible word tokens from a sentence, i.e. every contiguous sequence of one or more words.
- Parameters:
text (str) – String containing words separated by spaces.
- Returns:
A list with all the possible word tokens for the given sentence.
- Return type:
list[str]
Examples
>>> word_tokens('how are you')
['how', 'are', 'you', 'how are', 'are you', 'how are you']
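The behaviour shown in the example above can be reproduced with a short sketch. This is not the library's implementation; `word_ngrams_sketch` is a hypothetical name, and the ordering (shortest sequences first, left to right) is inferred only from the documented example output.

```python
def word_ngrams_sketch(text):
    """Return every contiguous run of words in `text`, shortest runs first.

    Hypothetical helper written for illustration; it only mirrors the
    documented example of word_tokens and may differ from the real function.
    """
    words = text.split()
    tokens = []
    # n is the number of words in each run, from 1 up to the whole sentence.
    for n in range(1, len(words) + 1):
        # Slide a window of n words across the sentence.
        for start in range(len(words) - n + 1):
            tokens.append(' '.join(words[start:start + n]))
    return tokens


print(word_ngrams_sketch('how are you'))
# ['how', 'are', 'you', 'how are', 'are you', 'how are you']
```

A sentence of k words yields k(k+1)/2 tokens under this scheme, so the output grows quadratically with sentence length.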
- graphai.core.text.keywords.rake_extract(text, use_nltk, split_words=False, return_scores=False, threshold='auto', filter_past_tenses_and_adverbs=False)
Extracts keywords from unconstrained text using python-rake or nltk-rake.
- Parameters:
text (str) – Text from which to extract the keywords.
use_nltk (bool) – If True, nltk-rake is used for keyword extraction; otherwise python-rake is used.
split_words (bool) – If True, keywords with more than one word are split. Default: False.
return_scores (bool) – If True, keywords are returned in a tuple with their RAKE score. Default: False.
threshold (float or 'auto') – Minimum RAKE score; extracted keywords scoring below it are ignored. Default: 'auto', which translates to 10% of the maximum score.
filter_past_tenses_and_adverbs (bool) – Whether to filter out words in keywords which are past tenses, past participles or adverbs. Default: False.
- Returns:
A list of:
str: Keywords, if split_words is True or return_scores is False.
tuple(str, float): A pair representing keywords and score, otherwise.
- Return type:
list[str] or list[tuple(str, float)]
Examples
>>> text = ' '.join([
...     "Then a crowd a young boys they're a foolin' around in the corner",
...     "Drunk and dressed in their best brown baggies and their platform soles",
...     "They don't give a damn about any trumpet playin' band",
...     "It ain't what they call 'rock and roll'"
... ])
>>> rake_extract(text, use_nltk=False)
['brown baggies', 'young boys', 'trumpet playin', 'corner drunk', 'platform soles']
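The threshold parameter described above can be illustrated with a minimal sketch, assuming the filter runs over (keyword, score) pairs. `filter_by_threshold_sketch` is a hypothetical helper, not part of graphai, and the scores in the usage lines are invented for illustration rather than produced by RAKE.

```python
def filter_by_threshold_sketch(scored_keywords, threshold='auto'):
    """Drop (keyword, score) pairs whose score falls below the threshold.

    Hypothetical helper: it follows the documented rule that 'auto' means
    10% of the maximum score, but is not the library's own code.
    """
    if not scored_keywords:
        return []
    if threshold == 'auto':
        # 'auto' translates to 10% of the highest score in the input.
        threshold = 0.1 * max(score for _, score in scored_keywords)
    return [(kw, score) for kw, score in scored_keywords if score >= threshold]


# Invented scores, for illustration only.
scored = [('brown baggies', 4.0), ('young boys', 4.0), ('band', 0.3)]
print(filter_by_threshold_sketch(scored))
# [('brown baggies', 4.0), ('young boys', 4.0)]
```

With these invented scores the automatic threshold is 0.4, so only the pair scoring 0.3 is ignored. Passing a float instead, for example `threshold=1.0`, applies that cut-off directly.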
- graphai.core.text.keywords.extract_keywords(text, use_nltk=False)
Extracts keywords from the given text, after normalising it (solving encoding problems, stripping HTML, lowercasing, etc.).
- Parameters:
text (str) – Text to extract keywords from.
use_nltk (bool) – If True, nltk-rake is used for keyword extraction; otherwise python-rake is used. Default: False.
- Returns:
A list containing the keywords extracted from the text.
- Return type:
list[str]
Examples
>>> text = ' '.join([
...     "<p>",
...     "Then a crowd a young boys they're a foolin' around in the corner",
...     "Drunk and dressed in their best brown baggies and their platform soles",
...     "They don't give a damn about any trumpet playin' band",
...     "It ain't what they call 'rock and roll'",
...     "</p>"
... ])
>>> extract_keywords(text)
['brown baggies', 'young boys', 'trumpet playin', 'corner drunk', 'platform soles']
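The normalisation step that extract_keywords applies before extraction is only summarised above. The sketch below shows one way such a step could look using the standard library alone. `normalise_sketch` is a hypothetical name, the choice and order of steps are assumptions, and the library may well rely on different tools to solve encoding problems.

```python
import html
import re
import unicodedata


def normalise_sketch(text):
    """Strip HTML, tidy characters and whitespace, then lowercase.

    Hypothetical helper for illustration; the steps and their order are
    assumptions, not the library's actual normalisation.
    """
    text = re.sub(r'<[^>]+>', ' ', text)          # replace HTML tags with spaces
    text = html.unescape(text)                     # decode entities such as &amp;
    text = unicodedata.normalize('NFKC', text)     # unify compatibility characters
    text = re.sub(r'\s+', ' ', text).strip()       # collapse runs of whitespace
    return text.lower()


print(normalise_sketch('<p>Brown  Baggies &amp; Platform Soles</p>'))
# brown baggies & platform soles
```

Tags are removed before entities are decoded so that escaped markup such as `&lt;b&gt;` survives as literal text instead of being stripped as a tag.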