graphai.core.text.keywords module

graphai.core.text.keywords.word_tokens(text)

Generates all possible word tokens from a sentence, i.e. every contiguous sequence of one or more words.

Parameters:

text (str) – String containing words separated by spaces.

Returns:

A list with all the possible word tokens for the given sentence.

Return type:

list[str]

Examples

>>> word_tokens('how are you')
['how', 'are', 'you', 'how are', 'are you', 'how are you']
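
The behaviour shown in the example can be sketched with a sliding window over the words. This is a minimal illustration, not the module's actual implementation; the name word_ngrams is hypothetical, and the shortest-first ordering is inferred from the example output above.

```python
def word_ngrams(text):
    """Return every contiguous run of words in `text`, shortest runs first.

    Hypothetical stand-in for word_tokens, written only from its documented example.
    """
    words = text.split()
    tokens = []
    # n is the number of words in each run, from 1 up to the whole sentence.
    for n in range(1, len(words) + 1):
        # Slide a window of n words across the sentence, left to right.
        for start in range(len(words) - n + 1):
            tokens.append(' '.join(words[start:start + n]))
    return tokens


print(word_ngrams('how are you'))
# ['how', 'are', 'you', 'how are', 'are you', 'how are you']
```

Under this reading, a sentence of n words yields n(n+1)/2 tokens, so the output grows quadratically with sentence length.
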

graphai.core.text.keywords.rake_extract(text, use_nltk, split_words=False, return_scores=False, threshold='auto', filter_past_tenses_and_adverbs=False)

Extracts keywords from unconstrained text using python-rake or nltk-rake.

Parameters:
  • text (str) – Text from which to extract the keywords.

  • use_nltk (bool) – If True, nltk-rake is used for keyword extraction; otherwise python-rake is used.

  • split_words (bool) – If True, keywords with more than one word are split. Default: False.

  • return_scores (bool) – If True, each keyword is returned in a tuple together with its RAKE score. Default: False.

  • threshold (float or 'auto') – Minimum RAKE score; extracted keywords scoring below it are ignored. Default: 'auto', which corresponds to 10% of the maximum score.

  • filter_past_tenses_and_adverbs (bool) – If True, words within keywords that are past tenses, past participles or adverbs are filtered out. Default: False.

Returns:

A list of:
  • str: Keywords, if split_words is True or return_scores is False.

  • tuple(str, float): A (keyword, score) pair, otherwise.

Return type:

list[str] or list[tuple(str, float)]
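
How threshold, split_words and return_scores interact can be sketched as a post-processing step over (keyword, score) pairs. This is a hedged illustration built only from the parameter descriptions above: select_keywords is a hypothetical helper, and the way multi-word keywords are split (single words, first-seen order, repeats dropped) is an assumption rather than documented behaviour.

```python
def select_keywords(scored, split_words=False, return_scores=False, threshold='auto'):
    """Filter (keyword, score) pairs by score and shape the result.

    Hypothetical helper mirroring the documented parameters, not the library's code.
    """
    if not scored:
        return []
    if threshold == 'auto':
        # 'auto' is documented as 10% of the maximum score.
        threshold = 0.1 * max(score for _, score in scored)
    # Keywords scoring below the threshold are ignored.
    kept = [(kw, score) for kw, score in scored if score >= threshold]
    if split_words:
        # Assumption: multi-word keywords become single words,
        # keeping first-seen order and dropping repeats.
        words = []
        for kw, _ in kept:
            for word in kw.split():
                if word not in words:
                    words.append(word)
        return words
    if return_scores:
        return kept
    return [kw for kw, _ in kept]


# Made-up scores, for illustration only.
scored = [('brown baggies', 4.0), ('young boys', 4.0), ('band', 1.0), ('call', 0.3)]
print(select_keywords(scored))                      # 'call' falls below 10% of 4.0
print(select_keywords(scored, return_scores=True))  # same keywords, with scores
print(select_keywords(scored, threshold=2.0))       # explicit numeric threshold
```

Note that split_words takes precedence in this sketch, which matches the documented rule that plain strings are returned whenever split_words is True.
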

Examples

>>> text = ' '.join([
...     "Then a crowd a young boys they're a foolin' around in the corner",
...     "Drunk and dressed in their best brown baggies and their platform soles",
...     "They don't give a damn about any trumpet playin' band",
...     "It ain't what they call 'rock and roll'"
... ])
>>> rake_extract(text, use_nltk=False)
['brown baggies', 'young boys', 'trumpet playin', 'corner drunk', 'platform soles']
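
For readers unfamiliar with RAKE, the scoring idea behind both backends can be sketched in a few lines. This is a simplified illustration of the basic algorithm (candidate phrases are runs of non-stopwords, scored by summing each word's degree/frequency ratio), not the code of python-rake or nltk-rake; the function name rake_scores and the tiny stopword list are assumptions made for the example, so its scores will not match either package.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; real RAKE packages ship much larger ones.
STOPWORDS = {'a', 'an', 'and', 'the', 'of', 'in', 'is', 'are', 'to', 'with'}


def rake_scores(text, stopwords=STOPWORDS):
    """Score candidate phrases with the basic RAKE degree/frequency ratio."""
    # Candidate phrases are runs of non-stopwords that do not cross punctuation.
    phrases = []
    for fragment in re.split(r'[.,;:!?()"]+', text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # freq: how often a word occurs; degree: total length of the phrases it occurs in.
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # A phrase's score is the sum of degree/frequency over its words.
    scores = {}
    for phrase in phrases:
        scores[' '.join(phrase)] = sum(degree[w] / freq[w] for w in phrase)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rake_scores('red apple and green apple pie'))
# [('green apple pie', 8.5), ('red apple', 4.5)]
```

Because phrases are only broken at stopwords and punctuation, words from adjacent lines joined without punctuation can end up in the same keyword, as with 'corner drunk' in the example above.
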

graphai.core.text.keywords.extract_keywords(text, use_nltk=False)

Extracts keywords from the given text after normalising it (fixing encoding problems, stripping HTML, lowercasing, etc.).

Parameters:
  • text (str) – Text to extract keywords from.

  • use_nltk (bool) – If True, nltk-rake is used for keyword extraction; otherwise python-rake is used. Default: False.

Returns:

A list containing the keywords extracted from the text.

Return type:

list[str]

Examples

>>> text = ' '.join([
...     "<p>",
...     "Then a crowd a young boys they're a foolin' around in the corner",
...     "Drunk and dressed in their best brown baggies and their platform soles",
...     "They don't give a damn about any trumpet playin' band",
...     "It ain't what they call 'rock and roll'",
...     "</p>"
... ])
>>> extract_keywords(text)
['brown baggies', 'young boys', 'trumpet playin', 'corner drunk', 'platform soles']
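
The normalisation step explains why the HTML-wrapped input gives the same keywords as the plain text. A rough standard-library sketch of such a step follows; normalise_text is a hypothetical name, and the specific operations (regex tag stripping, entity decoding, NFKC Unicode normalisation standing in for encoding repair) are assumptions, since the module's actual normalisation is not described beyond the summary above.

```python
import html
import re
import unicodedata


def normalise_text(text):
    """Rough stand-in for the normalisation step (assumed, not the library's code)."""
    # Replace tags with a space so neighbouring words stay separated.
    text = re.sub(r'<[^>]+>', ' ', text)
    # Decode HTML entities such as &amp;.
    text = html.unescape(text)
    # Fold compatibility characters (e.g. ligatures, full-width forms).
    text = unicodedata.normalize('NFKC', text)
    # Lowercase and collapse runs of whitespace.
    return ' '.join(text.lower().split())


print(normalise_text('<p>Hello &amp; <b>World</b></p>'))
# hello & world
```

A regular expression is enough for this illustration, but it is not a robust HTML parser; malformed or script-heavy markup would need a dedicated parser.
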