graphai.core.text package
- class graphai.core.text.ConceptsGraph
Bases: object
- load_from_db()
- get_ontology_concepts()
- add_graph_score(results, smoothing=True)
Computes graph_score for the provided DataFrame containing concepts.
- Parameters:
results (pd.DataFrame) – A pandas DataFrame including the column [‘concept_id’].
smoothing (bool) – Whether to apply a transformation to the graph_score that bumps scores to avoid a negative exponential shape. Default: True.
- Returns:
A pandas DataFrame with the original columns plus [‘graph_score’].
- Return type:
pd.DataFrame
- add_ontology_scores(results, smoothing=True)
Computes ontology_local_score and ontology_global_score for the provided DataFrame containing keywords and concepts.
- Parameters:
results (pd.DataFrame) – A pandas DataFrame including the columns [‘keywords’, ‘concept_id’].
smoothing (bool) – Whether to apply a transformation to the ontology scores that pushes scores away from 0.5. Default: True.
- Returns:
A pandas DataFrame with the original columns plus [‘ontology_local_score’, ‘ontology_global_score’].
- Return type:
pd.DataFrame
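The exact smoothing transformation is not specified in this reference. As a purely illustrative sketch, assuming scores lie in [0, 1], the hypothetical function below uses a smoothstep polynomial (an assumption, not graphai's formula) to show what "pushing scores away from 0.5" can look like: values below 0.5 are lowered, values above 0.5 are raised, and 0, 0.5 and 1 stay fixed.

```python
def push_from_half(score: float) -> float:
    """Hypothetical smoothing: smoothstep polynomial 3x^2 - 2x^3 on [0, 1].

    Keeps 0, 0.5 and 1 fixed, lowers scores below 0.5 and raises scores
    above 0.5. Illustrative assumption only, not the graphai transformation.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return 3 * score**2 - 2 * score**3


# Print the original and smoothed value for a few sample scores.
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(s, push_from_half(s))
```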
- graphai.core.text.extract_keywords(text, use_nltk=False)
Extracts keywords from the given text, after normalising it (solving encoding problems, stripping HTML, lowercasing, etc.).
- Parameters:
text (str) – Text to extract keywords from.
use_nltk (bool) – Whether to use nltk-rake for keyword extraction, otherwise python-rake is used. Default: False.
- Returns:
A list containing the keywords extracted from the text.
- Return type:
list[str]
Examples
>>> text = ' '.join([
...     "<p>",
...     "Then a crowd a young boys they're a foolin' around in the corner",
...     "Drunk and dressed in their best brown baggies and their platform soles",
...     "They don't give a damn about any trumpet playin' band",
...     "It ain't what they call 'rock and roll'",
...     "</p>"
... ])
>>> extract_keywords(text)
['brown baggies', 'young boys', 'trumpet playin', 'corner drunk', 'platform soles']
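The normalisation step mentioned above (solving encoding problems, stripping HTML, lowercasing) might look roughly like the following standard-library sketch. The function name normalise_text and the specific steps are assumptions for illustration; the library's own normalisation may differ.

```python
import html
import re
import unicodedata


def normalise_text(text: str) -> str:
    """Hypothetical normaliser, not graphai's implementation."""
    # Replace HTML tags with spaces so adjacent words do not merge.
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode HTML entities such as &amp; or &nbsp;.
    text = html.unescape(text)
    # Fold compatibility characters (e.g. non-breaking spaces) to plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace, trim the ends, and lowercase.
    return re.sub(r"\s+", " ", text).strip().lower()
```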
- graphai.core.text.wikisearch(keywords_list, es, fraction=(0, 1), method='es-base')
Finds 10 relevant concepts (Wikipedia pages) for each set of keywords in a list.
- Parameters:
keywords_list (list(str)) – List containing the sets of keywords for which to search concepts.
es (ESConceptDetection) – Elasticsearch interface.
fraction (tuple(int, int)) – Portion of the keywords_list to be processed, e.g. (1/3, 2/3) means only the middle third of the list is considered.
method (str) – Method to retrieve the concepts (Wikipedia pages). It can be either “wikipedia-api”, to use the Wikipedia API, or one of {“es-base”, “es-score”}, to use elasticsearch. Default: ‘es-base’. Fallback: ‘wikipedia-api’.
- Returns:
A pandas DataFrame with columns [‘keywords’, ‘concept_id’, ‘concept_name’, ‘searchrank’, ‘search_score’], unique by (‘keywords’, ‘concept_id’). The searchrank is the position of the concept in the list of results for that set of keywords, starting with 1. The search score is the elasticsearch score for method “es-score” or 1 - (searchrank - 1)/n for the other methods.
- Return type:
pd.DataFrame
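As a small worked sketch of two details above: the rank-based search score 1 - (searchrank - 1)/n, and the selection of a portion of keywords_list via fraction. It assumes n is the number of results returned for a set of keywords and that fraction bounds are rounded to list indices; both helper names are hypothetical and the library's own handling may differ.

```python
def rank_based_score(searchrank: int, n: int) -> float:
    """Score for a 1-based rank among n results: 1 - (searchrank - 1) / n."""
    return 1 - (searchrank - 1) / n


def select_fraction(items: list, fraction: tuple = (0, 1)) -> list:
    """Hypothetical helper: keep the portion of items between two fractions.

    Rounding the bounds to the nearest index is an assumption.
    """
    begin, end = fraction
    n = len(items)
    return items[round(begin * n):round(end * n)]


# With 10 results, the scores run from 1.0 (rank 1) down to about 0.1 (rank 10).
scores = [rank_based_score(rank, 10) for rank in range(1, 11)]

# (1/3, 2/3) keeps only the middle third of a 9-element list.
middle = select_fraction(list(range(9)), (1 / 3, 2 / 3))
```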
- graphai.core.text.compute_scores(results, graph, restrict_to_ontology=False, score_smoothing=True, aggregation_coef=0.5, filtering_threshold=0.15, refresh_scores=True)
Gathers wikisearch results, computes several scores for them, and finally aggregates and filters them.
- Parameters:
results (pd.DataFrame) – A pandas DataFrame with columns [‘keywords’, ‘concept_id’, ‘concept_name’, ‘searchrank’, ‘search_score’].
graph (ConceptsGraph) – The concepts graph and ontology object.
restrict_to_ontology (bool) – Whether to filter out concepts that are not in the ontology. Default: False.
score_smoothing (bool) – Whether to apply a transformation to some scores to distribute them more evenly in [0, 1]. Default: True.
aggregation_coef (float) – A number in [0, 1] that controls how the scores of the aggregated pages are computed. A value of 0 takes the sum of scores over Keywords, then normalises in [0, 1]. A value of 1 takes the max of scores over Keywords. Default: 0.5.
filtering_threshold (float) – A number in [0, 1] that is used as a threshold for all the scores to decide whether the page is good enough from that score's perspective. Default: 0.15.
refresh_scores (bool) – Whether to recompute scores after filtering. Default: True.
- Returns:
A pandas DataFrame with columns [‘concept_id’, ‘concept_name’] and a column ‘x_score’ for each score, including a ‘mixed_score’ with a weighted average of the other scores.
- Return type:
pd.DataFrame
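The roles of aggregation_coef and filtering_threshold can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: intermediate coefficients are assumed to interpolate linearly between the two documented extremes, the sum is assumed to be normalised by the largest sum across concepts, and a concept is assumed to pass when every score is at least the threshold. Both function names are hypothetical.

```python
def aggregate_scores(scores_by_concept: dict, coef: float = 0.5) -> dict:
    """Aggregate per-keyword scores into one score per concept.

    coef = 0 gives the normalised sum, coef = 1 gives the max; values in
    between interpolate linearly (an assumption, not confirmed by the docs).
    """
    sums = {concept: sum(s) for concept, s in scores_by_concept.items()}
    max_sum = max(sums.values(), default=0.0)
    aggregated = {}
    for concept, s in scores_by_concept.items():
        normalised_sum = sums[concept] / max_sum if max_sum > 0 else 0.0
        aggregated[concept] = (1 - coef) * normalised_sum + coef * max(s)
    return aggregated


def passes_threshold(scores: dict, threshold: float = 0.15) -> bool:
    """Assumed filter: keep a concept only if every score reaches the threshold."""
    return all(value >= threshold for value in scores.values())


# Concept "a" is matched by two sets of keywords, concept "b" by one.
example = {"a": [0.8, 0.4], "b": [0.6]}
by_sum = aggregate_scores(example, coef=0)
by_max = aggregate_scores(example, coef=1)
```

With a coefficient of 0, a concept matched by many sets of keywords is favoured; with 1, a single strong match is enough.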
- graphai.core.text.draw_ontology(results, graph, level=3)
Draws the ontology neighbourhood induced by the given set of wikify results. The resulting SVG is not returned but stored in /tmp/file.svg.
- Parameters:
results (pd.DataFrame) – A pandas DataFrame with columns [‘concept_id’, ‘concept_name’, ‘search_score’, ‘levenshtein_score’, ‘graph_score’, ‘ontology_local_score’, ‘ontology_global_score’, ‘keywords_score’, ‘mixed_score’].
graph (ConceptsGraph) – The concepts graph and ontology object.
level (int) – Level up to which the visualisation considers categories. Default: 3.
- graphai.core.text.draw_graph(results, graph, concept_score_threshold=0.3, edge_threshold=0.3, min_component_size=3)
Draws the concept graph neighbourhood induced by the given set of wikify results. The resulting SVG is not returned but stored in /tmp/file.svg.
- Parameters:
results (pd.DataFrame) – A pandas DataFrame with columns [‘concept_id’, ‘concept_name’, ‘search_score’, ‘levenshtein_score’, ‘graph_score’, ‘ontology_local_score’, ‘ontology_global_score’, ‘keywords_score’, ‘mixed_score’].
graph (ConceptsGraph) – The concepts graph and ontology object.
concept_score_threshold (float) – Score threshold below which concepts are filtered out. Default: 0.3.
edge_threshold (float) – Score threshold below which edges are filtered out. Default: 0.3.
min_component_size (int) – Size threshold below which connected components are filtered out. Default: 3.
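The three thresholds of draw_graph can be read as successive filters on the concept graph. The sketch below illustrates that reading on plain Python structures. It is an assumption about the order and comparison operators, not the library's code, and filter_graph is a hypothetical name.

```python
from collections import defaultdict


def filter_graph(node_scores, edges, concept_score_threshold=0.3,
                 edge_threshold=0.3, min_component_size=3):
    """Return the set of concepts that survive all three filters.

    node_scores maps concept -> score; edges is a list of (u, v, weight).
    """
    # 1. Drop concepts whose score is below the concept threshold.
    nodes = {n for n, score in node_scores.items()
             if score >= concept_score_threshold}

    # 2. Keep only edges above the edge threshold between surviving concepts.
    adjacency = defaultdict(set)
    for u, v, weight in edges:
        if weight >= edge_threshold and u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)

    # 3. Drop connected components smaller than min_component_size.
    kept, seen = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_component_size:
            kept |= component
    return kept


node_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.9, "e": 0.1}
edges = [("a", "b", 0.5), ("b", "c", 0.4), ("c", "d", 0.1), ("d", "e", 0.9)]
surviving = filter_graph(node_scores, edges)
```

Here "e" is removed by the concept threshold, the weak edge between "c" and "d" is removed by the edge threshold, and "d" is then dropped as a component of size 1.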
- graphai.core.text.generate_exercise(data)
Makes a request to the Chatbot API to generate an exercise.