graphai.core.text.scores module

graphai.core.text.scores.compute_levenshtein_score(results)

graphai.core.text.scores.compute_keywords_scores(results, smoothing)

graphai.core.text.scores.compute_mixed_score(results)

graphai.core.text.scores.aggregate_results(results, coef=0.5)

Aggregates a pandas DataFrame of keyword-concept results, i.e. unique by (keywords, concept_id), into a pandas DataFrame of concept results, i.e. unique by concept_id.

Parameters:

results (pd.DataFrame) – A pandas DataFrame with columns [‘keywords’, ‘concept_id’, ‘concept_name’] and a column ‘x_score’ for each score.
coef (float) – A number in [0, 1] that controls how the scores of the aggregated concepts are computed. A value of 0 takes the sum of scores over keywords, then normalises in [0, 1]. A value of 1 takes the max of scores over keywords. Any value in between linearly interpolates those two approaches. Default: 0.5.

Returns:

A pandas DataFrame with columns [‘concept_id’, ‘concept_name’] and a column ‘x_score’ for each score.

Return type:

pd.DataFrame
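
The effect of coef can be illustrated with a minimal sketch. This is not graphai's implementation: aggregate_sketch is a hypothetical helper, and the normalisation step (dividing each summed score by the largest sum) is an assumption, since the documentation only says the sum is normalised in [0, 1].

```python
import pandas as pd


def aggregate_sketch(results: pd.DataFrame, coef: float = 0.5) -> pd.DataFrame:
    """Hypothetical illustration of the aggregation idea, not graphai's code."""
    if not 0 <= coef <= 1:
        raise ValueError("coef must be in [0, 1]")

    # Every column named 'x_score' is treated as a score to aggregate.
    score_columns = [c for c in results.columns if c.endswith("_score")]

    # Collapse rows unique by (keywords, concept_id) into rows unique by concept_id.
    grouped = results.groupby(["concept_id", "concept_name"], as_index=False)
    summed = grouped[score_columns].sum()
    maxed = grouped[score_columns].max()

    aggregated = summed[["concept_id", "concept_name"]].copy()
    for column in score_columns:
        # Assumption: bring the sum into [0, 1] by dividing by the largest sum.
        largest = summed[column].max()
        normalised_sum = summed[column] / largest if largest > 0 else summed[column]
        # coef = 0 gives the normalised sum, coef = 1 gives the max.
        aggregated[column] = (1 - coef) * normalised_sum + coef * maxed[column]
    return aggregated


sample = pd.DataFrame(
    {
        "keywords": ["graph theory", "graph", "graph theory"],
        "concept_id": [1, 1, 2],
        "concept_name": ["Graph", "Graph", "Tree"],
        "search_score": [0.8, 0.4, 0.6],
    }
)

only_sum = aggregate_sketch(sample, coef=0.0)
only_max = aggregate_sketch(sample, coef=1.0)
halfway = aggregate_sketch(sample, coef=0.5)
```

In this sample, concept 1 is matched by two keyword sets, so it benefits from a low coef (sum-like behaviour), while a high coef rewards only its single best match.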

graphai.core.text.scores.filter_results(results, epsilon=0.15)

Filters a DataFrame of concept results by comparing each concept's ‘mixed_score’ against the threshold epsilon.

Parameters:

results (pd.DataFrame) – A pandas DataFrame with the column ‘mixed_score’.
epsilon (float) – A number in [0, 1] that is used as a threshold on ‘mixed_score’ to decide whether to keep the concept. Default: 0.15.

Returns:

A DataFrame with the same columns as ‘results’ and a subset of its rows.

Return type:

pd.DataFrame
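
A minimal sketch of threshold filtering on ‘mixed_score’ follows. filter_sketch is a hypothetical helper, and the inclusive comparison (keeping rows whose score equals epsilon) is an assumption; the documentation does not say whether the boundary is kept.

```python
import pandas as pd


def filter_sketch(results: pd.DataFrame, epsilon: float = 0.15) -> pd.DataFrame:
    """Hypothetical illustration of threshold filtering, not graphai's code."""
    # Assumption: a concept is kept when its mixed_score is at least epsilon.
    return results[results["mixed_score"] >= epsilon]


concepts = pd.DataFrame(
    {
        "concept_id": [1, 2, 3],
        "concept_name": ["Graph", "Tree", "Matrix"],
        "mixed_score": [0.9, 0.1, 0.15],
    }
)

kept = filter_sketch(concepts, epsilon=0.15)
```

The result has the same columns as the input and only the rows that pass the threshold.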

graphai.core.text.scores.compute_scores(results, graph, restrict_to_ontology=False, score_smoothing=True, aggregation_coef=0.5, filtering_threshold=0.15, refresh_scores=True)

Gathers wikisearch results, computes several scores for them, and finally aggregates and filters them.

Parameters:

results (pd.DataFrame) – A pandas DataFrame with columns [‘keywords’, ‘concept_id’, ‘concept_name’, ‘searchrank’, ‘search_score’].
graph (ConceptsGraph) – The concepts graph and ontology object.
restrict_to_ontology (bool) – Whether to discard concepts that are not in the ontology. Default: False.
score_smoothing (bool) – Whether to apply a transformation to some scores to distribute them more evenly in [0, 1]. Default: True.
aggregation_coef (float) – A number in [0, 1] that controls how the scores of the aggregated pages are computed. A value of 0 takes the sum of scores over keywords, then normalises in [0, 1]. A value of 1 takes the max of scores over keywords. Any value in between linearly interpolates those two approaches. Default: 0.5.
filtering_threshold (float) – A number in [0, 1] that is used as a threshold for all the scores to decide whether the page is good enough from that score's perspective. Default: 0.15.
refresh_scores (bool) – Whether to recompute scores after filtering. Default: True.

Returns:

A pandas DataFrame with columns [‘concept_id’, ‘concept_name’] and a column ‘x_score’ for each score, including a ‘mixed_score’ with a weighted average of the other scores.

Return type:

pd.DataFrame
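
The idea of a ‘mixed_score’ as a weighted average of the other scores can be sketched as below. The helper name mixed_score_sketch, the column ‘levenshtein_score’, and the weights are all assumptions chosen for illustration; the weights actually used by compute_mixed_score are not documented here.

```python
import pandas as pd


def mixed_score_sketch(results: pd.DataFrame, weights: dict) -> pd.DataFrame:
    """Hypothetical weighted average of score columns, not graphai's code."""
    weight_series = pd.Series(weights, dtype=float)
    columns = list(weight_series.index)

    mixed = results.copy()
    # Multiply each score column by its weight, then divide by the total weight.
    weighted = mixed[columns].mul(weight_series, axis=1)
    mixed["mixed_score"] = weighted.sum(axis=1) / weight_series.sum()
    return mixed


scored = pd.DataFrame(
    {
        "concept_id": [1, 2],
        "concept_name": ["Graph", "Tree"],
        "search_score": [0.8, 0.2],
        "levenshtein_score": [0.6, 1.0],
    }
)

# Hypothetical weights: the search score counts twice as much as the Levenshtein score.
mixed = mixed_score_sketch(scored, {"search_score": 2, "levenshtein_score": 1})
```

Because the weights are divided by their total, the mixed score stays in [0, 1] whenever the individual scores do.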