graphai.core.ontology.data module

graphai.core.ontology.data.db_results_to_pandas_df(results, cols)

graphai.core.ontology.data.get_id_dict(ids)

graphai.core.ontology.data.make_adj_undirected(graph_adj): Makes a directed graph undirected by making the adjacency matrix symmetric :param graph_adj: Adjacency matrix :return: Undirected graph adjacency matrix

graphai.core.ontology.data.derive_col_to_col_graph(orig_adj)

Derives the adjacency matrix of the graph induced on the columns of the original adjacency matrix through its rows, so two columns are linked when they share rows.
:param orig_adj: Original adjacency matrix
:return: A^T.A, where A is the original adjacency matrix
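Here is a worked example of the A^T.A product on a small dense NumPy array. The module may pass in a different matrix type. For a 0/1 matrix, entry (i, j) counts the rows that columns i and j share. For a weighted matrix, it is the sum over rows of the products of the two columns' weights. The row and column labels below are hypothetical.

```python
import numpy as np

# Hypothetical 0/1 matrix: 2 rows (e.g. categories) by 3 columns (e.g. concepts).
orig_adj = np.array([[1, 1, 0],   # row 0 touches columns 0 and 1
                     [0, 1, 1]])  # row 1 touches columns 1 and 2

# (A^T A)[i, j] = sum over rows r of A[r, i] * A[r, j].
col_to_col = orig_adj.T @ orig_adj
print(col_to_col)
# Columns 0 and 2 never share a row, so col_to_col[0, 2] == 0;
# the diagonal counts how many rows each column appears in.
```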

graphai.core.ontology.data.get_col_to_col_dict(df, source_col, target_col)

Builds a dictionary mapping the values of one dataframe column to the values of another column on the same rows.
:param df: The dataframe
:param source_col: Source column (keys)
:param target_col: Target column (values)
:return: The dictionary
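A minimal pandas sketch of such a mapping, assuming the two columns are paired row by row. `get_col_to_col_dict_sketch` and the column names are hypothetical. This documentation does not say how the real function treats repeated source values.

```python
import pandas as pd


def get_col_to_col_dict_sketch(df, source_col, target_col):
    # Pair values row by row; if a source value repeats, the last row wins
    # (plain dict semantics, an assumption about duplicate handling).
    return dict(zip(df[source_col], df[target_col]))


concepts = pd.DataFrame({"concept_id": ["c1", "c2", "c3"],
                         "concept_name": ["Graph", "Tree", "Forest"]})
id_to_name = get_col_to_col_dict_sketch(concepts, "concept_id", "concept_name")
print(id_to_name)
```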

graphai.core.ontology.data.return_chosen_indices(base_list, indices)

graphai.core.ontology.data.remove_invalid_pairs(l_main, l_secondary_1, l_secondary_2, ref_dict)

Takes a main list and secondary lists that refer to the rows and columns of a matrix, plus a reference dictionary. It keeps only the positions whose element in the main list appears in the reference dictionary. In other words, it eliminates the row-col or col-row pairs whose row (respectively, col) does not appear in the reference dictionary.
:param l_main: The main list, which is checked against the dictionary
:param l_secondary_1: The first secondary list
:param l_secondary_2: The second secondary list
:param ref_dict: The reference dictionary
:return: The cleaned-up lists, in the order they were provided
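The filtering rule can be sketched in plain Python as below. This is an illustration under assumptions, not the module's code. `remove_invalid_pairs_sketch` is a hypothetical name, and it accepts any number of secondary lists so as not to presume how many lists the real function returns.

```python
def remove_invalid_pairs_sketch(l_main, *l_secondaries, ref_dict):
    # Positions whose main-list element is a key of the reference dictionary.
    keep = [i for i, item in enumerate(l_main) if item in ref_dict]
    # Filter the main list and every secondary list at the same positions,
    # so row/column pairs stay aligned; order of the lists is preserved.
    return tuple([lst[i] for i in keep] for lst in (l_main, *l_secondaries))


rows = ["a", "b", "x", "c"]       # "x" is not in the reference dictionary
cols = ["p", "q", "r", "s"]
ref = {"a": 0, "b": 1, "c": 2}
kept_rows, kept_cols = remove_invalid_pairs_sketch(rows, cols, ref_dict=ref)
print(kept_rows, kept_cols)
```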

graphai.core.ontology.data.create_graph_from_df(df, source_col, target_col, weight_col=None, col_dict=None, row_dict=None, pool_rows_and_cols=False, make_symmetric=False)

Takes a dataframe containing the edges of a graph and turns it into a directed or undirected graph, represented by an adjacency matrix.
:param df: The dataframe
:param source_col: The column for the source nodes
:param target_col: The column for the target nodes
:param weight_col: The column for edge weights
:param col_dict: Precomputed dictionary for the adjacency matrix columns (targets), optional
:param row_dict: Precomputed dictionary for the adjacency matrix rows (sources), optional
:param pool_rows_and_cols: Whether to pool together the rows and columns, used for Wikipedia concepts
:param make_symmetric: Whether to make the graph undirected
:return: The adjacency matrix, row dictionary, and column dictionary (id to index)
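A simplified sketch of the edge-list-to-matrix conversion. It rests on several assumptions, none confirmed by the documentation above:

* the matrix is a dense NumPy matrix;
* node ids are sorted to assign indices;
* weights of duplicate edges are summed;
* symmetrization takes the entry-wise maximum.

The sketch also only allows `make_symmetric` together with pooled rows and columns, because a non-square matrix cannot be symmetric. The precomputed `col_dict`/`row_dict` arguments are omitted for brevity. `create_graph_from_df_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def create_graph_from_df_sketch(df, source_col, target_col, weight_col=None,
                                pool_rows_and_cols=False, make_symmetric=False):
    """Hypothetical sketch: edge-list dataframe -> (adjacency, row_dict, col_dict)."""
    sources = df[source_col].tolist()
    targets = df[target_col].tolist()
    if pool_rows_and_cols:
        # A single shared id-to-index dictionary, so the matrix is square.
        ids = sorted(set(sources) | set(targets))
        row_dict = col_dict = {node: i for i, node in enumerate(ids)}
    else:
        row_dict = {node: i for i, node in enumerate(sorted(set(sources)))}
        col_dict = {node: i for i, node in enumerate(sorted(set(targets)))}
    # Unweighted edges count as 1.0; duplicate edges are summed (assumption).
    weights = df[weight_col].tolist() if weight_col is not None else [1.0] * len(df)
    adj = np.zeros((len(row_dict), len(col_dict)))
    for s, t, w in zip(sources, targets, weights):
        adj[row_dict[s], col_dict[t]] += w
    if make_symmetric:
        if not pool_rows_and_cols:
            raise ValueError("a symmetric matrix needs pooled rows and columns")
        adj = np.maximum(adj, adj.T)
    return adj, row_dict, col_dict


edges = pd.DataFrame({"src": ["a", "a", "b"],
                      "dst": ["b", "c", "c"],
                      "w": [1.0, 2.0, 3.0]})
adj, row_dict, col_dict = create_graph_from_df_sketch(
    edges, "src", "dst", weight_col="w",
    pool_rows_and_cols=True, make_symmetric=True)
print(row_dict)
print(adj)
```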

graphai.core.ontology.data.convert_to_csr_matrix(g)

Converts a given matrix or ndarray to a CSR matrix.
:param g: Matrix or ndarray
:return: CSR matrix
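A sketch of this conversion with SciPy. It assumes `scipy.sparse` provides the CSR implementation in use, which the docstring does not state. `convert_to_csr_matrix_sketch` is hypothetical.

```python
import numpy as np
from scipy import sparse


def convert_to_csr_matrix_sketch(g):
    # Leave inputs that are already in CSR format untouched (no copy).
    if sparse.issparse(g) and g.format == "csr":
        return g
    # csr_matrix accepts dense ndarrays, np.matrix, and other sparse formats.
    return sparse.csr_matrix(g)


dense = np.array([[0, 3, 0],
                  [4, 0, 0]])
csr = convert_to_csr_matrix_sketch(dense)
# CSR stores, per row, the column indices and values of the nonzeros.
print(csr.nnz, csr.indptr, csr.indices, csr.data)
```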

graphai.core.ontology.data.to_ndarray_and_flatten(a)

graphai.core.ontology.data.adjusted_exp(x, overlap=5.0)

graphai.core.ontology.data.adjusted_exp_slope_1_point(overlap=5.0)

graphai.core.ontology.data.compute_average(score, n, avg)

graphai.core.ontology.data.ensure_nonzero_denominator(v)

graphai.core.ontology.data.compute_average_of_df(df, avg)

graphai.core.ontology.data.average_and_combine(s1, s2, l1, l2, avg, coeffs, skip_empty=False)

graphai.core.ontology.data.embeddings_table_exists()

graphai.core.ontology.data.execute_single_entity_concepts_and_anchors_query(concepts_query, anchors_query, entity_id)

graphai.core.ontology.data.execute_multi_entity_concepts_and_anchors_query(concepts_query, anchors_query, entity_ids)

graphai.core.ontology.data.combine_concept_and_anchor_scores(concepts_query, anchors_query, entity_id, avg, coeffs, top_n, d4_cat_id_to_index, concept_lengths, anchor_lengths)

class graphai.core.ontology.data.OntologyData(test_mode=False, **kwargs)

Bases: object

load_data()

load_ontology_concept_names()

load_ontology_categories()

load_non_ontology_concept_names()

load_concept_concept_graphscore()

load_category_category()

load_category_concept()

load_anchor_page_dict(): Loads category to anchor page list dictionary using the direct category-anchor table and the category child-parent relations table. :returns: None

compute_symmetric_concept_concept_matrix()

Loads the concept-concept matrix and creates its index dictionary.
:returns: None

compute_precalculated_similarity_matrices()

Precomputes the matrices and index dictionaries for similarities between concepts, clusters, and categories.
:returns: None

get_concept_concept_similarity(concept_1_id, concept_2_id)

Returns the similarity score between two concepts.
:param concept_1_id: ID of concept 1
:param concept_2_id: ID of concept 2

Returns: Similarity score
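Given a precomputed concept-concept matrix and its index dictionary, a similarity lookup can be sketched as below. The names are hypothetical. Returning None for unknown concepts is an assumption; the real method may raise an error or return 0 instead.

```python
import numpy as np


def concept_concept_similarity_sketch(sim_matrix, index_dict, concept_1_id, concept_2_id):
    # Assumption: unknown concepts yield None rather than an error.
    if concept_1_id not in index_dict or concept_2_id not in index_dict:
        return None
    return float(sim_matrix[index_dict[concept_1_id], index_dict[concept_2_id]])


index_dict = {"c1": 0, "c2": 1, "c3": 2}   # concept id -> row/column index
sim_matrix = np.array([[1.0, 0.4, 0.0],
                       [0.4, 1.0, 0.7],
                       [0.0, 0.7, 1.0]])   # symmetric, so argument order is irrelevant
print(concept_concept_similarity_sketch(sim_matrix, index_dict, "c2", "c3"))
```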

get_concept_cluster_similarity(concept_id, cluster_id, avg='linear')

Returns the similarity score between a concept and a cluster.
:param concept_id: ID of concept
:param cluster_id: ID of cluster
:param avg: Averaging method

Returns: Similarity score

get_cluster_cluster_similarity(cluster_1_id, cluster_2_id, avg='linear')

Returns the similarity score between two clusters.
:param cluster_1_id: ID of cluster 1
:param cluster_2_id: ID of cluster 2
:param avg: Averaging method

Returns: Similarity score

get_concept_category_similarity(concept_id, category_id, avg='linear', coeffs=(1, 4))

Returns the similarity score between a concept and a category.
:param concept_id: ID of concept
:param category_id: ID of category
:param avg: Averaging method
:param coeffs: Coefficients for anchors and concepts

Returns: Similarity score

get_cluster_category_similarity(cluster_id, category_id, avg='linear', coeffs=(1, 4))

Returns the similarity score between a cluster and a category.
:param cluster_id: ID of cluster
:param category_id: ID of category
:param avg: Averaging method
:param coeffs: Coefficients for anchors and concepts

Returns: Similarity score

get_category_category_similarity(category_1_id, category_2_id, avg='linear', coeffs=(1, 4))

Returns the similarity score between two categories.
:param category_1_id: ID of category 1
:param category_2_id: ID of category 2
:param avg: Averaging method
:param coeffs: Coefficients for anchors and concepts

Returns: Similarity score

get_concept_closest_concept(concept_id, top_n=1)

Finds the concepts closest to a given concept.
:param concept_id: Concept ID
:param top_n: Number of top concepts to return

Returns: Top concepts and their scores
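A top-n search over one row of a precomputed similarity matrix can be sketched with NumPy as follows. The sketch assumes a dense matrix and an id-to-index dictionary. Excluding the query concept itself is also an assumption. `closest_concepts_sketch` is hypothetical.

```python
import numpy as np


def closest_concepts_sketch(sim_matrix, index_dict, concept_id, top_n=1):
    """Hypothetical: return the top_n most similar other concepts and their scores."""
    index_to_id = {i: c for c, i in index_dict.items()}
    # np.array copies the row, so sim_matrix itself is never modified.
    row = np.array(sim_matrix[index_dict[concept_id]], dtype=float).ravel()
    row[index_dict[concept_id]] = -np.inf  # never return the query concept itself
    n = min(top_n, row.size - 1)
    # Sort by descending score; kind="stable" keeps ties in index order.
    order = np.argsort(-row, kind="stable")[:n]
    return [index_to_id[int(i)] for i in order], [float(row[i]) for i in order]


index_dict = {"c1": 0, "c2": 1, "c3": 2}
sim_matrix = np.array([[1.0, 0.4, 0.0],
                       [0.4, 1.0, 0.7],
                       [0.0, 0.7, 1.0]])
print(closest_concepts_sketch(sim_matrix, index_dict, "c2", top_n=2))
```

Capping `top_n` at one less than the number of concepts prevents the excluded self-entry from reappearing when more results are requested than exist.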

get_concept_closest_concept_embedding(concept_id, top_n=1)

get_concept_closest_category(concept_id, avg='log', coeffs=(1, 10), top_n=1, use_depth_3=False, return_clusters=None, adaptive_threshold=None)

Finds the closest category to a given concept.
:param concept_id: Concept ID
:param avg: Averaging method. Options are 'linear', 'log', and 'none'
:param coeffs: Coefficients for averaging the anchor and concept scores
:param top_n: Number of top categories to return
:param use_depth_3: Whether to go through depth-3 or directly use depth-4
:param return_clusters: Number of top clusters to return for each top category. If None, clusters are not returned.

Returns: Top categories, their scores, parent depth-3 category if use_depth_3==True, and top clusters of each top category if return_clusters is not None.
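One reading of `use_depth_3`, sketched below as a hypothetical, is a two-stage search. It picks the best depth-3 category first, then ranks only that category's depth-4 children. The category scores are taken as given inputs, because the docstring does not specify how they are averaged. All names here are illustrative assumptions.

```python
def closest_category_sketch(d3_scores, d4_scores, d4_parent, top_n=1, use_depth_3=False):
    """Hypothetical two-stage ranking; scores are given, not computed here."""
    if use_depth_3:
        # Stage 1: pick the best depth-3 category.
        best_d3 = max(d3_scores, key=d3_scores.get)
        # Stage 2: only depth-4 children of that category compete.
        candidates = {c: s for c, s in d4_scores.items() if d4_parent.get(c) == best_d3}
    else:
        best_d3 = None
        candidates = dict(d4_scores)
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return [c for c, _ in ranked], [s for _, s in ranked], best_d3


d3_scores = {"x": 0.9, "y": 0.5}
d4_scores = {"x1": 0.3, "x2": 0.6, "y1": 0.8}
d4_parent = {"x1": "x", "x2": "x", "y1": "y"}
print(closest_category_sketch(d3_scores, d4_scores, d4_parent))
print(closest_category_sketch(d3_scores, d4_scores, d4_parent, use_depth_3=True))
```

In this toy input the two modes disagree. The direct depth-4 search picks "y1", while routing through the strongest depth-3 category restricts the choice to its children and picks "x2".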

get_concept_category_closest_embedding(concept_id, avg='log', coeffs=(1, 10), top_n=5, return_clusters=None)

get_cluster_closest_category(cluster_id, avg='log', coeffs=(1, 10), top_n=1, use_depth_3=False)

Finds the closest category to a given cluster.
:param cluster_id: Cluster ID
:param avg: Averaging method. Options are 'linear', 'log', and 'none'
:param coeffs: Coefficients for averaging the anchor and concept scores
:param top_n: Number of top categories to return
:param use_depth_3: Whether to go through depth-3 or directly use depth-4

Returns: Top categories, their scores, and parent depth-3 category if use_depth_3==True.

get_custom_cluster_closest_category(concept_ids, avg='log', coeffs=(1, 10), top_n=1, use_depth_3=False)

Finds the closest category to a custom cluster, provided as a list of concepts.
:param concept_ids: List of concept IDs
:param avg: Averaging method. Options are 'linear', 'log', and 'none'
:param coeffs: Coefficients for averaging the anchor and concept scores
:param top_n: Number of top categories to return
:param use_depth_3: Whether to go through depth-3 or directly use depth-4

Returns: Top categories, their scores, and parent depth-3 category if use_depth_3==True.

get_cluster_closest_category_embedding(cluster_id, avg='log', coeffs=(1, 10), top_n=1)

Finds the closest category to a given cluster.
:param cluster_id: Cluster ID
:param avg: Averaging method. Options are 'linear', 'log', and 'none'
:param coeffs: Coefficients for averaging the anchor and concept scores
:param top_n: Number of top categories to return

Returns: Top categories and their scores.

get_custom_cluster_closest_category_embedding(concept_ids, avg='log', coeffs=(1, 10), top_n=1)

Finds the closest category to a custom cluster, provided as a list of concepts.
:param concept_ids: List of concept IDs
:param avg: Averaging method. Options are 'linear', 'log', and 'none'
:param coeffs: Coefficients for averaging the anchor and concept scores
:param top_n: Number of top categories to return

Returns: Top categories and their scores.

get_category_closest_category(category_id, avg='log', coeffs=(1, 10), top_n=1, use_depth_3=False)

Finds the closest category to a given category. As with the category-category similarity method, the similarity combines between-anchor and between-concept similarity, with no anchor-concept crossover.
:param category_id: Category ID
:param avg: Averaging method. Options are 'linear', 'log', and 'none'
:param coeffs: Coefficients for averaging the anchor and concept scores
:param top_n: Number of top categories to return
:param use_depth_3: Whether to go through depth-3 or directly use depth-4

Returns: Top categories, their scores, and parent depth-3 category if use_depth_3==True.

get_ontology_concept_names_table(concepts_to_keep=None)

get_ontology_category_names()

get_ontology_category_info(cat_id)

get_non_ontology_concept_names()

get_concept_concept_graphscore_table(concepts_to_keep=None)

get_category_to_category()

get_category_parent(child_id)

get_category_branch(category_id)

get_category_children(parent_id)

get_cluster_parent(cluster_id)

get_cluster_children(cluster_id)

get_concept_parent_category(concept_id)

get_concept_parent_cluster(concept_id)

get_category_cluster_list(cat_id)

get_category_concept_list(cat_id)

get_cluster_concept_list(cluster_id)

get_category_concept_table(concepts_to_keep=None)

get_category_cluster_table()

get_category_anchor_pages(category_id)

get_cluster_concepts(cluster_id)

get_concept_name(concept_id)

get_concept_names_list(concept_ids)

get_test_concept_names()

get_test_category_concept()

get_test_cluster_concept()

get_root_category()

generate_tree_structure(start=None)

generate_category_concept_dict()