graphai.core.ontology.data module
- graphai.core.ontology.data.db_results_to_pandas_df(results, cols)
- graphai.core.ontology.data.get_id_dict(ids)
- graphai.core.ontology.data.make_adj_undirected(graph_adj)
Makes a directed graph undirected by making the adjacency matrix symmetric :param graph_adj: Adjacency matrix :return: Undirected graph adjacency matrix
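As an illustration of the symmetrization step, here is a minimal sketch. The helper name `symmetrize` and the rule used (element-wise maximum of the matrix and its transpose) are assumptions made for this example; the library function may combine the two directions differently, for instance by summing them.

```python
import numpy as np
from scipy.sparse import csr_matrix


def symmetrize(adj):
    """Return a symmetric version of a sparse adjacency matrix.

    Assumption: an edge i->j or j->i becomes an undirected edge whose
    weight is the larger of the two directed weights.
    """
    adj = csr_matrix(adj)
    return adj.maximum(adj.T)


# A directed 3-cycle with different weights on each edge.
directed = csr_matrix(np.array([[0, 1, 0],
                                [0, 0, 2],
                                [3, 0, 0]]))
undirected = symmetrize(directed)
print(undirected.toarray())
```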
- graphai.core.ontology.data.derive_col_to_col_graph(orig_adj)
Derives the adjacency matrix of the graph induced on the columns of the original adjacency matrix through its rows. :param orig_adj: Original adjacency matrix :return: A^T.A
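The product A^T.A can be sketched with `scipy.sparse` as follows. The matrix values and variable names are made up for illustration. Entry (i, j) of the result accumulates, over all rows, the product of the weights linking that row to column i and to column j, so for a 0/1 matrix it counts the rows shared by the two columns.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Two rows (e.g. source nodes) connected to three columns (e.g. targets).
rows_to_cols = csr_matrix(np.array([[1, 1, 0],
                                    [0, 1, 1]]))

# Column-to-column graph induced through the rows: A^T . A
col_graph = rows_to_cols.T @ rows_to_cols
print(col_graph.toarray())
```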
- graphai.core.ontology.data.get_col_to_col_dict(df, source_col, target_col)
Gets a dictionary mapping the elements of one dataframe column to the elements of another :param df: The dataframe :param source_col: Source column (keys) :param target_col: Target column (values) :return: The dictionary
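A column-to-column mapping of this kind can be sketched in one line with pandas. The helper `col_to_col_dict` and the sample column names are hypothetical; in this sketch, if a key appears more than once, the last row wins, which may or may not match the library's behavior.

```python
import pandas as pd


def col_to_col_dict(df, source_col, target_col):
    # Hypothetical stand-in: keys from source_col, values from target_col.
    # With duplicate keys, later rows overwrite earlier ones.
    return dict(zip(df[source_col], df[target_col]))


df = pd.DataFrame({"concept_id": [10, 11, 12],
                   "name": ["Graph", "Tree", "Matrix"]})
id_to_name = col_to_col_dict(df, "concept_id", "name")
print(id_to_name)
```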
- graphai.core.ontology.data.return_chosen_indices(base_list, indices)
- graphai.core.ontology.data.remove_invalid_pairs(l_main, l_secondary_1, l_secondary_2, ref_dict)
Takes two lists that refer to the rows and columns of a matrix plus a reference dictionary, and only keeps those indices of the two lists whose elements in the “main” list appear in the reference dictionary. In other words, eliminates the row-col or col-row pairs whose row/col (respectively) does not appear in the reference dictionary. :param l_main: The main list, which will be checked against the dictionary :param l_secondary: The secondary list :param ref_dict: The reference dictionary :return: Two cleaned up lists in the order that they were provided in
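The filtering described above can be sketched in plain Python. This is a simplified two-list version with a hypothetical name, `keep_valid_pairs`; it does not reproduce the exact signature of the library function.

```python
def keep_valid_pairs(l_main, l_secondary, ref_dict):
    """Keep only the positions whose l_main element is a key of ref_dict."""
    kept = [i for i, item in enumerate(l_main) if item in ref_dict]
    # Both lists are filtered with the same positions so pairs stay aligned.
    return [l_main[i] for i in kept], [l_secondary[i] for i in kept]


rows = ["a", "b", "c", "d"]
cols = ["x", "y", "z", "w"]
known_rows = {"a": 0, "c": 1}
clean_rows, clean_cols = keep_valid_pairs(rows, cols, known_rows)
print(clean_rows, clean_cols)
```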
- graphai.core.ontology.data.create_graph_from_df(df, source_col, target_col, weight_col=None, col_dict=None, row_dict=None, pool_rows_and_cols=False, make_symmetric=False)
Takes a dataframe containing the edges for a graph, and turns it into a directed or undirected graph (represented by an adjacency matrix) :param df: The dataframe :param source_col: The column for the source nodes :param target_col: The column for the target nodes :param weight_col: The column for edge weights :param col_dict: Precomputed dictionary for the adj matrix columns (targets), optional :param row_dict: Precomputed dictionary for the adj matrix rows (sources), optional :param pool_rows_and_cols: Whether to pool together the rows and columns, used for Wikipedia concept :param make_symmetric: Whether to make the graph undirected :return: The adjacency matrix, row dictionary, and column dictionary (id to index)
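To make the edge-list-to-matrix conversion concrete, here is a minimal sketch. The helper `edges_to_adjacency`, the column names, and the choice of weight 1 for unweighted edges are assumptions for illustration; the sketch also omits the `pool_rows_and_cols` and `make_symmetric` options.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def edges_to_adjacency(df, source_col, target_col, weight_col=None):
    """Hypothetical helper: build (adjacency, row_dict, col_dict) from edges."""
    # Map each distinct id to a matrix index, in order of first appearance.
    row_dict = {v: i for i, v in enumerate(df[source_col].unique())}
    col_dict = {v: i for i, v in enumerate(df[target_col].unique())}
    rows = df[source_col].map(row_dict).to_numpy()
    cols = df[target_col].map(col_dict).to_numpy()
    # Assumption: without a weight column every edge gets weight 1.
    if weight_col is None:
        data = np.ones(len(df))
    else:
        data = df[weight_col].to_numpy()
    adj = csr_matrix((data, (rows, cols)),
                     shape=(len(row_dict), len(col_dict)))
    return adj, row_dict, col_dict


edges = pd.DataFrame({"src": ["a", "a", "b"],
                      "dst": ["x", "y", "y"],
                      "w": [0.5, 1.0, 2.0]})
adj, row_dict, col_dict = edges_to_adjacency(edges, "src", "dst", weight_col="w")
print(adj.toarray())
```

Returning the id-to-index dictionaries alongside the matrix lets callers translate between node IDs and matrix positions later on.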
- graphai.core.ontology.data.convert_to_csr_matrix(g)
Converts a given matrix or ndarray to a CSR matrix :param g: Matrix or ndarray :return: CSR matrix
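For reference, converting a dense array to CSR with `scipy.sparse` looks like the sketch below. The variable names are illustrative, and this shows only the SciPy call, not the library wrapper itself.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

dense = np.array([[0.0, 2.0],
                  [0.0, 0.0]])
as_csr = csr_matrix(dense)  # only the non-zero entry is stored

print(issparse(as_csr), as_csr.format, as_csr.nnz)
```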
- graphai.core.ontology.data.to_ndarray_and_flatten(a)
- graphai.core.ontology.data.adjusted_exp(x, overlap=5.0)
- graphai.core.ontology.data.adjusted_exp_slope_1_point(overlap=5.0)
- graphai.core.ontology.data.compute_average(score, n, avg)
- graphai.core.ontology.data.ensure_nonzero_denominator(v)
- graphai.core.ontology.data.compute_average_of_df(df, avg)
- graphai.core.ontology.data.average_and_combine(s1, s2, l1, l2, avg, coeffs, skip_empty=False)
- graphai.core.ontology.data.embeddings_table_exists()
- graphai.core.ontology.data.execute_single_entity_concepts_and_anchors_query(concepts_query, anchors_query, entity_id)
- graphai.core.ontology.data.execute_multi_entity_concepts_and_anchors_query(concepts_query, anchors_query, entity_ids)
- graphai.core.ontology.data.combine_concept_and_anchor_scores(concepts_query, anchors_query, entity_id, avg, coeffs, top_n, d4_cat_id_to_index, concept_lengths, anchor_lengths)
- class graphai.core.ontology.data.OntologyData(test_mode=False, **kwargs)
Bases: object
- load_data()
- load_ontology_concept_names()
- load_ontology_categories()
- load_non_ontology_concept_names()
- load_concept_concept_graphscore()
- load_category_category()
- load_category_concept()
- load_anchor_page_dict()
Loads category to anchor page list dictionary using the direct category-anchor table and the category child-parent relations table. :returns: None
- compute_symmetric_concept_concept_matrix()
Loads the concept-concept matrix and creates its index dictionary :returns: None
- compute_precalculated_similarity_matrices()
Precomputes the matrices and index dictionaries for similarities between concepts, clusters, and categories. :returns: None
- get_concept_concept_similarity(concept_1_id, concept_2_id)
Returns the similarity score between two concepts :param concept_1_id: ID of concept 1 :param concept_2_id: ID of concept 2
- Returns:
Similarity score
- get_concept_cluster_similarity(concept_id, cluster_id, avg='linear')
Returns the similarity score between a concept and a cluster :param concept_id: ID of concept :param cluster_id: ID of cluster :param avg: Averaging method
- Returns:
Similarity score
- get_cluster_cluster_similarity(cluster_1_id, cluster_2_id, avg='linear')
Returns the similarity score between two clusters :param cluster_1_id: ID of cluster 1 :param cluster_2_id: ID of cluster 2 :param avg: Averaging method
- Returns:
Similarity score
- get_concept_category_similarity(concept_id, category_id, avg='linear', coeffs=(1, 4))
Returns the similarity score between a concept and a category :param concept_id: ID of concept :param category_id: ID of category :param avg: Averaging method :param coeffs: Coefficients for anchors and concepts
- Returns:
Similarity score
- get_cluster_category_similarity(cluster_id, category_id, avg='linear', coeffs=(1, 4))
Returns the similarity score between a cluster and a category :param cluster_id: ID of cluster :param category_id: ID of category :param avg: Averaging method :param coeffs: Coefficients for anchors and concepts
- Returns:
Similarity score
- get_category_category_similarity(category_1_id, category_2_id, avg='linear', coeffs=(1, 4))
Returns the similarity score between two categories :param category_1_id: ID of category 1 :param category_2_id: ID of category 2 :param avg: Averaging method :param coeffs: Coefficients for anchors and concepts
- Returns:
Similarity score
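The role of `coeffs` can be illustrated with a weighted mean of an anchor-based score and a concept-based score. This is only a sketch under the assumption that the two scores are combined as a weighted average, with `coeffs` ordered as (anchors, concepts); the function `combine_scores` is hypothetical and the library's actual formula may differ.

```python
def combine_scores(anchor_score, concept_score, coeffs=(1, 4)):
    """Weighted mean of two scores; coeffs = (anchor weight, concept weight)."""
    w_anchor, w_concept = coeffs
    total = w_anchor + w_concept
    if total == 0:
        raise ValueError("coeffs must not sum to zero")
    return (w_anchor * anchor_score + w_concept * concept_score) / total


# With coeffs=(1, 4) the concept score counts four times as much.
combined = combine_scores(anchor_score=0.2, concept_score=0.7)
print(combined)
```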
- get_concept_closest_concept(concept_id, top_n=1)
Finds the closest concept to a given concept :param concept_id: Concept ID :param top_n: Number of top concepts to return
- Returns:
Top concepts and their scores
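Selecting the `top_n` closest items from one row of a similarity matrix can be sketched as follows. The helper `top_n_neighbours`, the index-to-ID dictionary, and the choice to exclude the query concept itself are assumptions made for this example.

```python
import numpy as np


def top_n_neighbours(similarity_row, index_to_id, top_n=1, exclude=None):
    """Return the top_n (id, score) pairs from one row of a similarity matrix."""
    scores = np.asarray(similarity_row, dtype=float).copy()
    if exclude is not None:
        scores[exclude] = -np.inf  # never return the query concept itself
    order = np.argsort(-scores)[:top_n]  # indices sorted by descending score
    return [(index_to_id[int(i)], float(scores[i])) for i in order]


row = [1.0, 0.3, 0.8, 0.1]          # similarities of concept 0 to all concepts
ids = {0: "c0", 1: "c1", 2: "c2", 3: "c3"}
closest = top_n_neighbours(row, ids, top_n=2, exclude=0)
print(closest)
```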
- get_concept_closest_concept_embedding(concept_id, top_n=1)
- get_concept_closest_category(concept_id, avg='log', coeffs=(1, 10), top_n=1, use_depth_3=False, return_clusters=None, adaptive_threshold=None)
Finds the closest category to a given concept :param concept_id: Concept ID :param avg: Averaging method. Options are ‘linear’, ‘log’, and ‘none’ :param coeffs: Coefficients for averaging the anchor and concept scores :param top_n: Number of top categories to return :param use_depth_3: Whether to go through depth-3 or directly use depth-4 :param return_clusters: Number of top clusters to return for each top category. If None, clusters are not returned.
- Returns:
Top categories, their scores, parent depth-3 category if use_depth_3==True, and top clusters of each top category if return_clusters is not None.
- get_concept_category_closest_embedding(concept_id, avg='log', coeffs=(1, 10), top_n=5, return_clusters=None)
- get_cluster_closest_category(cluster_id, avg='log', coeffs=(1, 10), top_n=1, use_depth_3=False)
Finds the closest category to a given cluster :param cluster_id: Cluster ID :param avg: Averaging method. Options are ‘linear’, ‘log’, and ‘none’ :param coeffs: Coefficients for averaging the anchor and concept scores :param top_n: Number of top categories to return :param use_depth_3: Whether to go through depth-3 or directly use depth-4
- Returns:
Top categories, their scores, and parent depth-3 category if use_depth_3==True.
- get_custom_cluster_closest_category(concept_ids, avg='log', coeffs=(1, 10), top_n=1, use_depth_3=False)
Finds the closest category to a custom cluster, provided as a list of concepts :param concept_ids: List of concept IDs :param avg: Averaging method. Options are ‘linear’, ‘log’, and ‘none’ :param coeffs: Coefficients for averaging the anchor and concept scores :param top_n: Number of top categories to return :param use_depth_3: Whether to go through depth-3 or directly use depth-4
- Returns:
Top categories, their scores, and parent depth-3 category if use_depth_3==True.
- get_cluster_closest_category_embedding(cluster_id, avg='log', coeffs=(1, 10), top_n=1)
Finds the closest category to a given cluster :param cluster_id: Cluster ID :param avg: Averaging method. Options are ‘linear’, ‘log’, and ‘none’ :param coeffs: Coefficients for averaging the anchor and concept scores :param top_n: Number of top categories to return
- Returns:
Top categories and their scores.
- get_custom_cluster_closest_category_embedding(concept_ids, avg='log', coeffs=(1, 10), top_n=1)
Finds the closest category to a custom cluster, provided as a list of concepts :param concept_ids: List of concept IDs :param avg: Averaging method. Options are ‘linear’, ‘log’, and ‘none’ :param coeffs: Coefficients for averaging the anchor and concept scores :param top_n: Number of top categories to return
- Returns:
Top categories and their scores.
- get_category_closest_category(category_id, avg='log', coeffs=(1, 10), top_n=1, use_depth_3=False)
Finds the closest category to a given category. As with the category-category similarity method, the similarity is composed of between-anchor and between-concept similarity, and there is no anchor-concept crossover. :param category_id: Category ID :param avg: Averaging method. Options are ‘linear’, ‘log’, and ‘none’ :param coeffs: Coefficients for averaging the anchor and concept scores :param top_n: Number of top categories to return :param use_depth_3: Whether to go through depth-3 or directly use depth-4
- Returns:
Top categories, their scores, and parent depth-3 category if use_depth_3==True.
- get_ontology_concept_names_table(concepts_to_keep=None)
- get_ontology_category_names()
- get_ontology_category_info(cat_id)
- get_non_ontology_concept_names()
- get_concept_concept_graphscore_table(concepts_to_keep=None)
- get_category_to_category()
- get_category_parent(child_id)
- get_category_branch(category_id)
- get_category_children(parent_id)
- get_cluster_parent(cluster_id)
- get_cluster_children(cluster_id)
- get_concept_parent_category(concept_id)
- get_concept_parent_cluster(concept_id)
- get_category_cluster_list(cat_id)
- get_category_concept_list(cat_id)
- get_cluster_concept_list(cluster_id)
- get_category_concept_table(concepts_to_keep=None)
- get_category_cluster_table()
- get_category_anchor_pages(category_id)
- get_cluster_concepts(cluster_id)
- get_concept_name(concept_id)
- get_concept_names_list(concept_ids)
- get_test_concept_names()
- get_test_category_concept()
- get_test_cluster_concept()
- get_root_category()
- generate_tree_structure(start=None)
- generate_category_concept_dict()