graphai.core.ontology.clustering module
- graphai.core.ontology.clustering.compute_all_graphs_from_scratch(data_to_use_dict, concept_names)
Computes all the graphs from scratch using their corresponding dataframes.
- Parameters:
data_to_use_dict: Dictionary mapping data source names to dataframes
concept_names: Dataframe containing concept names
- Returns:
The resulting data source to graph matrix dictionary, index to name dict, index to ID dict
- graphai.core.ontology.clustering.normalize_features(g)
Normalizes the rows of a matrix (according to the l2 norm).
- Parameters:
g: Dense matrix
- Returns:
Normalized matrix
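The row normalization can be sketched in a few lines of NumPy; `normalize_rows_l2` is a hypothetical stand-in for the function above, and leaving all-zero rows unchanged is an added guard (an assumption, since the module's handling of zero rows is not documented here):

```python
import numpy as np

def normalize_rows_l2(g):
    """Divide each row of a dense matrix by its l2 norm (zero rows left unchanged)."""
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    return g / norms

m = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 1.0]])
normalized = normalize_rows_l2(m)
# each non-zero row of `normalized` now has unit l2 norm
```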
- graphai.core.ontology.clustering.get_laplacian(graph, normed=False)
Computes the Laplacian (normalized or unnormalized) of a graph using its adjacency matrix.
- Parameters:
graph: Adjacency matrix
normed: Whether to compute the normalized Laplacian
- Returns:
The Laplacian as a LIL sparse matrix
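A minimal sketch of the same computation using SciPy's `scipy.sparse.csgraph.laplacian`, assuming a symmetric adjacency matrix; the final conversion to LIL format mirrors the return type described above (whether the module itself uses this SciPy helper is an assumption):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import laplacian

# adjacency matrix of a path graph on 3 nodes
adj = sparse.csr_matrix(np.array([[0, 1, 0],
                                  [1, 0, 1],
                                  [0, 1, 0]], dtype=float))

lap = laplacian(adj, normed=False)  # unnormalized Laplacian: L = D - A
lap_lil = sparse.lil_matrix(lap)    # store in LIL format, as the docstring describes
```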
- graphai.core.ontology.clustering.sum_laplacians(laplacians)
Aggregates Laplacians using a simple sum.
- Parameters:
laplacians: List of Laplacians
- Returns:
Aggregated Laplacian
- graphai.core.ontology.clustering.arithmetic_mean_laplacians(laplacians)
Aggregates Laplacians using the arithmetic mean.
- Parameters:
laplacians: List of Laplacians
- Returns:
Aggregated Laplacian
- graphai.core.ontology.clustering.combine_laplacians(laplacians, mean=True)
Combines a list of matrix Laplacians using the requested method.
- Parameters:
laplacians: List of Laplacians
mean: Whether to compute the arithmetic mean or just the sum
- Returns:
Aggregated Laplacian
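The sum/mean combination is simple element-wise arithmetic; a sketch (`combine_laplacians_sketch` is a hypothetical name, and the Laplacians are assumed to share a shape):

```python
import numpy as np

def combine_laplacians_sketch(laplacians, mean=True):
    """Element-wise sum of the Laplacians, optionally divided by their count."""
    total = sum(laplacians)
    return total / len(laplacians) if mean else total

l1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
l2 = np.array([[2.0, -2.0], [-2.0, 2.0]])
combined = combine_laplacians_sketch([l1, l2])  # arithmetic mean of the two
```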
- graphai.core.ontology.clustering.spec_embed_on_laplacian(laplacian, n_clusters, seed=420)
Computes the spectral embedding of a given Laplacian matrix.
- Parameters:
laplacian: The Laplacian (as a sparse matrix)
n_clusters: The number of components to compute
seed: Random seed for the optimizer
- Returns:
The spectral embedding of the Laplacian matrix
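A hedged sketch of computing a spectral embedding from a sparse Laplacian, using ARPACK via `scipy.sparse.linalg.eigsh` to take the eigenvectors of the smallest eigenvalues; seeding the Lanczos start vector is an assumption about how the seed parameter might be used, not the module's documented behavior:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

def spectral_embed_sketch(laplacian, n_components, seed=420):
    """Embed nodes using the eigenvectors of the Laplacian's smallest eigenvalues."""
    rng = np.random.default_rng(seed)
    v0 = rng.random(laplacian.shape[0])  # deterministic start vector for ARPACK
    # 'SM' = smallest-magnitude eigenvalues; their eigenvectors form the embedding
    vals, vecs = eigsh(laplacian, k=n_components, which='SM', v0=v0)
    return vecs

# Laplacian of a path graph on 4 nodes
lap = sparse.csr_matrix(np.array([
    [1., -1., 0., 0.],
    [-1., 2., -1., 0.],
    [0., -1., 2., -1.],
    [0., 0., -1., 1.]]))
emb = spectral_embed_sketch(lap, n_components=2)  # one row per node, one column per component
```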
- graphai.core.ontology.clustering.combine_and_embed_laplacian(main_graphs, n_dims=1000)
Computes and combines graph Laplacians and calculates their spectral embedding.
- Parameters:
main_graphs: List of graphs to be used
combination_method: Method for combining the Laplacians, "armean" by default
n_dims: Number of dimensions for the spectral embedding
- Returns:
The combined Laplacian matrix and the spectral embedding
- graphai.core.ontology.clustering.perform_PCA(data, n_components, random_state=420, center_and_scale=True)
Performs PCA on the data.
- Parameters:
data: The original data
n_components: Number of components
random_state: Random state
center_and_scale: Whether to center and scale the data before applying PCA
- Returns:
The data after dimensionality reduction using PCA
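A NumPy-only sketch of PCA with optional centering and scaling (the module may well delegate to a library such as scikit-learn instead; this is purely illustrative, and `pca_sketch` is a hypothetical name):

```python
import numpy as np

def pca_sketch(data, n_components, center_and_scale=True):
    """Project data onto its first n_components principal directions via SVD."""
    data = data - data.mean(axis=0)  # PCA always requires centering
    if center_and_scale:
        std = data.std(axis=0)
        std[std == 0] = 1.0          # leave constant features untouched
        data = data / std
    # right singular vectors of the centered data are the principal directions
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    return data @ vt[:n_components].T

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 5))
reduced = pca_sketch(x, n_components=2)  # shape (50, 2)
```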
- graphai.core.ontology.clustering.precompute_clustering_metric(data, affinity, normalize_vectors, random_state)
- graphai.core.ontology.clustering.perform_constrained_agglomerative(data, n_clusters, normalize_vectors=False, random_state=420, affinity='cosine', linkage='average', full_compute=False)
Performs agglomerative clustering on the data with must-link and cannot-link constraints.
- Parameters:
data: The data (ndarray)
n_clusters: Number of clusters
ml: List of must-link constraints
cl: List of cannot-link constraints
normalize_vectors: Whether to normalize each data point (using the l2 norm)
random_state: The random state
return_model: Whether to return the clustering model as well as the labels
affinity: How the distance between data points is computed
linkage: Linkage of the clusters
full_compute: Whether to compute the full tree
- Returns:
Cluster labels, optionally clustering model
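Plain (unconstrained) agglomerative clustering with cosine affinity and average linkage can be illustrated with SciPy's hierarchy module; the must-link/cannot-link handling described above is not reproduced in this sketch:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# four points forming two clearly separated directions
x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])

d = pdist(x, metric='cosine')                    # condensed pairwise cosine distances
z = linkage(d, method='average')                 # average-linkage dendrogram
labels = fcluster(z, t=2, criterion='maxclust')  # cut the tree into 2 flat clusters
```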
- graphai.core.ontology.clustering.variance_ratio_eval(data, labels)
Computes the variance ratio of clusters.
- Parameters:
data: The data
labels: Cluster labels
- Returns:
The variance ratio score (also known as the Calinski-Harabasz score)
- graphai.core.ontology.clustering.davies_bouldin_eval(data, labels)
Computes the Davies-Bouldin score.
- Parameters:
data: The data
labels: Cluster labels
- Returns:
The score
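To make the variance ratio definition concrete, here is an illustrative by-hand computation of the Calinski-Harabasz score (between-cluster dispersion over within-cluster dispersion, degree-of-freedom corrected); `variance_ratio_sketch` is a reimplementation for exposition, not the module's code, which presumably wraps a library implementation:

```python
import numpy as np

def variance_ratio_sketch(data, labels):
    """Calinski-Harabasz score: (B / (k - 1)) / (W / (n - k))."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = len(data), len(clusters)
    overall_mean = data.mean(axis=0)
    between = within = 0.0
    for c in clusters:
        pts = data[labels == c]
        centroid = pts.mean(axis=0)
        between += len(pts) * np.sum((centroid - overall_mean) ** 2)  # B term
        within += np.sum((pts - centroid) ** 2)                       # W term
    return (between / (k - 1)) / (within / (n - k))

# two tight, well-separated clusters -> a very large score
data = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]])
score = variance_ratio_sketch(data, [0, 0, 1, 1])
```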
- graphai.core.ontology.clustering.cluster_using_embedding(embedding, n_clusters, params=None)
Computes one level of clustering using the provided embedding and constraints.
- Parameters:
embedding: Embedding of concepts in low-dimensional space
params: Parameters for the clustering, e.g. number of clusters, number of PCA dimensions, etc.
- Returns:
Clustering labels
- graphai.core.ontology.clustering.group_clustered(data, labels, mode='mean', rows_and_cols=False, precomputed_map=None)
Groups the data points together based on provided clustering labels such that each cluster becomes one single data point.
- Parameters:
data: The data (ndarray or sparse matrix)
labels: The cluster labels
mode: What to aggregate with (mean/median)
rows_and_cols: Whether to perform the aggregation on both the rows and the columns of the data
precomputed_map: If provided, this will be used as the cluster to concept map. If not, the mapping will be computed.
- Returns:
The transformed data and the cluster to concept map
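A sketch of the mean-aggregation case over rows only; `group_by_labels_mean` is hypothetical, and the real function also supports median aggregation and row-and-column aggregation, which are omitted here:

```python
import numpy as np

def group_by_labels_mean(data, labels):
    """Collapse each cluster of rows into a single mean row, with a cluster-to-rows map."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    grouped = np.vstack([data[labels == c].mean(axis=0) for c in clusters])
    cluster_to_rows = {int(c): np.flatnonzero(labels == c).tolist() for c in clusters}
    return grouped, cluster_to_rows

x = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]])
grouped, mapping = group_by_labels_mean(x, [0, 0, 1])
# rows 0 and 1 are averaged into one row; row 2 stays on its own
```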
- graphai.core.ontology.clustering.reassign_outliers(labels, embeddings, min_n=3)
Reassigns outlier clusters to non-outlier clusters.
- Parameters:
labels: Labels of each concept
embeddings: The concept embedding vectors
min_n: Minimum size for a cluster to not be considered an outlier
- Returns:
New labels after reassignment
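One plausible reading of the reassignment step, merging each cluster smaller than min_n into the large cluster with the nearest centroid; the nearest-centroid strategy is an assumption for illustration, not the module's documented logic:

```python
import numpy as np

def reassign_small_clusters(labels, embeddings, min_n=3):
    """Merge clusters with fewer than min_n members into the closest large cluster."""
    labels = np.asarray(labels).copy()
    clusters, counts = np.unique(labels, return_counts=True)
    large = clusters[counts >= min_n]
    small = clusters[counts < min_n]
    if len(large) == 0:
        return labels  # no sufficiently large cluster to reassign into
    centroids = {c: embeddings[labels == c].mean(axis=0) for c in large}
    for c in small:
        for i in np.flatnonzero(labels == c):
            # send each outlier point to the large cluster with the closest centroid
            labels[i] = min(centroids,
                            key=lambda g: np.linalg.norm(embeddings[i] - centroids[g]))
    return labels

emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [0.1, 0.1], [0.05, 0.05], [5.0, 5.0]])
new_labels = reassign_small_clusters([0, 0, 0, 0, 0, 1], emb, min_n=3)
# the singleton cluster 1 is absorbed into cluster 0
```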
- graphai.core.ontology.clustering.cluster_and_reassign_outliers(embedding, n_clusters, min_n=None, params=None)
- graphai.core.ontology.clustering.assign_to_categories_using_existing(labels, category_concept, category_id_to_index)
- graphai.core.ontology.clustering.convert_cluster_labels_to_dict(cluster_labels, concept_index_to_id, concept_index_to_name)