graphai.core.ontology.clustering module
- graphai.core.ontology.clustering.compute_all_graphs_from_scratch(data_to_use_dict, concept_names)
Computes all the graphs from scratch using their corresponding dataframes.
- Parameters:
data_to_use_dict: Dictionary mapping data source names to dataframes
concept_names: Dataframe containing concept names
- Returns:
The resulting data source to graph matrix dictionary, index to name dict, index to ID dict
- graphai.core.ontology.clustering.normalize_features(g)
Normalizes the rows of a matrix (according to the l2 norm).
- Parameters:
g: Dense matrix
- Returns:
Normalized matrix
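The row normalization can be sketched in a few lines of NumPy; `normalize_rows_l2` is a hypothetical stand-in for the function above, and leaving all-zero rows unchanged is an added guard (an assumption, since the module's handling of zero rows is not documented here):

```python
import numpy as np

def normalize_rows_l2(g):
    """Divide each row of a dense matrix by its l2 norm (zero rows left unchanged)."""
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    return g / norms

m = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 1.0]])
normalized = normalize_rows_l2(m)
# each non-zero row of `normalized` now has unit l2 norm
```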
- graphai.core.ontology.clustering.get_laplacian(graph, normed=False)
Computes the Laplacian (normalized or unnormalized) of a graph using its adjacency matrix.
- Parameters:
graph: Adjacency matrix
normed: Whether to compute the normalized Laplacian
- Returns:
The Laplacian as a LIL sparse matrix
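A minimal sketch of the same computation using SciPy's `scipy.sparse.csgraph.laplacian`, assuming a symmetric adjacency matrix; the final conversion to LIL format mirrors the return type described above (whether the module itself uses this SciPy helper is an assumption):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import laplacian

# adjacency matrix of a path graph on 3 nodes
adj = sparse.csr_matrix(np.array([[0, 1, 0],
                                  [1, 0, 1],
                                  [0, 1, 0]], dtype=float))

lap = laplacian(adj, normed=False)  # unnormalized Laplacian: L = D - A
lap_lil = sparse.lil_matrix(lap)    # store in LIL format, as the docstring describes
```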
- graphai.core.ontology.clustering.sum_laplacians(laplacians)
Aggregates Laplacians using a simple sum.
- Parameters:
laplacians: List of Laplacians
- Returns:
Aggregated Laplacian
- graphai.core.ontology.clustering.arithmetic_mean_laplacians(laplacians)
Aggregates Laplacians using the arithmetic mean.
- Parameters:
laplacians: List of Laplacians
- Returns:
Aggregated Laplacian
- graphai.core.ontology.clustering.combine_laplacians(laplacians, mean=True)
Combines a list of matrix Laplacians using the requested method.
- Parameters:
laplacians: List of Laplacians
mean: Whether to compute the arithmetic mean or just the sum
- Returns:
Aggregated Laplacian
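The sum/mean combination is simple element-wise arithmetic; a sketch (`combine_laplacians_sketch` is a hypothetical name, and the Laplacians are assumed to share a shape):

```python
import numpy as np

def combine_laplacians_sketch(laplacians, mean=True):
    """Element-wise sum of the Laplacians, optionally divided by their count."""
    total = sum(laplacians)
    return total / len(laplacians) if mean else total

l1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
l2 = np.array([[2.0, -2.0], [-2.0, 2.0]])
combined = combine_laplacians_sketch([l1, l2])  # arithmetic mean of the two
```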
- graphai.core.ontology.clustering.spec_embed_on_laplacian(laplacian, n_clusters, seed=420)
Computes the spectral embedding of a given Laplacian matrix.
- Parameters:
laplacian: The Laplacian (as a sparse matrix)
n_clusters: The number of components to compute
seed: Random seed for the optimizer
- Returns:
The spectral embedding of the Laplacian matrix
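A hedged sketch of computing a spectral embedding from a sparse Laplacian, using ARPACK via `scipy.sparse.linalg.eigsh` to take the eigenvectors of the smallest eigenvalues; seeding the Lanczos start vector is an assumption about how the seed parameter might be used, not the module's documented behavior:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

def spectral_embed_sketch(laplacian, n_components, seed=420):
    """Embed nodes using the eigenvectors of the Laplacian's smallest eigenvalues."""
    rng = np.random.default_rng(seed)
    v0 = rng.random(laplacian.shape[0])  # deterministic start vector for ARPACK
    # 'SM' = smallest-magnitude eigenvalues; their eigenvectors form the embedding
    vals, vecs = eigsh(laplacian, k=n_components, which='SM', v0=v0)
    return vecs

# Laplacian of a path graph on 4 nodes
lap = sparse.csr_matrix(np.array([
    [1., -1., 0., 0.],
    [-1., 2., -1., 0.],
    [0., -1., 2., -1.],
    [0., 0., -1., 1.]]))
emb = spectral_embed_sketch(lap, n_components=2)  # one row per node, one column per component
```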
- graphai.core.ontology.clustering.combine_and_embed_laplacian(main_graphs, n_dims=1000)
Computes and combines graph Laplacians and calculates their spectral embedding.
- Parameters:
main_graphs: List of graphs to be used
combination_method: Method for combining the Laplacians, "armean" by default
n_dims: Number of dimensions for the spectral embedding
- Returns:
The combined Laplacian matrix and the spectral embedding
- graphai.core.ontology.clustering.perform_PCA(data, n_components, random_state=420, center_and_scale=True)
Performs PCA on the data.
- Parameters:
data: The original data
n_components: Number of components
random_state: Random state
center_and_scale: Whether to center and scale the data before applying PCA
- Returns:
The data after dimensionality reduction using PCA
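A NumPy-only sketch of PCA with optional centering and scaling (the module may well delegate to a library such as scikit-learn instead; this is purely illustrative, and `pca_sketch` is a hypothetical name):

```python
import numpy as np

def pca_sketch(data, n_components, center_and_scale=True):
    """Project data onto its first n_components principal directions via SVD."""
    data = data - data.mean(axis=0)  # PCA always requires centering
    if center_and_scale:
        std = data.std(axis=0)
        std[std == 0] = 1.0          # leave constant features untouched
        data = data / std
    # right singular vectors of the centered data are the principal directions
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    return data @ vt[:n_components].T

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 5))
reduced = pca_sketch(x, n_components=2)  # shape (50, 2)
```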
- graphai.core.ontology.clustering.precompute_clustering_metric(data, affinity, normalize_vectors, random_state)
- graphai.core.ontology.clustering.perform_constrained_agglomerative(data, n_clusters, normalize_vectors=False, random_state=420, affinity='cosine', linkage='average', full_compute=False)
Performs agglomerative clustering on the data with must-link and cannot-link constraints.
- Parameters:
data: The data (ndarray)
n_clusters: Number of clusters
ml: List of must-link constraints
cl: List of cannot-link constraints
normalize_vectors: Whether to normalize each data point (using the l2 norm)
random_state: The random state
return_model: Whether to return the clustering model as well as the labels
affinity: How the distance between data points is computed
linkage: Linkage of the clusters
full_compute: Whether to compute the full tree
- Returns:
Cluster labels, optionally clustering model
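Plain (unconstrained) agglomerative clustering with cosine affinity and average linkage can be illustrated with SciPy's hierarchy module; the must-link/cannot-link handling described above is not reproduced in this sketch:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# four points forming two clearly separated directions
x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])

d = pdist(x, metric='cosine')                    # condensed pairwise cosine distances
z = linkage(d, method='average')                 # average-linkage dendrogram
labels = fcluster(z, t=2, criterion='maxclust')  # cut the tree into 2 flat clusters
```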
- graphai.core.ontology.clustering.variance_ratio_eval(data, labels)
Computes the variance ratio of clusters.
- Parameters:
data: The data
labels: Cluster labels
- Returns:
The variance ratio score (also known as the Calinski-Harabasz score)
- graphai.core.ontology.clustering.davies_bouldin_eval(data, labels)
Computes the Davies-Bouldin score.
- Parameters:
data: The data
labels: Cluster labels
- Returns:
The score
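To make the variance ratio definition concrete, here is an illustrative by-hand computation of the Calinski-Harabasz score (between-cluster dispersion over within-cluster dispersion, degree-of-freedom corrected); `variance_ratio_sketch` is a reimplementation for exposition, not the module's code, which presumably wraps a library implementation:

```python
import numpy as np

def variance_ratio_sketch(data, labels):
    """Calinski-Harabasz score: (B / (k - 1)) / (W / (n - k))."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = len(data), len(clusters)
    overall_mean = data.mean(axis=0)
    between = within = 0.0
    for c in clusters:
        pts = data[labels == c]
        centroid = pts.mean(axis=0)
        between += len(pts) * np.sum((centroid - overall_mean) ** 2)  # B term
        within += np.sum((pts - centroid) ** 2)                       # W term
    return (between / (k - 1)) / (within / (n - k))

# two tight, well-separated clusters -> a very large score
data = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]])
score = variance_ratio_sketch(data, [0, 0, 1, 1])
```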
- graphai.core.ontology.clustering.cluster_using_embedding(embedding, n_clusters, params=None)
Computes one level of clustering using the provided embedding and constraints.
- Parameters:
embedding: Embedding of concepts in low-dimensional space
params: Parameters for the clustering, e.g. number of clusters, number of PCA dimensions, etc.
- Returns:
Clustering labels
- graphai.core.ontology.clustering.group_clustered(data, labels, mode='mean', rows_and_cols=False, precomputed_map=None)
Groups the data points together based on provided clustering labels such that each cluster becomes one single data point.
- Parameters:
data: The data (ndarray or sparse matrix)
labels: The cluster labels
mode: What to aggregate with (mean/median)
rows_and_cols: Whether to perform the aggregation on both the rows and the columns of the data
precomputed_map: If provided, this will be used as the cluster to concept map. If not, the mapping will be computed.
- Returns:
The transformed data and the cluster to concept map
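A sketch of the mean-aggregation case over rows only; `group_by_labels_mean` is hypothetical, and the real function also supports median aggregation and row-and-column aggregation, which are omitted here:

```python
import numpy as np

def group_by_labels_mean(data, labels):
    """Collapse each cluster of rows into a single mean row, with a cluster-to-rows map."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    grouped = np.vstack([data[labels == c].mean(axis=0) for c in clusters])
    cluster_to_rows = {int(c): np.flatnonzero(labels == c).tolist() for c in clusters}
    return grouped, cluster_to_rows

x = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]])
grouped, mapping = group_by_labels_mean(x, [0, 0, 1])
# rows 0 and 1 are averaged into one row; row 2 stays on its own
```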
- graphai.core.ontology.clustering.reassign_outliers(labels, embeddings, min_n=3)
Reassigns outlier clusters to non-outlier clusters.
- Parameters:
labels: Labels of each concept
embeddings: The concept embedding vectors
min_n: Minimum size for a cluster to not be considered an outlier
- Returns:
New labels after reassignment
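One plausible reading of the reassignment step, merging each cluster smaller than min_n into the large cluster with the nearest centroid; the nearest-centroid strategy is an assumption for illustration, not the module's documented logic:

```python
import numpy as np

def reassign_small_clusters(labels, embeddings, min_n=3):
    """Merge clusters with fewer than min_n members into the closest large cluster."""
    labels = np.asarray(labels).copy()
    clusters, counts = np.unique(labels, return_counts=True)
    large = clusters[counts >= min_n]
    small = clusters[counts < min_n]
    if len(large) == 0:
        return labels  # no sufficiently large cluster to reassign into
    centroids = {c: embeddings[labels == c].mean(axis=0) for c in large}
    for c in small:
        for i in np.flatnonzero(labels == c):
            # send each outlier point to the large cluster with the closest centroid
            labels[i] = min(centroids,
                            key=lambda g: np.linalg.norm(embeddings[i] - centroids[g]))
    return labels

emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [0.1, 0.1], [0.05, 0.05], [5.0, 5.0]])
new_labels = reassign_small_clusters([0, 0, 0, 0, 0, 1], emb, min_n=3)
# the singleton cluster 1 is absorbed into cluster 0
```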
- graphai.core.ontology.clustering.cluster_and_reassign_outliers(embedding, n_clusters, min_n=None, params=None)
- graphai.core.ontology.clustering.assign_to_categories_using_existing(labels, category_concept, category_id_to_index)
- graphai.core.ontology.clustering.convert_cluster_labels_to_dict(cluster_labels, concept_index_to_id, concept_index_to_name)