graphai.core.common.fingerprinting module

graphai.core.common.fingerprinting.perceptual_hash_text(s)

Computes the perceptual hash of a string.

Parameters:
  • s – String to hash

  • min_window_length – Minimum window length for k-grams

  • max_window_length – Maximum window length for k-grams

  • hash_len – Length of the hash

Returns:

Perceptual hash of string

graphai.core.common.fingerprinting.md5_video_or_audio(input_filename_with_path, video=True)

Computes the MD5 hash of the video or audio stream of a video file.

Parameters:
  • input_filename_with_path – Full path of the input file

  • video – Whether to compute the MD5 for the video stream or the audio stream

Returns:

MD5 hash
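
One possible way to hash a single stream type is to ask ffmpeg for an MD5 of that stream, as sketched below. This is an illustration only: the source does not state how `md5_video_or_audio` works internally, and the helper names `build_stream_md5_command` and `stream_md5` are hypothetical. The sketch assumes the `ffmpeg` executable is installed and on the PATH.

```python
import subprocess
from typing import List, Optional


def build_stream_md5_command(path: str, video: bool = True) -> List[str]:
    """Build an ffmpeg command that writes an MD5 of one stream type to stdout."""
    # "0:v" selects the video streams of the first input, "0:a" the audio streams.
    stream = "0:v" if video else "0:a"
    return ["ffmpeg", "-i", path, "-map", stream, "-f", "md5", "-"]


def stream_md5(path: str, video: bool = True) -> Optional[str]:
    """Run ffmpeg and return the MD5 string, or None if ffmpeg reports an error.

    Raises FileNotFoundError if the ffmpeg executable itself is missing.
    """
    result = subprocess.run(
        build_stream_md5_command(path, video), capture_output=True, text=True
    )
    if result.returncode != 0:
        return None
    # The md5 muxer prints a line of the form "MD5=<hex digest>".
    for line in result.stdout.splitlines():
        if line.startswith("MD5="):
            return line[len("MD5="):].strip()
    return None
```

Hashing one stream at a time means two files with the same video but different audio can still share a video hash.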

graphai.core.common.fingerprinting.perceptual_hash_audio(input_filename_with_path, max_length=7200)

Computes the perceptual hash of an audio file.

Parameters:
  • input_filename_with_path – Path of the input file

  • max_length – Maximum length of the file in seconds

Returns:

String representation of the computed fingerprint and its decoded representation. Both are None if the file doesn’t exist.

graphai.core.common.fingerprinting.perceptual_hash_image(input_filename_with_path, hash_size=16)

Computes the perceptual hash of an image file.

Parameters:
  • input_filename_with_path – Path of the input file

  • hash_size – Size of hash

Returns:

String representation of the computed fingerprint. None if file does not exist

graphai.core.common.fingerprinting.perceptual_hash_pdf(input_filename_with_path, hash_size=16)

Computes the perceptual hash of a PDF file.

Parameters:
  • input_filename_with_path – Path of the input file

  • hash_size – Size of hash

Returns:

String representation of the computed fingerprint. None if file does not exist

graphai.core.common.fingerprinting.compare_decoded_fingerprints(decoded_1, decoded_2)

Compares two decoded fingerprints.

Parameters:
  • decoded_1 – Fingerprint 1

  • decoded_2 – Fingerprint 2

Returns:

Fuzzy matching ratio between 0 and 1

graphai.core.common.fingerprinting.compare_encoded_fingerprints(f1, f2=None, decoder_func=<function hex_to_hash>)

Compares two string-encoded audio fingerprints and returns the ratio of the fuzzy match between them (a value between 0 and 1, with 1 indicating an exact match).

Parameters:
  • f1 – The target fingerprint

  • f2 – The second fingerprint, can be None (similarity is 0 if so)

Returns:

Ratio of fuzzy match between the two fingerprints
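
The decode-then-compare pattern can be sketched as follows. This is a minimal illustration, not the module's implementation: it substitutes a normalized Hamming similarity for the fuzzy match, and the names `hex_to_bits` and `similarity_ratio` are hypothetical. It assumes non-empty hexadecimal strings.

```python
from typing import Optional


def hex_to_bits(hex_str: str) -> str:
    """Decode a hex string into a string of '0'/'1' characters (hypothetical decoder)."""
    return bin(int(hex_str, 16))[2:].zfill(len(hex_str) * 4)


def similarity_ratio(f1: str, f2: Optional[str] = None) -> float:
    """Return a similarity in [0, 1]; 1.0 means the decoded bits are identical."""
    if f2 is None:
        # Mirrors the documented behaviour: a missing second fingerprint scores 0.
        return 0.0
    bits_1, bits_2 = hex_to_bits(f1), hex_to_bits(f2)
    if len(bits_1) != len(bits_2):
        # Assumption: fingerprints of different sizes are treated as non-matching.
        return 0.0
    matching = sum(a == b for a, b in zip(bits_1, bits_2))
    return matching / len(bits_1)


print(similarity_ratio("ff", "ff"))  # 1.0
print(similarity_ratio("f0", "ff"))  # 0.5
print(similarity_ratio("ff"))        # 0.0
```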

graphai.core.common.fingerprinting.find_closest_fingerprint_from_list(target_fp, fp_list, token_list, date_list, min_similarity=1, decoder_func=<function hex_to_hash>, strip_underscores=True)

Given a target fingerprint and a list of candidate fingerprints, finds the one with the highest similarity to the target whose similarity is above a minimum value.

Parameters:
  • target_fp – Target fingerprint

  • fp_list – List of candidate fingerprints

  • token_list – List of tokens corresponding to those fingerprints

  • min_similarity – Minimum similarity value. If the similarity of the most similar candidate to the target is lower than this value, None will be returned as the result.

  • decoder_func – The function that decodes the string hash, different for audio vs image hashes

  • strip_underscores – For text fingerprints, removes the trailing underscores

Returns:

Closest fingerprint, its token, and the highest score. All three are None if the closest one does not satisfy the minimum similarity criterion.
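
The selection logic described above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, the name `closest_candidate` is hypothetical, and `date_list`, `decoder_func` and `strip_underscores` are left out because their exact handling is not described here.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def closest_candidate(
    target: str,
    candidates: List[str],
    tokens: List[str],
    min_similarity: float = 1.0,
) -> Tuple[Optional[str], Optional[str], Optional[float]]:
    """Return (fingerprint, token, score) of the best match, or three Nones."""
    best_index, best_score = None, -1.0
    for i, candidate in enumerate(candidates):
        # Stand-in similarity in [0, 1]; the real module uses its own fuzzy ratio.
        score = SequenceMatcher(None, target, candidate).ratio()
        if score > best_score:
            best_index, best_score = i, score
    # Reject the best match if it falls below the minimum similarity.
    if best_index is None or best_score < min_similarity:
        return None, None, None
    return candidates[best_index], tokens[best_index], best_score


print(closest_candidate("abcd", ["abce", "zzzz"], ["t1", "t2"], min_similarity=0.5))
# ('abce', 't1', 0.75)
print(closest_candidate("abcd", ["abce", "zzzz"], ["t1", "t2"]))
# (None, None, None)
```

With the default `min_similarity=1`, only an exact match is accepted, so lowering the threshold is what enables near-duplicate detection.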

graphai.core.common.fingerprinting.find_closest_fingerprint_for_list_from_list(target_fp, fp_list, token_list, date_list, min_similarity=1, decoder_func=<function hex_to_hash>, strip_underscores=True)

graphai.core.common.fingerprinting.find_closest_audio_fingerprint_from_list(target_fp, fp_list, token_list, date_list, min_similarity=1)

Finds closest audio fingerprint from list

graphai.core.common.fingerprinting.find_closest_image_fingerprint_from_list(target_fp, fp_list, token_list, date_list, min_similarity=1)

Finds closest image fingerprint from list

graphai.core.common.fingerprinting.find_closest_text_fingerprint_from_list(target_fp, fp_list, token_list, date_list, min_similarity=1)

Finds closest text fingerprint from list

graphai.core.common.fingerprinting.md5_text(s)

Computes the MD5 hash of a string.

Parameters:
  • s – The string

Returns:

MD5 hash
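
As a rough illustration of what an MD5 hash of a string looks like, the sketch below uses Python's standard `hashlib`. The helper name `md5_of_string` and the UTF-8 encoding are assumptions for this example, not confirmed details of `md5_text`.

```python
import hashlib


def md5_of_string(s: str) -> str:
    """Return the hexadecimal MD5 digest of a string (hypothetical helper)."""
    # Assumption: the string is encoded as UTF-8 before hashing.
    return hashlib.md5(s.encode("utf-8")).hexdigest()


print(md5_of_string("hello"))  # 5d41402abc4b2a76b9719d911017c592
```

Unlike the perceptual hashes in this module, an MD5 digest changes completely when a single character changes, so it is only suitable for detecting exact duplicates.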

graphai.core.common.fingerprinting.compute_text_fingerprint(token, text)