graphai.core.video.video_utils module

graphai.core.video.video_utils.retrieve_video_file_from_generic_url(url, output_filename_with_path, output_token)

Retrieves a file from a given URL using wget and stores it locally.

Parameters:

url – The URL to download from
output_filename_with_path – Path of the output file
output_token – Token of the output file

Returns:

Output token if successful, None otherwise
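A minimal usage sketch based on the documented signature (the URL, path, and token below are hypothetical):

    token = retrieve_video_file_from_generic_url(
        "https://example.com/lecture.mp4",
        "/data/videos/lecture.mp4",
        "lecture.mp4",
    )
    if token is None:
        print("Download failed")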

graphai.core.video.video_utils.retrieve_file_from_kaltura(url, output_filename_with_path, output_token)

Retrieves a video file in m3u8 format from Kaltura and stores it locally.

Parameters:

url – URL of the m3u8 playlist
output_filename_with_path – Full path of the output file
output_token – Token of the output file

Returns:

Output token if retrieval is successful, None otherwise
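An m3u8 playlist is typically assembled into a single local file by stream-copying the HLS segments. The following is a sketch of that general technique using ffmpeg, an assumption about the underlying tool rather than a description of this function's internals:

    import subprocess

    def download_m3u8(url, output_filename_with_path):
        # Stream-copy the HLS segments into one local file without re-encoding
        subprocess.run(
            ["ffmpeg", "-y", "-i", url, "-c", "copy", output_filename_with_path],
            check=True, capture_output=True,
        )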

graphai.core.video.video_utils.retrieve_file_from_youtube(url, output_filename_with_path, output_token)

Downloads a video from YouTube.

Parameters:

url – YouTube URL
output_filename_with_path – Full path of the output file
output_token – Token of the output file

Returns:

Token of output file if successful, None otherwise
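A common way to implement such a download is via yt-dlp; this sketch shows the general mechanism and is an assumption, not a description of the module's implementation:

    from yt_dlp import YoutubeDL

    def download_youtube(url, output_filename_with_path):
        # yt-dlp handles format selection and muxing for YouTube URLs
        with YoutubeDL({"outtmpl": output_filename_with_path, "quiet": True}) as ydl:
            ydl.download([url])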

graphai.core.video.video_utils.retrieve_file_from_any_source(url, output_filename_with_path, output_token, is_kaltura=False)
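retrieve_file_from_any_source presumably dispatches to one of the retrievers above based on the source; a hedged sketch of that routing (the URL matching is a guess):

    def retrieve_any(url, output_filename_with_path, output_token, is_kaltura=False):
        # Hypothetical dispatch: route Kaltura and YouTube URLs to their
        # dedicated retrievers, everything else to the generic wget path
        if is_kaltura:
            return retrieve_file_from_kaltura(url, output_filename_with_path, output_token)
        if "youtube.com" in url or "youtu.be" in url:
            return retrieve_file_from_youtube(url, output_filename_with_path, output_token)
        return retrieve_video_file_from_generic_url(url, output_filename_with_path, output_token)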
graphai.core.video.video_utils.create_video_filename_using_url_format(token, url)
graphai.core.video.video_utils.perform_slow_audio_probe(input_filename_with_path)

Performs a slower probe using ffmpeg by decoding the audio stream.

Parameters:

input_filename_with_path – Path of the input file

Returns:

Probe results
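One way to implement such a probe is to decode the audio to a null muxer and read the last progress timestamp from ffmpeg's output. This is a sketch of the general technique, assuming ffmpeg is on PATH, not necessarily what this function does:

    import re
    import subprocess

    def slow_probe_duration(input_filename_with_path):
        # Decoding the stream yields a reliable duration even when the
        # container metadata is missing or wrong
        result = subprocess.run(
            ["ffmpeg", "-i", input_filename_with_path, "-vn", "-f", "null", "-"],
            capture_output=True, text=True,
        )
        # ffmpeg reports progress lines like "time=00:01:23.45" on stderr
        times = re.findall(r"time=(\d+):(\d+):(\d+(?:\.\d+)?)", result.stderr)
        if not times:
            return None
        h, m, s = times[-1]
        return int(h) * 3600 + int(m) * 60 + float(s)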

graphai.core.video.video_utils.generate_symbolic_token(origin, token)

Generates a new symbolic token based on the origin token and the target token.

Parameters:

origin – Origin token
token – Target token

Returns:

Symbolic token

graphai.core.video.video_utils.detect_audio_duration(input_filename_with_path)

Detects the duration of the audio track of the provided video file.

Parameters:

input_filename_with_path – Path of the input file

Returns:

Audio duration.
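For a fast duration lookup, ffprobe's format metadata is the usual source; a minimal sketch of the technique (an assumption about the implementation):

    import subprocess

    def probe_duration(input_filename_with_path):
        # Ask ffprobe for the container-level duration, printed as a bare number
        out = subprocess.run(
            ["ffprobe", "-v", "error", "-show_entries", "format=duration",
             "-of", "default=noprint_wrappers=1:nokey=1", input_filename_with_path],
            capture_output=True, text=True,
        ).stdout.strip()
        return float(out) if out else None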

graphai.core.video.video_utils.generate_audio_token(token)
graphai.core.video.video_utils.extract_audio_from_video(input_filename_with_path, output_filename_with_path, output_token)

Extracts the audio track from a video.

Parameters:

input_filename_with_path – Path of the input file
output_filename_with_path – Path of the output file
output_token – Token of the output file

Returns:

Output token and duration of audio if successful, None if not.
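A hedged sketch of the extraction step with ffmpeg, assuming an Ogg/Vorbis target (the codec choice is an assumption based on the ogg mention elsewhere in this module):

    import subprocess

    def extract_audio(input_filename_with_path, output_filename_with_path):
        # -vn drops the video stream; libvorbis encodes the audio as Ogg
        subprocess.run(
            ["ffmpeg", "-y", "-i", input_filename_with_path,
             "-vn", "-acodec", "libvorbis", output_filename_with_path],
            check=True, capture_output=True,
        )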

graphai.core.video.video_utils.extract_frames(input_filename_with_path, output_folder_with_path, output_folder)

Extracts frames from a video file.

Parameters:

input_filename_with_path – Path of the input video file
output_folder_with_path – Path of the output image folder
output_folder – The output folder only (used as the return token)

Returns:

The return token if successful, None otherwise
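Frame extraction is commonly done with ffmpeg's fps filter; a sketch under the assumption of one frame per second and a hypothetical filename pattern:

    import os
    import subprocess

    def extract_frames_sketch(input_filename_with_path, output_folder_with_path):
        # One frame per second, written as sequentially numbered PNGs
        pattern = os.path.join(output_folder_with_path, "frame_%06d.png")
        subprocess.run(
            ["ffmpeg", "-i", input_filename_with_path, "-vf", "fps=1", pattern],
            check=True, capture_output=True,
        )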

graphai.core.video.video_utils.generate_frame_sample_indices(input_folder_with_path, step=12)

Generates indices for extracted frames (so we don't use every single frame for our calculations).

Parameters:

input_folder_with_path – Full path of the input image folder
step – Step size for the indices

Returns:

List of indices
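A minimal stand-in that conveys the idea (the real function inspects the folder to find the frame count; here it is passed in directly):

    def sample_indices(num_frames, step=12):
        # Keep every step-th frame index so downstream OCR runs on a sample
        return list(range(1, num_frames + 1, step))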

graphai.core.video.video_utils.read_txt_gz_file(fp)

Reads the contents of a txt.gz file.

Parameters:

fp – File path

Returns:

Resulting text

graphai.core.video.video_utils.write_txt_gz_file(text, fp)

Writes text to a txt.gz file.

Parameters:

text – String to write
fp – File path

Returns:

None

graphai.core.video.video_utils.read_json_gz_file(fp)

Reads the contents of a json.gz file.

Parameters:

fp – File path

Returns:

Contents of JSON file as dict
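These three helpers map directly onto the standard library's gzip and json modules; a sketch of equivalent implementations:

    import gzip
    import json

    def read_txt_gz(fp):
        with gzip.open(fp, "rt", encoding="utf-8") as f:
            return f.read()

    def write_txt_gz(text, fp):
        with gzip.open(fp, "wt", encoding="utf-8") as f:
            f.write(text)

    def read_json_gz(fp):
        with gzip.open(fp, "rt", encoding="utf-8") as f:
            return json.load(f)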

graphai.core.video.video_utils.tesseract_ocr_or_get_cached(ocr_path, image_path, language)

Performs OCR using Tesseract, or reuses cached results if available.

Parameters:

ocr_path – Root path of the OCR files
image_path – Root path of the image files
language – Language of the slides

Returns:

Extracted text
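A sketch of the OCR-with-cache pattern using pytesseract; the plain-text cache layout is an assumption (the module may well cache as txt.gz, per the helpers above):

    import os
    import pytesseract
    from PIL import Image

    def ocr_or_cached(ocr_path, image_path, language="eng"):
        # Reuse a previous OCR result if one was already written to disk
        if os.path.exists(ocr_path):
            with open(ocr_path, encoding="utf-8") as f:
                return f.read()
        text = pytesseract.image_to_string(Image.open(image_path), lang=language)
        with open(ocr_path, "w", encoding="utf-8") as f:
            f.write(text)
        return text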

graphai.core.video.video_utils.generate_img_and_ocr_paths_and_perform_tesseract_ocr(input_folder_with_path, k, language=None)
class graphai.core.video.video_utils.NLPModels

Bases: object

load_nlp_models()

Lazy-loads and returns the NLP models used for local OCR in slide detection (a sketch of this pattern follows the class entry).

Returns:

The NLP model dict

get_words(text, lang='en', valid_only=False)
get_text_word_vector(text, lang='en', valid_only=True)
get_text_word_vector_using_words(words, lang='en')
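The lazy-loading pattern described by load_nlp_models looks roughly like this; spaCy and the model names are assumptions for illustration:

    import spacy

    class LazyNLPModels:
        def __init__(self):
            self._models = None

        def load_nlp_models(self):
            # Load the heavyweight models only on first use, then reuse them
            if self._models is None:
                self._models = {
                    "en": spacy.load("en_core_web_sm"),
                    "fr": spacy.load("fr_core_news_sm"),
                }
            return self._models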
graphai.core.video.video_utils.get_cosine_sim(v1, v2)
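Cosine similarity between two vectors is standard; a minimal implementation:

    import numpy as np

    def cosine_sim(v1, v2):
        # Dot product over the product of norms, with a zero-vector guard
        denom = np.linalg.norm(v1) * np.linalg.norm(v2)
        return float(np.dot(v1, v2) / denom) if denom else 0.0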
graphai.core.video.video_utils.frame_ocr_distance(input_folder_with_path, k1, k2, nlp_models: NLPModels, language=None)

Computes the OCR distance between two frames.

Parameters:

input_folder_with_path – Full path of the input image folder
k1 – Index of frame 1
k2 – Index of frame 2
nlp_models – NLP models used to compute the distance between OCR results
language – Language of the text in the input images

Returns:

Distance between the two frames
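A plausible reading of the computation, phrased with the helpers above; this is hypothetical, and the real function also OCRs the frames and handles empty text:

    def ocr_distance(text1, text2, nlp_models, lang="en"):
        # One minus the cosine similarity of the two OCR texts' word vectors
        v1 = nlp_models.get_text_word_vector(text1, lang=lang)
        v2 = nlp_models.get_text_word_vector(text2, lang=lang)
        return 1.0 - get_cosine_sim(v1, v2)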

graphai.core.video.video_utils.frame_hash_similarity(input_folder_with_path, k1, k2)

Computes the hash-based similarity between two frames.

Parameters:

input_folder_with_path – Full path of the input image folder
k1 – Index of frame 1
k2 – Index of frame 2

Returns:

Similarity between the two frames (between 0 and 1)
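Perceptual hashing with the imagehash package is one way to obtain such a similarity; a sketch (the choice of phash is an assumption):

    import imagehash
    from PIL import Image

    def hash_similarity(frame_path_1, frame_path_2):
        h1 = imagehash.phash(Image.open(frame_path_1))
        h2 = imagehash.phash(Image.open(frame_path_2))
        # h1 - h2 is the Hamming distance; normalise to a similarity in [0, 1]
        n_bits = h1.hash.size
        return 1.0 - (h1 - h2) / n_bits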

graphai.core.video.video_utils.compute_ocr_noise_level(input_folder_with_path, frame_sample_indices, nlp_models, language=None)

Computes noise values for a sequence of frames.

Parameters:

input_folder_with_path – Full path of the input image folder
frame_sample_indices – Indices of the sampled frames
nlp_models – The NLP models used for the OCR distance
language – Language of the slides

Returns:

List of distances identified as noise (i.e. below the default noise threshold)
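A hedged sketch of the sampling loop; the noise cap value is hypothetical:

    def noise_distances(folder, indices, nlp_models, language=None, noise_cap=0.1):
        # Distances between consecutive sampled frames that fall below the cap
        # are treated as OCR noise rather than genuine slide changes
        noise = []
        for k1, k2 in zip(indices, indices[1:]):
            d = frame_ocr_distance(folder, k1, k2, nlp_models, language=language)
            if d < noise_cap:
                noise.append(d)
        return noise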

graphai.core.video.video_utils.compute_ocr_threshold(distance_list, multiplier=5, default_threshold=0.05)

Computes the OCR noise threshold from a list of consecutive-frame distances. The threshold is multiplier * median(distance_list) when that median is a valid number, and default_threshold otherwise (e.g. when the list is empty).

Parameters:

distance_list – List of OCR distances
multiplier – Multiplier for the median of the distance values
default_threshold – Default value to use if the list is empty

Returns:

The noise threshold
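The rule stated above maps to a few lines:

    import math
    from statistics import median

    def ocr_threshold(distance_list, multiplier=5, default_threshold=0.05):
        # Fall back to the default when the list is empty or the median is NaN
        if not distance_list:
            return default_threshold
        m = median(distance_list)
        return multiplier * m if not math.isnan(m) else default_threshold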

graphai.core.video.video_utils.check_ocr_and_hash_thresholds(input_folder_with_path, k_l, k_r, ocr_dist_threshold, hash_similarity_threshold, nlp_models, language=None)
graphai.core.video.video_utils.frame_ocr_transition(input_folder_with_path, k_l, k_r, ocr_dist_threshold, hash_similarity_threshold, nlp_models, language=None)

Recursive function that finds slide transitions through binary search over the frame indices.

Parameters:

input_folder_with_path – Full path of the input image folder, where all frames follow FRAME_FORMAT
k_l – Leftmost index of the binary search
k_r – Rightmost index of the binary search
ocr_dist_threshold – Minimum OCR-based distance for two frames to be considered distinct
hash_similarity_threshold – Maximum hash-based similarity for two frames to be considered distinct
nlp_models – NLP models for the OCR results
language – Language of the document

Returns:

[transition frame index, distance] if a transition is found, [None, None] otherwise
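The recursion can be pictured as follows, assuming at most one transition between k_l and k_r; this simplified sketch ignores the OCR/hash threshold bookkeeping and returns only the index:

    def find_transition(frames_differ, k_l, k_r):
        # frames_differ(a, b) -> True when frames a and b are distinct under
        # both the OCR-distance and hash-similarity thresholds
        if k_r - k_l <= 1:
            return k_r if frames_differ(k_l, k_r) else None
        mid = (k_l + k_r) // 2
        if frames_differ(k_l, mid):
            return find_transition(frames_differ, k_l, mid)
        return find_transition(frames_differ, mid, k_r)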

graphai.core.video.video_utils.compute_video_ocr_transitions(input_folder_with_path, frame_sample_indices, ocr_dist_threshold, hash_dist_threshold, nlp_models, language=None, keep_first=True, keep_last=True)

Computes all the slide transitions for slides extracted from a video file.

Parameters:

input_folder_with_path – Path of the slide folder
frame_sample_indices – Indices of the sampled frames
ocr_dist_threshold – Threshold for the OCR distance (below which slides are considered to be the same)
hash_dist_threshold – Threshold for the perceptual hash similarity (above which slides are considered to be the same)
nlp_models – NLP models for parsing the OCR results
language – Language of the slides
keep_first – Whether to return the first frame index as a slide (True by default)
keep_last – Whether to return the final frame index as a slide (True by default)

Returns:

List of detected slide transitions
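Putting the pieces together, a hypothetical end-to-end run over an extracted frame folder (the folder path and threshold values are illustrative):

    nlp_models = NLPModels()
    folder = "/data/frames/lecture_1"
    indices = generate_frame_sample_indices(folder, step=12)
    noise = compute_ocr_noise_level(folder, indices, nlp_models, language="en")
    threshold = compute_ocr_threshold(noise)
    transitions = compute_video_ocr_transitions(
        folder, indices, threshold, 0.9, nlp_models, language="en"
    )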