graphai.core.video.video_utils module
- graphai.core.video.video_utils.retrieve_video_file_from_generic_url(url, output_filename_with_path, output_token)
Retrieves a file from a given URL using wget and stores it locally.
- Parameters:
url – the URL
output_filename_with_path – path of the output file
output_token – token of the output file
- Returns:
Output token if successful, None otherwise
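A minimal sketch of what a wget-based retrieval along these lines might look like; the exact flags and error handling are assumptions, not the module's actual code:
```python
import subprocess

def retrieve_via_wget(url, output_filename_with_path, output_token):
    # Download the URL to the given path; -O sets the output file, -q silences wget.
    result = subprocess.run(
        ["wget", "-q", "-O", output_filename_with_path, url],
        capture_output=True,
    )
    # Per the documented contract: return the token on success, None otherwise.
    return output_token if result.returncode == 0 else None
```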
- graphai.core.video.video_utils.retrieve_file_from_kaltura(url, output_filename_with_path, output_token)
Retrieves a video file in m3u8 format from Kaltura and stores it locally.
- Parameters:
url – URL of the m3u8 playlist
output_filename_with_path – full path of the output file
output_token – token of the output file
- Returns:
Output token if retrieval is successful, None otherwise
- graphai.core.video.video_utils.retrieve_file_from_youtube(url, output_filename_with_path, output_token)
Downloads a video from YouTube.
- Parameters:
url – YouTube URL
output_filename_with_path – full path of the output file
output_token – token of the output file
- Returns:
Token of output file if successful, None otherwise
- graphai.core.video.video_utils.retrieve_file_from_any_source(url, output_filename_with_path, output_token, is_kaltura=False)
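This function has no docstring; judging from its signature and the three retrieval functions above, it presumably dispatches on the source type. A hedged sketch (the YouTube URL heuristic is an assumption):
```python
from graphai.core.video.video_utils import (
    retrieve_file_from_kaltura,
    retrieve_file_from_youtube,
    retrieve_video_file_from_generic_url,
)

def retrieve_any(url, output_filename_with_path, output_token, is_kaltura=False):
    # Kaltura is flagged explicitly; YouTube is guessed from the URL (assumption).
    if is_kaltura:
        return retrieve_file_from_kaltura(url, output_filename_with_path, output_token)
    if "youtube.com" in url or "youtu.be" in url:
        return retrieve_file_from_youtube(url, output_filename_with_path, output_token)
    return retrieve_video_file_from_generic_url(url, output_filename_with_path, output_token)
```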
- graphai.core.video.video_utils.create_video_filename_using_url_format(token, url)
- graphai.core.video.video_utils.perform_slow_audio_probe(input_filename_with_path)
Performs a slower probe using ffmpeg by decoding the audio stream.
- Parameters:
input_filename_with_path – input file path
- Returns:
Probe results
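One plausible way to implement such a decode-based probe, sketched with ffmpeg's null muxer; the exact flags and the stderr parsing are assumptions, not the module's code:
```python
import re
import subprocess

def slow_audio_probe(input_filename_with_path):
    # Decode the audio stream to the null muxer; ffmpeg reports progress
    # (including the final decoded timestamp) on stderr.
    proc = subprocess.run(
        ["ffmpeg", "-i", input_filename_with_path, "-vn", "-f", "null", "-"],
        capture_output=True, text=True,
    )
    # Take the last "time=HH:MM:SS.xx" ffmpeg printed as the decoded duration.
    times = re.findall(r"time=(\d+):(\d+):([\d.]+)", proc.stderr)
    if not times:
        return None
    h, m, s = times[-1]
    return int(h) * 3600 + int(m) * 60 + float(s)
```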
- graphai.core.video.video_utils.generate_symbolic_token(origin, token)
Generates a new symbolic token based on the origin token and the target token.
- Parameters:
origin – origin token
token – target token
- Returns:
Symbolic token
- graphai.core.video.video_utils.detect_audio_duration(input_filename_with_path)
Detects the duration of the audio track of the provided video file.
- Parameters:
input_filename_with_path – path of the input file
- Returns:
Audio duration.
- graphai.core.video.video_utils.generate_audio_token(token)
- graphai.core.video.video_utils.extract_audio_from_video(input_filename_with_path, output_filename_with_path, output_token)
Extracts the audio track from a video.
- Parameters:
input_filename_with_path – path of the input file
output_filename_with_path – path of the output file
output_token – token of the output file
- Returns:
Output token and duration of audio if successful, None if not.
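A hedged usage example for the extraction step; the paths and token are illustrative, and the unpacking of the return value assumes the "token and duration" pair described above:
```python
from graphai.core.video.video_utils import extract_audio_from_video

# Hypothetical paths and token, for illustration only.
result = extract_audio_from_video(
    "/data/videos/lecture.mp4",   # input video
    "/data/audio/lecture.ogg",    # output audio file
    "lecture_audio_token",        # token to return on success
)
if result is not None:
    token, duration = result  # assumed shape of the successful return
    print(f"Extracted {token}: {duration} s of audio")
```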
- graphai.core.video.video_utils.extract_frames(input_filename_with_path, output_folder_with_path, output_folder)
Extracts frames from a video file.
- Parameters:
input_filename_with_path – path of the input video file
output_folder_with_path – path of the output image folder
output_folder – the output folder only (used as the return token)
- Returns:
The return token if successful, None otherwise
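Frame extraction along these lines is typically done with ffmpeg's fps filter; the sampling rate and the filename pattern below are assumptions (the module's own FRAME_FORMAT, mentioned under frame_ocr_transition, is not documented here):
```python
import subprocess

def extract_frames_sketch(input_filename_with_path, output_folder_with_path, output_folder):
    # Sample one frame per second and write numbered PNGs into the folder.
    # The fps value and "frame-%06d.png" pattern are illustrative assumptions.
    result = subprocess.run(
        ["ffmpeg", "-i", input_filename_with_path, "-vf", "fps=1",
         f"{output_folder_with_path}/frame-%06d.png"],
        capture_output=True,
    )
    return output_folder if result.returncode == 0 else None
```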
- graphai.core.video.video_utils.generate_frame_sample_indices(input_folder_with_path, step=12)
Generates indices for extracted frames (so we don't use every single frame for our calculations).
- Parameters:
input_folder_with_path – full path of the input image folder
step – step size for the indices
- Returns:
List of indices
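The sampling itself is presumably a simple stride over the extracted frames; a sketch under the assumption that frames are numbered consecutively from 1:
```python
import os

def sample_indices(input_folder_with_path, step=12):
    # Count the extracted frames and keep every `step`-th index.
    n = len(os.listdir(input_folder_with_path))
    return list(range(1, n + 1, step))
```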
- graphai.core.video.video_utils.read_txt_gz_file(fp)
Reads the contents of a txt.gz file.
- Parameters:
fp – file path
- Returns:
Resulting text
- graphai.core.video.video_utils.write_txt_gz_file(text, fp)
Writes text to a txt.gz file.
- Parameters:
text – string to write
fp – file path
- Returns:
None
- graphai.core.video.video_utils.read_json_gz_file(fp)
Reads the contents of a json.gz file.
- Parameters:
fp – file path
- Returns:
Contents of JSON file as dict
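The three *.gz helpers above map directly onto the standard library's gzip and json modules; a minimal sketch of equivalent behaviour:
```python
import gzip
import json

def read_txt_gz(fp):
    # Open in text mode ("rt") so gzip handles decoding to str.
    with gzip.open(fp, "rt", encoding="utf-8") as f:
        return f.read()

def write_txt_gz(text, fp):
    with gzip.open(fp, "wt", encoding="utf-8") as f:
        f.write(text)

def read_json_gz(fp):
    with gzip.open(fp, "rt", encoding="utf-8") as f:
        return json.load(f)
```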
- graphai.core.video.video_utils.tesseract_ocr_or_get_cached(ocr_path, image_path, language)
Performs OCR using Tesseract or uses cached results.
- Parameters:
ocr_path – root path of the OCR files
image_path – root path of the image files
language – language of the slides
- Returns:
Extracted text
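The cache-or-compute pattern this describes typically looks like the following; the use of pytesseract and the treatment of the paths as concrete files (rather than root folders) are simplifying assumptions:
```python
import os
import pytesseract
from PIL import Image

def ocr_or_cached(ocr_path, image_path, language):
    # Reuse a previously stored OCR result if one exists for this image.
    if os.path.exists(ocr_path):
        with open(ocr_path, "r", encoding="utf-8") as f:
            return f.read()
    # Otherwise run tesseract and cache the extracted text for next time.
    text = pytesseract.image_to_string(Image.open(image_path), lang=language)
    with open(ocr_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```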
- graphai.core.video.video_utils.generate_img_and_ocr_paths_and_perform_tesseract_ocr(input_folder_with_path, k, language=None)
- class graphai.core.video.video_utils.NLPModels
Bases:
object
- load_nlp_models()
Lazy-loads and returns the NLP models used for local OCR in slide detection.
- Returns:
The NLP model dict
- get_words(text, lang='en', valid_only=False)
- get_text_word_vector(text, lang='en', valid_only=True)
- get_text_word_vector_using_words(words, lang='en')
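The lazy-loading pattern behind NLPModels is a common one: defer the expensive model load until first use and cache the result. A sketch assuming spaCy-style, per-language vector models; the actual backing library is not stated here:
```python
class LazyNLPModels:
    def __init__(self):
        self._models = None  # nothing loaded at construction time

    def load_nlp_models(self):
        # Load once, on first request, then reuse the cached dict.
        if self._models is None:
            import spacy  # assumption: spaCy-style models with word vectors
            self._models = {"en": spacy.load("en_core_web_md")}
        return self._models

    def get_text_word_vector(self, text, lang="en"):
        nlp = self.load_nlp_models()[lang]
        return nlp(text).vector  # mean word vector of the text
```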
- graphai.core.video.video_utils.get_cosine_sim(v1, v2)
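get_cosine_sim is presumably the standard cosine similarity over the word vectors above, with frame_ocr_distance below then readable as one minus this similarity. A sketch:
```python
import numpy as np

def cosine_sim(v1, v2):
    # cos(theta) = (v1 . v2) / (|v1| |v2|); guard against zero vectors.
    n1, n2 = np.linalg.norm(v1), np.linalg.norm(v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    return float(np.dot(v1, v2) / (n1 * n2))
```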
- graphai.core.video.video_utils.frame_ocr_distance(input_folder_with_path, k1, k2, nlp_models: NLPModels, language=None)
Computes the OCR distance between two frames.
- Parameters:
input_folder_with_path – full path of the input image folder
k1 – index of frame 1
k2 – index of frame 2
nlp_models – NLP models used to compute the distance between OCR results
language – language of the text in the input images
- Returns:
Distance between the two frames
- graphai.core.video.video_utils.frame_hash_similarity(input_folder_with_path, k1, k2)
Computes the hash-based similarity between two frames.
- Parameters:
input_folder_with_path – full path of the input image folder
k1 – index of frame 1
k2 – index of frame 2
- Returns:
Similarity between the two frames (between 0 and 1)
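A hedged sketch of hash-based frame similarity using the imagehash package; the choice of phash and the frame filename pattern are assumptions:
```python
import imagehash
from PIL import Image

def frame_hash_sim(input_folder_with_path, k1, k2):
    # "frame-%06d.png" stands in for the module's FRAME_FORMAT (assumption).
    h1 = imagehash.phash(Image.open(f"{input_folder_with_path}/frame-{k1:06d}.png"))
    h2 = imagehash.phash(Image.open(f"{input_folder_with_path}/frame-{k2:06d}.png"))
    # imagehash's subtraction gives the Hamming distance between the hashes;
    # normalise by the number of bits to land in [0, 1], 1 meaning identical.
    return 1.0 - (h1 - h2) / h1.hash.size
```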
- graphai.core.video.video_utils.compute_ocr_noise_level(input_folder_with_path, frame_sample_indices, nlp_models, language=None)
Computes noise values for a sequence of frames.
- Parameters:
input_folder_with_path – full path of the input image folder
frame_sample_indices – indices of the sampled frames
nlp_models – the NLP models used for the OCR distance
language – language of the slides
- Returns:
List of distances identified as noise (i.e. below the default noise threshold)
- graphai.core.video.video_utils.compute_ocr_threshold(distance_list, multiplier=5, default_threshold=0.05)
Computes the OCR noise threshold from a list of successive frame distances. The threshold is multiplier * median(distance_list) when that median is a number, and default_threshold otherwise.
- Parameters:
distance_list – list of OCR distances
multiplier – multiplier for the median of the distance values
default_threshold – default value to use if the list is empty
- Returns:
The noise threshold
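Since the rule is fully spelled out in the docstring, a sketch of it is straightforward:
```python
import math
import statistics

def ocr_threshold(distance_list, multiplier=5, default_threshold=0.05):
    # multiplier * median of the observed noise distances, or the default
    # when no usable measurements are available.
    if not distance_list:
        return default_threshold
    m = statistics.median(distance_list)
    return multiplier * m if not math.isnan(m) else default_threshold
```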
- graphai.core.video.video_utils.check_ocr_and_hash_thresholds(input_folder_with_path, k_l, k_r, ocr_dist_threshold, hash_similarity_threshold, nlp_models, language=None)
- graphai.core.video.video_utils.frame_ocr_transition(input_folder_with_path, k_l, k_r, ocr_dist_threshold, hash_similarity_threshold, nlp_models, language=None)
Recursive function that finds slide transitions through a binary tree search.
- Parameters:
input_folder_with_path – full path of the input image folder, where all frames follow FRAME_FORMAT
k_l – leftmost index of the binary search
k_r – rightmost index of the binary search
ocr_dist_threshold – minimum OCR-based distance for two frames to be considered distinct
hash_similarity_threshold – maximum hash-based similarity for two frames to be considered distinct
nlp_models – NLP models for the OCR results
language – language of the document
- Returns:
[transition frame index, distance] if a transition is found, [None, None] otherwise
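The general shape of such a binary search, assuming at most one transition inside the sampled interval and a differs(a, b) predicate that combines the OCR and hash checks above (returning only the index, for brevity):
```python
def find_transition(differs, k_l, k_r):
    # differs(a, b) -> True when frames a and b show different slides
    # (e.g. OCR distance above and hash similarity below their thresholds).
    if not differs(k_l, k_r):
        return None  # no transition anywhere in [k_l, k_r]
    if k_r - k_l <= 1:
        return k_r  # transition happens between adjacent frames k_l and k_r
    mid = (k_l + k_r) // 2
    # The single transition lies in whichever half still shows a difference.
    left = find_transition(differs, k_l, mid)
    return left if left is not None else find_transition(differs, mid, k_r)
```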
- graphai.core.video.video_utils.compute_video_ocr_transitions(input_folder_with_path, frame_sample_indices, ocr_dist_threshold, hash_dist_threshold, nlp_models, language=None, keep_first=True, keep_last=True)
Computes all the slide transitions for slides extracted from a video file.
- Parameters:
input_folder_with_path – path of the slide folder
frame_sample_indices – indices of the sampled frames
ocr_dist_threshold – threshold for the OCR distance (below which slides are considered the same)
hash_dist_threshold – threshold for the perceptual hash similarity (above which slides are considered the same)
nlp_models – NLP models for parsing the OCR results
language – language of the slides
keep_first – whether to return the first frame index as a slide; True by default
keep_last – whether to return the final frame index as a slide; True by default
- Returns:
List of transitory slides
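Putting the pieces together, a hedged end-to-end usage sketch; the paths, the hash-similarity value, and the exact return shapes are illustrative assumptions:
```python
from graphai.core.video.video_utils import (
    NLPModels,
    compute_ocr_noise_level,
    compute_ocr_threshold,
    compute_video_ocr_transitions,
    extract_frames,
    generate_frame_sample_indices,
)

frames_dir = "/data/frames/lecture"  # hypothetical frame folder
if extract_frames("/data/videos/lecture.mp4", frames_dir, "lecture_frames") is not None:
    indices = generate_frame_sample_indices(frames_dir, step=12)
    models = NLPModels()
    # Estimate the noise floor of OCR distances, then derive the threshold.
    noise = compute_ocr_noise_level(frames_dir, indices, models, language="en")
    ocr_thresh = compute_ocr_threshold(noise)
    # 0.95 is an illustrative guess for the hash-similarity threshold.
    transitions = compute_video_ocr_transitions(
        frames_dir, indices, ocr_thresh, 0.95, models, language="en"
    )
    print(transitions)
```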