graphai.core.voice.transcribe module

class graphai.core.voice.transcribe.WhisperTranscriptionModel

Bases: object

load_model_whisper()

Lazy-loads a Whisper model into memory.

get_last_usage()
unload_model(unload_period=43200.0)
get_silence_thresholds(strict_silence=False)
detect_audio_segment_lang_whisper(input_filename_with_path)

Detects the language of an audio file using a 30-second sample.

Parameters:
  • input_filename_with_path – Path to input file

Returns:

Highest-scoring language code (e.g. ‘en’)
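
Example usage (a minimal sketch; WhisperTranscriptionModel is assumed to take no constructor arguments, since none are documented):

    from graphai.core.voice.transcribe import WhisperTranscriptionModel

    model = WhisperTranscriptionModel()
    model.load_model_whisper()  # lazy-loads the Whisper model into memory
    # Detect the language of an audio file from a 30-second sample
    lang = model.detect_audio_segment_lang_whisper('/path/to/audio.mp3')
    print(lang)  # highest-scoring language code, e.g. 'en'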

transcribe_audio_whisper(input_filename_with_path, force_lang=None, verbose=False, strict_silence=False)

Transcribes an audio file using Whisper.

Parameters:
  • input_filename_with_path – Path to input file

  • force_lang – Language to explicitly feed the model. None results in automatic detection.

  • verbose – Verbosity of the transcription

  • strict_silence – Whether silence detection is strict or lenient. Affects the logprob and no-speech thresholds.

Returns:

A dictionary with three keys: ‘text’ contains the full transcript, ‘segments’ contains a JSON-like dict of transcribed segments which can be used as subtitles, and ‘language’ contains the language code.

Return type:

dict
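
Example (a sketch continuing from the model above; the per-segment fields are assumed to follow Whisper's usual segment format with ‘start’, ‘end’ and ‘text’):

    result = model.transcribe_audio_whisper('/path/to/audio.mp3', force_lang='en')
    print(result['language'])  # language code, e.g. 'en'
    print(result['text'])      # full transcript
    for segment in result['segments']:
        # each segment can be used as a subtitle cue (field names assumed)
        print(segment['start'], segment['end'], segment['text'])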

graphai.core.voice.transcribe.extract_media_segment(input_filename_with_path, output_filename_with_path, output_token, start, length)

Extracts a segment of a given video or audio file, indicated by the starting time and the length.

Parameters:
  • input_filename_with_path – Full path of input file

  • output_filename_with_path – Full path of output file

  • output_token – Output token

  • start – Starting timestamp

  • length – Length of segment

Returns:

The output token if successful, None otherwise
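
Example (a sketch; start and length are assumed to be expressed in seconds):

    from graphai.core.voice.transcribe import extract_media_segment

    token = extract_media_segment(
        '/path/to/input.mp4',
        '/path/to/segment.mp4',
        'segment_token',  # returned on success
        start=60,
        length=30,
    )
    if token is None:
        print('Segment extraction failed')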

graphai.core.voice.transcribe.detect_language_retrieve_from_db_and_split(input_dict, file_manager, n_divs=5, segment_length=30)
graphai.core.voice.transcribe.detect_language_parallel(tokens_dict, i, model, file_manager)
graphai.core.voice.transcribe.detect_language_callback(token, results_list, force)
graphai.core.voice.transcribe.transcribe_audio_to_text(input_dict, model, file_manager, strict_silence=False)
graphai.core.voice.transcribe.transcribe_callback(token, results, force)
graphai.core.voice.transcribe.cache_lookup_audio_transcript(token)
graphai.core.voice.transcribe.cache_lookup_audio_fingerprint(token)
graphai.core.voice.transcribe.cache_lookup_audio_language(token)
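
The detect_language_* functions suggest a split/map/callback pattern: the audio is divided into n_divs short segments, the language of each segment is detected in parallel, and the callback aggregates the per-segment results. A hypothetical sketch of chaining them directly, assuming the shape of input_dict and that model and file_manager are already available (in practice these functions are typically wired together as pipeline tasks):

    from graphai.core.voice.transcribe import (
        detect_language_retrieve_from_db_and_split,
        detect_language_parallel,
        detect_language_callback,
    )

    # model and file_manager are assumed to exist in the surrounding context
    tokens_dict = detect_language_retrieve_from_db_and_split(
        {'token': 'my_audio_token'},  # hypothetical input_dict shape
        file_manager, n_divs=5, segment_length=30,
    )
    results = [
        detect_language_parallel(tokens_dict, i, model, file_manager)
        for i in range(5)  # one detection per split segment
    ]
    language = detect_language_callback('my_audio_token', results, force=False)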