graphai.core.translation.text_utils module

graphai.core.translation.text_utils.generate_src_tgt_dict(src, tgt)

Creates a source-language and target-language dictionary for translation.

Parameters:
    src: Source language
    tgt: Target language

Returns: dict

graphai.core.translation.text_utils.generate_translation_text_token(s, src, tgt)

Generates an MD5-based token for a string.

Parameters:
    s: The string
    src: Source language
    tgt: Target language

Returns: Token
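
As a rough illustration of what an MD5-based token for a string and language pair can look like, here is a minimal sketch. The function name `make_translation_token` and the way `src` and `tgt` are mixed into the hashed payload are assumptions made for this example; the documentation above does not state how graphai combines them.

```python
import hashlib


def make_translation_token(s: str, src: str, tgt: str) -> str:
    """Hypothetical helper: derive a stable hex token from text and language pair."""
    # Assumption: the language pair is prefixed to the text so that the same
    # string translated in two different directions gets two different tokens.
    payload = f"{src}_{tgt}_{s}".encode("utf-8")
    # hexdigest() returns the 128-bit MD5 digest as 32 hexadecimal characters.
    return hashlib.md5(payload).hexdigest()


token_en_fr = make_translation_token("Hello world", "en", "fr")
token_fr_en = make_translation_token("Hello world", "fr", "en")
```

A token of this kind is deterministic, so it can serve as a cache key for previously translated text. MD5 is suitable for that purpose but not for anything security-sensitive.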

graphai.core.translation.text_utils.compute_slide_tfidf_scores(list_of_sets, min_freq=1)

graphai.core.translation.text_utils.find_set_cover(list_of_sets, coverage=1.0, scores=None)
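
This function is undocumented in the source, so the following is only a sketch of the classic greedy heuristic for (partial) set cover, which the name and the `coverage` and `scores` parameters suggest. The name `greedy_set_cover`, the interpretation of `coverage` as the fraction of all elements to cover, and the shape of `scores` (a dict mapping each element to a weight) are all assumptions, not the library's confirmed behavior.

```python
def greedy_set_cover(list_of_sets, coverage=1.0, scores=None):
    """Hypothetical sketch: return indices of sets chosen by the greedy heuristic."""
    # The universe is every element that appears in at least one set.
    universe = set().union(*list_of_sets)
    target = coverage * len(universe)
    covered = set()
    chosen = []
    remaining = set(range(len(list_of_sets)))

    def gain(i):
        new = list_of_sets[i] - covered
        # Assumption: `scores` maps element -> weight; unweighted if None.
        weight = len(new) if scores is None else sum(scores.get(e, 0.0) for e in new)
        # The count of new elements breaks ties between equal weights.
        return (weight, len(new))

    while len(covered) < target and remaining:
        # Ties between equal gains go to the lowest index.
        best = max(sorted(remaining), key=gain)
        chosen.append(best)
        covered |= list_of_sets[best]
        remaining.discard(best)
    return chosen


example_sets = [{1, 2, 3}, {3, 4}, {4, 5}, {1}]
full_cover = greedy_set_cover(example_sets)
half_cover = greedy_set_cover(example_sets, coverage=0.5)
```

The greedy heuristic does not guarantee a minimum-size cover, but it is simple and has a well-known logarithmic approximation bound.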

graphai.core.translation.text_utils.find_best_slide_subset(slides_and_concepts, coverage=1.0, priorities=True, min_freq=2)

class graphai.core.translation.text_utils.TranslationModels

Bases: object

load_models(): Loads Huggingface translation and tokenization models plus a pysbd segmenter :returns: None

get_device()

get_last_usage()

unload_model(unload_period=10800.0)

translate(text, how='en-fr', skip_sentence_segmentation=False)

Translates the provided text.

Parameters:
    text: Text to translate
    how: Source-target language pair, e.g. 'en-fr'
    skip_sentence_segmentation: If True, skips sentence segmentation

Returns: Translated text and an 'unpunctuated text too long' flag
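
The control flow implied by these parameters can be sketched as follows. This is not graphai's implementation: `translate_sketch`, the `translate_sentence` callback, the `max_chars` limit, the regex splitter (standing in for the pysbd segmenter), and returning `None` when a segment is too long are all assumptions chosen to keep the example self-contained.

```python
import re


def translate_sketch(text, translate_sentence, how="en-fr",
                     skip_sentence_segmentation=False, max_chars=400):
    """Hypothetical sketch: returns (translated_text, too_long_flag)."""
    # 'en-fr' is read as source 'en', target 'fr'.
    src, tgt = how.split("-")
    if skip_sentence_segmentation:
        sentences = [text]
    else:
        # Naive stand-in for a real sentence segmenter such as pysbd.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: a segment that stays too long after segmentation (typically
    # text lacking punctuation) cannot be translated and raises the flag.
    if any(len(s) > max_chars for s in sentences):
        return None, True
    translated = [translate_sentence(s, src, tgt) for s in sentences]
    return " ".join(translated), False


def fake_translator(sentence, src, tgt):
    # Placeholder for a real model call; it just upper-cases the input.
    return sentence.upper()


ok_result = translate_sketch("Hello world. How are you?", fake_translator)
long_result = translate_sketch("Hello world. How are you?", fake_translator, max_chars=5)
```

Returning the flag alongside the text lets a caller distinguish an untranslatable input from an empty translation without raising an exception.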