graphai.core.translation.text_utils module
- graphai.core.translation.text_utils.generate_src_tgt_dict(src, tgt)
Creates a source language and target language dictionary for translation.
- Parameters:
src – Source language
tgt – Target language
- Returns:
dict
- graphai.core.translation.text_utils.generate_translation_text_token(s, src, tgt)
Generates an md5-based token for a string.
- Parameters:
s – The string
src – Source language
tgt – Target language
- Returns:
Token
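The idea of an md5-based token can be illustrated with a minimal sketch. The function name `make_translation_token` is hypothetical, and the way the string and language codes are combined before hashing is an assumption; the actual layout used by `generate_translation_text_token` is not documented here and may differ.

```python
import hashlib


def make_translation_token(s: str, src: str, tgt: str) -> str:
    """Hypothetical stand-in, not the graphai implementation."""
    # Assumption: the language pair is prepended to the text so that the same
    # string translated in a different direction yields a different token.
    payload = f"{src}-{tgt}:{s}".encode("utf-8")
    # md5 is used here as a cache/lookup key, not for security purposes.
    return hashlib.md5(payload).hexdigest()


token = make_translation_token("Bonjour le monde", "fr", "en")
```

Because the digest is deterministic, the same text and language pair always map to the same token, which is what makes such a token usable as a lookup key for previously computed translations.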
- graphai.core.translation.text_utils.compute_slide_tfidf_scores(list_of_sets, min_freq=1)
- graphai.core.translation.text_utils.find_set_cover(list_of_sets, coverage=1.0, scores=None)
- graphai.core.translation.text_utils.find_best_slide_subset(slides_and_concepts, coverage=1.0, priorities=True, min_freq=2)
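The three functions above are listed without descriptions. As background only, the following is a minimal sketch of the classic greedy set cover heuristic with a `coverage` fraction, which is what the name and signature of `find_set_cover` suggest. The function `greedy_set_cover` is hypothetical, it ignores the `scores` argument, and it should not be read as the library's actual behaviour.

```python
def greedy_set_cover(list_of_sets, coverage=1.0):
    """Hypothetical illustration: pick set indices until the requested
    fraction of all elements is covered."""
    universe = set().union(*list_of_sets)
    target = coverage * len(universe)
    covered = set()
    chosen = []
    remaining = set(range(len(list_of_sets)))
    while len(covered) < target and remaining:
        # Choose the set that adds the most not-yet-covered elements;
        # ties are broken in favour of the lowest index.
        best = max(sorted(remaining), key=lambda i: len(list_of_sets[i] - covered))
        if not (list_of_sets[best] - covered):
            break  # nothing new can be added
        covered |= list_of_sets[best]
        chosen.append(best)
        remaining.discard(best)
    return chosen


sets = [{"a", "b", "c"}, {"c", "d"}, {"d"}, {"e"}]
full_cover = greedy_set_cover(sets)
partial_cover = greedy_set_cover(sets, coverage=0.8)
```

In this sketch, lowering `coverage` below 1.0 lets the loop stop early, trading completeness for a smaller selection.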
- class graphai.core.translation.text_utils.TranslationModels
Bases:
object
- load_models()
Loads Hugging Face translation and tokenization models plus a pysbd segmenter.
- Returns:
None
- get_device()
- get_last_usage()
- unload_model(unload_period=10800.0)
- translate(text, how='en-fr', skip_sentence_segmentation=False)
Translates the provided text.
- Parameters:
text – Text to translate
how – Source-target language pair (default 'en-fr')
skip_sentence_segmentation – If True, skips sentence segmentation
- Returns:
Translated text and ‘unpunctuated text too long’ flag