graphai.core.image.ocr module

graphai.core.image.ocr.is_valid_latex(text)
graphai.core.image.ocr.cleanup_json(text)
class graphai.core.image.ocr.ImgToBase64Converter(image_path)

Bases: object

get_base64()
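
The converter above is listed without a description. As a rough sketch of what a path-to-Base64 helper of this kind usually does, and assuming it simply reads the file and encodes its bytes, the function below illustrates the idea. The name file_to_base64 is hypothetical and is not part of graphai.

```python
import base64


def file_to_base64(image_path):
    """Read a file in binary mode and return its contents as a Base64 string.

    Hypothetical helper for illustration; not the graphai implementation.
    """
    with open(image_path, "rb") as image_file:
        raw_bytes = image_file.read()
    # b64encode returns bytes; decode to get a plain str that can go into JSON.
    return base64.b64encode(raw_bytes).decode("utf-8")
```

A string in this form can be embedded in a JSON request body, which is a common way to pass images to hosted OCR or vision APIs.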
class graphai.core.image.ocr.AbstractOCRModel(api_key, model_class, model_name)

Bases: ABC

establish_connection()

Lazily connects to the OCR API.

Returns:

True if a connection already exists or if a new connection is successfully established, False otherwise
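
The lazy-connection contract described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the graphai source: the class names LazyOCRModel and EchoOCRModel are hypothetical, and model_class is assumed to be a callable that builds a client from the API key.

```python
from abc import ABC, abstractmethod


class LazyOCRModel(ABC):
    """Hypothetical stand-in for AbstractOCRModel; not the library's code."""

    def __init__(self, api_key, model_class, model_name):
        self.api_key = api_key
        self.model_class = model_class  # assumed: callable that builds a client
        self.model_name = model_name
        self.model = None  # no client is created until first use

    def establish_connection(self):
        # Reuse the existing client if one was already created.
        if self.model is not None:
            return True
        try:
            self.model = self.model_class(self.api_key)
            return True
        except Exception as exc:
            print(f"Failed to connect to {self.model_name}: {exc}")
            self.model = None
            return False

    @abstractmethod
    def perform_ocr(self, input_filename_with_path):
        ...


class EchoOCRModel(LazyOCRModel):
    """Toy subclass whose 'client' is just a dict, for demonstration only."""

    def __init__(self, api_key):
        super().__init__(api_key, lambda key: {"key": key}, "echo")

    def perform_ocr(self, input_filename_with_path):
        # Connect on demand; give up cleanly if the connection fails.
        if not self.establish_connection():
            return None
        return f"text from {input_filename_with_path}"
```

With this shape, constructing a model is cheap and never touches the network; the first perform_ocr call pays the connection cost, and later calls reuse the same client.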

abstract perform_ocr(input_filename_with_path)
class graphai.core.image.ocr.GoogleOCRModel(api_key)

Bases: AbstractOCRModel

perform_ocr(input_filename_with_path)

Performs OCR with two methods (text_detection and document_text_detection).

Parameters:

input_filename_with_path: Full path of the input image file

Returns:

Text results of the two OCR methods

wait_for_ocr_results(image_object, method='dtd', retries=6)

Makes a call to the Google OCR API and waits for the results.

Parameters:

image_object: Image object for the Google API
method: Method to use, 'td' for text detection and 'dtd' for document text detection
retries: Number of retries to perform in case of failure

Returns:

OCR results
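
The method switch and retry behaviour described for wait_for_ocr_results can be sketched as below. This is an illustration only: select_method and call_with_retries are hypothetical helpers, retries is assumed to mean the total number of attempts, and returning None after the last failure is also an assumption. The client is any object exposing text_detection and document_text_detection.

```python
import time


def select_method(client, method="dtd"):
    """Map the short method code to the matching client call."""
    if method == "td":
        return client.text_detection
    if method == "dtd":
        return client.document_text_detection
    raise ValueError(f"Unknown OCR method: {method!r}")


def call_with_retries(call, retries=6, delay_seconds=0.0):
    """Run call() until it succeeds or `retries` attempts are used up.

    Returns the first successful result, or None if every attempt fails.
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception as exc:
            print(f"Attempt {attempt + 1}/{retries} failed: {exc}")
            time.sleep(delay_seconds)
    return None
```

A caller would combine the two, for example call_with_retries(lambda: select_method(client, 'td')(image_object)), so that transient API errors do not abort the whole OCR run.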

class graphai.core.image.ocr.OpenAIOCRModel(api_key)

Bases: AbstractOCRModel

perform_ocr(input_filename_with_path, validate_latex=True, model_type=None)
class graphai.core.image.ocr.GeminiOCRModel(api_key)

Bases: AbstractOCRModel

perform_ocr(input_filename_with_path, model_type=None)
graphai.core.image.ocr.get_ocr_colnames(method)
graphai.core.image.ocr.perform_tesseract_ocr_on_pdf(pdf_path, language=None, in_pages=True)

Performs OCR on a PDF file using Tesseract.

Parameters:

pdf_path: Path to the PDF file
language: Language of the PDF file
in_pages: Whether to return the results as separate pages (in a JSON string) or as a single string

Returns:

String containing the entire PDF file’s extracted contents
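
The effect of in_pages on the return value can be illustrated with a small sketch. The helper combine_page_texts is hypothetical, and both the JSON layout (a list of page/content objects) and the newline separator are assumptions; the actual format produced by graphai may differ.

```python
import json


def combine_page_texts(page_texts, in_pages=True):
    """Return per-page OCR text either as a JSON string or as one string.

    Hypothetical helper; the JSON layout here is an assumed example.
    """
    if in_pages:
        pages = [
            {"page": number, "content": text}
            for number, text in enumerate(page_texts, start=1)
        ]
        return json.dumps(pages)
    # Single-string mode: join the pages in reading order.
    return "\n".join(page_texts)
```

Either way the caller receives a str; with in_pages=True it must be parsed with json.loads to recover the page boundaries.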