Fraunhofer OCR Engine (recognaize-ocr)

Fraunhofer OCR software that performs layout analysis and extracts textual content from documents

Docker container

Fraunhofer IAIS: Document Analytics

Container in AI4EU Experiments

Developed by

Fraunhofer-Gesellschaft

License

Other

Intellectual property of Fraunhofer IAIS (closed source)

Main Characteristic

The Fraunhofer OCR Engine extracts the geometrical structure and the textual content from scanned and born-digital documents. It also implements various pre-processing algorithms, such as document skew angle correction and image binarization. The OCR itself is performed using deep learning algorithms.

Research areas

Collaborative AI

Technical Categories

Computer vision, Machine learning, Natural language processing

Keywords

Last updated

21.04.2023 - 14:50

Detailed Description

Additional information: The software is deployed in a Docker container that exposes a small REST API. A document is sent to the API, processed, and the results are returned to the user. The results include the logical document layout, the recognized characters, and font information, among other details, as well as binarization results and page images.
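The request/response flow described above could look roughly like the following Python sketch. It is illustrative only: the URL, port, route, content type, and the assumption that results come back as JSON are all hypothetical and are not taken from the container's actual API, so they need to be replaced with the values from the container's own documentation.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical endpoint: the real host, port, and route are not documented here.
OCR_URL = "http://localhost:8080/ocr"


def build_ocr_request(document_bytes, content_type="application/pdf", url=OCR_URL):
    """Build a POST request carrying the raw document bytes as the body."""
    return urllib.request.Request(
        url,
        data=document_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


def send_document(path, url=OCR_URL, timeout=120.0):
    """Send a local file to the container and return the decoded response.

    Assumes (unverified) that the API answers with a JSON body.
    """
    request = build_ocr_request(Path(path).read_bytes(), url=url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running locally, a call such as `send_document("scan.pdf")` would then return the parsed results, provided the assumptions above match the real interface.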

References:

Namysl, M., & Konya, I. (2019). Efficient, lexicon-free OCR using deep learning. arXiv preprint arXiv:1906.01969. https://arxiv.org/abs/1906.01969

Related pipeline and models in AI4EU Experiments Marketplace

Trustworthy AI

The Fraunhofer OCR engine is (1) lawful: it respects all applicable laws and regulations (e.g. the software licenses of the open source components it uses) and, in particular, is GDPR-compliant; (2) ethical: it pursues the goal of making information from documents easily accessible in digital form to the documents' owner; and (3) technically robust: it is deployed in a "ready-to-use" Docker container to make processing documents as simple as possible.

GDPR Requirements

The Fraunhofer OCR software allows the user to extract textual content from document images. The software itself is GDPR-compliant: documents are processed within a Docker container and all data remains on the user's local computer. However, the user must ensure that they have the authority to store and process the document, for example if it contains personal data or other sensitive, GDPR-relevant information.

Related Projects

AI4Media