Fraunhofer OCR Engine (recognaize-ocr)
Fraunhofer OCR software that performs layout analysis and extracts textual content from documents
The Fraunhofer OCR Engine extracts the geometrical structure and the textual content from scanned and born-digital documents. It also implements several pre-processing algorithms, such as document skew angle correction and image binarization. OCR is performed using deep learning algorithms.
Additional information: The software is deployed in a Docker container with a small REST API. A document can be sent to the API, where it is processed, and the results are returned to the user. The results contain the logical document layout, the recognized characters, font information, etc., as well as binarization results and page images.
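As a rough illustration of the workflow described above, the following Python sketch uploads a document to a locally running container using only the standard library. The endpoint URL (http://localhost:8080/ocr), the multipart field name "file", and the helper names build_ocr_request and send_document are assumptions made for this example; they are not taken from the engine's documentation, so check the actual API description of the deployed container before use. The response is returned as raw bytes because the result format is not specified here.

```python
import os
import uuid
import urllib.request


def build_ocr_request(url, filename, data, content_type="application/pdf"):
    """Build a multipart/form-data POST request carrying one document.

    The form field name "file" is an assumption; the real API may differ.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n').encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type",
                       f"multipart/form-data; boundary={boundary}")
    return request


def send_document(url, path):
    """Send the file at `path` to the (hypothetical) OCR endpoint.

    Returns the raw response body as bytes, without assuming its format.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    request = build_ocr_request(url, os.path.basename(path), data)
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the container running, a call such as send_document("http://localhost:8080/ocr", "scan.pdf") would upload the file and return whatever the service sends back; the URL and file name are placeholders.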
References:
- Namysl, M., & Konya, I. (2019). Efficient, lexicon-free OCR using deep learning. arXiv preprint arXiv:1906.01969. https://arxiv.org/abs/1906.01969
Related pipeline and models in AI4EU Experiments Marketplace