Fraunhofer Table Extraction

Fraunhofer Table Extraction: software for extracting table information from documents

Docker container

Developed by

Fraunhofer-Gesellschaft

License

Other

Intellectual property of Fraunhofer IAIS (closed source)

Main Characteristic

Fraunhofer Table Extraction is software for extracting table information from documents. It applies a multi-step approach with various heuristics for (1) detecting tables in documents, (2) segmenting tables, i.e. recognizing their row and column structure, and (3) interpreting tables, i.e. semantically understanding their contents. The software is deployed in a Docker container with a small REST API: a document is sent to the API, processed, and the results are returned to the user. The results contain the contents of each table in several file formats (JSON, XML, CSV) as well as additional information and page images.
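The internal layout of the result archive is not documented here. The following minimal Python sketch assumes that the response is a ZIP archive (as the `-o result.zip` in the example API call below suggests) and that the table formats can be recognized by file extension alone; it groups the archive entries by format. The function name `group_result_files` is hypothetical, and page images and any other files land in an `other` group because their file type is not specified.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List

# Table formats named in the description; everything else (e.g. page images,
# whose file type is not specified) is grouped under "other" (assumption).
KNOWN_FORMATS = {".json": "json", ".xml": "xml", ".csv": "csv"}


def group_result_files(archive_bytes: bytes) -> Dict[str, List[str]]:
    """Group the file entries of a result archive by format.

    Assumes the archive is a ZIP file; since its directory layout is unknown,
    only the (case-insensitive) file extension of each entry is used.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep only files
            suffix = PurePosixPath(info.filename).suffix.lower()
            groups[KNOWN_FORMATS.get(suffix, "other")].append(info.filename)
    return dict(groups)
```

For example, `group_result_files(open("result.zip", "rb").read())` returns a dictionary mapping `json`, `xml`, `csv`, and `other` to the matching entry names, which makes it easy to check which formats a given response actually contains.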

Technical Categories

Machine learning

Keywords

Last updated

11.06.2021 - 20:39

Detailed Description

The Docker container is based on Ubuntu 18.04 (x86-64). It contains all required resources (binaries, configuration files, scripts, and a Python REST server) to start the application and send documents to it. Each document is then processed and the results are returned synchronously, i.e. in the response to the request.

Start-up:
cd docker
docker-compose up
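Because `docker-compose up` runs in the foreground and the server may need a moment to start, a client script may want to wait until the API port accepts connections before sending a document. The following sketch assumes the port 8101 used in the example API call below; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the service has finished initializing.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8101,
                  timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as a connection can be opened and False once
    `timeout` seconds have passed without success. Port 8101 is taken from
    the example API call; the default timeouts are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means a listener exists.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```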

Example API call:
curl -X POST -H "Content-Type: application/pdf" http://localhost:8101/v1/process/somecorpusid/somedocumentid --data-binary "@sample.pdf" -o result.zip
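The same request can be sent from Python using only the standard library. This sketch mirrors the curl flags one by one; `build_process_request`, `process_pdf`, and the 600-second timeout are assumptions rather than part of the documented API, and the corpus and document IDs remain placeholders as in the curl example.

```python
import urllib.request
from pathlib import Path
from urllib.parse import quote

# Host, port and path prefix as in the curl example.
BASE_URL = "http://localhost:8101/v1/process"


def build_process_request(pdf_bytes: bytes, corpus_id: str, document_id: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example (PDF as raw body)."""
    # Quote the IDs so that characters such as "/" cannot alter the path.
    url = f"{base_url}/{quote(corpus_id, safe='')}/{quote(document_id, safe='')}"
    return urllib.request.Request(
        url,
        data=pdf_bytes,                               # like --data-binary "@sample.pdf"
        headers={"Content-Type": "application/pdf"},  # like -H "Content-Type: ..."
        method="POST",                                # like -X POST
    )


def process_pdf(pdf_path: str, corpus_id: str, document_id: str,
                out_path: str = "result.zip", timeout: float = 600.0) -> Path:
    """Send a PDF and save the synchronous response body (like -o result.zip)."""
    request = build_process_request(Path(pdf_path).read_bytes(), corpus_id, document_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        out = Path(out_path)
        out.write_bytes(response.read())
    return out
```

Usage: `process_pdf("sample.pdf", "somecorpusid", "somedocumentid")`. Unlike curl without `-f`, `urlopen` raises `urllib.error.HTTPError` for error status codes instead of writing the error body to `result.zip`.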

Trustworthy AI

The Fraunhofer Table Extraction software is (1) lawful, as it respects all applicable laws and regulations (e.g. the licenses of the open-source components it uses) and, in particular, is GDPR-compliant; (2) ethical, as it pursues the goal of making information from documents easily accessible in digital form to the documents' owners; and (3) technically robust, in particular because it is deployed in a ready-to-use Docker container that makes processing documents as simple as possible.

GDPR Requirements

The Fraunhofer Table Extraction software allows the user to extract textual content from document images. The software itself is GDPR-compliant: documents are processed within the Docker container and all data remains on the user's local computer. However, users must ensure that they are authorized to store and process a document, for example if it contains personal data or other sensitive, GDPR-relevant information.

Related Projects

AI4Media