Fraunhofer Table Extraction
Fraunhofer Table Extraction is software for extracting table information from documents. It uses a multi-step approach with various heuristics for (1) detecting tables in documents, (2) segmenting tables, i.e., recognizing their row and column structure, and (3) interpreting tables, i.e., semantically understanding their contents. The software is deployed in a Docker container with a small REST API: a document is sent to the API, processed, and the results are returned to the user. The results contain the contents of each table in several file formats (JSON, XML, CSV), as well as additional information and page images.
The Docker container is based on Ubuntu 18.04 (x86-64). It contains all required resources (binaries, configuration files, scripts, and a Python REST server) to start the application and send documents to it. Processing is synchronous: the results are returned in the response to the request that submitted the document.
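As a rough illustration of this synchronous request/response cycle, the following Python sketch sends a PDF using only the standard library. It assumes the endpoint path `/v1/process/<corpusid>/<documentid>`, port 8101, and the `Content-Type: application/pdf` header from the example API call; the function names and the 600-second timeout are hypothetical choices, not part of the software.

```python
import urllib.parse
import urllib.request

# Hypothetical default: matches the host and port used in the example curl call.
BASE_URL = "http://localhost:8101"


def build_process_request(pdf_bytes: bytes, corpus_id: str, document_id: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build, but do not send, a POST request equivalent to the curl example."""
    # Percent-encode the identifiers so each one stays a single path segment.
    corpus = urllib.parse.quote(corpus_id, safe="")
    document = urllib.parse.quote(document_id, safe="")
    url = f"{base_url}/v1/process/{corpus}/{document}"
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def process_document(pdf_path: str, corpus_id: str, document_id: str,
                     out_path: str = "result.zip", timeout: float = 600.0) -> None:
    """Send a PDF and save the synchronous response body to out_path."""
    with open(pdf_path, "rb") as f:
        request = build_process_request(f.read(), corpus_id, document_id)
    # urlopen blocks until the server has finished processing and responded.
    # The timeout is an assumption; large documents may need more time.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(out_path, "wb") as out:
            out.write(response.read())
```

With the container running, `process_document("sample.pdf", "somecorpusid", "somedocumentid")` should behave like the curl example and write `result.zip`. Splitting request construction from sending keeps the URL and header logic testable without a running server.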
Start-up:
cd docker
docker-compose up
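Example API call (somecorpusid and somedocumentid are placeholder identifiers; the results are written to result.zip):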
curl -X POST -H "Content-Type: application/pdf" http://localhost:8101/v1/process/somecorpusid/somedocumentid --data-binary "@sample.pdf" -o result.zip
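The response body is saved as result.zip. Because the layout inside the archive is not described here, the sketch below makes no assumption about directory or file names: it groups entries by file extension (for example json, xml, csv, or an image format) and parses any CSV files with Python's `csv` module, assuming UTF-8 encoding. The function names are hypothetical.

```python
import csv
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_results(zip_file) -> dict:
    """Group archive entries by lowercase file extension, e.g. 'json', 'xml', 'csv'."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            ext = PurePosixPath(name).suffix.lower().lstrip(".")
            groups[ext or "(none)"].append(name)
    return dict(groups)


def read_csv_tables(zip_file) -> dict:
    """Parse every CSV entry into a list of rows (assumes UTF-8 encoded CSV files)."""
    tables = {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                text = archive.read(name).decode("utf-8")
                tables[name] = list(csv.reader(io.StringIO(text)))
    return tables
```

For example, `group_results("result.zip")` gives an overview of which formats were returned, and `read_csv_tables("result.zip")` maps each CSV entry to its rows. Both functions accept a path or a binary file-like object.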