

Owing to progress in the underlying NLP technologies (speech-to-text, text normalization and compression, machine translation), automatic captioning technologies (ACTs), both intra- and inter-lingual, are rapidly improving. ACTs are useful for many types of content and contexts: from talks and lectures to news, fiction, and other entertainment.
While historical systems relied on complex NLP pipelines, recent proposals are based on integrated (end-to-end) systems, which call into question standard evaluation schemes in which each module is assessed independently of the others.
We focused on evaluating the quality of the output segmentation, where decisions regarding the length, disposition, and display duration of each caption need to be made, all of which directly affect readability and user acceptability. We notably studied ways to perform reference-free evaluations of automatic caption segmentation. We also correlated these "technology-oriented" metrics with user-oriented evaluations in typical use cases: post-editing and direct broadcasting.
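To illustrate what a reference-free evaluation of caption segmentation can look like, here is a minimal Python sketch that checks a caption against common subtitling constraints (line length, line count, reading speed). The `Caption` class and the numeric thresholds are hypothetical, loosely based on widely cited subtitling guidelines; the actual metrics studied in this micro-project are not specified here, and real limits vary by broadcaster and language.

```python
from dataclasses import dataclass

@dataclass
class Caption:
    lines: list[str]  # displayed text lines of one caption block
    start: float      # display start time, in seconds
    end: float        # display end time, in seconds

# Hypothetical thresholds, roughly following common subtitling guidelines.
MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MAX_CHARS_PER_SECOND = 21.0

def reference_free_checks(caption: Caption) -> dict[str, bool]:
    """Check one caption against length, layout, and reading-speed constraints,
    without comparing it to any reference segmentation."""
    duration = max(caption.end - caption.start, 1e-6)  # guard against zero-length captions
    n_chars = sum(len(line) for line in caption.lines)
    return {
        "line_length_ok": all(len(l) <= MAX_CHARS_PER_LINE for l in caption.lines),
        "line_count_ok": len(caption.lines) <= MAX_LINES,
        "reading_speed_ok": n_chars / duration <= MAX_CHARS_PER_SECOND,
    }

if __name__ == "__main__":
    cap = Caption(
        lines=["Automatic captioning is improving", "thanks to end-to-end models."],
        start=0.0,
        end=3.0,
    )
    print(reference_free_checks(cap))
```

Such per-caption checks can be aggregated over a whole output (e.g., as the fraction of captions satisfying all constraints) to yield a corpus-level, reference-free score that can then be correlated with user-oriented judgments.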
Output
This Humane-AI-Net micro-project was carried out by Centre national de la recherche scientifique (CNRS, Francois Yvon) and Fondazione Bruno Kessler (FBK, Marco Turchi).