

Owing to progress in the underlying NLP technologies (speech-to-text, text normalization and compression, machine translation), automatic captioning technologies (ACTs), both intra- and inter-lingual, are rapidly improving. ACTs are useful for many types of content and contexts: from talks and lectures to news, fiction, and other entertainment.
While historical systems relied on complex NLP pipelines, recent proposals are based on integrated (end-to-end) systems, which call into question standard evaluation schemes in which each module is assessed independently of the others.
We focused on evaluating the quality of the output segmentation, where decisions regarding the length, disposition, and display duration of each caption need to be made, all of which directly affect readability and user acceptability. We notably studied ways to perform reference-free evaluations of automatic caption segmentation. We also correlated these "technology-oriented" metrics with user-oriented evaluations in typical use cases: post-editing and direct broadcasting.
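To illustrate what a reference-free evaluation of caption segmentation can look like, here is a minimal Python sketch that checks a caption against common subtitling constraints (line length, line count, reading speed). The `Caption` class and the numeric thresholds are hypothetical, loosely based on widely cited subtitling guidelines; the actual metrics studied in this micro-project are not specified here, and real limits vary by broadcaster and language.

```python
from dataclasses import dataclass

@dataclass
class Caption:
    lines: list[str]  # displayed text lines of one caption block
    start: float      # display start time, in seconds
    end: float        # display end time, in seconds

# Hypothetical thresholds, roughly following common subtitling guidelines.
MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MAX_CHARS_PER_SECOND = 21.0

def reference_free_checks(caption: Caption) -> dict[str, bool]:
    """Check one caption against length, layout, and reading-speed constraints,
    without comparing it to any reference segmentation."""
    duration = max(caption.end - caption.start, 1e-6)  # guard against zero-length captions
    n_chars = sum(len(line) for line in caption.lines)
    return {
        "line_length_ok": all(len(l) <= MAX_CHARS_PER_LINE for l in caption.lines),
        "line_count_ok": len(caption.lines) <= MAX_LINES,
        "reading_speed_ok": n_chars / duration <= MAX_CHARS_PER_SECOND,
    }

if __name__ == "__main__":
    cap = Caption(
        lines=["Automatic captioning is improving", "thanks to end-to-end models."],
        start=0.0,
        end=3.0,
    )
    print(reference_free_checks(cap))
```

Such per-caption checks can be aggregated over a whole output (e.g., as the fraction of captions satisfying all constraints) to yield a corpus-level, reference-free score that can then be correlated with user-oriented judgments.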
Output
This Humane-AI-Net micro-project was carried out by Centre national de la recherche scientifique (CNRS, Francois Yvon) and Fondazione Bruno Kessler (FBK, Marco Turchi).