Video Shot Detection
Segmentation of videos into smaller sections, known as video shots.
The shot detection system detects the boundaries between video shots by identifying changes between visual scenes.
- Input: A video file. For the most accurate result, every frame must be assessed.
- Output: A file in which each row contains the start and end frames of one shot in the video.
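The output format above can be handled with a small helper. This is a hypothetical sketch assuming one tab-separated "start end" pair per row; the function names (`write_shots`, `read_shots`) are illustrative, not part of any released tool:

```python
from typing import List, Tuple

def write_shots(path: str, shots: List[Tuple[int, int]]) -> None:
    """Write one '<start>\t<end>' row per detected shot."""
    with open(path, "w") as f:
        for start, end in shots:
            f.write(f"{start}\t{end}\n")

def read_shots(path: str) -> List[Tuple[int, int]]:
    """Parse the output file back into (start_frame, end_frame) pairs."""
    shots = []
    with open(path) as f:
        for line in f:
            start, end = line.split()
            shots.append((int(start), int(end)))
    return shots
```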
Model: The underlying model for shot detection is a deep learning-based model called TransNetV2 [1]. This model has been trained on a combination of real (15%) and synthetic (85%) shot transitions (cuts) created from two datasets, IACC.3 [2] and ClipShots [3].
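Models of this kind typically emit a per-frame probability that the frame is a shot boundary; grouping frames between above-threshold predictions yields the shot list. The sketch below illustrates this post-processing step; the 0.5 threshold and the function name are assumptions for illustration, not values taken from the paper:

```python
from typing import List, Tuple

def predictions_to_shots(probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group frames into (start, end) shots, splitting where prob >= threshold."""
    shots: List[Tuple[int, int]] = []
    start = 0
    in_transition = False
    for i, p in enumerate(probs):
        is_transition = p >= threshold
        if is_transition and not in_transition and i > 0:
            # A transition begins: close the current shot at the previous frame.
            shots.append((start, i - 1))
        if not is_transition and in_transition:
            # The transition has ended: the next shot starts here.
            start = i
        in_transition = is_transition
    if not in_transition:
        # Close the final shot at the last frame.
        shots.append((start, len(probs) - 1))
    return shots
```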
Evaluation: This model achieves an F1 score of 0.898 on the TRECVID 2007 dataset. Annotations are provided by TRECVID and downloaded from their website. The ground truth annotations appear to differ by about 2 frames from the actual cuts; as a result, a tolerance of 2 frames is applied during evaluation.
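The tolerance-based scoring described above can be sketched as follows. A predicted cut counts as a true positive if it lies within `tolerance` frames of a not-yet-matched ground-truth cut; the greedy matching strategy here is an assumption for illustration, not the official TRECVID scorer:

```python
from typing import List

def f1_with_tolerance(predicted: List[int], ground_truth: List[int], tolerance: int = 2) -> float:
    """F1 score of predicted cut frames against ground truth, within +/- tolerance frames."""
    unmatched = sorted(ground_truth)
    tp = 0
    for cut in sorted(predicted):
        for gt in unmatched:
            if abs(cut - gt) <= tolerance:
                # Match each ground-truth cut at most once.
                unmatched.remove(gt)
                tp += 1
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```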
References:
[1] Souček, T., Lokoč, J. (2020). TransNet V2: An effective deep network architecture for fast shot transition detection. arXiv preprint arXiv:2008.04838.
[2] Awad, G., Butt, A., Fiscus, J., Joy, D., Delgado, A., McClinton, W., Michel, M., Smeaton, A., Graham, Y., Kraaij, W., Quénot, G., Eskevich, M., Ordelman, R., Jones, G., & Huet, B. (2018). TRECVID 2017: Evaluating ad-hoc and instance video search, events detection, video captioning and hyperlinking. In Proceedings of TRECVID 2017. NIST, USA.
[3] Tang, S., Feng, L., Kuang, Z., Chen, Y., & Zhang, W. (2019). Fast video shot transition localization with deep structured models. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. Springer.