Video Shot Detection
Segmentation of videos into smaller sections, known as video shots.
The shot detection system detects the boundaries between video shots by identifying changes between visual scenes.
- Input: A video file. For the most accurate result, every frame must be assessed.
- Output: A file in which each row contains the start and end frames of one shot in the video.
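The output format above can be handled with a small helper. This is a hypothetical sketch assuming one tab-separated "start end" pair per row; the function names (`write_shots`, `read_shots`) are illustrative, not part of any released tool:

```python
from typing import List, Tuple

def write_shots(path: str, shots: List[Tuple[int, int]]) -> None:
    """Write one '<start>\t<end>' row per detected shot."""
    with open(path, "w") as f:
        for start, end in shots:
            f.write(f"{start}\t{end}\n")

def read_shots(path: str) -> List[Tuple[int, int]]:
    """Parse the output file back into (start_frame, end_frame) pairs."""
    shots = []
    with open(path) as f:
        for line in f:
            start, end = line.split()
            shots.append((int(start), int(end)))
    return shots
```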
Model: The underlying model for shot detection is a deep learning-based model called TransNetV2 [1]. This model has been trained on a combination of real (15%) and synthetic (85%) shot transitions (cuts) created from two datasets, IACC.3 [2] and ClipShots [3].
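Models of this kind typically emit a per-frame probability that the frame is a shot boundary; grouping frames between above-threshold predictions yields the shot list. The sketch below illustrates this post-processing step; the 0.5 threshold and the function name are assumptions for illustration, not values taken from the paper:

```python
from typing import List, Tuple

def predictions_to_shots(probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group frames into (start, end) shots, splitting where prob >= threshold."""
    shots: List[Tuple[int, int]] = []
    start = 0
    in_transition = False
    for i, p in enumerate(probs):
        is_transition = p >= threshold
        if is_transition and not in_transition and i > 0:
            # A transition begins: close the current shot at the previous frame.
            shots.append((start, i - 1))
        if not is_transition and in_transition:
            # The transition has ended: the next shot starts here.
            start = i
        in_transition = is_transition
    if not in_transition:
        # Close the final shot at the last frame.
        shots.append((start, len(probs) - 1))
    return shots
```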
Evaluation: This model achieves an F1 score of 0.898 on the TRECVID 2007 dataset. Annotations are provided by TRECVID and downloaded from their website. The ground truth annotations appear to differ by about 2 frames from the actual cuts; as a result, a tolerance of 2 frames is applied during evaluation.
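The tolerance-based scoring described above can be sketched as follows. A predicted cut counts as a true positive if it lies within `tolerance` frames of a not-yet-matched ground-truth cut; the greedy matching strategy here is an assumption for illustration, not the official TRECVID scorer:

```python
from typing import List

def f1_with_tolerance(predicted: List[int], ground_truth: List[int], tolerance: int = 2) -> float:
    """F1 score of predicted cut frames against ground truth, within +/- tolerance frames."""
    unmatched = sorted(ground_truth)
    tp = 0
    for cut in sorted(predicted):
        for gt in unmatched:
            if abs(cut - gt) <= tolerance:
                # Match each ground-truth cut at most once.
                unmatched.remove(gt)
                tp += 1
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```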
References:
[1] Souček, T., Lokoč, J. (2020). TransNet V2: An effective deep network architecture for fast shot transition detection. arXiv preprint arXiv:2008.04838.
[2] Awad, G., Butt, A., Fiscus, J., Joy, D., Delgado, A., McClinton, W., Michel, M., Smeaton, A., Graham, Y., Kraaij, W., Quénot, G., Eskevich, M., Ordelman, R., Jones, G., & Huet, B. (2018). TRECVID 2017: Evaluating ad-hoc and instance video search, events detection, video captioning and hyperlinking. In Proceedings of TRECVID 2017. NIST, USA.
[3] Tang, S., Feng, L., Kuang, Z., Chen, Y., & Zhang, W. (2019). Fast video shot transition localization with deep structured models. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. Springer.