Model: The underlying model for shot detection is a deep learning-based model called TransNetV2 (Souček & Lokoč, 2020). It was trained on a combination of real (15%) and synthetic (85%) shot transitions (cuts) created from two datasets, IACC.3 (Awad et al., 2018) and ClipShots (Tang et al., 2019).
Evaluation: The model achieves an F1 score of 0.898 on the TRECVID 2007 dataset. Ground-truth annotations are provided by TRECVID and were downloaded from their website. These annotations appear to differ from the actual cuts by about 2 frames, so a tolerance of 2 frames is applied during evaluation.
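A tolerance-based evaluation can be sketched as follows: each predicted cut is matched one-to-one to an unmatched ground-truth cut within the frame tolerance, and precision, recall, and F1 are computed from the match count. This is an illustrative sketch, not the exact TRECVID scoring script; the function names and greedy matching strategy are assumptions.

```python
def count_matched_cuts(pred, gt, tolerance=2):
    """Greedily match predicted cut frames to ground-truth cut frames.

    A prediction counts as a true positive if an unmatched ground-truth
    cut lies within `tolerance` frames of it. Each ground-truth cut can
    be matched at most once.
    """
    gt_sorted = sorted(gt)
    used = [False] * len(gt_sorted)
    true_positives = 0
    for p in sorted(pred):
        for i, g in enumerate(gt_sorted):
            if not used[i] and abs(p - g) <= tolerance:
                used[i] = True
                true_positives += 1
                break
    return true_positives


def f1_score(pred, gt, tolerance=2):
    """F1 over predicted vs. ground-truth cut frame indices."""
    tp = count_matched_cuts(pred, gt, tolerance)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With a 2-frame tolerance, a prediction at frame 102 against a ground-truth cut at frame 100 counts as correct; with zero tolerance it would be scored as both a false positive and a false negative.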
Souček, T., & Lokoč, J. (2020). TransNet V2: An effective deep network architecture for fast shot transition detection. arXiv preprint arXiv:2008.04838.
Awad, G., Butt, A., Fiscus, J., Joy, D., Delgado, A., McClinton, W., Michel, M., Smeaton, A., Graham, Y., Kraaij, W., Quénot, G., Eskevich, M., Ordelman, R., Jones, G., & Huet, B. (2018). TRECVID 2017: Evaluating ad-hoc and instance video search, events detection, video captioning and hyperlinking. In Proceedings of TRECVID 2017. NIST, USA.
Tang, S., Feng, L., Kuang, Z., Chen, Y., & Zhang, W. (2019). Fast video shot transition localization with deep structured models. In: Jawahar, C., Li, H., Mori, G., & Schindler, K. (Eds.), Computer Vision – ACCV 2018. Springer.