ObjectGraphs
Using Objects and a Graph Convolutional Network for the Bottom-up Recognition and Explanation of Events in Video

ObjectGraphs is a novel bottom-up video event recognition and explanation approach. It exploits a rich frame representation and the relations between the objects within each frame. Specifically, an object detector (OD) is first applied to the video frames; then, a graph is built to model the relations between the detected objects of each frame, and a graph convolutional network (GCN) performs reasoning on these graphs. The resulting object-based frame-level features are forwarded to a long short-term memory (LSTM) network for video event recognition. Moreover, the weighted in-degrees (WiDs) derived from each frame's graph adjacency matrix identify the objects that contributed the most (or least) to the event recognition decision, thus providing an explanation for it.
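
The sketch below illustrates this pipeline in PyTorch. It is a minimal, illustrative rendering of the description above, not the exact architecture from the paper: the similarity-based graph construction, the layer sizes, and all names (`FrameGCN`, `ObjectGraphsModel`, `feat_dim`, etc.) are assumptions made for the example.

```python
# Minimal sketch of the ObjectGraphs pipeline: object features per frame ->
# graph + GCN reasoning -> pooled frame features -> LSTM -> event logits,
# with weighted in-degrees (WiDs) as a per-object explanation signal.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameGCN(nn.Module):
    """One graph-convolution step over the objects of a single frame."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)

    def forward(self, obj_feats: torch.Tensor):
        # obj_feats: (num_objects, feat_dim) from an object detector.
        # Assumed graph construction: fully connected graph whose edge
        # weights are softmax-normalised pairwise feature similarities.
        sim = obj_feats @ obj_feats.t()         # (N, N) similarities
        adj = F.softmax(sim, dim=-1)            # row-normalised adjacency
        # Message passing: aggregate neighbour features, then project.
        h = F.relu(self.proj(adj @ obj_feats))  # (N, hidden_dim)
        # Weighted in-degree of each object: column sums of the adjacency.
        # A higher WiD means the object receives more weight from the others.
        wids = adj.sum(dim=0)                   # (N,)
        return h, wids


class ObjectGraphsModel(nn.Module):
    """GCN-derived frame features fed to an LSTM for event recognition."""

    def __init__(self, feat_dim: int, hidden_dim: int, num_events: int):
        super().__init__()
        self.gcn = FrameGCN(feat_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_events)

    def forward(self, video_obj_feats: torch.Tensor):
        # video_obj_feats: (num_frames, num_objects, feat_dim)
        frame_feats, frame_wids = [], []
        for obj_feats in video_obj_feats:
            h, wids = self.gcn(obj_feats)
            frame_feats.append(h.mean(dim=0))   # pool objects -> frame feature
            frame_wids.append(wids)
        seq = torch.stack(frame_feats).unsqueeze(0)  # (1, num_frames, hidden_dim)
        _, (h_n, _) = self.lstm(seq)
        logits = self.classifier(h_n[-1].squeeze(0))  # (num_events,)
        return logits, torch.stack(frame_wids)        # WiDs per frame and object


# Usage with random features standing in for real detector output:
model = ObjectGraphsModel(feat_dim=256, hidden_dim=128, num_events=10)
logits, wids = model(torch.randn(30, 20, 256))  # 30 frames, 20 objects each
print(logits.shape, wids.shape)                 # torch.Size([10]) torch.Size([30, 20])
```

Ranking the objects of a frame by their WiDs then yields the most (or least) salient objects for the recognized event, which is the explanation mechanism described above.
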
Video Event Detection with GCNs
This repository hosts the code and data for our paper: N. Gkalelis, A. Goulas, D. Galanopoulos, V. Mezaris, "ObjectGraphs: Using Objects and a Graph Convolutional Network for the Bottom-up Recognition and Explanation of Events in Video", Proc. 2nd Int. Workshop on Large Scale Holistic Video Understanding (HVU) at the IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), June 2021.
Please refer to the repository on GitHub for detailed information about:
- Code requirements
- Video preprocessing
- Training
- Evaluation
- Usage
- Provided features
- License and Citation
- References
Acknowledgements
This work was supported by the EU Horizon 2020 programme under grant agreements 832921 (MIRROR) and 951911 (AI4Media).