CORe50
A new Dataset and Benchmark for Continual Learning and Object Recognition, Detection, Segmentation.
Continual/Lifelong learning (CL) of high-dimensional data streams is a challenging research problem far from being solved. CORe50 is a new dataset and benchmark specifically designed for assessing Continual Learning techniques in an Object Recognition context. This is provided along with a few baseline approaches for three different continual learning scenarios.
Number of elements: 50
Overall size of the datatsets: 25
Additiional information: One of the greatest goals of AI is building an artificial continual learning agent which can construct a sophisticated understanding of the external world from its own experience through the adaptive, goal-oriented and incremental development of ever more complex skills and knowledge. Yet, Continual/Lifelong learning (CL) of high-dimensional data streams is a challenging research problem far from being solved. In fact, fully retraining models each time new data becomes available is infeasible, due to computational and storage issues, while naïve continual learning strategies have been shown to suffer from catastrophic forgetting. Moreover, even in the context of real-world object recognition applications (e.g. robotics), where continual learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques.
In this asset we provide a new dataset and benchmark, specifically designed for assessing Continual Learning techniques in an Object Recognition context, along with a few baseline approaches for three different continual learning scenarios. Futhermore, we recently extended CORe50 to support object detection and segmentation.
CORe50, specifically designed for (C)ontinual (O)bject (Re)cognition, is a collection of 50 domestic objects belonging to 10 categories: plug adapters, mobile phones, scissors, light bulbs, cans, glasses, balls, markers, cups and remote controls. Classification can be performed at object level (50 classes) or at category level (10 classes). The first task (the default one) is much more challenging because objects of the same category are very difficult to be distinguished under certain poses. The dataset has been collected in 11 distinct sessions (8 indoor and 3 outdoor) characterized by different backgrounds and lighting. For each session and for each object, a 15 seconds video (at 20 fps) has been recorded with a Kinect 2.0 sensor delivering 300 RGB-D frames.