Abstract
We describe a new image dataset, the Egocentric, Manual, Multi-Image (EMMI) dataset, collected to enable the study of how appearance-related and distributional properties of visual experience affect learning outcomes. Images in EMMI come from first-person, wearable-camera recordings of common household objects and toys being manually manipulated to undergo structured transformations such as rotation and translation. We also present results from initial experiments with deep convolutional neural networks that begin to examine how different distributions of training data affect visual object recognition, and how the representation of properties such as rotation invariance can be studied in novel ways using the unique structure of EMMI.