The goal of this research project is to develop a convolutional model that robustly counts the people present in images and videos of extremely dense crowds, using as little training data as possible.
Training a neural network to count individuals robustly typically requires a huge amount of annotated data. Collecting and annotating this data is particularly onerous and expensive.
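We develop a self-supervised algorithm based on a suitable function that sorts images by their degree of crowding. The resulting ordering provides a training signal without per-person annotations, which allows us to train and adapt a convolutional network with very little human supervision.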
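As an illustration of how an ordering by crowding degree can supervise a network without count labels, here is a minimal Python sketch. It rests on an assumption that is not stated in this summary: the ordering is obtained from nested crops, since a crop contained inside a larger crop cannot hold more people than the larger one. The function names nested_crops and pairwise_ranking_loss, and the parameters scale and margin, are hypothetical and chosen only for this example. The project's actual sorting function may differ.

```python
def nested_crops(width, height, num_crops=3, scale=0.75):
    """Return boxes (left, top, right, bottom), largest first.

    Each box is centred on the image and contained in the previous one,
    so the true person count can only stay equal or decrease down the list.
    (Assumed construction, for illustration only.)
    """
    cx, cy = width / 2.0, height / 2.0
    w, h = float(width), float(height)
    boxes = []
    for _ in range(num_crops):
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
        w *= scale
        h *= scale
    return boxes


def pairwise_ranking_loss(predicted_counts, margin=0.0):
    """Mean hinge loss over all (larger crop, smaller crop) pairs.

    predicted_counts[i] is the model's count for the i-th crop, ordered
    from largest to smallest. A pair is penalised only when the smaller
    crop is predicted to contain more people than the larger one
    (by more than -margin).
    """
    total, pairs = 0.0, 0
    for i in range(len(predicted_counts)):
        for j in range(i + 1, len(predicted_counts)):
            total += max(0.0, predicted_counts[j] - predicted_counts[i] + margin)
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    boxes = nested_crops(100, 80, num_crops=3, scale=0.5)
    print(boxes)
    # Predictions that respect the ordering incur no loss.
    print(pairwise_ranking_loss([10.0, 7.0, 3.0]))
    # Predictions that violate the ordering are penalised.
    print(pairwise_ranking_loss([3.0, 7.0, 10.0]))
```

In a real training loop the predicted counts would come from the convolutional network applied to each crop, and a loss of this kind would be minimised on unlabelled images, optionally alongside a supervised loss on the few annotated ones.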
Results and Benefits
The network trained with the self-supervised approach achieves, at a lower cost, accuracy comparable to that of networks trained on large amounts of annotated data.