Tracking the Rubik’s Cube - Using point trackers for semi-automatic video annotation
Abstract
The need for extensive manual labor in annotating video datasets causes a scarcity of such datasets. Without video datasets of sufficient quality and diversity, the research and development of machine learning algorithms in computer vision is impeded. To address this problem, we apply two point-tracking algorithms, CoTracker and TAPIR, as a semi-automatic method for annotating video data and compare them to the object detector YOLOv5. We also conduct an ad-hoc qualitative analysis examining the properties of video data on which the point trackers fail and succeed. The methods were evaluated on two datasets: the TAP-Vid DAVIS benchmark and a new dataset created for this thesis, the Rotating Rubik's Cube dataset. With regard to intersection-over-union, position accuracy, and occlusion accuracy, YOLOv5 scored highest, followed by TAPIR and then CoTracker. Qualitatively, the point trackers fail due to, among other factors, camera movement, occlusions, and irregular lighting changes. However, because CoTracker and TAPIR do not need to be trained before annotation, they can be used when no annotated data is available.
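For concreteness, the three evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration assuming NumPy arrays and the pixel-threshold set commonly used by the TAP-Vid benchmark; the function names and array layouts are hypothetical and are not the thesis's actual evaluation code.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def position_accuracy(pred_pts, gt_pts, visible, thresholds=(1, 2, 4, 8, 16)):
    """Fraction of visible points whose predicted location lies within
    delta pixels of ground truth, averaged over the TAP-Vid-style thresholds.
    pred_pts, gt_pts: (N, 2) arrays of (x, y); visible: (N,) boolean mask."""
    dists = np.linalg.norm(pred_pts - gt_pts, axis=-1)
    accs = [(dists[visible] <= t).mean() for t in thresholds]
    return float(np.mean(accs))

def occlusion_accuracy(pred_occluded, gt_occluded):
    """Binary accuracy of the tracker's per-point occlusion predictions."""
    return float((np.asarray(pred_occluded) == np.asarray(gt_occluded)).mean())
```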