Tracking the Rubik’s Cube - Using point trackers for semi-automatic video annotation
Abstract
Annotating video datasets requires extensive manual labor, which makes such datasets scarce. Without video datasets of good quality and sufficient diversity, the research
and development of machine learning algorithms in computer vision is impeded.
To address this problem, we apply two point-tracking algorithms, CoTracker and
TAPIR, as a semi-automatic method for annotating video data and compare them
to the object detector YOLOv5. We also conduct an ad-hoc qualitative analysis
examining the properties of video data where the point trackers fail and succeed in
their tasks. The evaluation of the methods used two datasets: the TAP-Vid DAVIS
benchmark, and a new dataset made for this thesis - the Rotating Rubik’s Cube
dataset. In terms of intersection-over-union, position accuracy, and occlusion
accuracy, YOLOv5 scored highest, followed by TAPIR and then CoTracker.
Qualitatively, the point trackers fail under camera movement, occlusions, and
irregular lighting changes, among other conditions. However, given that CoTracker
and TAPIR do not need to be trained before annotation, they can be used when no
annotated data is available.
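The evaluation above relies on intersection-over-union (IoU) as one of its metrics. As a minimal illustration of that metric (a generic sketch, not the implementation used in the thesis), IoU for two axis-aligned bounding boxes in (x1, y1, x2, y2) format can be computed as:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

For example, two unit-offset 2x2 boxes, `iou((0, 0, 2, 2), (1, 1, 3, 3))`, share a 1x1 intersection against a union of 7, giving an IoU of 1/7.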
Keywords
Data science, machine learning, deep learning, point tracking, data annotation, object detection, thesis