Tracking the Rubik’s Cube - Using point trackers for semi-automatic video annotation

Abstract

The need for extensive manual labor in annotating video datasets makes such datasets scarce. Without video datasets of good quality and sufficient diversity, the research and development of machine learning algorithms in computer vision is impeded. To address this problem, we apply two point-tracking algorithms, CoTracker and TAPIR, as a semi-automatic method for annotating video data and compare them to the object detector YOLOv5. We also conduct an ad-hoc qualitative analysis examining the properties of video data for which the point trackers fail and succeed at their tasks. The evaluation of the methods used two datasets: the TAP-Vid DAVIS benchmark and a new dataset created for this thesis, the Rotating Rubik’s Cube dataset. With regard to intersection-over-union, position accuracy, and occlusion accuracy, YOLOv5 scored highest, followed by TAPIR and then CoTracker. The qualitative conditions under which the point trackers fail include camera movement, occlusions, and irregular lighting changes, among others. However, since CoTracker and TAPIR do not need to be trained before annotation, they can be used when no annotated data is available.
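The intersection-over-union metric named in the evaluation can be sketched as follows. This is a minimal illustration of the standard IoU computation for axis-aligned bounding boxes, not the thesis's actual evaluation code; the box format (x1, y1, x2, y2) is an assumption.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) tuples with x1 <= x2 and y1 <= y2.
    Returns a value in [0, 1]; 1 means identical boxes, 0 means no overlap.
    """
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

For example, two unit boxes overlapping in a 1×1 corner of a 2×2 pair give an IoU of 1/7, since the intersection area is 1 and the union is 4 + 4 − 1 = 7.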

Keywords

Data science, machine learning, deep learning, point tracking, data annotation, object detection, thesis
