Multiple-Camera Multiple-Object 3D Localization in Sports Videos

Publication Type: Thesis
Issue Date: 2020
Sports video analysis and 3D object detection are extensively studied problems in computer vision. As one of the most important 3D object detection scenarios, multiple-camera multiple-object 3D localization (MCMOL) in sports videos has recently drawn much attention in the research community, driven by the shift in object detection from monocular to multi-view input, i.e., from 2D to 3D. Heavy occlusion in crowded sports scenes and fast-moving targets make MCMOL for sports objects extremely challenging. Existing solutions generally take foreground extraction as input and apply a single statistical or Convolutional Neural Network (CNN) model to all visible targets to obtain the objects’ coordinates and/or location encodings. However, ambiguous foreground masks and heavy occlusion limit their performance by a large margin. Moreover, the obtained coordinates cannot be associated with, or traced back to, particular objects: there is no one-to-one relationship between the outputs and the objects to be detected, so the false-positive and false-negative rates increase. To address these issues, this thesis presents a comprehensive study of the MCMOL problem in sports videos and develops three multi-camera multi-object 3D localization approaches that provide accurate, reliable, and distinguishable results. First, we apply a Convolutional Neural Network with Initialization Settings over the Probabilistic Occupancy Map (i.e., POM+CNN+IniSet). This approach applies CNN-based monocular segmentation jointly across multiple cameras and develops an indicative parameter initialization scheme for the Bayesian iteration model. Second, we propose the POM with Identification (PomID) method and introduce the DeepPlayer model, which combines a Cascade Mask R-CNN model with a pose-guided partial feature embedding to perform segmentation and identification simultaneously for multiple players.
This method estimates locations separately for individuals with identified labels and for the remaining objects without specific identities. Finally, we propose the Probabilistic and Identified Occupancy Map (PIOM) method and develop an Image&ID model that expresses the segmentation pixels and identification estimates as likelihood probabilities. The method then builds a multi-dimensional Bayesian model that estimates the localization results as posterior occupancy probabilities with unique ID labels. Given pre-defined prior probabilities, the Bayesian model is optimized by an efficient iterative procedure. Our work is the first attempt to take advantage of CNN-based object identification for 3D object localization applications. Experimental results demonstrate that the proposed framework improves localization performance by a large margin and outperforms the state of the art in MCMOL sports video scenarios.
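To make the idea of posterior occupancy probabilities with ID labels concrete, the following is a minimal sketch, not the PIOM model itself. It assumes a single ground-plane cell, a small discrete set of states (state 0 means "empty", state k means "occupied by player k"), and camera observations that are conditionally independent given the state. The function name, the state layout, and all numbers are hypothetical and chosen only for illustration; the thesis's iterative optimization across cells is not reproduced here.

```python
import math


def identified_occupancy_posterior(prior, likelihoods):
    """Fuse per-camera likelihoods into a posterior over cell states.

    prior:       list of prior probabilities, one per state
                 (index 0 = empty, index k = hypothetical player ID k).
    likelihoods: one list per camera; entry s is P(observation | state s).
    Assumes strictly positive inputs and cameras that are conditionally
    independent given the state (an illustrative simplification).
    """
    # Work in log space so products over many cameras do not underflow.
    log_post = [math.log(p) for p in prior]
    for camera in likelihoods:
        for state, value in enumerate(camera):
            log_post[state] += math.log(value)

    # Normalize with the usual max-subtraction trick for stability.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative numbers only: one cell, states = [empty, player A, player B].
prior = [0.5, 0.25, 0.25]
likelihoods = [
    [0.1, 0.8, 0.1],  # camera 1: segmentation and ID evidence favor player A
    [0.2, 0.6, 0.2],  # camera 2: weaker evidence, same direction
]
posterior = identified_occupancy_posterior(prior, likelihoods)
best_state = max(range(len(posterior)), key=posterior.__getitem__)
```

In this toy setting the posterior keeps a one-to-one link between a location and an identity, which is the property the abstract says coordinate-only outputs lack.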