Authors:
Martin Ahrnbom, Ivar Persson and Mikael Nilsson
Affiliation:
Centre for Mathematical Sciences, Lund University, Sweden
Keyword(s):
Pose Estimation, Instance Segmentation, Convolutional Neural Network, Traffic Safety, Road Users, Tracking, Stereo Camera, Trinocular Camera Array, Traffic Surveillance.
Abstract:
A system we denote Seg2Pose is presented which converts pixel coordinate tracks, represented by instance segmentation masks across multiple video frames, into world coordinate pose tracks for road users seen by static surveillance cameras. The road users are bound to a ground surface, represented by a number of 3D points, which does not have to be perfectly flat. The system works with one or more views by using a late fusion scheme. An approximate position, denoted the normal position, is computed from the camera calibration, per-class default heights and the ground surface model. This position is then refined by a novel Convolutional Neural Network we denote Seg2PoseNet, which takes instance segmentations and cropping positions as its input. We evaluate the system quantitatively both on synthetic data from the CARLA Simulator and on a real recording from a trinocular camera. On both datasets, the system outperforms the baseline of using only the normal positions, which is roughly equivalent to a typical 2D-to-3D conversion system.
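The abstract names the inputs to the normal position (camera calibration, per-class default heights, ground surface model) but not the exact computation. The following is a minimal sketch under stated assumptions, not the Seg2Pose implementation: it assumes a pinhole camera with world-to-camera extrinsics (R, t) and intrinsics K, assumes the mask centroid images a point at half the class's default height above the ground, and interpolates ground height from the nearest ground points by inverse-distance weighting. The names `DEFAULT_HEIGHTS`, `ground_height` and `normal_position`, and the height values, are hypothetical.

```python
import numpy as np

# Hypothetical per-class default heights in metres (illustrative values only).
DEFAULT_HEIGHTS = {"car": 1.5, "pedestrian": 1.8, "bicycle": 1.7}


def ground_height(xy, ground_points, k=4, eps=1e-9):
    """Inverse-distance-weighted ground height at world (x, y).

    Assumption: the ground surface is given as an (N, 3) array of 3D points,
    and the local height is a weighted mean of the k nearest points' z values.
    """
    d = np.linalg.norm(ground_points[:, :2] - np.asarray(xy), axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)
    return float(np.sum(w * ground_points[idx, 2]) / np.sum(w))


def normal_position(mask, K, R, t, class_name, ground_points, iters=10):
    """Approximate world position of a road user from one instance mask.

    Assumes x_cam = R @ X + t and pixel ~ K @ x_cam (pinhole model), and that
    the mask centroid images a point at half the default height above ground.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("empty mask")
    u, v = xs.mean(), ys.mean()  # mask centroid in pixel coordinates

    center = -R.T @ t  # camera centre in world coordinates
    ray = R.T @ np.linalg.solve(K, np.array([u, v, 1.0]))  # viewing ray direction
    if abs(ray[2]) < 1e-9:
        raise ValueError("viewing ray is parallel to the ground")

    half_h = 0.5 * DEFAULT_HEIGHTS[class_name]
    z_ground = float(ground_points[:, 2].mean())  # initial guess: mean ground height

    # Fixed-point iteration: intersect the ray with the horizontal plane
    # z = ground + half height, then re-read the ground height at that (x, y).
    for _ in range(iters):
        s = (z_ground + half_h - center[2]) / ray[2]
        z_ground = ground_height((center + s * ray)[:2], ground_points)

    s = (z_ground + half_h - center[2]) / ray[2]
    if s <= 0:
        raise ValueError("intersection lies behind the camera")
    return center + s * ray
```

The loop re-evaluates the ground height where the ray currently lands, which copes with gently sloping ground; a steep or discontinuous surface would likely need a proper ray-surface intersection instead.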