
Object Tracking and Line Counting

Vladis Klimov
Published in DeGirum
Feb 3, 2024

Multi-Object Tracking and Line Counting Using DeGirum SDK

Introduction

AI models for object detection on a video stream with multi-object tracking and line intersection counting represent an advanced application of computer vision that is widely employed in surveillance, traffic management, and various other domains. This sophisticated technology combines the capabilities of object detection, multi-object tracking, and counting to analyze the movement of multiple objects within a video stream and ascertain when they intersect with predefined lines or boundaries.

Multi-object tracking (MOT) algorithms play a crucial role in computer vision applications by enabling the simultaneous detection and tracking of multiple objects in a video sequence. These algorithms aim to maintain the identities of objects across frames, facilitating the analysis of object interactions, movements, and behaviors. Various MOT algorithms exist, each with unique characteristics and approaches.

The DeGirum toolkit incorporates an implementation of the ByteTrack algorithm for multi-object tracking.

Key components of the object tracking and line counting application are:

  • Object Detection Model: utilizes a trained AI model for real-time identification and classification of objects in the video stream.
  • Multi-Object Tracking Algorithm: keeps track of individual objects over time, maintaining their trajectories as they move through different frames.
  • Line Definition: specifies one or more lines within the video frame where intersection counting will occur. These lines are typically defined by their coordinates.
  • Intersection Logic: implements algorithms to detect when and in which direction an object’s trajectory intersects with the defined lines, triggering the counting mechanism (a minimal sketch of this idea follows the list).
  • Counting Mechanism: tracks the number of objects crossing the specified lines, incrementing the count whenever an intersection is detected, but only once for every trajectory.
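To make the intersection logic concrete, here is a minimal sketch of the underlying geometric test (an illustration only, not the DeGirum implementation): two consecutive anchor points of a trajectory form a segment, and a crossing is registered when that segment properly intersects a counting line, which can be checked with cross-product orientation signs:

def ccw(a, b, c):
    # positive if the turn a -> b -> c is counter-clockwise
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, q1, q2):
    # proper intersection: the endpoints of each segment lie strictly
    # on opposite sides of the other segment
    return (ccw(p1, p2, q1) * ccw(p1, p2, q2) < 0
            and ccw(q1, q2, p1) * ccw(q1, q2, p2) < 0)

The sign of ccw for the trajectory endpoints relative to the line also indicates the crossing direction, which is how per-direction counts can be derived.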

Applications

  • Traffic Monitoring: counts and tracks vehicles as they move through intersections, helping in traffic flow analysis and congestion management.
  • Pedestrian Flow Analysis: monitors the movement of people at key points, aiding in crowd management and ensuring safety in public spaces.
  • Retail Analytics: tracks customer movements within a store, helping retailers understand foot traffic patterns and optimize store layouts.
  • Security Surveillance: identifies and counts individuals or objects crossing predefined security perimeters, enhancing surveillance capabilities.

Using DeGirum SDK for Object Tracking and Line Intersection Counting

Prerequisites

Assuming you have a configured Python environment on your Windows, Linux, or macOS computer, you need to install the DeGirum PySDK and DeGirum Tools Python packages by running the following commands (follow this link for details):

pip install -U degirum
pip install -U degirum_tools

Alternatively, you can use Google Colab to run the line counting example Jupyter notebook provided by DeGirum.

Step-by-Step Guide

Import necessary packages

You need to import the degirum and degirum_tools packages (cv2 and numpy are also imported here, as they are used later for drawing trails):

import cv2
import numpy as np
import degirum as dg, degirum_tools

Select object detection AI model

As a starter, we will use the YOLOv5s COCO model, which can detect 80 COCO classes. We will take this model from the DeGirum cloud public model zoo.

Let’s define the cloud zoo URL and the model name:

model_zoo_url = "https://cs.degirum.com/degirum/public"
model_name = "yolo_v5s_coco--512x512_quant_n2x_orca1_1"

Here cs.degirum.com is the DeGirum DeLight Cloud Platform URL, and degirum/public is the path to the DeGirum cloud public model zoo.

yolo_v5s_coco--512x512_quant_n2x_orca1_1 is the model name we will use for object detection. It is based on the YOLOv5 Ultralytics model, trained to detect 80 COCO classes and compiled for the DeGirum ORCA1 AI hardware accelerator.

Define video source

For simplicity, we will use short highway traffic video from DeGirum PySDK examples GitHub repo:

video_source = "https://github.com/DeGirum/PySDKExamples/raw/main/images/Traffic.mp4"

But you can use any video file you want. If you run the code locally and your computer has a video camera, you may use it as the video source:

video_source = 0 # specify index of local video camera

Obtain cloud API access token

In order to use AI models from the DeGirum Cloud Platform, you need to register and generate a cloud API access token. Please follow these instructions. Registration is free.

Connect to model zoo and load the model

# connect to AI inference engine
zoo = dg.connect(dg.CLOUD, model_zoo_url, "<cloud API token>")

# load model
model = zoo.load_model(model_name)

Here we connect to the DeGirum Cloud Platform to run AI model inferences (by using the dg.CLOUD parameter) and to the cloud model zoo specified by model_zoo_url, using the cloud API access token obtained in the previous step. Then we load the model specified by model_name.

For more inference options please refer to this documentation page.

Define intersection lines

For each intersection line you need to define a four-element tuple (x1, y1, x2, y2) of the line endpoints' pixel coordinates. Then you define a list containing all line tuples:

lines = [(120, 430, 870, 430), (860, 80, 860, 210)]

Here we defined two lines.

Define object tracker

We will use the degirum_tools.ObjectTracker object from the DeGirum Tools package, which implements the ByteTrack MOT algorithm:

tracker = degirum_tools.ObjectTracker(
    class_list=["car"],
    track_thresh=0.35,
    track_buffer=100,
    match_thresh=0.9999,
    trail_depth=20,
    anchor_point=degirum_tools.AnchorPoint.BOTTOM_CENTER,
)

Here we specify the list of classes we want to track: class_list=["car"]. You may omit this parameter to track all classes the model reports.

We can specify various parameters of ByteTrack algorithm to fine-tune its performance:

  • track_thresh: AI object detection model confidence threshold for track activation;
  • track_buffer: number of frames to buffer when a track is lost;
  • match_thresh: IOU threshold for matching tracks with detections.

We also specify the tracing parameters:

  • trail_depth: number of frames in object trail to keep; 0 to disable tracing;
  • anchor_point: bounding box anchor point to be used for tracing object trails.

We specify the anchor point to be the center of the bottom edge of the object bounding box: anchor_point=degirum_tools.AnchorPoint.BOTTOM_CENTER. Possible anchor points are all four vertices and all four edge centers of the bounding box.
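To illustrate what the anchor point means geometrically, here is a minimal sketch (the helper below is hypothetical, not part of degirum_tools) of how BOTTOM_CENTER maps a bounding box to the single point used for tracing and line-crossing tests:

def bottom_center(bbox):
    # bbox is (x1, y1, x2, y2): left, top, right, bottom pixel coordinates
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, y2)

For vehicles filmed from an elevated camera, the bottom center approximates the point where the object touches the road, making line crossings less sensitive to the object's height.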

Define line counting object

We will use the degirum_tools.LineCounter object from the DeGirum Tools package, which provides line intersection detection and counting functionality:

line_counter = degirum_tools.LineCounter(lines)

We initialize the line counter by passing the list of lines we defined in the earlier step.

Define inference loop

with degirum_tools.Display("AI Camera") as display, \
     degirum_tools.open_video_stream(video_source) as stream:

    for result in model.predict_batch(
        degirum_tools.video_source(stream)
    ):
        tracker.analyze(result)
        line_counter.analyze(result)

        img = result.image_overlay
        img = tracker.annotate(result, img)
        img = line_counter.annotate(result, img)
        display.show(img)

To observe live results, we open an interactive display using the degirum_tools.Display class.

Then we open the video stream using degirum_tools.open_video_stream.

Then we supply the video source to the input of the model.predict_batch method, which performs AI model inference on each frame retrieved from the video stream in an efficient, pipelined manner.

The result object contains the inference results, which are stored in the result.results list.

The tracker.analyze(result) call applies the object tracking algorithm to the object detection inference results, adding object tracking results back to the result object.

The line_counter.analyze(result) call applies the line counting algorithm to the object tracking results obtained in the previous step, again adding new information back to the result object.

After all these steps the result object contains three sets of information:

  • object detection inference results,
  • object tracking results,
  • line counting results.

The result.image_overlay property draws AI annotations on top of the original video frame image. These annotations include bounding boxes of all detected objects.

To display the results of object tracking and line counting, we use the annotate methods of the tracker and line_counter objects.

Finally, we call display.show(img) to display the fully annotated image in an OpenCV interactive window.

To simplify the boilerplate code, you may use the degirum_tools.predict_stream function, which effectively performs the same steps as above:

with degirum_tools.Display("AI Camera") as display:
    for result in degirum_tools.predict_stream(
        model, video_source, analyzers=[tracker, line_counter]
    ):
        display.show(result)

Access object tracking results

The object tracker provides two kinds of results:

  • unique object tracking IDs,
  • object trajectories (or traces).

Object tracking IDs are stored in elements of the result.results list. Each such element is a dictionary initially containing AI model object detection results such as scores, bounding box coordinates, etc. For each tracked object, the object tracker adds a key-value pair to the corresponding element of the result.results list: the key is "track_id" and the value is the unique object tracking ID. Please note that not all detected objects are assigned IDs, so you need to check whether the "track_id" key is present.

The following code iterates over all detections and extracts tracking IDs:

for detection in result.results:
    if (track_id := detection.get("track_id")) is not None:
        # this `detection` has object tracking ID `track_id`
        ...

If you enabled storing traces (i.e., you specified a non-zero trail_depth parameter in the degirum_tools.ObjectTracker constructor), then the object tracker will also add a trails dictionary to the result object.

The result.trails dictionary contains the traces of all tracked objects. It is keyed by track IDs and contains a list of (x, y) trail coordinates for every active trail.

The following code illustrates how to display all trails with at least two points, using the OpenCV polylines function:

line_color = (255, 255, 0)  # BGR color used to draw the trails
all_trails = [
    np.array(trail, dtype=np.int32)  # polylines requires integer coordinates
    for trail in result.trails.values()
    if len(trail) > 1
]
cv2.polylines(img, all_trails, False, line_color)

Access line counting results

Line counting results are stored in the result.line_counts list, one element per defined line. Each element of this list is a degirum_tools.LineCounts dataclass object, which has the following integer attributes (a usage sketch follows the list):

  • left: number of trajectories that intersected a line moving from frame left to frame right,
  • right: from right to left,
  • top: from top to bottom,
  • bottom: from bottom to top.
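Assuming the attributes listed above, the per-line counts can be read from the result object as follows (the print format here is just an illustration):

for line_index, counts in enumerate(result.line_counts):
    print(
        f"line {line_index}: left={counts.left}, right={counts.right}, "
        f"top={counts.top}, bottom={counts.bottom}"
    )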

The complete code of the example described above is available in this Jupyter notebook.

The following is a sample screenshot of the live video produced by that example:

[Screenshot: Line Counting]
