This sample application demonstrates single-view 3D tracking with the DeepStream SDK. Given a static camera's projection matrix and a human model, the application estimates and keeps track of object states in the 3D physical world. It can precisely recover complete bounding boxes, foot locations, and body convex hulls even under partial occlusion. For algorithm and setup details, please refer to the DeepStream Single View 3D Tracking Documentation.
The current SV3DT configuration in this sample uses a 2D pose estimator, which estimates human keypoints on the 2D image plane. The SV3DT algorithm uses the head and waist keypoints as anchors to precisely estimate each person's 3D height. If `poseEstimatorType: 0` is set in `configs/config_tracker_NvDCF_accuracy_3D.yml`, the pose estimator is disabled, and the algorithm instead uses the 2D detection bounding boxes and a human model with fixed height, estimating the 3D location by matching the head with the 2D bounding box's top edge.
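The toggle looks like the following fragment (an excerpt sketch; the surrounding keys of the shipped config file are omitted here):

```yaml
# configs/config_tracker_NvDCF_accuracy_3D.yml (excerpt sketch):
# 0 disables the 2D pose estimator, falling back to 2D detection bboxes
# plus a fixed-height human model; the sample ships with it enabled.
poseEstimatorType: 0
```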
This sample application can be run on both x86 and Jetson platforms inside a DeepStream container. Check here for DeepStream container setup.
- Download the latest DeepStream container image from NGC (e.g., DS 9.0 in the example below).

  ```bash
  export DS_IMG_NAME="nvcr.io/nvidia/deepstream:9.0-triton-multiarch"
  docker pull $DS_IMG_NAME
  ```
- Git clone the `deepstream_reference_apps` repository to the host machine, and enter the single-view 3D tracking directory inside the repository.

  ```bash
  git clone https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps.git
  cd deepstream_reference_apps/deepstream-tracker-3d
  ```
- Download the NVIDIA pretrained `PeopleNet` model for 2D detection.

  ```bash
  # current directory: deepstream_reference_apps/deepstream-tracker-3d
  mkdir -p models/PeopleNet
  cd models/PeopleNet
  wget --no-check-certificate --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/deployable_quantized_onnx_v2.6.3/zip -O peoplenet_deployable_quantized_onnx_v2.6.3.zip
  unzip peoplenet_deployable_quantized_onnx_v2.6.3.zip
  ```
  The model files are now stored in the `PeopleNet` directory as

  ```
  deepstream-tracker-3d
  ├── configs
  ├── streams
  └── models
      └── PeopleNet
          ├── labels.txt
          ├── resnet34_peoplenet.onnx
          └── resnet34_peoplenet_int8.txt
  ```
- Launch the container from the current directory, and execute the 3D tracking pipeline inside the container. The current config requires a display because it uses an EGL sink to visualize the overlay results. To run over ssh without a display, change `type=2` to `type=1` in the `[sink0]` group of `deepstream_app_source1_3d_tracking.txt`. Users can check the DeepStream sink group documentation for the usage of each sink type.
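For a headless run, the sink change might look like the following excerpt (a sketch; the other keys of the shipped `[sink0]` group are omitted):

```ini
; deepstream_app_source1_3d_tracking.txt (excerpt sketch):
; type=1 selects FakeSink (no display, works over ssh);
; type=2 is the EGL-based display sink used by default in this sample.
[sink0]
enable=1
type=1
```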
```bash
cd ../..
sudo xhost + # give container access to display
# current directory: deepstream_reference_apps/deepstream-tracker-3d
docker run --runtime=nvidia -it --rm --net=host --privileged -v /tmp/.X11-unix:/tmp/.X11-unix -v $(pwd):/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-tracker-3d -e DISPLAY=$DISPLAY $DS_IMG_NAME
```

Inside the container, run the following commands. Please note that when `deepstream-app` is launched for the first time, it tries to create the model engine files, which may take a couple of minutes depending on the hardware platform.
```bash
# Install prerequisites
cd /opt/nvidia/deepstream/deepstream/
bash user_additional_install.sh

# Download ReID and BodyPose3DNet models
export MODEL_DIR="/opt/nvidia/deepstream/deepstream/samples/models/Tracker"
mkdir -p $MODEL_DIR
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/reidentificationnet/versions/deployable_v1.0/files/resnet50_market1501.etlt' -P $MODEL_DIR
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/bodypose3dnet/versions/deployable_accuracy_onnx_1.0/files/bodypose3dnet_accuracy.onnx' -P $MODEL_DIR

# Run 3D tracking pipeline
cd /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-tracker-3d/configs
mkdir -p track_results
deepstream-app -c deepstream_app_source1_3d_tracking.txt
```

When the pipeline is launched, DeepStream shows an output video like below while processing the input video. The 3D bounding boxes of the people are reconstructed and plotted even under partial occlusion. The result video is saved as `out.mp4`.
The extracted metadata (e.g., bounding box, frame number, target ID, etc.) is saved in both extended MOT and KITTI formats. For a detailed explanation, please refer to DeepStream Tracker Miscellaneous Data Output.
The MOT results can be found in the `track_dump_0.txt` file, which contains all the object metadata for one stream; the data format is defined below. The foot image position and the convex hull of the projected cylindrical human model are given in video-frame coordinates and can be used to draw the visualization figures below. Users can create such overlay videos or images with their favorite tools, such as OpenCV. The foot world position is defined on the 3D world ground plane corresponding to the 3x4 camera projection matrix.
| frame number (starting from 1) | object unique id | bbox left | bbox top | bbox width | bbox height | confidence | Foot World Position X | Foot World Position Y | blank | class id | tracker state | visibility | Foot Image Position X | Foot Image Position Y | ConvexHull Points Relative to bbox center |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| unsigned int | long unsigned int | int | int | int | int | float | float | float | int | unsigned int | int | float | float | float | int separated by vertical bar |
Sample output is shown below. The green dot in the visualization figure is plotted at (Foot Image Position X, Foot Image Position Y), and the cylinder is drawn by connecting the convex hull points. For example, the bbox in the first line of the output file has center (1433, -50), so its convex hull points are (1433 - 94, -50 - 170), (1433 - 87, -50 - 176), ..., (1433 - 23, -50 + 190). Users can plot the foot location and convex hull on their own as shown in the figure below.
```
1,1,1366,-195,134,290,1.000,-171.080,1058.295,-1,0,2,0.220,1458,80,-94|-170|-87|-176|-71|-183|-49|-191|-23|-196|0|-200|18|-201|29|-198|95|165|95|173|85|183|66|191|42|198|16|202|-4|201|-18|197|-23|190
2,1,1365,-194,135,290,0.989,-170.646,1045.254,-1,0,2,0.230,1458,84,-94|-170|-87|-176|-71|-183|-49|-191|-23|-196|0|-200|18|-201|29|-198|95|165|95|173|85|183|66|191|42|198|16|202|-4|201|-18|197|-23|190
3,1,1366,-196,134,290,0.860,-170.679,1054.089,-1,0,2,0.229,1458,82,-94|-170|-87|-176|-71|-183|-49|-191|-23|-196|0|-200|18|-201|29|-198|95|165|95|173|85|183|66|191|42|198|16|202|-4|201|-18|197|-23|190
...
```
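To make the format concrete, the decoding described above can be sketched in Python (a minimal illustration using the first sample line; field indices follow the table above):

```python
# Minimal sketch: decode one extended-MOT line from track_dump_0.txt into a
# foot point and absolute convex-hull coordinates (first sample line above).
line = ("1,1,1366,-195,134,290,1.000,-171.080,1058.295,-1,0,2,0.220,1458,80,"
        "-94|-170|-87|-176|-71|-183|-49|-191|-23|-196|0|-200|18|-201|29|-198|"
        "95|165|95|173|85|183|66|191|42|198|16|202|-4|201|-18|197|-23|190")
fields = line.split(",")
left, top, width, height = map(int, fields[2:6])
cx, cy = left + width / 2, top + height / 2           # bbox center: (1433.0, -50.0)
foot_img = (float(fields[13]), float(fields[14]))     # green foot dot: (1458.0, 80.0)
offsets = [int(v) for v in fields[15].split("|")]     # dx|dy pairs relative to center
hull = [(cx + offsets[i], cy + offsets[i + 1]) for i in range(0, len(offsets), 2)]
print(hull[0])   # first convex-hull vertex: (1433 - 94, -50 - 170) -> (1339.0, -220.0)
```

These absolute hull vertices can then be passed to any drawing tool (e.g., OpenCV's polygon functions) to reproduce the overlay figure.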
The KITTI results can be found in the `track_results` folder. A file is created for each frame of each stream, and the data format is defined below.
| object Label | object Unique Id | blank | blank | blank | bbox left | bbox top | bbox right | bbox bottom | blank | blank | blank | blank | blank | blank | blank | confidence | visibility (optional) | Foot Image Position X (optional) | Foot Image Position Y (optional) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| string | long unsigned | float | int | float | float | float | float | float | float | float | float | float | float | float | float | float | float | float | float |
Each frame is saved as `track_results/00_000_xxxxxx.txt`. Sample output for one frame is shown below. Note that if an object is found in past-frame data, it will not have the visibility and foot position fields in the KITTI dump.

```
person 1 0.0 0 0.0 1365.907227 -196.875290 1500.249146 93.554581 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.890186 0.229870 1458.166992 81.585060
person 6 0.0 0 0.0 1419.655151 72.774818 1647.446167 561.540894 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.841420 0.408513 1575.264160 531.851746
person 0 0.0 0 0.0 1008.387817 36.228714 1202.886353 421.609314 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.632065 0.504738 1148.117188 399.992645
...
```

To run single-view 3D tracking on other videos, the following changes are required.
- In `deepstream_app_source1_3d_tracking.txt`, change `uri=file://../streams/Retail02_short.mp4` to the new video path, and change `width=1920`, `height=1080`, `tracker-width=1920`, `tracker-height=1080` to the new video's resolution.
- Generate the camera projection matrix for the new video, and change `projectionMatrix_3x4` in `camInfo.yml` to the new matrix.
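For reference, the projection-matrix entry might look like the following sketch (the values are illustrative placeholders, not a calibrated matrix; check the shipped `configs/camInfo.yml` for the exact syntax):

```yaml
# Hedged sketch of a camera-info file. projectionMatrix_3x4 holds the
# row-major 3x4 camera projection matrix P = K[R|t] that maps homogeneous
# 3D world points to image pixels. All values below are placeholders.
projectionMatrix_3x4:
  - 1000.0   # P[0][0]
  - 0.0      # P[0][1]
  - 960.0    # P[0][2]
  - 0.0      # P[0][3]
  - 0.0
  - 1000.0
  - 540.0
  - 0.0
  - 0.0
  - 0.0
  - 1.0
  - 0.0
```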
Note: Multiple streams can run at the same time. Add all the sources to deepstream_app_source1_3d_tracking.txt, and all the camera information files to config_tracker_NvDCF_accuracy_3D.yml.
```yaml
cameraModelFilepath: # in order of the source streams
  - 'camInfo-01.yml'
  - 'camInfo-02.yml'
  - ...
```