The ROAD Challenge @ ICCV 2021

The ROAD Challenge: Event Detection for Situation Awareness in Autonomous Driving

Call for participation

https://sites.google.com/view/roadchallangeiccv2021/challenge

Aim of the Challenge

The goal of this Challenge is to bring the topic of situation awareness to the forefront of autonomous driving research. Situation awareness is understood here as the ability to create semantically useful representations of dynamic road scenes, built around the notion of a road event.

The ROAD dataset

This concept is at the core of the new ROad event Awareness Dataset (ROAD) for Autonomous Driving:

https://github.com/gurkirt/road-dataset


ROAD is the first benchmark of its kind, a multi-label dataset designed to allow the community to investigate the use of semantically meaningful representations of dynamic road scenes to facilitate situation awareness and decision making. 

It contains 22 long-duration videos (ca. 8 minutes each) annotated in terms of “road events”, defined as triplets of Agent, Action and Location labels and represented as ‘tubes’, i.e., series of frame-wise bounding box detections.
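
As a concrete illustration, a road event tube could be represented along the following lines. This is a minimal Python sketch: the field names and label strings are illustrative assumptions, not the dataset's exact annotation schema.

    from dataclasses import dataclass, field
    from typing import Dict, Tuple

    Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixel coordinates

    @dataclass
    class RoadEventTube:
        """One road event: an (Agent, Action, Location) triplet, grounded
        spatially by one bounding box per annotated frame."""
        agent: str      # e.g. "Pedestrian" (illustrative label string)
        action: str     # e.g. "Crossing road" (illustrative label string)
        location: str   # e.g. "In vehicle lane" (illustrative label string)
        boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box

    # A two-frame fragment of a hypothetical tube:
    tube = RoadEventTube("Pedestrian", "Crossing road", "In vehicle lane")
    tube.boxes[120] = (340.0, 210.0, 395.0, 330.0)
    tube.boxes[121] = (342.0, 211.0, 398.0, 331.0)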


The above GitHub repository contains all the necessary instructions to pre-process the 22 ROAD videos, unpack them to the correct directory structure and run the provided baseline model.


Tasks and Challenges

The Challenge considers three video-level detection Tasks:

T1. Agent detection, in which the output is in the form of agent tubes collecting the bounding boxes associated with an active road agent in consecutive frames.

T2. Action detection, where the output is in the form of action tubes formed by bounding boxes around an action of interest in each video frame.

T3. Road event detection, where by road event we mean a triplet (Agent, Action, Location) as explained above, once again represented as a tube of frame-level detections.

Each Task thus consists in regressing whole series (‘tubes’) of temporally-linked bounding boxes associated with relevant instances, together with their class label(s).
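
To make the notion of temporally-linked boxes concrete, the sketch below greedily stitches per-frame detections into tubes by extending each live tube with the detection that best overlaps its most recent box. This is a generic illustration under assumed names and threshold, not the official ROAD pipeline or the baseline's linking method:

    from typing import Dict, List, Tuple

    Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

    def box_iou(a: Box, b: Box) -> float:
        """Intersection over Union of two axis-aligned boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def link_detections(per_frame_dets: List[List[Box]],
                        iou_thresh: float = 0.5) -> List[Dict[int, Box]]:
        """Greedy online linking: each tube is a dict mapping frame index
        to box; a detection extends the tube whose previous-frame box it
        overlaps most, otherwise it starts a new tube."""
        tubes: List[Dict[int, Box]] = []
        for f, dets in enumerate(per_frame_dets):
            remaining = list(dets)
            for tube in tubes:
                if (f - 1) not in tube or not remaining:
                    continue  # only extend tubes alive in the previous frame
                best = max(remaining, key=lambda d: box_iou(tube[f - 1], d))
                if box_iou(tube[f - 1], best) >= iou_thresh:
                    tube[f] = best
                    remaining.remove(best)
            for det in remaining:  # unmatched detections seed new tubes
                tubes.append({f: det})
        return tubes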

Baseline

As a baseline for all three detection tasks we propose a simple yet effective 3D feature pyramid network with focal loss, an architecture we call 3D-RetinaNet:

http://arxiv.org/abs/2102.11585

The code is publicly available on GitHub:

https://github.com/gurkirt/3D-RetinaNet
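
For reference, focal loss down-weights well-classified examples so that training concentrates on hard ones. The following is a minimal PyTorch sketch of the standard sigmoid focal loss of Lin et al. (2017) in a multi-label setting; the function name and the default alpha and gamma values follow common practice rather than the 3D-RetinaNet code:

    import torch
    import torch.nn.functional as F

    def sigmoid_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                           alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
        """Sigmoid focal loss, FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t),
        applied independently per class for multi-label detection.
        logits, targets: float tensors of shape (N, num_classes), targets in {0, 1}."""
        p = torch.sigmoid(logits)
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = p * targets + (1 - p) * (1 - targets)             # prob. of the true label
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()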

Timeframe

Challenge participants have 18 videos at their disposal for training and validation. The remaining 4 videos are used to test the final performance of their models. This split applies to all three Tasks.

The timeframe for the Challenge is as follows:

· Training and validation fold release: April 30, 2021

· Test fold release: July 20, 2021

· Submission of results: August 10, 2021

· Announcement of results: August 12, 2021

· Challenge event @ workshop: October 10-17, 2021

Evaluation

Performance in each Task is measured by video mean average precision (video-mAP), computed at Intersection over Union (IoU) detection thresholds of 0.1, 0.2 and 0.5. The final performance on each Task is the equally-weighted average of the performances at the three thresholds.
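
As a rough sketch of the metric's ingredients: a predicted tube is matched to a ground-truth tube of the same class when their spatio-temporal IoU exceeds the threshold, average precision is computed per class over whole videos, and the per-threshold video-mAPs are averaged. The snippet below assumes the common tube-IoU definition (per-frame box IoU summed over the temporal intersection, normalised by the temporal union); the authoritative details are those of the official evaluation code, not this sketch:

    from typing import Dict, Tuple

    Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

    def box_iou(a: Box, b: Box) -> float:
        """Intersection over Union of two axis-aligned boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def tube_iou(pred: Dict[int, Box], gt: Dict[int, Box]) -> float:
        """Spatio-temporal tube IoU: per-frame box IoU summed over frames
        where both tubes exist, divided by the size of the temporal union."""
        union_frames = set(pred) | set(gt)
        if not union_frames:
            return 0.0
        overlap = sum(box_iou(pred[f], gt[f]) for f in set(pred) & set(gt))
        return overlap / len(union_frames)

    def final_task_score(vmap_01: float, vmap_02: float, vmap_05: float) -> float:
        """Equally-weighted average of video-mAP at IoU 0.1, 0.2 and 0.5."""
        return (vmap_01 + vmap_02 + vmap_05) / 3.0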

In the first stage of the Challenge, participants will submit, for each Task, the predictions generated on the validation fold and receive the evaluation metric in return, in order to get a feel for how well their method(s) work. In the second stage, they will submit the predictions generated on the test fold, which will be used for the final ranking.

A separate ranking will be produced for each of the Tasks.

Evaluation takes place on the EvalAI platform: 

https://eval.ai/web/challenges/challenge-page/1059/overview 

For each Challenge stage and each Task, the number of submissions is capped at 50, with an additional limit of 5 submissions per day.
