The First Continual Semi-Supervised Learning Challenge @ IJCAI 2021


Call for participation

The Challenge is organised as part of the upcoming IJCAI 2021 First International Workshop on Continual Semi-Supervised Learning

https://sites.google.com/view/sscl-workshop-ijcai-2021/

Aim of the Workshop

Whereas continual learning has recently attracted much attention in the machine learning community, the focus has mainly been on preventing a model, as it is updated in the light of new data, from 'catastrophically forgetting' its initial knowledge and abilities. This, however, is in stark contrast with common real-world situations in which an initial model is trained using limited data, only to be later deployed without any additional supervision. In these scenarios the goal is for the model to be incrementally updated using the new (unlabelled) data, in order to adapt to a target domain that shifts continually over time.

The aim of this workshop is to formalise this new continual semi-supervised learning paradigm, and to introduce it to the machine learning community in order to mobilise effort in this direction. We present the first two benchmark datasets for this problem, derived from significant computer vision scenarios, and propose the first Continual Semi-Supervised Learning Challenges to the research community.

Problem Statement

In continual semi-supervised learning, an initial training batch of data points annotated with ground truth (class labels for classification problems, or vectors of target values for regression ones) is available and can be used to train an initial model. The model is then incrementally updated by exploiting the information carried by a time series of unlabelled data points. Each of these is generated by a data generating process (modelled, as is typically assumed, by a probability distribution) which may vary with time, without any artificial subdivision into 'tasks'.
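For concreteness, the setting can be sketched in symbols (our notation, added here for illustration; it is not part of the official challenge definition):

```latex
% Initial supervised phase: train a model f_0 on a labelled batch
\mathcal{D}_0 = \{(x_i, y_i)\}_{i=1}^{N} \;\longrightarrow\; f_0
% Continual phase: an unlabelled stream whose generating distribution may drift
x_t \sim p_t(x), \qquad t = N+1, N+2, \dots
% Incremental update rule: no further labels, no task boundaries
f_t = U(f_{t-1}, x_t)
```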

Challenges

We propose both a continual activity recognition (CAR) challenge and a continual crowd counting (CCC) challenge.

https://sites.google.com/view/sscl-workshop-ijcai-2021/challenges

In the former, the aim is to devise a learning mechanism for updating a baseline action recognition method (working at frame level) based on a data stream of video frames, of which only the initial fraction is labelled (a classification problem).

In the latter, the learning mechanism is applied to a baseline crowd counting method, also working on a frame-by-frame basis, and exploits a data stream of video frames of which only an initial fraction comes with ground truth attached, in the form of a density map (a regression problem).

Benchmark Datasets

As a benchmark for the continual activity recognition challenge we have created a Continual Activity Recognition (CAR) dataset, derived from a fraction of the MEVA (Multiview Extended Video with Activities) activity detection dataset (https://mevadata.org/). We selected a suitable set of 8 activity classes from the original list of 37, and annotated each frame of 15 video sequences (each composed of 3 clips from MEVA) with a single class label.

Our CAR benchmark is thus composed of 15 sequences, broken down into three groups:

· Five 15-minute-long sequences formed by three original videos that are contiguous in time.

· Five 15-minute-long sequences formed by three original videos separated by a short time interval (5-20 minutes).

· Five 15-minute-long sequences formed by three original videos separated by a long time interval (hours or even days).

Each of these three evaluation settings is designed to simulate a different mix of continuous and discrete dynamics of the domain distribution.

The raw video sequences are directly accessible from the Challenge website.

Our CCC benchmark is composed of 3 sequences, taken from existing crowd counting datasets:

· A single 2,000-frame sequence from the Mall dataset.

· A single 2,000-frame sequence from the UCSD dataset.

· A 750-frame sequence from the Fudan-ShanghaiTech (FDST) dataset, composed of 5 clips of the same scene, each 150 frames long.

Ground truth

The ground truth for the CAR challenges (in the form of one activity label per frame) was created by us, after choosing a subset of 8 activity classes and revising the original annotation of the 45 video clips selected for inclusion.

The ground truth for the CCC challenges (in the form of a density map for each frame) was generated by us for all three datasets following the annotation protocol described in

https://github.com/svishwa/crowdcount-mcnn
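In that protocol, each annotated head location contributes a Gaussian kernel to the map, so that the map integrates to the person count. Below is a minimal fixed-bandwidth sketch of the idea in Python (our simplification; the linked repository also supports geometry-adaptive kernels):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(head_points, height, width, sigma=4.0):
    """Fixed-bandwidth density map: one Gaussian per annotated head.

    A simplified sketch of the annotation protocol; the resulting map
    integrates (approximately) to the number of annotated heads.
    """
    dmap = np.zeros((height, width), dtype=np.float32)
    for x, y in head_points:  # (x, y) head positions in pixel coordinates
        dmap[min(int(y), height - 1), min(int(x), width - 1)] += 1.0
    return gaussian_filter(dmap, sigma)  # smooth each impulse into a Gaussian

# Sanity check: the density map sums to the crowd count.
points = [(30, 40), (100, 80), (101, 82)]
d = density_map(points, 120, 160)
assert abs(d.sum() - len(points)) < 1e-3
```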

The ground truth for both challenges will be released on the Challenge web site according to the following schedule:

· Training and validation fold release: May 5, 2021

· Test fold release: June 30, 2021

· Submission of results: July 15, 2021

· Announcement of results: July 31, 2021

· Challenge event @ workshop: August 21-23, 2021

Tasks

For each challenge we propose two separate tasks: absolute and incremental.

CAR-A: The goal is to achieve the best average performance across the unlabelled test portions of all 15 sequences in the CAR dataset. The choice of the baseline action recognition model is left to the participants.

CAR-I: The goal here is to achieve the best performance improvement over the baseline (supervised) model, measured on the unlabelled test data stream, on average over the 15 sequences. The baseline recognition model is set by us (see Baselines below).

CCC-A: The goal is to achieve the best average performance over the unlabelled test portions of the 3 sequences in the CCC dataset. The choice of the baseline crowd counting model is left to the participants.

CCC-I: The goal here is to achieve the best performance improvement over the baseline model, measured on the unlabelled test portion of the data stream, on average over the 3 sequences of CCC. The baseline crowd counting model is set by us (see Baselines below).
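Schematically, and leaving aside the specific per-frame metric (defined on the challenge site), the 'A' and 'I' rankings could be computed as in the sketch below, where the per-sequence scores are purely hypothetical:

```python
import numpy as np

def absolute_score(per_sequence_scores):
    """'A' tasks: mean performance over the test portions of all sequences."""
    return float(np.mean(per_sequence_scores))

def incremental_score(method_scores, baseline_scores):
    """'I' tasks: mean improvement over the fixed baseline, sequence by sequence.

    For error metrics such as MAE (lower is better), the difference would be
    taken the other way around.
    """
    return float(np.mean(np.asarray(method_scores) - np.asarray(baseline_scores)))

# Hypothetical per-sequence accuracies (15 values would be used for CAR):
ours     = [0.41, 0.38, 0.45]
baseline = [0.33, 0.35, 0.40]
print(absolute_score(ours))               # CAR-A style score
print(incremental_score(ours, baseline))  # CAR-I style score
```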

Protocol for Incremental Training and Testing

Following from the problem definition, once a model is fine-tuned on the supervised portion of a data stream, it is incrementally updated using the unlabelled portion of the same stream and simultaneously tested on it, using the available ground truth (encapsulated in an evaluation script).

Importantly, incremental training and testing must happen independently for each sequence, as we intend to simulate real-world scenarios in which a smart device with continual learning capability can only learn from its own data stream after deployment.

The two challenges differ in one respect: in CAR, the baseline activity recognition model is initially fine-tuned using the supervised folds of all 15 available sequences jointly (since each sequence only portrays a subset of the 8 activity classes), whereas in CCC the supervised fine-tuning happens sequence by sequence (because of the disparate nature of the videos, captured in different settings).
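The difference is easiest to see as a code skeleton (an illustrative sketch only: fine_tune and update_and_predict are stand-ins for the participant's own training and prediction routines, not part of any released toolkit):

```python
from copy import deepcopy

def fine_tune(model, labelled_folds):
    """Stand-in for the participant's supervised fine-tuning routine."""
    return model

def update_and_predict(model, unlabelled_stream):
    """Stand-in for incremental unsupervised updating plus per-frame prediction."""
    return [None for _frame in unlabelled_stream]

def run_car(baseline, sequences):
    """CAR: one joint fine-tuning pass, then each stream is handled in isolation."""
    model0 = fine_tune(baseline, [s["supervised"] for s in sequences])
    return {s["name"]: update_and_predict(deepcopy(model0), s["unlabelled"])
            for s in sequences}

def run_ccc(baseline, sequences):
    """CCC: supervised fine-tuning happens sequence by sequence."""
    return {s["name"]: update_and_predict(
                fine_tune(deepcopy(baseline), [s["supervised"]]),
                s["unlabelled"])
            for s in sequences}
```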

Split into Training, Validation and Test

The data for the challenges are released in two stages:

1. We first release the supervised portion of each data stream, together with a portion of the unlabelled data stream, to be used for validating the continual semi-supervised learning approach proposed by the participants.

2. The remaining portion of the unlabelled data stream for each sequence in the dataset is released at a later stage, to be used for testing the proposed approach.

Consequently, each data stream (sequence) in our benchmarks is divided into a supervised fold (S), a validation fold (V) and a test fold (T).

For the CAR challenge, the supervised fold for each sequence coincides with the first 5-minute video, the validation fold with the second 5-minute video, and the test fold with the third 5-minute video.

For the CCC challenge we distinguish two cases. For the 2,000-frame sequences from either the UCSD or the Mall dataset, S is formed by the first 400 images, V by the following 800 images, and T by the remaining 800 images. For the 750-frame sequence from the FDST dataset, S is the set of the first 150 images, V the set of the following 300 images, and T the set of the remaining 300 images.
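Written out as frame ranges, the CCC splits look as follows (a convenience sketch using 0-based, end-exclusive indices; check the released data for the exact convention):

```python
# S/V/T boundaries for the CCC sequences, as (start, end) frame indices.
SPLITS = {
    "UCSD": {"S": (0, 400), "V": (400, 1200), "T": (1200, 2000)},
    "Mall": {"S": (0, 400), "V": (400, 1200), "T": (1200, 2000)},
    "FDST": {"S": (0, 150), "V": (150, 450),  "T": (450, 750)},
}

def folds(frames, splits):
    """Slice a list of frames into supervised / validation / test folds."""
    return {name: frames[a:b] for name, (a, b) in splits.items()}

# For CAR the same idea applies per 15-minute sequence, with the three
# 5-minute videos playing the roles of S, V and T respectively.
```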

Evaluation

Participants will be able to evaluate the performance of their method(s), on both the incremental and the absolute versions of the challenges, on the EvalAI platform (https://eval.ai).

In Stage 1 participants will, for each task (CAR-A, CAR-I, CCC-A, CCC-I), submit their predictions as generated on the validation folds and get the evaluation metric in return, in order to get a feel for how well their method(s) work. In Stage 2 they will submit the predictions generated on the test folds, which will be used for the final ranking.

A separate ranking will be produced for each of the tasks.

For each challenge stage and each task, the number of submissions through the EvalAI platform is capped at 50, with an additional limit of 5 submissions per day.

Detailed instructions about how to download the data and submit your predictions for evaluation at both validation and test time, for all four tasks, are provided here:

https://sites.google.com/view/sscl-workshop-ijcai-2021/challenges

Baselines

Baselines are provided for the initial action recognition model used in CAR-I, the base crowd counter used in CCC-I, and the semi-supervised incremental learning process itself.

Baseline activity recognition model

As our baseline activity recognition model we adopted the recent EfficientNet network [1] (model EfficientNet-B5), pre-trained on the large-scale ImageNet dataset. Detailed information about its implementation, along with pre-trained models, can be found on GitHub:

https://github.com/lukemelas/EfficientNet-PyTorch

and the package can be installed via pip (pip install efficientnet-pytorch).
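For reference, loading the B5 model with this package and replacing its classification head for the 8 CAR classes can be done as follows (a sketch of the starting point only; the fine-tuning recipe is left to participants, and downloading the pre-trained weights requires a network connection):

```python
import torch
from efficientnet_pytorch import EfficientNet

# ImageNet-pretrained EfficientNet-B5 with a freshly initialised 8-way head.
model = EfficientNet.from_pretrained('efficientnet-b5', num_classes=8)
model.eval()

# Frame-level inference: one class prediction per video frame.
frame = torch.randn(1, 3, 456, 456)  # B5's native input resolution is 456x456
with torch.no_grad():
    logits = model(frame)            # shape (1, 8)
print(logits.argmax(dim=1))          # predicted activity class
```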

Note that the performance of the baseline activity model is rather poor on our challenge videos, as the relevant activities occupy only a small fraction of each video's duration. This leaves ample room for improvement, while faithfully reflecting the difficulty that real-world data poses.

Baseline crowd counter

As our baseline crowd counting model we selected the Multi-Column Convolutional Neural Network (MCNN) [2]. Its implementation, along with pre-trained models, can also be found on GitHub:

https://github.com/svishwa/crowdcount-mcnn

This network is implemented in PyTorch. Pre-trained models are available for both the ShanghaiTech A and the ShanghaiTech B datasets. For this Challenge we chose to adopt the model pre-trained on ShanghaiTech B.
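Whichever implementation is used, inference follows the same pattern: the network outputs a density map whose integral is the count estimate. A minimal sketch (load_mcnn is a hypothetical placeholder for the repository's own loading code; see its test script for the exact functions):

```python
import torch

def predicted_count(model, frame):
    """Run a density-map counter on one frame and integrate its output.

    `model` is any torch module mapping an image tensor to a density map;
    the estimated crowd count is the sum of the predicted map.
    """
    model.eval()
    with torch.no_grad():
        dmap = model(frame)  # density map, e.g. shape (1, 1, H', W')
    return float(dmap.sum())

# Hypothetical usage; `load_mcnn` stands in for the repository's loading code:
# model = load_mcnn('<path to ShanghaiTech B pre-trained weights>')
# count = predicted_count(model, frame_tensor)
```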

Baseline incremental learning approach

Finally, our baseline for incremental learning from the unlabelled data stream is based on a vanilla (batch) self-training approach.

For each sequence, the unlabelled data stream (without distinction between validation and test folds) is partitioned into a number of sub-folds. Each sub-fold spans 1 minute in the CAR challenge, so that each unlabelled sequence is split into 10 sub-folds. Sub-folds span 100 frames in the CCC challenge, so that the UCSD and Mall (unlabelled) data streams are decomposed into 16 sub-folds each, whereas the FDST sequence only contains 6 sub-folds.

Starting with the model initially fine-tuned on the supervised portion of the data stream, self-training is applied iteratively, in a batch fashion, to each sub-fold. The predictions generated by the model obtained after convergence on a sub-fold are taken as the baseline predictions for that sub-fold. The output model of each self-training session is used as the starting model for the following session.
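The following is a minimal, self-contained sketch of such a loop for the classification case (our illustrative PyTorch code, not the official baseline implementation; a toy linear model stands in for the fine-tuned network, and a fixed number of epochs stands in for 'convergence'):

```python
import torch
import torch.nn.functional as F

def self_train_stream(model, optimiser, subfolds, epochs=5):
    """Vanilla batch self-training over an unlabelled stream.

    For each sub-fold: pseudo-label it with the current model, fine-tune on
    those pseudo-labels, record the converged model's predictions as the
    baseline output for that sub-fold, and carry the model forward.
    """
    all_predictions = []
    for batch in subfolds:
        model.eval()
        with torch.no_grad():
            pseudo = model(batch).argmax(dim=1)  # hard pseudo-labels
        model.train()
        for _ in range(epochs):                  # stand-in for convergence
            optimiser.zero_grad()
            loss = F.cross_entropy(model(batch), pseudo)
            loss.backward()
            optimiser.step()
        model.eval()
        with torch.no_grad():                    # predictions for this sub-fold
            all_predictions.append(model(batch).argmax(dim=1))
    return all_predictions

# Toy usage: a linear classifier over random 'frames', split into sub-folds.
torch.manual_seed(0)
model = torch.nn.Linear(32, 8)                   # stand-in for EfficientNet-B5
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
stream = torch.randn(600, 32)                    # 600 unlabelled 'frames'
subfolds = list(stream.split(100))               # 100-frame sub-folds
predictions = self_train_stream(model, optimiser, subfolds)
```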

Reproducibility

Please note that we reserve the right to reproduce participants' results and check their validity. We will certainly reproduce the results of the challenge winners.



[1] Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (pp. 6105-6114). PMLR.

[2] Zhang, Y., Zhou, D., Chen, S., Gao, S., & Ma, Y. (2016). Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 589-597).
