Getting Started With DISTIL & Active Learning

DECILE
Apr 22, 2021 · 7 min read

In this Article

  1. Introduction
  2. Incorporating Custom Models & Data with DISTIL
  3. DISTIL Workflow
  4. Code Walk-through
  5. Conclusion
  6. Video Explanation
  7. Resources
  8. Publications

1. Introduction

DISTIL is a PyTorch toolkit that provides access to a range of active learning algorithms. Active Learning (AL) reduces labeling cost as well as training time and resources by selecting only the data points that are actually needed; experiments show that training on as little as 10% of the data can reach accuracy levels close to those obtained with the entire dataset.

This article provides a step-by-step explanation of how to use DISTIL along with your existing pipelines. The active learning strategies supported by DISTIL are listed below:

  1. Uncertainty Sampling
  2. Margin Sampling
  3. Least Confidence Sampling
  4. FASS
  5. BADGE
  6. GLISTER ACTIVE
  7. CoreSets based Active Learning
  8. Random Sampling
  9. Submodular Sampling
  10. Adversarial BIM
  11. Adversarial DeepFool
  12. Baseline Sampling
  13. BALD
  14. K-Means Sampling

The documentation can be found at: DISTIL Documentation

2. Incorporating Custom Models & Data with DISTIL

There are two main components that need to be incorporated in your code before using DISTIL.

  • Model
DISTIL Code Snippet 1
  1. The model should have a function get_embedding_dim which returns the number of hidden units in the last layer.
  2. The forward function should have a Boolean flag “last” (see the sketch after this list) where:

if True: it should return the model output and the output of the second-to-last layer
if False: it should return only the model output.
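
As an illustration, a minimal model satisfying both requirements might look like the sketch below. The architecture and layer sizes are placeholders, not the model used in the DISTIL examples:

```python
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of a DISTIL-compatible model (hypothetical architecture).
class TwoLayerNet(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=128):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, num_classes)
        self.hidden_dim = hidden_dim

    def forward(self, x, last=False):
        e = F.relu(self.fc1(x))   # output of the second-to-last layer (embedding)
        out = self.fc2(e)         # model output (logits)
        if last:
            return out, e         # last=True: output plus embedding
        return out                # last=False: output only

    def get_embedding_dim(self):
        # Number of hidden units feeding the last layer, as DISTIL expects
        return self.hidden_dim
```

The only DISTIL-specific pieces are the “last” flag and get_embedding_dim; everything else is a normal PyTorch module.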

  • Data Handler

Since active learning works with unlabeled data, default data handlers cannot be used. The custom data handler should provide the following support (a sketch follows the list):

  1. Your data handler class should have a Boolean parameter “use_test_transform” with a default value of False. This parameter identifies whether the handler is being used for test data.
  2. The data handler class should have a Boolean parameter “select” with a default value of True:

If True: it should return only X and not Y (used by the active learning strategies)

If False: it should return both X and Y (used while training the model)
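
A matching data handler might look like the sketch below. Transforms are omitted and the exact return format may differ in your pipeline, so treat this as an illustration rather than the library's canonical handler:

```python
import torch
from torch.utils.data import Dataset

# Minimal sketch of a DISTIL-compatible data handler (illustrative only).
class MyDataHandler(Dataset):
    def __init__(self, X, Y=None, select=True, use_test_transform=False):
        self.X = X
        self.Y = Y
        self.select = select
        self.use_test_transform = use_test_transform  # choose test-time transforms here

    def __getitem__(self, index):
        x = torch.as_tensor(self.X[index], dtype=torch.float)
        if self.select:
            # select=True: used by the AL strategies, so return only the features
            return x
        # select=False: used during training, so return the label as well
        y = torch.as_tensor(self.Y[index], dtype=torch.long)
        return x, y

    def __len__(self):
        return len(self.X)
```

The “select” flag is what lets the same handler serve both the unlabeled selection pass and the supervised training pass.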

That’s it folks! Just a couple of changes to get your model ready for DISTIL.

3. DISTIL Workflow

Now we are ready to work with DISTIL. This section describes the step-by-step workflow of DISTIL and how active learning actually works.

Budget: the number of points added to the training set after every iteration. This needs to be decided before training is initiated.

The yellow boxes in the flow chart denote the initial loop, which runs only once at the start of the process. Let the budget be denoted by n.

  1. There is a set of unlabeled data that will be used to train the model.
  2. First, n random points are selected for the initial round of training.
  3. These points need to be labeled manually.
  4. The model is trained with this labeled data.
  5. After training is completed, DISTIL selects the next set of n data points based on hypothesized labels, gradient embeddings, etc., depending on the active learning algorithm chosen.
  6. These newly selected points are labeled and added to the training data.
  7. The model is trained again with the new training data.
  8. Repeat steps 5–7 until the model reaches the desired test accuracy or the training set reaches the decided size threshold.

4. Code Walk-through

Based on the above steps, let's go through the code step by step using the example provided here: DISTIL Example Code.

Step 1:

DISTIL Code Snippet 3

Lines 60–63 load the unlabeled data, which is in libsvm format. This is just an example; you can load data of your choice.
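
For reference, here is a hedged sketch of loading a libsvm-format file with scikit-learn; the file path is hypothetical, and the example repository uses its own loader:

```python
import numpy as np
from sklearn.datasets import load_svmlight_file

# Load a libsvm-format dataset (hypothetical path); any data source works here.
X_sparse, y = load_svmlight_file("data/train.libsvm")
X = np.asarray(X_sparse.todense(), dtype=np.float32)  # densify the feature matrix
y = np.asarray(y, dtype=np.int64)
```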

Step 2:

DISTIL Code Snippet 4

In lines 96–99, the first set of random points is selected for the initial round of training. In line 100, the points selected for training are removed from the unlabeled set.
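
A sketch of this seed-set selection, continuing from the loading sketch above; the variable names and the seed-set size are assumptions:

```python
n_init = 100  # size of the initial seed set (assumed value)

# Randomly pick the seed points for the first round of training ...
start_idxs = np.random.choice(len(X), size=n_init, replace=False)
X_tr = X[start_idxs]

# ... and remove them from the unlabeled pool
X_unlabeled = np.delete(X, start_idxs, axis=0)
```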

Step 3:

DISTIL Code Snippet 5

Note: Here we have assumed that the labels of the dataset are already available. In a real scenario, these labels won't be present, and the selected points would need to be labeled manually.

In line 102, the selected data points are labeled.
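
Because the example already has ground-truth labels, "labeling" the seed set is just an index lookup, as in this sketch; in a real deployment this is where human annotation happens:

```python
# Ground-truth labels are available in the example, so "labeling" is a lookup.
# In practice, this step is replaced by manual annotation of the selected points.
y_tr = y[start_idxs]
y_unlabeled = np.delete(y, start_idxs, axis=0)  # kept only to simulate later labeling
```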

Step 4:

DISTIL Code Snippet 6

In lines 108–110, the DISTIL object is initialized with the GLISTER strategy. DISTIL provides support for various active learning strategies such as FASS, Margin Sampling, BADGE, BALD, etc., and the strategy is chosen in this step.
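
A hedged sketch of that initialization, continuing from the earlier snippets; the import path, constructor arguments, and argument dictionary are assumptions based on the example, so check the documentation for the exact signature of the strategy you pick:

```python
from distil.active_learning_strategies.glister import GLISTER

nclasses = int(y.max()) + 1                  # number of classes in the dataset
net = TwoLayerNet(X.shape[1], nclasses)      # the model sketched in Section 2

# Hypothetical strategy arguments; keys and values are assumptions.
strategy_args = {'batch_size': 20, 'lr': 0.05}

# Instantiate the GLISTER strategy (constructor signature assumed from the example).
strategy = GLISTER(X_tr, y_tr, X_unlabeled, net, MyDataHandler,
                   nclasses, strategy_args)
```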

DISTIL Code Snippet 7

In lines 120–122, the model is trained with the current labeled training set. DISTIL focuses on decoupling training from active learning: the training loop is completely in the hands of the user, with no restrictions on the way the model is trained. Because of this, DISTIL needs to be made aware of the current model state after training, which is why the model state in DISTIL is updated in line 123.
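
Since the training loop belongs to the user, any PyTorch loop will do. The toy loop below is a stand-in for the example's training class, and only the final update_model call involves DISTIL:

```python
import torch

def train(net, X_train, y_train, epochs=50, lr=0.05):
    # Plain full-batch training loop; purely illustrative user code.
    X_t = torch.as_tensor(X_train, dtype=torch.float)
    y_t = torch.as_tensor(y_train, dtype=torch.long)
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(X_t), y_t)
        loss.backward()
        opt.step()
    return net

clf = train(net, X_tr, y_tr)

# Make DISTIL aware of the freshly trained model state.
strategy.update_model(clf)
```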

Step 5:

DISTIL Code Snippet 8

In line 135, DISTIL is asked to choose a new set of points using the select function, passing the budget, which is the number of points to be selected. Since the newly selected points will need to be labeled, which might take some time, and the loop is not continuous, the state of DISTIL needs to be preserved before the model is trained on the new training set. In line 137, the save_state function is called; it saves the current state of DISTIL, which can be loaded again before the next training iteration starts.
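
A sketch of the selection and state-saving step; select and save_state are the calls named in the example, while the budget value and save path are assumptions:

```python
budget = 100  # number of points to select per round (assumed value)

# Ask the strategy for the indices of the next points to label.
idxs = strategy.select(budget)

# Labeling may happen offline and take a while, so persist DISTIL's state now
# and reload it before the next training iteration (hypothetical path).
strategy.save_state('./distil_state.pkl')
```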

Step 6:

DISTIL Code Snippet 9

In lines 140–141, the newly selected data points are added to the training set and removed from the unlabeled data pool. In lines 144–145, the newly chosen points are labeled. As explained in the previous step, the DISTIL state is saved beforehand because labeling might take some time. In line 151, the previous DISTIL state is loaded, and since training and active learning are decoupled, the new training data is updated in DISTIL as well as in the training class in lines 152–153.
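
A sketch of this bookkeeping step, continuing from the snippets above; load_state mirrors the save_state call, and update_data is an assumed name for "updating the training data in DISTIL":

```python
# Move the chosen points from the unlabeled pool into the training set.
X_tr = np.concatenate([X_tr, X_unlabeled[idxs]], axis=0)
y_tr = np.concatenate([y_tr, y_unlabeled[idxs]], axis=0)  # simulated "labeling"
X_unlabeled = np.delete(X_unlabeled, idxs, axis=0)
y_unlabeled = np.delete(y_unlabeled, idxs, axis=0)

# Restore the saved DISTIL state, then hand the strategy the updated data pools.
strategy.load_state('./distil_state.pkl')      # assumed counterpart of save_state
strategy.update_data(X_tr, y_tr, X_unlabeled)  # assumed method name and argument order
```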

Step 7:

DISTIL Code Snippet 10

In line 155, the model is trained with the updated training set, and in line 156, the new model state is passed to DISTIL using the update_model method of the DISTIL object.
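
Continuing the sketch, this step simply reuses the toy train function from Step 4 along with the update_model call named above:

```python
# Retrain on the enlarged labeled set, then sync the model state back into DISTIL.
clf = train(net, X_tr, y_tr)
strategy.update_model(clf)
```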

Step 8:

DISTIL Code Snippet 11

Steps 5–7 are repeated until a stopping criterion is met. In example.py, the stopping criterion is a fixed number of rounds or a test accuracy above 98%.

5. Conclusion

Thus, DISTIL can be easily incorporated into your code, as it focuses on the following principles:

  1. Minimal changes to add it to the existing training structure.
  2. Independence from the training strategy used.
  3. Achieving similar test accuracy with a smaller amount of training data.
  4. A large reduction in labeling cost and time.
  5. Access to various active learning strategies with just one line of code.

For the latest discussions, join the Decile_DISTIL_Dev group.

You can also refer to the video for a DISTIL tutorial based on this blog.

6. Video Explanation

7. Resources

DISTIL Documentation.

https://decile-team-distil.readthedocs.io/en/latest/

Code Repository:

https://github.com/decile-team/distil

Colab Examples:

https://github.com/decile-team/distil#demo-notebooks

Complete Code to the example discussed in the article:

More about Active Learning & DISTIL:

YouTube Playlist:

8. Publications

[1] Settles, Burr. Active learning literature survey. University of Wisconsin-Madison Department of Computer Sciences, 2009.

[2] Wang, Dan, and Yi Shang. “A new active labeling method for deep learning.” 2014 International joint conference on neural networks (IJCNN). IEEE, 2014

[3] Kai Wei, Rishabh Iyer, Jeff Bilmes, Submodularity in data subset selection and active learning, International Conference on Machine Learning (ICML) 2015

[4] Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. Deep batch active learning by diverse, uncertain gradient lower bounds. CoRR, 2019. URL: http://arxiv.org/abs/1906.03671, arXiv:1906.03671.

[5] Sener, Ozan, and Silvio Savarese. “Active learning for convolutional neural networks: A core-set approach.” ICLR 2018.

[6] Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer, GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning, 35th AAAI Conference on Artificial Intelligence, AAAI 2021

[7] Vishal Kaushal, Rishabh Iyer, Suraj Kothiwade, Rohan Mahadev, Khoshrav Doctor, and Ganesh Ramakrishnan, Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019 Hawaii, USA

[8] Wei, Kai, et al. “Submodular subset selection for large-scale speech training data.” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014.

Author:

Apurva Dani

AI Research & Development

DECILE Research Group
