Semi-supervised learning, active learning and deep learning for classification

Final edit with all resources updated:

For a project, I am applying machine learning algorithms for classification.

I have quite limited labeled data and much more unlabeled data, so I would like to:


  1. Apply semi-supervised classification
  2. Apply a partly supervised labeling process, in which the most informative unlabeled examples are chosen for manual labeling (known as active learning)

I’ve found a lot of information in research papers, for example on applying EM, Transductive SVM or S3VM (Semi-Supervised SVM), or using LDA in some way. There are even a few books on this topic.
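To make the EM idea concrete, here is a minimal sketch of "hard" (classification) EM with one centroid per class, assuming NumPy is available and that each class is roughly a spherical blob. It is a simplified relative of the EM approaches in the literature, not any specific paper's method; the function names and the toy data are hypothetical.

```python
import numpy as np


def hard_em_fit(X_lab, y_lab, X_unlab, n_iter=20):
    """Classification ("hard") EM with one centroid per class (hypothetical helper).

    Assumes spherical classes with equal variance, so the E-step reduces to
    assigning each unlabeled point to its nearest class centroid.
    """
    classes = np.unique(y_lab)
    X_all = np.vstack([X_lab, X_unlab])
    # Initialise centroids from the labeled data only.
    centroids = np.array([X_lab[y_lab == c].mean(axis=0) for c in classes])
    pseudo = None
    for _ in range(n_iter):
        # E-step (hard): pseudo-label unlabeled points by nearest centroid.
        new_pseudo = predict(classes, centroids, X_unlab)
        if pseudo is not None and np.array_equal(new_pseudo, pseudo):
            break  # assignments stopped changing: converged
        pseudo = new_pseudo
        # M-step: re-estimate centroids from labeled + pseudo-labeled points.
        y_all = np.concatenate([y_lab, pseudo])
        centroids = np.array([X_all[y_all == c].mean(axis=0) for c in classes])
    return classes, centroids, pseudo


def predict(classes, centroids, X):
    """Assign each row of X to the class of its nearest centroid."""
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[sq_dist.argmin(axis=1)]


# Toy data: two Gaussian blobs, only 4 labeled points and 200 unlabeled ones.
rng = np.random.default_rng(0)
X_unlab = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y_hidden = np.array([0] * 100 + [1] * 100)  # used only to score the result
X_lab = np.array([[-2.0, -2.0], [-1.5, -2.5], [2.0, 2.0], [2.5, 1.5]])
y_lab = np.array([0, 0, 1, 1])

classes, centroids, pseudo = hard_em_fit(X_lab, y_lab, X_unlab)
print("pseudo-label accuracy:", (pseudo == y_hidden).mean())
```

Labeled points always keep their own labels, so every class retains some support. With overlapping classes, hard assignments can reinforce early mistakes, which is one reason soft (probabilistic) EM is often preferred.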

Where can I find implementations and practical resources?

Final update (based on help provided by mpiktas, bayer, and Dikran Marsupial):

Semi-supervised learning:

Active learning:

  • Dualist: an implementation of active learning for text classification, with source code
  • This webpage offers a wonderful overview of active learning.
  • An experimental design workshop: here.
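A minimal sketch of pool-based active learning with margin-based uncertainty sampling: fit a classifier on the labeled points, ask the oracle (a human annotator in practice; here a hypothetical function over hidden labels) for the label of the pool point the classifier is least sure about, and repeat. The nearest-centroid classifier, the NumPy dependency and all names are assumptions chosen to keep the example short; the resources listed above cover richer models and query strategies.

```python
import numpy as np


def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def margin(centroids, X):
    """Gap between the distances to the two closest centroids.

    A small gap means the point sits near the decision boundary, i.e. the
    classifier is uncertain about it.
    """
    d = np.sqrt(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2))
    d.sort(axis=1)
    return d[:, 1] - d[:, 0]


def active_learning(X_pool, oracle, seed_idx, n_queries):
    """Pool-based active learning with margin (uncertainty) sampling."""
    labeled = list(seed_idx)
    for _ in range(n_queries):
        _, centroids = fit_centroids(X_pool[labeled], oracle(labeled))
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        # Query the unlabeled point the current model is least sure about.
        query = int(unlabeled[margin(centroids, X_pool[unlabeled]).argmin()])
        labeled.append(query)
    classes, centroids = fit_centroids(X_pool[labeled], oracle(labeled))
    return labeled, classes, centroids


rng = np.random.default_rng(1)
X_pool = np.vstack([rng.normal(-2.0, 1.0, (150, 2)), rng.normal(2.0, 1.0, (150, 2))])
y_hidden = np.array([0] * 150 + [1] * 150)


def oracle(idx):
    """Hypothetical stand-in for a human annotator."""
    return y_hidden[idx]


labeled, classes, centroids = active_learning(X_pool, oracle, seed_idx=[0, 150], n_queries=10)
sq = ((X_pool[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
accuracy = (classes[sq.argmin(axis=1)] == y_hidden).mean()
print(f"{len(labeled)} labels requested, pool accuracy {accuracy:.2f}")
```

The seed indices must cover at least two classes, otherwise the margin between the two closest centroids is undefined.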

Deep learning:


It seems as if deep learning might be very interesting for you. This is a very recent field of deep connectionist models, which are pretrained in an unsupervised way and fine-tuned afterwards with supervision. The fine-tuning requires far fewer samples than the pretraining.
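To make the pretrain-then-fine-tune recipe concrete, here is a deliberately tiny NumPy sketch, assuming a single sigmoid hidden layer: an autoencoder is first trained on unlabeled data only, then a logistic output unit is added and the network is fine-tuned on ten labeled examples. Real deep models stack several greedily pretrained layers (e.g. RBMs or autoencoders); the architecture, learning rates, toy data and names here are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pretrain_autoencoder(X, n_hidden, lr=0.1, epochs=500, seed=0):
    """Unsupervised step: learn an encoder by reconstructing X (labels unused)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1, b1 = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)      # encoder
        err = H @ W2 + b2 - X         # linear decoder output minus target
        losses.append(float((err ** 2).mean()))
        # Backpropagate the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_hid = (g_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return W1, b1, losses


def fine_tune(X, y, W1, b1, lr=0.5, enc_lr=0.01, epochs=500, seed=0):
    """Supervised step: add a logistic output unit and train on the few labels."""
    rng = np.random.default_rng(seed)
    W1, b1 = W1.copy(), b1.copy()
    w, b = rng.normal(0.0, 0.1, W1.shape[1]), 0.0
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)
        p = sigmoid(H @ w + b)
        g = (p - y) / len(y)          # d(mean cross-entropy) / d(logit)
        g_hid = np.outer(g, w) * H * (1.0 - H)
        w -= lr * (H.T @ g)
        b -= lr * g.sum()
        # Smaller step for the pretrained encoder.
        W1 -= enc_lr * (X.T @ g_hid)
        b1 -= enc_lr * g_hid.sum(axis=0)
    return W1, b1, w, b


def predict(X, W1, b1, w, b):
    return (sigmoid(sigmoid(X @ W1 + b1) @ w + b) > 0.5).astype(int)


rng = np.random.default_rng(0)


def sample(n_per_class, d=10):
    """Two Gaussian classes with means -1 and +1 in every coordinate."""
    X = np.vstack([rng.normal(-1.0, 1.0, (n_per_class, d)),
                   rng.normal(1.0, 1.0, (n_per_class, d))])
    return X, np.array([0] * n_per_class + [1] * n_per_class)


X_unlab, y_hidden = sample(250)   # y_hidden is never used for training
X_lab, y_lab = sample(5)          # only 10 labeled examples
W1, b1, losses = pretrain_autoencoder(X_unlab, n_hidden=8)
W1, b1, w, b = fine_tune(X_lab, y_lab, W1, b1)
acc = (predict(X_unlab, W1, b1, w, b) == y_hidden).mean()
print(f"reconstruction MSE {losses[0]:.3f} -> {losses[-1]:.3f}; accuracy {acc:.2f}")
```

The encoder gets a much smaller learning rate (`enc_lr`) during fine-tuning, a common choice so that a handful of labeled examples adjust, rather than overwrite, what pretraining learned.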

To whet your appetite, I recommend Semantic Hashing (Salakhutdinov and Hinton). Have a look at the codes it finds for distinct documents of the Reuters corpus (unsupervised!):
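Semantic hashing maps each document to a short binary code so that similar documents land on nearby codes, and retrieval becomes a Hamming-distance lookup. The sketch below shows only that lookup side, assuming NumPy; as a stand-in for the trained deep encoder it uses random-hyperplane projections (SimHash-style locality-sensitive hashing), which is a different and much simpler technique. The two-topic toy "documents" and all names are hypothetical.

```python
import numpy as np


def binary_codes(features, projection):
    """Stand-in encoder: one bit per random hyperplane (SimHash-style LSH).

    Semantic hashing would instead read the bits off the code layer of a
    trained deep autoencoder.
    """
    # Center so that hyperplanes through the origin split the collection.
    centered = features - features.mean(axis=0)
    return (centered @ projection > 0).astype(np.uint8)


def hamming_search(codes, query_code, radius):
    """Return indices of codes within `radius` bit flips of the query."""
    dist = (codes != query_code).sum(axis=1)
    return np.flatnonzero(dist <= radius), dist


rng = np.random.default_rng(0)
n_terms, n_bits = 50, 16
# Two hypothetical "topics": docs 0-19 use the first 25 terms, docs 20-39 the rest.
topic_a = np.r_[np.ones(25), np.zeros(25)]
topic_b = 1.0 - topic_a
docs = np.vstack([topic_a + 0.3 * rng.random((20, n_terms)),
                  topic_b + 0.3 * rng.random((20, n_terms))])

codes = binary_codes(docs, rng.normal(size=(n_terms, n_bits)))
hits, dist = hamming_search(codes, codes[0], radius=3)
print("documents near doc 0:", hits)
```

Salakhutdinov and Hinton use the code itself as a memory address, so neighbours are found by visiting addresses that differ by a few bits rather than by the linear scan used here.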

[Figure: codes found by semantic hashing for distinct documents of the Reuters corpus]

If you need some code implemented, check out [link]. I don’t believe there are out-of-the-box solutions, though.

Source: Link, Question Author: Flake, Answer Author: bayerj
