Goal: obtain a smooth 2D latent representation of MNIST images that clusters naturally by digit, without using the labels (self-supervised learning, SSL).
Performed a few simple experiments focusing on the following questions:
- What augmentations are conducive to a meaningful latent space?
- Contrastive loss vs reconstructive loss
- Various other parameters: temperature, batch size, regularisations on latent space, ...
Evaluation: visual inspection of the latent space and kNN accuracy on a validation set with k = 1, 5, 20.
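A minimal sketch of the kNN evaluation, assuming the frozen encoder's embeddings and labels are available as NumPy arrays (the function name `knn_accuracy` is ours, not from the experiments):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def knn_accuracy(train_z, train_y, val_z, val_y, ks=(1, 5, 20)):
    """Classify each validation embedding by majority vote of its k nearest
    training embeddings and report accuracy for each k."""
    accs = {}
    for k in ks:
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(train_z, train_y)
        accs[k] = clf.score(val_z, val_y)
    return accs
```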
- Contrastive vs. reconstructive loss: contrastive loss works much better. It uses the latent space more efficiently and gives better kNN accuracy.
- Interpretation: reconstruction requires encoding far more information in the latent space than is necessary for class identity, which is what we care about.
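The notes don't pin down the exact contrastive objective; as one common instantiation, here is a minimal NT-Xent (SimCLR-style) sketch in PyTorch, where `temperature` is the temperature parameter mentioned above and `z1`, `z2` are embeddings of two augmented views of the same batch:

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent / InfoNCE loss over a batch of positive pairs (z1[i], z2[i])."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # (2N, d)
    sim = z @ z.t() / temperature             # cosine similarities
    n = z1.shape[0]
    sim.fill_diagonal_(float("-inf"))         # never treat a sample as its own candidate
    # The positive for sample i is its augmented twin at i + n (mod 2n).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```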
- Subtle augmentations that mimic "natural" variation between digits work better than more "artificial" ones.
- Interpretation: data augmentation allows us to guide the model to learn invariances. If the invariances correspond to within-label variance, the representation gravitates towards clustering by labels.
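The exact augmentations used aren't listed here; a sketch of what a "subtle" pipeline could look like with torchvision's `RandomAffine`, with purely illustrative parameter values:

```python
from torchvision import transforms

subtle_augment = transforms.Compose([
    transforms.RandomAffine(
        degrees=10,              # small rotations, like natural handwriting slant
        translate=(0.05, 0.05),  # only slight shifts; strong translations hurt (see below)
        scale=(0.9, 1.1),        # mild size variation
        shear=5,                 # gentle shear, similar to stroke-angle variation
    ),
    transforms.ToTensor(),
])
```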
- L2 regularisation improves kNN performance.
- Interpretation: L2 regularisation forces the embeddings to cluster more tightly, improving the global coherence of the representation.
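One reading of "L2 regularisation on the latent space" is a penalty on the squared norm of the embeddings themselves, added to the contrastive loss; the helper name and weighting below are illustrative only:

```python
import torch


def embedding_l2_penalty(z: torch.Tensor) -> torch.Tensor:
    """Mean squared L2 norm of the (pre-normalisation) embeddings."""
    return z.pow(2).sum(dim=1).mean()


# In the training step, reusing the nt_xent_loss sketch above:
# loss = nt_xent_loss(z1, z2, temperature) \
#        + lambda_l2 * (embedding_l2_penalty(z1) + embedding_l2_penalty(z2))
```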
Clear clusters of similar digits. For example, the "1" (orange) cluster is well separated. Local alignment is strong, though the global structure still has room for improvement.
Original images are shown in the left-most column; remaining columns display applied augmentations.
Strong positional translations didn't work well, presumably because MNIST images are already well-centered, so the generated variation does not reflect natural within-digit variation and is not useful for our goal.
- Might be interesting to find ways to perturb the latent space itself, ideally leading to more global coherence; unclear how, though.
- Maybe a VAE-inspired distribution embedding per image, instead of a single point? (Rough sketch below.)
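A very rough sketch of the "distribution per image" idea, assuming PyTorch: the encoder head outputs a mean and log-variance, and training samples an embedding via the reparameterisation trick. All names are hypothetical and this is not something that was implemented in these experiments.

```python
import torch
import torch.nn as nn


class GaussianHead(nn.Module):
    """Maps encoder features to a 2D Gaussian per image instead of a single point."""

    def __init__(self, feature_dim, latent_dim=2):
        super().__init__()
        self.mu = nn.Linear(feature_dim, latent_dim)
        self.log_var = nn.Linear(feature_dim, latent_dim)

    def forward(self, h):
        mu, log_var = self.mu(h), self.log_var(h)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterisation trick
        # KL divergence to N(0, I) could serve as the latent-space regulariser.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=1).mean()
        return z, kl
```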




