Five ML Concepts - #29
457 words • 3 min read

5 machine learning concepts. Under 30 seconds each.
| Resource | Link |
|---|---|
| Papers | Links in References section |
| Video | Five ML Concepts #29 |
References
| Concept | Reference |
|---|---|
| Neural Collapse | Prevalence of Neural Collapse During the Terminal Phase of Deep Learning Training (Papyan et al. 2020) |
| Grokking | Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets (Power et al. 2022) |
| SAM | Sharpness-Aware Minimization for Efficiently Improving Generalization (Foret et al. 2021) |
| Mechanistic Interpretability | Transformer Circuits (Anthropic 2021) |
| Self-Training Instability | Understanding Self-Training (Wei et al. 2020) |
Today’s Five
1. Neural Collapse
In overparameterized networks trained to zero loss, class representations converge late in training to a symmetric, maximally separated structure. The last-layer features and classifiers align into a simplex equiangular tight frame.
This geometric structure has been observed across a wide range of architectures and datasets.
Like students settling into evenly spaced seats by the end of class.
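The simplex ETF geometry is easy to check directly: the centered, normalized class means all meet pairwise at the same angle, with cosine exactly -1/(K-1). A minimal sketch in plain Python, using the standard construction m_k ∝ e_k − (1/K)·1 (the function names are illustrative):

```python
import math

def simplex_etf(K):
    """K-point simplex equiangular tight frame in R^K:
    m_k = sqrt(K/(K-1)) * (e_k - (1/K) * 1), each of unit norm."""
    scale = math.sqrt(K / (K - 1))
    return [[scale * ((1.0 if i == k else 0.0) - 1.0 / K) for i in range(K)]
            for k in range(K)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

K = 4
frame = simplex_etf(K)
# Every pair of distinct class means meets at the same angle:
# cosine exactly -1/(K-1), the maximal pairwise separation.
for a in range(K):
    for b in range(a + 1, K):
        assert abs(cosine(frame[a], frame[b]) - (-1 / (K - 1))) < 1e-9
```

Neural collapse says the last-layer class means (after centering) converge to exactly this configuration, up to rotation and scaling.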
2. Grokking
In some tasks, especially small algorithmic ones, models memorize quickly but only later suddenly generalize. The jump from memorization to understanding can happen long after training loss reaches zero.
Weight decay and longer training appear necessary for this phase transition.
Like cramming facts for an exam, then later realizing you truly understand.
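The standard grokking testbed is tiny: all pairs of residues under addition mod p, split between train and test. A hedged sketch of that dataset setup (the function name and 50/50 split are illustrative; Power et al. train a small transformer with weight decay on the train half, far past the point where train loss hits zero):

```python
import random

def modular_addition_data(p=97, train_frac=0.5, seed=0):
    """Enumerate every (a, b) -> (a + b) mod p example and split it.
    In grokking runs, test accuracy stays near chance long after the
    train split is memorized, then jumps abruptly."""
    pairs = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    random.Random(seed).shuffle(pairs)
    cut = int(train_frac * len(pairs))
    return pairs[:cut], pairs[cut:]

train_set, test_set = modular_addition_data()
```

The dataset's small, fully enumerable structure is what makes the memorize-then-generalize transition so cleanly observable.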
3. SAM (Sharpness-Aware Minimization)
Instead of minimizing the loss at a single point in weight space, SAM minimizes the worst-case loss within a small neighborhood of the weights, steering optimization toward flatter regions. Flatter minima tend to generalize better than sharp ones.
The optimizer seeks robustness to parameter noise.
Like choosing a wide hilltop instead of balancing on a sharp peak.
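The SAM update is two gradient evaluations per step: climb to the worst nearby point, then descend using the gradient measured there. A minimal sketch on a toy quadratic (`sam_step` is an illustrative name, and the learning rate and toy loss are assumptions; rho = 0.05 echoes a common default from the paper, which applies this inside SGD on real networks):

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step (Foret et al. 2021):
    1) perturb to the adversarial point w + rho * g / ||g||,
    2) update w with the gradient taken at that perturbed point."""
    g = grad_fn(w)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy bowl: loss(w) = sum((w_i - 1)^2), with gradient 2 * (w - 1).
grad = lambda w: [2 * (wi - 1.0) for wi in w]
w = [5.0, -3.0]
for _ in range(200):
    w = sam_step(w, grad)
# w ends up near the minimum at (1, 1).
```

The key design choice is that the descent direction comes from the perturbed point, so sharp minima, where a small perturbation spikes the loss, are penalized.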
4. Mechanistic Interpretability
Researchers analyze activations and internal circuits to understand how specific computations are implemented inside models. The goal is reverse-engineering neural networks into understandable components.
This reveals attention heads, induction heads, and other interpretable patterns.
Like mapping the wiring of an unknown machine to see how it works.
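One concrete object interpretability researchers read is the attention pattern, softmax(QK^T/√d): which key positions each query position attends to. A toy sketch of a hand-built "previous-token head", one building block of the induction circuits described in Transformer Circuits (the one-hot Q/K codes here are contrived purely for illustration):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_pattern(Q, K, causal=True):
    """Rows are query positions, columns are key positions.
    Reading this matrix is how a head's behavior is identified."""
    d = len(K[0])
    pattern = []
    for i, q in enumerate(Q):
        scores = [sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d)
                  for k in K]
        if causal:
            scores = [s if j <= i else -1e9 for j, s in enumerate(scores)]
        pattern.append(softmax(scores))
    return pattern

# Contrived "previous-token head": the query at position i is a scaled
# one-hot code for position i-1, keys are one-hot position codes, so
# almost all attention mass at position i lands on position i-1.
n = 5
Q = [[10.0 if j == i - 1 else 0.0 for j in range(n)] for i in range(n)]
K = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
P = attention_pattern(Q, K)
```

In real models the Q/K maps are learned weight matrices, but the analysis step, inspecting where each row's mass concentrates, is the same.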
5. Self-Training Instability
When models train on their own generated data, feedback loops can amplify small errors over time. Each iteration compounds mistakes, causing distributional drift.
Careful filtering and external grounding help mitigate this.
Like copying a copy repeatedly until the meaning drifts.
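The compounding shows up even in a toy simulation: retrain on your own pseudo-labels each round, corrupt a small fraction of them, and never check against ground truth. The setup and flip rate below are purely illustrative, not taken from the referenced paper:

```python
import random

def self_training_rounds(n=1000, rounds=10, flip=0.02, seed=0):
    """Toy error-amplification model: each round the model retrains on
    its own pseudo-labels and independently corrupts a further `flip`
    fraction. Errors accumulate because wrong labels are never corrected
    against the ground truth."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(n)]
    labels = truth[:]  # round 0 trains on clean labels
    error_history = []
    for _ in range(rounds):
        labels = [1 - y if rng.random() < flip else y for y in labels]
        err = sum(a != b for a, b in zip(labels, truth)) / n
        error_history.append(err)
    return error_history

hist = self_training_rounds()
# The error rate drifts upward round over round: with no external
# grounding, nothing pulls the labels back toward the truth.
```

Filtering (dropping low-confidence pseudo-labels) or mixing in verified data corresponds to re-anchoring `labels` against `truth`, which is exactly what this loop omits.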
Quick Reference
| Concept | One-liner |
|---|---|
| Neural Collapse | Late-stage geometric convergence of class representations |
| Grokking | Sudden generalization after prolonged memorization |
| SAM | Optimizing for flat loss regions under perturbations |
| Mechanistic Interpretability | Analyzing internal circuits of neural networks |
| Self-Training Instability | Feedback loops that amplify errors in self-generated data |
Short, accurate ML explainers. Follow for more.
Part 29 of the Five ML Concepts series.
