Five ML Concepts - #26 | Software Wrighter Lab Blog

Five ML concepts in under 30 seconds each: Data Augmentation (expanding training data with transformations), Caching Strategies (reducing latency by reusing computation), Constitutional AI (training models to follow explicit principles), Goodhart's Law (optimizing a proxy metric can distort the true objective), Manifold Hypothesis (data lies on or near lower-dimensional structures).

5 machine learning concepts. Under 30 seconds each.

Resource	Link
Papers	Links in References section
Video	Five ML Concepts #26
Comments	Discord

References

Concept	Reference
Data Augmentation	A survey on Image Data Augmentation for Deep Learning (Shorten & Khoshgoftaar 2019)
Caching Strategies	Systems engineering practice (no canonical paper)
Constitutional AI	Constitutional AI: Harmlessness from AI Feedback (Bai et al. 2022)
Goodhart’s Law	Goodhart’s Law and Machine Learning (Sevilla et al. 2022)
Manifold Hypothesis	An Introduction to Variational Autoencoders (Kingma & Welling 2019)

Today’s Five

1. Data Augmentation

Creating additional training examples using label-preserving transformations. Rotate, flip, crop, or color-shift images without changing what they represent.

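This effectively increases dataset size and tends to improve generalization, as long as each transformation really preserves the label (a rotated 6 can turn into a 9).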

Like practicing piano pieces at different tempos to build flexibility.
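As a rough illustration of the transformations above, here is a minimal NumPy sketch. The `augment` function, its `pad` and `max_shift` parameters, the random stand-in image, and the label value are all assumptions made for this example, not part of any particular library.

```python
import numpy as np

def augment(image, rng, pad=4, max_shift=0.1):
    """Return a randomly flipped, cropped, and brightness-shifted copy
    of an H x W x C image with values in [0, 1]."""
    h, w, _ = image.shape
    out = image
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Random crop: reflect-pad the borders, then cut an H x W window
    # at a random offset, so the output keeps the original shape.
    padded = np.pad(out, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    out = padded[top:top + h, left:left + w, :]
    # Small brightness (color) shift, clipped back into the valid range.
    out = np.clip(out + rng.uniform(-max_shift, max_shift), 0.0, 1.0)
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))   # stand-in for a real training image
label = 7                         # hypothetical class label

# Each augmented copy keeps the original label.
augmented = [(augment(image, rng), label) for _ in range(4)]
print(len(augmented), augmented[0][0].shape)
```

Whether a given transformation is label-preserving depends on the task, so the set of transformations is usually chosen per dataset.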

2. Caching Strategies

Storing previous computation results to reduce repeated work and latency. Cache embeddings, attention key-value (KV) states, or frequently requested outputs.

Essential for production inference at scale.

Like keeping frequently used books on your desk instead of the library.
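A minimal sketch of output caching, using Python's standard `functools.lru_cache`. The `embed` function here is a hypothetical stand-in for an expensive model call, and the `calls` counter exists only to show how often the underlying computation actually runs.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how many times the "model" really runs

@lru_cache(maxsize=1024)
def embed(text: str) -> tuple:
    """Hypothetical stand-in for an expensive embedding model call."""
    calls["count"] += 1
    # Toy deterministic "embedding"; a real system would call a model here.
    return tuple(float(ord(c) % 7) for c in text[:8])

queries = ["hello", "world", "hello", "hello", "world"]
vectors = [embed(q) for q in queries]

info = embed.cache_info()
print(calls["count"], info.hits, info.misses)
```

Only the two distinct queries trigger the computation; the three repeats are served from the cache. In production the same idea is typically paired with an eviction policy (here, least recently used) and invalidation whenever the model changes.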

3. Constitutional AI

Training models to follow explicit written principles alongside other alignment methods. The constitution provides clear rules for behavior.

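Models critique and revise their own outputs against these principles. The revised outputs become fine-tuning data, and AI-generated preference labels stand in for human harmlessness labels during reinforcement learning.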

Like giving someone written house rules instead of vague instructions.
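The critique-and-revise loop can be sketched in a few lines. This is a toy under heavy assumptions: in the actual method a language model generates, critiques, and revises through prompting, whereas `toy_generate`, `toy_critique`, and `toy_revise` below are hypothetical rule-based stand-ins, and the one-rule `CONSTITUTION` is invented for illustration.

```python
from typing import List

# Hypothetical one-rule "constitution": principle text mapped to words that violate it.
CONSTITUTION = {"Do not insult the user.": ["stupid", "idiot"]}

def toy_generate(prompt: str) -> str:
    """Stand-in for the model's initial, unrevised response."""
    return "That is a stupid question. The capital of France is Paris."

def toy_critique(response: str, principle: str) -> List[str]:
    """Stand-in for a model-written critique: list the offending words."""
    return [w for w in CONSTITUTION[principle] if w in response.lower()]

def toy_revise(response: str, offending: List[str]) -> str:
    """Stand-in for a model-written revision: drop offending sentences."""
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    kept = [s for s in sentences if not any(w in s.lower() for w in offending)]
    return ". ".join(kept) + "."

def critique_and_revise(prompt: str, principles: List[str]) -> str:
    response = toy_generate(prompt)
    for principle in principles:
        offending = toy_critique(response, principle)
        if offending:  # revise only when the critique found a violation
            response = toy_revise(response, offending)
    return response

prompt = "What is the capital of France?"
revised = critique_and_revise(prompt, list(CONSTITUTION))

# The (prompt, revised response) pair is what would be kept as fine-tuning data.
finetune_example = {"prompt": prompt, "response": revised}
print(finetune_example)
```

The point of the sketch is the control flow: generate, check against each principle, revise, and keep the revised output for training.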

4. Goodhart’s Law

When a measure becomes a target, it can stop being a good measure. Optimizing for a proxy metric can diverge from the true objective.

A core challenge in reward modeling and evaluation design.

Like studying only for the test instead of learning the subject.
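A small numeric sketch of the divergence, with invented functions: `proxy_metric` keeps rewarding larger `x`, while `true_objective` peaks at `x = 10` and then declines. Both functions and the step size are assumptions chosen to make the effect visible.

```python
def true_objective(x: float) -> float:
    """What we actually care about: improves up to x = 10, then degrades."""
    return x - 0.05 * x ** 2

def proxy_metric(x: float) -> float:
    """What we measure and optimize: agrees with the true objective only early on."""
    return x

x = 0.0
history = []
for step in range(40):
    x += 1.0  # one step of ascent on the proxy (its gradient is always +1)
    history.append((x, proxy_metric(x), true_objective(x)))

# The point where the true objective was best, versus where optimization ended.
best_x, best_proxy, best_true = max(history, key=lambda row: row[2])
final_x, final_proxy, final_true = history[-1]
print(best_x, best_true)
print(final_x, final_proxy, final_true)
```

The proxy score rises at every step, yet the true objective ends far below its peak. Tracking a held-out measure of the real goal alongside the optimized metric is one common way to notice this.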

5. Manifold Hypothesis

The idea that real-world data lies on or near lower-dimensional structures within high-dimensional space. Images of faces don’t fill all possible pixel combinations.

This structure is what representation learning exploits.

Like faces varying along a few key features instead of every pixel independently.
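A minimal NumPy sketch of the idea, using the simplest possible case: synthetic data generated from 2 latent factors, embedded linearly in 50 dimensions with a little noise. All sizes and the noise level are assumptions for illustration. Real data manifolds are generally curved, so a linear method like this would not capture them as cleanly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, latent_dim, ambient_dim = 500, 2, 50

# Each point is determined by 2 latent factors...
latent = rng.normal(size=(n, latent_dim))
# ...mapped into 50 dimensions by a fixed random linear embedding, plus small noise.
embedding = rng.normal(size=(latent_dim, ambient_dim))
data = latent @ embedding + 0.01 * rng.normal(size=(n, ambient_dim))

# Principal component analysis via SVD of the centered data.
centered = data - data.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values ** 2 / np.sum(singular_values ** 2)

# Fraction of total variance captured by the first 2 directions.
top2 = explained[:latent_dim].sum()
print(f"variance explained by {latent_dim} of {ambient_dim} directions: {top2:.4f}")
```

Although every point has 50 coordinates, almost all of the variance sits in 2 directions, which is the kind of structure representation learning tries to find.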

Quick Reference

Concept	One-liner
Data Augmentation	Expanding training data with transformations
Caching Strategies	Reducing latency by reusing computation
Constitutional AI	Training models to follow explicit principles
Goodhart’s Law	Optimizing a proxy metric can distort the true objective
Manifold Hypothesis	Data lies on or near lower-dimensional structures
