# Five ML Concepts - #15
470 words • 3 min read

5 machine learning concepts. Under 30 seconds each.
| Resource | Link |
|---|---|
| Papers | Links in References section |
| Video | Five ML Concepts #15 |
## References
| Concept | Reference |
|---|---|
| Perplexity | A Neural Probabilistic Language Model (Bengio et al. 2003) |
| Catastrophic Forgetting | Overcoming Catastrophic Forgetting in Neural Networks (Kirkpatrick et al. 2017) |
| Weight Initialization | Understanding the Difficulty of Training Deep Feedforward Neural Networks (Glorot & Bengio 2010) |
| Curse of Dimensionality | The Elements of Statistical Learning (Hastie et al. 2009), Chapter 2 |
| Monitoring & Drift | Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift (Rabanser et al. 2019) |
## Today’s Five
1. Perplexity
A metric for language models that reflects how well the model predicts the next token. Lower perplexity means better predictive performance.
Perplexity is the exponentiated average negative log-likelihood per token.
Like a test where lower scores mean you found the answers easier to guess.
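The definition above can be sketched in a few lines (a minimal illustration; the log-probabilities below are made-up values, not the output of a real model):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_log_probs: natural-log probabilities the model assigned
    to each observed token (hypothetical values for illustration).
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token
# has perplexity 4: on average it is as "surprised" as if
# choosing uniformly among 4 options.
print(perplexity([math.log(0.25)] * 10))
```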
2. Catastrophic Forgetting
When training on new tasks causes a model to lose performance on previously learned tasks. This is a key challenge in continual learning.
Techniques like elastic weight consolidation help preserve important weights.
Like learning a new phone number and forgetting the old one.
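The core of elastic weight consolidation is a quadratic penalty that anchors weights the old task relied on. A toy sketch (in practice `fisher` is estimated from the old task's Fisher information, and the penalty is added to the new task's loss):

```python
def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    theta      -- current weights while training the new task
    theta_star -- weights after the old task
    fisher     -- per-weight importance estimates (toy values here)
    """
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for t, ts, f in zip(theta, theta_star, fisher)
    )
```

Weights with high importance are pulled strongly back toward their old values; unimportant weights stay free to adapt to the new task.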
3. Weight Initialization
The starting values of model weights influence how well training progresses. Poor initialization can cause vanishing or exploding gradients.
Xavier and He initialization are common strategies for setting initial weights appropriately.
Like starting a race from a good position instead of stuck in a ditch.
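Both strategies can be sketched in plain Python (a minimal illustration of the scaling rules; real frameworks ship these as built-in initializers):

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng=random):
    """Glorot/Xavier uniform: U(-a, a) with a = sqrt(6 / (fan_in + fan_out)).

    Keeps activation variance roughly stable for tanh-like layers.
    """
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_out)] for _ in range(fan_in)]

def he_normal(fan_in, fan_out, rng=random):
    """He initialization: N(0, sqrt(2 / fan_in)), suited to ReLU layers."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]
```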
4. Curse of Dimensionality
In high-dimensional spaces, data becomes sparse and distances behave differently, making learning harder. Points that seem close in low dimensions can be far apart in high dimensions.
Feature selection and dimensionality reduction help mitigate this effect.
Like searching for a friend in a city versus across the entire universe.
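Distance concentration is easy to see numerically (a toy experiment; the point count and seed are arbitrary choices):

```python
import math
import random

def min_max_distance_ratio(dim, n_points=200, seed=0):
    """Sample random points in the unit cube and return
    (nearest distance) / (farthest distance) from the origin.

    In high dimensions this ratio approaches 1: every point is
    roughly equally far away, so "nearest neighbor" loses meaning.
    """
    rng = random.Random(seed)
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum(x * x for x in p)))
    return min(dists) / max(dists)

print(min_max_distance_ratio(2))    # low dimension: some points are much closer
print(min_max_distance_ratio(500))  # high dimension: ratio near 1
```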
5. Monitoring & Drift Detection
Continuously tracking model performance and detecting shifts in input data distributions. Production models can degrade silently without proper monitoring.
Automated alerts help catch problems before they affect users.
Like a weather station alerting you when conditions change.
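One common check for drift in a single feature is the two-sample Kolmogorov–Smirnov statistic, sketched here in plain Python (the `drifted` helper and its 0.2 threshold are hypothetical placeholders; production systems typically use a statistics library and calibrated thresholds):

```python
import bisect

def ks_statistic(ref, cur):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of a reference window and a
    current window of one feature. 0 = identical, 1 = disjoint.
    """
    ref, cur = sorted(ref), sorted(cur)
    points = sorted(set(ref) | set(cur))

    def ecdf(sample, x):
        return bisect.bisect_right(sample, x) / len(sample)

    return max(abs(ecdf(ref, x) - ecdf(cur, x)) for x in points)

def drifted(ref, cur, threshold=0.2):
    """Hypothetical alert rule: flag drift when the gap between
    distributions exceeds an arbitrary threshold."""
    return ks_statistic(ref, cur) > threshold
```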
## Quick Reference
| Concept | One-liner |
|---|---|
| Perplexity | How surprised the model is by the data |
| Catastrophic Forgetting | New learning erases old knowledge |
| Weight Initialization | Starting values affect training stability |
| Curse of Dimensionality | High dimensions make data sparse |
| Monitoring & Drift | Track performance and data changes |
Short, accurate ML explainers. Follow for more.
Part 15 of the Five ML Concepts series. View all parts | Next: Part 16 →
