5 machine learning concepts. Under 30 seconds each.

Resource Link
Papers See References section
Video Five ML Concepts #2

References

Concept Reference
Gradient Descent An overview of gradient descent optimization algorithms (Ruder 2016)
Attention Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al. 2014)
DPO Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al. 2023)
Learning Rate Cyclical Learning Rates for Training Neural Networks (Smith 2015)
Temperature On the Properties of Neural Machine Translation: Encoder-Decoder Approaches (Cho et al. 2014)

Today’s Five

1. Gradient Descent

A general optimization method used across machine learning. It improves a model by taking small steps in the direction of steepest error decrease, the negative gradient.

Many learning algorithms rely on it, especially neural networks.

Like walking downhill in fog, adjusting each step based on the slope beneath your feet.
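The downhill walk can be sketched in a few lines. A minimal example, minimizing f(x) = x² (whose gradient is 2x); the function name and step count are illustrative:

```python
def gradient_descent(x0, lr=0.1, steps=100):
    """Minimize f(x) = x^2 by repeatedly stepping against the slope."""
    x = x0
    for _ in range(steps):
        grad = 2 * x      # slope of f at the current point
        x -= lr * grad    # small step downhill
    return x

x_min = gradient_descent(x0=5.0)  # ends up very close to 0, the minimum
```

Real models do the same thing, just over millions of parameters at once, with the gradient computed by backpropagation.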

2. Attention

A mechanism that lets models weigh different parts of the input by importance. Instead of treating everything equally, attention highlights what matters most.

This was key to breakthroughs in translation and language models.

Like reading a sentence and focusing more on the important words.

3. DPO (Direct Preference Optimization)

A method for aligning language models with human preferences. Unlike RLHF, which first fits a reward model and then runs reinforcement learning against it, DPO trains directly on preference pairs and skips the explicit reward model.

This simplifies training while achieving comparable alignment.

Like learning preferences by observing choices, not by designing a scoring system.

4. Learning Rate

Controls how large each update step is during training. Too large and learning becomes unstable. Too small and training is slow or gets stuck.

One of the most important hyperparameters to tune.

Like choosing how fast to walk downhill without losing balance.

5. Temperature

A parameter that controls randomness during text generation. Low temperature favors predictable, high-probability outputs. Higher temperature increases variety and surprise.

A tradeoff between consistency and creativity.

Like adjusting a dial from cautious to adventurous.

Quick Reference

Concept One-liner
Gradient Descent Walk downhill to minimize error
Attention Focus on what matters in the input
DPO Align models from preference pairs directly
Learning Rate Step size that balances speed and stability
Temperature Dial between predictable and creative

Short, accurate ML explainers. Follow for more.