Five ML Concepts - #6
491 words • 3 min read

5 machine learning concepts. Under 30 seconds each.
| Resource | Link |
|---|---|
| Papers | Links in References section |
| Video | Five ML Concepts #6 |
References
| Concept | Reference |
|---|---|
| Regularization | Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava et al. 2014) |
| BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al. 2018) |
| RoPE | RoFormer: Enhanced Transformer with Rotary Position Embedding (Su et al. 2021) |
| Prompting | Language Models are Few-Shot Learners (Brown et al. 2020) |
| Positional Encoding | Attention Is All You Need (Vaswani et al. 2017) |
Today’s Five
1. Regularization
Techniques that reduce overfitting by adding constraints or penalties during training. Common examples include L2 weight decay, L1 sparsity, dropout, and early stopping.
The goal is better generalization, not just fitting the training set.
Like adding friction so a model can’t take the easiest overfit path.
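A minimal sketch of two of the techniques above, L2 weight decay and inverted dropout, in plain numpy (illustrative only, not a training loop):

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalty(weights, lam=1e-2):
    # L2 weight decay: add lam * ||w||^2 to the loss,
    # which nudges weights toward zero and discourages
    # fitting noise with large parameters.
    return lam * sum(float(np.sum(w ** 2)) for w in weights)

def dropout(x, p=0.5, training=True):
    # Inverted dropout: at training time, zero each activation
    # with probability p and rescale survivors by 1/(1-p) so the
    # expected activation is unchanged. At inference, pass through.
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```

At inference (`training=False`) dropout is a no-op, which is why the rescaling happens at training time.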
2. BERT
Bidirectional Encoder Representations from Transformers. A transformer encoder trained with masked language modeling: predicting hidden tokens using context from both sides.
It was a major step forward for many NLP tasks after its 2018 release.
Like filling in blanks by reading the whole sentence, not just the words before it.
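The masking step can be sketched in a few lines. This is a simplified version of BERT's input corruption (real BERT also leaves some selected positions unchanged or swaps in random tokens); the function name and token strings are illustrative:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    # BERT-style masked language modeling input: hide a fraction of
    # tokens behind [MASK]. The model must predict the originals
    # using context from both directions.
    rng = random.Random(seed)
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append((i, tok))   # remember what was hidden, and where
        else:
            masked.append(tok)
    return masked, targets
```

The `targets` list is what the training loss is computed against: only the hidden positions are scored.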
3. RoPE (Rotary Positional Embeddings)
A way to represent token position inside attention by rotating query and key vectors as a function of position. This gives attention information about relative order and distance.
It’s widely used in modern transformer models.
Like turning a dial differently for each position so the model can tell where tokens are.
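A minimal numpy sketch of the rotation, assuming the common pairing of consecutive feature dimensions (implementations differ in how they pair dimensions, but the relative-position property is the same):

```python
import numpy as np

def rope(x, base=10000.0):
    # Rotary positional embedding: rotate each consecutive pair of
    # feature dimensions by an angle that grows linearly with the
    # token's position, at a different frequency per pair.
    # x: (seq_len, dim), dim even.
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) * 2.0 / dim)          # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

The key property: the dot product between a rotated query at position m and a rotated key at position n depends only on the offset n − m, which is how attention sees relative distance.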
4. Prompting
Crafting inputs to steer a model toward the output you want. Small changes in instructions, examples, or format can change behavior significantly.
A key skill for working effectively with language models.
Like asking a question in just the right way to get a useful answer.
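One concrete prompting pattern from the cited paper is few-shot prompting: prepend a handful of worked examples so the model infers the task from the pattern. A tiny sketch (the `Input:`/`Output:` format is just one common convention, not a fixed API):

```python
def few_shot_prompt(examples, query):
    # Few-shot prompting: show labeled input/output pairs, then
    # pose the new input and leave the output for the model.
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)
```

The examples do double duty: they specify the task and fix the output format.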
5. Positional Encoding
Transformers need a way to represent token order, because attention alone doesn’t include sequence position. Different methods do this, including fixed sinusoidal encodings, learned embeddings, and rotary approaches like RoPE.
Without it, “the cat sat on the mat” would be indistinguishable from “mat the on sat cat the.”
Like numbering the pages of a shuffled book so you can read them in order.
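The original sinusoidal scheme from Attention Is All You Need can be written in a few lines: even dimensions get a sine, odd dimensions a cosine, at wavelengths that grow geometrically across the feature dimension:

```python
import numpy as np

def sinusoidal_encoding(seq_len, dim, base=10000.0):
    # Sinusoidal positional encoding (Vaswani et al. 2017):
    # PE[pos, 2i]   = sin(pos / base^(2i/dim))
    # PE[pos, 2i+1] = cos(pos / base^(2i/dim))
    pos = np.arange(seq_len)[:, None]          # (seq, 1)
    i = np.arange(dim // 2)[None, :]           # (1, dim/2)
    angles = pos / base ** (2 * i / dim)       # (seq, dim/2)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe
```

These vectors are simply added to the token embeddings, giving every position a distinct, fixed signature.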
Quick Reference
| Concept | One-liner |
|---|---|
| Regularization | Add constraints to prevent overfitting |
| BERT | Bidirectional masked language modeling |
| RoPE | Position info via rotation in attention |
| Prompting | Craft inputs to steer model outputs |
| Positional Encoding | Tell the model where tokens are in sequence |
Short, accurate ML explainers. Follow for more.
Part 6 of the Five ML Concepts series. View all parts | Next: Part 7 →
