Five ML Concepts - #11
503 words • 3 min read

5 machine learning concepts. Under 30 seconds each.
| Resource | Link |
|---|---|
| Papers | Links in References section |
| Video | Five ML Concepts #11 |
References
| Concept | Reference |
|---|---|
| RNN | Learning representations by back-propagating errors (Rumelhart et al. 1986) |
| Chain of Thought | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al. 2022) |
| Softmax | Deep Learning (Goodfellow et al. 2016), Chapter 6 |
| MoE | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (Shazeer et al. 2017) |
| Distribution Shift | Dataset Shift in Machine Learning (Quiñonero-Candela et al. 2009) |
Today’s Five
1. RNN (Recurrent Neural Network)
Networks designed for sequential data: they maintain a hidden state that carries information across time steps, which makes them useful for language, time series, and audio.
LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units) are improved variants that better handle long-range dependencies.
Like reading a story while keeping mental notes about characters and plot as you go.
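A minimal NumPy sketch of the recurrence, assuming a vanilla tanh cell; the dimensions and random weights are illustrative, not from any particular model:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: mix the current input with the
    previous hidden state to produce the new hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions, purely illustrative
input_dim, hidden_dim, seq_len = 4, 8, 5
rng = np.random.default_rng(0)
W_xh = 0.1 * rng.normal(size=(input_dim, hidden_dim))
W_hh = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                   # initial hidden state: blank memory
for t in range(seq_len):
    x_t = rng.normal(size=input_dim)       # stand-in for one token or time step
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # memory carried forward to the next step
```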
2. Chain of Thought
A prompting technique that encourages step-by-step reasoning in language models. Instead of producing an answer immediately, the model generates intermediate steps.
This can improve performance on math, logic, and multi-step problems.
Like showing your work on a math test instead of just writing the final answer.
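A sketch of the prompt-level difference, with wording invented here rather than taken from Wei et al.; the chain-of-thought prompt includes one worked exemplar so the model imitates step-by-step reasoning on the new question:

```python
# Illustrative prompts only; exact phrasing and model behavior vary.
direct_prompt = (
    "Q: A bat and ball cost $1.10 total. The bat costs $1.00 more than the ball. "
    "How much is the ball?\n"
    "A:"
)

cot_prompt = (
    "Q: A bat and ball cost $1.10 total. The bat costs $1.00 more than the ball. "
    "How much is the ball?\n"
    "A: Let's think step by step. Let the ball cost x. The bat costs x + 1.00, "
    "so x + (x + 1.00) = 1.10, giving 2x = 0.10 and x = 0.05. "
    "The ball costs $0.05.\n\n"
    "Q: I have 3 apples and buy 2 bags of 4 apples each. How many apples do I have?\n"
    "A:"
)
```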
3. Softmax
Converts a vector of scores into a probability distribution where each output falls between zero and one, and all outputs sum to one. It is commonly used in classification models.
Softmax makes raw scores easier to interpret as probabilities.
Like turning test scores into percentages that add up to 100%.
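In code, softmax is a few lines; the standard numerically stable version subtracts the maximum score before exponentiating, which leaves the result unchanged but prevents overflow:

```python
import numpy as np

def softmax(scores):
    """Stable softmax: shifting all scores by a constant does not
    change the output, so subtract the max before exp()."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # ≈ [0.659 0.242 0.099]
print(probs.sum())  # 1.0 (up to float rounding)
```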
4. MoE (Mixture of Experts)
Instead of one large network, the model contains many smaller expert networks with a routing mechanism that selects which experts process each input. This allows models to scale capacity while keeping computation efficient.
Only a subset of experts activates for any given input.
Like a hospital with specialists where a receptionist directs you to the right doctor.
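A toy sketch of top-k gating in the spirit of Shazeer et al. (2017), with illustrative linear "experts" and none of the noise or load-balancing machinery from the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_forward(x, experts, W_gate, k=2):
    """Sparse MoE sketch: score every expert, keep only the top-k,
    renormalize their gate weights, and run just those experts."""
    gate_logits = x @ W_gate               # one score per expert
    top_k = np.argsort(gate_logits)[-k:]   # indices of the k best experts
    weights = softmax(gate_logits[top_k])  # renormalize over the chosen k
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

# Toy setup: 4 "experts", each a random linear map (illustrative only)
rng = np.random.default_rng(0)
dim, n_experts = 8, 4
expert_mats = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
experts = [lambda x, M=M: x @ M for M in expert_mats]
W_gate = rng.normal(size=(dim, n_experts))

y = moe_forward(rng.normal(size=dim), experts, W_gate, k=2)
```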
5. Distribution Shift
Occurs when deployment data differs from training data, causing a model trained in one environment to perform poorly in another. Common causes include seasonal changes, shifts in user behavior, and new user populations.
Monitoring for drift and retraining helps maintain performance.
Like a weather model trained on summer data struggling to predict winter storms.
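One simple way to watch for drift on a single feature is a two-sample Kolmogorov-Smirnov test; this sketch uses synthetic data, and the alert threshold is a judgment call, not a standard:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # "summer" training data
live_feature = rng.normal(loc=0.8, scale=1.2, size=5000)   # shifted "winter" data

# Small p-value: the live distribution likely no longer matches training.
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:  # threshold chosen for illustration
    print(f"Possible drift detected (KS={stat:.3f}, p={p_value:.2g})")
```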
Quick Reference
| Concept | One-liner |
|---|---|
| RNN | Sequential processing with memory across time |
| Chain of Thought | Step-by-step reasoning in prompts |
| Softmax | Scores to normalized probabilities |
| MoE | Route inputs to specialized experts |
| Distribution Shift | Training vs deployment data mismatch |
Short, accurate ML explainers. Follow for more.
Part 11 of the Five ML Concepts series. View all parts | Next: Part 12 →
