6 Feb 2025

Entropy: Measuring Uncertainty in Data

Entropy is a key concept in Machine Learning and Deep Learning. It measures uncertainty and is used in decision trees, classification models, and loss functions such as cross-entropy.

A Decision Tree is like a flowchart where each decision splits the data into smaller groups based on conditions. It's used for both classification (e.g., "Is this email spam?") and regression (e.g., "How much will this house sell for?"). At each step, it picks the best feature to split the data, creating branches. The process continues until a stopping condition (like max depth) is reached.
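As a quick illustration (not part of the original example), here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the toy dataset and its numeric encoding are made up:

from sklearn.tree import DecisionTreeClassifier

# Toy dataset: [weather, temperature], with weather encoded as 0 = Sunny, 1 = Rainy
X = [[0, 30], [0, 25], [1, 20], [1, 15]]
y = ["Play", "Play", "Stay in", "Stay in"]  # Class labels

# criterion="entropy" makes the tree pick splits that reduce entropy the most;
# max_depth is the stopping condition mentioned above
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2)
tree.fit(X, y)

print(tree.predict([[0, 28]]))  # Should print something like ['Play'] for a warm, sunny day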

A classification model is a type of machine learning model that categorizes data into classes or labels. It answers "What category does this belong to?", like classifying emails as spam or not spam, detecting cats vs. dogs, or identifying fraudulent transactions.

🧠 1. What is Entropy?

Entropy tells us how unpredictable something is:

  • Low entropy = more predictable (e.g., always sunny 🌞).
  • High entropy = less predictable (e.g., random weather ☀️🌧️❄️).

🎯 Example: Coin Flip

  • A fair coin (50% heads, 50% tails) has high entropy.
  • A biased coin (90% heads, 10% tails) has lower entropy (more predictable).

πŸ“ 2. How is Entropy Calculated?

The formula for entropy is:

H = -∑ P(x) * log₂(P(x))

Where:

  • H = Entropy (uncertainty)
  • P(x) = Probability of an event happening
  • log₂(P(x)) = Base-2 logarithm of that probability
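For example, plugging the fair coin (50% heads, 50% tails) into the formula:

H = -(0.5 * log₂(0.5) + 0.5 * log₂(0.5)) = -(0.5 * (-1) + 0.5 * (-1)) = 1 bit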

πŸ“ 3. Calculating Entropy with NumPy

🔹 Fair Coin (Maximum Entropy)

import numpy as np

p = np.array([0.5, 0.5])  # Probability of heads & tails
entropy = -np.sum(p * np.log2(p))

print(f"Entropy of a fair coin: {entropy:.2f} bits")  # Output: 1.00

✅ Entropy = 1.0 bit (maximum uncertainty).

🔹 Biased Coin (Lower Entropy)

p = np.array([0.9, 0.1])  # 90% heads, 10% tails (np imported above)
entropy = -np.sum(p * np.log2(p))

print(f"Entropy of a biased coin: {entropy:.2f} bits")  # Output: ~0.47

✅ Lower entropy means less randomness!

🌳 4. Entropy in Decision Trees (Feature Selection)

In Decision Trees, entropy helps decide which feature is the best to split on.

🔹 Example: Should we play outside?

Weather   Play? (Yes/No)
Sunny     Yes
Sunny     Yes
Rainy     No
Cloudy    Yes
Rainy     No

If we calculate the entropy of the groups created by each candidate split, we pick the feature that reduces entropy the most (this reduction is called information gain), as the sketch below shows.

👉 Lower entropy = better decision-making!
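To see this in numbers, here is a minimal sketch (not from the original post) that computes the entropy before and after splitting on Weather for the toy table above; the entropy helper is defined here just for illustration:

import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

weather = ["Sunny", "Sunny", "Rainy", "Cloudy", "Rainy"]
play    = ["Yes", "Yes", "No", "Yes", "No"]

# Entropy of the full dataset (3 Yes, 2 No)
print(f"Entropy before split: {entropy(play):.3f} bits")  # ~0.971

# Weighted entropy of the subsets created by splitting on Weather
total = len(play)
after = sum(
    (weather.count(w) / total) * entropy([p for w2, p in zip(weather, play) if w2 == w])
    for w in set(weather)
)
print(f"Entropy after split:  {after:.3f} bits")  # 0.000, so information gain is ~0.971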

📈 5. Entropy in Deep Learning (Cross-Entropy Loss)

In deep learning, cross-entropy loss is used for classification (e.g., predicting if an image is a cat or dog 🐱🐶).

🔹 Formula (binary cross-entropy):

Loss = -(y * log(ŷ) + (1 - y) * log(1 - ŷ))

Where:

  • y = True label (0 or 1)
  • ŷ = Predicted probability

✅ Cross-entropy measures how well the model's predictions match the true labels.
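As a quick sketch of the binary formula above (the numbers are made up for illustration):

import numpy as np

y = 1        # True label (positive class)
y_hat = 0.9  # Predicted probability of the positive class

loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(f"Binary cross-entropy: {loss:.4f}")  # ~0.1054: confident and correct, so the loss is small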

πŸ“ 6. Cross-Entropy Loss in PyTorch

🔹 Example: Predicting Cat vs. Dog

import torch
import torch.nn.functional as F

# Logits (raw model outputs)
logits = torch.tensor([[2.0, 1.0]])  # Raw scores for two classes; class 0 (cat) scores higher than class 1 (dog)
labels = torch.tensor([0])  # True label is class 0 (cat)

# Cross-entropy loss
loss = F.cross_entropy(logits, labels)
print(f"Cross-Entropy Loss: {loss.item():.4f}")  # Lower loss = better prediction

✅ Lower loss means the model is making good predictions!
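Under the hood, F.cross_entropy applies log-softmax to the logits and then takes the negative log-probability of the true class. A quick check of that equivalence, reusing the logits and labels from the snippet above:

# Same value as F.cross_entropy above: log-softmax, then pick the true class
log_probs = F.log_softmax(logits, dim=1)
manual_loss = -log_probs[0, labels[0]]
print(f"Manual loss: {manual_loss.item():.4f}")  # Matches the cross-entropy loss printed above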

🎯 7. Why is Entropy Important?

✅ Measures uncertainty in data
✅ Helps decision trees find the best splits
✅ Used in classification models for training
✅ Optimizing entropy improves model performance

🚀 Without entropy, ML models wouldn't know how to handle uncertainty! 🚀

All rights reserved to Ahmad Mayahi