The role of logarithms in deep learning: why and how they matter
In the previous post, I discussed logarithm functions. Here, I will explain why log functions are needed in deep learning and how they are actually used.
Logarithmic functions (log) are essential in Machine Learning (ML) and optimization. They prevent numerical errors, stabilize training, and improve model efficiency.
🔥 1. Understanding Underflow & Overflow Issues
✅ What Is Overflow?
Overflow happens when a number is too large for the computer to store, leading to infinity (inf) or errors.
🔹 Example: Computing e^1000 (which is extremely large):
import numpy as np
big_number = np.exp(1000) # e^1000 is too large!
print(big_number) # Output: inf (overflow!)
⚠️ Problem: In ML, functions like softmax involve exponentials, which can cause overflow.
Read more about the softmax function here.
🔹 Solution: Working in log() space keeps these huge values in a manageable range and prevents overflow.
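For instance, softmax overflow is usually avoided by subtracting the maximum logit before exponentiating, which is justified by the same exponent/log algebra (e^(x − m) = e^x / e^m). A minimal sketch; stable_softmax is just an illustrative helper name:
import numpy as np
def stable_softmax(logits):
    shifted = logits - np.max(logits)  # largest exponent becomes e^0 = 1, so no overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)
logits = np.array([1000.0, 999.0, 998.0])  # naive np.exp(logits) would overflow to inf here
print(stable_softmax(logits))  # Output: about [0.665 0.245 0.090] -- valid probabilities, no inf or nan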
✅ What Is Underflow?
Underflow happens when a number is too small for the computer to represent accurately, leading to zero (0.0) or loss of precision.
🔹 Example: Multiplying very small probabilities:
tiny_number = 1e-300 * 1e-300 # Extremely small number
print(tiny_number) # Output: 0.0 (underflow!)
⚠️ Problem: Many ML models deal with very small probabilities, leading to underflow.
🔹 Solution: Instead of multiplying probabilities directly, take log to convert multiplication into addition:
log(A × B) = log(A) + log(B)
👉 This keeps computations stable! ✅
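A quick sanity check of that identity in plain Python:
import math
print(math.log(2 * 3))            # 1.791759...
print(math.log(2) + math.log(3))  # 1.791759... -- the same value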
🚀 2. How Logarithms Solve Underflow & Overflow Issues
🔹 Instead of multiplying tiny probabilities:
P = P₁ × P₂ × P₃ × ... × Pₙ
🔹 Use logarithms to sum log-probabilities:
log(P) = log(P₁) + log(P₂) + log(P₃) + ... + log(Pₙ)
✅ Why?
- Prevents underflow & overflow
- Makes calculations faster & more stable
✅ Example: Probability Computation with Logarithms
❌ Without Logarithms (Risk of Underflow)
p1, p2, p3 = 1e-150, 1e-150, 1e-150 # Very small probabilities
result = p1 * p2 * p3 # The true product, 1e-450, is below the smallest float64 value
print(result) # Output: 0.0 (underflow -- the information is lost)
✅ With Logarithms (Stable Computation)
import math
log_result = math.log(p1) + math.log(p2) + math.log(p3)
print(log_result) # Output: about -1036.16 -- a stable sum instead of an unstable product!
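Note that exponentiating the log-sum to recover the raw probability would simply underflow again, so in practice the log-probability itself is kept and compared; the second value below is a hypothetical competing outcome, purely for illustration:
print(math.exp(log_result))  # Output: 0.0 -- the true value 1e-450 is still unrepresentable
other_log_result = 3 * math.log(1e-160)  # log-probability of a hypothetical, less likely outcome
print(log_result > other_log_result)  # Output: True -- comparisons work fine in log space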
🔍 3. Log-Likelihood in Probabilistic Models
In probability-based ML models, we often compute likelihoods:
L = P₁ × P₂ × P₃ × ... × Pₙ
⚠️ Problem: Directly multiplying probabilities can lead to underflow.
✅ Solution: Use log-likelihood instead:
log L = log(P₁) + log(P₂) + log(P₃) + ... + log(Pₙ)
✅ Example: Log-Likelihood Calculation in Python
Used in Naive Bayes, Logistic Regression, and Hidden Markov Models.
import numpy as np
probabilities = np.array([0.9, 0.8, 0.7])
log_likelihood = np.sum(np.log(probabilities))
print(log_likelihood) # Stable computation
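The benefit becomes obvious once many factors are involved, as in a Naive Bayes model with hundreds of features. A small sketch with made-up per-feature probabilities:
small_probs = np.full(500, 1e-3)  # 500 made-up per-feature probabilities
print(np.prod(small_probs))  # Output: 0.0 -- the direct product (1e-1500) underflows
print(np.sum(np.log(small_probs)))  # Output: about -3453.9 -- the log-likelihood stays finite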
🎯 4. Logarithm-Based Loss Functions in ML
🔥 Cross-Entropy Loss (Used in Classification)
In classification models, we use cross-entropy loss, which is based on logarithms.
🔹 Formula:
Loss = - (y × log(ŷ) + (1 - y) × log(1 - ŷ))
where:
- y = actual label (0 or 1)
- ŷ = predicted probability
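Written out directly from the formula, here is a hedged NumPy sketch; the helper name and the epsilon clamp that keeps log() away from zero are illustrative choices, not part of any library:
import numpy as np
def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)  # keep predictions away from exactly 0 or 1
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(binary_cross_entropy(1, 0.9))   # about 0.105 -- confident and correct, small loss
print(binary_cross_entropy(1, 0.01))  # about 4.605 -- confident and wrong, large loss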
✅ Why Use Logarithms?
- Prevents underflow in probabilities
- Enhances numerical stability
- Improves training efficiency
✅ Example: Cross-Entropy Loss in PyTorch
import torch.nn.functional as F
import torch
predictions = torch.tensor([[2.0, 1.0, 0.1]]) # Logits (raw scores)
labels = torch.tensor([0]) # True label
loss = F.cross_entropy(predictions, labels) # Uses log internally
print(loss)
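Under the hood, cross_entropy applies a log-softmax followed by a negative log-likelihood loss; the same result can be computed in two explicit steps, which makes the logarithm visible:
log_probs = F.log_softmax(predictions, dim=1)  # softmax and log fused into one numerically stable op
loss_manual = F.nll_loss(log_probs, labels)  # negative log-probability of the true class
print(loss_manual)  # Same value as the cross_entropy loss above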
📈 5. Logarithms in Gradient Descent & Optimization
When training ML models, we often minimize a loss function using gradient descent.
🔹 If a function grows exponentially, its derivative can explode, making optimization unstable.
🔹 Taking the log slows down the growth and makes gradient descent smoother.
✅ Example: Logarithmic Smoothing
Consider an exponentially growing quantity, with and without a log applied:
- Without log: f(x) = e^x, whose derivative e^x also explodes as x grows 🚀
- With log: log(f(x)) = x, whose derivative is a constant 1, so gradients stay tame ✅
👉 Log functions help prevent the "exploding gradient" problem in deep learning!
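A quick autograd check makes the contrast concrete; the input value 20.0 is only illustrative:
import torch
x = torch.tensor(20.0, requires_grad=True)
loss = torch.exp(x)  # exponential loss term
loss.backward()
print(x.grad)  # about 4.85e8 -- a huge gradient that can destabilize updates
x.grad = None  # reset the gradient before the second backward pass
log_loss = torch.log(torch.exp(x))  # mathematically just x once the log is applied
log_loss.backward()
print(x.grad)  # 1.0 -- a small, constant gradient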
🔢 6. Logarithms for Feature Scaling in ML
Some ML models perform better when input features are scaled.
🔹 Why Use Log Scaling?
- Features with very large values can dominate smaller ones in models such as Linear Regression and Neural Networks (tree-based models like Decision Trees are far less sensitive to scale).
- Taking log() compresses large values into a smaller, more manageable range.
✅ Example: Log-Scaling Features
Before:
data = np.array([1000, 10000, 100000])
After applying log():
log_scaled = np.log(data)
print(log_scaled) # Output: roughly [6.91 9.21 11.51] -- much smaller and more balanced!
👉 Used in finance (stock prices), biology (population growth), and NLP (word frequencies).
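One practical wrinkle: log(0) is undefined (NumPy returns -inf with a warning), so features that can be zero, like raw counts, are usually transformed with log(1 + x) instead; np.log1p does exactly that. The counts below are made up for illustration:
counts = np.array([0, 10, 1000, 100000])  # made-up counts, including a zero
log_scaled = np.log1p(counts)  # log(1 + x): maps 0 to 0 and still compresses the large values
print(log_scaled)  # Output: roughly [0. 2.4 6.91 11.51]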
🔑 Final Summary: Why Logs Matter in ML & Optimization
✅ Prevent underflow & overflow (probability calculations)
✅ Simplify mathematical expressions (log-likelihood, multiplication → addition)
✅ Improve stability in loss functions (cross-entropy, MLE)
✅ Help optimization algorithms (gradient descent, smooth learning)
✅ Feature scaling (handle large numbers efficiently)
🚀 Without logs, many ML models would be unstable or inefficient! 🚀