5 Feb 2025

The role of logarithms in deep learning: why and how they matter

In the previous post, I discussed logarithm functions. Here, I will explain why log functions are needed in deep learning and how they are actually used.

Logarithmic functions (log) are essential in Machine Learning (ML) and optimization. They prevent numerical errors, stabilize training, and improve model efficiency.


🔥 1. Understanding Underflow & Overflow Issues

What Is Overflow?

Overflow happens when a number is too large for the computer to store, leading to infinity (inf) or errors.

🔹 Example: Computing e^1000 (which is extremely large):

import numpy as np

big_number = np.exp(1000)  # e^1000 exceeds the largest 64-bit float (~1.8e308)
print(big_number)  # Output: inf (overflow!)

⚠️ Problem: In ML, functions like softmax involve exponentials, which can cause overflow.

Read more about the softmax function here.

🔹 Solution: Using log() reduces the size of large numbers and prevents overflow.
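A standard way to do this for softmax (a sketch of the usual max-subtraction / log-sum-exp trick, not code from any particular library) is to shift the logits by their maximum before exponentiating, so the largest exponent is exp(0) = 1:

import numpy as np

def stable_softmax(x):
    # Shift by the max so the largest exponent is exp(0) = 1 (no overflow)
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([1000.0, 999.0, 998.0])
print(stable_softmax(logits))  # Finite probabilities; a naive softmax would return nan here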


What Is Underflow?

Underflow happens when a number is too small for the computer to represent accurately, leading to zero (0.0) or loss of precision.

🔹 Example: Multiplying very small probabilities:

tiny_number = 1e-300 * 1e-300  # 1e-600 is below the smallest positive float (~5e-324)
print(tiny_number)  # Output: 0.0 (underflow!)

⚠️ Problem: Many ML models deal with very small probabilities, leading to underflow.

🔹 Solution: Instead of multiplying probabilities directly, take log to convert multiplication into addition:

log(A × B) = log(A) + log(B)

👉 This keeps computations stable!


🚀 2. How Logarithms Solve Underflow & Overflow Issues

🔹 Instead of multiplying tiny probabilities:

P = P₁ × P₂ × P₃ × ... × Pₙ

🔹 Use logarithms to sum log-probabilities:

log(P) = log(P₁) + log(P₂) + log(P₃) + ... + log(Pₙ)

Why?

  • Prevents underflow & overflow
  • Turns a long, unstable product into a simple, stable sum

Example: Probability Computation with Logarithms

Without Logarithms (Risk of Underflow)

p1, p2, p3 = 1e-120, 1e-120, 1e-120  # Very small probabilities
result = p1 * p2 * p3  # 1e-360 is below the smallest representable float
print(result)  # Output: 0.0 (underflow!)

With Logarithms (Stable Computation)

import math

log_result = math.log(p1) + math.log(p2) + math.log(p3)
print(log_result)  # A stable sum (about -828.9) instead of an unstable product!
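Note that converting the result back with exp() would just underflow again, so in practice you stay in log space; because log is monotonic, log-probabilities can be compared directly. A quick check (recomputing the same value standalone):

import math

log_result = 3 * math.log(1e-120)  # about -828.9, same value as above
print(math.exp(log_result))        # 0.0 — exponentiating back underflows again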

🔍 3. Log-Likelihood in Probabilistic Models

In probability-based ML models, we often compute likelihoods:

L = P₁ × P₂ × P₃ × ... × Pₙ

⚠️ Problem: Directly multiplying probabilities can lead to underflow.

Solution: Use log-likelihood instead:

log L = log(P₁) + log(P₂) + log(P₃) + ... + log(Pₙ)

Example: Log-Likelihood Calculation in Python

Used in Naive Bayes, Logistic Regression, and Hidden Markov Models.

import numpy as np

probabilities = np.array([0.9, 0.8, 0.7])
log_likelihood = np.sum(np.log(probabilities))
print(log_likelihood)  # Stable computation
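To see why this matters at scale, here is a small illustration (the numbers are made up for demonstration) with many samples: the direct product underflows to 0.0, while the log-likelihood stays a perfectly ordinary number.

import numpy as np

probabilities = np.full(500, 1e-3)            # 500 per-sample probabilities of 0.001
direct_product = np.prod(probabilities)       # true value is 1e-1500, far below float range
log_likelihood = np.sum(np.log(probabilities))

print(direct_product)   # Output: 0.0 (underflow)
print(log_likelihood)   # Output: about -3453.9 — stable and easy to compare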

🎯 4. Logarithm-Based Loss Functions in ML

🔥 Cross-Entropy Loss (Used in Classification)

In classification models, we use cross-entropy loss, which is based on logarithms.

🔹 Formula:

Loss = - (y × log(ŷ) + (1 - y) × log(1 - ŷ))

where:

  • y = actual label (0 or 1)
  • ŷ = predicted probability

Why Use Logarithms?

  • Prevents underflow in probabilities
  • Enhances numerical stability
  • Improves training efficiency
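As a minimal NumPy sketch of the binary formula above (the epsilon clipping is my own addition to avoid log(0), not part of the formula itself):

import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # Clip predictions away from exactly 0 and 1 so log() never sees a zero
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(binary_cross_entropy(1, 0.9))  # ~0.105: confident and correct → small loss
print(binary_cross_entropy(1, 0.1))  # ~2.303: confident and wrong → large loss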

Example: Cross-Entropy Loss in PyTorch

import torch.nn.functional as F
import torch

predictions = torch.tensor([[2.0, 1.0, 0.1]])  # Logits (raw scores)
labels = torch.tensor([0])  # True label

loss = F.cross_entropy(predictions, labels)  # Uses log internally
print(loss)
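Under the hood, F.cross_entropy applies log_softmax and then takes the negative log-probability of the true class; the equivalent manual computation (a sketch using the same example tensors) gives the same value:

import torch
import torch.nn.functional as F

predictions = torch.tensor([[2.0, 1.0, 0.1]])
labels = torch.tensor([0])

log_probs = F.log_softmax(predictions, dim=1)  # numerically stable log of the softmax
manual_loss = -log_probs[0, labels[0]]         # negative log-probability of the true class
print(manual_loss)  # same value as the F.cross_entropy loss above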

📈 5. Logarithms in Gradient Descent & Optimization

When training ML models, we often minimize a loss function using gradient descent.

🔹 If a function grows exponentially, its derivative can explode, making optimization unstable.
🔹 Taking the log slows down the growth and makes gradient descent smoother.

Example: Logarithmic Smoothing

Consider two loss functions:

  1. Without Log:
f(x) = e^x   (the derivative e^x itself grows exponentially) 🚀  
  2. With Log:
f(x) = log(x)   (the derivative 1/x shrinks as x grows) ✅  

👉 Log functions help prevent the "exploding gradient" problem in deep learning!
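A quick numeric check of the two derivatives at a few sample points (just an illustration):

import numpy as np

for x in [1.0, 10.0, 50.0]:
    # d/dx e^x = e^x  (explodes),   d/dx log(x) = 1/x  (shrinks)
    print(x, np.exp(x), 1 / x)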


🔢 6. Logarithms for Feature Scaling in ML

Some ML models perform better when input features are scaled.

🔹 Why Use Log Scaling?

  • Features with very large values can dominate smaller ones in models such as Linear Regression and Neural Networks.
  • Applying log() compresses large values into a smaller, more comparable range.

Example: Log-Scaling Features

Before:

data = np.array([1000, 10000, 100000])

After applying log():

log_scaled = np.log(data)
print(log_scaled)  # The values are now much smaller and balanced!
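One caveat (my addition, not from the example above): if a feature can be exactly 0, np.log(0) returns -inf, so a common variant is np.log1p, which computes log(1 + x):

data_with_zeros = np.array([0, 1000, 10000, 100000])
print(np.log1p(data_with_zeros))  # log(1 + x) handles zeros gracefully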

👉 Used in finance (stock prices), biology (population growth), and NLP (word frequencies).


🔑 Final Summary: Why Logs Matter in ML & Optimization

  • Prevent underflow & overflow (probability calculations)
  • Simplify mathematical expressions (log-likelihood, multiplication → addition)
  • Improve stability in loss functions (cross-entropy, MLE)
  • Help optimization algorithms (gradient descent, smooth learning)
  • Feature scaling (handle large numbers efficiently)

🚀 Without logs, many ML models would be unstable or inefficient! 🚀

All rights reserved to Ahmad Mayahi