In supervised learning, how do you estimate how computationally expensive training and prediction are? This is important because, depending on the answer, you make different choices about the approach.

Estimating Computational Cost in Supervised Learning

To make informed choices about model selection, hardware requirements, and scalability, you need to estimate the computational cost for both training and prediction (inference). The cost depends on several factors, including data size, model complexity, and hardware.


1. Training Cost Estimation

The computational cost of training depends on:

Key Factors

  • Dataset Size (N) → Number of training examples.
  • Feature Dimensionality (d) → Number of input features per sample.
  • Model Complexity → Number of parameters or layers in the model.
  • Number of Training Iterations (T) → Epochs or convergence time.
  • Computational Complexity of the Model (O(f(N, d))) → How the cost scales with inputs and parameters.

Rough Complexity Estimates for Different Models

Model Type                               Complexity per Training Step
---------------------------------------  ----------------------------------------
Linear Regression / Logistic Regression  O(Nd)
Decision Trees                           O(Nd log N)
Random Forest (k trees)                  O(kNd log N)
SVM (linear kernel)                      O(N²d) (solving the quadratic program)
SVM (RBF kernel)                         O(N³)
Neural Networks                          O(NdL), where L is the number of layers
Transformer Models (e.g., BERT, GPT)     O(Nd²) per layer, due to self-attention
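These orders of growth can be turned into rough operation counts for a concrete problem size. A minimal sketch, where N, d, k, and L are hypothetical placeholder sizes you would replace with your own:

```python
import math

# Hypothetical problem sizes -- substitute your own.
N, d, k, L = 1_000_000, 100, 100, 12

# Rough operation counts from the big-O formulas above
# (constant factors are ignored, so treat these as order-of-magnitude only).
estimates = {
    "logistic_regression": N * d,
    "decision_tree": N * d * math.log2(N),
    "random_forest": k * N * d * math.log2(N),
    "neural_net_per_epoch": N * d * L,
}
for name, ops in estimates.items():
    print(f"{name}: ~{ops:.2e} operations per pass")
```

Because big-O hides constant factors, these numbers only help compare models against each other, not predict wall-clock time.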

How to Estimate Training Time

  1. Measure a single batch → Run a forward and backward pass on a sample batch and time it.
  2. Scale up → Multiply by (number of batches per epoch) × (number of epochs).
  3. Consider hardware → GPU vs. CPU significantly affects performance.
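The three steps above can be sketched in plain Python. This is a minimal illustration using a hand-written logistic-regression step rather than any particular framework; the batch size, batch count, and epoch count are made-up placeholders:

```python
import time
import numpy as np

# Hypothetical sizes -- substitute your own.
N_BATCH, N_FEATURES = 256, 100
BATCHES_PER_EPOCH, EPOCHS = 400, 10

rng = np.random.default_rng(0)
X = rng.normal(size=(N_BATCH, N_FEATURES))
y = rng.integers(0, 2, size=N_BATCH).astype(float)
w = np.zeros(N_FEATURES)

def train_step(w, X, y, lr=0.1):
    """One forward + backward pass of logistic regression on a batch."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # forward: predicted probabilities
    grad = X.T @ (p - y) / len(y)      # backward: gradient of the log-loss
    return w - lr * grad

# 1. Time a single batch (average over several runs for stability).
reps = 50
start = time.perf_counter()
for _ in range(reps):
    w = train_step(w, X, y)
per_batch = (time.perf_counter() - start) / reps

# 2. Scale up: batches per epoch x number of epochs.
estimated_total = per_batch * BATCHES_PER_EPOCH * EPOCHS
print(f"~{per_batch * 1e3:.3f} ms/batch -> ~{estimated_total:.2f} s total (rough)")
```

Step 3 (hardware) is captured implicitly: rerunning the same timing loop on the target machine gives an estimate for that machine.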

2. Prediction (Inference) Cost Estimation

Inference cost is often different from training cost because:

  • Some operations (like backpropagation) are only used in training.
  • The batch size for inference is often smaller (sometimes just one sample).

Key Factors

  • Model Size → Number of parameters affects memory requirements.
  • Feature Complexity → High-dimensional inputs increase processing time.
  • Computational Complexity → How expensive each prediction is.

Complexity Estimates for Prediction

Model Type                                       Complexity per Prediction
-----------------------------------------------  ----------------------------
Linear Models                                    O(d)
Decision Trees                                   O(log N) (depth of the tree)
Random Forest (k trees)                          O(k log N)
SVM (Linear)                                     O(d)
SVM (RBF kernel, k support vectors)              O(kd)
Neural Networks (L layers, d neurons per layer)  O(Ld²)
Transformer Models (BERT, GPT)                   O(d²) per layer

How to Estimate Inference Time

  1. Time a single prediction using a sample input.
  2. Measure batch inference if running in production.
  3. Optimize using quantization, pruning, or distillation if needed.
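Steps 1 and 2 can be sketched by timing a hypothetical linear model with NumPy (the dimension d and batch size below are made-up placeholders):

```python
import time
import numpy as np

# Hypothetical linear model with d = 1000 features.
d = 1000
rng = np.random.default_rng(0)
w = rng.normal(size=d)

def predict(X):
    return X @ w  # O(d) per sample for a linear model

# 1. Time a single prediction.
x = rng.normal(size=(1, d))
reps = 1000
start = time.perf_counter()
for _ in range(reps):
    predict(x)
single = (time.perf_counter() - start) / reps

# 2. Time batch inference (e.g., 512 samples at once).
X_batch = rng.normal(size=(512, d))
start = time.perf_counter()
for _ in range(reps):
    predict(X_batch)
batch = (time.perf_counter() - start) / reps

print(f"{single * 1e6:.1f} us per single prediction")
print(f"{batch / 512 * 1e6:.2f} us per sample in a batch of 512")
```

Per-sample cost usually drops sharply under batching, which is why production services often queue and batch requests.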

3. Making Design Decisions Based on Computational Cost

Scenario                                           Recommended Approach
-------------------------------------------------  ----------------------------------------------------------------
Low computational budget                           Use simple models (linear/logistic regression, small decision trees)
Real-time inference required                       Use fast models (decision trees, pruned neural networks)
Large dataset with high-dimensional features       Use scalable methods (SGD, mini-batch training)
Limited memory (e.g., edge devices, mobile)        Use quantized or distilled models
High-accuracy requirement, less concern for speed  Use deep learning or ensembles

Would you like me to estimate the computational cost for a specific model you are considering?

This might be confusing, because among the criteria used to determine the approach to follow are considerations about how computationally expensive training and prediction are. For example, low computational resources lead to choices like decision trees or logistic regression, whereas high computational resources allow choices like neural networks. So we are in a loop: to identify the best approach we need to reason about computational resources, but to reason about those, we need to know the approach.

You’re absolutely right—there’s a circular dependency in choosing the best approach:

  • We need to estimate computational cost to select a model.
  • But we need to know the model to estimate computational cost.

Breaking the Loop: A Step-by-Step Strategy

To resolve this, we can use an iterative approximation approach:


1. Start with Baseline Estimates

Before choosing a model, approximate computational constraints based on:

  • Available hardware (e.g., CPU, GPU, memory).
  • Expected data size (small = <100K rows, medium = ~1M rows, large = 100M+).
  • Real-time vs. batch processing needs.

Example Baseline Estimates

Resource Constraints                                   Likely Feasible Models
-----------------------------------------------------  -----------------------------------
Low resources (CPU-only, small dataset, real-time)     Logistic Regression, Decision Trees
Moderate resources (multi-core CPU, medium dataset)    Random Forest, SVM (linear)
High resources (GPU, large dataset, batch processing)  Deep Learning (CNNs, Transformers)

👉 This gives an initial guess of what’s feasible without calculating exact costs.


2. Estimate Model Complexity for Training & Inference

Once you have a rough idea, estimate big-O complexity and empirical benchmarks for:

  • Training time (important for large datasets).
  • Inference time (critical for real-time systems).

Simplified Complexity Estimates

Model                   Training Cost  Prediction Cost
----------------------  -------------  ---------------
Logistic Regression     O(Nd)          O(d)
Decision Trees          O(Nd log N)    O(log N)
Random Forest           O(kNd log N)   O(k log N)
SVM (Linear)            O(N²d)         O(d)
SVM (RBF)               O(N³)          O(kd)
Neural Networks (Deep)  O(NdL)         O(Ld²)
Transformer             O(Nd²L)        O(d²L)

👉 This helps refine your choice before full implementation.


3. Run Small-Scale Experiments

Once you narrow down options:

  • Train a small version of the model on a subset of data.
  • Measure training time and inference time per sample.
  • Scale up estimates using data growth trends.

Example: Estimating Training Time

  1. Train on 10% of the dataset.
  2. Measure the training time T_10%.
  3. Extrapolate to the full dataset: T_full = T_10% × (N_full / N_10%). Note that this linear extrapolation assumes roughly O(N) scaling; for super-linear models such as kernel SVMs, measure at several subset sizes and fit the growth curve instead.
  4. If it’s too slow, move to a simpler model.

👉 This avoids committing to an expensive model too early.
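The extrapolation can be made more robust by measuring at several subset sizes and fitting the growth curve, since training time is not linear in N for every model. A sketch with made-up timings (the numbers are illustrative, not real measurements):

```python
import numpy as np

# Hypothetical measurements: training time (seconds) at growing data fractions.
fractions = np.array([0.05, 0.10, 0.20])
times = np.array([1.3, 2.8, 6.1])  # illustrative timings, not real data

# Fit log(T) = a * log(fraction) + b, which captures super-linear
# scaling instead of assuming strictly linear growth in N.
a, b = np.polyfit(np.log(fractions), np.log(times), 1)

# Extrapolate to the full dataset (fraction = 1.0).
t_full = np.exp(b) * 1.0 ** a
print(f"Scaling exponent ~ {a:.2f}; projected full-dataset time ~ {t_full:.0f} s")
```

An exponent near 1 suggests linear scaling in N; noticeably above 1 warns that the full run will cost much more than a naive proportional estimate.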


4. Optimize if Needed

If the chosen model is too slow:

  • Reduce feature size (d): Use PCA, feature selection.
  • Reduce dataset size (N): Use sampling or mini-batch training.
  • Use hardware acceleration (GPUs, TPUs).
  • Switch to a more efficient model if necessary.
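For the first bullet, here is a minimal sketch of reducing d with scikit-learn's PCA (the sample count, feature count, and number of components are hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical high-dimensional data: N = 2000 samples, d = 500 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 500))

# Project onto 50 components before training; shrinking d
# makes every O(f(N, d)) cost above proportionally cheaper.
pca = PCA(n_components=50, random_state=0)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)  # (2000, 500) -> (2000, 50)
```

The same fitted `pca` object must then be applied (`pca.transform`) to any data seen at inference time, so the reduction also speeds up prediction.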

Final Thought: Iterative Refinement

🔁 Instead of choosing a model upfront, you can iteratively refine:

  1. Estimate available resources.
  2. Select a model class (simple, moderate, complex).
  3. Estimate complexity + run small tests.
  4. Adjust model choice based on empirical results.
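Steps 2–4 can be prototyped in a few lines: train several candidate model classes on a small dataset and compare training time, prediction time, and accuracy empirically. A sketch using scikit-learn on synthetic data (the dataset sizes and candidate list are placeholders):

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small synthetic stand-in for a subset of your real data.
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in candidates.items():
    t0 = time.perf_counter()
    model.fit(X_tr, y_tr)                  # empirical training time
    fit_t = time.perf_counter() - t0
    t0 = time.perf_counter()
    acc = model.score(X_te, y_te)          # empirical prediction time + accuracy
    pred_t = time.perf_counter() - t0
    results[name] = (fit_t, pred_t, acc)
    print(f"{name}: train {fit_t:.3f}s, predict {pred_t:.4f}s, accuracy {acc:.3f}")
```

Comparing the printed numbers against your latency and accuracy budget is exactly the "adjust based on empirical results" step.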

Would you like help estimating for a specific use case you’re working on? 🚀

Preceding question:

What thought process should I follow to identify the best supervised learning approach?

