Definition

A classification model is a machine learning model that predicts a categorical outcome (class label) based on input features.

  • Output: discrete classes (e.g., spam vs not spam, disease vs no disease).
  • Unlike regression, which predicts continuous values, classification predicts which group an observation belongs to.

Types of Classification

  1. Binary Classification
    • Two classes only (0/1, yes/no).
    • Examples:
      • Email → spam or not
      • Patient → sick or healthy
  2. Multiclass Classification
    • More than two classes.
    • Examples:
      • Handwritten digit recognition (0–9)
      • Iris flower classification (Setosa, Versicolor, Virginica)
  3. Multilabel Classification
    • Each instance can belong to multiple classes simultaneously.
    • Example: A movie can be tagged as [Comedy, Romance, Drama].
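
As a rough illustration of how the three settings differ in their targets, the sketch below encodes labels in plain Python. The label values, the GENRES vocabulary, and the helper name to_indicator are assumptions made for illustration, not part of any particular library.

```python
# Hypothetical label examples; names and values are illustrative only.
binary_labels = [1, 0, 1]  # spam (1) vs not spam (0)
multiclass_labels = ["Setosa", "Virginica", "Versicolor"]  # exactly one class per instance

GENRES = ["Comedy", "Romance", "Drama", "Action"]  # assumed tag vocabulary


def to_indicator(tags, vocabulary=GENRES):
    """Encode a set of tags as a 0/1 vector with one slot per possible class."""
    return [1 if genre in tags else 0 for genre in vocabulary]


# A multilabel target: several slots can be 1 at the same time.
movie = to_indicator({"Comedy", "Romance", "Drama"})
```

The key difference is in the target shape: binary and multiclass targets hold one value per instance, while a multilabel target holds one 0/1 flag per class.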

Common Classification Models

  1. Logistic Regression
    • Despite the name, used for classification.
    • Outputs probability via sigmoid (binary) or softmax (multiclass).
  2. Decision Trees
    • Rule-based splits on features.
    • Easy to interpret, but prone to overfitting.
  3. Random Forests
    • Ensemble of decision trees.
    • Usually more stable and accurate than a single tree, because averaging many trees reduces variance.
  4. Support Vector Machines (SVM)
    • Finds the hyperplane that separates the classes with the largest margin.
    • Works well in high dimensions.
  5. k-Nearest Neighbors (kNN)
    • Assigns label based on the majority class of nearest neighbors.
    • Simple, but prediction is slow on large datasets because each query is compared against the stored training points.
  6. Naïve Bayes
    • Probabilistic model based on Bayes’ theorem, with the "naïve" assumption that features are conditionally independent given the class.
    • Works well with text classification (spam detection).
  7. Neural Networks (Deep Learning)
    • Powerful for image, speech, text classification.
    • Use nonlinear transformations, typically with a softmax output layer for multiclass problems (sigmoid for binary or multilabel).
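
Several of the models above (logistic regression, neural networks) turn raw scores into probabilities with a sigmoid or softmax. A minimal sketch of both functions, using only the standard library; the function names and the helper predict_class are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1) (binary case)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form to avoid overflow in exp(-z).
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores):
    """Map a list of scores to probabilities that sum to 1 (multiclass case)."""
    m = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Return the index of the class with the highest softmax probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)
```

In the binary case a class is usually assigned by thresholding the sigmoid output (0.5 is a common default, not a requirement); in the multiclass case the highest-probability class is chosen.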

Evaluation Metrics

Accuracy alone can be misleading (especially with imbalanced data), so it is usually reported alongside other measures:

  • Confusion Matrix (TP, TN, FP, FN)
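  • Accuracy: fraction of all predictions that are correct, (TP + TN) / (TP + TN + FP + FN)
  • Precision: fraction of predicted positives that are actually positive, TP / (TP + FP)
  • Recall (Sensitivity): fraction of actual positives that are correctly predicted, TP / (TP + FN)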
  • F1 Score: Harmonic mean of precision and recall
  • ROC-AUC / PR-AUC: Area under the ROC or precision-recall curve, computed from a classifier's predicted scores or probabilities across all thresholds
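
To make the count-based metrics concrete, here is a small sketch that derives them from a confusion matrix in plain Python. The function names, the toy labels, and the convention of returning 0.0 when a denominator is zero are assumptions for illustration; libraries may handle the zero case differently.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumed convention: 0.0 if undefined
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy imbalanced data: 1 positive among 10, and a model that always predicts 0.
y_true = [1] + [0] * 9
y_pred = [0] * 10
counts = confusion_counts(y_true, y_pred)
report = metrics(*counts)
```

On this toy data the always-negative model reaches 90% accuracy while its recall is zero, which is the kind of case where accuracy alone hides a useless classifier.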

Applications

  • Healthcare: Predict disease presence from symptoms.
  • Finance: Fraud detection, credit risk modeling.
  • Marketing: Customer churn prediction, ad click prediction.
  • NLP: Spam detection, sentiment analysis.
  • Computer Vision: Object recognition, face detection.

In short:
Classification models predict categories (binary, multiclass, or multilabel).
They range from simple (logistic regression) to complex (deep neural networks), and are evaluated using metrics like precision, recall, and AUC.