Definition
A classification model is a machine learning model that predicts a categorical outcome (class label) based on input features.
- Output: discrete classes (e.g., spam vs not spam, disease vs no disease).
- Unlike regression, which predicts continuous values, classification predicts which group an observation belongs to.
Types of Classification
- Binary Classification
  - Two classes only (0/1, yes/no).
  - Examples:
    - Email → spam or not
    - Patient → sick or healthy
- Multiclass Classification
  - More than two classes.
  - Examples:
    - Handwritten digit recognition (0–9)
    - Iris flower classification (Setosa, Versicolor, Virginica)
- Multilabel Classification
  - Each instance can belong to multiple classes simultaneously.
  - Example: A movie can be tagged as [Comedy, Romance, Drama].
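The three problem types differ mainly in the shape of the target. The sketch below shows one common way to represent each. The class lists, the extra "Action" genre, and the helper name `multi_hot` are assumptions made up for illustration, not part of any particular library.

```python
# Hypothetical label sets, chosen only for illustration.
IRIS_CLASSES = ["Setosa", "Versicolor", "Virginica"]
GENRES = ["Comedy", "Romance", "Drama", "Action"]


def multi_hot(tags, classes):
    """Encode a set of tags as a 0/1 vector with one slot per class."""
    unknown = set(tags) - set(classes)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if c in tags else 0 for c in classes]


# Binary: a single label from {0, 1}, e.g. 1 = spam, 0 = not spam.
binary_label = 1

# Multiclass: exactly one class index out of more than two classes.
multiclass_label = IRIS_CLASSES.index("Versicolor")

# Multilabel: any subset of the classes can be active at the same time.
multilabel_target = multi_hot({"Comedy", "Romance", "Drama"}, GENRES)

print(binary_label, multiclass_label, multilabel_target)
```

A multiclass target is one index per instance, while a multilabel target is a vector whose entries are decided independently of each other.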
Common Classification Models
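- Logistic Regression
  - Despite the name, it is used for classification.
  - Outputs class probabilities via the sigmoid function (binary) or softmax (multiclass).
- Decision Trees
  - Learn a sequence of rule-based splits on feature values.
  - Easy to interpret, but prone to overfitting unless depth is limited or the tree is pruned.
- Random Forests
  - Ensemble of decision trees, each trained on a bootstrap sample with random feature subsets.
  - Usually more stable and accurate than a single tree, because averaging many trees reduces variance.
- Support Vector Machines (SVM)
  - Finds the maximum-margin hyperplane separating the classes; kernels allow nonlinear boundaries.
  - Often works well on high-dimensional data.
- k-Nearest Neighbors (kNN)
  - Assigns the majority class among the k nearest training points.
  - Simple, but slow at prediction time on large datasets, since each query is compared against the stored training points.
- Naïve Bayes
  - Probabilistic model based on Bayes’ theorem, assuming features are conditionally independent given the class.
  - Works well for text classification (e.g., spam detection).
- Neural Networks (Deep Learning)
  - Powerful for image, speech, and text classification.
  - Stack nonlinear transformations; the output layer is typically softmax for multiclass problems and sigmoid for binary or multilabel problems.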
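Several of the models above (logistic regression, neural networks) turn raw scores into probabilities with sigmoid or softmax. A minimal standard-library sketch of both functions follows. The function names and the example scores are illustrative assumptions, and no model training is shown.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # this branch avoids overflow for large negative z
    return ez / (1.0 + ez)


def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    m = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Return the index of the class with the highest probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Binary case: one score, one probability for the positive class.
p_positive = sigmoid(0.0)

# Multiclass case: one score per class (made-up values).
example_scores = [2.0, 1.0, 0.1]
probs = softmax(example_scores)

print(p_positive, probs, predict_class(example_scores))
```

A score of 0 corresponds to a probability of 0.5 under the sigmoid. Softmax preserves the ordering of the scores, so the predicted class is simply the one with the largest score.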
Evaluation Metrics
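Accuracy alone can be misleading, especially with imbalanced data, so several metrics are used together:
- Confusion Matrix: counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN)
- Accuracy: fraction of all predictions that are correct, (TP + TN) / (TP + TN + FP + FN)
- Precision: fraction of predicted positives that are correct, TP / (TP + FP)
- Recall (Sensitivity): fraction of actual positives that are correctly predicted, TP / (TP + FN)
- F1 Score: harmonic mean of precision and recall, 2 · Precision · Recall / (Precision + Recall)
- ROC-AUC / PR-AUC: threshold-independent summaries computed from predicted scores or probabilities; PR-AUC is often more informative when positives are rare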
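The count-based metrics above can be computed directly from the confusion matrix. The following is a minimal sketch for the binary case using only the standard library. The function names, the `positive` parameter, the convention of returning 0.0 when a denominator is zero, and the toy label lists are all assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (a convention)."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Imbalanced toy data: 95 negatives, 5 positives, and a "classifier"
# that always predicts the negative class.
y_true_imbalanced = [0] * 95 + [1] * 5
y_pred_always_negative = [0] * 100
imbalanced = binary_metrics(y_true_imbalanced, y_pred_always_negative)

# A small balanced-ish example with one false positive and one false negative.
mixed = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])

print(imbalanced)
print(mixed)
```

In the imbalanced case the always-negative predictor scores 95% accuracy while its recall is zero, which is the situation where accuracy alone misleads.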
Applications
- Healthcare: Predict disease presence from symptoms.
- Finance: Fraud detection, credit risk modeling.
- Marketing: Customer churn prediction, ad click prediction.
- NLP: Spam detection, sentiment analysis.
- Computer Vision: Object recognition, face detection.
In short:
Classification models predict categories (binary, multiclass, or multilabel).
They range from simple (logistic regression) to complex (deep neural networks), and are evaluated using metrics like precision, recall, and AUC.
