Definition
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification (mainly) and regression.
- Core idea: Find the best separating hyperplane that maximizes the margin between different classes.
- It uses support vectors (the critical data points closest to the decision boundary) to define that hyperplane.
Intuition
- Suppose you have two classes of points in 2D. Many lines can separate them.
- SVM chooses the one that maximizes the margin (distance between boundary and nearest points of each class).
- This leads to better generalization.
Mathematics
- Decision Function (linear case):
$f(x) = w \cdot x + b$
- $w$: weight vector (normal to the hyperplane)
- $b$: bias (offset)
- Classification Rule:
$\hat{y} = \text{sign}(w \cdot x + b)$
- Margin:
$\text{Margin} = \frac{2}{\|w\|}$
- SVM maximizes this margin while correctly classifying the training data (or with minimal errors).
Support Vectors
- The data points closest to the hyperplane.
- They determine the position and orientation of the decision boundary.
- Other points don’t matter once the hyperplane is set.
Kernel Trick
- When data is not linearly separable, SVM uses a kernel function to map data into a higher-dimensional space where it becomes separable.
- Common kernels:
- Linear
- Polynomial
- Radial Basis Function (RBF / Gaussian)
- Sigmoid
Example: In 2D, a circle vs background is not linearly separable. Mapping to higher dimension (radius) makes it separable.
Types of SVMs
- Hard-Margin SVM
- Assumes perfect separation (no misclassification).
- Works only if data is clean and separable.
- Soft-Margin SVM
- Allows some misclassification (controlled by penalty parameter $C$).
- Balances margin maximization with classification errors.
- Support Vector Regression (SVR)
- Extension of SVM for regression tasks.
- Fits within an ϵ\epsilonϵ-tube around the true values.
Advantages
Works well with high-dimensional data.
Effective when classes are separable or nearly separable.
Kernel trick allows flexible decision boundaries.
Uses only support vectors → efficient in memory.
Disadvantages
Training can be slow on very large datasets.
Choice of kernel and parameters is critical.
Less interpretable than linear regression or logistic regression.
Applications
- Text classification (spam detection, sentiment analysis)
- Image recognition (face detection, handwriting recognition)
- Bioinformatics (protein classification, cancer detection)
- Finance (fraud detection, credit scoring)
In short:
SVMs classify data by finding the maximum-margin hyperplane that separates classes, using only the critical points (support vectors). With the kernel trick, they handle nonlinear boundaries very effectively.
