Characteristics of Machine Learning Algorithms
Discriminative / Generative algorithms
Many machine learning algorithms are based on probabilistic models. Discriminative algorithms learn the conditional probability distribution p(y|x) of y given x. Generative algorithms learn the joint probability distribution p(y, x) = p(y|x) * p(x), and therefore take the distribution of x into consideration.
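As a toy sketch of the generative view (all class priors and class-conditional parameters below are assumed purely for illustration): a generative model fits p(x|y) and p(y), then recovers p(y|x) via Bayes' rule, whereas a discriminative model would fit p(y|x) directly.

```python
import numpy as np

# Assumed toy model: two classes with equal priors p(y),
# and one-dimensional Gaussian class-conditionals p(x|y).
priors = {0: 0.5, 1: 0.5}
means = {0: -1.0, 1: 1.0}
sigma = 1.0

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def posterior(x):
    """p(y=1|x) = p(x|y=1)p(y=1) / sum_y p(x|y)p(y)  (Bayes' rule)."""
    joint = {y: gaussian_pdf(x, means[y], sigma) * priors[y] for y in (0, 1)}
    return joint[1] / (joint[0] + joint[1])

print(posterior(0.0))  # 0.5 by symmetry: x=0 is equidistant from both means
print(posterior(2.0))  # close to 1: x=2 is far more likely under class 1
```

The point is that p(y|x) here is a by-product of modeling the full joint distribution, including p(x).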
Parametric & Non-Parametric algorithms
Parametric algorithms have a parameter vector of fixed size, chosen before seeing the data. Neural networks, SVMs, etc. are non-parametric algorithms because they have no fixed parameter vector; depending on the richness of the data, we can choose the size of the parameter vector that we need (e.g. we can add hidden units, change SVM kernels, etc.).
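A minimal sketch of the contrast, on assumed toy data: a parametric model (least-squares linear regression) always ends up with the same fixed-size weight vector, while a non-parametric one (k-nearest-neighbor regression) effectively keeps the training set itself as its "parameters", so its size grows with the data.

```python
import numpy as np

# Assumed toy regression data: 50 examples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Parametric: linear regression -> always 3 weights, however much data we add.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w.shape)  # (3,)

# Non-parametric: the k-NN "model" is just the stored (X, y) pairs.
def knn_predict(x, X, y, k=5):
    """Average the y-values of the k training points nearest to x."""
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return y[idx].mean()

print(knn_predict(np.zeros(3), X, y))
```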
Capacity
The capacity of a model is its ability to fit a wide variety of functions. It is controlled by parameters called hyperparameters (e.g. the degree p of a polynomial predictor, the kernel choice in an SVM, the number of layers in a neural network).
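Taking the polynomial-degree example, a short sketch (toy data assumed): raising the hyperparameter p increases capacity, so the training error can only go down, which is exactly why capacity must be tuned on held-out data rather than on the training set.

```python
import numpy as np

# Assumed toy data: a noisy sine curve on 20 points.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 20)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=20)

# Higher-degree polynomials form nested model classes, so the
# least-squares training error is non-increasing in the degree p.
for p in (1, 3, 9):
    coeffs = np.polyfit(x, y, deg=p)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(p, mse)
```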
Summary of Machine Learning Algorithms
The table below briefly describes each machine learning algorithm.
| Algorithm | Description | Characteristics |
|---|---|---|
| Linear regression | To use when Y is normally distributed | Discriminative, Parametric |
| Logistic regression | To use when Y is Bernoulli-distributed | Discriminative, Parametric |
| Multinomial logistic regression (softmax regression) | To use when Y is multinomially distributed. There are two versions of the algorithm, one based on maximum-likelihood maximization and one based on cross-entropy minimization. | Discriminative, Parametric |
| Gaussian Discriminant Analysis | Supervised classification algorithm. To use when Y is Bernoulli-distributed and the conditional distribution of X given Y is multivariate Gaussian | Generative, Parametric |
| Naive Bayes Algorithm | Supervised classification algorithm. To use when Y is Bernoulli-distributed, the conditional distribution of X given Y is Bernoulli, and the features of X are conditionally independent | Generative, Parametric |
| EM | Unsupervised soft-clustering algorithm | Generative, Parametric |
| Principal Component Analysis | Reduces the dimensionality of X. Calculate the eigenvectors of the covariance matrix of X, and use the eigenvectors with the largest eigenvalues to transform the data: \(x^{(i)} := (u_1^T x^{(i)}, u_2^T x^{(i)}, \dots, u_k^T x^{(i)})\) | |
| Factor Analysis | Reduces the dimensionality of X. To use when X is Gaussian. Transforms X to a matrix Z of lower dimensionality (\(x \approx \mu + \Lambda z\)) | |
| Neural Network | Non-linear classifier | Discriminative, Non-Parametric |
| K-Nearest Neighbor Regression | Predict y as the average of the values y1, y2, …, yk of the k nearest neighbors of x | Discriminative, Non-Parametric |
| K-Nearest Neighbor Classifier | Predict y as the most common class among the k nearest neighbors of x | Discriminative, Non-Parametric |
| Support Vector Machine | Find a separator that maximizes the margin between the two classes | Discriminative, Non-Parametric |
| K-means | Unsupervised hard-clustering algorithm | Discriminative, Non-Parametric |
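The PCA row can be sketched in a few lines of NumPy (toy data assumed): eigen-decompose the covariance matrix of the centered data and project onto the top-k eigenvectors, giving \(x^{(i)} := (u_1^T x^{(i)}, \dots, u_k^T x^{(i)})\).

```python
import numpy as np

# Assumed toy data: 100 examples with 5 features, rows of X are examples.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)                 # center the data

cov = Xc.T @ Xc / Xc.shape[0]           # covariance matrix of X
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
U = eigvecs[:, ::-1][:, :2]             # top-2 eigenvectors u_1, u_2

Z = Xc @ U                              # projected data, shape (100, 2)
print(Z.shape)
```

By construction the first projected coordinate carries at least as much variance as the second, since the eigenvectors are ordered by eigenvalue.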
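To make the hard-clustering entry concrete, here is a minimal k-means sketch on assumed toy data (two well-separated blobs; the deterministic initialization from two training points is chosen only for illustration). Unlike EM's soft assignments, each point is assigned to exactly one centroid.

```python
import numpy as np

# Assumed toy data: two 2-D blobs centered at (-3, -3) and (3, 3).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-3, 0.5, (30, 2)),
               rng.normal(3, 0.5, (30, 2))])

centroids = X[[0, 30]].copy()  # init: one point from each blob (illustration only)
for _ in range(10):
    # Hard assignment: each point goes to its single nearest centroid.
    dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = np.argmin(dists, axis=1)
    # Update: each centroid becomes the mean of its assigned points.
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(np.round(centroids))
```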