Characteristics of Machine Learning Algorithms
Discriminative / Generative algorithms
Many machine learning algorithms are based on probabilistic models. Discriminative algorithms learn the conditional probability distribution of y given x, p(y|x). Generative algorithms learn the joint probability distribution p(y,x) = p(y|x) * p(x), and therefore take the distribution of x into consideration.
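To make the relationship between the two distributions concrete, here is a minimal sketch on discrete data. The toy arrays `x` and `y` are hypothetical, and estimating probabilities by simple counting is an assumption made for illustration. It estimates the joint p(y,x), derives p(x) and p(y|x) from it, and checks the factorization stated above.

```python
import numpy as np

# Hypothetical toy data: x takes values in {0, 1, 2}, y in {0, 1}.
x = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 1])

# Generative view: estimate the joint p(y, x) from co-occurrence counts.
counts = np.zeros((2, 3))
np.add.at(counts, (y, x), 1)
p_joint = counts / counts.sum()      # shape (2, 3), entries sum to 1

# Marginal p(x), obtained by summing the joint over y.
p_x = p_joint.sum(axis=0)

# Discriminative quantity recovered from the joint: p(y | x) = p(y, x) / p(x).
p_y_given_x = p_joint / p_x

# The factorization p(y, x) = p(y | x) * p(x) holds by construction.
assert np.allclose(p_y_given_x * p_x, p_joint)
```

A discriminative algorithm would model only `p_y_given_x`; a generative one models `p_joint` and can therefore also say how likely a given x is.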
Parametric & Non-Parametric algorithms
Parametric algorithms have a parameter vector of fixed size. Neural networks, SVMs, … are non-parametric algorithms because they have no fixed parameter vector: depending on the richness of the data, we can choose the size of the parameter vector that we need (e.g., we can add hidden units, change the SVM kernel, …).
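The distinction can be sketched by comparing what each kind of model has to keep after training. The example below is an illustration under assumptions: it uses least-squares linear regression as the parametric case and k-nearest neighbors (labeled non-parametric in the summary table) as the other case. The helper names `fit_linear` and `fit_knn` and the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    """Least-squares fit; returns a parameter vector of length d + 1."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return theta

def fit_knn(X, y):
    """k-NN 'training' only stores the data, so what is kept grows with n."""
    return X.copy(), y.copy()

for n in (20, 200):
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
    theta = fit_linear(X, y)
    stored_X, stored_y = fit_knn(X, y)
    # theta keeps the same shape for both n; stored_X grows with n.
    print(n, theta.shape, stored_X.shape)
```

The linear model's parameter vector depends only on the number of features, while the stored k-NN data scales with the number of training examples.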
Capacity
The capacity of a model is its ability to fit a wide variety of functions. It is controlled by settings called hyperparameters (e.g., the degree p of a polynomial predictor, the kernel choice in an SVM, the number of layers in a neural network).
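As a rough illustration of the polynomial-degree example, the sketch below fits polynomials of increasing degree p to the same data and records the training error. The noisy sine target, the sample size, and the helper name `train_mse` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy sine curve sampled on [-1, 1].
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(3.0 * x) + rng.normal(scale=0.1, size=x.size)

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, deg=degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# The degree p is the hyperparameter; a higher degree means higher capacity.
errors = {p: train_mse(p) for p in (1, 3, 5, 9)}
for p, err in errors.items():
    print(f"degree {p}: training MSE = {err:.4f}")
```

Training error cannot increase as the degree grows, because each lower-degree polynomial is a special case of the higher-degree one. A low training error alone does not show that the model generalizes, which is why capacity is usually tuned on held-out data.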
Summary of Machine Learning Algorithms
The table below briefly describes each machine learning algorithm.
| Algorithm | Description | Characteristics |
|---|---|---|
| Linear regression | Use when Y is normally distributed. | Discriminative, Parametric |
| Logistic regression | Use when Y is Bernoulli-distributed. | Discriminative, Parametric |
| Multinomial logistic regression (softmax regression) | Use when Y is multinomially distributed. There are two versions of the algorithm: one based on likelihood maximization and one based on cross-entropy minimization. | Discriminative, Parametric |
| Gaussian Discriminant Analysis | Supervised classification algorithm. Use when Y is Bernoulli-distributed and the conditional distribution of X given Y is multivariate Gaussian. | Generative, Parametric |
| Naive Bayes Algorithm | Supervised classification algorithm. Use when Y is Bernoulli-distributed, the conditional distribution of X given Y is Bernoulli, and the features of X are conditionally independent. | Generative, Parametric |
| EM | Unsupervised soft-clustering algorithm. | Generative, Parametric |
| Principal Component Analysis | Reduces the dimensionality of X. Compute the eigenvectors of \(XX^T\), then use the k eigenvectors with the largest eigenvalues to transform the data: \(x^{(i)} := (u_1^T x^{(i)}, u_2^T x^{(i)}, \ldots, u_k^T x^{(i)})\) | |
| Factor Analysis | Reduces the dimensionality of X. Use when X is Gaussian. Transforms X into a matrix Z of lower dimensionality (\(x \approx \mu + \Lambda z\)). | |
| Neural Network | Nonlinear classifier. | Discriminative, Non-Parametric |
| K-Nearest Neighbor Regression | Predict y as the average of the values \(y_1, y_2, \ldots, y_k\) of the k nearest neighbors of x. | Discriminative, Non-Parametric |
| K-Nearest Neighbor Classifier | Predict y as the most common class among the k nearest neighbors of x. | Discriminative, Non-Parametric |
| Support Vector Machine | Find a separator that maximizes the margin between two classes. | Discriminative, Non-Parametric |
| K-means | Unsupervised hard-clustering algorithm. | Discriminative, Non-Parametric |
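
Two of the table entries describe procedures that are short enough to sketch directly: the PCA projection and the k-nearest-neighbor predictions. The code below is an illustrative sketch, not a reference implementation. The function names `pca_project` and `knn_predict` are hypothetical, samples are stored as rows (so the table's \(XX^T\) becomes `Xc.T @ Xc`), the data is mean-centered before the eigendecomposition, and Euclidean distance is used for the neighbors; the last two choices are assumptions not stated in the table.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (n samples x d features) onto the top-k eigenvectors."""
    Xc = X - X.mean(axis=0)                     # assumption: center the data first
    scatter = Xc.T @ Xc                         # d x d matrix (X X^T with samples as columns)
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    U = eigvecs[:, order]                       # columns are u_1, ..., u_k
    return Xc @ U                               # row i is (u_1^T x_i, ..., u_k^T x_i)

def knn_predict(X_train, y_train, x, k, classify=False):
    """Predict y for a single point x from its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # assumption: Euclidean distance
    nearest = np.argsort(dists)[:k]
    if classify:
        # Most common class among the neighbors (ties go to the smallest label).
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]
    return y_train[nearest].mean()               # regression: average of neighbor values

# Usage on small hypothetical data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Z = pca_project(X, k=2)                          # reduced data, shape (50, 2)

X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_values = np.array([0.0, 1.0, 2.0, 10.0])
y_classes = np.array([0, 0, 1, 1])
y_reg = knn_predict(X_train, y_values, np.array([1.1]), k=3)
y_cls = knn_predict(X_train, y_classes, np.array([0.5]), k=3, classify=True)
```

Note that `knn_predict` has no training step beyond keeping `X_train` and `y_train`, which matches the non-parametric label the table gives to both k-NN variants.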