# Characteristics of Machine Learning Algorithms

Discriminative / Generative algorithms

Many machine learning algorithms are based on probabilistic models. Discriminative algorithms learn the conditional probability distribution of y given x, p(y|x). Generative algorithms learn the joint probability distribution p(y,x) = p(y|x) * p(x), and therefore take the distribution of x into consideration.
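To make the relation between the two distributions concrete, here is a minimal sketch on a small discrete example. The joint table `joint` is made up for illustration and is not taken from any dataset; the code only checks that the conditional p(y|x) and the marginal p(x) multiply back to the joint p(y,x).

```python
import numpy as np

# Hypothetical joint distribution p(y, x): rows are y in {0, 1}, columns are x in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.10, 0.20]])

# Marginal p(x): sum the joint over y.
p_x = joint.sum(axis=0)

# Conditional p(y|x): the quantity a discriminative algorithm models directly.
p_y_given_x = joint / p_x

# A generative algorithm models the joint; the factorization p(y, x) = p(y|x) * p(x) recovers it.
reconstructed = p_y_given_x * p_x
```

Each column of `p_y_given_x` sums to 1, since it is a distribution over y for a fixed x.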

Parametric & Non-Parametric algorithms

Neural networks, SVMs, … are non-parametric algorithms because they have no fixed parameter vector: depending on the richness of the data, we can choose the size of the parameter vector that we need (e.g. we can add hidden units, change SVM kernels, …).
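As a rough illustration of how the size of the parameter vector follows from such a choice, the sketch below counts the weights and biases of a fully connected network with a single hidden layer. The helper `mlp_parameter_count` and the layer sizes are hypothetical and chosen only for this example.

```python
def mlp_parameter_count(n_inputs, n_hidden, n_outputs):
    """Count weights and biases of a fully connected network with one hidden layer."""
    hidden_layer = (n_inputs + 1) * n_hidden   # weights plus one bias per hidden unit
    output_layer = (n_hidden + 1) * n_outputs  # weights plus one bias per output unit
    return hidden_layer + output_layer

# Adding hidden units grows the parameter vector.
for n_hidden in (2, 8, 32):
    print(n_hidden, mlp_parameter_count(n_inputs=4, n_hidden=n_hidden, n_outputs=1))
```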

Capacity

The capacity of an algorithm, informally the range of functions it can fit, is controlled by parameters called hyperparameters (e.g. the degree p of a polynomial predictor, the kernel choice in an SVM, the number of layers in a neural network).
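The effect of one such hyperparameter can be sketched with a polynomial predictor. The data below is synthetic (a noisy sine curve, an assumption made purely for illustration); the point is only that raising the degree p lets the predictor fit the training set more closely.

```python
import numpy as np

# Synthetic training data: a noisy sine curve (illustrative assumption).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(x.size)

def training_mse(degree):
    """Fit a polynomial of the given degree by least squares and return its training error."""
    coeffs = np.polyfit(x, y, deg=degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.mean(residuals ** 2))

# Higher degree means higher capacity, so the training error does not increase.
errors = {p: training_mse(p) for p in (1, 3, 9)}
```

A lower training error is not the same as better generalization: a high-capacity predictor can also fit the noise.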

# Summary of Machine Learning Algorithms

The table below briefly describes each machine learning algorithm.

| Algorithm | Description | Characteristics |
| --- | --- | --- |
| Linear regression | To use when Y is normally distributed. | Discriminative, Parametric |
| Logistic regression | To use when Y is Bernoulli-distributed. | Discriminative, Parametric |
| Multinomial logistic regression (softmax regression) | To use when Y is multinomially distributed. There are two versions of the algorithm, one based on likelihood maximization and one based on cross-entropy minimization. | Discriminative, Parametric |
| Gaussian Discriminant Analysis | Supervised classification algorithm. To use when Y is Bernoulli-distributed and the conditional distribution of X given Y is multivariate Gaussian. | Generative, Parametric |
| Naive Bayes Algorithm | Supervised classification algorithm. To use when Y is Bernoulli-distributed, the conditional distribution of X given Y is Bernoulli, and the X features are conditionally independent. | Generative, Parametric |
| EM | Unsupervised soft-clustering algorithm. | Generative, Parametric |
| Principal Component Analysis | Reduce the dimensionality of X. Calculate the eigenvectors of $$XX^T$$. Use the eigenvectors with the highest eigenvalues to transform the data: $$x^{(i)} := (u_1^T x^{(i)}, u_2^T x^{(i)}, \dots, u_k^T x^{(i)})$$ | |
| Factor Analysis | Reduce the dimensionality of X. To use when X is Gaussian. Transform X to a matrix Z with lower dimensionality ($$x \approx \mu + \Lambda z$$). | |
| Neural Network | Non-linear classifier. | Discriminative, Non-Parametric |
| K-Nearest Neighbor Regression | Predict y as the average of the values $$y_1, y_2, \dots, y_k$$ of the k nearest neighbors of x. | Discriminative, Non-Parametric |
| K-Nearest Neighbor Classifier | Predict y as the most common class among the k nearest neighbors of x. | Discriminative, Non-Parametric |
| Support Vector Machine | Find a separator that maximizes the margin between two classes. | Discriminative, Non-Parametric |
| K-means | Unsupervised hard-clustering algorithm. | Discriminative, Non-Parametric |
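The Principal Component Analysis entry above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the data is random and synthetic, it is mean-centered first, and X is stored with one example $$x^{(i)}$$ per column so that $$XX^T$$ is the matrix whose eigenvectors are needed. The names `U` and `Z` are chosen for this example only.

```python
import numpy as np

# Synthetic data (illustrative assumption): m examples with d features of very different scales.
rng = np.random.default_rng(0)
m, d, k = 200, 3, 2
data = rng.standard_normal((m, d)) * np.array([3.0, 1.0, 0.1])
data = data - data.mean(axis=0)  # center each feature

X = data.T  # shape (d, m): each column is one example x^(i)

# Eigen-decomposition of the symmetric matrix X X^T (eigh returns ascending eigenvalues).
eigenvalues, eigenvectors = np.linalg.eigh(X @ X.T)

# Keep the k eigenvectors u_1..u_k with the highest eigenvalues.
order = np.argsort(eigenvalues)[::-1][:k]
U = eigenvectors[:, order]  # shape (d, k)

# Column i of Z is (u_1^T x^(i), ..., u_k^T x^(i)).
Z = U.T @ X  # shape (k, m)
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because $$XX^T$$ is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.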