- accelerate the machine learning process
- visualize the data (<= 3 dimensions)
- avoid over-fitting
SVD

Advantages:
- Very efficient: via the Lanczos algorithm or similar, it can be applied to really big matrices
- The basis is hierarchical, ordered by relevance
- It tends to perform quite well for most datasets

Disadvantages:
- If the data is strongly non-linear it may not work so well
- Results are not always the best for visualization
- It is difficult to interpret
- Strongly focused on variance; sometimes there is no direct relationship between variance and predictive power, so it can discard useful information

Conclusion: the de-facto method for dimensionality reduction in generic datasets.

PCA

Advantages:
- Same as the SVD

Disadvantages:
- Same as the SVD, plus it is not as efficient to compute as the SVD

Conclusion: it's the same as the SVD but not as efficient. Never use it. (See the sketch after this list for why the two are equivalent on mean-centered data.)

T-SNE, ISOMAP, Laplacian Eigenmaps, Hessian Eigenmaps

Advantages:
- Can work well when the data is strongly non-linear
- Can work very well for visualization

Disadvantages:
- Can be inefficient for large data
- Certainly not a good idea unless the data is strongly non-linear
- Sometimes they only work well for visualization, not for dimensionality reduction

NMF (Nonnegative Matrix Factorization)

Advantages:
- Results are easier to interpret than the SVD
- Provides an additive basis to represent the data (sometimes this is good)

Disadvantages:
- Can overfit: frequently millions of solutions are possible, and which one is the right one?
- There's no hierarchy in the basis (sometimes this is bad)

Feature Hashing / Hash Kernels / The Hashing Trick

Advantages:
- Approximately preserves the inner product between vectors in the original space, so distance and similarity can be preserved
- Works great for sparse data; it can create sparse representations
- Extremely fast and simple
- Can filter some noise

Disadvantages:
- Limited to what the original data can do
- Not suitable for data visualization

K-Means Based Methods for Dimensionality Reduction

Advantages:
- Quite efficient
- Can work well with non-linear data
- The learned basis is useful to represent the data (compression)
- In some cases can work as well as deep learning methods

Disadvantages:
- Not very popular
- Can create different representations based on different initializations
- Might need a little tuning to get it working

Autoencoders and Deep Autoencoders

Advantages:
- Can find different levels of features
- Probably the state of the art for representing data at different levels
- Can be trained to denoise data or to generate data

Disadvantages:
- Can overfit big time
- Very nice in theory but not too many practical applications
- Can be inefficient for massive data
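To make the PCA point concrete, here is a minimal sketch with scikit-learn (random data is just a stand-in for a real matrix) showing that the PCA components are the right singular vectors of the mean-centered data, up to sign:

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))   # stand-in for a real data matrix
Xc = X - X.mean(axis=0)          # mean-center the data by hand

pca = PCA(n_components=5).fit(X)
svd = TruncatedSVD(n_components=5, algorithm="arpack").fit(Xc)

# The components agree up to sign: PCA is just the SVD of centered data
print(np.allclose(np.abs(pca.components_), np.abs(svd.components_), atol=1e-6))
```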
Which method to choose
If your goal is data visualization, then T-SNE is quite standard; if T-SNE doesn't work well, then try ISOMAP, Laplacian Eigenmaps, etc.
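As a minimal sketch with scikit-learn (the digits dataset is just a stand-in for your own data; perplexity usually needs some tuning):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# Embed the 64-dimensional digits into 2 dimensions for plotting
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
# emb has shape (n_samples, 2); scatter-plot it colored by y
```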
If you are dealing with generic data and you are not sure what to use, the SVD is your Swiss army knife.
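A sketch of a truncated SVD on a big sparse matrix, assuming scikit-learn's TruncatedSVD (its arpack solver is Lanczos-based; the randomized solver scales even further). The matrix here is hypothetical random data:

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Hypothetical big sparse matrix standing in for real data
X = sparse_random(20_000, 5_000, density=0.001, random_state=0)

svd = TruncatedSVD(n_components=50, algorithm="randomized", random_state=0)
X_reduced = svd.fit_transform(X)   # shape (20000, 50)

# The basis is hierarchical: components are ordered by explained variance
print(svd.explained_variance_ratio_[:5])
```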
There are many NMF algorithms; chances are one of them will work very well for your data.
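A sketch with scikit-learn's NMF (random nonnegative data as a placeholder). Note that a different initialization can land on a quite different factorization, which is exactly the non-uniqueness issue mentioned above:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 50))   # NMF requires nonnegative input

model = NMF(n_components=10, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(X)  # per-sample coefficients, shape (100, 10)
H = model.components_       # additive, nonnegative basis, shape (10, 50)

# Reconstruction X ≈ W @ H; a different init can give a different solution
print(np.linalg.norm(X - W @ H))
```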
If you have sparse data in a very high-dimensional space, then feature hashing is probably your best bet.
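For instance, with scikit-learn's FeatureHasher (the token lists below are made up for illustration):

```python
from sklearn.feature_extraction import FeatureHasher

# Each sample is a list of string features (made-up tokens)
docs = [["the", "quick", "brown", "fox"], ["hello", "world"]]

hasher = FeatureHasher(n_features=2**18, input_type="string")
X = hasher.transform(docs)   # sparse matrix of shape (2, 262144)
print(X.shape, X.nnz)
```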
If you are working with images, sound, speech, or music, then some variation of an autoencoder is probably your best option.
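As a sketch, a small dense autoencoder in Keras; random vectors stand in for real images or audio frames, which would need proper preprocessing. Training on (noisy input, clean target) pairs instead would turn the same model into a denoising autoencoder:

```python
import numpy as np
from tensorflow import keras

# Stand-in data: 1000 samples of 784-dim vectors (e.g. flattened 28x28 images)
x = np.random.rand(1000, 784).astype("float32")

inputs = keras.Input(shape=(784,))
h = keras.layers.Dense(128, activation="relu")(inputs)
code = keras.layers.Dense(32, activation="relu")(h)    # 32-dim representation
h = keras.layers.Dense(128, activation="relu")(code)
outputs = keras.layers.Dense(784, activation="sigmoid")(h)

autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, code)                    # keep the encoder half
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=10, batch_size=64, verbose=0)

codes = encoder.predict(x)                             # (1000, 32) reduced data
```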
In all of the above cases the K-Means based methods can work, so they can't be discarded.
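One simple K-Means-based scheme: represent each point by its distances to k learned centroids, which is exactly what scikit-learn's KMeans.fit_transform computes (digits again as a stand-in dataset):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)   # 64-dim data, as a stand-in

km = KMeans(n_clusters=32, n_init=10, random_state=0)
X_new = km.fit_transform(X)           # (n_samples, 32): distance to each centroid

# Caveat from above: a different random_state can yield a different representation
```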
And since PCA is just the SVD applied to mean-centered data, there's really no reason to use it over the SVD.