Dimensionality reduction

Used to

accelerate model training
visualize the data (<= 3 dimensions)
avoid over-fitting

Methods (pros/cons)

SVD:

Advantages:

    It’s very efficient (with the Lanczos algorithm or similar iterative methods it can be applied to really big matrices)
    The basis is hierarchical, ordered by relevance
    It tends to perform quite well for most data sets

Disadvantages:

    If the data is strongly non-linear it may not work so well
    Results are not always the best for visualization
    It is difficult to interpret
    Strongly focused on variance; there is not always a direct relationship between variance and predictive power, so it can discard useful information

Conclusion: De-facto method for dimensionality reduction in generic datasets.
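
As a minimal sketch of the idea (assuming NumPy and a small random matrix made up for illustration), a truncated SVD keeps only the k directions with the largest singular values. The function name truncated_svd is hypothetical. For really big sparse matrices an iterative solver such as scipy.sparse.linalg.svds would be the usual choice rather than the dense routine used here.

```python
import numpy as np

def truncated_svd(X, k):
    """Project the rows of X onto the top-k right singular vectors."""
    # Singular values come back sorted in descending order, which is
    # what makes the basis hierarchical.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_reduced = U[:, :k] * s[:k]      # same as X @ Vt[:k].T
    return X_reduced, s, Vt[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))        # toy data, illustration only
X_reduced, s, components = truncated_svd(X, k=5)
print(X_reduced.shape)                # (100, 5)
```

Because the basis is ordered by relevance, choosing a smaller k later just means dropping trailing columns of X_reduced; nothing has to be recomputed.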

PCA:

Advantages:

    Same as SVD

Disadvantages:

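    Same as SVD, plus the classical route (eigendecomposition of the covariance matrix) is not as efficient to compute as the SVD

Conclusion: It gives the same result as the SVD of the mean-centered data but is not as efficient when computed through the covariance matrix. Compute it with the SVD instead.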
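
The relationship between the two can be checked numerically. The sketch below (assuming NumPy and toy data made up for illustration) computes the principal axes twice: once from the eigendecomposition of the covariance matrix and once from the SVD of the mean-centered data. The two agree up to the sign of each axis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: four columns with clearly different variances.
X = rng.normal(size=(200, 4)) * np.array([5.0, 3.0, 2.0, 1.0])
Xc = X - X.mean(axis=0)                       # PCA works on centered data
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix (classical PCA).
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Route 2: SVD of the centered data, no covariance matrix needed.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

same_variances = np.allclose(eigvals, s**2 / (n - 1))
# Each axis is only defined up to sign, so compare absolute values.
same_axes = np.allclose(np.abs(eigvecs), np.abs(Vt.T), atol=1e-6)
print(same_variances, same_axes)
```

The second route never forms the covariance matrix, which is the practical reason for preferring the SVD.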

t-SNE, Isomap, Laplacian Eigenmaps, Hessian Eigenmaps

Advantages:

    Can work well when the data is strongly non-linear
    Can work very well for visualization

Disadvantages:

    Can be inefficient for large data
    Certainly not a good idea unless the data is strongly non-linear
    Sometimes they just work well for visualization but not for dimensionality reduction
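
To make one of these methods concrete, here is a rough sketch of Laplacian Eigenmaps using only NumPy. The function name, the unweighted nearest-neighbour graph and the toy spiral are assumptions chosen for illustration. It builds the full distance matrix, which also shows why these methods can be inefficient for large data.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, n_neighbors=10):
    """Embed the rows of X using the bottom eigenvectors of a graph Laplacian."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Connect every point to its nearest neighbours (column 0 is the point itself).
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)                    # make the graph symmetric
    degree = W.sum(axis=1)
    # Solve L v = lambda D v through the symmetric normalized Laplacian.
    scale = 1.0 / np.sqrt(degree)
    L_sym = np.eye(n) - scale[:, None] * W * scale[None, :]
    _, vecs = np.linalg.eigh(L_sym)           # eigenvalues in ascending order
    # Skip the first (trivial) eigenvector and undo the normalization.
    return scale[:, None] * vecs[:, 1:n_components + 1]

# Toy data: a spiral in 3-D, which is a curved one-dimensional manifold.
t = np.linspace(0, 4 * np.pi, 300)
X = np.column_stack([t * np.cos(t), t * np.sin(t), np.zeros_like(t)])
Y = laplacian_eigenmaps(X, n_components=2)
print(Y.shape)                                # (300, 2)
```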

NMF (Nonnegative Matrix Factorization)

Advantages:

    Results are easier to interpret than the SVD
    Provide an additive basis to represent the data (sometimes this is good)

Disadvantages:

    Can overfit; the factorization is not unique, so many solutions are often possible and it is unclear which one is the right one
    There’s no hierarchy in the basis (sometimes this is bad)
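
The points above can be sketched with a small NMF based on multiplicative updates (one of many NMF algorithms), assuming NumPy and toy data. The function name nmf and its parameters are made up for illustration. Running it from two different random starts shows the non-uniqueness: both runs can fit the data while returning different factors.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (n x m) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((30, 4)) @ rng.random((4, 12))   # toy nonnegative data
W1, H1 = nmf(V, rank=4, seed=0)
W2, H2 = nmf(V, rank=4, seed=1)

err1 = np.linalg.norm(V - W1 @ H1) / np.linalg.norm(V)
err2 = np.linalg.norm(V - W2 @ H2) / np.linalg.norm(V)
# Relative reconstruction errors, and whether the two runs found the same W.
print(err1, err2, np.allclose(W1, W2))
```

Since every entry of W and H is nonnegative, each row of V is rebuilt by adding parts together, never by cancelling them. That is the additive basis mentioned above.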

Feature Hashing / Hash Kernels / The Hashing Trick

Advantages:

    Approximately preserves the inner product between vectors in the original space (in expectation, up to hash collisions), so distance and similarity can be preserved.
    Works great for sparse data and can create sparse representations
    Extremely fast and simple
    Can filter some noise

Disadvantages:

    Learns no new features: it is limited to what the original features can express
    Not suitable for data visualization
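
A minimal sketch of the hashing trick, using only the Python standard library. The function names, the bucket count and the use of MD5 as the hash are assumptions made for illustration. Real implementations usually use a faster non-cryptographic hash.

```python
import hashlib

def hash_features(tokens, n_buckets=1024):
    """Map a bag of tokens to a sparse {index: value} vector of fixed size."""
    vec = {}
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "little")
        index = (h >> 1) % n_buckets     # which bucket the token lands in
        sign = 1 if h & 1 else -1        # signed update so collisions cancel on average
        vec[index] = vec.get(index, 0) + sign
    return vec

def dot(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v.get(i, 0) for i, val in u.items())

a = hash_features("the cat sat on the mat".split())
b = hash_features("the cat ate the fish".split())
# Matches the bag-of-words inner product unless two tokens share a bucket.
print(dot(a, b))
```

No vocabulary is built or stored, which is why the method is fast and simple. The price is that colliding tokens become indistinguishable.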

K-Means Based Methods for Dimensionality Reduction

Advantages:

    Quite efficient
    Can work well with non-linear data
    The learned basis is useful to represent the data (compression)
    In some cases can work as well as deep learning methods

Disadvantages:

    Not very popular
    Can create different representations based on different initializations
    Might need a little tuning to get it working
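
One simple way to use K-Means for dimensionality reduction is to learn k centroids and describe each point by its distance to each of them. The sketch below assumes NumPy, toy data and this particular distance encoding; the function names are made up for illustration, and other encodings exist.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the k centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid: shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):             # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids

def encode(X, centroids):
    """New features: distance of each point to each centroid."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))           # toy data, 50 dimensions
Z_a = encode(X, kmeans(X, k=8, seed=0))  # 50 -> 8 dimensions
Z_b = encode(X, kmeans(X, k=8, seed=1))  # a different initialization
print(Z_a.shape, Z_b.shape)              # (300, 8) (300, 8)
```

The two seeds start from different centroids, so Z_a and Z_b are generally different representations of the same data. This is the initialization sensitivity listed above.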

Autoencoders and Deep Autoencoders

Advantages:

    Can find different levels of features
    Probably state of the art for representing data at different levels
    Can be trained to denoise data or to generate data

Disadvantages:

    Can overfit big time
    Very nice in theory but not too many practical applications
    Can be inefficient for massive data

Which method to choose

If your goal is data visualization, then t-SNE is quite standard; if t-SNE doesn’t work well, try Isomap, Laplacian Eigenmaps, etc.

If you are dealing with generic data and you are not sure, the SVD is your Swiss army knife.

There are many NMF algorithms; chances are one of them can work very well for your data.

If you have sparse data in a very high dimensional space then feature hashing is probably your celestial solution.

If you are working with images, sound, speech, music, then some variation of an autoencoder is probably your best solution.

In all of the above cases, the K-Means based methods can work, so they should not be ruled out.

PCA gives the same result as the SVD of the mean-centered data, so compute it with the SVD rather than through the covariance matrix.
