Dimensionality reduction

    Used to

    • accelerate the training of ML models
    • visualize the data (in 3 or fewer dimensions)
    • avoid overfitting

    Methods (pros/cons)

    SVD:
    
    Advantages:
    
        Very efficient: via the Lanczos algorithm or similar iterative methods it can be applied to very large matrices
        The basis is hierarchical, ordered by relevance (explained variance)
        Tends to perform quite well on most data sets
    
    Disadvantages:
    
        If the data is strongly non-linear it may not work so well
        Results are not always the best for visualization
        The components can be difficult to interpret
        Strongly focused on variance; there is not always a direct relationship between variance and predictive power, so it can discard useful information
    
    Conclusion: the de facto method for dimensionality reduction on generic datasets (a minimal usage sketch follows below).
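
    A minimal sketch of the SVD used for dimensionality reduction, assuming NumPy and scikit-learn's TruncatedSVD (whose randomized/ARPACK solvers are in the same spirit as the Lanczos-style efficiency mentioned above); the data and the number of components are illustrative.

        import numpy as np
        from sklearn.decomposition import TruncatedSVD

        # Toy data: 1000 samples in 50 dimensions (illustrative shapes only).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 50))

        # Keep the 10 leading singular directions; the basis is ordered by
        # explained variance, so truncating keeps the most relevant part.
        svd = TruncatedSVD(n_components=10, random_state=0)
        X_reduced = svd.fit_transform(X)      # shape (1000, 10)

        print(X_reduced.shape)
        print(svd.explained_variance_ratio_)  # relevance of each component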
    
    PCA:
    
    Advantages:
    
        Same as SVD
    
    Disadvantages:
    
        Same as the SVD, plus it is typically not as efficient to compute (it is usually obtained via an eigendecomposition of the covariance matrix, which is slower and less numerically stable than an SVD of the centered data)
    
    Conclusion: it is essentially the SVD applied to mean-centered data, just computed less efficiently (the sketch below shows the equivalence). Never use it.
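
    To make the "PCA is the SVD" point concrete, here is a small sketch (assuming NumPy and scikit-learn) comparing sklearn's PCA with an SVD of the mean-centered data; the two agree up to the sign of each component.

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 8))

        # PCA as usually packaged.
        X_pca = PCA(n_components=3).fit_transform(X)

        # The same thing by hand: SVD of the mean-centered data.
        Xc = X - X.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        X_svd = U[:, :3] * s[:3]

        # Identical up to a sign flip per component.
        print(np.allclose(np.abs(X_pca), np.abs(X_svd)))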
    
    t-SNE, Isomap, Laplacian Eigenmaps, Hessian Eigenmaps
    
    Advantages:
    
        Can work well when the data is strongly non-linear
        Can work very well for visualization
    
    Disadvantages:
    
        Can be inefficient for large data sets
        Certainly not a good idea unless the data is strongly non-linear
        Sometimes they only work well for visualization and not for general dimensionality reduction (e.g. t-SNE preserves local neighborhoods but not global distances, so the embedding is rarely useful as input features)
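
    A minimal visualization sketch, assuming scikit-learn's TSNE, matplotlib and the bundled digits dataset; the perplexity value is illustrative and usually needs tuning.

        import matplotlib.pyplot as plt
        from sklearn.datasets import load_digits
        from sklearn.manifold import TSNE

        digits = load_digits()   # 64-dimensional image data

        # Embed into 2D purely for visualization; the embedding is not meant
        # to be reused as features for a downstream model.
        emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(digits.data)

        plt.scatter(emb[:, 0], emb[:, 1], c=digits.target, s=5, cmap="tab10")
        plt.title("t-SNE embedding of the digits dataset")
        plt.show()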
    
    NMF (Nonnegative Matrix Factorization)
    
    Advantages:
    
        Results are easier to interpret than those of the SVD
        Provides an additive, parts-based basis to represent the data (sometimes this is good)
    
    Disadvantages:
    
        Can overfit; the factorization is not unique, so millions of solutions are possible and it is not obvious which one is the right one
        There’s no hierarchy in the basis (sometimes this is bad)
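
    A minimal NMF sketch, assuming scikit-learn's NMF on a nonnegative matrix; the number of components is illustrative, and changing the random initialization typically changes the factorization, which is the non-uniqueness issue mentioned above.

        import numpy as np
        from sklearn.decomposition import NMF

        # NMF requires nonnegative data, e.g. counts or TF-IDF values.
        rng = np.random.default_rng(0)
        X = rng.random((100, 20))

        # X is approximated by W @ H with W, H >= 0: an additive, parts-based
        # representation of the data.
        model = NMF(n_components=5, init="random", random_state=0, max_iter=500)
        W = model.fit_transform(X)   # per-sample weights, shape (100, 5)
        H = model.components_        # nonnegative basis, shape (5, 20)

        print(model.reconstruction_err_)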
    
    Feature Hashing / Hash Kernels / The Hashing Trick
    
    Advantages:
    
        Approximately preserves inner products between vectors in the original space, so distances and similarities can be preserved
        Works great for sparse data and produces sparse representations
        Extremely fast and simple
        Can filter out some noise
    
    Disadvantages:
    
        Limited to what the original features can express; it re-represents the data rather than learning new structure
        Not suitable for data visualization
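
    A minimal sketch of the hashing trick, assuming scikit-learn's FeatureHasher on dict-style sparse features; the number of output buckets is an arbitrary illustrative choice.

        from sklearn.feature_extraction import FeatureHasher

        # Sparse, high-dimensional categorical/count features as dicts.
        docs = [
            {"dog": 2, "cat": 1, "fish": 1},
            {"dog": 1, "bird": 3},
        ]

        # Hash every feature name into one of 2**10 buckets; no vocabulary is
        # stored, so this is fast and memory-cheap, and the output stays sparse.
        hasher = FeatureHasher(n_features=2**10, input_type="dict")
        X = hasher.transform(docs)   # scipy.sparse matrix, shape (2, 1024)

        print(X.shape, X.nnz)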
    
    K-Means Based Methods for Dimensionality Reduction
    
    Advantages:
    
        Quite efficient
        Can work well with non-linear data
        The learned basis is useful to represent the data (compression)
        In some cases can work as well as deep learning methods
    
    Disadvantages:
    
        Not very popular
        Can create different representations based on different initializations
        Might need a little tuning to get it working
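
    One common k-means-based reduction, sketched here with scikit-learn: fit k centroids and re-encode each sample as its distances to them (KMeans.transform), giving a k-dimensional, mildly non-linear representation; k and the data are illustrative.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 30))

        # Learn 16 centroids; each sample is then re-encoded as its distance
        # to every centroid, i.e. a 16-dimensional representation.
        km = KMeans(n_clusters=16, n_init=10, random_state=0).fit(X)
        X_encoded = km.transform(X)   # shape (500, 16)

        # Different initializations can give different encodings, which is
        # the reproducibility caveat mentioned above.
        print(X_encoded.shape)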
    
    Autoencoders and Deep Autoencoders
    
    Advantages:
    
        Can find features at different levels of abstraction
        Probably the state of the art for representing data at different levels
        Can be trained to denoise data or to generate data
    
    Disadvantages:
    
        Can overfit big time
        Very nice in theory but not too many practical applications
        Can be inefficient for massive data sets
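
    A minimal undercomplete autoencoder sketch, assuming PyTorch; the architecture, sizes and training loop are illustrative, and a real model would need regularization (or a denoising objective) to avoid the overfitting mentioned above.

        import torch
        from torch import nn

        # Toy data: 1024 samples in 64 dimensions (illustrative only).
        X = torch.randn(1024, 64)

        # Encoder compresses to an 8-dimensional code; decoder reconstructs.
        model = nn.Sequential(
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 8),             # the low-dimensional code
            nn.Linear(8, 32), nn.ReLU(),
            nn.Linear(32, 64),
        )
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()

        for epoch in range(50):
            opt.zero_grad()
            loss = loss_fn(model(X), X)   # reconstruction error
            loss.backward()
            opt.step()

        # To reduce dimensionality, keep only the encoder half.
        encoder = nn.Sequential(*list(model.children())[:3])
        codes = encoder(X)                # shape (1024, 8)
        print(codes.shape)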
    


    Which method to choose

    If your goal is data visualization then t-SNE is quite standard; if t-SNE doesn't work well, try Isomap, Laplacian Eigenmaps, etc.

    If you are dealing with generic data and you are not sure what to use, the SVD is your Swiss Army knife.

    There are many NMF algorithms; chances are one of them will work very well for your data.

    If you have sparse data in a very high-dimensional space then feature hashing is probably your ideal solution.

    If you are working with images, sound, speech, or music, then some variation of an autoencoder is probably your best solution.

    In all of the above cases the K-Means based methods can also work, so they shouldn't be discarded.

    PCA is the same as the SVD (applied to mean-centered data), so nobody should really use it.
