Preprocessing methods

    Sources:

    • https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02

    One Hot Encoding

    • (-) n labels will create n variables (or n-1 if one level is dropped), so the feature count grows linearly with cardinality.
    • (+) interpretable
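
    A minimal sketch of one-hot encoding in plain Python, to make the "one column per label" point concrete. The function name one_hot_encode and the colour labels are illustrative assumptions, not taken from the source article.

```python
def one_hot_encode(values):
    """Map each distinct label to its own 0/1 column."""
    # Sort the distinct labels so the column order is reproducible.
    categories = sorted(set(values))
    # One row per input value, with a single 1 in the matching column.
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical example data: 3 distinct labels give 3 columns.
categories, rows = one_hot_encode(["red", "green", "blue", "green"])
for label, row in zip(["red", "green", "blue", "green"], rows):
    print(label, row)
```

    Each column corresponds to exactly one label, which is why the encoding stays interpretable.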

    Hashing Encoding

    • (+) considerably reduces the number of variables
    • (-) not interpretable; loses information (different labels can share a column)
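
    A rough sketch of the idea, assuming a fixed number of output columns (n_buckets) and MD5 as the hash function. Both choices and the name hash_encode are illustrative assumptions. Any stable hash would do. Python's built-in hash() is avoided here because it is randomized per process for strings.

```python
import hashlib


def hash_encode(value, n_buckets=8):
    """Encode a label as a 0/1 vector of fixed length n_buckets."""
    # A stable digest keeps the mapping identical across runs.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    index = int(digest, 16) % n_buckets
    row = [0] * n_buckets
    row[index] = 1
    return row


# The vector length stays at n_buckets however many labels exist.
encoded = {city: hash_encode(city) for city in ["paris", "tokyo", "lima"]}
for city, row in encoded.items():
    print(city, row)
```

    The column index no longer names a label, which is where the interpretability is lost.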

    Classic Hash

    • (-) collisions with unrelated labels
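
    To illustrate the collision problem, the sketch below groups labels by bucket and reports buckets holding more than one label. The helper names (bucket_of, find_collisions) and the city labels are hypothetical. With more labels than buckets, at least one collision is guaranteed by the pigeonhole principle, and which labels collide has nothing to do with their meaning.

```python
import hashlib
from collections import defaultdict


def bucket_of(label, n_buckets):
    """Stable bucket index for a label (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(label.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def find_collisions(labels, n_buckets):
    """Return {bucket: labels} for every bucket shared by 2+ labels."""
    buckets = defaultdict(list)
    for label in labels:
        buckets[bucket_of(label, n_buckets)].append(label)
    return {b: ls for b, ls in buckets.items() if len(ls) > 1}


# Five labels into four buckets: some bucket must hold two or more.
labels = ["paris", "tokyo", "lima", "oslo", "cairo"]
collisions = find_collisions(labels, n_buckets=4)
print(collisions)
```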

    Locality-Sensitive Hashing (LSH)

    • (+/-) similar labels tend to get the same hash, so collisions happen mostly between related labels
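
    One common LSH family is MinHash, which approximates the Jaccard similarity between sets. The sketch below applies it to character bigrams of a label. It is an assumption-laden illustration (MinHash, bigrams, MD5 and all function names are choices made here, not something the source prescribes). It also assumes every label has at least two characters.

```python
import hashlib


def shingles(text, k=2):
    """Set of overlapping character k-grams of the text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(text, n_hashes=128):
    """MinHash signature: for each seeded hash, keep the minimum value."""
    grams = shingles(text)
    signature = []
    for seed in range(n_hashes):
        # Prefixing the seed simulates n_hashes different hash functions.
        signature.append(min(
            int(hashlib.md5(f"{seed}:{g}".encode("utf-8")).hexdigest(), 16)
            for g in grams
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


base = minhash_signature("new york")
near = minhash_signature("new york city")  # shares most bigrams with base
far = minhash_signature("lima")            # shares no bigrams with base
print(estimated_similarity(base, near), estimated_similarity(base, far))
```

    Labels whose signatures agree in many positions can then be sent to the same bucket, so collisions favour related labels instead of arbitrary ones.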

    Target Encoding (Level Encoding)

    • (+) works well for high-cardinality categorical variables, since each variable stays a single numeric column
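
    A minimal sketch of target encoding, assuming the simplest variant: each label is replaced by the mean of the target over the rows carrying that label. The function name target_encode, the optional smoothing parameter and the sample data are illustrative assumptions.

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=0.0):
    """Return {label: mean target}, optionally shrunk toward the global mean."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    # smoothing acts like that many pseudo-rows carrying the global mean,
    # which damps the estimate for labels seen only a few times.
    return {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }


# Hypothetical data: did a visitor from each city make a purchase?
cities = ["paris", "paris", "lima", "lima", "lima", "oslo"]
bought = [1, 0, 1, 1, 1, 0]
encoding = target_encode(cities, bought)
print(encoding)
```

    Because the encoding is computed from the target, it should be fitted on training data only. Otherwise target information leaks into the features.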
