
Data Science labs blog

How, When, and Why Should You Normalize / Standardize / Rescale Your Data?

 



Why Should You Standardize / Normalize Variables:

Standardization:

Normalization:

When Should You Use Normalization And Standardization:

Dataset:

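import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Load three numeric columns from the first 30,000 rows of the loan dataset
cols = ['loan_amnt', 'int_rate', 'installment']
data = pd.read_csv('loan.csv', nrows=30000, usecols=cols)

# Summary statistics (count, mean, std, min, quartiles, max) for each column
data.describe()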

Standardization (Standard Scaler):

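Standardization rescales each feature so that it has mean μ = 0 and standard deviation σ = 1:
X_scaled = (X - X.mean) / X.std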
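from sklearn.preprocessing import StandardScaler
# Fit the scaler on the data and transform it; the result is a NumPy array
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
# Per-column mean (expected to be close to 0) and standard deviation (close to 1)
print(data_scaled.mean(axis=0))
print(data_scaled.std(axis=0))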
print('Min values (Loan Amount, Int rate and Installment): ', data_scaled.min(axis=0))
print('Max values (Loan Amount, Int rate and Installment): ', data_scaled.max(axis=0))
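To make the μ = 0, σ = 1 claim concrete, here is a minimal sketch that standardizes by hand and compares the result with StandardScaler. The toy array is made up for illustration (it is not the loan data), and the comparison assumes the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the three loan columns (values are made up).
rng = np.random.default_rng(0)
X = rng.normal(loc=[15000.0, 13.0, 450.0],
               scale=[8000.0, 4.0, 250.0],
               size=(1000, 3))

# Manual standardization: subtract the column mean, then divide by the
# column standard deviation (NumPy's default ddof=0, the population form).
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# Same operation through scikit-learn.
X_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))   # the two results should agree
print(X_sklearn.mean(axis=0).round(6))    # per-column means, near 0
print(X_sklearn.std(axis=0).round(6))     # per-column standard deviations, near 1
```

Note that pandas' `DataFrame.std()` defaults to ddof=1, so checking the standard deviation on a DataFrame instead of a NumPy array would give a value slightly different from 1.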

Normalization (Min-Max Scaler):

Normalization rescales each feature to the range [0, 1]:
X_scaled = (X - X.min) / (X.max - X.min)
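from sklearn.preprocessing import MinMaxScaler
# Rescale each column to the [0, 1] range (the default feature_range)
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data)
# Unlike standardization, the means and standard deviations are not fixed
print('means (Loan Amount, Int rate and Installment): ', data_scaled.mean(axis=0))
print('std (Loan Amount, Int rate and Installment): ', data_scaled.std(axis=0))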
print('Min (Loan Amount, Int rate and Installment): ', data_scaled.min(axis=0))
print('Max (Loan Amount, Int rate and Installment): ', data_scaled.max(axis=0))
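The min-max rescaling can also be written out by hand. The sketch below uses a tiny made-up array (not the loan data) so the arithmetic is easy to follow, and checks it against MinMaxScaler with its default feature_range of (0, 1).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up values: two columns on very different scales.
X = np.array([[1000.0,  5.0],
              [5000.0, 10.0],
              [9000.0, 25.0]])

# Manual min-max scaling: shift by the column minimum, divide by the column range.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

# Same operation through scikit-learn.
X_sklearn = MinMaxScaler().fit_transform(X)

print(X_manual)                           # rows: (0, 0), (0.5, 0.25), (1, 1)
print(np.allclose(X_manual, X_sklearn))   # the two results should agree
```

For example, 5000 maps to (5000 - 1000) / (9000 - 1000) = 0.5, and 10 maps to (10 - 5) / (25 - 5) = 0.25.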

Robust Scaler (Scaling to median and quantiles):

IQR = 75th percentile - 25th percentile
X_scaled = (X - X.median) / IQR
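from sklearn.preprocessing import RobustScaler
# Center each column on its median and scale by its IQR (25th to 75th percentile by default)
scaler = RobustScaler()
data_scaled = scaler.fit_transform(data)
# The means and standard deviations are not forced to 0 and 1 here
print('means (Loan Amount, Int rate and Installment): ', data_scaled.mean(axis=0))
print('std (Loan Amount, Int rate and Installment): ', data_scaled.std(axis=0))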
print('Min (Loan Amount, Int rate and Installment): ', data_scaled.min(axis=0))
print('Max (Loan Amount, Int rate and Installment): ', data_scaled.max(axis=0))
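The median / IQR formula above can be checked by hand as well. This is a minimal sketch on a made-up single feature with one extreme value (not the loan data), assuming RobustScaler's default quantile_range of (25.0, 75.0).

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Made-up single feature with one extreme outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Manual robust scaling: subtract the median, divide by the interquartile range.
median = np.median(X, axis=0)                    # 3.0
q25, q75 = np.percentile(X, [25, 75], axis=0)    # 2.0 and 4.0
iqr = q75 - q25                                  # 2.0
X_manual = (X - median) / iqr

# Same operation through scikit-learn.
X_sklearn = RobustScaler().fit_transform(X)

print(X_manual.ravel())                   # values: -1, -0.5, 0, 0.5, 48.5
print(np.allclose(X_manual, X_sklearn))   # the two results should agree
```

The median and IQR are computed from the middle of the distribution, so the value 100 barely influences them: the four ordinary points stay within [-1, 0.5], while the outlier remains clearly separated at 48.5.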
