:point_up: This infographic was designed by my buddy [Sheikh Anikul Islam Hani](https://github.com/SheikhAnikulIslam) :octocat:, but the information comes from me. :v:

As shown in the infographic, we will break data preprocessing down into six essential steps.
Get the dataset used in this [example](http://bit.ly/2KDkTfT) from [Here](https://github.com/harunshimanto/100-Days-Of-ML-Code/blob/master/Datasets/ShopSellData.csv).

## Step 1: Importing the libraries
```python
import numpy as np   # numerical arrays and mathematics
import pandas as pd  # loading and manipulating the dataset
```
## Step 2: Importing dataset
```python
dataset = pd.read_csv('ShopSellData.csv')
X = dataset.iloc[:, :-1].values  # feature matrix: every column except the last
Y = dataset.iloc[:, 3].values    # target vector: the fourth (last) column
```
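If you want to sanity-check what was just loaded, a quick peek like the sketch below helps. It assumes the layout the slicing above implies (a categorical column first, two numeric columns, and the outcome in the fourth column), which I have not verified against ShopSellData.csv itself.
```python
# Quick look at the raw data before preprocessing (column layout is an assumption, not verified)
print(dataset.head())          # first few rows
print(dataset.dtypes)          # which columns are numeric and which are object/categorical
print(dataset.isnull().sum())  # how many missing values each column contains
print(X.shape, Y.shape)        # X should have one column fewer than the dataset
```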
## Step 3: Handling the missing data
```python
from sklearn.impute import SimpleImputer  # SimpleImputer replaces the removed sklearn.preprocessing.Imputer
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")  # fill each missing value with its column's mean
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
```
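A small check like the following confirms that the imputation actually filled the gaps; it is only a sketch and assumes, as the slicing above already does, that columns 1 and 2 of X are numeric.
```python
import numpy as np  # already imported in Step 1

# After imputation there should be no NaN left in the numeric columns
assert not np.isnan(X[:, 1:3].astype(float)).any(), "missing values remain"
print(X[:, 1:3].astype(float).mean(axis=0))  # column means; with mean imputation these match the fill values
```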
## Step 4: Encoding categorical data
```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])  # replace the category names in column 0 with integer codes
```
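To see what the encoder actually did, you can print the mapping it learned. `classes_` is a standard `LabelEncoder` attribute; the category names in the comment are only an illustration, since I have not looked inside the real file.
```python
# classes_ lists the original categories in the order of their integer codes,
# e.g. ['France', 'Germany', 'Spain'] -> 0, 1, 2 (illustrative values, not taken from the actual data)
print(labelencoder_X.classes_)
print(X[:5, 0])  # the first few rows of column 0 are now integer codes
```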
### Creating a dummy variable
```python
from sklearn.compose import ColumnTransformer  # categorical_features was removed from OneHotEncoder, so ColumnTransformer selects the column instead
ct = ColumnTransformer([("onehot", OneHotEncoder(), [0])], remainder="passthrough", sparse_threshold=0)
X = ct.fit_transform(X)  # column 0 becomes one 0/1 dummy column per category; the numeric columns pass through unchanged
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)  # integer-encode the target labels
```
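The point of the dummy variables is that the integer codes from the previous block would otherwise suggest a false ordering between categories; one-hot encoding gives every category its own 0/1 column instead. A quick shape check, sketched below, confirms the expansion (the exact column count depends on how many categories the first column holds, which I am only assuming here):
```python
# One column of integer codes is replaced by one 0/1 column per category,
# so X now has (number of categories + 2) columns in place of the original 3 features
print(X.shape)   # expanded feature matrix
print(X[:3])     # each row starts with exactly one 1 among the dummy columns
print(Y[:10])    # integer class labels (0/1 if the outcome column is yes/no, which is an assumption)
```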
## Step 5: Splitting the dataset into training and test sets
```python
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed; model_selection is the current module
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
```
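With `test_size = 0.2` the data is split 80% training / 20% test, and `random_state = 0` makes the split reproducible. A quick check of the shapes:
```python
# An 80/20 split: X_train should hold roughly four times as many rows as X_test
print(X_train.shape, X_test.shape)
print(Y_train.shape, Y_test.shape)
```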

## Step 6: Feature Scaling
```python
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)  # fit the scaler on the training data only
X_test = sc_X.transform(X_test)        # reuse the training statistics; do not refit on the test set
```
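Fitting the scaler on the training set only, and merely transforming the test set, keeps information about the test data from leaking into preprocessing. A small sanity check on the result:
```python
# Training features should now have mean ~0 and standard deviation ~1 in every column;
# the test set will only be close to 0 and 1, because it reuses the training statistics
print(X_train.mean(axis=0).round(3), X_train.std(axis=0).round(3))
print(X_test.mean(axis=0).round(3), X_test.std(axis=0).round(3))
```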
### Done :v:
