added ML track (#41)
kabirnagpal authored Jul 16, 2020
1 parent 01bbfe5 commit cc3cc93
Showing 7 changed files with 185 additions and 11 deletions.
29 changes: 18 additions & 11 deletions soa/tracks/ml/1.md
@@ -1,13 +1,20 @@
## Example
# ML Track
Welcome to the ML track. We hope you're really excited for this.
For starters, we'll brush up your Python skills. This includes your understanding of:
- [Numpy](https://numpy.org/)
- [Pandas](https://pandas.pydata.org/)
- [Matplotlib](https://matplotlib.org/)

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%201.ipynb) to view the Jupyter-Notebook.
If you don't have a Python environment, you can also try the code in [Google Colab](https://colab.research.google.com/).

Nothing to see here yet. Example code.

<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return '=' == s.replace(' ', '').strip()
</code>
</form>
How do you get the mean of each column in a DataFrame named `df`?
Please write the full command. (The answer is case sensitive.)
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s == 'df.mean()'
</code>
</form>
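For reference, here's a minimal pandas sketch of column-wise means (the column names and numbers are made up for illustration):

```python
import pandas as pd

# A small illustrative DataFrame (made-up data).
df = pd.DataFrame({'height': [150, 160, 170],
                   'weight': [50, 60, 70]})

# Mean of each numeric column.
print(df.mean())
# height    160.0
# weight     60.0
```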
28 changes: 28 additions & 0 deletions soa/tracks/ml/2.md
@@ -0,0 +1,28 @@
# ML Track - Week 2
We hope you're really excited to get started with actual machine learning. But hold on!
A big problem with machine learning algorithms is that they're not human: they are just a bunch of formulas applied inside loops and conditional statements.
So they cannot handle certain types of data, such as strings, and they cannot handle missing values either.
These concepts were touched on in last week's track, and now it's time to study them in depth.
This week we'll learn about:

- One Hot encoding
- Label Encoding
- Normalization
- Dealing with Missing values
- Introduction to Machine learning
- Types of Learning (Supervised, Unsupervised and Reinforcement)
- Application of Machine Learning
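As a small preview of the preprocessing topics above, here is a sketch using pandas (the `company`/`price` toy data is made up for illustration):

```python
import pandas as pd

# Made-up toy data: a string column and a numeric column with a missing value.
df = pd.DataFrame({'company': ['tata', 'honda', 'tata'],
                   'price': [5.0, None, 7.0]})

# One Hot encoding of a string column with pandas.
dummies = pd.get_dummies(df['company'])

# Dealing with missing values: fill with the column mean.
df['price'] = df['price'].fillna(df['price'].mean())

print(dummies)
print(df)
```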

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%202.ipynb) to view the Jupyter-Notebook.
If you don't have a Python environment, you can also try the code in [Google Colab](https://colab.research.google.com/).


Write the command to one-hot encode the column named 'company' using a pandas function on the DataFrame `df`.
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.replace(' ', '') == "pd.get_dummies(df['company'])"
</code>
</form>
24 changes: 24 additions & 0 deletions soa/tracks/ml/3.md
@@ -0,0 +1,24 @@
# ML Track - Week 3
Congratulations on making it this far!
Now that we've covered the preprocessing methods, we can start with machine learning algorithms.
We'll start with **Regression**.
Regression analysis is a supervised method used to predict a **continuous** dependent variable from one or more independent variables.
This week will require you to have prior knowledge in linear, quadratic and polynomial equations.
This week we'll learn about:

- Linear Regression
- Multiple Linear Regression
- Polynomial Regression
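As a taste of what the notebook covers, here is a minimal linear-regression sketch with scikit-learn (the data is made up to lie exactly on the line y = 2x + 1):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Made-up data on the line y = 2x + 1.
X = np.array([[1], [2], [3], [4]])
y = np.array([3, 5, 7, 9])

model = LinearRegression().fit(X, y)
pred = model.predict(X)

# mean_squared_error lives in sklearn.metrics.
mse = mean_squared_error(y, pred)
print(model.coef_, model.intercept_, mse)
```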

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%203.ipynb) to view the Jupyter-Notebook.
If you don't have a Python environment, you can also try the code in [Google Colab](https://colab.research.google.com/).

`mean_squared_error` is a method from which class in Sklearn?
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'metrics'
</code>
</form>
28 changes: 28 additions & 0 deletions soa/tracks/ml/4.md
@@ -0,0 +1,28 @@
# ML Track - Week 4
This week we are going to learn a type of widely used supervised machine learning algorithm - **Classification**.

Classification is the process of predicting the class of given data points.

For example, spam detection in email service providers can be identified as a classification problem. This is binary classification, since there are only two classes: spam and not spam. A classifier uses some training data to understand how the given input variables relate to the class.

This week, we will cover the following classifier algorithms:

- Support Vector Classifier (SVC)
- Decision Tree Classifier
- Random Forest Classifier
- Voting Classifier
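As a quick sketch of one of the classifiers above, here is a Random Forest on synthetic data (the dataset and the `n_estimators` value are illustrative, not the ones used in the notebook):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Made-up binary classification data (not the notebook's dataset).
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# n_estimators controls how many trees the forest builds;
# this value is illustrative only.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```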

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%204.ipynb) to view the Jupyter-Notebook.
If you don't have a Python environment, you can also try the code in [Google Colab](https://colab.research.google.com/).


#### How many estimators were used for the Random Forest Classifier in the notebook?
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>

def answer(s):
    return s == '200'
</code>
</form>
31 changes: 31 additions & 0 deletions soa/tracks/ml/5.md
@@ -0,0 +1,31 @@
# ML Track - Week 5
Congratulations, you have made it halfway!
Now, let's learn how to judge how well or poorly our model is performing, and why.

Topics covered in this week:
- Underfitting
- Overfitting
- Bias Variance Trade-off
- Regularization
- Support Vector Machine
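A minimal sketch of regularization in action, assuming made-up data: ridge regression shrinks the coefficients relative to plain linear regression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Made-up data: y depends mostly on the first feature, plus noise.
rng = np.random.RandomState(0)
X = rng.randn(20, 5)
y = X[:, 0] * 3 + rng.randn(20) * 0.1

lin = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# Regularization penalizes large weights, so the ridge
# coefficient vector is smaller than the unregularized one.
print(np.linalg.norm(lin.coef_), np.linalg.norm(ridge.coef_))
```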

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%205.ipynb) to view the Jupyter-Notebook.
If you don't have a Python environment, you can also try the code in [Google Colab](https://colab.research.google.com/).

We hope this week proved useful. Let's wind it up with a quick question.

Ques) In terms of the bias-variance trade-off, which of the following is substantially more harmful to the test error than to the training error? (Input the correct option.)
a) Bias
b) Loss
c) Variance
d) Risk


<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'c'
</code>
</form>
30 changes: 30 additions & 0 deletions soa/tracks/ml/6.md
@@ -0,0 +1,30 @@
# ML Track - Week 6
Congratulations,
You have come a long way! Until now we have been working on supervised machine learning, so gear up for the first chapter of unsupervised machine learning: **Clustering**.

Clustering is the task of dividing data points into groups such that points in the same group are more similar to each other than to points in other groups.

So this week we are going to dive deep into clustering and cover the following topics:

- What is clustering
- Difference between clustering and classification
- K-means clustering
  - Silhouette Score
- Hierarchical clustering
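A minimal K-means sketch with a silhouette score, using made-up, well-separated blobs:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two well-separated made-up blobs of 2-D points.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 10])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Silhouette score is close to 1 when clusters are well separated.
score = silhouette_score(X, km.labels_)
print(score)
```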


### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%206.ipynb) to view the Jupyter-Notebook.
If you don't have a Python environment, you can also try the code in [Google Colab](https://colab.research.google.com/).

We hope this week proved useful. Let's wind it up with a quick question.
Ques) What is the name of the linkage that we have used in Agglomerative Clustering?
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'ward'
</code>
</form>


26 changes: 26 additions & 0 deletions soa/tracks/ml/7.md
@@ -0,0 +1,26 @@
# ML Track-Week 7

Congratulations on making it this far!

This week will introduce you to dimensionality reduction techniques and model selection strategies like k-fold cross-validation, Grid Search and Stacking.

Dimensionality reduction means reducing the number of features (columns) in a given dataset. Imagine working with a dataset with nearly 20000 features. Having so many features makes it hard to draw insights from the data, and it's not feasible to analyze each and every variable at a microscopic level. Hence, we use dimensionality reduction techniques.

Model selection, on the other hand, is the process of selecting one final machine learning model from among a collection of candidate machine learning models for a training dataset.
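The two ideas can be sketched together with scikit-learn (the dataset and parameter grid are chosen for illustration, not taken from the notebook):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Dimensionality reduction: project the 4 iris features down to 2.
X2 = PCA(n_components=2).fit_transform(X)

# Model selection: grid search over C with 5-fold cross-validation
# (this parameter grid is made up for illustration).
grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=5)
grid.fit(X2, y)
print(grid.best_params_)
```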

Let's start then, shall we?

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%207.ipynb) to view the Jupyter-Notebook.
If you don't have a Python environment, you can also try the code in [Google Colab](https://colab.research.google.com/).

A question to answer after you complete the notebook:
Kernel PCA cannot be used for non-linear data. (True / False)
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'false'
</code>
</form>
