diff --git a/README.md b/README.md
index e014db6d..531c8bb6 100644
--- a/README.md
+++ b/README.md
@@ -83,7 +83,7 @@ Excerpts from the [Foreword](./docs/foreword_ro.pdf) and [Preface](./docs/prefac
 - [How important do you think having a mentor is to the learning process?](./faq/mentor.md)
 - [Where are the best online communities centered around data science/machine learning or python?](./faq/ml-python-communities.md)
 - [How would you explain machine learning to a software engineer?](./ml-to-a-programmer.md)
-
+- [What would your curriculum for a machine learning beginner look like?](./faq/ml-curriculum.md)
 
 ### Questions about ML Concepts
 
diff --git a/faq/README.md b/faq/README.md
index 23a79283..e52b0c89 100644
--- a/faq/README.md
+++ b/faq/README.md
@@ -27,6 +27,7 @@ Sebastian
 - [How important do you think having a mentor is to the learning process?](./mentor.md)
 - [Where are the best online communities centered around data science/machine learning or python?](./ml-python-communities.md)
 - [How would you explain machine learning to a software engineer?](./ml-to-a-programmer.md)
+- [What would your curriculum for a machine learning beginner look like?](./ml-curriculum.md)
 
 ### Questions about Machine Learning Concepts
 
diff --git a/faq/datamining-vs-ml.md b/faq/datamining-vs-ml.md
index 927983be..1d333761 100644
--- a/faq/datamining-vs-ml.md
+++ b/faq/datamining-vs-ml.md
@@ -1,4 +1,4 @@
-# What are differences in research nature between the two fields: Machine Learning & Data Mining?
+# What are differences in research nature between the two fields: Machine Learning & Data Mining?
 
 In a nutshell, Data Mining is about the discovery of patterns in datasets or "gaining knowledge and insights" from data. Machine Learning is closely related though. We can think of Machine Learning algorithms as one of he work horses of Data Mining; most Data Mining approaches are based on Machine Learning algorithms.
 Maybe it helps to think of Data Mining as a pipeline of steps and approaches, and the use of a Machine Learning algorithm is one part of this pipeline. Or in other words, Data Mining is not "just" Machine Learning. E.g., data visualization or summarization is also part of Data Mining. What I was trying to say is that Machine Learning is one part, one set of techniques, that is/are being used in Data Mining.
diff --git a/faq/ml-curriculum.md b/faq/ml-curriculum.md
new file mode 100644
index 00000000..4de1df83
--- /dev/null
+++ b/faq/ml-curriculum.md
@@ -0,0 +1,40 @@
+# What would your curriculum for a machine learning beginner look like?
+If I had to put together a study plan for a beginner, I would probably start with an easy-going intro course such as
+
+- Andrew Ng's [Machine Learning course on Coursera](https://class.coursera.org/ml-005/lecture)
+
+![](./ml-curriculum/ng.png)
+
+Next, I would recommend a good intro book on 'Data Mining' (data mining is essentially about extracting knowledge from data, mainly using machine learning algorithms). I highly recommend the following book, written by one of my former professors:
+
+- P.-N. Tan, M. Steinbach, and V. Kumar. [Introduction to Data Mining](http://www-users.cs.umn.edu/~kumar/dmbook/index.php) (First Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2005.
+
+![](./ml-curriculum/tan.jpeg)
+
+This book will provide you with a great overview of what's currently out there: you will not only learn about different machine learning techniques but also how to "understand," "handle," and interpret data. Remember, without "good," informative data, a machine learning algorithm is practically worthless. Additionally, you will learn about alternative techniques, since machine learning is not always the only, or the best, solution to a problem:
+
+> if all you have is a hammer, everything looks like a nail ...
+
+Now, after completing the Coursera course, you will have a basic understanding of ML, which the Data Mining book will have broadened.
+I don't want to self-advertise here, but I think my book would be a good follow-up to learn ML in more depth: to understand the algorithms, learn about different data processing pipelines, evaluation techniques, and best practices, and learn how to put it into action using Python, NumPy, scikit-learn, and Theano so that you can start working on your personal projects.
+
+![](./ml-curriculum/raschka.jpeg)
+
+While you work on your individual projects, I would suggest deepening your (statistical learning) knowledge via one of the three books below:
+
+
+- T. Hastie, R. Tibshirani, and J. Friedman. [The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/), 2nd edition. Springer, 2009.
+- C. M. Bishop. [Pattern Recognition and Machine Learning](http://www.springer.com/us/book/9780387310732). Springer, New York, 2006.
+- R. O. Duda, P. E. Hart, and D. G. Stork. [Pattern Classification](http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471056693.html). John Wiley & Sons, 2012.
+
+![](./ml-curriculum/three.png)
+
+When you are through all of that and still hungry to learn more, I recommend:
+
+- [the Deep Learning book](http://www.iro.umontreal.ca/~bengioy/dlbook/) by Yoshua Bengio, Ian Goodfellow, and Aaron Courville. The release is planned for around 2016, but the 613-page manuscript is already available as of today (online and for free).
+
+![](./ml-curriculum/bengio.png)
+
+- And in between, if you are looking for a less technical yet very inspirational free-time read, I highly recommend [Pedro Domingos' The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World](https://homes.cs.washington.edu/~pedrod/).
+
+![](./ml-curriculum/domingos.png)
diff --git a/faq/ml-curriculum/bengio.png b/faq/ml-curriculum/bengio.png
new file mode 100644
index 00000000..f81c2c76
Binary files /dev/null and b/faq/ml-curriculum/bengio.png differ
diff --git a/faq/ml-curriculum/bishop.jpeg b/faq/ml-curriculum/bishop.jpeg
new file mode 100644
index 00000000..6f463be0
Binary files /dev/null and b/faq/ml-curriculum/bishop.jpeg differ
diff --git a/faq/ml-curriculum/domingos.png b/faq/ml-curriculum/domingos.png
new file mode 100644
index 00000000..704e6778
Binary files /dev/null and b/faq/ml-curriculum/domingos.png differ
diff --git a/faq/ml-curriculum/duda.jpg b/faq/ml-curriculum/duda.jpg
new file mode 100644
index 00000000..9c76404c
Binary files /dev/null and b/faq/ml-curriculum/duda.jpg differ
diff --git a/faq/ml-curriculum/ng.png b/faq/ml-curriculum/ng.png
new file mode 100644
index 00000000..58c94ebf
Binary files /dev/null and b/faq/ml-curriculum/ng.png differ
diff --git a/faq/ml-curriculum/raschka.jpeg b/faq/ml-curriculum/raschka.jpeg
new file mode 100644
index 00000000..56424eb2
Binary files /dev/null and b/faq/ml-curriculum/raschka.jpeg differ
diff --git a/faq/ml-curriculum/tan.jpeg b/faq/ml-curriculum/tan.jpeg
new file mode 100644
index 00000000..4915fde5
Binary files /dev/null and b/faq/ml-curriculum/tan.jpeg differ
diff --git a/faq/ml-curriculum/three.png b/faq/ml-curriculum/three.png
new file mode 100644
index 00000000..4c3db089
Binary files /dev/null and b/faq/ml-curriculum/three.png differ
diff --git a/faq/ml-curriculum/tibshirani.jpeg b/faq/ml-curriculum/tibshirani.jpeg
new file mode 100644
index 00000000..31cd133b
Binary files /dev/null and b/faq/ml-curriculum/tibshirani.jpeg differ
diff --git a/faq/ml-to-a-programmer.md b/faq/ml-to-a-programmer.md
index b7500299..ff9d3da2 100644
--- a/faq/ml-to-a-programmer.md
+++ b/faq/ml-to-a-programmer.md
@@ -30,6 +30,7 @@ In machine learning, we take data (e.g., e-mails), provide information about the
 **machine learning:**
+
 - results + data -> machine learning algorithm + computer -> set of rules