-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy path00-introduction.Rmd
158 lines (117 loc) · 6.06 KB
/
00-introduction.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
<!--
kate: indent-width 4; word-wrap-column 74; default-dictionary en_AU
Copyright (C) 2020-2022, Marek Gagolewski <https://www.gagolewski.com>
This material is licensed under the Creative Commons BY-NC-ND 4.0 License.
-->
# Preface {.unnumbered}
```{r chapter-header-motd,echo=FALSE,results="asis"}
cat(readLines("chapter-header-motd.md"), sep="\n")
```
Copyright (C) 2020-2022, [Marek Gagolewski](https://www.gagolewski.com).
This material is licensed under the Creative Commons
[Attribution-NonCommercial-NoDerivatives 4.0 International](https://creativecommons.org/licenses/by-nc-nd/4.0/)
License (CC BY-NC-ND 4.0).
You can access this textbook at:
* https://lmlcr.gagolewski.com/ (a browser-friendly version)
* https://lmlcr.gagolewski.com/lmlcr.pdf (PDF)
* https://github.com/gagolews/lmlcr (source code)
<!-- ![](figures/cover) -->
#### Aims and Scope {.unnumbered}
Machine learning has numerous exciting real-world applications,
including stock market prediction, speech recognition,
computer-aided medical diagnosis, content and product recommendation,
anomaly detection in security camera footage, game playing,
autonomous vehicle operation, and many others.
In this book we will take an unpretentious glance at the most fundamental
algorithms that have stood the test of time and which form
the basis for state-of-the-art solutions of modern AI,
which is principally (big) data-driven.
We will learn how to use the R language [@rproject]
for implementing various stages
of data processing and modelling activities.
For a more in-depth treatment of R, refer to this book's Appendices
and, for instance, [@Rintro; @rprogdatascience; @r4ds].
These pages contain solid underpinnings for further studies
related to statistical learning, machine learning
data science, data analytics, and artificial intelligence,
including [@islr; @esl; @bishop].
We will also appreciate the vital role of mathematics as a universal
language for formalising data-intense problems and communicating their
solutions. The book is aimed at readers who are yet to be fluent with
university-level linear algebra, calculus and probability theory,
such as 1st year undergrads or those who have forgotten
all the maths they have learned and need a gentle, non-invasive,
yet rigorous introduction to the topic.
For a nice, machine learning-focused introduction to mathematics alone,
see, e.g., [@mml].
<!-- TODO: This book is structured as follows. -->
<!-- TODO: This book is non-standard in the way that:
* it doesn't assume that the reader is familiar with the concept
of probability, unlike many other books on statistical learning (especially with the
use of the R language)
* it only assumes high school-level mathematics; basic linear algebra concepts
(of which matrix multiplication is the most difficult one) are explained;
there is some reference to function differentiation, however the material is optional
and marked with an asterisk;
I am aware of the fact that some readers might struggle with the $\sum$ notation
for summation, but this is explained in the book;
I hope this book is a nice appreciation of the higher maths courses such as
linear algebra, calculus, discrete maths as well as probability and statistics
* then, once the math groundwork is laid, the reader will be ready for other
excellent books, like ESL, Bishop etc.
* it takes an algorithmic approach to the description of machine learning models
(yet, the models are not treated as black-boxes; we don't just apply functions
from different packages, but try to implement them on our own)
* it doesn't assume any prior exposure to programming languages;
essential R basics are introduced in a rigorous manner and are self-contained;
it's not going to be easy, but all the building blocks (together with exercises)
are provided
* it demystifies and deconstructs the "coolness" of AI -- putting
all the hype aside, super-duper-tensor-GPU-cloud-self-driving-1-billion-revenue-
you-must-buy-our-products, it demonstrates that the basic concepts behind most methods
have been known for decades (basically, our computers are now much faster
and we have much more data, that's it); you don't need to be excited
about each new product, new toolbox, new algorithms (and don't FOMO!);
stay calm, it's just an old dog that learned new tricks.
instead of finding examples of datasets where ml methods work
("good datasets are hard to find")
the author took a few realistic databases and.... well, showed
how to solve the problems with the methods that don't work ;)
you might want to become a ML engineer or data analyst/scientist
or use the methods in your research
greedy companies store a lot of data about their customers --
you might want to understand the principles behind the methods they use
to recommend you that ad or product
-->
#### About the Author {.unnumbered}
[Marek Gagolewski][1]
is currently a Senior Lecturer in Applied AI at Deakin University
in Melbourne, VIC, Australia
and an Associate Professor in Data Science (on long-term leave)
at the Faculty of Mathematics and Information Science, Warsaw University
of Technology, Poland and Systems Research Institute of the Polish
Academy of Sciences.
His research interests include machine learning, data aggregation and
clustering, computational statistics, mathematical modelling
(science of science, sport, economics, etc.),
and free (libre) data analysis software
([stringi](https://stringi.gagolewski.com),
[genieclust](https://genieclust.gagolewski.com),
among [others](https://github.com/gagolews)).
#### Acknowledgements {.unnumbered}
This book has been prepared with pandoc, Markdown, and GitBook.
R code chunks have been processed with knitr.
A little help of bookdown, good ol' Makefiles, and shell scripts
did the trick.
```{r,echo=FALSE}
library("knitr")
library("bookdown")
library("stringi")
fs <- list.files(".", "\\.Rmd$", full=TRUE)
pkgs <- stri_unique(stri_sort(unlist(lapply(fs, function(f)
na.omit(stri_match_first_regex(
readLines(f), "^library\\(['\"]?(.*?)['\"]?\\)")[,2])))))
```
The following R packages are used or referred to in the text:
`r stri_flatten(pkgs, collapse=", ")`.
{ LATEX \mainmatter }