diff --git a/README.md b/README.md
index 7c09ff0c..6b77d775 100644
--- a/README.md
+++ b/README.md
@@ -91,6 +91,7 @@ Excerpts from the [Foreword](./docs/foreword_ro.pdf) and [Preface](./docs/prefac
 - [What is the probabilistic interpretation of regularized logistic regression?](./faq/probablistic-logistic-regression.md)
 - [Can you give a visual explanation for the back propagation algorithm for neural networks?](./faq/visual-backpropagation.md)
 - [How do I evaluate a model?](./faq/evaluate-a-model.md)
+- [What exactly is the "softmax and the multinomial logistic loss" in the context of machine learning?](./faq/softmax.md)
 - [Why do we re-use parameters from the training set to standardize the test set and new data?](./faq/standardize-param-reuse.md)
 - [What are some of the issues with clustering?](./faq/issues-with-clustering.md)
 - [What is the difference between deep learning and usual machine learning?](./faq/difference-deep-and-normal-learning.md)
diff --git a/faq/README.md b/faq/README.md
index a1b3f2c7..4c3d915a 100644
--- a/faq/README.md
+++ b/faq/README.md
@@ -34,6 +34,7 @@ Sebastian
 - [What is the probabilistic interpretation of regularized logistic regression?](./probablistic-logistic-regression.md)
 - [Can you give a visual explanation for the back propagation algorithm for neural networks?](./visual-backpropagation.md)
 - [How do I evaluate a model?](./evaluate-a-model.md)
+- [What exactly is the "softmax and the multinomial logistic loss" in the context of machine learning?](./softmax.md)
 - [Why do we re-use parameters from the training set to standardize the test set and new data?](./standardize-param-reuse.md)
 - [What are some of the issues with clustering?](./issues-with-clustering.md)
 - [What is the difference between deep learning and usual machine learning?](./difference-deep-and-normal-learning.md)
diff --git a/faq/softmax.md b/faq/softmax.md
new file mode 100644
index 00000000..df701842
--- /dev/null
+++ b/faq/softmax.md
@@ -0,0 +1,15 @@
+# What exactly is the "softmax and the multinomial logistic loss" in the context of machine learning?
+
+The softmax function is a generalization of the logistic function that lets us compute meaningful class probabilities in multi-class settings (multinomial logistic regression). In softmax, we compute the probability that a particular sample (with net input *z*) belongs to the *i*th class using a normalization term in the denominator that is the sum over all *M* linear functions:
+
+![](./softmax/softmax_1.png)
+
+In contrast, the logistic function is defined as:
+
+![](./softmax/logistic.png)
+
+For completeness, we define the net input as
+
+![](./softmax/net_input.png)
+
+where the weight coefficients of the model are stored as the vector *w* and *x* is the feature vector of the sample.
diff --git a/faq/softmax/logistic.png b/faq/softmax/logistic.png
new file mode 100644
index 00000000..de9b14cc
Binary files /dev/null and b/faq/softmax/logistic.png differ
diff --git a/faq/softmax/net_input.png b/faq/softmax/net_input.png
new file mode 100644
index 00000000..d9af13d1
Binary files /dev/null and b/faq/softmax/net_input.png differ
diff --git a/faq/softmax/softmax_1.png b/faq/softmax/softmax_1.png
new file mode 100644
index 00000000..edff52e6
Binary files /dev/null and b/faq/softmax/softmax_1.png differ
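For intuition, the net input and softmax computation described in the new `faq/softmax.md` can be sketched in plain NumPy. This is an illustrative snippet, not part of the diff; the names `net_input`, `softmax`, `W`, `b`, and the toy numbers are assumptions chosen for the example.

```python
import numpy as np

def net_input(X, W, b):
    # z = w^T x + b, computed for all M classes at once:
    # X has shape (n_samples, n_features), W has shape (n_features, M)
    return X @ W + b

def softmax(z):
    # Subtract the per-row max before exponentiating for numerical
    # stability; softmax is invariant to adding a constant to all inputs.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# One sample with 4 features, M = 3 classes
x = np.array([[0.5, -1.2, 3.0, 0.7]])
W = np.random.default_rng(0).normal(size=(4, 3))
b = np.zeros(3)

probs = softmax(net_input(x, W, b))
```

Each row of `probs` is non-negative and sums to 1, which is what makes the softmax outputs usable as class probabilities; with M = 2 the computation reduces to the logistic function shown above.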