diff --git a/README.md b/README.md
index 7c09ff0c..6b77d775 100644
--- a/README.md
+++ b/README.md
@@ -91,6 +91,7 @@ Excerpts from the [Foreword](./docs/foreword_ro.pdf) and [Preface](./docs/prefac
 - [What is the probabilistic interpretation of regularized logistic regression?](./faq/probablistic-logistic-regression.md)
 - [Can you give a visual explanation for the back propagation algorithm for neural networks?](./faq/visual-backpropagation.md)
 - [How do I evaluate a model?](./faq/evaluate-a-model.md)
+- [What exactly is the "softmax and the multinomial logistic loss" in the context of machine learning?](./faq/softmax.md)
 - [Why do we re-use parameters from the training set to standardize the test set and new data?](./faq/standardize-param-reuse.md)
 - [What are some of the issues with clustering?](./faq/issues-with-clustering.md)
 - [What is the difference between deep learning and usual machine learning?](./faq/difference-deep-and-normal-learning.md)
diff --git a/faq/README.md b/faq/README.md
index a1b3f2c7..4c3d915a 100644
--- a/faq/README.md
+++ b/faq/README.md
@@ -34,6 +34,7 @@ Sebastian
 - [What is the probabilistic interpretation of regularized logistic regression?](./probablistic-logistic-regression.md)
 - [Can you give a visual explanation for the back propagation algorithm for neural networks?](./visual-backpropagation.md)
 - [How do I evaluate a model?](./evaluate-a-model.md)
+- [What exactly is the "softmax and the multinomial logistic loss" in the context of machine learning?](./softmax.md)
 - [Why do we re-use parameters from the training set to standardize the test set and new data?](./standardize-param-reuse.md)
 - [What are some of the issues with clustering?](./issues-with-clustering.md)
 - [What is the difference between deep learning and usual machine learning?](./difference-deep-and-normal-learning.md)
diff --git a/faq/softmax.md b/faq/softmax.md
new file mode 100644
index 00000000..df701842
--- /dev/null
+++ b/faq/softmax.md
@@ -0,0 +1,15 @@
+# What exactly is the "softmax and the multinomial logistic loss" in the context of machine learning?
+
+The softmax function is a generalization of the logistic function that lets us compute meaningful class probabilities in multi-class settings (multinomial logistic regression). In softmax, we compute the probability that a particular sample (with net input *z*) belongs to the *i*th class using a normalization term in the denominator that is the sum over all *M* linear functions:
+
+![](./softmax/softmax_1.png)
+
+In contrast, the logistic function is defined as:
+
+![](./softmax/logistic.png)
+
+For completeness, we define the net input as
+
+![](./softmax/net_input.png)
+
+where the weight coefficients of the model are stored as the vector *w* and *x* is the feature vector of the sample.
diff --git a/faq/softmax/logistic.png b/faq/softmax/logistic.png
new file mode 100644
index 00000000..de9b14cc
Binary files /dev/null and b/faq/softmax/logistic.png differ
diff --git a/faq/softmax/net_input.png b/faq/softmax/net_input.png
new file mode 100644
index 00000000..d9af13d1
Binary files /dev/null and b/faq/softmax/net_input.png differ
diff --git a/faq/softmax/softmax_1.png b/faq/softmax/softmax_1.png
new file mode 100644
index 00000000..edff52e6
Binary files /dev/null and b/faq/softmax/softmax_1.png differ
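For intuition, the net input and softmax computation described in the new `faq/softmax.md` can be sketched in plain NumPy. This is an illustrative snippet, not part of the diff; the names `net_input`, `softmax`, `W`, `b`, and the toy numbers are assumptions chosen for the example.

```python
import numpy as np

def net_input(X, W, b):
    # z = w^T x + b, computed for all M classes at once:
    # X has shape (n_samples, n_features), W has shape (n_features, M)
    return X @ W + b

def softmax(z):
    # Subtract the per-row max before exponentiating for numerical
    # stability; softmax is invariant to adding a constant to all inputs.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# One sample with 4 features, M = 3 classes
x = np.array([[0.5, -1.2, 3.0, 0.7]])
W = np.random.default_rng(0).normal(size=(4, 3))
b = np.zeros(3)

probs = softmax(net_input(x, W, b))
```

Each row of `probs` is non-negative and sums to 1, which is what makes the softmax outputs usable as class probabilities; with M = 2 the computation reduces to the logistic function shown above.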