From af865429e448f261c5f860811e13dfe031e8d7ee Mon Sep 17 00:00:00 2001
From: Tigran Hakobyan
Date: Fri, 26 May 2023 12:43:55 -0400
Subject: [PATCH] Update basic-ml-review.md

A minor fix.
---
 basic-ml-review.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/basic-ml-review.md b/basic-ml-review.md
index 019ea1e..fdb8a2e 100644
--- a/basic-ml-review.md
+++ b/basic-ml-review.md
@@ -116,7 +116,7 @@ While developing your model, if time permits, you should experiment with differe
 
 ### Learning Procedure
 
-Learning procedures the procedures that help your model find the set of parameters that minimize a given objective function for a given set of data, are diverse[^3]. In some cases, the parameters might be calculated exactly. For example, in the case of linear functions, the values of w and b can be calculated from the averages and variances of x and y. In most cases, however, the values of parameters can’t be calculated exactly and have to be approximated, usually via an iterative procedure. For example, K-means clustering uses an iterative procedure called expectation–maximization algorithm.
+Learning procedures, the procedures that help your model find the set of parameters that minimize a given objective function for a given set of data, are diverse[^3]. In some cases, the parameters might be calculated exactly. For example, in the case of linear functions, the values of w and b can be calculated from the averages and variances of x and y. In most cases, however, the values of parameters can’t be calculated exactly and have to be approximated, usually via an iterative procedure. For example, K-means clustering uses an iterative procedure called the expectation–maximization (EM) algorithm. The most popular family of iterative procedures today is undoubtedly **gradient descent**. The loss of a model at a given train step is given by the objective function.
The gradient of an objective function with respect to a parameter tells us how much that parameter contributes to the loss. In other words, the negative of the gradient is the direction that lowers the loss from its current value the most. The idea is to subtract that gradient value (typically scaled by a learning rate) from that parameter, hoping that this will make the parameter contribute less to the loss and eventually drive the loss down toward 0.
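The exactly-calculable case mentioned above can be sketched concretely. For a 1-D least-squares fit y ≈ wx + b, the closed-form solution is w = cov(x, y) / var(x) and b = mean(y) − w · mean(x); the function name `fit_linear` and the sample points are illustrative, not from the text.

```python
def fit_linear(xs, ys):
    """Closed-form least-squares fit for y ≈ w*x + b in one dimension."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # w = covariance of x and y divided by variance of x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    w = cov_xy / var_x
    b = mean_y - w * mean_x  # intercept from the means
    return w, b

# Points lying exactly on y = 2x + 1 recover w = 2, b = 1.
w, b = fit_linear([0, 1, 2, 3], [1, 3, 5, 7])
```

No iteration is needed here: the averages and variances of x and y fully determine the parameters, which is what distinguishes this case from the iterative procedures discussed next.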
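The update rule described in the paragraph above, subtracting the gradient (scaled by a learning rate) from each parameter, can be sketched as a minimal one-parameter example. The objective f(w) = (w − 3)² and the learning rate are illustrative assumptions, not taken from the text.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Minimize an objective by repeatedly stepping against its gradient."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # subtract the (scaled) gradient from the parameter
    return w

# Objective f(w) = (w - 3)**2 has gradient 2*(w - 3) and minimum at w = 3.
w = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

Each step moves the parameter a fraction of the way toward the minimum; with this objective and learning rate, w converges to 3, where the loss reaches 0.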