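The average rate of change of a function on an interval is the slope of the secant line that passes through the function's values at the interval's endpoints.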
Numerically, the slope of a secant line equals rise over run:
We can find the full equation of the secant line by plugging its slope and one of its points into the equation of a line:
In the example above, we calculated the average rate of change on the interval $[5, 7]$.
So, the average rate of change of the distance I walked on the interval from the 5th hour of my trip until the 7th hour
is approximately 0.145. What's the average rate of change of the distance? It's speed (or, to be precise, velocity in this case).
This means that my average velocity on this interval was approximately 0.145 units of distance per hour.
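To make the calculation concrete, here is a minimal sketch of the secant formulas (not the article's own code). The names `secant_slope`, `secant_line`, and the distance function `distance(t) = 0.01 * t**2` are hypothetical. The article's actual function isn't reproduced here, so the result on $[5, 7]$ differs from 0.145.

```python
from typing import Callable


def secant_slope(f: Callable[[float], float], a: float, b: float) -> float:
    """Average rate of change of f on [a, b]: rise over run."""
    if a == b:
        raise ValueError("the interval must have a non-zero length")
    return (f(b) - f(a)) / (b - a)


def secant_line(f: Callable[[float], float], a: float, b: float) -> Callable[[float], float]:
    """Secant line through (a, f(a)) and (b, f(b)): y = f(a) + slope * (x - a)."""
    slope = secant_slope(f, a, b)
    fa = f(a)
    return lambda x: fa + slope * (x - a)


def distance(t: float) -> float:
    """Hypothetical distance after t hours (NOT the function from the article)."""
    return 0.01 * t ** 2


print(secant_slope(distance, 5, 7))  # average rate of change on [5, 7]
line = secant_line(distance, 5, 7)
print(line(5), distance(5))          # the secant line passes through both endpoints
print(line(7), distance(7))
```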
The idea of the instant rate of change (which is also called a derivative) is similar to the average rate of change, but the run is approaching zero.
If we set
To get the formula of the instant rate of change (derivative), we take the limit of the average rate of change as the run approaches zero:
If the function is differentiable at the point, it doesn't matter whether we approach the point from the right or from the left: both one-sided limits exist and are equal, so we get the same result:
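A quick numerical check of this statement, sketched with hypothetical example functions: for `square(x) = x**2` at x = 3, the right-hand and left-hand difference quotients both approach 6. For `abs` at x = 0, they stay at 1 and -1, so that function has no derivative there even though both one-sided limits exist.

```python
def one_sided_quotients(f, x, h):
    """Difference quotients with a run of +h (from the right) and -h (from the left)."""
    right = (f(x + h) - f(x)) / h
    left = (f(x - h) - f(x)) / (-h)
    return right, left


def square(x):
    # Differentiable everywhere; its derivative is 2x.
    return x ** 2


for h in (0.1, 0.01, 0.001):
    print(h, one_sided_quotients(square, 3.0, h))  # both sides approach 6

for h in (0.1, 0.01, 0.001):
    print(h, one_sided_quotients(abs, 0.0, h))     # right stays 1, left stays -1
```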
Conceptually, the idea of an instant rate of change doesn't seem to make sense, because no change happens instantly. But if we look at the graph of a function, we can clearly tell what the function is doing (for example, increasing or decreasing) at any given point where it's defined.
Graphically, the instant rate of change of a function at a point is the slope of the tangent line to the function's graph at that point.
Let's again give these values some real-world meaning. If this graph represents the distance from me to my home over time when I went for a walk (just like in the average rate of change example), the instant rate of change represents my actual velocity at any given moment.
The approach above is often used by computers to calculate derivatives numerically, but it only gives an approximate
result. If we directly plug in a run of zero, we get $\frac{0}{0}$, which is undefined.
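Here is a minimal sketch of such a numerical approximation. It uses a central difference, (f(x + h) - f(x - h)) / (2h), which averages the right- and left-hand quotients. This is a common choice but our assumption, and `numerical_derivative` and its default `h` are hypothetical. Comparing against the known derivative of `math.sin` (which is `math.cos`) shows the error shrinking as h gets smaller, while h = 0 fails with a division by zero.

```python
import math


def numerical_derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference using a small, non-zero run h."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.0
for h in (1e-1, 1e-3, 1e-5):
    approx = numerical_derivative(math.sin, x, h)
    print(f"h={h:g}  approx={approx:.10f}  error={abs(approx - math.cos(x)):.2e}")

# A run of exactly zero is not allowed: the quotient becomes 0/0.
try:
    numerical_derivative(math.sin, x, 0.0)
except ZeroDivisionError:
    print("h = 0 gives 0/0")
```

In floating-point arithmetic, an extremely small h eventually makes rounding errors dominate, so in practice h is kept small but not tiny.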
To calculate the exact value, we need to use some algebra. Let's take a simple function
Now, if we plug in
This is exactly the derivative that we get for
Let's manually calculate the exact derivative of the example function from the previous section using the power rule of differentiation. The function I used in that example is (I know it's not pretty):
The derivative of this function is:
Now, if we plug
Using
When we calculated the derivative manually, we saw that the derivative itself is also a function. This means that we can take the derivative of a derivative, which is called the second-order derivative, and we can repeat this process as many times as we need to get higher-order derivatives.
Let's give these values some real-world meaning again. We already know that the first derivative of a distance function represents velocity. But what is the instant rate of change of velocity (the second derivative)? It's acceleration.
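Because the derivative of a polynomial is again a polynomial, repeated differentiation is easy to sketch in code. The Python below uses hypothetical helper names and stores polynomials as coefficient lists, lowest power first. It applies the power rule once for the first derivative and repeatedly for higher orders, using a made-up position function s(t) = 2 + 3t + 0.5t² to get velocity and acceleration.

```python
def derivative(coeffs):
    """Differentiate c0 + c1*x + c2*x**2 + ... (given as [c0, c1, c2, ...])
    with the power rule: d/dx (c * x**n) = n * c * x**(n - 1)."""
    return [n * c for n, c in enumerate(coeffs)][1:]


def nth_derivative(coeffs, n):
    """Apply the power rule n times to get the n-th order derivative."""
    for _ in range(n):
        coeffs = derivative(coeffs)
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's method."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Hypothetical position s(t) = 2 + 3t + 0.5t^2 (not from the article).
position = [2, 3, 0.5]
velocity = derivative(position)             # 3 + 1.0t
acceleration = nth_derivative(position, 2)  # 1.0 (constant)
print(velocity, acceleration, evaluate(velocity, 4))
```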
It's usually hard to find a real-world meaning for derivatives of even higher order, but they are still used, for example, in statistics and machine learning.
If you look closely at the graphs of the derivatives, you'll see that when the derivative of a function crosses the x-axis, the function itself reaches a local maximum or a local minimum.
This makes sense if you go back to the slope definition of the derivative. When the slope changes sign, the function changes its direction of movement, and the point where this change happens is either a local maximum or a local minimum. The same relationship holds between each derivative and the next one: for example, where the second derivative changes sign, the first derivative reaches a local extremum.
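The sign-change idea can be sketched as a simple scan: estimate the derivative on a grid and report where it changes sign. This is only an illustration under stated assumptions: a smooth function, a grid fine enough not to skip over two sign changes, and a central-difference estimate of the derivative. `local_extrema` and the example f(x) = x³ - 3x (local maximum at x = -1, local minimum at x = 1) are hypothetical.

```python
def local_extrema(f, start, stop, steps=1000, h=1e-6):
    """Scan [start, stop] and report where the numerical derivative changes sign:
    from negative to positive means a local minimum, positive to negative a maximum."""
    def df(x):
        return (f(x + h) - f(x - h)) / (2 * h)

    found = []
    dx = (stop - start) / steps
    prev_x, prev_d = start, df(start)
    for i in range(1, steps + 1):
        x = start + i * dx
        d = df(x)
        if prev_d < 0 <= d:
            found.append(("min", (prev_x + x) / 2))
        elif prev_d > 0 >= d:
            found.append(("max", (prev_x + x) / 2))
        prev_x, prev_d = x, d
    return found


# Expected: a max near x = -1, then a min near x = 1.
print(local_extrema(lambda x: x ** 3 - 3 * x, -2, 2))
```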
For multivariable functions, for example
To calculate a partial derivative, we calculate a derivative with respect to one of the variables, with the other ones held constant.
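Holding the other variables constant is easy to express numerically: nudge only one coordinate and leave the rest unchanged. Here is a sketch with a hypothetical function f(x, y) = x²y (not the article's example), whose exact partial derivatives are ∂f/∂x = 2xy and ∂f/∂y = x². `partial_derivative` is a hypothetical helper using a central difference.

```python
def partial_derivative(f, point, i, h=1e-5):
    """Approximate the partial derivative of f with respect to variable i at `point`,
    holding all the other variables constant (central difference)."""
    forward = list(point)
    backward = list(point)
    forward[i] += h
    backward[i] -= h
    return (f(*forward) - f(*backward)) / (2 * h)


def f(x, y):
    # Hypothetical example function (not from the article).
    return x ** 2 * y


print(partial_derivative(f, (3.0, 2.0), 0))  # approximately 12 (exact: 2xy = 12)
print(partial_derivative(f, (3.0, 2.0), 1))  # approximately 9 (exact: x^2 = 9)
```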
Let's calculate a partial derivative of this function with respect to the variable
If we choose any point of this function, the value of the partial derivative with respect to
Graphically, we can represent a partial derivative of a multivariable function in 3D exactly the same way as we do with
a single variable function in 2D. The value of a partial derivative of
Partial derivatives are not limited to functions of two variables, but it's hard to visualize a space with more than three dimensions.
Using partial derivatives, we can calculate how much an increase in each variable of a multivariable function increases the function at any given point. If we put the values of the partial derivatives at a given point into a vector, this vector points in the direction of the fastest increase of the function at this point. This vector is called the gradient vector.
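As a sketch, the gradient is just the list of partial derivatives. The names are hypothetical, and the function f(x, y) = x²y is an assumed example. The brute-force check then tries 360 unit directions around the point (3, 2) and confirms that the largest small-step increase happens very close to the gradient's direction.

```python
import math


def gradient(f, point, h=1e-5):
    """Gradient vector: one central-difference partial derivative per variable."""
    grad = []
    for i in range(len(point)):
        forward, backward = list(point), list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(*forward) - f(*backward)) / (2 * h))
    return grad


def f(x, y):
    # Hypothetical example function (not from the article).
    return x ** 2 * y


p = (3.0, 2.0)
g = gradient(f, p)  # approximately [12, 9]


def increase_along(angle, step=1e-3):
    """Change of f after a small step from p in the unit direction `angle`."""
    dx, dy = math.cos(angle), math.sin(angle)
    return f(p[0] + step * dx, p[1] + step * dy) - f(*p)


# Try every whole-degree direction and keep the one with the largest increase.
best_angle = max((k * 2 * math.pi / 360 for k in range(360)), key=increase_along)
print(best_angle, math.atan2(g[1], g[0]))  # both close to atan2(9, 12)
```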
The gradient is denoted by the nabla symbol $\nabla$.
Graphically it looks like this (note that the gradient vector is actually a 2D vector in
Again, mathematically we can work with as many dimensions as we need, so the generic notation looks like this:
It might be a little confusing why putting these values in a vector gives us the direction of the fastest increase of the function, but it's actually pretty simple. It's logical that we should nudge the variable that gives the function a bigger increase more than the other variables. But how much more? This is exactly what partial derivatives tell us. Increasing the variables in proportion to their partial derivatives gives us the largest overall increase of the function, and this is exactly what the gradient vector does.
Note that, since we're in the world of calculus, the "increase" is actually very small (ideally, it approaches 0). To follow the path of the fastest increase to a local maximum of a function, we need to recalculate the gradient vector after every step. The smaller the steps we take, the more closely we follow that path.
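The repeated small steps described above are known as gradient ascent (or gradient descent when we step against the gradient to minimize a function). Here is a minimal sketch, assuming a hypothetical function f(x, y) = -(x - 1)² - (y + 2)² with its gradient written out by hand, and an arbitrarily chosen fixed step size of 0.01. `gradient_ascent` and `grad_f` are hypothetical names.

```python
def gradient_ascent(grad, start, step=0.01, iterations=10_000):
    """Repeatedly take a small step in the direction of the gradient,
    recalculating the gradient at every new point."""
    point = list(start)
    for _ in range(iterations):
        g = grad(*point)
        point = [p + step * gi for p, gi in zip(point, g)]
    return point


def grad_f(x, y):
    # Gradient of the hypothetical f(x, y) = -(x - 1)^2 - (y + 2)^2,
    # which has a single maximum at (1, -2).
    return (-2 * (x - 1), -2 * (y + 2))


print(gradient_ascent(grad_f, (5.0, 5.0)))  # approaches [1, -2]
```

With a step that is too large, the iteration can overshoot or even diverge; with a very small one, it needs many more iterations to get close to the maximum.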
We already know that the derivative of a distance function gives us velocity. But we can also go backwards and find the distance function from the velocity function: the distance function is an antiderivative of the velocity function. The process of finding an antiderivative is called integration.
The integral notation is written like this:
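We know that the derivative of a constant equals 0, so adding any constant to an antiderivative gives another valid antiderivative. That's why an antiderivative is only determined up to an arbitrary constant, usually written as $C$.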
Let's find an antiderivative of a simple function using the power rule:
Notice, that
Let's again assume that the function represents velocity. If the velocity is constant on the interval, its graph is a horizontal line. We know from physics that distance is velocity multiplied by time. If we draw a rectangle under this graph, we see that one side represents velocity and the other represents time. This means that we calculate the distance traveled exactly the same way we calculate the area of this rectangle. If the velocity is not constant, we can still represent it with rectangles; we just need more of them. Then, we calculate the area of each rectangle and sum the results. The result still approximates the distance traveled, just with some approximation error. The more (and narrower) rectangles we use, the better the result.
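The rectangle idea is called a Riemann sum. Here is a sketch with a hypothetical velocity function v(t) = t², using each rectangle's left edge for its height (one of several possible choices). The exact distance on [0, 3] is 9, and the sum gets closer to it as the number of rectangles grows. `riemann_sum` is a hypothetical helper name.

```python
def riemann_sum(velocity, start, end, rectangles):
    """Approximate the distance traveled (area under the velocity graph) by summing
    rectangles of width dt whose height is the velocity at each left edge."""
    dt = (end - start) / rectangles
    return sum(velocity(start + i * dt) * dt for i in range(rectangles))


def v(t):
    # Hypothetical velocity function (not from the article).
    return t ** 2


for n in (10, 100, 1000, 10000):
    print(n, riemann_sum(v, 0, 3, n))  # gets closer to 9 as n grows
```

For a constant velocity, a single rectangle already gives the exact distance, just as described above.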
Another logical way to calculate the distance traveled is to subtract the starting position from the final position. The distance function is an antiderivative of the velocity function, so if we know the antiderivative, we know both positions and can easily calculate the distance. The Fundamental Theorem of Calculus links calculating the area under a function's graph to the function's antiderivative.
The definite integral can be used to calculate the area under the graph on some interval
Notice that it doesn't matter which antiderivative we choose, because the constant of integration cancels out when we subtract one value from the other.
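Here is a quick numerical check of the Fundamental Theorem of Calculus, again with a hypothetical velocity v(t) = t², whose antiderivative is F(t) = t³/3 + C. The area is approximated with midpoint rectangles (an assumption; any fine enough rectangle sum would do). F(b) - F(a) gives the same value, 21 on [1, 4], for every choice of C. `midpoint_area` and `antiderivative` are hypothetical names.

```python
def midpoint_area(f, a, b, n=100_000):
    """Approximate the area under f on [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def v(t):
    return t ** 2  # hypothetical velocity function


def antiderivative(t, C=0.0):
    return t ** 3 / 3 + C  # F(t) = t^3/3 + C, since F'(t) = t^2


a, b = 1.0, 4.0
area = midpoint_area(v, a, b)
for C in (0.0, 5.0, -100.0):
    # F(b) - F(a) is the same for every C, because C cancels out.
    print(C, antiderivative(b, C) - antiderivative(a, C), area)
```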
Function approximation is a technique for selecting a function that closely matches a target function, which can be either known or unknown (underlying).
There are 2 major classes of function approximation problems:
- Approximating a known function. It's useful when the original function is too complicated for calculations, so we select an approximating function that behaves very similarly for the particular problem but is easier to compute.
- Approximating an unknown underlying function from its data points. Many problem-solving techniques work with functions rather than data points, so we first need to find such a function.
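For the second class, one common technique is least-squares fitting. It is named here as an example, not necessarily the approach the article uses. The sketch below fits a straight line y = m·x + b to hypothetical data points using the closed-form ordinary least squares formulas. `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b to the given data points."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Hypothetical noisy measurements of an unknown function that is roughly 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
m, b = fit_line(xs, ys)
print(f"y ≈ {m:.3f}x + {b:.3f}")
```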
For most common functions, the infinite sum of terms built from the function's derivatives at a single point equals the function near this point. To put it more simply, if we choose a polynomial whose derivatives at a given point all equal the function's derivatives at that point, this polynomial will behave very closely to the function near that point. This infinite sum is called the Taylor series; truncating it after a few terms gives a Taylor polynomial.
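Mathematically it looks like this:

$$f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 + \cdots$$

or:

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x - a)^n$$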
It looks complicated, but if we need to approximate a function around
Let's approximate
The derivative of
This is already a pretty good approximation for many use cases, but we can do even better. The third derivative of
This is a very good approximation and we can stop here. Here's how it looks graphically:
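The same procedure can be sketched in a few lines of Python. Because the function approximated in the article isn't reproduced here, this sketch assumes sin(x) around a = 0 as a hypothetical example. Its derivatives cycle through sin, cos, -sin, -cos, so at 0 they are 0, 1, 0, -1, and so on. `taylor_polynomial` is a hypothetical helper that builds the truncated sum from the derivative values at a.

```python
import math


def taylor_polynomial(derivatives_at_a, a):
    """Build the Taylor polynomial sum over n of f^(n)(a) / n! * (x - a)^n
    from the derivative values f(a), f'(a), f''(a), ... at the point a."""
    def p(x):
        return sum(d / math.factorial(n) * (x - a) ** n
                   for n, d in enumerate(derivatives_at_a))
    return p


# Hypothetical example: derivatives of sin(x) at a = 0.
sin_derivs_at_0 = [0, 1, 0, -1, 0, 1, 0, -1]

x = 0.5
for terms in (2, 4, 6, 8):
    p = taylor_polynomial(sin_derivs_at_0[:terms], a=0.0)
    print(terms, p(x), abs(p(x) - math.sin(x)))  # the error shrinks as terms are added
```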