random_variables_20191113_152_chang.Rmd

---
title: "R Notebook"
output: html_notebook
---

## 14.7 Central Limit Theorem

The Central Limit Theorem (CLT) tells us that when the number of draws, also called the sample size, is large, the probability distribution of the sum of the independent draws is approximately normal.

If we know that the distribution of a list of numbers is approximated by the normal distribution, all we need to describe the list are the average and standard deviation.
We also know that the same applies to probability distributions. If a random variable has a probability distribution that is approximated with the normal distribution, then all we need to describe the probability distribution are the average and standard deviation, referred to as the expected value and standard error.

We previously ran this Monte Carlo simulation:
```{r}
n<-1000
B<-10000
roulette_winnings<-function(n){
  X<-sample(c(-1,1), n, replace=TRUE, prob=c(9/19,10/19))
  sum(X)
}
S<-replicate(B,roulette_winnings(n))
```

The Central Limit Theorem (CLT) tells us that the sum S is approximated by a normal distribution. We know that the expected value and standard error are:
```{r}
n*(20-18)/38
sqrt(n)*2*sqrt(90)/19
```
The theoretical values above match those obtained with the Monte Carlo simulation:
```{r}
mean(S)
sd(S)
```
Using the CLT, we can skip the Monte Carlo simulation and instead compute the probability of the casino losing money using this approximation:
```{r}
mu<-n*(20-18)/38
se<-sqrt(n)*2*sqrt(90)/19
pnorm(0,mu,se)
```
which is also in very good agreement with our Monte Carlo result:
```{r}
mean(S<0)
```
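
As a rough additional check on the normal approximation, we can compare a few quantiles of the simulated sums with the quantiles the CLT predicts. This is only a sketch: the seed and the names `n_bets`, `n_sims`, `sums_sim`, and `quantile_check` are our own choices, and the simulation is regenerated here so the chunk runs on its own.

```{r}
set.seed(2019)   # arbitrary seed, used only so the sketch is reproducible
n_bets <- 1000   # draws per simulated sum, as in the text
n_sims <- 10000  # Monte Carlo replicates, as in the text
sums_sim <- replicate(n_sims, {
  sum(sample(c(-1, 1), n_bets, replace = TRUE, prob = c(9/19, 10/19)))
})
# expected value and standard error of the sum, same formulas as above
mu_clt <- n_bets * (20 - 18) / 38
se_clt <- sqrt(n_bets) * 2 * sqrt(90) / 19
probs_checked <- c(0.05, 0.25, 0.5, 0.75, 0.95)
# simulated quantiles next to the quantiles of the approximating normal
quantile_check <- data.frame(
  prob = probs_checked,
  simulated = unname(quantile(sums_sim, probs_checked)),
  normal = qnorm(probs_checked, mu_clt, se_clt)
)
quantile_check
```

If the approximation is good, the two columns should differ by only a few units, which is small relative to a standard error of about 32.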

### 14.7.1 How large is large in the Central Limit Theorem?

The CLT works when the number of draws is large. But large is a relative term. In many circumstances as few as __30__ draws is enough to make the CLT useful. In some specific instances, as few as 10 is enough. However, these should not be considered general rules. Note, for example, that when the probability of success is very small, we need much larger sample sizes.

By way of illustration, let’s consider the lottery. In the lottery, the chances of winning are less than 1 in a million. Thousands of people play so the number of draws is very large. Yet the number of winners, the sum of the draws, ranges between 0 and 4. __This sum is certainly not well approximated by a normal distribution, so the CLT does not apply, even with the very large sample size__. This is generally true when the probability of a success is very low. In these cases, the Poisson distribution is more appropriate.

You can examine the properties of the Poisson distribution using `dpois` and `ppois`. You can generate random variables following this distribution with `rpois`. 
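
To make the lottery example concrete, here is a small sketch comparing the exact binomial probabilities for 0 to 4 winners with the Poisson and normal approximations. The number of players and the winning probability are illustrative assumptions, not values from the text, and the names `n_players`, `win_prob`, and `lottery_check` are ours.

```{r}
# Hypothetical lottery: both numbers below are assumptions chosen for illustration
n_players <- 500000
win_prob <- 1e-6
winners <- 0:4
lambda <- n_players * win_prob   # expected number of winners
# exact probabilities for the sum of the draws
exact_binom <- dbinom(winners, n_players, win_prob)
# Poisson approximation with the same expected value
pois_approx <- dpois(winners, lambda)
# normal approximation: probability of the interval (k - 0.5, k + 0.5)
norm_se <- sqrt(n_players * win_prob * (1 - win_prob))
norm_approx <- pnorm(winners + 0.5, lambda, norm_se) -
  pnorm(winners - 0.5, lambda, norm_se)
lottery_check <- data.frame(winners, exact_binom, pois_approx, norm_approx)
round(lottery_check, 4)
```

Under these assumptions the Poisson column should be essentially identical to the exact one, while the normal column is visibly off, most clearly for 0 winners.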

## 14.8 Statistical properties of averages

There are several useful mathematical results that we used above and often employ when working with data. We list them below.

1. The expected value of the sum of random variables is the sum of each random variable’s expected value. We can write it like this:
$$
E[X_1+X_2+ \dots+X_n]=E[X_1]+E[X_2]+\dots+
E[X_n]
$$
If the $X$ are independent draws from the urn, then they all have the same expected value. Let’s call it $\mu$ and thus:
$$
E[X_1+X_2+ \dots+X_n]=n\mu
$$
which is another way of writing the result we show above for the sum of draws.

2. The expected value of a non-random constant times a random variable is the non-random constant times the expected value of a random variable. This is easier to explain with symbols:
$$
E[aX]=a\times E[X]
$$
To see why this is intuitive, consider change of units. If we change the units of a random variable, say from dollars to cents, the expectation should change in the same way. A consequence of the above two facts is that the expected value of the average of independent draws from the same urn is the expected value of the urn, call it $\mu$ again:
$$
E[(X_1+X_2+ \dots+X_n)/n]=E[X_1+X_2+ \dots+X_n]/n=n\mu /n=\mu
$$

3. The square of the standard error of the sum of independent random variables is the sum of the square of the standard error of each random variable. This one is easier to understand in math form:
$$
SE[X_1+X_2+\dots+X_n] = \sqrt{SE[X_1]^2 + SE[X_2]^2+\dots+SE[X_n]^2}
$$
The square of the standard error is referred to as the variance in statistical textbooks. Note that this particular property is not as intuitive as the previous two, and more in-depth explanations can be found in statistics textbooks.

4. The standard error of a non-random constant times a random variable is the absolute value of the non-random constant times the random variable’s standard error. As with the expectation:
$$
SE[aX]=|a|\times SE[X]
$$
To see why this is intuitive, again think of units.

A consequence of 3 and 4 is that the standard error of the average of independent draws from the same urn is the standard deviation of the urn, call it $\sigma$, divided by the square root of $n$ (the number of draws):
$$
\begin{aligned}
SE[(X_1+X_2+\dots+X_n) / n] &=SE[X_1+X_2+\dots+X_n]/n \\
&= \sqrt{SE[X_1]^2+SE[X_2]^2+\dots+SE[X_n]^2}/n \\
&= \sqrt{\sigma^2+\sigma^2+\dots+\sigma^2}/n\\
&= \sqrt{n\sigma^2}/n\\
&= \sigma / \sqrt{n}    
\end{aligned}
$$

5. If $X$ is a normally distributed random variable, then if $a$ and $b$ are non-random constants, $aX+b$ is also a normally distributed random variable. All we are doing is changing the units of the random variable by multiplying by $a$, then shifting the center by $b$.

Notation: μ denotes the expected value and σ the standard error. μ is the Greek letter for m, the first letter of mean, which is another term used for expected value. Similarly, σ is the Greek letter for s, the first letter of standard error.
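
The properties for sums and averages listed above can be checked numerically. The sketch below draws from the same -1/1 roulette urn used earlier and compares the theoretical expected values and standard errors with simulated ones. The number of draws, the number of replicates, the seed, and all variable names are assumptions made for this illustration.

```{r}
set.seed(14)   # arbitrary seed for reproducibility
urn_values <- c(-1, 1)
urn_probs <- c(9/19, 10/19)
urn_mu <- sum(urn_values * urn_probs)                        # expected value of one draw
urn_sigma <- sqrt(sum((urn_values - urn_mu)^2 * urn_probs))  # standard error of one draw
n_draws <- 100     # draws per sum or average (assumed)
n_reps <- 10000    # Monte Carlo replicates (assumed)
# each row holds one set of n_draws independent draws
draw_matrix <- matrix(
  sample(urn_values, n_draws * n_reps, replace = TRUE, prob = urn_probs),
  nrow = n_reps
)
sim_sums <- rowSums(draw_matrix)
sim_avgs <- rowMeans(draw_matrix)
property_check <- data.frame(
  quantity = c("E[sum]", "SE[sum]", "E[average]", "SE[average]"),
  theory = c(n_draws * urn_mu, sqrt(n_draws) * urn_sigma,
             urn_mu, urn_sigma / sqrt(n_draws)),
  simulation = c(mean(sim_sums), sd(sim_sums), mean(sim_avgs), sd(sim_avgs))
)
property_check
```

The `theory` and `simulation` columns should agree up to Monte Carlo error.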

## 14.9 Law of large numbers

An important implication of the last derivation, $SE = \sigma/\sqrt{n}$ for the average, is that the standard error of the average becomes smaller and smaller as $n$ grows larger. When $n$ is very large, the standard error is practically 0 and the average of the draws converges to the average of the urn. This is known in statistical textbooks as the law of large numbers or the law of averages.
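
A minimal sketch of this convergence, assuming a fair coin coded as 0 (tails) and 1 (heads): we track the running proportion of heads and report it at a few checkpoints alongside the theoretical standard error $0.5/\sqrt{n}$. The seed, the number of tosses, and the variable names are our own choices.

```{r}
set.seed(149)   # arbitrary seed for reproducibility
coin_tosses <- sample(c(0, 1), 100000, replace = TRUE)   # 1 = heads, fair coin
# proportion of heads after each toss
running_avg <- cumsum(coin_tosses) / seq_along(coin_tosses)
checkpoints <- c(10, 100, 1000, 10000, 100000)
lln_check <- data.frame(
  n_tosses = checkpoints,
  proportion_heads = running_avg[checkpoints],
  theoretical_se = 0.5 / sqrt(checkpoints)   # sigma / sqrt(n) with sigma = 0.5
)
lln_check
```

The proportion of heads should drift toward 0.5 as the standard error in the last column shrinks.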

### 14.9.1 Misinterpreting law of averages
The law of averages is sometimes misinterpreted. For example, if you toss a coin 5 times and see a head each time, you might hear someone argue that the next toss is probably a tail because of the law of averages: on average we should see 50% heads and 50% tails. A similar argument would be to say that red “is due” on the roulette wheel after seeing black come up five times in a row. These events are independent so the chance of a coin landing heads is 50% regardless of the previous 5. This is also the case for the roulette outcome. The law of averages applies only when the number of draws is very large and not in small samples. After a million tosses, you will definitely see about 50% heads regardless of the outcome of the first five tosses.

Another funny misuse of the law of averages is in sports when TV sportscasters predict a player is about to succeed because they have failed a few times in a row.
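
The coin-toss argument above can be checked with a small simulation. As a sketch, we generate many sequences of six fair tosses, keep only those that start with five heads, and look at the proportion of heads on the sixth toss. The number of sequences, the seed, and the names used are assumptions for illustration.

```{r}
set.seed(1491)   # arbitrary seed for reproducibility
n_sequences <- 200000
# one row per sequence of six tosses, 1 = heads
toss_matrix <- matrix(sample(c(0, 1), 6 * n_sequences, replace = TRUE), ncol = 6)
# sequences whose first five tosses are all heads
five_heads <- rowSums(toss_matrix[, 1:5]) == 5
n_streaks <- sum(five_heads)
# proportion of heads on the sixth toss, given five heads so far
prop_heads_after_streak <- mean(toss_matrix[five_heads, 6])
c(n_streaks = n_streaks, prop_heads_after_streak = prop_heads_after_streak)
```

Because the tosses are independent, the proportion should stay near 0.5 rather than dropping, which is what the misinterpretation would predict.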

## 14.10 Exercises

1. In American Roulette you can also bet on green. There are 18 reds, 18 blacks and 2 greens (0 and 00). What are the chances the green comes out?
__2/38__

2. The payout for winning on green is 17 dollars. This means that if you bet a dollar and it lands on green, you get 17 dollars. Create a sampling model using `sample` to simulate the random variable $X$ for your winnings.
```{r}
x<-sample(c(17,-1), 1, prob=c(2/38, 36/38))
```

3. Compute the expected value of $X$.
```{r}
2/38 * 17 + 36/38 * -1
```

4. Compute the standard error of X.
```{r}
abs((17 - -1))*sqrt(2/38*36/38)
```

5. Now create a random variable $S$ that is the sum of your winnings after betting on green 1000 times. Hint: change the arguments `size` and `replace` in your answer to question 2. Start your code by setting the seed to 1 with `set.seed(1)`.
```{r}
set.seed(1)
G <- sample(c(17,-1), 1000, replace=TRUE, prob = c(2/38, 36/38))
S <- sum(G)
S
```

6. What is the expected value of S?
```{r}
(17* 2/38 -1*36/38)*1000
```

7. What is the standard error of S?
```{r}
sqrt(1000)*abs((17-(-1)))*sqrt(2/38*36/38)
```

8. What is the probability that you end up winning money? Hint: use the CLT.
```{r}
m<-(17* 2/38 -1*36/38)*1000
se<-sqrt(1000)*abs((17-(-1)))*sqrt(2/38*36/38)
1-pnorm(0,m,se)
```

9. Create a Monte Carlo simulation that generates 1,000 outcomes of S. Compute the average and standard deviation of the resulting list to confirm the results of 6 and 7. Start your code by setting the seed to 1 with set.seed(1).
```{r}
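set.seed(1)
B <- 1000   # number of Monte Carlo outcomes requested in the exercise
n <- 1000   # number of bets that make up each outcome of S
S <- replicate(B,{
  X <- sample(c(17,-1), n, replace=TRUE, prob = c(2/38, 36/38))
  sum(X)
})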
mean(S)
sd(S)
```

10. Now check your answer to 8 using the Monte Carlo result.
```{r}
mean(S>0)
```

11. The Monte Carlo result and the CLT approximation are close, but not that close. What could account for this?

__a. 1,000 simulations is not enough. If we do more, they match.__
b. The CLT does not work as well when the probability of success is small. In this case, it was 1/19. If we make the number of roulette plays bigger, they will match better.
c. The difference is within rounding error.
d. The CLT only works for averages.

12. Now create a random variable $Y$ that is your average winnings per bet after betting on green 1,000 times.
```{r}
X <- sample(c(17,-1), 1000, replace=TRUE, prob = c(2/38, 36/38))
Y <- mean(X)
```

13. What is the expected value of Y?
```{r}
# Y is an average, so its expected value equals that of a single bet
17*2/38 - 1*36/38
```

14. What is the standard error of Y?
```{r}
# standard error of an average: SE of a single bet divided by sqrt(n)
abs((17-(-1)))*sqrt(2/38*36/38)/sqrt(1000)
```

15. What is the probability that you end up with winnings per game that are positive? Hint: use the CLT.
```{r}
# expected value and standard error of the average Y, not of the sum S
m<-17*2/38 - 1*36/38
se<-abs((17-(-1)))*sqrt(2/38*36/38)/sqrt(1000)
1-pnorm(0,m,se)
```


16. Create a Monte Carlo simulation that generates 2,500 outcomes of $Y$. Compute the average and standard deviation of the resulting list to confirm the results of 13 and 14. Start your code by setting the seed to 1 with `set.seed(1)`.
```{r}
set.seed(1)
S <- replicate(2500,{
  Y <- sample(c(17,-1), 1000, replace=TRUE, prob = c(2/38, 36/38))
  mean(Y)
})
mean(S)
sd(S)
```


17. Now check your answer to 15 using the Monte Carlo result.
```{r}
mean(S>0)
```


18. The Monte Carlo result and the CLT approximation are now much closer. What could account for this?

a. We are now computing averages instead of sums.
b. 2,500 Monte Carlo simulations is not better than 1,000.
__c. The CLT works better when the sample size is larger. We increased from 1,000 to 2,500.__
d. It is not closer. The difference is within rounding error.

## 14.11 Case study: The Big Short

### 14.11.1 Interest rates explained with chance model

More complex versions of the sampling models we have discussed are also used by banks to decide interest rates. Suppose you run a small bank that has a history of identifying potential homeowners that can be trusted to make payments. In fact, historically, in a given year, only 2% of your customers default, meaning that they don’t pay back the money that you lent them. However, you are aware that if you simply loan money to everybody without interest, you will end up losing money due to this 2%. Although you know 2% of your clients will probably default, you don’t know which ones. Yet by charging everybody just a bit extra in interest, you can make up the losses incurred due to that 2% and also cover your operating costs. You can also make a profit, but if you set the interest rates too high, your clients will go to another bank. We use all these facts and some probability theory to decide what interest rate you should charge.

Suppose your bank will give out 1,000 loans of 180,000 dollars each this year. Also, after adding up all costs, suppose your bank loses $200,000 per foreclosure. For simplicity, we assume this includes all operational costs. A sampling model for this scenario can be coded like this:
```{r}
n <- 1000
loss_per_foreclosure <- -200000
p <- 0.02 
defaults <- sample( c(0,1), n, prob=c(1-p, p), replace = TRUE)
sum(defaults * loss_per_foreclosure)
```

Note that the total loss defined by the final sum is a random variable. Every time you run the above code, you get a different answer. We can easily construct a Monte Carlo simulation to get an idea of the distribution of this random variable.
```{r}
B <- 10000
losses <- replicate(B, {
  defaults <- sample( c(0,1), n, prob=c(1-p, p), replace = TRUE)
  sum(defaults * loss_per_foreclosure)
})
```

We don’t really need a Monte Carlo simulation though. Using what we have learned, the CLT tells us that because our losses are a sum of independent draws, their distribution is approximately normal with expected value and standard error given by:
```{r}
n*(p*loss_per_foreclosure + (1-p)*0)
sqrt(n)*abs(loss_per_foreclosure)*sqrt(p*(1-p))
```

We can now set an interest rate to guarantee that, on average, we break even. Basically, we need to add a quantity $x$ to each loan, which in this case are represented by draws, so that the expected value is 0. If we define $l$ to be the loss per foreclosure, we need:
$$
lp  + x(1-p) = 0
$$

which implies $x$ is
```{r}
- loss_per_foreclosure*p/(1-p)
```
or, dividing by the 180,000-dollar loan amount, an interest rate of about 0.023.
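
As a quick check of the break-even calculation, the sketch below recomputes $x$, converts it to an interest rate, and confirms that the expected profit per loan is 0. The names `loan_amount`, `foreclosure_loss`, `default_prob`, and `break_even_x` are ours, chosen so the chunk does not overwrite the variables used elsewhere in these notes.

```{r}
loan_amount <- 180000          # loan size stated in the text
foreclosure_loss <- -200000    # loss per foreclosure, as in the text
default_prob <- 0.02           # default rate, as in the text
# solve l*p + x*(1-p) = 0 for x
break_even_x <- -foreclosure_loss * default_prob / (1 - default_prob)
break_even_rate <- break_even_x / loan_amount
# expected profit per loan when each non-defaulting loan earns break_even_x
expected_per_loan <- foreclosure_loss * default_prob +
  break_even_x * (1 - default_prob)
c(x = break_even_x, rate = break_even_rate, expected_per_loan = expected_per_loan)
```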

However, we still have a problem. Although this interest rate guarantees that on average we break even, there is a 50% chance that we lose money. If our bank loses money, we have to close it down. We therefore need to pick an interest rate that makes it unlikely for this to happen. At the same time, if the interest rate is too high, our clients will go to another bank so we must be willing to take some risks. So let’s say that we want our chances of losing money to be 1 in 100. What does the quantity $x$ need to be now? This one is a bit harder. We want the sum $S$ to have:
$$
Pr(S < 0) = 0.01
$$

We know that $S$ is approximately normal. The expected value of $S$ is
$$
\mbox{E}[S] = \{ lp + x(1-p)\}n
$$
with $n$ the number of draws, which in this case represents loans. The standard error is
$$
\mbox{SE}[S] = |x-l| \sqrt{np(1-p)}.
$$
Because $x$ is positive and $l$ is negative, $|x-l|=x-l$. Note that these are just applications of the formulas shown earlier, but using more compact symbols.

Now we are going to use a mathematical “trick” that is very common in statistics. We subtract the same quantity from both sides of the event $S < 0$ and then divide both sides by the same positive quantity, so that the probability does not change and we end up with a standard normal random variable on the left, which will then permit us to write down an equation with only $x$ as an unknown. This “trick” is as follows:

If $Pr(S < 0) = 0.01$ then
$$
\mbox{Pr}\left(\frac{S - \mbox{E}[S]}{\mbox{SE}[S]} < \frac{ - \mbox{E}[S]}{\mbox{SE}[S]}\right) = 0.01
$$

And remember $E[S]$ and $SE[S]$ are the expected value and standard error of $S$, respectively. All we did above was subtract $E[S]$ from both sides and then divide both sides by $SE[S]$. We did this because now the term on the left is a standard normal random variable, which we will rename $Z$. Now we fill in the blanks with the actual formula for expected value and standard error:
$$
\mbox{Pr}\left(Z <  \frac{- \{ lp + x(1-p)\}n}{(x-l) \sqrt{np(1-p)}}\right) = 0.01
$$

It may look complicated, but remember that $l$, $p$ and $n$ are all known amounts, so eventually we will replace them with numbers.

Now because $Z$ is a normal random variable with expected value 0 and standard error 1, it means that the quantity on the right side of the < sign must be equal to:
```{r}
qnorm(0.01)
```

for the equation to hold true. Remember that $z$=`qnorm(0.01)` gives us the value of $z$ for which:
$$
Pr(Z \leq z) = 0.01
$$
So this means that the right side of the complicated equation must be $z$ = `qnorm(0.01)`.
$$
\frac{- \{ lp + x(1-p)\}n} {(x-l) \sqrt{n p (1-p)}} = z
$$
The trick works because we end up with an expression containing $x$ that we know has to be equal to a known quantity $z$. Solving for $x$ is now simply algebra:
$$
x = - l \frac{ np  - z \sqrt{np(1-p)}}{n(1-p) + z \sqrt{np(1-p)}}
$$
which is:
```{r}
l <- loss_per_foreclosure
z <- qnorm(0.01)
x <- -l*( n*p - z*sqrt(n*p*(1-p)))/ ( n*(1-p) + z*sqrt(n*p*(1-p)))
x
```
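
For reference, the algebra behind the expression for $x$ coded above can be written out step by step. Starting from the equation that sets the right side equal to $z$, we multiply through by the denominator, collect the terms containing $x$, and divide:

$$
\begin{aligned}
-\{lp + x(1-p)\}n &= z(x-l)\sqrt{np(1-p)} \\
-lpn - x(1-p)n &= zx\sqrt{np(1-p)} - zl\sqrt{np(1-p)} \\
-x\{n(1-p) + z\sqrt{np(1-p)}\} &= l\{np - z\sqrt{np(1-p)}\} \\
x &= -l\frac{np - z\sqrt{np(1-p)}}{n(1-p) + z\sqrt{np(1-p)}}
\end{aligned}
$$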

Our interest rate now goes up to 0.035. This is still a very competitive interest rate. By choosing this interest rate, we now have an expected profit per loan of:
```{r}
loss_per_foreclosure*p + x*(1-p)
```

which is a total expected profit of about:
```{r}
n*(loss_per_foreclosure*p + x*(1-p)) 
```
dollars!

We can run a Monte Carlo simulation to double check our theoretical approximations:
```{r}
B <- 100000
profit <- replicate(B, {
    draws <- sample( c(x, loss_per_foreclosure), n, 
                        prob=c(1-p, p), replace = TRUE) 
    sum(draws)
})
mean(profit)
mean(profit<0)
```
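
The same check can be done without simulation. As a sketch, we recompute $x$ from the formula, plug it into the expected value and standard error of $S$, and ask the normal approximation for the probability of a loss, which should come back as the 0.01 we targeted. The variable names below are ours and the parameter values are the ones stated in the text.

```{r}
n_loans <- 1000
default_prob <- 0.02
foreclosure_loss <- -200000
z_target <- qnorm(0.01)
draw_se <- sqrt(n_loans * default_prob * (1 - default_prob))   # sqrt(n p (1-p))
# x from the closed-form solution derived in the text
x_target <- -foreclosure_loss * (n_loans * default_prob - z_target * draw_se) /
  (n_loans * (1 - default_prob) + z_target * draw_se)
# expected value and standard error of the total profit S
profit_mu <- n_loans * (foreclosure_loss * default_prob + x_target * (1 - default_prob))
profit_se <- abs(x_target - foreclosure_loss) * draw_se
# CLT approximation to Pr(S < 0)
loss_prob_clt <- pnorm(0, profit_mu, profit_se)
c(rate = x_target / 180000, loss_prob_clt = loss_prob_clt)
```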

### 14.11.2 The Big Short

One of your employees points out that since the bank is making 2,124 dollars per loan, the bank should give out more loans! Why just $n$? You explain that finding those $n$ clients was hard. You need a group that is predictable and that keeps the chances of defaults low. He then points out that even if the probability of default is higher, as long as our expected value is positive, you can minimize your chances of losses by increasing $n$ and relying on the law of large numbers.

He claims that even if the default rate is twice as high, say 4%, if we set the rate just a bit higher than this value:
```{r}
p <- 0.04
r <- (- loss_per_foreclosure*p/(1-p)) / 180000
r
```

we will profit. At 5%, we are guaranteed a positive expected value of:
```{r}
r <- 0.05
x <- r*180000
loss_per_foreclosure*p + x * (1-p)
```

and can minimize our chances of losing money by simply increasing $n$ since:
$$
\mbox{Pr}(S < 0) = 
\mbox{Pr}\left(Z < - \frac{\mbox{E}[S]}{\mbox{SE}[S]}\right)
$$

with $Z$ a standard normal random variable as shown earlier. If we define $\mu$ and $\sigma$ to be the expected value and standard deviation of the urn, respectively (that is of a single loan), using the formulas above we have: $E[S] = n\mu$ and $SE[S] = \sqrt{n}\sigma$. So if we define $z$=`qnorm(0.01)`, we have:
$$
- \frac{n\mu}{\sqrt{n}\sigma} = - \frac{\sqrt{n}\mu}{\sigma} = z
$$

which implies that if we let:
$$
n \geq z^2 \sigma^2 / \mu^2
$$

we are guaranteed to have a probability of less than 0.01. The implication is that, as long as $\mu$ is positive, we can find an $n$ that minimizes the probability of a loss. This is a form of the law of large numbers: when $n$ is large, our average earnings per loan converges to the expected earning $\mu$.

With $x$ fixed, now we can ask what $n$ do we need for the probability to be 0.01? In our example, if we give out:
```{r}
z <- qnorm(0.01)
n <- ceiling((z^2*(x-l)^2*p*(1-p))/(l*p + x*(1-p))^2)
n
```
loans, the probability of losing is about 0.01 and we are expected to earn a total of
```{r}
n*(loss_per_foreclosure*p + x * (1-p))
```
dollars! We can confirm this with a Monte Carlo simulation:
```{r}
p <- 0.04
x <- 0.05*180000
profit <- replicate(B, {
    draws <- sample( c(x, loss_per_foreclosure), n, 
                        prob=c(1-p, p), replace = TRUE) 
    sum(draws)
})
mean(profit)
```
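
A sketch of the corresponding CLT calculation, using the per-loan expected value $\mu$ and standard error $\sigma$ from the formulas above: we compute the smallest $n$ satisfying $n \geq z^2\sigma^2/\mu^2$ and then the approximate probability of a loss at that $n$. The names prefixed with `risky_` and the other helper names are ours, and the parameter values are those stated in the text.

```{r}
risky_p <- 0.04               # default rate assumed by the colleague
risky_x <- 0.05 * 180000      # interest earned per loan at a 5% rate
risky_l <- -200000            # loss per foreclosure
# expected value and standard error of a single loan (one draw from the urn)
loan_mu <- risky_l * risky_p + risky_x * (1 - risky_p)
loan_sigma <- abs(risky_x - risky_l) * sqrt(risky_p * (1 - risky_p))
z_01 <- qnorm(0.01)
# smallest n with n >= z^2 sigma^2 / mu^2
n_needed <- ceiling(z_01^2 * loan_sigma^2 / loan_mu^2)
# CLT approximation to Pr(S < 0) with n_needed loans
loss_prob <- pnorm(0, n_needed * loan_mu, sqrt(n_needed) * loan_sigma)
c(n_needed = n_needed, loss_prob = loss_prob)
```
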
This seems like a no-brainer. As a result, your colleague decides to leave your bank and start his own high-risk mortgage company. A few months later, your colleague’s bank has gone bankrupt. A book is written and eventually a movie is made relating the mistake your friend, and many others, made. What happened?

Your colleague’s scheme was mainly based on this mathematical formula:
$$
\mbox{SE}[(X_1+X_2+\dots+X_n) / n] = \sigma / \sqrt{n}
$$

By making $n$ large, we minimize the standard error of our per-loan profit. However, for this rule to hold, the $X$s must be independent draws: one person defaulting must be independent of others defaulting. Note that in the case of averaging the __same__ event over and over, an extreme example of events that are not independent, we get a standard error that is $\sqrt{n}$ times bigger:
$$
\mbox{SE}[(X_1+X_1+\dots+X_1) / n] =  \mbox{SE}[n X_1  / n] = \sigma > \sigma / \sqrt{n}
$$
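
To see the effect of dependence numerically, here is a small sketch that compares the standard error of an average of independent draws with the standard error of an "average" made of the same draw repeated. We assume a fair urn with values -1 and 1, so $\sigma = 1$, and the number of draws, replicates, seed, and variable names are all our own choices.

```{r}
set.seed(1411)   # arbitrary seed for reproducibility
dep_n <- 100        # draws per average (assumed)
dep_reps <- 10000   # Monte Carlo replicates (assumed)
dep_values <- c(-1, 1)
# independent case: average of dep_n separate draws
avg_independent <- replicate(dep_reps, mean(sample(dep_values, dep_n, replace = TRUE)))
# fully dependent case: one draw repeated dep_n times, so the average equals that draw
avg_same_draw <- replicate(dep_reps, mean(rep(sample(dep_values, 1), dep_n)))
dep_ratio <- sd(avg_same_draw) / sd(avg_independent)
c(independent = sd(avg_independent), same_draw = sd(avg_same_draw), ratio = dep_ratio)
```

The ratio should be close to $\sqrt{100} = 10$, matching the formula above.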

To construct a more realistic simulation than the original one your colleague ran, let’s assume there is a global event that affects everybody with high-risk mortgages and changes their probability. We will assume that with 50-50 chance, all the probabilities go up or down slightly to somewhere between 0.03 and 0.05. But it happens to everybody at once, not just one person. These draws are no longer independent.
```{r}
p <- 0.04
x <- 0.05*180000
profit <- replicate(B, {
    new_p <- 0.04 + sample(seq(-0.01, 0.01, length = 100), 1)
    draws <- sample( c(x, loss_per_foreclosure), n, 
                        prob=c(1-new_p, new_p), replace = TRUE) 
    sum(draws)
})
```
Note that our expected profit is still large:
```{r}
mean(profit)
```
However, the probability of the bank having negative earnings shoots up to:
```{r}
mean(profit<0)
```
Even scarier is that the probability of losing more than 10 million dollars is:
```{r}
mean(profit < -10000000)
```
To understand how this happens look at the distribution:
```{r}
library(tidyverse)
data.frame(profit_in_millions=profit/10^6) %>% 
  ggplot(aes(profit_in_millions)) + 
  geom_histogram(color="black", binwidth = 5)
```
The theory completely breaks down and the random variable has much more variability than expected. The financial meltdown of 2007 was due, among other things, to financial “experts” assuming independence when there was none.
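
To put a number on "much more variability than expected", the sketch below compares the spread of total profit when every loan defaults independently with a fixed probability against the spread when all loans share the same randomly shifted probability. As a shortcut that is not in the original simulation, we draw the number of defaults directly with `rbinom` instead of summing `sample` draws; the two are equivalent here because each loan is a two-outcome draw. The number of loans is the value obtained earlier in the text, and the names prefixed with `shock_` are ours.

```{r}
set.seed(2007)   # arbitrary seed for reproducibility
shock_n <- 22163            # number of loans found earlier in the text
shock_x <- 0.05 * 180000    # interest earned per loan
shock_l <- -200000          # loss per foreclosure
shock_reps <- 10000         # Monte Carlo replicates (assumed)
# total profit given one default probability per replicate
shock_profit <- function(default_probs) {
  n_defaults <- rbinom(length(default_probs), shock_n, default_probs)
  n_defaults * shock_l + (shock_n - n_defaults) * shock_x
}
# independent loans with a fixed default probability
profit_fixed_p <- shock_profit(rep(0.04, shock_reps))
# one shared shift per replicate, applied to every loan at once
shared_p <- 0.04 + sample(seq(-0.01, 0.01, length.out = 100), shock_reps, replace = TRUE)
profit_shared_p <- shock_profit(shared_p)
shock_summary <- data.frame(
  scenario = c("fixed p", "shared random p"),
  sd_millions = c(sd(profit_fixed_p), sd(profit_shared_p)) / 10^6,
  prob_loss = c(mean(profit_fixed_p < 0), mean(profit_shared_p < 0))
)
shock_summary
```

Under these assumptions, the standard deviation in the shared scenario should be several times larger than in the independent one, even though the average default probability is the same.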

## 14.12 Exercises

1. Create a random variable $S$ with the earnings of your bank if you give out 10,000 loans, the default rate is 0.3, and you lose $200,000 in each foreclosure. Hint: use the code we showed in the previous section, but change the parameters.
```{r}
n <- 10000
loss_per_foreclosure <- -200000
rate <- 0.3
loans <- sample(c(0,1), n, replace = TRUE, prob=c(1-rate, rate))
S <- sum(loans * loss_per_foreclosure)
S
```

2. Run a Monte Carlo simulation with 10,000 outcomes for $S$. Make a histogram of the results.
```{r}
B <- 10000   # number of Monte Carlo outcomes, kept separate from the number of loans n
S <- replicate(B, {
  loans <- sample(c(0,1), n, replace = TRUE, prob=c(1-rate, rate))
  sum(loans * loss_per_foreclosure)
})
hist(S)
```

3. What is the expected value of $S$?
```{r}
n*(rate*loss_per_foreclosure + (1-rate)*0)
```

4. What is the standard error of $S$?
```{r}
sqrt(n) * abs(loss_per_foreclosure) * sqrt(rate*(1-rate))
```

5. Suppose we give out loans for $180,000. What should the interest rate be so that our expected value is 0?
```{r}
x <- -(loss_per_foreclosure*rate) / (1 - rate)
x / 180000
```

6. (Harder) What should the interest rate be so that the chance of losing money is 1 in 20? In math notation, what should the interest rate be so that $Pr(S < 0) = 0.05$?
```{r}
z <- qnorm(0.05)
x <- -loss_per_foreclosure*( n*rate - z*sqrt(n*rate*(1-rate)))/ ( n*(1-rate) + z*sqrt(n*rate*(1 -rate)))
x / 180000
```

7. If the bank wants to minimize the probabilities of losing money, which of the following does not make interest rates go up?

a. A smaller pool of loans.  
b. A larger probability of default.  
c. A smaller required probability of losing money.  
__d. The number of Monte Carlo simulations.__