---
title: 'Lab 2: Probability distributions'
author: "GE509"
output: html_document
---
## Objectives
* Familiarize yourself with common probability distributions and R's functions for calling them
* Understand how different probability distributions are related to one another
* Understand how discrete distributions can be approximated by continuous distributions
## Git updating
* If you originally pulled directly from the original repository
+ In RStudio's "Git" tab, click on the down arrow
+ OR at the command line `git pull`
* If you FORKED the original repo to your Github account
+ At the command line ```git pull git@github.com:mdietze/EE509.git```
+ OR ```git pull https://github.com/mdietze/EE509.git```
## Probability distributions in R
Because R is a statistical language, a large number of probability distributions are available by default and an even larger number can be loaded from packages. The table below lists the most common distributions in R, the name of the corresponding function, and the parameters of each distribution.
Common distributions
----------------------
Distribution | R name | Parameters
------------ | ------ | ----------
beta | beta | shape1, shape2, ncp
Binomial | binom | size, prob
Cauchy | cauchy | location, scale
chi-squared | chisq | df, ncp
exponential | exp | rate
F | f | df1, df2, ncp
gamma | gamma | shape, rate (or scale)
geometric | geom | prob
hypergeometric | hyper | m, n, k
log-normal | lnorm | meanlog, sdlog
logistic | logis | location, scale
Negative binomial | nbinom | size, prob
normal | norm | mean, sd
Poisson | pois | lambda
Student's t | t | df, ncp
uniform | unif | min, max
Weibull | weibull | shape, scale
Wilcoxon | wilcox | m, n
R actually provides four related functions for each probability distribution. These functions are called by adding a letter at the beginning of the function name. The variants of each probability distribution are:
* "d" = density: probability density function (PDF)
* "p" = cumulative distribution function (CDF)
* "q" = quantile: calculates the value associated with a specified tail probability, inverse of "p"
* “r” = random: simulates random numbers
The first argument to these functions is the same regardless of the distribution and is the value *x* for "d", the quantile *q* for "p", the probability *p* for "q", and the sample size *n* for "r"
All of this will make more sense once we consider a concrete example. Let's take a look at the normal probability density function first, since it's the one you're most familiar with. If you use **?dnorm** you'll see that for many of the function arguments there are default values, specifically mean=0 and sd=1. Therefore if these values are not specified explicitly in the function call R assumes you want a *standard Normal* distribution.
```{r}
x = seq(-5,5,by=0.1)
plot(x,dnorm(x),type='l') ## that’s a lowercase “L” for “line”
abline(v=0) ## add a line to indicate the mean (“v” is for “vertical”)
lines(x,dnorm(x,2),col=2) ## try changing the mean (“col” sets the color)
abline(v=2,col=2)
lines(x,dnorm(x,-1,2),col=3) ## try changing the mean and standard dev
abline(v=-1,col=3)
```
The above plot can also be produced using R's `curve` function
```{r}
curve(dnorm,-5,5)
abline(v=0)
curve(dnorm(x,2),-5,5,add=TRUE,col=2)
abline(v=2,col=2)
curve(dnorm(x,-1,2),-5,5,add=TRUE,col=3) ## try changing the mean and standard dev
abline(v=-1,col=3)
```
You are welcome to use either approach in the following examples.
You will use density functions quite frequently to estimate the **likelihood** of data. For example, if we collected a data point of x = 0.5, we can calculate the likelihood that this data point came from each of these three normal distributions. Implicit in the likelihood is that we're looking at the probability of the data in a very small window, $\delta x$, and that $\delta x$ never changes, so that:
$$Pr(x < X < x + \delta x) = \int_{x}^{x + \delta x} f(x)dx \propto f(x)$$
```{r}
dnorm(0.5,0,1)
dnorm(0.5,2,1)
dnorm(0.5,-1,2)
```
This shows that the first distribution has a higher likelihood than the other two, which are about the same. We interpret this as saying that the observed data point was more likely to have been generated by the first distribution than the other two. This is consistent with the height of each curve at x = 0.5 in the plot above.
This plot of the normal distribution and the effects of varying the parameters in the normal are both probably familiar to you already – changing the mean changes where the distribution is centered while changing the standard deviation changes the spread of the distribution. Next try looking at the CDF of the normal:
```{r}
plot(x,pnorm(x,0,1),type='l')
abline(v=0)
lines(x,pnorm(x,2,1),col=2)
abline(v=2,col=2)
lines(x,pnorm(x,-1,2),col=3)
abline(v=-1,col=3)
```
Using the CDF we can calculate $Pr(X \leq x)$ for our data point
```{r}
pnorm(0.5,0,1)
pnorm(0.5,2,1)
pnorm(0.5,-1,2)
```
If the value 0.5 here corresponded to some hypothesis, the **CDF could be used to calculate a p-value associated with the one-sided test**. Would any be significant at the $\alpha$=0.05 level?
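As a reminder, the corresponding upper-tail probability can be computed either by subtraction or directly with `pnorm`'s `lower.tail` argument:
```{r}
1 - pnorm(0.5,0,1)              ## Pr(X > 0.5)
pnorm(0.5,0,1,lower.tail=FALSE) ## same value, computed directly
```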
Next let's look at the function qnorm. Since the input to this function is a probability, the x-values for the plot are restricted to the range [0,1].
```{r}
p = seq(0,1,by=0.01)
plot(p,qnorm(p,0,1),type='l',ylim=range(x)) # ylim sets the y-axis range
# range returns the min/max as a 2-element vector
abline(h=0) # “h” for “horizontal”
lines(p,qnorm(p,2,1),col=2)
abline(h=2,col=2)
lines(p,qnorm(p,-1,2),col=3)
abline(h=-1,col=3)
```
It should be readily apparent that the quantile function is the inverse of the CDF. This function can be used to find the median of the distribution (p = 0.5) or to **estimate confidence intervals** at any level desired.
```{r}
qnorm(c(0.025,0.975),0,1) # what width CI is specified by these values?
plot(p,qnorm(p,0,1),type='l',ylim=range(x))
abline(v=c(0.025,0.975),lty=2) # add vertical lines at the CI
abline(h=qnorm(c(0.025,0.975)),lty=2) #add horizontal lines at the threshold vals
plot(x,dnorm(x,0,1),type='l') # plot the corresponding pdf
abline(v=qnorm(c(0.025,0.975)),lty=2)
```
Finally, let's investigate the rnorm function for generating random numbers that have a normal distribution. Here we generate histograms that have a progressively larger sample size and compare that to the actual density of the standard normal.
```{r}
n = c(10,100,1000,10000) # sequence of sample sizes
for(i in 1:4){ # loop over these sample sizes
hist(rnorm(n[i]),main=n[i],probability=TRUE,breaks=40)
#here breaks defines number of bins in the histogram
lines(x,dnorm(x),col=2)
}
```
To make these plots we introduced the use of a ```for``` loop. A for loop iterates over all specified values of some index. Above, we used ```i``` as our index, and specified values 1, 2, 3, and 4 with the 1:4 syntax from last week. `for` loops always start this way (including parentheses): ```for(index in vector)```. The code in the { } then executes once for every value in the vector, with `i` taking on the next value in the list. The ```for``` syntax in R is very flexible and will allow you to loop over a vector of anything, not just integers...characters, factors, filenames, etc. Also note that your index variable can be anything, but to avoid unwanted results make sure it’s not the same as a variable you’re using elsewhere in the code (e.g. indexing by `x` above would ruin everything, because `x` would be overwritten on every iteration).
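For instance, here is a minimal sketch (the filenames are made up for illustration) of looping over a character vector rather than integers:
```{r}
for(fname in c("plotA.csv","plotB.csv","plotC.csv")){ ## hypothetical filenames
  print(fname) ## in real code you might read and process each file here
}
```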
One other technical note: like any function in R that generates random output, this example will give different results every time you run it.
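If you need a random-number example to be reproducible, you can fix R's random seed beforehand with base R's `set.seed`:
```{r}
set.seed(509) ## any integer works; the same seed reproduces the same draws
rnorm(3)
rnorm(3)      ## a second call gives different numbers...
set.seed(509)
rnorm(3)      ## ...but resetting the seed recovers the first set
```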
This example demonstrates that **as the number of random draws from a probability distribution increases, the histogram of those draws provides a better and better approximation of the density itself**. We will make use of this fact extensively this semester because – as odd as this may sound now – there are many distributions that are easier to randomly sample from than solve for analytically. We can also show that as the sample size increases, the sample mean and standard deviation get closer to the true population values (mean = 0, sd = 1 in this example) and the standard error gets smaller:
```{r}
y = rnorm(10) ## Sample size = 10
y
mean(y)
sd(y) ## standard deviation
sd(y)/sqrt(10) ## standard error
y = rnorm(10000) ## Sample size = 10000
mean(y)
sd(y) ## standard deviation
sd(y)/sqrt(10000) ## standard error
```
We are next going to investigate several other common probability distributions. There will be a few examples of the effects of varying parameters for each distribution, but you are strongly encouraged to explore other values to see how they affect the shape of each PDF. You might also want to refer to Appendix F in the textbook or Chapter 3 of Hilborn and Mangel in order to learn more about each distribution. There is also a good chart at http://www.johndcook.com/distribution_chart.html that describes the relationships among these common distributions, and the Wikipedia articles for most of them are good for quick reference.
There's a good bit of code this week. We recommend running each chunk of code by itself, and especially that you **make sure you understand what it did before proceeding to the next box**.
### LAB REPORT
For each of the distributions below, please include the multi-panel figure generated in your lab report and provide a short answer to the questions that follow. If you write any additional code in deriving your answers (you may not need to in this lab), include it and any R outputs or figures it generates. When doing calculations "by hand," include the formulas you used. You do not have to show all your work, but it may make it easier to assign partial credit.
**For this lab, turn everything in as a single .Rmd file by the start of next lab.**
# Part 1: Continuous distributions
*Note: if you didn’t start with the tutorial above, you’ll need to define the x and p variables before running the code below.*
## Uniform
The uniform is used when there's an equal probability of an event occurring over a range of values.
```{r}
plot(x,dunif(x),type='l')
lines(x,dunif(x,0,4),col=2)
lines(x,dunif(x,-3,-0.5),col=3)
plot(x,punif(x),type='l')
lines(x,punif(x,0,4),col=2)
lines(x,punif(x,-3,-0.5),col=3)
plot(p,qunif(p),type='l',ylim=range(x))
lines(p,qunif(p,0,4),col=2)
lines(p,qunif(p,-3,-0.5),col=3)
hist(runif(500,-3,3),breaks=30,xlim=range(x),probability=TRUE)
lines(x,dunif(x,-3,3),col=2)
```
### Lab questions:
```
1. Why does the height of the uniform PDF change as the width changes?
2. What do the second and third arguments to the uniform specify? What are their default values?
```
## Beta
The Beta is an interesting distribution because it is bounded on the range $0 \leq X \leq 1$ and thus is very often used to describe data that are proportions. At first glance the mathematical formula for the Beta looks a lot like the Binomial:
$$Beta(x \mid a,b) \propto x^{a-1} (1-x)^{b-1}$$
$$Binom(x \mid n,p) \propto p^x (1-p)^{n-x}$$
The critical difference between the two is that in the Beta the random variable X is in the base while in the Binomial it is in the exponent. Unlike many distributions you may have used in the past, the two shape parameters of the Beta do not directly define the mean and variance, though these can be calculated as simple functions of $\alpha$ and $\beta$. The Beta does have an interesting property of symmetry, whereby Beta($\alpha$,$\beta$) is the reflection of Beta($\beta$,$\alpha$).
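As a quick numerical check of that reflection property (this snippet is not part of the original lab):
```{r}
dbeta(0.3,2,5)
dbeta(0.7,5,2) ## same density: Beta(a,b) at x equals Beta(b,a) at 1-x
```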
```{r}
plot(p,dbeta(p,5,5),type='l')
lines(p,dbeta(p,1,1),col=2)
lines(p,dbeta(p,0.2,0.2),col=3)
## vary beta
plot(p,dbeta(p,6,6),type='l',ylim=c(0,5))
lines(p,dbeta(p,6,4),col=2)
lines(p,dbeta(p,6,2),col=3)
lines(p,dbeta(p,6,1.25),col=4)
lines(p,dbeta(p,6,1),col=5)
lines(p,dbeta(p,6,0.25),col=6)
legend("topleft",legend=c(6,4,2,1.25,1,0.5),lty=1,col=1:6)
plot(p,pbeta(p,5,5),type='l')
lines(p,pbeta(p,1,1),col=2)
lines(p,pbeta(p,0.2,0.2),col=3)
hist(rbeta(500,3,3),breaks=30,xlim=range(p),probability=TRUE)
lines(p,dbeta(p,3,3),col=2)
```
### Lab questions:
```
3) The Beta has a special case, Beta(1,1) that is equivalent to what other PDF?
4) In the first panel, the mean is the same for each line (0.5). What are the variances? (Hint: Calculate this analytically. Look up the distribution in one of the recommended references.)
5) In the second panel, what are the means and medians of each of these curves? (Hint: you'll need to calculate the mean analytically and use one of the variants of R's beta function to find the median.)
```
## Log Normal
The lognormal is the distribution of a random variable whose logarithm is normally distributed. It is defined on the range X > 0 and so is commonly used for data that cannot be negative by definition. The distribution is also positively skewed and so is often used for skewed data. One thing that often goes unappreciated with the log-normal is that the mean, E[X], depends on the variance:
$$E[X] = e^{\mu + {{1}\over{2}}\sigma^2}$$
This applies not just when you explicitly use the lognormal, but also **whenever you log-transform data** and then calculate a mean or standard deviation – a fact that is vastly under-appreciated in the biological and environmental sciences and frequently missed in the published literature. In fact, ANY data transformation applied to make data “more normal” will change the mean, with the functional form of the bias depending on the transformation used. You cannot simply back-transform the data without correcting for this bias. This phenomenon is another illustration of Jensen's Inequality.
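Here is a minimal simulation sketch of this bias, using $\mu = 2$ and $\sigma = 1$ (the same parameters as the random-sample panel below):
```{r}
set.seed(1)
z = rlnorm(100000,meanlog=2,sdlog=1)
mean(z)           ## arithmetic mean of the draws
exp(2 + 1/2)      ## E[X] from the formula above; should agree closely
exp(mean(log(z))) ## naive back-transform of the log-scale mean: biased low (~exp(2))
```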
```{r}
## changing the mean
x <- 10^seq(-2,2,by=0.01)
plot(x,dlnorm(x,0),type='l',xlim=c(0,15),main="Changing the Mean")
lines(x,dlnorm(x,1),col=2)
lines(x,dlnorm(x,2),col=3)
legend("topright",legend=0:2,lty=1,col=1:3)
## on a log scale
plot(x,dlnorm(x,0),type='l',log='x',main="Log Scale")
lines(x,dlnorm(x,1),col=2)
lines(x,dlnorm(x,2),col=3)
abline(v=exp(0),col=1)
abline(v=exp(1),col=2)
abline(v=exp(2),col=3)
legend("topright",legend=0:2,lty=1,col=1:3)
## changing the variance
plot(x,dlnorm(x,2,.125),type='l',xlim=c(0,20),ylim=c(0,0.6),main="Changing the Variance")
lines(x,dlnorm(x,2,0.25),col=2)
lines(x,dlnorm(x,2,0.5),col=3)
lines(x,dlnorm(x,2,1),col=4)
lines(x,dlnorm(x,2,2),col=5)
lines(x,dlnorm(x,2,4),col=6)
abline(v=exp(2),col=1)
legend("topright",legend=c(0.125,0.25,0.5,1,2,4),lty=1,col=1:6)
## random sample
hist(rlnorm(250,2,1),breaks=30,probability=TRUE)
lines(x,dlnorm(x,2,1),col=4)
```
### Lab questions:
```
6) What are the arithmetic and geometric means of the three curves in the first panel? (Reminder: arithmetic means are means in the linear domain, geometric means are means in the log domain)
```
## Exponential & Laplace
The exponential distribution arises naturally as the time it takes for an event to occur when the average rate of occurrence, *r*, is constant. The exponential is a special case of the Gamma (discussed next), $Exp(X \mid r) = Gamma(X \mid 1,r)$. The exponential is also a special case of the Weibull, $Exp(X \mid r) = Weibull(X \mid 1, 1/r)$ (shape = 1, scale = $1/r$), where the Weibull is a generalization of the exponential that allows the event rate to increase or decrease with time. The Laplace is basically a two-sided exponential and arises naturally if one is dealing with absolute deviation, $|x-m|$, rather than squared deviation, $(x-m)^2$, as is done with the normal.
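These special cases are easy to verify numerically (a quick check, not part of the original lab; note that R's `dweibull` takes shape and scale arguments):
```{r}
r = 0.5
dexp(2,rate=r)
dgamma(2,shape=1,rate=r)      ## Gamma(1,r) matches the exponential
dweibull(2,shape=1,scale=1/r) ## so does the Weibull with shape 1, scale 1/r
```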
```{r}
## changing the rate (mean = 1/rate)
x <- seq(0,10,by=0.01)
plot(x,dexp(x,0.125),type='l',ylim=c(0,1))
lines(x,dexp(x,0.25),col=2)
lines(x,dexp(x,0.5),col=3)
lines(x,dexp(x,1),col=4)
lines(x,dexp(x,2),col=5)
lines(x,dexp(x,4),col=6)
legend("topright",legend=c(0.125,0.25,0.5,1,2,4),lty=1,col=1:6)
## random sample
hist(rexp(250,2),breaks=30,probability=TRUE)
lines(x,dexp(x,2),col=4)
## laplace vs Gaussian
plot(x,dexp(abs(x-5),1)/2,type='l')
lines(x,dnorm(x,5),col=2)
plot(x,dexp(abs(x-5),1)/2,type='l',log='y') ## same plot as last but on a log scale
lines(x,dnorm(x,5),col=2)
```
### Lab questions:
```
7) The last two panels compare a normal and a Laplace distribution with the same mean. How do the two distributions compare? In particular, compare the difference in the probabilities of extreme events in each distribution.
```
## Gamma
The gamma and inverse-gamma distributions are flexible distributions defined for positive real numbers. They are frequently used to model the distribution of variances or precisions (precision = 1/variance), in which case the shape and rate parameters are related to the sample size and sum of squares, respectively. The gamma is frequently used in Bayesian statistics as a prior distribution, and also in mixture distributions for inflating the variance of another distribution.
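To make the "sample size and sum of squares" connection concrete, here is a hedged sketch of the standard conjugate result for a Gamma prior on a normal precision: given a $Gamma(a,r)$ prior and $n$ observations with known mean $m$, the posterior is $Gamma(a + n/2, \; r + SS/2)$ where $SS = \sum{(y_i-m)^2}$. The data values below are made up for illustration:
```{r}
y = c(4.1,5.3,4.8,5.9,4.4)  ## hypothetical data
m = 5                       ## known mean
a = 0.1; r = 0.1            ## vague Gamma prior on the precision
a.post = a + length(y)/2    ## shape grows with the sample size
r.post = r + sum((y-m)^2)/2 ## rate grows with the sum of squares
c(a.post,r.post)
```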
```{r}
x <- seq(0,10,by=0.01)
## change rate
plot(x,dgamma(x,3,3),type='l' ,ylim=c(0,1.6))
lines(x,dgamma(x,3,1),col=2)
lines(x,dgamma(x,3,6),col=3)
lines(x,dgamma(x,3,1/3),col=4)
legend("topright",legend=c(3,1,6,0.33),lty=1,col=1:4)
## change shape
plot(x,dgamma(x,3,3),type='l')
lines(x,dgamma(x,1,3),col=2)
lines(x,dgamma(x,6,3),col=3)
lines(x,dgamma(x,18,3),col=4)
legend("topright",legend=c(3,1,6,18),lty=1,col=1:4)
## change variance
a <- c(20,15,10,5,2.5,1.25)
r <- c(4,3,2,1,0.5,0.25)
plot(x,dgamma(x,20,4),type='l')
for(i in 1:6){
lines(x,dgamma(x,a[i],r[i]),col=i)}
var = a/r^2
mean = a/r
legend("topright",legend=format(var,digits=3),lty=1,col=1:6)
var
mean
## change mean
var = 4
mean = c(1,2,3,4,5)
rate = mean/var
shape = mean^2/var
plot(x,dgamma(x,shape[2],rate[2]),type='l',col=2)
for(i in 1:5){
lines(x,dgamma(x,shape[i],rate[i]),col=i)
}
legend("topright",legend=1:5,lty=1,col=1:5)
rate
shape
```
### Lab questions:
```
8) Looking at the 'change variance' figure, how does the variance change as a and r increase? Qualitatively, how does this affect the mode and skew? Quantitatively, how does this affect the median (relative to the mean)?
```
# Part 2: Discrete distributions
## Binomial
The binomial arises naturally from counts of the number of successes given a probability of success, p, and a sample size, n. You are probably already familiar with the binomial in the context of coin toss examples.
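For example, each random draw below simulates the number of heads in ten fair coin flips:
```{r}
rbinom(5,size=10,prob=0.5) ## five replicates of a 10-flip experiment
```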
```{r}
x <- 0:11
## vary size of sample (number of draws)
size = c(1,5,10)
plot(x,dbinom(x,size[1],0.5),type='s')
for(i in 2:3){
lines(x,dbinom(x,size[i],0.5),type='s',col=i)
}
legend("topright",legend=size,lty=1,col=1:3)
## vary probability
n = 10
p = c(0.1,0.25,0.5)
plot(x,dbinom(x,n,p[1]),type='s')
for(i in 2:3){
lines(x,dbinom(x,n,p[i]),col=i,type='s')
}
abline(v = n*p,col=1:3,lty=2)
legend("topright",legend=p,lty=1,col=1:3)
## CDF
plot(x,pbinom(x,n,p[1]),type='s',ylim=c(0,1))
for(i in 2:3){
lines(x,pbinom(x,n,p[i]),col=i,type='s')
}
legend("bottomright",legend=p,lty=1,col=1:3)
## Random samples
hist(rbinom(100,10,0.5)+0.0001, ## small amount added because
probability=TRUE, ## of way R calculates breaks
breaks=0:11,main="Random Binomial")
lines(x,dbinom(x,10,0.5),type='s',col=2) #Analytical solution
legend("topright",legend=c("sample","pmf"),lty=1,col=1:2)
```
### Lab questions:
```
9) Consider a binomial distribution that has a constant mean, np. What are the differences in the shape of this distribution if it has a high n and low p vs. a low n and high p?
```
## Poisson
The Poisson is also very common for count data and arises as the number of events that occur in a fixed amount of time (e.g. number of bird sightings per hour), or the number of items found in a fixed amount of space (e.g. the number of trees in a plot). Unlike the Binomial distribution the Poisson doesn't have a fixed upper bound for the number of events that can occur.
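For example, each draw below simulates an hour of bird sightings at an average rate of 3 per hour; note that for the Poisson the sample variance tracks the sample mean:
```{r}
y = rpois(10000,lambda=3) ## e.g. 10000 simulated hourly counts
c(mean(y),var(y))         ## both close to 3: the Poisson variance equals its mean
```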
```{r}
x <- 0:12
plot(x,dpois(x,1),type='s')
lines(x,dpois(x,2),type='s',col=2)
lines(x,dpois(x,5),type='s',col=3)
legend("topright",legend=c(1,2,5),lty=1,col=1:3)
x <- 20:80
plot(x,dpois(x,50),type='s',ylim=c(0,0.08)) #Poisson with mean 50 (variance = 50)
lines(x+0.5,dnorm(x,50,sqrt(50)),col=2) #Normal with mean and variance of 50
lines(x,dbinom(x,100,0.5),col=3,type='s') #Binomial with mean 50 (variance = 25)
legend("topright",legend=c("pois","norm","binom"),col=1:3,lty=1)
plot(x,dbinom(x,100,0.5),type='s',col=3) #Binomial with mean 50 (variance = 25)
lines(x+0.5,dnorm(x,50,sqrt(25)),col=2) #Normal with mean 50 and variance of 25
lines(x,dpois(x,50),col=1,type='s') #Poisson with mean 50 (variance = 50)
legend("topright",legend=c("pois","norm","binom"),col=1:3,lty=1)
```
The last two panels compare the Poisson, Normal, and Binomial with the same mean and a large sample size. The Poisson and Binomial curves are identical in the two figures, but in the first the Normal has the same variance as the Poisson, while in the second it has the same variance as the Binomial.
### Lab questions:
```
10) Normal distributions are often applied to count data even though count data can only take on positive integer values. Is this fair to do in these two examples? (i.e. how good is the normal approximation?)
11) Would the normal be a fair approximation to the Poisson curves for small numbers (the first panel)? How about for the Binomial for small numbers (earlier panel of figures on the Binomial)?
12) Is the Poisson a good approximation of the Binomial?
13) Is it possible to choose the parameters so that the Poisson and Binomial both have the same mean and variance? If so, what is this parameterization?
```
## Negative binomial
The negative binomial has two interesting interpretations that are subtly different from either the Poisson or Binomial. In the first, it is the number of trials needed in order to observe a fixed number of occurrences, which is the reverse of the Binomial's number of occurrences in a fixed number of trials, and is where it gets its name. The Negative Binomial also arises as the distribution of the number of events that occur in a fixed space or time when the rate is not constant (as in the Poisson) but varies according to a Gamma distribution. Hence the Negative Binomial is also used to describe data that logically seem to come from a Poisson process but have greater variability than is expected from the Poisson (which by definition has a variance equal to its mean). The Geometric distribution arises as a special case of the negative binomial where the number of occurrences is fixed at 1.
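The gamma-Poisson interpretation is easy to demonstrate by simulation. Below is a sketch (not part of the original lab) using R's size/prob parameterization, under which drawing each rate $\lambda$ from $Gamma(size, \; prob/(1-prob))$ yields Negative Binomial counts:
```{r}
set.seed(2)
size = 3; prob = 0.3
lambda = rgamma(10000,shape=size,rate=prob/(1-prob)) ## variable rates
y = rpois(10000,lambda)   ## Poisson counts at those variable rates
c(mean(y),var(y))         ## variance exceeds the mean (overdispersion)
c(size*(1-prob)/prob,     ## NB mean for comparison
  size*(1-prob)/prob^2)   ## NB variance for comparison
```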
```{r}
x <- 0:20
## negative binomial
## vary size
plot(x,dnbinom(x,1,0.5),type="s",main="vary size")
lines(x,dnbinom(x,2,0.5),type="s",col=2)
lines(x,dnbinom(x,3,0.5),type="s",col=3)
lines(x,dnbinom(x,5,0.5),type="s",col=4)
lines(x,dnbinom(x,10,0.5),type="s",col=5)
legend("topright",legend=c(1,2,3,5,10),col=1:5,lty=1)
## vary probability
plot(x,dnbinom(x,3,0.5),type="s",main="vary probability")
lines(x,dnbinom(x,3,0.3),type="s",col=2)
lines(x,dnbinom(x,3,0.2),type="s",col=3)
lines(x,dnbinom(x,3,0.1),type="s",col=4)
legend("topright",legend=c(0.5,0.3,0.2,0.1),col=1:5,lty=1)
## vary variance , alternate parameterization
mean = 8
var = c(10,20,30)
size = mean^2/(var-mean)
plot(x,dnbinom(x,mu=mean,size=size[1]),type="s",ylim=c(0,0.14),main="vary variance")
lines(x,dnbinom(x,mu=mean,size=size[2]),type="s",col=2)
lines(x,dnbinom(x,mu=mean,size=size[3]),type="s",col=3)
legend('topright',legend=format(c(var,mean),digits=2),col=1:4,lty=1)
lines(x,dpois(x,mean),col=4,type="s")
## NB as generalization of pois with inflated variance
## geometric
plot(x,dgeom(x,0.5),type="s",main="Geometric")
lines(x,dgeom(x,0.15),type="s",col=2)
lines(x,dgeom(x,0.05),type="s",col=3)
lines(x,dnbinom(x,1,0.15),type="s",col=4,lty=2)
## geometric as special case of NB where size = 1
```
### Lab questions:
```
14) In the 'vary size' panel, what are the means of the curves?
15) In the "vary variance" panel, how does the shape of the Negative Binomial compare to a Poisson with the same mean?
```