Error in reproduce_model examples documentation #65

Closed
ngotelli opened this issue Apr 1, 2015 · 11 comments
Contributor

ngotelli commented Apr 1, 2015

Hi @emhart

There is a small error in the documentation for the reproduce_model examples code. We have:

## Not run: 
finchMod <- cooc_null_model(dataWiFinches, algo="sim1",saveSeed=T)
## Check model output
mean(finchMod$Sim)

reproduce_model(finchMod$Sim)

finchMod <- cooc_null_model(dataWiFinches, algo="sim1")
## Check model output is the same as before
mean(finchMod$Sim)
reproduce_model(finchMod$Sim)

## End(Not run)

However, in two of the lines, the correct function call should be:

reproduce_model(finchMod)

Since you have already submitted to CRAN, I wasn't sure if it was safe to push any changes to the repo now.

@ngotelli ngotelli added the bug label Apr 1, 2015
Member

emhart commented Apr 1, 2015

I need to upload it again because I included a '.' where I shouldn't have in the DESCRIPTION file, so you can push the change.

Contributor Author

ngotelli commented Apr 1, 2015

OK, I will push it up. Meanwhile, I am having trouble creating a simple example of a user-defined null model that calls null_model_engine. Here is the code:

##################
# Create your own null model
# Simple test for fit of data to a Poisson

# vector set up as a data frame for null_model_engine
MyData <- data.frame(c(0,0,0,1,2,50))
names(MyData) <- "N"

# Calculate the variance to mean ratio of the data
# For a true Poisson, this should ~ 1.0
MyMetric <- function(x=runif(10)){
  VarMeanRatio <- var(x)/mean(x)
  return(VarMeanRatio)
}

# Take a data vector
# Calculate its mean
# Treat that as lambda
# Simulate a data set of the same size
MyAlgo <- function(x=runif(10)){
  lambda <- mean(x)
  sim <- rpois(length(x),lambda)
  return(sim)
}

# functions work on vectors, but this code throws an error:
MyModel <- null_model_engine(speciesData="MyData",algo="MyAlgo", metric="MyMetric", type=NULL)

Is this a problem because I am using a vector rather than a matrix? When this is straightened out, please add this code to the Examples section of the documentation for null_model_engine. We don't have any coding examples there, and it will be important to have a simple example like this one. Thanks!

Member

emhart commented Apr 2, 2015

@ngotelli There are actually two problems. (1) When you write your own functions, their first arguments need to follow the EcoSimR naming conventions: speciesData for the algorithm and m for the metric. (2) The speciesData parameter takes an object (a data frame, vector, etc.), so it should be passed unquoted. The algo and metric parameters, however, are functions, so the only way to pass them in is as strings. So this works fine:

MyData <- c(0,0,0,1,2,50)
# No colnames() call here: a plain vector has no columns to name
#Calculate the variance to mean ratio of the data
# For a true Poisson, this should ~ 1.0
MyMetric <- function(m=runif(10)){
  VarMeanRatio <- var(m)/mean(m)
  return(VarMeanRatio)
}

# Take a data vector
# Calculate its mean
# Treat that as lambda
# Simulate a data set of the same size
MyAlgo <- function(speciesData=runif(10)){
  lambda <- mean(speciesData)
  sim <- rpois(length(speciesData),lambda)
  return(sim)
}

# Pass the data object itself; pass algo and metric as strings
MyModel <- null_model_engine(speciesData=MyData, algo="MyAlgo", metric="MyMetric", nReps=1000)

summary(MyModel)

plot(MyModel)

Inputting your data as a data frame will work too:

MyData <- data.frame(c(0,0,0,1,2,50))
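
For readers without EcoSimR at hand, the logic of this test can be sketched in base R alone. This only illustrates the idea (simulate with the algorithm, score with the metric, compare with the observed value) and is not how null_model_engine is implemented. The names nReps, obs, sim, and pUpper are assumptions made for the sketch, and the seed is arbitrary.

```r
# Base-R sketch of the same null-model test; no EcoSimR calls are used.
set.seed(42)  # arbitrary seed, only to make the sketch repeatable

MyData <- c(0, 0, 0, 1, 2, 50)

# Metric: variance-to-mean ratio (close to 1.0 for Poisson data)
MyMetric <- function(m) var(m) / mean(m)

# Algorithm: simulate a Poisson sample with the same size and mean
MyAlgo <- function(speciesData) rpois(length(speciesData), mean(speciesData))

nReps <- 1000
obs <- MyMetric(MyData)                             # observed ratio
sim <- replicate(nReps, MyMetric(MyAlgo(MyData)))   # null distribution

# Upper-tail probability: fraction of simulated ratios >= the observed one
pUpper <- mean(sim >= obs)
```

For these data the observed variance-to-mean ratio is about 46, far above the value near 1 expected for Poisson counts, so pUpper should be very small.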

Contributor Author

ngotelli commented Apr 2, 2015

Thanks; I had tried it before with no quotes on the data frame and still had troubles. I am sure with hard work and study I will eventually master this program! 🐑

@ngotelli ngotelli closed this as completed Apr 2, 2015
Member

emhart commented Apr 2, 2015

Do you think there's a way we could make it easier and more intuitive?

Contributor Author

ngotelli commented Apr 2, 2015

Hi @emhart The null_model_engine function itself seems fine. I think the way to make it usable is to have a couple of examples. Specifically, let's add:

  1. The first example that I worked up, which has minimal code for a metric and an algorithm
  2. A second example showing how you would use a list to add in more parameters for a function. I will illustrate this with a model in which you draw individuals from a source pool. The source pool has an additional parameter of weights for each species to pass to the sample function.
  3. I'd like to add one more example in which the user sets type=cooc and then uses a metric from another package. I am sure there is some kind of diversity index in vegan that would work. I think this third option could be very popular.

I will try to work more on this soon.

N.

Contributor Author

ngotelli commented Apr 3, 2015

Hi @emhart . Sorry to bother you again, but I am still having trouble getting null_model_engine to work for me. Here is a simple example that is calling in a list of algorithm options:

# Example #2
# Construct a source pool and a parameter for species weights
# Draw randomly from the source pool and count the number of species present
# This is just a poor man's rarefaction program

# Create the sourcepool of 26 alphabet species
MySourcePool <- paste("Species",LETTERS,sep="")

# Create an island assemblage with 64 individuals and 6 species:
MyData <- paste("Species",c(rep("A",50),rep("B",10),"C","D","E","F"),sep="")

# Create a vector of relative species colonization weights
MyWeights <- sort(rbeta(n=length(MySourcePool),shape1=0.5,shape2=0.5),decreasing=TRUE)

# "algo" function for null model algorithm
# Draw a random sample from the source pool, sampling with replacement and species weights 

MyAlgo <- function(speciesData=runif(10),weights=runif(100),sourcepool=runif(100)){
  NullAssemblage <- sample(x=sourcepool,size=length(speciesData),replace=TRUE,prob=weights)
  return(NullAssemblage)
}

# "metric" function for null model metric
# give the species count for the random sample
MyMetric <- function(m=LETTERS){
  SpeciesCount <- length(unique(m))
  return(SpeciesCount)
}

MyModel <- null_model_engine(speciesData=MyData,algo="MyAlgo",metric="MyMetric",algoOpts=list(weights=MyWeights,sourcepool=MySourcePool))
summary(MyModel)
plot(MyModel)

I have confirmed the behavior of the two functions (MyMetric and MyAlgo) and the proper structure and content of the 3 data vectors (MyData, MyWeights and MySourcePool). null_model_engine gives a complete run, but the simulated and observed data are always zero.

Thanks for your help!

Nick

@emhart emhart reopened this Apr 3, 2015
Member

emhart commented Apr 3, 2015

@ngotelli I tracked this down to some error handling. If a data frame's first column is text, the software strips out the first column and then reclasses the data as numeric. This is to handle the case where a user inputs a matrix with species names in the first column and the remaining numeric values happen to be stored as text. In your case, all your data is text. It gets replaced with NAs, so you get 0.

This behaviour is to try and make data input seamless for the user. However if we just allow text inputs, we lose error handling in the case of a text matrix being entered. So I'm not sure of a way around this.
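
The effect described above can be reproduced in base R. The helper below, coerce_input, is a hypothetical stand-in written for this illustration and is not EcoSimR's actual input-handling code. It assumes only the behaviour stated in the comment: drop a text first column, then coerce what remains to numeric.

```r
# Hypothetical helper (NOT an EcoSimR function) mimicking the described cleanup
coerce_input <- function(x) {
  # Drop a leading text column only when other columns remain
  if (is.data.frame(x) && ncol(x) > 1 && is.character(x[[1]])) {
    x <- x[, -1, drop = FALSE]
  }
  m <- as.matrix(x)
  # Text that is not a number becomes NA; warnings are silenced for brevity
  matrix(suppressWarnings(as.numeric(m)), nrow = nrow(m), ncol = ncol(m))
}

# Case 1: species names in column 1, counts stored as text -> recovered as numbers
textMatrix <- data.frame(Species = c("A", "B", "C"),
                         Site1 = c("1", "0", "1"),
                         Site2 = c("0", "1", "1"),
                         stringsAsFactors = FALSE)
cleaned <- coerce_input(textMatrix)

# Case 2: data that is text throughout -> every value becomes NA
allText <- coerce_input(c("SpeciesA", "SpeciesA", "SpeciesB"))
```

In the first case the counts survive the coercion; in the second there is nothing numeric to recover, which matches the all-NA outcome described in the comment.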

Contributor Author

ngotelli commented Apr 3, 2015

@emhart Ah, that's insidious, and it is the same error that tripped me up when I was testing co-occurrence a while back. To patch this, I will try adding row names to all of my vectors so that when they are stripped out I should still be left with my character vectors.

One possible solution (which does not have to be implemented now) would be to create a new function custom_null_model, which is like null_model_engine, but has none of the error trapping or other convenience functions. Then advanced users can create their own null models using this custom function (and be responsible for their own error checks). However, I don't know where all the error checks are located within EcoSimR or whether it is possible to do this easily.

A related issue is that it seems counter-intuitive that new functions for algo and metric are required to take inputs of speciesData and m. With this restriction, I don't think it would be possible to use any built-in functions from other packages such as vegan. I don't know if it will be possible in the future to fix this, but it would be a good change to have.

For now, I see you are getting close to having EcoSimR up on CRAN, which is very exciting! Even without the bells and whistles for customized null models, the current package is an excellent contribution and a solid base to build from.

Enjoy your weekend,

N.

Member

emhart commented Apr 3, 2015

@ngotelli actually, it will probably still not work because the data you're passing in is text. No matter what it will try and convert it to a number because it's expecting numeric inputs, so you'll probably get a new error. I didn't realize we would accept text inputs.

Here I try to adapt what you're doing, but with counts; I'm not sure whether I get at the spirit of what your example was trying to do...

MyData <- table(paste("Species",c(rep("A",50),rep("B",10),"C","D","E","F"),sep=""))

MyWeights <- sort(rbeta(n=length(MyData),shape1=0.5,shape2=0.5),decreasing=TRUE)

MyAlgo <- function(speciesData,weights) {
  NullAssemblage <- rmultinom(1,size=sum(speciesData),prob=weights)
  return(NullAssemblage)
}

MyMetric <- function(m){
  return(sum(m > 0))
}

MyModel <- null_model_engine(speciesData=MyData,algo="MyAlgo",metric="MyMetric",algoOpts=list(weights=MyWeights))
summary(MyModel)
plot(MyModel)

Regarding the parameter names, the reason I do this is because the null model engine uses do.call() to handle options. I needed to standardize those options for all the functions and the way I did this was just through a parameter naming convention. I've opened an issue on this, #66. I need to do more testing, but we could release an upgrade in a few months that is more flexible and has more intuitive error handling.
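
To make the do.call() point concrete, here is a minimal sketch of how an engine can dispatch on function names passed as strings. mini_engine and its internals are hypothetical and are not the EcoSimR source; the sketch assumes only the convention discussed in this thread (speciesData as the algorithm's first argument, m as the metric's). The weights are made-up values.

```r
# Hypothetical miniature engine; NOT the EcoSimR implementation.
mini_engine <- function(speciesData, algo, metric, nReps = 100,
                        algoOpts = list(), metricOpts = list()) {
  # Standardized names let the engine fill in each argument list itself
  algoOpts$speciesData <- speciesData
  metricOpts$m <- speciesData
  obs <- do.call(metric, metricOpts)           # metric on the observed data
  sim <- replicate(nReps, {
    metricOpts$m <- do.call(algo, algoOpts)    # one simulated data set
    do.call(metric, metricOpts)                # metric on the simulation
  })
  list(Obs = obs, Sim = sim)
}

set.seed(1)  # arbitrary seed for repeatability
MyData <- c(A = 50, B = 10, C = 1, D = 1, E = 1, F = 1)
MyWeights <- c(0.4, 0.3, 0.1, 0.1, 0.05, 0.05)  # illustrative weights

MyAlgo <- function(speciesData, weights) {
  as.vector(rmultinom(1, size = sum(speciesData), prob = weights))
}
MyMetric <- function(m) sum(m > 0)

res <- mini_engine(MyData, algo = "MyAlgo", metric = "MyMetric",
                   nReps = 200, algoOpts = list(weights = MyWeights))
```

Because the arguments are matched by name, a function whose first argument is called something else would need a thin wrapper under this scheme, for example function(m) some_function(m), where some_function stands for any external metric.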

Contributor Author

ngotelli commented Apr 3, 2015

Hi @emhart Yes, your code does the same thing with numeric variables. I think it is fine to restrict the EcoSimR format to numeric to avoid these kinds of problems; I just wasn't thinking about it when I started coding. Eventually, I'd like to get both of these (corrected) examples included with the documentation for null_model_engine. But you are in the midst of uploading to CRAN, so I won't make any edits for now. I will certainly use these examples for the course in Switzerland. Thanks!

We can revisit the general structure of the user-defined null models some time later.

Best,

Nick

@emhart emhart closed this as completed Apr 5, 2015