This repository has been archived by the owner on Jul 7, 2024. It is now read-only.

Sdm #45

Closed
wants to merge 21 commits into from

davharris
Contributor

In progress, but hopefully ready soon. Also includes a function for shear maps (#44).

image

@davharris
Contributor Author

@sdtaylor @ethanwhite this is ready for code review when one of you has time.

The dev version of dplyr apparently only gets the first hundred thousand rows by default, and throws a warning. This fixes it.
* use correct functions for dev version of dplyr
* Handle cases with zero occurrences
Make the background less ugly, add threshold at exactly 0
@ethanwhite
Member

hey @davharris - what's the status of getting this up and running?

@ethanwhite
Member

@sdtaylor - if you have some time would you mind giving this a review. Thanks.

@sdtaylor
Contributor

Will do.

@sdtaylor
Contributor

If I run sdm.R inside RStudio, I get this error when get_env_data() is called:

Error in get_elev_data() : could not find function "getData"

It looks like getData() from the raster package is not being loaded correctly by the importFrom statement.

But when I run the script from the command line with Rscript sdm.R, this error comes up:

Error in UseMethod("db_list_tables") : 
  no applicable method for 'db_list_tables' applied to an object of class "SQLiteConnection"
Calls: %>% ... match -> src_tbls -> src_tbls.src_sql -> db_list_tables
Execution halted

This comes from one of the database calls in get_env_data(), but I'm not sure which. Any ideas?
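A minimal sketch of the first failure mode, under the assumption that sdm.R is being sourced as a plain script rather than loaded as part of a package (importFrom directives only take effect through a package NAMESPACE). It uses `tools::file_ext()` purely as a stand-in, because the tools package ships with R but is not attached in a default session; raster is not needed to run it.

```r
# Stand-in for raster::getData(): file_ext() lives in the 'tools' package,
# which is installed with R but normally not attached to the search path.
found_unqualified <- exists("file_ext", mode = "function")

# A namespace-qualified call does not depend on the package being attached.
ext <- tools::file_ext("bbs_data.csv")

print(found_unqualified)  # typically FALSE in a fresh session
print(ext)
```

By the same logic, calling `raster::getData(...)` explicitly, or adding `library(raster)` at the top of the script, may avoid the "could not find function" error. Whether that matches how the repository is meant to be loaded is an assumption.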

y = distinct(occurrences, year, species_id, site_id, .keep_all = TRUE) %>%
inner_join(select(x, year, site_id), c("year", "site_id")) %>%
mutate(presence = abundance > 0) %>%
select(-abundance, lat, long, start_time, month, day) %>%
Contributor

I'm having an issue here: start_time and month aren't found when running this select statement. Could it be from different bbs_data.csv files or databases? I'm using the ./data/bbs_data.csv file that's on the T drive.
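One thing that may be worth checking alongside the data source: in `select()`, a leading minus applies only to the name it is attached to, so the remaining bare names are treated as columns to keep and must exist. The sketch below uses a made-up `obs` data frame (hypothetical, not the real bbs_data.csv schema) and assumes a recent dplyr release; whether dropping all of the listed columns was the author's intent is a guess.

```r
library(dplyr)

# Hypothetical toy data; column names are illustrative only.
obs <- data.frame(
  site_id   = c(1, 2),
  abundance = c(0, 3),
  lat       = c(40, 41),
  long      = c(-100, -101)
)

# Only 'abundance' is negated; 'lat' and 'long' are positive selections.
kept <- select(obs, -abundance, lat, long)

# A bare name that is not a column raises an error, as in the report above.
err <- tryCatch(select(obs, -abundance, start_time), error = function(e) e)

# If the goal is to drop every listed column, negate them as a group.
dropped <- select(obs, -c(abundance, lat, long))

print(names(kept))
print(inherits(err, "error"))
print(names(dropped))
```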

@sdtaylor
Contributor

Are SDMs going to be another model type to compare against ARIMA, spatial NDVI, etc.? If so, this could use a wrapper to return forecasts in the same format as the others.

@ethanwhite
Member

> Are SDMs going to be another model type to compare against ARIMA, spatial NDVI, etc.?

Yes, that's the idea.


env = get_env_data() %>%
filter_ts(start_yr, end_yr, min_num_yrs) %>%
inner_join(distinct(occurrences, site_id, lat, long, year),
Contributor

This inner_join(distinct()) puts the occurrence and species_id columns in the env data frame, which ends up getting used in the models. That definitely doesn't look intentional unless you're going for some sort of pseudo-joint SDM. =)
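A possible way to keep the extra columns out regardless of dplyr version is to select the site-level columns explicitly before calling `distinct()`. This is only a sketch: the toy `occurrences` and `env` data frames and the join keys (`site_id`, `year`) are assumptions, since the original join call is truncated in the snippet above.

```r
library(dplyr)

# Hypothetical toy data standing in for the real tables.
occurrences <- data.frame(
  site_id    = c(1, 1, 2),
  year       = c(2000, 2000, 2000),
  lat        = c(40, 40, 41),
  long       = c(-100, -100, -101),
  species_id = c(10, 20, 10),
  abundance  = c(3, 0, 5)
)
env <- data.frame(
  site_id = c(1, 2),
  year    = c(2000, 2000),
  ndvi    = c(0.4, 0.6)
)

# Keep only the site-level columns, then de-duplicate, so species-level
# columns cannot leak into the environmental predictors.
site_years <- occurrences %>%
  select(site_id, lat, long, year) %>%
  distinct()

env_joined <- inner_join(env, site_years, by = c("site_id", "year"))

print(names(env_joined))
print(nrow(env_joined))
```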

@ethanwhite
Member

@davharris - just wanted to check in on the status of this PR.

}
# fit gbm -----------------------------------------------------------------

# Function to cross-validate a GBM model with different numbers of trees, then
Contributor

In my own code I estimate the optimum number of trees with gbm.perf(). I think it might be faster than doing CV manually.
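A rough sketch of that approach on simulated data (the data, formula, and tuning values are all made up for illustration and are not taken from the PR). It uses the out-of-bag estimate, which needs no extra model fits; note that gbm itself reports that OOB tends to underestimate the optimal number of trees, and that `method = "cv"` is available when the model is fit with `cv.folds` greater than 1.

```r
library(gbm)

set.seed(1)

# Simulated presence/absence data with two hypothetical predictors.
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$presence <- rbinom(n, 1, plogis(1.5 * sim$x1 - sim$x2))

# Fit once with a generous number of trees.
fit <- gbm(
  presence ~ x1 + x2,
  data = sim,
  distribution = "bernoulli",
  n.trees = 300,
  interaction.depth = 2,
  shrinkage = 0.05,
  bag.fraction = 0.5
)

# Estimate the best number of trees from the out-of-bag improvement curve.
best_trees <- as.integer(gbm.perf(fit, method = "OOB", plot.it = FALSE))

# Predict presence probabilities using only that many trees.
pred <- predict(fit, newdata = sim, n.trees = best_trees, type = "response")

print(best_trees)
```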

@davharris mentioned this pull request Sep 27, 2016
@davharris
Contributor Author

replaced by #76
