@sdtaylor @ethanwhite this is ready for code review when one of you has time.
The development version of dplyr apparently retrieves only the first 100,000 rows from the database by default and throws a warning. This fixes it.
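For context, here is a minimal sketch of the usual workaround, assuming the rows come from a database-backed table. In the dplyr releases from that period, `collect()` on a SQL table defaulted to `n = 1e5` and warned when the result was truncated, so passing `n = Inf` retrieves every row. The in-memory SQLite table and the `bbs_counts` name are hypothetical stand-ins for the project's database. Current versions also need the DBI, RSQLite, and dbplyr packages installed.

```r
library(DBI)
library(dplyr)

# Hypothetical stand-in for the project's database: an in-memory SQLite
# table with more than 100,000 rows.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
counts <- data.frame(site_id = rep(1:1500, each = 100),
                     abundance = rpois(150000, lambda = 2))
copy_to(con, counts, "bbs_counts")

# n = Inf asks collect() for every row instead of a capped default.
all_rows <- tbl(con, "bbs_counts") %>%
  filter(abundance >= 0) %>%
  collect(n = Inf)

nrow(all_rows)
dbDisconnect(con)
```

In recent dbplyr releases `n = Inf` is already the default, so passing it explicitly is harmless there and guards against older versions.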
* Use correct functions for dev version of dplyr
* Handle cases with zero occurrences
Make the background less ugly, add threshold at exactly 0
Hey @davharris, what's the status of getting this up and running?
@sdtaylor - if you have some time would you mind giving this a review? Thanks.
Will do.
So if I run sdm.R inside RStudio, I get this error when get_env_data() is called.
Which looks like the
But when I run the script from the command line with
Which comes from one of the database calls in in
y = distinct(occurrences, year, species_id, site_id, .keep_all = TRUE) %>%
  inner_join(select(x, year, site_id), c("year", "site_id")) %>%
  mutate(presence = abundance > 0) %>%
  select(-abundance, lat, long, start_time, month, day) %>%
Having an issue here: start_time and month aren't found when doing this select statement. Could it be from different bbs_data.csv files or databases? I'm using the ./data/bbs_data.csv file that's on the T drive.
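One way to narrow this down, sketched under the assumption that the mismatch is in the CSV itself, is to check which of the selected columns are actually present before calling select(). The expected column names come from the select() call above; the toy data frame standing in for ./data/bbs_data.csv is hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for a bbs_data.csv that lacks some expected columns.
occurrences <- data.frame(
  site_id = c(1, 1, 2), year = c(2000, 2001, 2000), species_id = c(10, 10, 11),
  abundance = c(3, 0, 1), lat = c(40.1, 40.1, 41.2), long = c(-111, -111, -112)
)

# Columns the select() call in sdm.R expects to find.
expected <- c("lat", "long", "start_time", "month", "day")
missing_cols <- setdiff(expected, names(occurrences))
if (length(missing_cols) > 0) {
  message("Missing from this data file: ", paste(missing_cols, collapse = ", "))
}

# any_of() keeps whichever expected columns exist instead of erroring.
y <- occurrences %>%
  mutate(presence = abundance > 0) %>%
  select(-abundance, any_of(expected))
```

The explicit check is the useful part while debugging; `any_of()` (tidyselect, dplyr 1.0 or later) only hides the mismatch, so it is not a fix if the models genuinely need those columns.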
Are SDMs going to be another model type to compare against ARIMA, spatial NDVI, etc.? If so, this could use a wrapper that returns forecasts in the same format as the others.
Yes, that's the idea.
env = get_env_data() %>%
  filter_ts(start_yr, end_yr, min_num_yrs) %>%
  inner_join(distinct(occurrences, site_id, lat, long, year),
This inner_join(distinct()) puts the occurrence and species_id columns in the env data frame, which ends up getting used in the models. That definitely doesn't look intentional unless you're going for some sort of pseudo-joint SDM. =)
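A small illustration of the point, using made-up data: a join only adds the columns present in the right-hand table, so restricting that table to the join keys (plus columns actually wanted, such as lat and long) keeps occurrence-level columns like species_id out of env. One possible explanation for the leak, which is an assumption and is not confirmed in this thread, is a dplyr version mismatch. Before dplyr 0.5.0, distinct() kept all columns by default, and `.keep_all = TRUE` reproduces that older behavior. The toy `env` and `occurrences` tables are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: one row per species record at a site and year.
occurrences <- data.frame(
  site_id = c(1, 1, 2), year = c(2000, 2000, 2000),
  lat = c(40.1, 40.1, 41.2), long = c(-111, -111, -112),
  species_id = c(10, 11, 10), presence = c(TRUE, FALSE, TRUE)
)
env <- data.frame(site_id = c(1, 2), year = c(2000, 2000), ndvi = c(0.4, 0.6))

# distinct() on the listed columns only: env gains lat and long, nothing else.
env_keys_only <- inner_join(env, distinct(occurrences, site_id, lat, long, year),
                            by = c("site_id", "year"))

# Keeping all columns (the pre-0.5.0 default): species_id and presence
# leak into the environmental predictors.
env_leaky <- inner_join(env,
                        distinct(occurrences, site_id, lat, long, year, .keep_all = TRUE),
                        by = c("site_id", "year"))

names(env_keys_only)
names(env_leaky)
```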
@davharris - just wanted to check in on the status of this PR.
}
# fit gbm -----------------------------------------------------------------

# Function to cross-validate a GBM model with different numbers of trees, then
In my own code I estimate the optimal number of trees with gbm.perf(). I think it might be faster than doing CV manually.
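Here is a minimal sketch of that approach, assuming the gbm package and simulated presence/absence data. The predictor names and tuning settings are placeholders and are not taken from sdm.R. Fitting with `cv.folds > 1` lets `gbm.perf(method = "cv")` return the cross-validated optimal number of trees from a single fit, instead of refitting for each candidate tree count.

```r
library(gbm)

set.seed(1)
# Simulated stand-in for site-level predictors and a 0/1 presence response.
n <- 500
dat <- data.frame(temp = rnorm(n), precip = rnorm(n))
dat$presence <- rbinom(n, 1, plogis(1.5 * dat$temp - dat$precip))

# cv.folds > 1 makes gbm() store a cross-validation error curve (fit$cv.error).
fit <- gbm(presence ~ temp + precip, data = dat,
           distribution = "bernoulli", n.trees = 1000,
           interaction.depth = 2, shrinkage = 0.01,
           cv.folds = 5, n.cores = 1)

# Number of trees that minimizes the cross-validated deviance.
best_trees <- gbm.perf(fit, method = "cv", plot.it = FALSE)

# Predicted probabilities using only the first best_trees trees.
pred <- predict(fit, newdata = dat, n.trees = best_trees, type = "response")
```

The speed gain over a manual loop comes from fitting each fold once at the maximum tree count and reading the whole error curve, rather than refitting at every candidate size.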
Replaced by #76.
In progress, but hopefully ready soon. Also includes a function for shear maps (#44).