ActivityStartDateTime, time zones, and offset cols #558

Open
cristinamullin opened this issue Dec 17, 2024 · 2 comments

@cristinamullin
Collaborator

Is your feature request related to a problem? Please describe:

The time zone associated with ActivityStartDateTime is not clear because it is not recorded in the df returned by dataRetrieval:::create_dateTime. That function runs in two places: inside the dataRetrieval functions called by TADA_DataRetrieval, and separately inside TADA_AutoClean when ActivityStartDateTime is missing from the input df.

Describe the solution you'd like:

I think it might be more user friendly to include a column titled ActivityStartDateTime.TimeZoneCode (UTC in this case) instead of ActivityStartTime.TimeZoneCode_offset (which holds the offset as a number of hours). As is, the target time zone for ActivityStartDateTime (the tz function input in the example below) is not documented anywhere in the returned df (see review_TADAProfile1 below).

Describe alternatives you've considered:

Alternatively, UTC could be included in the ActivityStartDateTime value itself (e.g. "2023-05-11 11:45:00 UTC"), but that might break people's workflows.
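To make the two options concrete, here is a minimal base R sketch. It assumes ActivityStartDateTime is a POSIXct column carrying a "tzone" attribute, and it uses a hypothetical one-row data frame with illustrative values rather than real create_dateTime output. The column name ActivityStartDateTime.TimeZoneCode is the one proposed above and does not exist in either package today.

```r
# Hypothetical one-row stand-in for the df returned by create_dateTime
# (the values are illustrative, not real WQP data).
df <- data.frame(
  ActivityStartDate = "2023-05-11",
  ActivityStartTime.Time = "07:45:00",
  ActivityStartTime.TimeZoneCode = "EDT",
  stringsAsFactors = FALSE
)
df$ActivityStartDateTime <- as.POSIXct("2023-05-11 11:45:00", tz = "UTC")

# Option 1 (proposed): record the target time zone in its own column,
# read from the "tzone" attribute of the POSIXct column.
df$ActivityStartDateTime.TimeZoneCode <- attr(df$ActivityStartDateTime, "tzone")

# Option 2 (alternative): append the time zone to the value itself.
# The result is a character string, so downstream date arithmetic would
# need to parse it again.
with_tz_label <- format(df$ActivityStartDateTime,
                        "%Y-%m-%d %H:%M:%S",
                        usetz = TRUE)
```

Option 1 leaves ActivityStartDateTime as a POSIXct column, which is why it seems less likely to disturb existing workflows than option 2.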

Additional context:

Regarding dataRetrieval:
The internal code is here:
https://github.com/DOI-USGS/dataRetrieval/blob/main/R/importWQP.R#L223
(The function is not exported, but you can call it with a triple colon: dataRetrieval:::create_dateTime)

offsetLibrary is a data frame saved in sysdata.rda.
You can see where and how it gets called here:
https://github.com/DOI-USGS/dataRetrieval/blob/main/R/importWQP.R#L160

Review_TADAProfile1 below. See discussion #557

# Find web service URLs for each Profile using WQP User Interface (https://www.waterqualitydata.us/)
# Example WQP URL: https://www.waterqualitydata.us/#statecode=US%3A09&characteristicType=Nutrient&startDateLo=04-01-2023&startDateHi=11-01-2023&mimeType=csv&providers=NWIS&providers=STEWARDS&providers=STORET

# Use TADA_ReadWQPWebServices to load the Station, Project, and Phys-Chem Result profiles
stationProfile <- TADA_ReadWQPWebServices("https://www.waterqualitydata.us/data/Station/search?statecode=US%3A09&characteristicType=Nutrient&startDateLo=04-01-2023&startDateHi=11-01-2023&mimeType=csv&zip=yes&providers=NWIS&providers=STEWARDS&providers=STORET")
physchemProfile <- TADA_ReadWQPWebServices("https://www.waterqualitydata.us/data/Result/search?statecode=US%3A09&characteristicType=Nutrient&startDateLo=04-01-2023&startDateHi=11-01-2023&mimeType=csv&zip=yes&dataProfile=resultPhysChem&providers=NWIS&providers=STEWARDS&providers=STORET")
projectProfile <- TADA_ReadWQPWebServices("https://www.waterqualitydata.us/data/Project/search?statecode=US%3A09&characteristicType=Nutrient&startDateLo=04-01-2023&startDateHi=11-01-2023&mimeType=csv&zip=yes&providers=NWIS&providers=STEWARDS&providers=STORET")

# Join all three profiles using TADA_JoinWQPProfiles
TADAProfile <- TADA_JoinWQPProfiles(FullPhysChem = physchemProfile, Sites = stationProfile, Projects = projectProfile)

# Run TADA_CheckRequiredFields, returns error message, 'The dataframe does not contain the required fields: ActivityStartDateTime'
TADA_CheckRequiredFields(TADAProfile)

# Add missing col
TADAProfile1 <- dataRetrieval:::create_dateTime(df = TADAProfile,
                                                date_col = "ActivityStartDate",
                                                time_col = "ActivityStartTime.Time",
                                                tz_col = "ActivityStartTime.TimeZoneCode",
                                                tz = "UTC")

# Keep only the date, time, and time zone columns for review
# (called directly so the snippet does not depend on %>% being attached)
review_TADAProfile1 <- dplyr::select(TADAProfile1,
                                     c("ActivityStartDate",
                                       "ActivityStartTime.Time",
                                       "ActivityStartTime.TimeZoneCode",
                                       "ActivityStartDateTime",
                                       "ActivityStartTime.TimeZoneCode_offset"))

# re-run TADA_CheckRequiredFields, returns TRUE
TADA_CheckRequiredFields(TADAProfile1)

@hillarymarler
Collaborator

I like the idea of a separate time zone code column. As you note, this seems like it would be much less likely to impact existing workflows.

@cristinamullin
Collaborator Author

From Laura D:

Just making sure we're both on the same page: in dataRetrieval, the default time zone is UTC, set here:
https://github.com/DOI-USGS/dataRetrieval/blob/main/R/readWQPdata.R#L200
You can read about changing timezones here:
https://doi-usgs.github.io/dataRetrieval/reference/readWQPdata.html#arg-tz
This sets the time zone attribute of the POSIX object.

Like this:

library(dataRetrieval)
nameToUse <- "pH"
pHData <- readWQPdata(siteid = "USGS-04024315",
                      characteristicName = nameToUse,
                      service = "ResultWQX")
attr(pHData$Activity_StartDateTime, "tzone")
#> [1] "UTC"
pHData$Activity_StartDateTime[1]
#> [1] "1975-09-27 15:50:00 UTC"

pHData2 <- readWQPdata(siteid = "USGS-04024315",
                       characteristicName = nameToUse,
                       tz = "America/Chicago",
                       service = "ResultWQX")
attr(pHData2$Activity_StartDateTime, "tzone")
#> [1] "America/Chicago"
pHData2$Activity_StartDateTime[1]
#> [1] "1975-09-27 10:50:00 CDT"

So what you are asking for is a column that converts the offset number of hours to the timezone it was converted to?

Note there's also a link in the help to the OlsonNames() base R function, which describes how R handles time zones. The issue is that time zone abbreviations vary with the operating system and with where in the world the computer thinks you are, which is why using the Olson names is what has been working best for dataRetrieval.
https://rdrr.io/r/base/timezones.html
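
As a small base R illustration of that point (a sketch only, reusing the timestamp and time zone from the example above): an Olson name can be checked against OlsonNames() and passed to format(), whereas the abbreviation that gets printed is left to the platform.

```r
# Olson name used in the readWQPdata example above
tz <- "America/Chicago"

# TRUE when the name is in the time zone database R is using
is_known <- tz %in% OlsonNames()

# Same instant as the first pH record above, stored in UTC
x <- as.POSIXct("1975-09-27 15:50:00", tz = "UTC")

# Wall-clock time in the target zone; this does not rely on any abbreviation
local_clock <- format(x, "%H:%M", tz = tz)

# Full timestamp with the zone appended; the abbreviation shown
# depends on the platform, so no expected output is given here
local_label <- format(x, "%Y-%m-%d %H:%M:%S", tz = tz, usetz = TRUE)
```

Storing the Olson name (for example in the proposed time zone column) rather than an abbreviation keeps the value portable across systems.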
