New Maintainer Wanted :-) #57
@ropensci/admin
Hello, I can't say that I will have the time or knowledge to be a maintainer, but I have used … However, before thinking about whether I should be a maintainer, I'm not even sure it's still technically possible to write a package that automatically downloads the ESS datasets. First, there's no API. Second, from what I can see in the current code, the old way was "easy" in the sense that each dataset had its own fixed download URL. That no longer seems to be the case: all "Download" buttons now lead to the same URL, https://ess-search.nsd.no/en/download, so datasets can't be distinguished by their URL. Maybe we could perform the POST request that triggers the download ourselves using … There is also a … Therefore, it seems to me that the only way to bring …
Wow, thanks a lot @etiennebacher for the digging! So yes, I suppose a new maintainer would need to contact the data provider first.
@djhurio can help, as he is in contact with the ESS organizing team. @etiennebacher, you're right; we're a bit lost on how the data is now downloaded. I haven't looked into how it works now, but you're on the right track.
Hi all, @etiennebacher it looks like metadata can be accessed and data downloaded programmatically on the new data portal using a GraphQL API, but data downloads need a few steps. There are API docs here, but they're not particularly illuminating; I've just been following the request flow in dev tools to see what it does. It looks a bit fiddly but very doable. It also needs an OAuth2 authentication flow, which is a bit of a pain. I can't see any option to register an OAuth2 client app through the ESS data portal, so that would probably need help from the ESS organising team. I'm a bit time-strapped at the moment but would be more than happy to help get something up and running!
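Because a GraphQL call is just an HTTP POST with a JSON body, the request can be assembled with nothing beyond a standard HTTP client. The sketch below is in Python rather than R and is only an illustration under assumptions: the endpoint URL is a placeholder (the real one is not given in this thread), the query text is left to the caller, and the OAuth2 access token is assumed to be sent as a standard `Authorization: Bearer` header once it has been obtained.

```python
import json
import urllib.request

# Placeholder endpoint: the real ESS GraphQL URL is not given in this thread.
GRAPHQL_URL = "https://example.org/graphql"


def build_graphql_request(query, variables, access_token, url=GRAPHQL_URL):
    """Build (but do not send) a GraphQL POST request.

    The body is the usual GraphQL-over-HTTP JSON object with "query" and
    "variables"; passing the OAuth2 token as a Bearer header is an assumption.
    """
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


def send(request):
    """Send the request and return the "data" part of the JSON response.

    GraphQL servers usually report failures in an "errors" array even when
    the HTTP status is 200, so that array is checked explicitly.
    """
    with urllib.request.urlopen(request) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

Building the request separately from sending it keeps the payload easy to inspect and test; the same JSON body could be sent from R with any HTTP client.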
Thanks @gorcha!! Should I go ahead and give you write access to this repository? (If you decide to become the maintainer, you'd get admin access.)
@gorcha I see the GraphQL queries, but I don't know how to reproduce them. Some elements look random, like the … For reference, here are the steps I follow: …
Here's the first GraphQL query for me: … And the second: … I get more or less the same requests when I use the "Data Wizard". The only difference is that there are more arguments because we specify which variables/years/countries we want. I don't know how to mimic this from R. If you have an example, I'd be curious to see how you do it.
Hi @maelle - sure! I won't have a chance to look at it for a few weeks, though.
Hi @etiennebacher, the datafile IDs are retrieved as part of some earlier GraphQL queries. In addition to watching the requests, I've been prodding the JavaScript a bit to figure out some of the details. Here's a dump of the process flow for both the Data Portal and the Data Wizard from a bit of poking around, with the GraphQL operation name and arguments for each step.
One note: I haven't had a go at replicating it in R yet, but the GraphQL calls are all just JSON, so there'll be some parsing shenanigans and dealing with HTTP responses; the trickiest part will probably be OAuth.
Data Portal
Study metadata
graphql: …
Retrieve metadata for the study (e.g. ESS 2010), which is used to populate the page (the study ID is in the page URL). The set of datafiles for the study is part of the returned object, including the datafile ID, description, etc.
Get download URL
graphql: …
Get the download URL. This uses the datafileID, version, agencyId and instance values returned from …
Download
Download the data set from the URL returned by …
Register the download
graphql: …
This registers downloads so they show up in the "previous downloads" section (not sure if there is any other purpose for this).
'urn:ddi:int.esseric:' + t.id + ':' + t.version
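The Data Portal steps mostly pass identifiers from one query to the next. A minimal Python sketch of the two pieces of glue described above, assuming a datafile record is a dict with `id` and `version` keys. The variable names for the download-URL step (`datafileId` etc.) are hypothetical, and the `agencyId`/`instance` defaults are borrowed from the Data Wizard example rather than confirmed for the Data Portal.

```python
def download_url_variables(datafile, agency_id="INT_ESSERIC", instance="PUBLISHED"):
    """Hypothetical variables for the "get download URL" step.

    datafile is one record from the study metadata (assumed to have "id" and
    "version"); the key names in the returned dict are illustrative only.
    """
    return {
        "datafileId": datafile["id"],
        "version": datafile["version"],
        "agencyId": agency_id,
        "instance": instance,
    }


def datafile_urn(datafile_id, version):
    """Python port of the JavaScript expression used when registering a download:
    'urn:ddi:int.esseric:' + t.id + ':' + t.version
    """
    return f"urn:ddi:int.esseric:{datafile_id}:{version}"
```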
Data Wizard
Get variable metadata
graphql: …
Variable metadata (high-level conceptual groups, variable details) are retrieved for the entire ESS series. This works similarly to the study metadata/data files: the top-level conceptual variable groups are accessed from the series metadata (using the series ID). The returned object contains variable details (ID, name, etc.), and further variable info is accessed through additional GraphQL queries.
Get country metadata
graphql: …
Possible country combinations are returned as a "countryCoverageTable" object (based on the series ID) that contains datafile IDs, country IDs, and a flag telling us whether the country exists for each datafile.
Make the wizard generate the file
graphql: …
This asks the wizard to create the file. The WizardDownloadInput tells it which datafiles, countries and variables to include and ties it to the current user with … For example:
```json
{
  "variables": {
    "input": {
      "agencyId": "INT_ESSERIC",
      "datafiles": [
        {
          "countries": null,
          "id": "ffc43f48-e15a-4a1c-8813-47eda377c355",
          "version": 73
        },
        {
          "countries": [
            "AT"
          ],
          "id": "b2b0bf39-176b-4eca-8d26-3c05ea83d2cb",
          "version": 248
        }
      ],
      "format": "SAV",
      "instance": "PUBLISHED",
      "variables": [
        "netuse",
        "netusoft",
        "netustm",
        "nwspol",
        "nwsppol",
        "nwsptot",
        "pplfair",
        "pplhlp",
        "ppltrst",
        "rdpol",
        "rdtot",
        "tvpol",
        "tvtot"
      ]
    }
  }
}
```
The response gives the ID for the wizard download to be used in the following steps.
Register the download
graphql: …
Similar to the Data Portal download registration, but I can't see this being used in the UI anywhere (there are no previous downloads displayed for the Data Wizard).
Poll the wizard download
graphql: …
While the file is being generated/prepared, it is repeatedly polled to see whether it's ready yet. This returns an isFinished indicator and the URL for the download.
Download
Once the …
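The Data Wizard flow boils down to building the input object and then polling until the file is ready. A Python sketch under stated assumptions: `wizard_download_input` only reproduces the shape of the example variables above, `fetch_status` is a hypothetical callable wrapping the polling query, and the `url` key in its result is an assumed name (only `isFinished` is mentioned above).

```python
import time


def wizard_download_input(datafiles, variables, fmt="SAV",
                          agency_id="INT_ESSERIC", instance="PUBLISHED"):
    """Build GraphQL variables in the shape of the example above.

    datafiles: iterable of (id, version, countries) tuples. countries=None is
    copied from the example; reading it as "no country filter" is an assumption.
    """
    return {
        "input": {
            "agencyId": agency_id,
            "datafiles": [
                {"countries": countries, "id": df_id, "version": version}
                for df_id, version, countries in datafiles
            ],
            "format": fmt,
            "instance": instance,
            "variables": list(variables),
        }
    }


def poll_until_finished(fetch_status, interval=2.0, timeout=300.0,
                        sleep=time.sleep):
    """Call fetch_status() until it reports isFinished, then return the URL.

    fetch_status is a hypothetical wrapper around the polling query and must
    return a dict with "isFinished" and (assumed name) "url" keys.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status.get("isFinished"):
            return status["url"]
        if waited >= timeout:
            raise TimeoutError("wizard download was not ready in time")
        sleep(interval)
        waited += interval
```

Injecting `fetch_status` and `sleep` keeps the polling logic independent of the HTTP and OAuth details, so it can be checked without network access.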
Thanks a lot for all these explanations! I don't have the skills to make this work; I've never used GraphQL before, so I have no idea where to start. I'll try to help in other ways.
@gorcha I've now invited you to the ropensci organization and to a team with write access to this repository. Note that you'll need to enable 2FA for your GitHub account if that's not already the case; see https://docs.github.com/en/authentication/securing-your-account-with-two-factor-authentication-2fa/configuring-two-factor-authentication and https://ropensci.org/blog/2022/05/16/requiring-2fa-for-the-ropensci-github-organization/ for context. @etiennebacher, if you start contributing (thanks already for the convo here!!), please ping me so I can grant you access too.
Thanks @maelle! No worries @etiennebacher, any help with testing and documentation would be super helpful 🙂
@gorcha Just to say that I'd be happy to help as well. I've loved using this package in both teaching and research and would be glad to see it brought back to life :)
Just to throw that out there: I just noticed that the ESS data is licensed CC BY-NC-SA 4.0 and is not very large (10 waves, apparently only ~15 MB). So an alternative to the OAuth and API issues might simply be to host the data in a separate GitHub repo that the package accesses?
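If the data were mirrored in a separate GitHub repository, the package would only need to map a round to a file URL and cache the download. A sketch assuming a hypothetical repository `some-org/ess-data` and a made-up file naming scheme; only the `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>` URL pattern is GitHub's own.

```python
from pathlib import Path

# Hypothetical mirror repository and branch; nothing like this exists yet.
REPO = "some-org/ess-data"
BRANCH = "main"


def raw_data_url(round_number, edition, fmt="sav", repo=REPO, branch=BRANCH):
    """URL of one round's file in the mirror, served via raw.githubusercontent.com.

    The "ESS<round>e<edition>.<fmt>" file name is an illustrative convention.
    """
    filename = f"ESS{round_number}e{edition}.{fmt}"
    return f"https://raw.githubusercontent.com/{repo}/{branch}/data/{filename}"


def cache_path(url, cache_dir):
    """Local path for a downloaded file, so each file is fetched only once."""
    return Path(cache_dir) / url.rsplit("/", 1)[-1]
```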
Hey @LukasWallrich, great idea, I hadn't even thought of that! Will check it out :)
Dear all, thank you for your involvement in this issue with the ESS data. Regarding @LukasWallrich's idea, please note that the ESS data for each round is published in several releases and versions (data editions are numbered according …
If you host the data in a separate repo, then there should be a process to monitor whether a new data edition (…
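Such a monitoring process has to compare the edition hosted in the repo with the one currently published. A minimal sketch, assuming edition labels are dotted integers such as `3.1` (the numbering scheme is cut off above) and that the published editions are obtained elsewhere, e.g. from the portal's metadata. Comparing parsed tuples rather than strings matters because `"1.10" < "1.9"` as strings.

```python
def parse_edition(edition):
    """Turn an edition label such as "3.1" into (3, 1) for numeric comparison."""
    return tuple(int(part) for part in edition.split("."))


def needs_update(hosted, published):
    """True if a newer edition has been published than the hosted copy."""
    return parse_edition(published) > parse_edition(hosted)


def rounds_to_update(hosted, published):
    """Rounds whose published edition is newer than the hosted one, or missing.

    hosted and published map round number -> edition label (assumed format).
    """
    return sorted(
        rnd for rnd, edition in published.items()
        if rnd not in hosted or needs_update(hosted[rnd], edition)
    )
```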
Hello, any news about this? @gorcha, did you get a chance to try making the GraphQL requests from R? Or is the plan to use a separate repo to host the data and keep it updated with future releases? FYI, I contacted the organization that manages the ESS data. They said that providing clear API documentation is something they want to do, but they didn't give me a timeline for this.
Dear colleagues, wishing you a Merry Christmas, I would like to share two comments. Firstly, it is evident that the ESS managers consider the package a low priority. Despite promises of an API, more than six months have passed without any developments. I propose two solutions: …
Best regards, Jānis
Or new maintainer team. 😸
Because of #56, a whole overhaul of the package is needed.
If you're interested, please comment in the issue.
For more info, see …