Certain concepts incorrectly trigger "unknown concept" error (64-character limit) #49
Most likely it's a maximum of 64 characters. The column does not exist in the big waffle datapoints table after loading SG.
We will need a renaming function that also prevents collisions (however unlikely) when creating and querying tables. Maybe we truncate anything longer than 60 characters to 60 and append a 4-character hash of the complete string (probably base64-encoded)? Overview of non-crypto hashes in (node.)js: https://github.com/joliss/fast-js-hash-benchmark
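A minimal sketch of the proposed renaming, assuming concept IDs are plain ASCII (JavaScript's `length` counts UTF-16 code units, not characters). FNV-1a stands in as one example of a fast non-cryptographic hash; the name `shortenColumnName` and the alphabet that uses `$` and `_` instead of `+` and `/` (so the suffix stays a valid unquoted MariaDB identifier) are assumptions, not part of the existing code.

```typescript
// Hypothetical helper sketching the "truncate to 60 + 4-character hash" idea.
const MAX_IDENTIFIER_LENGTH = 64; // MariaDB limit for column names
const PREFIX_LENGTH = 60;

// Base64-style alphabet with "$" and "_" in place of "+" and "/" (assumption),
// so the result only contains characters allowed in unquoted identifiers.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$_";

// 32-bit FNV-1a over the UTF-16 code units of the string.
function fnv1a32(s: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    hash ^= s.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // convert to an unsigned 32-bit value
}

// Encode the low 24 bits of the hash as four 6-bit base64-style characters.
function hashSuffix(s: string): string {
  const h = fnv1a32(s) & 0xffffff;
  let out = "";
  for (let shift = 18; shift >= 0; shift -= 6) {
    out += ALPHABET.charAt((h >>> shift) & 0x3f);
  }
  return out;
}

// Names that fit are returned unchanged; longer names become
// 60 characters of prefix plus a 4-character hash of the full name.
function shortenColumnName(name: string): string {
  if (name.length <= MAX_IDENTIFIER_LENGTH) return name;
  return name.slice(0, PREFIX_LENGTH) + hashSuffix(name);
}
```

With a 24-bit suffix, two long names that share the same 60-character prefix collide with a probability of roughly 1 in 16.8 million per pair, so checking for collisions when the table is created is still worthwhile.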
There was already code implemented for long column names, including a test case. However, the code checked for names longer than 64 characters (line 454 in dc0622a), while the test case used a concept shorter than 64 characters (line 475 in dc0622a). The test therefore passed without ever exercising the long-concept code path.
The problem seems to be in the CONNECT engine. It correctly creates a table with the shortened column name, but then gives an error when trying to select the data. When the header in the CSV file is renamed to the shortened name, the data loads correctly (SELECT works), but the rest of the script fails because the CSV file has an unexpected header.
My hypothesis is that this is caused by a mismatch between the CSV header and the CONNECT table's column names: CONNECT tries to read the shortened name from the CSV file, but it doesn't exist there. I thought the CONNECT engine might allow longer column names than InnoDB, but a test showed that it can't work with long names either. It's a MariaDB-wide limitation: https://mariadb.com/kb/en/identifier-names/
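The header mismatch suggests that every place that names a column (the CSV header, the table definition and the query translation) should go through a single mapping. The class below is a hypothetical sketch of that idea, not code from the repository: `ColumnNameMap` and its methods are invented names, and the shortening rule is passed in as a parameter so any scheme that keeps names within 64 characters can be plugged in. It also refuses to map two concepts to the same column, which addresses the collision concern raised earlier in the thread.

```typescript
// Hypothetical helper: one mapping between DDF concept IDs and SQL column
// names, consulted when rewriting the CSV header, creating the table and
// translating queries, so all of them agree on the same names.
class ColumnNameMap {
  private readonly toColumn = new Map<string, string>();
  private readonly toConcept = new Map<string, string>();
  private readonly shorten: (concept: string) => string;

  constructor(shorten: (concept: string) => string) {
    this.shorten = shorten;
  }

  // Register a concept and return its column name; throws on a collision.
  add(concept: string): string {
    const existing = this.toColumn.get(concept);
    if (existing !== undefined) return existing;
    const column = this.shorten(concept);
    const owner = this.toConcept.get(column);
    if (owner !== undefined && owner !== concept) {
      throw new Error(`column name collision: "${concept}" and "${owner}" -> "${column}"`);
    }
    this.toColumn.set(concept, column);
    this.toConcept.set(column, concept);
    return column;
  }

  // Translate a concept ID from a query into its column name.
  columnFor(concept: string): string {
    const column = this.toColumn.get(concept);
    if (column === undefined) throw new Error(`unknown concept: ${concept}`);
    return column;
  }

  // Translate a result column back to its concept ID
  // (unregistered column names are returned unchanged).
  conceptFor(column: string): string {
    return this.toConcept.get(column) ?? column;
  }

  // Rewrite a CSV header row so it matches the table's column names.
  rewriteHeader(header: string[]): string[] {
    return header.map((concept) => this.add(concept));
  }
}
```

Because the same instance produces both the rewritten header and the query-side names, the CSV file and the table definition cannot drift apart; translating result columns back with `conceptFor` keeps the shortened names invisible to API clients.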
So it seems it's either:
Some concepts exist in the dataset and are returned by availability queries, but datapoint queries for them fail with "unknown concept".
query:
url:
https://big-waffle.gapminder.org/sg-master/75ab7c3?_select_key@=geo&=time;&value@=debt/_servicing/_costs/_percent/_of/_exports/_and/_net/_income/_from/_abroad;;&from=datapoints
Here's the availability query for that dataset:
https://big-waffle.gapminder.org/sg-master/75ab7c3?_select_key@=key&=value;&value@;;&from=datapoints.schema
which contains:
<geo,time>,debt_servicing_costs_percent_of_exports_and_net_income_from_abroad
One hypothesis: the failing concepts seem to be mostly ones with long IDs. Might there be an undocumented length limit on concept IDs?
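The hypothesis can be checked against the availability result: the concept ID above is 66 characters long, two more than MariaDB's 64-character limit on column names. A small filter lists every concept that would hit the limit. The row shape `{ key: string[]; value: string }` and the name `conceptsOverLimit` are assumptions about how the schema response might be parsed, not the server's actual types.

```typescript
// Assumed shape of one parsed row from the datapoints.schema availability query.
interface SchemaRow {
  key: string[];
  value: string;
}

const MARIADB_MAX_COLUMN_NAME = 64;

// Return the distinct concept IDs that are too long to be used
// as MariaDB column names without shortening.
function conceptsOverLimit(rows: SchemaRow[], limit = MARIADB_MAX_COLUMN_NAME): string[] {
  return Array.from(new Set(rows.map((r) => r.value).filter((v) => v.length > limit)));
}

const rows: SchemaRow[] = [
  { key: ["geo", "time"], value: "debt_servicing_costs_percent_of_exports_and_net_income_from_abroad" },
  { key: ["geo", "time"], value: "population_total" },
];

// tooLong is ["debt_servicing_costs_percent_of_exports_and_net_income_from_abroad"]
const tooLong = conceptsOverLimit(rows);
```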