Degeneracy in variable name #330

Open
aidanheerdegen opened this issue Sep 21, 2023 · 5 comments

@aidanheerdegen
Collaborator

aidanheerdegen commented Sep 21, 2023

While looking for a mapping from variable name to long_name, standard_name and units, I found some troubling inconsistencies:

ACCESS-NRI/experiment-metadb#3 (comment)

The variables table in the database has the following schema:

CREATE TABLE variables (
        id INTEGER NOT NULL, 
        name VARCHAR NOT NULL, 
        long_name VARCHAR, 
        standard_name VARCHAR, 
        units VARCHAR, 
        PRIMARY KEY (id)
);
CREATE INDEX ix_variables_name ON variables (name);
CREATE UNIQUE INDEX ix_variables_name_long_name_units ON variables (name, long_name, units);

Arguably this should also have indexed columns for model and realm, in case of variable name clashes between sub-models and models. In the original conception of the database it only stored COSIMA data, so always the same model, and AFAIK there were no variable name overlaps between CICE and MOM5.

However, if any other experiment types are stored in the DB, variable name clashes become more likely.
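
A minimal sketch of what that could look like, using Python's built-in sqlite3 module and an in-memory database. The `model` and `realm` columns, the new index name and the sample values are assumptions for illustration only; none of them exist in the current schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variables (
        id INTEGER NOT NULL,
        name VARCHAR NOT NULL,
        long_name VARCHAR,
        standard_name VARCHAR,
        units VARCHAR,
        model VARCHAR,   -- hypothetical column, e.g. 'access-om2'
        realm VARCHAR,   -- hypothetical column, e.g. 'ocean' or 'ice'
        PRIMARY KEY (id)
);
CREATE INDEX ix_variables_name ON variables (name);
-- Hypothetical replacement for ix_variables_name_long_name_units
CREATE UNIQUE INDEX ix_variables_model_realm_name_long_name_units
        ON variables (model, realm, name, long_name, units);
""")

INSERT_SQL = (
    "INSERT INTO variables (name, long_name, units, model, realm) "
    "VALUES (?, ?, ?, ?, ?)"
)


def try_insert(conn, row):
    """Return True if the row was inserted, False if the unique index rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT_SQL, row)
        return True
    except sqlite3.IntegrityError:
        return False


# Made-up rows: the same variable definition reported by two realms.
ocean_zoo = ("zoo", "zoo", "mmol/m^3", "access-om2", "ocean")
ice_zoo = ("zoo", "zoo", "mmol/m^3", "access-om2", "ice")

results = [
    try_insert(conn, ocean_zoo),  # first definition for the ocean realm
    try_insert(conn, ice_zoo),    # same name, different realm: kept separate
    try_insert(conn, ocean_zoo),  # exact repeat: rejected by the unique index
]
print(results)
```

With model and realm in the unique index, the same variable name can be recorded once per sub-model without the rows being confused with each other, while exact repeats are still rejected.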

If you look for variable names that have more than one definition, there are some troubling examples:

sqlite> select * from variables where name not like '%time%' and name in (select name from variables group by name having count(*) > 1);
...
802|vh|Meridional Thickness Flux||m3 s-1
161|vh|Meridional thickness flux||m3 s-1
...
932|zoo|||
515|zoo|zoo||mmol/m^3
698|zoo|zoo||none
897|zoo|zooplankton||mmol/m^3

So vh is defined with slightly different long names!? How does that happen?

And there are four distinct versions of the zoo (zooplankton) variable? How does this happen?
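
On the database side, at least, the vh pair is allowed because the unique index compares strings exactly, and SQLite's default collation is case-sensitive, so "Meridional Thickness Flux" and "Meridional thickness flux" count as different definitions. That does not explain why the model output differs in the first place. The sketch below, using Python's sqlite3 module on an in-memory copy of the schema, finds rows that differ only by the capitalisation of long_name. The empty standard_name values are stored as NULL here, which is an assumption about the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variables (
        id INTEGER NOT NULL,
        name VARCHAR NOT NULL,
        long_name VARCHAR,
        standard_name VARCHAR,
        units VARCHAR,
        PRIMARY KEY (id)
);
CREATE UNIQUE INDEX ix_variables_name_long_name_units
        ON variables (name, long_name, units);
""")

# Rows copied from the query output above (standard_name assumed NULL).
conn.executemany(
    "INSERT INTO variables (id, name, long_name, standard_name, units) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (802, "vh", "Meridional Thickness Flux", None, "m3 s-1"),
        (161, "vh", "Meridional thickness flux", None, "m3 s-1"),
        (515, "zoo", "zoo", None, "mmol/m^3"),
        (897, "zoo", "zooplankton", None, "mmol/m^3"),
    ],
)

# Group on the lower-cased long_name so case-only variants fall together.
query = """
SELECT name, units, group_concat(id)
FROM variables
GROUP BY name, lower(long_name), units
HAVING count(*) > 1
"""

case_only_duplicates = [
    (name, units, sorted(int(i) for i in ids.split(",")))
    for name, units, ids in conn.execute(query)
]
print(case_only_duplicates)
```

In this toy data only vh is reported. The zoo rows differ in more than capitalisation, so they need a different explanation.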

@aidanheerdegen
Collaborator Author

aidanheerdegen commented Sep 21, 2023

Here are some examples of the four different zoo variables:

id path
897 /g/data/ik11/outputs/access-om2/1deg_iamip2_his/output056/ocean/oceanbgc-3d-zoo-1-yearly-mean-y_2014.nc
515 /g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle5/output288/ocean/ocean_bgc_ann.nc
698 /g/data/ik11/outputs/access-om2-025/025deg_jra55_ryf9091_bgc/restart050/ocean/csiro_bgc.res.nc
932 /g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126/restart070/ice/csiro_bgc.res.nc

The latter two are restart files, though it's a bit odd one is in the ice subdirectory, and the other is in ocean.

The first two are a bit of a mystery. Was there a code update for the 1deg_iamip2_his experiment? Looks like it was done with a bespoke build by @hakaseh:

https://github.com/hakaseh/1deg_jra55_iaf/blob/iamip2-his/manifests/exe.yaml#L15

The query for this:

select variables.id, variables.name, experiment, root_dir, ncfile 
from experiments  
        join ncfiles on experiments.id = ncfiles.experiment_id 
        join ncvars on ncvars.ncfile_id = ncfiles.id 
        join variables on  ncvars.variable_id = variables.id 
where variables.name = 'zoo';

@aidanheerdegen
Collaborator Author

@aekiss should potential temperature and conservative temperature have different variable names? Or are they the same at the surface?

792|surface_temp|Conservative temperature|sea_surface_conservative_temperature|K
1453|surface_temp|Conservative temperature||deg_C
1618|surface_temp|Potential temperature|sea_surface_temperature|degrees K

@aekiss
Collaborator

aekiss commented Sep 21, 2023

Potential and conservative temperature are different at the surface, so yes they should have distinct names.

@aidanheerdegen
Collaborator Author

Just talked to Andrew, and apparently with MOM you can choose to have potential or conservative temperature as the prognostic variable, but the actual variable name does not change, though the long name will differ.

This is unfortunate for people who want to create databases mapping variable names to long names, standard names and units.

This means such lookup tables have to be experiment-specific, AFAICT. Doh.
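
A rough sketch of an experiment-specific lookup, reusing the joins from the zoo query earlier in this thread. It runs against a toy in-memory database containing only the columns that query touches. The experiment names, file names and the `variable_lookup` helper are made up for illustration, and which real experiments the surface_temp rows belong to is not known here.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# Toy tables: only the columns used by the joins, not the full real schema.
conn.executescript("""
CREATE TABLE experiments (id INTEGER PRIMARY KEY, experiment VARCHAR);
CREATE TABLE ncfiles (id INTEGER PRIMARY KEY, experiment_id INTEGER, ncfile VARCHAR);
CREATE TABLE ncvars (id INTEGER PRIMARY KEY, ncfile_id INTEGER, variable_id INTEGER);
CREATE TABLE variables (
        id INTEGER PRIMARY KEY,
        name VARCHAR NOT NULL,
        long_name VARCHAR,
        standard_name VARCHAR,
        units VARCHAR
);
-- Hypothetical experiments and files
INSERT INTO experiments VALUES (1, 'exp_conservative'), (2, 'exp_potential');
INSERT INTO ncfiles VALUES (1, 1, 'ocean.nc'), (2, 2, 'ocean.nc');
INSERT INTO ncvars VALUES (1, 1, 792), (2, 2, 1618);
-- Two of the surface_temp rows quoted above
INSERT INTO variables VALUES
        (792, 'surface_temp', 'Conservative temperature',
         'sea_surface_conservative_temperature', 'K'),
        (1618, 'surface_temp', 'Potential temperature',
         'sea_surface_temperature', 'degrees K');
""")


def variable_lookup(conn):
    """Map (experiment, variable name) to the list of definitions seen there."""
    query = """
    SELECT DISTINCT experiments.experiment, variables.name,
           variables.long_name, variables.standard_name, variables.units
    FROM experiments
            JOIN ncfiles ON experiments.id = ncfiles.experiment_id
            JOIN ncvars ON ncvars.ncfile_id = ncfiles.id
            JOIN variables ON ncvars.variable_id = variables.id
    """
    lookup = defaultdict(list)
    for experiment, name, long_name, standard_name, units in conn.execute(query):
        lookup[(experiment, name)].append(
            {"long_name": long_name, "standard_name": standard_name, "units": units}
        )
    return dict(lookup)


lookup = variable_lookup(conn)
print(lookup[("exp_potential", "surface_temp")])
```

Keying on the experiment as well as the name keeps the two surface_temp definitions apart. A key that still maps to more than one definition would flag an experiment whose files disagree with each other.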

@hakaseh

hakaseh commented Sep 22, 2023

Here are some examples of the four different zoo variables

id path
897 /g/data/ik11/outputs/access-om2/1deg_iamip2_his/output056/ocean/oceanbgc-3d-zoo-1-yearly-mean-y_2014.nc
515 /g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle5/output288/ocean/ocean_bgc_ann.nc
698 /g/data/ik11/outputs/access-om2-025/025deg_jra55_ryf9091_bgc/restart050/ocean/csiro_bgc.res.nc
932 /g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126/restart070/ice/csiro_bgc.res.nc
The latter two are restart files, though it's a bit odd one is in the ice subdirectory, and the other is in ocean.

I agree that it is odd that csiro_bgc.res.nc is saved in both ice and ocean subdirectories. Only one is needed.

The first two are a bit of a mystery. Was there a code update for the 1deg_iamip2_his experiment? Looks like it was done with a bespoke build by @hakaseh:

https://github.com/hakaseh/1deg_jra55_iaf/blob/iamip2-his/manifests/exe.yaml#L15

I didn't remember changing the longnames, but looking at the commit history, it looks like they were added by @aekiss:

hakaseh/1deg_jra55_iaf@7deb65a

The query for this:

select variables.id, variables.name, experiment, root_dir, ncfile 
from experiments  
        join ncfiles on experiments.id = ncfiles.experiment_id 
        join ncvars on ncvars.ncfile_id = ncfiles.id 
        join variables on  ncvars.variable_id = variables.id 
where variables.name = 'zoo';
