Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MySQL] Error if wanting to generate the Transitive Closure File #54

Open
rjalexa opened this issue Nov 4, 2022 · 11 comments
Open

[MySQL] Error if wanting to generate the Transitive Closure File #54

rjalexa opened this issue Nov 4, 2022 · 11 comments
Assignees
Labels

Comments

@rjalexa
Copy link

rjalexa commented Nov 4, 2022

MySQL 8
SnomedCT_InternationalRF2_PRODUCTION_20221031T120000Z.zip input file

script invoked as:
./load_release.sh ../../../SNOMEDCT_INT_202210/SnomedCT_InternationalRF2_PRODUCTION_20221031T120000Z.zip snomedct20221031 FULL

If I reply Yes to the
Calculate and store inferred transitive closure? [Y/N]:
I see the following happening:

Including transitive closure table - transclos
Archive:  ../../../SNOMEDCT_INT_202210/SnomedCT_InternationalRF2_PRODUCTION_20221031T120000Z.zip
  inflating: tmp_extracted/._Full    
  inflating: tmp_extracted/._Refset  
  inflating: tmp_extracted/._Terminology  
  inflating: tmp_extracted/._Content  
  inflating: tmp_extracted/._Language  
  inflating: tmp_extracted/._Map     
  inflating: tmp_extracted/._Metadata  
  inflating: tmp_extracted/sct2_RelationshipConcreteValues_Full_INT_20221031.txt  
  inflating: tmp_extracted/sct2_sRefset_OWLExpressionFull_INT_20221031.txt  
  inflating: tmp_extracted/sct2_TextDefinition_Full-en_INT_20221031.txt  
  inflating: tmp_extracted/sct2_StatedRelationship_Full_INT_20221031.txt  
  inflating: tmp_extracted/sct2_Description_Full-en_INT_20221031.txt  
  inflating: tmp_extracted/sct2_Relationship_Full_INT_20221031.txt  
  inflating: tmp_extracted/sct2_Identifier_Full_INT_20221031.txt  
  inflating: tmp_extracted/sct2_Concept_Full_INT_20221031.txt  
  inflating: tmp_extracted/der2_cRefset_AssociationFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_cRefset_AttributeValueFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_Refset_SimpleFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_cRefset_LanguageFull-en_INT_20221031.txt  
  inflating: tmp_extracted/der2_sRefset_SimpleMapFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_iisssccRefset_ExtendedMapFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_cissccRefset_MRCMAttributeDomainFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_ciRefset_DescriptionTypeFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_cciRefset_RefsetDescriptorFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_ssRefset_ModuleDependencyFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_cRefset_MRCMModuleScopeFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_ssccRefset_MRCMAttributeRangeFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_sssssssRefset_MRCMDomainFull_INT_20221031.txt  

Generating Environment script for FULL type(s)

Generating loading script for 20221031
Unable to find der2_iissscRefset_ComplexMapFull_INT_20221031.txt or beta version, skipping...
Skipping
mysql: [Warning] Using a password on the command line interface can be insecure.

Ensuring schema snomedct20221031 exists

(re)Creating Schema using tmp_environment-mysql.sql
Generating Transitive Closure file...
can't open tmp_extracted/xsct2_Relationship_Snapshot_INT_20221031.txt at ./transitiveClosureRf2Snap_dbCompatible.pl line 65.
@pgwilliams
Copy link
Member

Oh yes, I think I see that. You've chosen to only import the Full files, but the transitive closure routine works with the Snapshot file. I've always imported 'A' for all files, and then the Snapshot file will be found. The reason that the xsct variant is being complained about is because it checks for sct2 first, and then xsct2 so the error message is being a bit lazy there.

I should add in a check that refuses to generate the transitive closure if the Snapshot files are not extracted, but in the meantime, please either select Snapshot or All if you want a transitive closure generated.

@pgwilliams pgwilliams self-assigned this Nov 4, 2022
@pgwilliams pgwilliams added the bug label Nov 4, 2022
@rjalexa
Copy link
Author

rjalexa commented Nov 4, 2022

Bear with my ignorance please.
I can only download the file mentioned in the first post following the SNOMED page links (https://uts.nlm.nih.gov/uts/).
So what is the meaning of your suggestion on A or Snapshot? Not sure I know how to get the latter.
Thanks a lot for your help.

@pgwilliams
Copy link
Member

pgwilliams commented Nov 4, 2022

Oh sorry, I said select 'A' but I see that this code wants you to write out 'ALL' for your command line argument, instead of 'FULL' as you've done above here. Alternatively SNAPSHOT will be quicker if you don't need to know about recent or historical changes.

The zip file should contain 3 sections, one for Delta (which is just the changes since the last release), the Snapshot (which is the current state of the ontology) and the Full files (which contain the complete history of the product dating back to 2002). Unless you're doing something very specific with historical records, or tracking changes to the product over time, you're more likely to want the Snapshot. That's SNOMED CT as it exists today.

"Full" is perhaps a misleading term for what those files are - they're not just everything that is, they're also everything that was.

@rjalexa
Copy link
Author

rjalexa commented Nov 4, 2022

Very clear. Thanks. I therefore invoked the script as follows:
./load_release.sh ../../../SNOMEDCT_INT_202210/SnomedCT_InternationalRF2_PRODUCTION_20221031T120000Z.zip snomedct20221031 SNAP
and works perfectly.
Also thanks to your indication I now have well understood I really want the snapshot since I'm not interested in the historical dimension.
Thanks a lot !!!!
PS Oh how I wish SNOMED was available in RDF/OWL !!!! :)

@rjalexa rjalexa closed this as completed Nov 4, 2022
@pgwilliams
Copy link
Member

Oh but it is! Or can be, with a transformation. See https://github.com/IHTSDO/snomed-owl-toolkit

Enjoy!

@rjalexa
Copy link
Author

rjalexa commented Nov 4, 2022

One last question if I may. The loading went smoothly and have SNOMED's tables in the database.
Using DBVisualizer I tried generating the ER view which shows the tables but no relationships between them. Maybe the load process should generate foreign keys and doesn't?

@rjalexa rjalexa reopened this Nov 4, 2022
@pgwilliams
Copy link
Member

pgwilliams commented Nov 4, 2022

Yes that would be a nice feature. We use Elasticsearch internally, so such improvements wouldn't have a direct business benefit for us I'm afraid. Hopefully someone will contribute such a thing as a pull request.

Here's a rough ER diagram showing the links between the core component types : https://confluence.ihtsdotools.org/display/DOCSTART/5.+SNOMED+CT+Logical+Model but also various refset members link to referenced concepts. If you have any specific questions I can answer those, or if it's just a more general "what's going on", then we have a ton of free education material available.

@rjalexa
Copy link
Author

rjalexa commented Nov 4, 2022

The OWL transformation tool worked flawlessly and personally loading it into Protegè to browse it gives me su much more insight to the monumental work than perusing the relational tables which at this point I'll probably drop.
I wonder how come RDF is not THE main format you use to disseminate this. So much more expressive than a relational schema IMHO.
Thanks a lot for your help.
PS I kind of sense the presence of the ghost :) of Barry Smith the ontologist behind BFO ... am I right? :)
PPS Stumbling on the concept of "Hang Glider" made my day. SNOMED almost describes all possible reality :) :)

@pgwilliams
Copy link
Member

Hi Robert. Yes there's no doubt that OWL is much more expressive (especially in terms of nesting), and a number of compromises were made in order to fit the simplified distribution standard of RF2. However, as I said in my email, RF2 does also give us a number of advantages around tracking changes over time and information that cannot be expressed in OWL and file size efficiency was more of a concern back in the day. That said, we do now use OWL (albeit chopped up into a Reference Set) for our Stated View.

We have had meetings with the BFO crowd and put substantial thought into aligning SNOMED CT with BFO. I can point you in the direction of this discussion if you're interested in reading further. But no, Barry Smith was not involved directly with the design of SNOMED CT as far as I'm aware.

Yes I often post a "Concept of the Week" on a Friday for my colleagues. My pick for Halloween was 111946000 |Burial alive (finding)|

@rjalexa
Copy link
Author

rjalexa commented Nov 8, 2022

So SNOMED aims to exist under a closed world assumption: what is not there does not exist :) :) (joking of course)
Thanks a lot.

@pgwilliams
Copy link
Member

No, it's an important point. It's one of the main distinctions with a classification like ICD where everything must exist in exactly one bucket, and all the buckets add up to 100%.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants