[MySQL] Error if wanting to generate the Transitive Closure File #54

rjalexa · 2022-11-04T09:19:25Z

MySQL 8
SnomedCT_InternationalRF2_PRODUCTION_20221031T120000Z.zip input file

script invoked as:
./load_release.sh ../../../SNOMEDCT_INT_202210/SnomedCT_InternationalRF2_PRODUCTION_20221031T120000Z.zip snomedct20221031 FULL

If I reply Yes to the
Calculate and store inferred transitive closure? [Y/N]:
I see the following happening:

Including transitive closure table - transclos
Archive:  ../../../SNOMEDCT_INT_202210/SnomedCT_InternationalRF2_PRODUCTION_20221031T120000Z.zip
  inflating: tmp_extracted/._Full    
  inflating: tmp_extracted/._Refset  
  inflating: tmp_extracted/._Terminology  
  inflating: tmp_extracted/._Content  
  inflating: tmp_extracted/._Language  
  inflating: tmp_extracted/._Map     
  inflating: tmp_extracted/._Metadata  
  inflating: tmp_extracted/sct2_RelationshipConcreteValues_Full_INT_20221031.txt  
  inflating: tmp_extracted/sct2_sRefset_OWLExpressionFull_INT_20221031.txt  
  inflating: tmp_extracted/sct2_TextDefinition_Full-en_INT_20221031.txt  
  inflating: tmp_extracted/sct2_StatedRelationship_Full_INT_20221031.txt  
  inflating: tmp_extracted/sct2_Description_Full-en_INT_20221031.txt  
  inflating: tmp_extracted/sct2_Relationship_Full_INT_20221031.txt  
  inflating: tmp_extracted/sct2_Identifier_Full_INT_20221031.txt  
  inflating: tmp_extracted/sct2_Concept_Full_INT_20221031.txt  
  inflating: tmp_extracted/der2_cRefset_AssociationFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_cRefset_AttributeValueFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_Refset_SimpleFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_cRefset_LanguageFull-en_INT_20221031.txt  
  inflating: tmp_extracted/der2_sRefset_SimpleMapFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_iisssccRefset_ExtendedMapFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_cissccRefset_MRCMAttributeDomainFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_ciRefset_DescriptionTypeFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_cciRefset_RefsetDescriptorFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_ssRefset_ModuleDependencyFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_cRefset_MRCMModuleScopeFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_ssccRefset_MRCMAttributeRangeFull_INT_20221031.txt  
  inflating: tmp_extracted/der2_sssssssRefset_MRCMDomainFull_INT_20221031.txt  

Generating Environment script for FULL type(s)

Generating loading script for 20221031
Unable to find der2_iissscRefset_ComplexMapFull_INT_20221031.txt or beta version, skipping...
Skipping
mysql: [Warning] Using a password on the command line interface can be insecure.

Ensuring schema snomedct20221031 exists

(re)Creating Schema using tmp_environment-mysql.sql
Generating Transitive Closure file...
can't open tmp_extracted/xsct2_Relationship_Snapshot_INT_20221031.txt at ./transitiveClosureRf2Snap_dbCompatible.pl line 65.

The text was updated successfully, but these errors were encountered:

pgwilliams · 2022-11-04T09:37:29Z

Oh yes, I think I see that. You've chosen to only import the Full files, but the transitive closure routine works with the Snapshot file. I've always imported 'A' for all files, and then the Snapshot file will be found. The reason that the xsct variant is being complained about is because it checks for sct2 first, and then xsct2 so the error message is being a bit lazy there.

I should add in a check that refuses to generate the transitive closure if the Snapshot files are not extracted, but in the meantime, please either select Snapshot or All if you want a transitive closure generated.

rjalexa · 2022-11-04T09:47:44Z

Bear with my ignorance please.
I can only download the file mentioned in the first post following the SNOMED page links (https://uts.nlm.nih.gov/uts/).
So what is the meaning of your suggestion on A or Snapshot? Not sure I know how to get the latter.
Thanks a lot for your help.

pgwilliams · 2022-11-04T10:04:11Z

Oh sorry, I said select 'A' but I see that this code wants you to write out 'ALL' for your command line argument, instead of 'FULL' as you've done above here. Alternatively SNAPSHOT will be quicker if you don't need to know about recent or historical changes.

The zip file should contain 3 sections, one for Delta (which is just the changes since the last release), the Snapshot (which is the current state of the ontology) and the Full files (which contain the complete history of the product dating back to 2002). Unless you're doing something very specific with historical records, or tracking changes to the product over time, you're more likely to want the Snapshot. That's SNOMED CT as it exists today.

"Full" is perhaps a misleading term for what those files are - they're not just everything that is, they're also everything that was.

rjalexa · 2022-11-04T10:09:41Z

Very clear. Thanks. I therefore invoked the script as follows:
./load_release.sh ../../../SNOMEDCT_INT_202210/SnomedCT_InternationalRF2_PRODUCTION_20221031T120000Z.zip snomedct20221031 SNAP
and works perfectly.
Also thanks to your indication I now have well understood I really want the snapshot since I'm not interested in the historical dimension.
Thanks a lot !!!!
PS Oh how I wish SNOMED was available in RDF/OWL !!!! :)

pgwilliams · 2022-11-04T10:12:49Z

Oh but it is! Or can be, with a transformation. See https://github.com/IHTSDO/snomed-owl-toolkit

Enjoy!

rjalexa · 2022-11-04T10:59:08Z

One last question if I may. The loading went smoothly and have SNOMED's tables in the database.
Using DBVisualizer I tried generating the ER view which shows the tables but no relationships between them. Maybe the load process should generate foreign keys and doesn't?

pgwilliams · 2022-11-04T11:56:53Z

Yes that would be a nice feature. We use Elasticsearch internally, so such improvements wouldn't have a direct business benefit for us I'm afraid. Hopefully someone will contribute such a thing as a pull request.

Here's a rough ER diagram showing the links between the core component types : https://confluence.ihtsdotools.org/display/DOCSTART/5.+SNOMED+CT+Logical+Model but also various refset members link to referenced concepts. If you have any specific questions I can answer those, or if it's just a more general "what's going on", then we have a ton of free education material available.

rjalexa · 2022-11-04T16:12:43Z

The OWL transformation tool worked flawlessly and personally loading it into Protegè to browse it gives me su much more insight to the monumental work than perusing the relational tables which at this point I'll probably drop.
I wonder how come RDF is not THE main format you use to disseminate this. So much more expressive than a relational schema IMHO.
Thanks a lot for your help.
PS I kind of sense the presence of the ghost :) of Barry Smith the ontologist behind BFO ... am I right? :)
PPS Stumbling on the concept of "Hang Glider" made my day. SNOMED almost describes all possible reality :) :)

pgwilliams · 2022-11-08T16:34:27Z

Hi Robert. Yes there's no doubt that OWL is much more expressive (especially in terms of nesting), and a number of compromises were made in order to fit the simplified distribution standard of RF2. However, as I said in my email, RF2 does also give us a number of advantages around tracking changes over time and information that cannot be expressed in OWL and file size efficiency was more of a concern back in the day. That said, we do now use OWL (albeit chopped up into a Reference Set) for our Stated View.

We have had meetings with the BFO crowd and put substantial thought into aligning SNOMED CT with BFO. I can point you in the direction of this discussion if you're interested in reading further. But no, Barry Smith was not involved directly with the design of SNOMED CT as far as I'm aware.

Yes I often post a "Concept of the Week" on a Friday for my colleagues. My pick for Halloween was 111946000 |Burial alive (finding)|

rjalexa · 2022-11-08T16:38:02Z

So SNOMED aims to exist under a closed world assumption: what is not there does not exist :) :) (joking of course)
Thanks a lot.

pgwilliams · 2022-11-08T17:00:05Z

No, it's an important point. It's one of the main distinctions with a classification like ICD where everything must exist in exactly one bucket, and all the buckets add up to 100%.

pgwilliams self-assigned this Nov 4, 2022

pgwilliams added the bug label Nov 4, 2022

rjalexa closed this as completed Nov 4, 2022

rjalexa reopened this Nov 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MySQL] Error if wanting to generate the Transitive Closure File #54

[MySQL] Error if wanting to generate the Transitive Closure File #54

rjalexa commented Nov 4, 2022

pgwilliams commented Nov 4, 2022

rjalexa commented Nov 4, 2022

pgwilliams commented Nov 4, 2022 •

edited

Loading

rjalexa commented Nov 4, 2022

pgwilliams commented Nov 4, 2022

rjalexa commented Nov 4, 2022

pgwilliams commented Nov 4, 2022 •

edited

Loading

rjalexa commented Nov 4, 2022 •

edited

Loading

pgwilliams commented Nov 8, 2022

rjalexa commented Nov 8, 2022

pgwilliams commented Nov 8, 2022

[MySQL] Error if wanting to generate the Transitive Closure File #54

[MySQL] Error if wanting to generate the Transitive Closure File #54

Comments

rjalexa commented Nov 4, 2022

pgwilliams commented Nov 4, 2022

rjalexa commented Nov 4, 2022

pgwilliams commented Nov 4, 2022 • edited Loading

rjalexa commented Nov 4, 2022

pgwilliams commented Nov 4, 2022

rjalexa commented Nov 4, 2022

pgwilliams commented Nov 4, 2022 • edited Loading

rjalexa commented Nov 4, 2022 • edited Loading

pgwilliams commented Nov 8, 2022

rjalexa commented Nov 8, 2022

pgwilliams commented Nov 8, 2022

pgwilliams commented Nov 4, 2022 •

edited

Loading

pgwilliams commented Nov 4, 2022 •

edited

Loading

rjalexa commented Nov 4, 2022 •

edited

Loading