Refactor processQualityRequest method in Controller class to remove system metadata related arguments #458

Closed
doulikecookiedough opened this issue Jan 8, 2025 · 3 comments

Comments

@doulikecookiedough

doulikecookiedough commented Jan 8, 2025

processQualityRequest is called by both the metadig-webapp and the Controller itself when running quality checks. Its signature looks like this:

public void processQualityRequest(String memberNode,
            String metadataPid,
            InputStream metadata,
            String qualitySuiteId,
            String localFilePath,
            DateTime requestDateTime,
            InputStream systemMetadata) throws java.io.IOException {
            ...
            }

Moving forward, the system metadata for a given pid should be available to a quality check directly through hashstore. This means that processQualityRequest should not require data objects or system metadata, or their related arguments, in its signature; access to them should be delegated to the checks themselves (which are to be updated).

public void processQualityRequest(String memberNode,
            String metadataPid,
            String qualitySuiteId,
            String localFilePath,
            DateTime requestDateTime) throws java.io.IOException {
            ...
            }

Refactor this method by removing the following arguments and ensuring that the RabbitMQ message that gets sent via a QueueEntry still contains what is required to proceed with a check.

  • InputStream metadata
  • InputStream systemMetadata
@doulikecookiedough doulikecookiedough changed the title Refactor processQualityRequest method in 'Controller' class to remove system metadata related arguments Refactor processQualityRequest method in Controller class to remove system metadata related arguments Jan 8, 2025
@doulikecookiedough
Author

doulikecookiedough commented Jan 10, 2025

Flow chart below to assist with removing InputStream metadata and InputStream systemMetadata.

  • To Do: Determine what exactly InputStream metadata is used for and what it represents. If it's more than just a string, how does it get unpacked?
    • It seems that it is just a string based on how it's supplied by the Controller class
    • But when MonitorJob supplies the value, it appears to be a stream to a data object based on the variable names and logging comments.

In the MonitorJob and Controller classes, we have code that produces a metadata InputStream:

// Controller.java

case "quality":
    log.debug("Processing quality request");
    String metadataPid = tokens[0];

    File metadataFile = new File(tokens[1]);
    InputStream metadata = new FileInputStream(metadataFile);
    ...
    metadigCtrl.processQualityRequest(nodeId, metadataPid, metadata, suiteId, "/tmp",
                                    requestDateTime, sysmeta);

// MonitorJob.java
// .execute()
...

try {
    metadata = getMetadata(run, session, store);
}
...
try {
    controller.processQualityRequest(nodeId, pidStr, metadata, suiteId,
            localFilePath, requestDateTime,
            sysmeta);
} catch (IOException io) {
    JobExecutionException jee = new JobExecutionException("Monitor: Error processing quality request.");
    jee.initCause(io);
    throw jee;
}

// MonitorJob.java
// .getMetadata()
try {
    if (isCN) {
        objectIS = cnNode.get(session, pid);
    } else {
        objectIS = mnNode.get(session, pid);
    }
    log.debug("Monitor: Retrieved metadata object for pid: " + pidStr);
}
...
return objectIS;

flowchart TD
    Start(["processQualityRequest(...)"]) --> LogRequest[Log request details]
    LogRequest --> InitializeVars[Initialize variables: qEntry, sysmeta, message, runXML]
    InitializeVars --> ReadMetadata[Read metadata InputStream to UTF-8 string]
    ReadMetadata --> TrySysMeta{Attempt to unmarshal system metadata to tmpSysmeta}
    TrySysMeta -- Success --> SysMetaType{Is system metadata v1?}
    SysMetaType -- Yes --> ConvertSysMeta[Convert v1 metadata to v2 format]
    SysMetaType -- No --> AssignSysMeta[Assign v2 system metadata directly]
    TrySysMeta -- Failure --> LogError[Log error and continue]

    ConvertSysMeta --> CreateQueueEntry
    AssignSysMeta --> CreateQueueEntry
    LogError --> CreateQueueEntry

    CreateQueueEntry[Create QueueEntry object with metadata and system metadata] --> SerializeQueueEntry[Serialize QueueEntry to byte array]
    SerializeQueueEntry --> WriteToQueue[Write message to InProcess queue]
    WriteToQueue --> LogCompletion[Log completion message]
    LogCompletion --> End([End])
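
The middle of the flow chart above (read the metadata stream, build a queue entry, serialize it) can be sketched roughly as follows. This is a minimal illustration, not the actual Controller code: QueueEntrySketch is a hypothetical stand-in for QueueEntry, its fields are assumptions, Java serialization is assumed for the byte array step, and the system metadata unmarshalling and RabbitMQ write are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class QualityRequestFlowSketch {
    // Flow chart step: "Read metadata InputStream to UTF-8 string".
    static String readUtf8(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Flow chart step: "Create QueueEntry object" (system metadata omitted here).
    static QueueEntrySketch buildEntry(String memberNode, String metadataPid,
                                       InputStream metadata, String qualitySuiteId)
            throws IOException {
        return new QueueEntrySketch(memberNode, metadataPid, readUtf8(metadata), qualitySuiteId);
    }

    // Flow chart step: "Serialize QueueEntry to byte array".
    // Assumption: plain Java serialization is used for the message body.
    static byte[] serialize(QueueEntrySketch entry) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(entry);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values; none of these come from a real member node.
        InputStream metadata =
                new ByteArrayInputStream("<eml/>".getBytes(StandardCharsets.UTF_8));
        QueueEntrySketch entry =
                buildEntry("urn:node:EXAMPLE", "example-pid", metadata, "example-suite");
        byte[] message = serialize(entry);
        System.out.println("Serialized entry for " + entry.metadataPid
                + ": " + message.length + " bytes");
    }
}

// Hypothetical stand-in for the real QueueEntry; the field set is an assumption.
class QueueEntrySketch implements Serializable {
    private static final long serialVersionUID = 1L;
    final String memberNode;
    final String metadataPid;
    final String metadataDoc;
    final String qualitySuiteId;

    QueueEntrySketch(String memberNode, String metadataPid,
                     String metadataDoc, String qualitySuiteId) {
        this.memberNode = memberNode;
        this.metadataPid = metadataPid;
        this.metadataDoc = metadataDoc;
        this.qualitySuiteId = qualitySuiteId;
    }
}
```

If InputStream metadata were dropped from the signature, the metadataDoc field is the part of the message that a worker would have to obtain some other way.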

@doulikecookiedough
Author

doulikecookiedough commented Jan 10, 2025

The metadata doc provided is an EML document.

// A resourceMap contains RDF triples

[INFO]: Running suite 'FAIR-suite-0.4.0' for metadata pid urn:uuid:99a4e93f-964e-4cdc-9963-c5d4ea92bf35, for metadataDoc: <eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1" xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd" packageId="urn:uuid:99a4e93f-964e-4cdc-9963-c5d4ea92bf35" system="knb"><dataset id="urn-uuid-2971be78-f247-4ed2-8a50-167d14ea0e8e"><title>Test Dataset Jan Ten</title><creator id="1891992087315451"><individualName><givenName>Dou</givenName><surName>Mok</surName></individualName><userId directory="https://orcid.org">https://orcid.org/0000-0002-6076-8092</userId></creator><abstract><para>A very brief overview.</para></abstract><intellectualRights><para>This work is dedicated to the public domain under the Creative Commons Universal 1.0 Public Domain Dedication. To view a copy of this dedication, visit https://creativecommons.org/publicdomain/zero/1.0/.</para></intellectualRights><coverage><geographicCoverage><geographicDescription>UCSB</geographicDescription><boundingCoordinates><westBoundingCoordinate>-119.848946</westBoundingCoordinate><eastBoundingCoordinate>-119.848946</eastBoundingCoordinate><northBoundingCoordinate>34.413963</northBoundingCoordinate><southBoundingCoordinate>34.413963</southBoundingCoordinate></boundingCoordinates></geographicCoverage><temporalCoverage><singleDateTime><calendarDate>2024-01-09</calendarDate></singleDateTime></temporalCoverage></coverage><annotation><propertyURI label="Data Sensitivity Category">http://purl.dataone.org/odo/SENSO_00000005</propertyURI><valueURI label="Non-sensitive data">http://purl.dataone.org/odo/SENSO_00000002</valueURI></annotation><contact id="9590172592980764"><individualName><givenName>Dou</givenName><surName>Mok</surName></individualName><electronicMailAddress>[email protected]</electronicMailAddress><userId 
directory="https://orcid.org">https://orcid.org/0000-0002-6076-8092</userId></contact><otherEntity id="urn-uuid-2516cb78-c802-4820-8a9a-bc61224843fc"><entityName>Moon_et_al_isotope_data.csv</entityName><entityType>text/csv</entityType></otherEntity></dataset></eml:eml> [edu.ucsb.nceas.mdqengine.Worker:553]

The processing that uses this document, including its parsing in XMLDialect, does not seem to be part of a check. If we want to stop passing this value, we may need to consider importing the HashStore-java library. Then, when runCheck is called and parses a resource map for values and information, it can grab the EML document it needs directly using .retrieveObject() with the metadataPid.

Additionally, runCheck is where the system metadata is parsed as well - which could then be retrieved from hashstore directly if we have already imported the library to retrieve the EML document/resourceMap. The actual checks themselves will use hashstore directly to access the data objects and metadata files required to complete a check.
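
A rough sketch of what fetching by pid inside runCheck could look like is below. PidStoreSketch and its in-memory implementation are hypothetical and exist only to make the example self-contained; only the retrieveObject name comes from the comment above, retrieveSysmeta is an invented placeholder, and the real HashStore-java calls should be checked against that library before relying on any of this.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class RunCheckSketch {
    // Read the whole stream as UTF-8 and close it afterwards.
    static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // With a store available, only the pid needs to travel in the queue message.
    static String loadMetadataDoc(PidStoreSketch store, String metadataPid) throws IOException {
        return readAll(store.retrieveObject(metadataPid));
    }

    static String loadSysmeta(PidStoreSketch store, String metadataPid) throws IOException {
        return readAll(store.retrieveSysmeta(metadataPid));
    }

    public static void main(String[] args) throws IOException {
        InMemoryStoreSketch store = new InMemoryStoreSketch();
        store.put("example-pid", "<eml/>", "<systemMetadata/>");
        System.out.println(loadMetadataDoc(store, "example-pid"));
        System.out.println(loadSysmeta(store, "example-pid"));
    }
}

// Hypothetical minimal view of a pid-addressed store; not the HashStore-java interface.
interface PidStoreSketch {
    InputStream retrieveObject(String pid) throws IOException;

    // Invented name for illustration only.
    InputStream retrieveSysmeta(String pid) throws IOException;
}

// In-memory stand-in so the sketch runs without a real store on disk.
class InMemoryStoreSketch implements PidStoreSketch {
    private final Map<String, String> objects = new HashMap<>();
    private final Map<String, String> sysmeta = new HashMap<>();

    void put(String pid, String objectContent, String sysmetaContent) {
        objects.put(pid, objectContent);
        sysmeta.put(pid, sysmetaContent);
    }

    @Override
    public InputStream retrieveObject(String pid) throws IOException {
        return open(objects, pid);
    }

    @Override
    public InputStream retrieveSysmeta(String pid) throws IOException {
        return open(sysmeta, pid);
    }

    private static InputStream open(Map<String, String> source, String pid) throws IOException {
        String content = source.get(pid);
        if (content == null) {
            throw new FileNotFoundException("No entry for pid: " + pid);
        }
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }
}
```

Keeping the store behind a small interface like this would also let checks be tested without a real hashstore directory, though whether that indirection is worth it here is a judgment call.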

@jeanetteclark
Collaborator

closing, as this is not necessary. we need the sysmeta in this method and the only change needed is to change how it is originally retrieved (covered in #457)
