FY2025 Digital Bedrock Content Ingest #2695

Open
18 of 20 tasks
carakey opened this issue Nov 15, 2024 · 1 comment
Labels: Content Priority: High (these are issues that should be prioritized for upcoming development efforts)

Comments


carakey commented Nov 15, 2024

In Fall 2024, OSULP began a partnership with Digital Bedrock for digital preservation services. For SA@OSU, the first phase is a functional replacement for MetaArchive (which is being sunset): we are backing up two "Collections," our Graduate Theses and Dissertations and our Extension and Experiment Station Publications (referred to below as GTD and EESC). SCARC and the DPU are also submitting material to Digital Bedrock. Digital Bedrock will ship drives to OSULP and provide software training in November or December.

This ticket tracks work to ingest the two SA@OSU collections to Digital Bedrock.

  • Finalize model for SA Collections' file and metadata structures
  • Finalize tools for local copying
  • Harvest GTD and EESC content to local storage
    • Run harvesting script
    • Manually fetch restricted items
    • Handle compounds
  • Receive Digital Bedrock drive
  • Receive training for Digital Bedrock "Package Creator" tool
  • Set up SIPs for Digital Bedrock drive, including manifest document
    • EESC SIP
    • GTD SIP
  • Verify manifest / clean up any missing data
    • EESC SIP
    • GTD SIP
  • Copy GTD and EESC content to Digital Bedrock drive
    • EESC SIP
    • GTD SIP
  • Ship drive back to Digital Bedrock
  • Verify ingest to Digital Bedrock storage system
  • Archive a report & manifest in SA Google Drive

carakey added the Content Priority: High label Nov 15, 2024
carakey self-assigned this Nov 15, 2024

carakey commented Nov 20, 2024

basic data model:

{work_pid}/
  |- {work_pid}-adminMeta.yaml     # select solr data
  |- {work_pid}-descMeta.nt        # RDF metadata for work
  |- {fileset_pid}-checksum.md5    # checksum from solr
  |- {fileset_pid}-filesetMeta.nt  # RDF metadata for fileset
  |- {fileset_pid}.{ext}           # asset file
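
A minimal Python sketch of the naming convention above. Only the file name patterns come from the model; the function name `expected_files`, the `(fileset_pid, ext)` input shape, and the example pids are assumptions made for illustration and are not taken from the harvesting script.

```python
def expected_files(work_pid, filesets):
    """Return the file names the basic model expects inside {work_pid}/.

    `filesets` is a list of (fileset_pid, ext) tuples. This input shape is
    an assumption made for illustration.
    """
    names = [
        f"{work_pid}-adminMeta.yaml",  # select solr data
        f"{work_pid}-descMeta.nt",     # RDF metadata for work
    ]
    for fileset_pid, ext in filesets:
        names.append(f"{fileset_pid}-checksum.md5")    # checksum from solr
        names.append(f"{fileset_pid}-filesetMeta.nt")  # RDF metadata for fileset
        names.append(f"{fileset_pid}.{ext}")           # asset file
    return names


# Hypothetical pids, for illustration only.
for name in expected_files("work123", [("fs456", "pdf")]):
    print(name)
```

Passing more than one tuple in `filesets` yields the multi-fileset variation, since each fileset contributes the same three files.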

---variations---

works with multiple filesets:

{work_pid}/
  |- {work_pid}-adminMeta.yaml       # select solr data
  |- {work_pid}-descMeta.nt          # RDF metadata for work
  |- {fileset_1_pid}-checksum.md5    # checksum from solr - 1st fileset
  |- {fileset_1_pid}-filesetMeta.nt  # RDF metadata - 1st fileset
  |- {fileset_1_pid}.{ext}           # asset file - 1st fileset
  |- {fileset_2_pid}-checksum.md5    # checksum from solr - 2nd fileset
  |- {fileset_2_pid}-filesetMeta.nt  # RDF metadata - 2nd fileset
  |- {fileset_2_pid}.{ext}           # asset file - 2nd fileset
  |- (...)                           # repeat checksum + filesetMeta + asset for each fileset
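
Since every fileset should contribute a checksum, a filesetMeta, and an asset file, one possible completeness check is to group a work directory's file names by fileset pid and report what is missing. This is a sketch only: the function name `incomplete_filesets` is hypothetical, and it assumes fileset pids contain no `.` character so the pid can be read off the front of an asset file name.

```python
from collections import defaultdict

# Work-level files carry these suffixes and belong to no fileset.
WORK_SUFFIXES = ("-adminMeta.yaml", "-descMeta.nt")


def incomplete_filesets(file_names):
    """Map fileset pid -> set of missing parts ('checksum', 'meta', 'asset').

    Assumption: fileset pids contain no '.', so everything before the first
    '.' in an asset file name is the pid.
    """
    found = defaultdict(set)
    for name in file_names:
        if name.endswith(WORK_SUFFIXES):
            continue  # work-level metadata, skip
        if name.endswith("-checksum.md5"):
            found[name[: -len("-checksum.md5")]].add("checksum")
        elif name.endswith("-filesetMeta.nt"):
            found[name[: -len("-filesetMeta.nt")]].add("meta")
        else:
            found[name.split(".", 1)[0]].add("asset")
    required = {"checksum", "meta", "asset"}
    return {
        pid: required - parts
        for pid, parts in found.items()
        if required - parts
    }


# Hypothetical listing: the second fileset has no checksum file.
listing = [
    "w1-adminMeta.yaml", "w1-descMeta.nt",
    "f1-checksum.md5", "f1-filesetMeta.nt", "f1.pdf",
    "f2-filesetMeta.nt", "f2.pdf",
]
print(incomplete_filesets(listing))  # {'f2': {'checksum'}}
```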


compound objects (nested works):

{parent_work_pid}/
  |- {parent_work_pid}-adminMeta.yaml       # select solr data - parent work
  |- {parent_work_pid}-descMeta.nt          # RDF metadata - parent work
  |- {child_work_1_pid}/
      |- {child_work_1_pid}-adminMeta.yaml  # select solr data - 1st child work
      |- {child_work_1_pid}-descMeta.nt     # RDF metadata - 1st child work
      |- {fileset_pid}-checksum.md5         # checksum from solr - fileset under 1st child work
      |- {fileset_pid}-filesetMeta.nt       # RDF metadata - fileset under 1st child work
      |- {fileset_pid}.{ext}                # asset file - fileset under 1st child work
  |- {child_work_2_pid}/
      |- {child_work_2_pid}-adminMeta.yaml  # select solr data - 2nd child work
      |- {child_work_2_pid}-descMeta.nt     # RDF metadata - 2nd child work
      |- {fileset_pid}-checksum.md5         # checksum from solr - fileset under 2nd child work
      |- {fileset_pid}-filesetMeta.nt       # RDF metadata - fileset under 2nd child work
      |- {fileset_pid}.{ext}                # asset file - fileset under 2nd child work
  |- (...)                                  # repeat directory for each child work
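
Because child works nest inside the parent directory, a check over a compound object has to recurse. The sketch below walks a work directory and its children and lists any missing work-level metadata files. The function names are hypothetical, and it assumes every subdirectory of a work directory is a child work named by its pid, as in the layout above.

```python
from pathlib import Path


def iter_work_dirs(work_dir):
    """Yield work_dir and every nested child work directory, parents first.

    Assumption: each subdirectory of a work directory is a child work whose
    directory name is its pid.
    """
    work_dir = Path(work_dir)
    yield work_dir
    for child in sorted(p for p in work_dir.iterdir() if p.is_dir()):
        yield from iter_work_dirs(child)


def missing_work_metadata(root):
    """Return paths of adminMeta/descMeta files that are absent under root."""
    missing = []
    for work_dir in iter_work_dirs(root):
        for suffix in ("-adminMeta.yaml", "-descMeta.nt"):
            path = work_dir / f"{work_dir.name}{suffix}"
            if not path.is_file():
                missing.append(path)
    return missing
```

Calling `missing_work_metadata` on a parent work directory returns an empty list when the parent and every child work have both metadata files.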

compound objects (nested works) which also have one or more filesets:

{parent_work_pid}/
  |- {parent_work_pid}-adminMeta.yaml       # select solr data - parent work
  |- {parent_work_pid}-descMeta.nt          # RDF metadata - parent work
  |- {fileset_pid}-checksum.md5             # checksum from solr - fileset for parent work
  |- {fileset_pid}-filesetMeta.nt           # RDF metadata - fileset for parent work
  |- {fileset_pid}.{ext}                    # asset file for parent work
  |- (...)
  |- {child_work_1_pid}/
      |- {child_work_1_pid}-adminMeta.yaml  # select solr data - 1st child work
      |- {child_work_1_pid}-descMeta.nt     # RDF metadata - 1st child work
      |- {fileset_pid}-checksum.md5         # checksum from solr - fileset under 1st child work
      |- {fileset_pid}-filesetMeta.nt       # RDF metadata - fileset under 1st child work
      |- {fileset_pid}.{ext}                # asset file - fileset under 1st child work
  |- {child_work_2_pid}/
      |- {child_work_2_pid}-adminMeta.yaml  # select solr data - 2nd child work
      |- {child_work_2_pid}-descMeta.nt     # RDF metadata - 2nd child work
      |- {fileset_pid}-checksum.md5         # checksum from solr - fileset under 2nd child work
      |- {fileset_pid}-filesetMeta.nt       # RDF metadata - fileset under 2nd child work
      |- {fileset_pid}.{ext}                # asset file - fileset under 2nd child work
  |- (...)                                  # repeat directory for each child work
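
In every variation the asset file sits next to its `-checksum.md5` file, so a harvested tree could be spot-checked by recomputing each asset's MD5 and comparing. This is a rough sketch, not the verification step used for the Digital Bedrock manifest. The function names are hypothetical, and the content format of the `.md5` files is not specified in this ticket; the code assumes the first whitespace-separated token is the hex digest.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Compute the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_mismatches(root):
    """Return checksum files under root whose asset is missing or differs.

    Assumption: the first whitespace-separated token in each
    {fileset_pid}-checksum.md5 file is the expected hex MD5.
    """
    bad = []
    for sidecar in sorted(Path(root).rglob("*-checksum.md5")):
        pid = sidecar.name[: -len("-checksum.md5")]
        tokens = sidecar.read_text().split()
        expected = tokens[0].lower() if tokens else ""
        # The asset is the sibling named {fileset_pid}.{ext}.
        assets = [p for p in sidecar.parent.glob(f"{pid}.*") if p.is_file()]
        if len(assets) != 1 or md5_of(assets[0]) != expected:
            bad.append(sidecar)
    return bad
```

Because it uses `rglob`, the same call covers filesets attached to a parent work and filesets under child works.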
