Package per version storage #33

Open
zimeon opened this issue Mar 29, 2019 · 28 comments
Labels
Component: Specification Confirmed: In-scope Use case will be included in the upcoming version of the spec or implementation notes.

Comments

@zimeon
Contributor

zimeon commented Mar 29, 2019

In cases where there are many small files in an object or where the storage infrastructure is not efficient at handling many files, it is useful to package files using a technology such as ZIP. This is addressed for the whole object in #10. However, packaging the whole object as a ZIP/Tar etc. breaks the idea of immutability of version data. One could instead package the inventory and content for each new version as a new ZIP file.

@zimeon zimeon added the Proposed: In-Scope Use case is up for discussion and may change the spec, implementation notes, or become an extension. label Mar 29, 2019
@zimeon
Contributor Author

zimeon commented Mar 29, 2019

This could be something along the lines of:

[object root]
    ├── 0=ocfl_object_1.0
    ├── inventory.json
    ├── inventory.json.sha512
    ├── v1.zip
    ├── v2.zip
    └── v3.zip

This still leaves three potentially small files per object (though inventory.json might not be small) but avoids any small files in the object's contents appearing alone in storage, while each v#.zip is immutable.
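As a rough illustration of the layout above, a version directory could be written into a single archive along these lines. This is only a sketch: the helper name `package_version` is hypothetical, and keeping the version directory as the top-level folder inside the archive is an assumption, since the thread has not settled the internal layout at this point.

```python
import zipfile
from pathlib import Path


def package_version(object_root: Path, version: str) -> Path:
    """Write <object_root>/<version>.zip from the <version>/ directory.

    Member names keep the version directory as their top-level folder.
    Hypothetical helper; the archive layout is an assumption.
    """
    version_dir = object_root / version
    archive = object_root / f"{version}.zip"
    # Mode "x" refuses to overwrite an existing file, in keeping with the
    # idea that a version package is immutable once written.
    with zipfile.ZipFile(archive, "x", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(version_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(object_root).as_posix())
    return archive
```

Calling `package_version(root, "v1")` a second time raises `FileExistsError` rather than replacing the archive.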

@rosy1280 rosy1280 added Proposed: In-Scope V2 and removed Proposed: In-Scope Use case is up for discussion and may change the spec, implementation notes, or become an extension. labels Nov 11, 2020
@rosy1280 rosy1280 added Proposed: In-Scope Use case is up for discussion and may change the spec, implementation notes, or become an extension. and removed Proposed: In-Scope V2 labels Nov 17, 2020
@rosy1280
Contributor

potentially a sub-use case of #39

@ThomasEdvardsen

ThomasEdvardsen commented Jul 2, 2021

Hello everybody! The National Library of Norway is in the process of installing a new bit repository (HPSS) that can hold 44 PB of data. In this context, we are considering using OCFL to organize our data packages.

So far, OCFL looks very good, but we are dependent on ZIP per version storage #33 being resolved to be able to use OCFL. This is because we want to limit the number of files so that it becomes more efficient to store/retrieve data from HPSS.

I reckon this needs to be solved using an object extension? Do you have any thoughts on how this can be implemented?

@ThomasEdvardsen

ThomasEdvardsen commented Jul 2, 2021

We have begun to think about how this can be implemented based on our needs. This is a very immature first proposal for a new object extension.

We would like to discuss the following:

  1. Whether or not to include a full path to the archived files.
  2. Whether the archive files should be placed on the object's root, or in separate version folders.
  3. Whether a version can consist of more than one archive file.

Arguments for allowing more than one file for each version:

  • Split very large files into smaller files
  • Make it easier to transfer large objects, so you do not have to restart completely if a file fails while uploading.

What are your initial thoughts?

[object root]
├── 0=ocfl_object_1.0
├── extensions/
│   └── nnnn-archived-versions/
│       ├── archived-versions.json
│       └── archived-versions.json.sha512
├── inventory.json
├── inventory.json.sha512
├── v1/
│   ├── v1-1.zip
│   ├── v1-2.zip
│   └── v1-3.zip
├── v2/
│   └── v2-1.zip
└── v3/
    ├── v3-1.zip
    └── v3-2.zip  

Example content of archived-versions.json

{
  "id": "zipped_updates_three_versions_one_file",
  "versions": {
    "v1": {
      "created": "2019-01-01T02:03:04.000Z",
      "archiveAlgorithm": {
        "mime": "application/zip",
        "pronomId": "x-fmt/263"
      },
      "digestAlgorithm": "sha512",
      "files": {
        "0675bdf376e92e9994612c33ea255b12f7": {
          "filePath": "/hpss/storage-root-01/1ec1/8fe/5cd/zipped_updates_three_versions_one_file/v1/v1-1.zip",
          "fileSize": 133410430
        },
        "0675b1ff76e92e9994612c33ea255b12f7": {
          "filePath": "/hpss/storage-root-01/1ec1/8fe/5cd/zipped_updates_three_versions_one_file/v1/v1-2.zip",
          "fileSize": 520430330
        },
        "067ab1f376e92e9994612c33ea255b12f7": {
          "filePath": "/hpss/storage-root-01/1ec1/8fe/5cd/zipped_updates_three_versions_one_file/v1/v1-3.zip",
          "fileSize": 8353634100
        }
      }
    },
    "v2": {
      "created": "2020-02-02T02:03:04.000Z",
      "archiveAlgorithm": {
        "mime": "application/zip",
        "pronomId": "x-fmt/263"
      },
      "digestAlgorithm": "sha512",
      "files": {
        "5b23ffdf2709bf393a7d8883fcdf583980": {
          "filePath": "/hpss/storage-root-01/1ec1/8fe/5cd/zipped_updates_three_versions_one_file/v2/v2-1.zip",
          "fileSize": 42644244
        }
      }
    },
    "v3": {
      "created": "2021-03-03T02:03:04.000Z",
      "archiveAlgorithm": {
        "mime": "application/zip",
        "pronomId": "x-fmt/263"
      },
      "digestAlgorithm": "sha512",
      "files": {
        "88492082026f1a3a1c0637d6bd02214dd6": {
          "filePath": "/hpss/storage-root-01/1ec1/8fe/5cd/zipped_updates_three_versions_one_file/v3/v3-1.zip",
          "fileSize": 8743244
        },
        "3a1c0637d6bd02214dd62c5c19ee8d4bbf": {
          "filePath": "/hpss/storage-root-01/1ec1/8fe/5cd/zipped_updates_three_versions_one_file/v3/v3-2.zip",
          "fileSize": 892345
        }
      }
    }
  }
}

@zimeon
Contributor Author

zimeon commented Jul 14, 2021

I support the idea that any solution for packaged content should include support for multiple packages in a version (v1-1.zip, v1-2.zip etc) so that it could address right-sizing both groups of small files and segmenting large files (#40).

I think the biggest question is where one describes the logical files vs the physical files (packages). I lean toward having the inventory describe the physical files and thus providing the infrastructure for preservation/fixity/transfer, and then create some new way to describe the logical object content in a way that doesn't make those other processes too cumbersome in the case of objects with large numbers of files. This would potentially mean significant changes in the state ideas that currently map physical to logical files in an object version.

@pwinckles

I support the idea that any solution for packaged content should include support for multiple packages in a version (v1-1.zip, v1-2.zip etc) so that it could address right-sizing both groups of small files and segmenting large files (#40).

If the spec adds support for zipped versions, does it necessarily need to make special mention of split zips, which are already part of the zip spec?

@julianmorley
Contributor

julianmorley commented Jul 14, 2021

If the spec adds support for zipped versions, does it necessarily need to make special mention of split zips, which are already part of the zip spec?

I'd lean towards 'yes', based on our experiences doing something similar with Preservation Catalog. At the end of the day OCFL tracks files and their checksums. It doesn't know, for example, that a .zip file contains information that points to other zip segments, and we want a human reading the manifest to be able to see that the version directory should contain 10 files (file.zip, file.z01, file.z02, ... etc) without having to wonder if the single file.zip in the directory is meant to be just one file or the first in a series of zip segments.

My early guess is that, in OCFL v2, we'll expand inventory.json to be able to say "this version of this physical representation of this object is stored as a zip archive with these parameters", and list out all the zip parts and their checksums, together with a sidecar file that lists all the files in those zips (and their checksums).

@ThomasEdvardsen

I just want to point out that we at NLN do not necessarily want to use split-zips to package small files. We may choose to package them in independent individual zip files. Then they are perhaps a little less prone to problems if one of the zip files should become corrupt. For splitting very large files, split-zips may be appropriate.

I therefore see it as an advantage if we do not lock the specification to only support split-zips.

@julianmorley
Contributor

We'll be sure to not mandate split-zips. We (Stanford) only split on versions greater than 10GB in our (non-OCFL) implementation of archival objects. Anything less than that goes into a single zip file. We'll probably include a way to specify a per-repo or per-object size at which the object-version would be split into multiple zips.
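To make the idea of a size threshold concrete, here is one possible way to group a version's files into independent packages that each stay under a configurable limit. It is a minimal sketch, not Stanford's or NLN's implementation: `plan_packages` and `max_bytes` are hypothetical names, and a file larger than the limit simply gets a package of its own instead of being segmented.

```python
from typing import Iterable


def plan_packages(files: Iterable[tuple[str, int]], max_bytes: int) -> list[list[str]]:
    """Group (path, size) pairs into packages of at most max_bytes each.

    A file larger than max_bytes gets a package of its own; splitting a
    single file across packages is out of scope for this sketch.
    """
    packages: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for path, size in files:
        # Close the current package if adding this file would exceed the limit.
        if current and current_size + size > max_bytes:
            packages.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        packages.append(current)
    return packages
```

Files are taken in the order given, so the grouping is deterministic but not necessarily the tightest packing.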

@qqmyers

qqmyers commented Dec 6, 2022

+1 from the Dataverse community. We're using Bags (1 per version, versions created and archived independently over time) today and are interested in OCFL as a way to reduce storage size (via deduplication/forward deltas) but we'd like to retain the write-only, ~one-file-per-version paradigm we have today. I think that is this use case, although the archived-versions.json file discussed above, where info about all versions is one file, would not be write-only (when versions are added over time.)

@neilsjefferies
Member

@qqmyers I think we can have an analogous mechanism to the way we treat inventories. Each version could contain a (by definition write-only) copy of the archived-versions.json but there is a separate copy elsewhere that contains the current state.

@rosy1280 rosy1280 added Component: Specification Confirmed: In-scope Use case will be included in the upcoming version of the spec or implementation notes. Proposed: In-Scope Use case is up for discussion and may change the spec, implementation notes, or become an extension. and removed Proposed: In-Scope Use case is up for discussion and may change the spec, implementation notes, or become an extension. Confirmed: In-scope Use case will be included in the upcoming version of the spec or implementation notes. labels Sep 22, 2023
@rosy1280 rosy1280 changed the title ZIP per version storage Package per version storage Sep 22, 2023
@zimeon
Contributor Author

zimeon commented Sep 22, 2023

Editors' discussion 2023-09-22:

  • Example, object includes three versions: v1.zip, v2.zip, v3.zip
  • Also optional support for multiple zip files per version
  • Should support objects with mixed versions, i.e. some versions zipped, others not
  • Suggestion: a new block would be added to the inventory.json, i.e. ‘zip-manifest’, the presence of which indicates that for each specified version the contents are in one or more zip files instead of in a version directory. Unpacking the zip file(s) would yield the same set of files as a non-zipped version. Otherwise, the inventory.json will represent the object in its unzipped form.
  • Top level folder of a version zip should be v1,v2,v3 so they can be “unpacked in situ”
  • Generalize from zip to package since there might be other packaging approaches
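The "unpacked in situ" point can be checked mechanically. The sketch below tests whether every member of a zip sits under the expected version folder, so that extracting it in the object root recreates the version directory. The function name `unpacks_in_situ` is hypothetical and only the zip case is covered.

```python
import zipfile


def unpacks_in_situ(archive_path: str, version: str) -> bool:
    """Return True if every member of the archive sits under '<version>/'."""
    prefix = version + "/"
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
    # An empty archive is treated as a failure: it would not recreate the folder.
    return bool(names) and all(name.startswith(prefix) for name in names)
```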

@ThomasEdvardsen

ThomasEdvardsen commented Sep 25, 2023

I think this suggestion could be really good, and solve how our organization can use OCFL.

So to be sure - is the new suggested block at top level or at version level? I have made a proposal where the new package block is at the version level. The only drawback I can think of is that it is only possible to have one checksum for each package file. But that might not be a problem.

So, using the example from the OCFL specification:

[object root]
├── 0=ocfl_object_1.1
├── inventory.json
├── inventory.json.sha512
├── v1
│   ├── inventory.json
│   ├── inventory.json.sha512
│   └── content
│       ├── empty.txt
│       ├── foo
│       │   └── bar.xml
│       └── image.tiff
├── v2
│   ├── inventory.json
│   ├── inventory.json.sha512
│   └── content
│       └── foo
│           └── bar.xml
└── v3
    ├── inventory.json
    └── inventory.json.sha512

The same object packed with TAR:

[object root]
├── 0=ocfl_object_1.1
├── inventory.json
├── inventory.json.sha512
├── v1.tar
├── v2.tar
└── v3.tar

v1.tar will unpack to:

v1
├── inventory.json
├── inventory.json.sha512
└── content
    ├── empty.txt
    ├── foo
    │   └── bar.xml
    └── image.tiff

Example of inventory.json with new packageManifest blocks added:

{
  "digestAlgorithm": "sha512",
  "fixity": {
    "md5": {
      "184f84e28cbe75e050e9c25ea7f2e939": [ "v1/content/foo/bar.xml" ],
      "2673a7b11a70bc7ff960ad8127b4adeb": [ "v2/content/foo/bar.xml" ],
      "c289c8ccd4bab6e385f5afdd89b5bda2": [ "v1/content/image.tiff" ],
      "d41d8cd98f00b204e9800998ecf8427e": [ "v1/content/empty.txt" ]
    },
    "sha1": {
      "66709b068a2faead97113559db78ccd44712cbf2": [ "v1/content/foo/bar.xml" ],
      "a6357c99ecc5752931e133227581e914968f3b9c": [ "v2/content/foo/bar.xml" ],
      "b9c7ccc6154974288132b63c15db8d2750716b49": [ "v1/content/image.tiff" ],
      "da39a3ee5e6b4b0d3255bfef95601890afd80709": [ "v1/content/empty.txt" ]
    }
  },
  "head": "v3",
  "id": "ark:/12345/bcd987",
  "manifest": {
    "4d27c8...b53": [ "v2/content/foo/bar.xml" ],
    "7dcc35...c31": [ "v1/content/foo/bar.xml" ],
    "cf83e1...a3e": [ "v1/content/empty.txt" ],
    "ffccf6...62e": [ "v1/content/image.tiff" ]
  },
  "type": "https://ocfl.io/1.1/spec/#inventory",
  "versions": {
    "v1": {
      "created": "2018-01-01T01:01:01Z",
      "message": "Initial import",
      "state": {
        "7dcc35...c31": [ "foo/bar.xml" ],
        "cf83e1...a3e": [ "empty.txt" ],
        "ffccf6...62e": [ "image.tiff" ]
      },
      "packageManifest": {
        "a2b5f8...d97": [ "v1.tar" ]
      },      
      "user": {
        "address": "mailto:alice@example.com",
        "name": "Alice"
      }
    },
    "v2": {
      "created": "2018-02-02T02:02:02Z",
      "message": "Fix bar.xml, remove image.tiff, add empty2.txt",
      "state": {
        "4d27c8...b53": [ "foo/bar.xml" ],
        "cf83e1...a3e": [ "empty.txt", "empty2.txt" ]
      },
      "packageManifest": {
        "c1e4d3...f82": [ "v2.tar" ]
      },      
      "user": {
        "address": "mailto:bob@example.com",
        "name": "Bob"
      }
    },
    "v3": {
      "created": "2018-03-03T03:03:03Z",
      "message": "Reinstate image.tiff, delete empty.txt",
      "state": {
        "4d27c8...b53": [ "foo/bar.xml" ],
        "cf83e1...a3e": [ "empty2.txt" ],
        "ffccf6...62e": [ "image.tiff" ]
      },
      "packageManifest": {
        "6f4a1d...b58": [ "v3-1.tar" ],
        "9e7c2f...a61": [ "v3-2.tar" ]
      },      
      "user": {
        "address": "mailto:cecilia@example.com",
        "name": "Cecilia"
      }
    }
  }
}
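With a per-version packageManifest block like the one above, fixity checking of the package files could look roughly as follows. This is a sketch under assumptions: the block name and shape are taken from this proposal only, package paths are treated as relative to the object root, and `check_package_manifests` is a hypothetical helper.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path, algorithm: str) -> str:
    """Hex digest of a file, read in chunks so large packages fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def check_package_manifests(object_root: Path) -> list[str]:
    """Return the package paths that are missing or whose digest differs."""
    inventory = json.loads((object_root / "inventory.json").read_text())
    algorithm = inventory["digestAlgorithm"]
    failures = []
    for version in inventory["versions"].values():
        # Versions without a packageManifest are assumed to be unpackaged.
        for digest, paths in version.get("packageManifest", {}).items():
            for rel in paths:
                target = object_root / rel
                if not target.is_file() or file_digest(target, algorithm) != digest.lower():
                    failures.append(rel)
    return failures
```

An empty list means every listed package is present and matches its recorded digest.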

@je4

je4 commented Sep 26, 2023

[object root]
├── 0=ocfl_object_1.1
├── inventory.json
├── inventory.json.sha512
├── v1.tar
├── v2.tar
└── v3.tar

This variant could be a bit problematic because the inventory.json of the last version (if available) MUST be the same as the inventory.json within the object root ( https://ocfl.io/1.1/spec/#version-inventory ).

A solution could be to drop the MUST from the standard, or to pack only the content folder of the version, which means that every inventory.json is aware of the package.
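For the tar-per-version layout, the constraint mentioned here could be tested with something like the sketch below, which compares the root inventory byte for byte with the copy stored inside the head version's package. It assumes one `<head>.tar` per version with the version directory as the top-level folder; `head_inventory_matches` is a hypothetical name.

```python
import json
import tarfile
from pathlib import Path


def head_inventory_matches(object_root: Path, suffix: str = ".tar") -> bool:
    """Compare the root inventory.json with the copy inside the head package."""
    root_bytes = (object_root / "inventory.json").read_bytes()
    head = json.loads(root_bytes)["head"]
    with tarfile.open(object_root / f"{head}{suffix}") as tf:
        # extractfile raises KeyError if the member is absent and returns
        # None if the member is not a regular file.
        member = tf.extractfile(f"{head}/inventory.json")
        return member is not None and member.read() == root_bytes
```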

@neilsjefferies
Member

I'm coming round to the idea of a separate package-inventory.json file. Then we can decide to zip or unzip a version at any time without having a new inventory.json. Its presence/absence would also be an easy indicator of the existence of packaged versions.

@ThomasEdvardsen

I'm coming round to the idea of a separate package-inventory.json file. Then we can decide to zip or unzip a version at any time without having a new inventory.json. Its presence/absence would also be an easy indicator of the existence of packaged versions.

My original thought was to create this as an extension, as I suggested with the archived-versions.json file. I think including this as part of the standard implementation is even better. What are your thoughts on making it an extension versus including it in the standard implementation, @neilsjefferies?

@neilsjefferies
Member

@ThomasEdvardsen Editors decided there was enough interest and use cases that this was in-scope for OCFL V2 discussions.

@rosy1280
Contributor

Feedback on Use Cases

In advance of version 2 of the OCFL, we are soliciting feedback on use cases. Please feel free to add your thoughts on this use case via the comments.

Polling on Use Cases

In addition to reviewing comments, we are doing an informal poll for each use case that has been tagged as Proposed: In Scope for version 2. You can contribute to the poll for this use case by reacting to this comment. The following reactions are supported:

  • In favor of the use case: 👍🏼
  • Against the use case: 👎🏼
  • Neutral on the use case: 👀

The poll will remain open through the end of February 2024.

@MormonJesus69420

We (@ThomasEdvardsen, @je4, and I) have worked on a set of proposals for this use case, along with some questions. You can find them here:
OCFL Package Per Version Workgroup Notes

@zimeon
Contributor Author

zimeon commented Feb 29, 2024

2024-02-29 Editors agree that this should be in-scope for v2. Voting at this point is +9 in favor.

@zimeon zimeon added Confirmed: In-scope Use case will be included in the upcoming version of the spec or implementation notes. and removed Proposed: In-Scope Use case is up for discussion and may change the spec, implementation notes, or become an extension. labels Feb 29, 2024
@zimeon zimeon added this to the Supported in v2.0 milestone Feb 29, 2024
@rosy1280
Contributor

Package Per Version Notes

These notes reference the Package Per Version Use Case, which is Use Case #33. It potentially addresses the issue of lots of small files as well as splitting large files.

Package characteristics

  • If a version is packaged, then the ENTIRE contents (inventory, sidecar and content) of the version directory are stored in the package.
    • Depending on the packaging format the package may comprise one or more files
    • Packages are stored in the version directory. If the packages were stored in the root directory, then the user might end up with many package files in the root directory.
      • example: Stanford has files with 100+ packages per version, and cases where there is 50TB data in 10,000 5GB chunks
    • The version directory for a packaged version MUST contain no other files than the package file(s)
  • Users MUST choose one package type per version.
  • The package file(s) are enumerated in packages.json stored in the object root. Therefore there is no particular naming convention required for packages - Implementation notes will recommend a simple and systematic approach.
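The rule that a packaged version directory holds nothing but its package files lends itself to a simple check. The sketch below assumes the caller already has the list of package paths for the version (relative to the object root, as the notes describe for packages.json); `stray_files` is a hypothetical helper name.

```python
from pathlib import Path


def stray_files(object_root: Path, version: str, packages: list[str]) -> set[str]:
    """Return files under <version>/ that are not listed as packages.

    `packages` holds paths relative to the object root, e.g. "v3/v3.z01".
    An empty result means the directory holds package files only.
    """
    version_dir = object_root / version
    present = {
        p.relative_to(object_root).as_posix()
        for p in version_dir.rglob("*")
        if p.is_file()
    }
    return present - set(packages)
```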

Questions

  • Do we include the version directory within the packages?
    • Users cannot expand the package in place as it writes another version directory.
      • i.e., you end up with v1 > v1 > all the content files, the inventory.json file, and its sidecar file.
    • If you're recovering to a different location, and all the packages are in the same folder then you can expand all the package files and end up with the complete object.
  • Do we include a flag at the storage root level to indicate the use of packages throughout the storage root?
    • This might allow for better tooling.

packages.json file

  • Manage version packages in a packages.json file maintained at the object root.
    • Should not have any significant scaling issues
    • inventory.json does not change format
  • Separation of concerns between packaging and object versioning
    • Repackaging does not constitute a versioning event (implementation notes will discuss in further detail)
  • A given package contains all the files of a specific version so the package digest can be validated in lieu of the files in that version.
  • All paths in the packages.json file are relative to the object root
  • There is a metadata block per version that provides information about the package with an array of key/values.
    • The key values in the "metadata" block must include "format" and "formatVersion" (avoiding "type" and "version" because of namespace collisions)
    • An optional key is "extension" that points to an extension in the object’s extension folder that allows an organization to store other information about the package.

Implementation Notes

  • Outline the best method for rewriting packages...
    • rewrite the packages,
    • rewrite the packages.json
    • and update the sidecar file.
    • We aren’t going to version the packages.json file.
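The last two steps of that outline (rewrite packages.json, then update its sidecar) might be sketched as below. Two things here are assumptions rather than anything the notes specify: the sidecar is written in the same "digest, whitespace, filename" form used for inventory.json sidecars, and the temporary file name `packages.json.tmp` is purely illustrative.

```python
import hashlib
import json
import os
from pathlib import Path


def write_packages_json(object_root: Path, packages: dict) -> str:
    """Replace packages.json and its sha512 sidecar; return the new digest."""
    data = json.dumps(packages, indent=2).encode("utf-8")
    digest = hashlib.sha512(data).hexdigest()
    # Write to a temporary name first, then rename, so a reader never sees
    # a half-written packages.json (temporary name is illustrative).
    tmp = object_root / "packages.json.tmp"
    tmp.write_bytes(data)
    os.replace(tmp, object_root / "packages.json")
    # Sidecar format assumed to mirror the inventory.json sidecar.
    sidecar = object_root / "packages.json.sha512"
    sidecar.write_text(f"{digest} packages.json\n")
    return digest
```

Because packages.json is not versioned, the previous copy is simply replaced.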

packages.json example

This strategy replicates the manifest block of the inventory.json file, i.e. "digest": [ "filename" ], and then just specifies the order in the packages list for each version:

{
  "digestAlgorithm": "sha512",
  "type": "https://ocfl.io/1.1/spec/#packages",
  "manifest": {
    "abc..123": [ "v1/v1.zip" ],
    "cde..123": [ "v3/v3.z01" ],
    "ade..789": [ "v3/v3.z02" ],
    "ces..229": [ "v3/v3.zip" ]
  },
  "versions": {
    "v1": {
      "metadata": {
        "format": "zip",
        "formatVersion": "6.3.10",
        "extension": "[extension-name-ref]"
      },
      "packages": [ "v1/v1.zip" ]
    },
    "v3": {
      "metadata": {
        "format": "zip",
        "formatVersion": "6.3.10",
        "extension": "[extension-name-ref]"
      },
      "packages": [ "v3/v3.z01", "v3/v3.z02", "v3/v3.zip" ]
    }
  }
}

  • "manifest" - lists all package files for all versions, this is done in an array to match the inventory.json file.
  • "format" - lists a handful of defined package types, and also links to a controlled vocabulary extension similar to the digest algorithm extension which is optional.
  • "formatVersion" - the precise meaning of version may be dependent on the format used to package up the content.
  • "extension" - an optional extension used to include more information about the package files, the extension must be a local extension in the object, the additional information goes in extension directory
  • "packages" - the list of packages in the version in the order in which they should be unpacked.
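The relationships between these keys can be expressed as a small consistency check over a parsed packages.json. This is a sketch of what a validator might look at, not a proposed validation rule set; `check_packages` is a hypothetical name and the checks are limited to what the notes state (required metadata keys, and package paths appearing in the manifest).

```python
def check_packages(doc: dict) -> list[str]:
    """Return a list of problems found in a parsed packages.json document."""
    problems = []
    # All package paths the manifest knows about, across every digest.
    listed = {path for paths in doc.get("manifest", {}).values() for path in paths}
    referenced = set()
    for version, entry in doc.get("versions", {}).items():
        metadata = entry.get("metadata", {})
        for key in ("format", "formatVersion"):
            if key not in metadata:
                problems.append(f"{version}: metadata lacks {key}")
        for path in entry.get("packages", []):
            referenced.add(path)
            if path not in listed:
                problems.append(f"{version}: {path} missing from manifest")
    for path in sorted(listed - referenced):
        problems.append(f"{path} is in manifest but in no version")
    return problems
```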

The inventory.json file remains unchanged. The files corresponding to the above packages.json example would appear on disk:

[object root]
├── 0=ocfl_object_1.1
├── inventory.json
├── inventory.json.sha512
├── packages.json
├── packages.json.sha512
├── v1
│    └── v1.zip
├── v2
│    ├── inventory.json
│    ├── inventory.json.sha512
│    └── new-file.txt
└── v3
     ├── v3.zip 
     ├── v3.z02
     └── v3.z01

@je4

je4 commented Sep 21, 2024

I like the idea of having metadata about the packaging strategy within a packages.json file. The proposed file supports version packaging very well, but is restricted to it. If there's a packaged extensions or logs folder, it will become ugly to add it to the packages.json, I think.

We should think about how validation can stay hassle-free even if some parts of the object are packaged.

Replacing "versions" with "folders" would follow the Folder as Package Proposal more closely and allow more flexibility.
Furthermore, I moved the packages one level up, which allows the replaced folder to be included in the package, so that expanding it creates a folder instead of putting files directly into the target directory.

[object root]
├── 0=ocfl_object_1.1
├── inventory.json
├── inventory.json.sha512
├── packages.json
├── packages.json.sha512
├── extensions.zip
├── v1
│    ├── inventory.json
│    ├── inventory.json.sha512
│    └── content.zip
├── v2
│    ├── inventory.json
│    ├── inventory.json.sha512
│    └── content
│            └── new-file.txt
├── v3.zip 
├── v3.z02
└── v3.z01

{
  "digestAlgorithm": "sha512",
  "type": "https://ocfl.io/1.1/spec/#packages",
  "manifest": {
    "abc..123": [ "v1/content.zip" ],
    "cde..123": [ "v3.z01" ],
    "ade..789": [ "v3.z02" ],
    "ces..229": [ "v3.zip" ],
    "tuv...375": [ "extensions.zip" ]
  },
  "folders": {
    "extensions": {
      "metadata": {
        "format": "zip",
        "formatVersion": "6.3.10",
        "extension": "[extension-name-ref]"
      },
      "packages": [ "tuv...375" ]
    },
    "v1/content": {
      "metadata": {
        "format": "zip",
        "formatVersion": "6.3.10",
        "extension": "[extension-name-ref]"
      },
      "packages": [ "abc..123" ]
    },
    "v3": {
      "metadata": {
        "format": "zip",
        "formatVersion": "6.3.10",
        "extension": "[extension-name-ref]"
      },
      "packages": [ "cde..123", "ade..789", "ces..229" ]
    }
  }
}
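In this variant the "packages" lists hold digests rather than paths, so a reader has to go through the manifest to find the files. A minimal sketch of that lookup, with `package_paths` as a hypothetical helper name:

```python
def package_paths(doc: dict, folder: str) -> list[str]:
    """Resolve the digests listed for `folder` to package paths, in order."""
    manifest = doc["manifest"]
    paths = []
    for digest in doc["folders"][folder]["packages"]:
        # Each manifest value is an array of paths, as in inventory.json.
        paths.extend(manifest[digest])
    return paths
```

For the "v3" folder in the example above this yields v3.z01, v3.z02 and v3.zip, in the listed order.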

@srerickson

I don't have a preference, but it would be useful for the editors to address how/whether "packaging" should apply to the extensions and logs directories.

"We should think about how validation can stay hassle-free even if some parts of the object are packaged."

@je4 can you elaborate on this? I'm not sure I understand what you mean.

@zimeon
Contributor Author

zimeon commented Oct 4, 2024

My thought is that neither extensions nor logs would be covered by this packaging approach.

  • We don't currently have an extension that would have many or large files, an extension that did could have its own approach or there could be an extensions-packaging extension (turtles all the way down).
  • Logs is entirely implementation defined beyond that directory, if packaging were needed then that could be done within the directory. Logs are intentionally defined outside the version hierarchy to allow processes to be logged separate from occasions when new versions are minted. It is thus not clear that combining information about logs packaging in the packages.json (which I think will usually be updated only on version creation, though it could be done at other times) would be the best choice.

@je4

je4 commented Oct 4, 2024

My thought is that neither extensions nor logs would be covered by this packaging approach.

  • We don't currently have an extension that would have many or large files, an extension that did could have its own approach or there could be an extensions-packaging extension (turtles all the way down).

I am using a thumbnail extension, which would write thumbnails into the extensions area if there were a chance that the extensions folder or its subfolders could be packaged. Since this is not yet clear, I have to write thumbnails into the content area.

@je4

je4 commented Oct 4, 2024

I don't have a preference, but it would be useful for the editors to address how/whether "packaging" should apply to the extensions and logs directories.

"We should think about how validation can stay hassle-free even if some parts of the object are packaged."

@je4 can you elaborate on this? I'm not sure I understand what you mean.

I am looking at these things from the software perspective, which means that I like things that have no exceptions and are easy to describe. If validation or add becomes much more complex and completely different when folders are packaged, then it becomes problematic.

The basic idea for an implementation would be to add a virtual filesystem that hides the packaged folders from the logic of the OCFL functions (validate, add, update, ...). The packages.json is a good basis for initializing this layer.

If we restrict packaging to version folders, there could be a future problem if other parts of the OCFL object must become part of the virtual storage layer. I could even imagine renaming packages.json to storage.json and allowing packaged as well as remote folders on the same basis...

If there are folders that must not be packaged, this could be stated in the standard instead of restricting the JSON file.
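A very small version of such a layer, for reading only, might look like the sketch below. It is an illustration under assumptions, not a description of any existing implementation: the class name `PackagedObject` is hypothetical, each packaged folder is backed by a single zip file, and archive members are assumed to start with the replaced folder's own name.

```python
import zipfile
from pathlib import Path


class PackagedObject:
    """Read files from an object root whether or not a folder is packaged.

    `packaged` maps a folder (relative to the object root) to the single
    zip file that replaces it, e.g. {"v1/content": "v1/content.zip"}.
    """

    def __init__(self, object_root: Path, packaged: dict[str, str]):
        self.root = object_root
        self.packaged = packaged

    def read_bytes(self, rel_path: str) -> bytes:
        for folder, archive in self.packaged.items():
            if rel_path.startswith(folder + "/"):
                # Members are assumed to begin with the folder's own name,
                # so "v1/content/a.txt" is stored as "content/a.txt".
                member = Path(folder).name + rel_path[len(folder):]
                with zipfile.ZipFile(self.root / archive) as zf:
                    return zf.read(member)
        # Not inside a packaged folder: read straight from disk.
        return (self.root / rel_path).read_bytes()
```

Callers such as a validator would use `read_bytes` with ordinary object-relative paths and never need to know which folders are packaged.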

@neilsjefferies
Member

It has occurred to me that we need to think carefully about what happens when a package file is corrupted or deleted. It will interact with the proposed tombstoning mechanism in some way, and may need some indication in the package inventory.

@je4

je4 commented Oct 11, 2024

It has occurred to me that we need to think carefully about what happens when a package file is corrupted or deleted. It will interact with the proposed tombstoning mechanism in some way, and may need some indication in the package inventory.

Indeed, this has to be addressed.

The same procedure that would be applied to manifest.json if files are corrupt or not available will apply to packages.json.

Therefore we should first decide on Support physical file-level deletion #42 and then apply the same strategy to packages.json.


10 participants