Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for File objects to describe directories #618

Merged
merged 3 commits into from
Feb 13, 2024
Merged

Conversation

goneall
Copy link
Member

@goneall goneall commented Jan 29, 2024

Fixes #83

@goneall goneall requested review from kestewart and zvr January 29, 2024 23:23
@goneall goneall added the Profile:Software Software Profile and related matters label Jan 29, 2024
@goneall goneall added this to the 3.0-rc2 milestone Jan 29, 2024
@goneall goneall mentioned this pull request Jan 30, 2024
32 tasks
Copy link
Member

@zvr zvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree simply extending the definition so that a File can be a directory without any other extra information. Too many things may potentially break -- or we will have to update lots of pieces of documentation to specify that we are really meaning files and not directories. Off the top of my head: typical hashes do not work any more (what's the md5 of a directory?); snippets can't blindly point to files any more (snippet in a dir?), etc. etc.

Why can't we use a Package to represent a directory?
If we can't, should we create a specific Directory class (subclassing Package)? Having it as a subclass of File unfortunately will also have all the issues mentioned above.

If this is also not possible, and it has to be a File, at the very least we should introduce a boolean property isDirectory and update texts to say "...File with isDirectory false"

@sbarnum
Copy link
Collaborator

sbarnum commented Jan 30, 2024

I disagree simply extending the definition so that a File can be a directory without any other extra information. Too many things may potentially break -- or we will have to update lots of pieces of documentation to specify that we are really meaning files and not directories. Off the top of my head: typical hashes do not work any more (what's the md5 of a directory?); snippets can't blindly point to files any more (snippet in a dir?), etc. etc.

Why can't we use a Package to represent a directory? If we can't, should we create a specific Directory class (subclassing Package)? Having it as a subclass of File unfortunately will also have all the issues mentioned above.

If this is also not possible, and it has to be a File, at the very least we should introduce a boolean property isDirectory and update texts to say "...File with isDirectory false"

I agree that simply changing the definition to implicitly make File either a file or a directory without clarity of which is likely to cause lots of problems.

I would strongly support the last proposed approach above (adding a boolean isDirectory property to File). This is by far the simplest approach to add late in the v3 game. It actually covers well the UNIX situation of Directories actually being files while also covering other approaches. Lastly, it is the exact approach leveraged in other ontology standardization efforts I have been involved in and has worked well for many years of real world use.

@goneall
Copy link
Member Author

goneall commented Jan 30, 2024

@zvr See #83 for the reason for this change. If you have a better solution, please propose - but I do think we need to fix the compatibility issue described in #83

@sbarnum - Adding the property sounds like a reasonable approach to me - I can add that in the PR

@zvr
Copy link
Member

zvr commented Jan 30, 2024

@goneall I had read #83, but I still don't understand. Let me state what I understand.

  • SPDXv2 has PackageFilename field for Packages, and we want to be able to express this in SPDXv3 for compatibility
  • In SPDXv2 it was simply a string (as most of fields), here we want to have it a... Relationship
    • but we haven't yet introduced such an entry in the RelationshipType vocabulary (I think). I assume it will come later.
  • the idea is that this new Relationship will be from a Package instance to a File instance
  • but the value of the field might be a directory, instead of a plain file, and that's where we are stuck.

Is the above correct? Did I misunderstand something? What am I missing?
It seems to be that, besides
0. allowing File to be a directory

we also have the alternatives:

  1. allow Package to represent a directory (instead of File) and have this in the Relationship
  2. subclass Package to a new Directory and have this in the Relationship
  3. don't even use a new RelationshipType, and use a string for the name, as in SPDXv2.
    • Does it have to be an object?
    • The SPDXv2 definition is "the actual file name of the package, or path of the directory being treated as a package."

Thinking a little more, I actually prefer the last approach. Have it simply as a string.
If we also have the File object, we can add a Relationship between them:

Package-zlib-1.2.3 packagedBy File-zlib-1.2.3.tar.gz

But there might not be a need to have it as an object; the property as string will do.

@goneall
Copy link
Member Author

goneall commented Jan 31, 2024

@zvr

... Is the above correct? Did I misunderstand something? What am I missing?

That is correct - the issue is we do not have a migration for the packageFileName if it represents a directory.

In terms of the alternatives:

allow Package to represent a directory (instead of File) and have this in the Relationship

I'm not quite sure how that would work - the package content is already a directory - seems a bit recursive

subclass Package to a new Directory and have this in the Relationship

Same concerns as above - i just get confused as to how this would actually work - seems complex to me

don't even use a new RelationshipType, and use a string for the name, as in SPDXv2.

This is also my favorite solution. I have a feeling this was discussed previously and was rejected.

@sbarnum - any opinion on putting back the string?

@goneall
Copy link
Member Author

goneall commented Jan 31, 2024

I added the isDirectory property to this pull request.

@goneall
Copy link
Member Author

goneall commented Feb 4, 2024

@zvr - I found the minutes where we decided to use the file relationship: https://github.com/spdx/meetings/blob/main/tech/2023-02-21.md

Are you OK with the current PR where I added the isDirectory property?

@zvr
Copy link
Member

zvr commented Feb 4, 2024

I had read that, because it was included in #83. And I should note that (text copied from those minutes):

We were thinking of using a relationship.
If it is a folder, then we would probably create a separate "package" definition to represent the folder
This is still under discussion

do not seem like a final decision on the matter, more like a reporting that some discussion had happened.
I think that the middle sentence points to a solution like (2) in my comment above.

Can we please discuss option (3) above? Meaning, keeping it as a string. What is the problem with this approach?

As I said in my first comment above, if we change to File to also mean directory, we should update the text in a number of places. Even in this file getting changed (File.md): what is the allowed contentType if this is a directory?

@goneall
Copy link
Member Author

goneall commented Feb 5, 2024

@zvr - I'm personally OK with option 3 - just being cautious that we don't reopen something we already decided.

I'll see if I can setup a call with @sbarnum just to close on this.

@kestewart
Copy link
Contributor

I'm not thrilled with the isdirectory property, but from discussion with @goneall, my suggestion of making it an enumeration doesn't quite match the semantics, so will go along with what others decide.

@sbarnum
Copy link
Collaborator

sbarnum commented Feb 6, 2024

Consensus of the Feb 06, 2024 Tech call was to pursue adding the isDirectory boolean property to the File class.

A separate issue will be opened to explore, discuss and solve the questions around potentially needed file system objects beyond files and directories post RC2.

@kestewart
Copy link
Contributor

@sbarnum - can you add a formal approval in a review. And we'll get it merged, if no additional issues with the PR.

Copy link
Collaborator

@sbarnum sbarnum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consensus of the Feb 06, 2024 Tech call was to pursue adding the isDirectory boolean property to the File class.

@sbarnum
Copy link
Collaborator

sbarnum commented Feb 6, 2024

@sbarnum - can you add a formal approval in a review. And we'll get it merged, if no additional issues with the PR.

Done

@pmonks
Copy link

pmonks commented Feb 6, 2024

A separate issue will be opened to explore, discuss and solve the questions around potentially needed file system objects beyond files and directories post RC2.

That issue is now created at #630.

@goneall goneall requested a review from kestewart February 12, 2024 02:35
@goneall
Copy link
Member Author

goneall commented Feb 12, 2024

@kestewart - if you agree with this - please go ahead and approve - your last review was just a comment and not a formal approval.

@kestewart
Copy link
Contributor

Discussion on call, was "why" is this needed. Discussion ended up on SPDX 2, artifacts can point to a file - which could be a folder or plain file. For migration compatibilities, we need to add this in. ie. my package is the following folder. How do you express this without a directory, is what this is addressing.

@kestewart kestewart merged commit 6bbf5fe into main Feb 13, 2024
1 check passed
@kestewart kestewart deleted the goneall-patch-3 branch February 13, 2024 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Profile:Software Software Profile and related matters
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SPDX 2->3 conversion of PackageFileName
5 participants