improve software integrity and verification documentation #602

Closed
wants to merge 3 commits into from
11 changes: 5 additions & 6 deletions model/Core/Classes/Hash.md
@@ -8,12 +8,11 @@ A mathematically calculated representation of a grouping of data.

## Description

A hash is a grouping of characteristics unique to the result
of applying a mathematical algorithm
that maps data of arbitrary size to a bit string (the hash)
and is a one-way function, that is,
a function which is practically infeasible to invert.
This is commonly used for integrity checking of data.
A hash is a grouping of characteristics unique to the result of applying a mathematical algorithm
that maps data of arbitrary size to a bit string (the hash) and is a one-way function, that is,
a function which is practically infeasible to invert. This is commonly used for integrity checking of data.

The recommended method to verify the integrity of `SoftwareArtifacts` Elements (including `Files`, `Snippets`, and `Packages`) is to use the SoftwareArtifact’s `contentIdentifier` property.
Collaborator

I do not think that this sort of case-specific detail should be placed here.
This is the general structure to be used across SPDX.
Placing case-specific variations here is likely to lead to an eventual laundry list of such specializations.

I propose that it would be more appropriate to add this note to the markdown files for the case-specific structures (SoftwareArtifact, File, Package, Snippet).


## Metadata

2 changes: 2 additions & 0 deletions model/Core/Classes/IntegrityMethod.md
@@ -13,6 +13,8 @@ of a specific Element that correlates to the data in this SPDX document. This id
a recipient to determine if anything in the original Element has been changed and eliminates
confusion over which version or modification of a specific Element is referenced.

The recommended method to verify the integrity of `SoftwareArtifacts` Elements (including `Files`, `Snippets`, and `Packages`) is to use the SoftwareArtifact’s `contentIdentifier` property.
Collaborator

I do not think that this sort of case-specific detail should be placed here.
This is the general structure to be used across SPDX.
Placing case-specific variations here is likely to lead to an eventual laundry list of such specializations.

I propose that it would be more appropriate to add this note to the markdown files for the case-specific structures (SoftwareArtifact, File, Package, Snippet).


## Metadata

- name: IntegrityMethod
2 changes: 2 additions & 0 deletions model/Core/Properties/verifiedUsing.md
@@ -10,6 +10,8 @@ Provides an IntegrityMethod with which the integrity of an Element can be assert

VerifiedUsing provides an IntegrityMethod with which the integrity of an Element can be asserted.

The recommended method to verify the integrity of `SoftwareArtifacts` Elements (including `Files`, `Snippets`, and `Packages`) is to use the SoftwareArtifact’s `contentIdentifier` property.
Collaborator

I do not think that this sort of case-specific detail should be placed here.
This is the general structure to be used across SPDX.
Placing case-specific variations here is likely to lead to an eventual laundry list of such specializations.

I propose that it would be more appropriate to add this note to the markdown files for the case-specific structures (SoftwareArtifact, File, Package, Snippet).


## Metadata

- name: verifiedUsing
12 changes: 9 additions & 3 deletions model/Software/Properties/contentIdentifier.md
@@ -4,11 +4,13 @@ SPDX-License-Identifier: Community-Spec-1.0

## Summary

Provides a place to record the canonical, unique, immutable identifier for each software artifact using the artifact's gitoid.
Used by SPDX producers to record the artifact’s gitoid: a canonical, unique, immutable identifier that can be used for software integrity verification.
Member

Obviously it is not only a "gitoid", but any content identifier.

It may be more appropriate to include an identifier tag in the syntax, as IANA does to qualify identifier types, similar to what SPDX does today: dns:www.foo.com SHA256:.......

Contributor

There are other content identifiers out there, as Alexios points out. Let's handle the broadening as a separate PR though, and move this forward. The original had gitoid in it, and is just restructured here.

Member

Suggested change
Used by SPDX producers to record the artifact’s gitoid: a canonical, unique, immutable identifier that can be used for software integrity verification.
Used to record a canonical, unique, immutable identifier of the artifact that can be used for verifying its identity and integrity.


Used by SPDX consumers to verify the identity and integrity of a software artifact they received.

## Description

The contentIdentifier provides a canonical, unique, immutable artifact identifier for each software artifact. SPDX 3.0 describes software artifacts as Snippet, File, or Package Elements. The ContentIdentifier can be calculated for any software artifact and can be recorded for any of these SPDX 3.0 Elements using Omnibor, an attempt to standardize how software artifacts are identified independent of which programming language, version control system, build tool, package manager, or software distribution mechanism is in use.
**For SPDX Producers:** The contentIdentifier is a canonical, unique, immutable artifact identifier for each software artifact. The ContentIdentifier for any software artifact can be calculated and recorded in SPDX 3.0 Snippet, File, or Package Elements. For additional information, see [OmniBOR](https://omnibor.io): an attempt to standardize how software artifacts are identified independent of which programming language, version control system, build tool, package manager, or software distribution mechanism is in use.

The contentIdentifier is defined as the [Git Object Identifier](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) (gitoid) of type `blob` of the software artifact. The use of a git-based version control system is not necessary to calculate a contentIdentifier for any software artifact.
Member

I disagree that contentIdentifier has to be a gitoid. We should leave it for any kind of unique identifier.

Member

Note that not all content identifiers support use in verification. We would also need to somehow specify which content identifier is used (e.g. the enumeration of supported content identifiers).

That appears similar to how SPDX handles this today, with the SHA* identifying tags accompanying a hash value in an SBOM.

Member

@zvr - Let's move forward with this definition and open a separate issue to include better support for SWHID after RC2 - just trying to get RC2 released.

Member

We can remove gitoid and leave generic "unique, immutable artifact identifier" or we can mention both gitoid and swhid that we already know of.

Member

I'm not objecting just because I don't want SWHIDs to be forgotten:
The most important part is, given that we know there might be more than one format, modeling it as a single value (of type URI) will not work. There has to be a type defined somewhere (maybe with a vocabulary), etc.

Collaborator

My recollection of the discussions on this topic from last year was that for v3.0 we would specify gitoid as the default, but that the structures and definitions would be generalized (as @zvr is suggesting above) to enable the addition of other possible types post v3.0 without breaking backward compatibility.

In other words, I agree that the definitions should be generalized and not locked to one approach.


@@ -18,7 +20,11 @@ The gitoid is expressed in the ContentIdentifier property by using the IANA [git
Scheme syntax: gitoid":"<git object type>":"<hash algorithm>":"<hash value>
```
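
To make the scheme syntax above concrete, here is a minimal Python sketch that splits a gitoid URI into its three components. The `GitOid` tuple, the `parse_gitoid` helper, and the restriction to lowercase hex `sha1` and `sha256` values are assumptions made for illustration; they are not taken from the SPDX or IANA text quoted in this diff.

```python
from typing import NamedTuple


class GitOid(NamedTuple):
    """Components of a gitoid URI (hypothetical helper type)."""
    object_type: str
    hash_algorithm: str
    hash_value: str


# Assumption: only these two algorithms are accepted, with hex digests
# of the usual lengths for SHA-1 and SHA-256.
_HEX_LENGTHS = {"sha1": 40, "sha256": 64}


def parse_gitoid(uri: str) -> GitOid:
    """Split 'gitoid:<object type>:<hash algorithm>:<hash value>'."""
    parts = uri.split(":")
    if len(parts) != 4 or parts[0] != "gitoid":
        raise ValueError(f"not a gitoid URI: {uri!r}")
    _, object_type, algorithm, value = parts

    expected_length = _HEX_LENGTHS.get(algorithm)
    if expected_length is None:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    if len(value) != expected_length or any(
        c not in "0123456789abcdef" for c in value
    ):
        raise ValueError(f"malformed {algorithm} hash value: {value!r}")

    return GitOid(object_type, algorithm, value)


# Example: the gitoid of an empty file hashed as a Git blob with SHA-1.
EMPTY_BLOB = "gitoid:blob:sha1:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
parsed = parse_gitoid(EMPTY_BLOB)
```

Keeping the hash algorithm as an explicit component is what lets a reader of the field tell which digest to recompute, which is the same concern the reviewers raise about tagging identifier types.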

The OmniBOR ID for the OmniBOR Document associated with a software artifact should not be recorded in this field. Rather, OmniBOR IDs should be recorded in the SPDX Element's ExternalIdentifier property. See [https://omnibor.io](https://omnibor.io) for more details.
The OmniBOR ID for the OmniBOR Document associated with a software artifact must NOT be recorded in this field. Rather, OmniBOR IDs should be recorded in the SPDX Element's ExternalIdentifier property. See [https://omnibor.io](https://omnibor.io) for more details.

**For SPDX Consumers:** The integrity of software objects can be verified by calculating the gitoid(s) (`git hash-object foo`) of the object(s) and comparing the results to the value stored in the SPDX contentIdentifier field. ContentIdentifiers are canonical: Omnibor specifies a reproducible algorithm for anyone with the software object to perform this calculation. ContentIdentifiers are unique: the gitoid value is the result of a specific implementation of a one-way hash function. If the calculated gitoid value is the same as the gitoid value stored in SPDX, you can be sure it’s the same software. ContentIdentifiers are immutable: if a software object changes the resulting contentIdentifier will differ. These properties enable the verification of software integrity between producer and consumer using SPDX.
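
As a rough illustration of the consumer-side check described in this paragraph, the following Python sketch recomputes a `blob` gitoid without invoking Git and compares it to a recorded value. It relies on the Git blob object format (the header `blob <size>\0` followed by the content); the function names `gitoid_blob` and `matches_content_identifier` are hypothetical, and support for only `sha1` and `sha256` is an assumption of this sketch.

```python
import hashlib


def gitoid_blob(data: bytes, algorithm: str = "sha1") -> str:
    """Return the gitoid URI of `data` hashed as a Git blob object."""
    # Assumption: only sha1 and sha256 are handled in this sketch.
    if algorithm not in ("sha1", "sha256"):
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    # Git hashes the header "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.new(algorithm, header + data).hexdigest()
    return f"gitoid:blob:{algorithm}:{digest}"


def matches_content_identifier(data: bytes, recorded: str) -> bool:
    """Check `data` against a recorded 'gitoid:blob:<alg>:<hash>' value."""
    parts = recorded.split(":")
    if len(parts) != 4 or parts[0] != "gitoid" or parts[1] != "blob":
        raise ValueError(f"not a blob gitoid: {recorded!r}")
    # Recompute with the algorithm named in the recorded identifier.
    return gitoid_blob(data, parts[2]) == recorded


# Same value that `git hash-object` reports for a file containing "hello\n".
recorded = "gitoid:blob:sha1:ce013625030ba8dba906f756967f9e9ca394464a"
unchanged = matches_content_identifier(b"hello\n", recorded)   # True
tampered = matches_content_identifier(b"hello!\n", recorded)   # False
```

In practice a consumer would read the artifact's bytes from disk (for example with `pathlib.Path(...).read_bytes()`) and take `recorded` from the SPDX contentIdentifier field.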



## Metadata
