Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add metadata parameter to CodeArtifact api #109

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

seschis
Copy link

@seschis seschis commented Nov 4, 2021

No description provided.

@seschis seschis force-pushed the add-prescan-data branch 2 times, most recently from 98a8bfd to a2ea4b7 Compare November 8, 2021 18:43
The Scan API has an optional "metadata" parameter that can be given
which will aid in creating rich sarif reporting content.  This rich
sarif reporting content will be used by the github sarif viewer to
provided better integration support by annotating the actual code in the
repo view with the vulnerability flow and information.

The changeset here adds support for using this extra parameter when
calling the CodeArtifact Client.

Related Tickets:
UC-559
Copy link
Contributor

@gilday gilday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly there, but I'd like to figure out a better API for users to pass the metadata to the SDK.

* @return new {@link CodeArtifactInner} from Contrast API
* @throws IOException when an IO error occurs while making the request to the Contrast API
* @throws UnauthorizedException when Contrast rejects the credentials used to send the request
* @throws ResourceNotFoundException when the requested resource does not exist
* @throws HttpResponseException when Contrast rejects this request with an error code
* @throws ServerResponseException when Contrast API returns a response that cannot be understood
*/
CodeArtifactInner upload(String projectId, Path file) throws IOException;
CodeArtifactInner upload(String projectId, Path file, Path metadata) throws IOException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API asks that the user provide the metadata JSON as a file. Maybe passing the metadata as a file is the only use case we have now (i.e. the Maven plugin), but it doesn't feel like the most flexible assumption for the SDK. I would expect the SDK to accept the metadata as a Java object that describes the data structure, and I would expect the SDK to marshal that object to JSON for me.

Are the fields in the metadata JSON object well known, or is this more of an open-ended bag of key-value pairs? If it's the former, then we should define a new class for it. If it's the latter, then maybe all we need is a Map.

Copy link
Author

@seschis seschis Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the format of the data is defined here: https://github.com/Contrast-Security-OSS/contrast-scan-prescan/blob/master/src/main/resources/schema/scan-input-metadata-schema-1.0.0.json

I don't anticipate it changing from json, but I can't say it never will. I also don't know how the content of the data will be required to change over time. The main purpose of the prescan metadata is to allow the engine to generate physical absolute paths to files in its sarif report which GitHub uses as part of preview renderings when displaying the sarif findings.
It also allows local sarif viewers, like the one in VSCode to naturally find the right file locally and show code annotated with the data flow path in the editor directly.

Copy link
Contributor

@gilday gilday Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. Knowing that has a well-defined schema, I feel more strongly that we should make a corresponding Java class to hold this data, but I don't fully understand what expectation you have for users who wan to use this API. How do they know how to generate this file?

@@ -62,6 +62,7 @@
*/
CodeArtifact upload(Path file, String name) throws IOException;

CodeArtifact upload(Path file, String name, Path metadata, String metaname) throws IOException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is metaname here? I don't think it's used.

Copy link
Author

@seschis seschis Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metaname is analagous to the name argument the way metadata is analogous to the file argument. As to if that is even needed, I'm not sure. I don't see why we'd give the user the capability to change the "filename" on the multipart upload... but I figured it was something I didn't understand so I just mirrored the behavior when allowing the prescan metadata to be added to the multipart file upload. It's your call if you want me to take it out as I don't have a valid argument for or against it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc the filename of the upload is reflected in the UI, and that's why we give the user the capability to set that. I don't see a use case for allowing the user to set the name of the metadata file. I'd argue we take that capability out. Also, I'm still questioning whether the metadata should be sent as a file vs as JSON, but I'll continue that discussion in another thread.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I see now. I can take it out. Before I do any work on it yet though I'll wait until we resolve the fundamental API design questions you have in this area.

Copy link
Contributor

@gilday gilday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of my remaining comments pertain to how we're sending the metadata. On one hand, the metadata is defined by a well-known JSON schema, so I feel that it should be represented by a Java class, and its representation on the wire as JSON should be abstracted from the SDK user. On the other hand, I'm seeing some evidence that the file is an opaque box to the user, and they should know or care that it's JSON.

I can't tell which is our intention. I think learning more about the API and the way that this file is generated would help me understand.

+ '"'
+ CRLF
+ "Content-Type: "
+ determineMime(metadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we determine the MIME this way if we always know that it is JSON?

@@ -52,6 +52,11 @@ public String filename() {
return inner.filename();
}

@Override
public String metadata() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the metadata is a String, does that imply that the SDK user is not intended to parse this structured data; rather, they should treat it as an opaque box? Specifically, is this JSON-encoded JSON inside this JSON metadata property and it's not meant to be decoded?

Copy link
Author

@seschis seschis Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I always considered the generation of prescan data something that was external to the SDK.... but it probably would be more user-friendly to allow the SDK user to call a function that generated prescan data for them.... or perhaps just do it transparently.
In either case, it seems like an opaque thing to the SDK user.

@@ -44,6 +45,10 @@ static Builder builder() {
/** @return filename */
abstract String filename();

/** @return metadata filename */
@Nullable
abstract String metadata();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply that the user can retrieve the metadata filename, but not the metadata? What would the user do with the filename?

@@ -62,6 +62,7 @@
*/
CodeArtifact upload(Path file, String name) throws IOException;

CodeArtifact upload(Path file, String name, Path metadata, String metaname) throws IOException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc the filename of the upload is reflected in the UI, and that's why we give the user the capability to set that. I don't see a use case for allowing the user to set the name of the metadata file. I'd argue we take that capability out. Also, I'm still questioning whether the metadata should be sent as a file vs as JSON, but I'll continue that discussion in another thread.

metadata != null
? boundaryMarker
+ CRLF
+ "Content-Disposition: form-data; name=\"metadata\"; filename=\""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we know this form is of type application/json, shouldn't we send it as such instead of sending it as a file? This relates to my question in CodeArtifactClient which is "how does the user know how to build this file?"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this gets back to the API design of the REST endpoint which I did not design. Let me see if I can provide some clarity on a few of your questions for trying to figure the best way to proceed.

The JSON spec I linked to (https://github.com/Contrast-Security-OSS/contrast-scan-prescan/blob/master/src/main/resources/schema/scan-input-metadata-schema-1.0.0.json) is an ingest specification for the scanner engine itself. It's saying, that if you want to provide the scanner engine some "prescan data" this is the format it will accept it in. Its published this way because I just wanted a specification for the input format rather than telling someone, "go read the code", and I also wanted to use code generators like jsonschema2pojo, to automatically generate the libraries to handle the data models of the scanner's prescan ingest format.

  • The scanner makes no promises that this format will always be supported, though we will try to keep it backwards compat best as possible, in all future versions of the scanner engine. The current format represents the simplest format to satisfy the current requirements given the current knowledge of the domain, which we know is not complete.
  • The REST API in front of the scanner engine doesn't do anything with this data specifically and merely passes it along to the scanner engine. From its point of view, it's opaque.
  • If the REST api wanted to validate the data before passing to the engine since it knows what engine version it bundles, its free to do that with the publish json schema, but the engine makes no assumptions that will occur.
  • For "contrast tools" created to generate prescan data as part of the scan request, they have a well defined json-schema to use to get the right data to the engine. But once again, the engine makes no assumptions that it is guarded behind a REST api or anything thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For "contrast tools" created to generate prescan data as part of the scan request

Is this SDK an example of such a "contrast tool", or does the SDK user have to use another tool to generate the file first before they can use the SDK to include in their new code artifacts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants