Rename s3 sink object metadata config options #5041

Merged
kkondaka merged 5 commits into opensearch-project:main from s3-sink-option-rename on Oct 14, 2024

kkondaka
Collaborator

Description

Renaming S3 sink config options

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • [X] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

public class PredefinedObjectMetadata {
@JsonProperty("number_of_objects")
private String numberOfObjects;
public class ObjectMetadata {
Member

I'm not sure I understand the purpose or context for this change

@@ -43,8 +43,8 @@ public class S3SinkConfig {
@JsonProperty("bucket_selector")
private PluginModel bucketSelector;

- @JsonProperty("predefined_object_metadata")
- private PredefinedObjectMetadata predefinedObjectMetadata;
+ @JsonProperty("object_metadata")
Member

I think this is locking our configuration in a way that may prevent future useful expansion.

Customers may want to add other S3 object metadata that is not here. I tend to think this should be inverted.

For example, the user might want a static value named pipeline_name and a more dynamic value like number_of_events.

object_metadata:
  my_pipeline_name: pipeline-123
  my_event_count: ${numberOfEvents}

I think Data Prepper does currently lack a good, consistent way for plugins to provide expressions that are specific to that plugin. But, we could follow the pattern used elsewhere where we look specifically for this string. It allows us to extend this in the future.
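
To make the inverted layout concrete, here is a minimal sketch of how a sink could resolve such a map at write time. It assumes the sink looks only for the literal string `${numberOfEvents}`, as in the example above; the class `ObjectMetadataTemplate` and its method names are hypothetical and are not part of Data Prepper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Data Prepper): resolves the literal
 * placeholder ${numberOfEvents} in user-supplied S3 object metadata values.
 */
final class ObjectMetadataTemplate {
    // The only placeholder recognized in this sketch; the name comes from the example above.
    static final String NUMBER_OF_EVENTS = "${numberOfEvents}";

    private final Map<String, String> configured;

    ObjectMetadataTemplate(final Map<String, String> configured) {
        // Copy to keep the configured key order and to guard against later mutation.
        this.configured = new LinkedHashMap<>(configured);
    }

    /** Returns the metadata for one object write, with the event count substituted. */
    Map<String, String> resolve(final int eventCount) {
        final Map<String, String> resolved = new LinkedHashMap<>();
        for (final Map.Entry<String, String> entry : configured.entrySet()) {
            // String.replace performs literal (non-regex) replacement, so "$" and "{" need no escaping.
            resolved.put(entry.getKey(),
                    entry.getValue().replace(NUMBER_OF_EVENTS, Integer.toString(eventCount)));
        }
        return resolved;
    }
}
```

Static values such as `pipeline-123` pass through unchanged, and supporting another placeholder later would mean adding one more replacement rather than one more config option.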

Collaborator Author

@dlvenable I do not think it will prevent future "dynamic" metadata. Once we add support for dynamic (expression-based) metadata, the old-style metadata can be deprecated. I am not sure there is an easy way to add dynamic metadata now.

Member

The problem I see is that object_metadata is the place to put dynamic metadata.

I say we keep what we have and then improve it later with the dynamic approach.

@@ -43,6 +43,6 @@ public int hashCode() {

public Map<String, Object> getGroupIdentifierHash() { return groupIdentifierHash; }

- public Map<String, String> getMetadata(int eventCount) { return predefinedObjectMetadata != null ? Map.of(predefinedObjectMetadata.getNumberOfObjects(), Integer.toString(eventCount)) : null; }
+ public Map<String, String> getMetadata(int eventCount) { return objectMetadata != null ? Map.of(objectMetadata.getNumberOfEventsKey(), Integer.toString(eventCount)) : null; }
Member

This approach is coupling the metadata with the group itself. We should have a more extensible approach that allows for getting the metadata elsewhere.

Also, this design is intrinsically connected to the count. But, the metadata may be more than just this.

I think having a class to get metadata for any given S3 Object write would make more sense.
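
One way to picture such a class is the sketch below, which separates metadata generation from the group identifier. Every name in it (`ObjectWriteContext`, `ObjectMetadataProvider`, `EventCountMetadataProvider`, `CompositeMetadataProvider`) is hypothetical and only illustrates the idea; none of them is taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical description of one S3 object write; the fields are illustrative only. */
final class ObjectWriteContext {
    private final String objectKey;
    private final int eventCount;

    ObjectWriteContext(final String objectKey, final int eventCount) {
        this.objectKey = objectKey;
        this.eventCount = eventCount;
    }

    String getObjectKey() { return objectKey; }
    int getEventCount() { return eventCount; }
}

/** Hypothetical extension point: supplies metadata for any given S3 object write. */
interface ObjectMetadataProvider {
    Map<String, String> metadataFor(ObjectWriteContext context);
}

/** Provider mirroring the getMetadata(int) shown in the diff: the event count under a configured key. */
final class EventCountMetadataProvider implements ObjectMetadataProvider {
    private final String numberOfEventsKey;

    EventCountMetadataProvider(final String numberOfEventsKey) {
        this.numberOfEventsKey = numberOfEventsKey;
    }

    @Override
    public Map<String, String> metadataFor(final ObjectWriteContext context) {
        return Map.of(numberOfEventsKey, Integer.toString(context.getEventCount()));
    }
}

/** Combines providers; on a key collision, the entry from the later provider wins. */
final class CompositeMetadataProvider implements ObjectMetadataProvider {
    private final List<ObjectMetadataProvider> providers;

    CompositeMetadataProvider(final List<ObjectMetadataProvider> providers) {
        this.providers = List.copyOf(providers);
    }

    @Override
    public Map<String, String> metadataFor(final ObjectWriteContext context) {
        final Map<String, String> merged = new LinkedHashMap<>();
        for (final ObjectMetadataProvider provider : providers) {
            merged.putAll(provider.metadataFor(context));
        }
        return merged;
    }
}
```

With this shape, the group identifier no longer needs to know about metadata at all, and new kinds of metadata become new providers rather than new parameters on `getMetadata`.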

Collaborator Author

We should have a more extensible approach, and we can do that in the future. The current approach is NOT connected to the count: getMetadata() returns a map, which can hold more than the count. Currently, it holds just the count.

Collaborator Author

The S3 PutObjectRequest API takes metadata as a Map. The ObjectMetadata class is already being added in this PR. We can give it an API that populates and returns the Map instead of creating the map outside the class. I think this can be done in a future PR.

@@ -43,8 +43,8 @@ public class S3SinkConfig {
@JsonProperty("bucket_selector")
private PluginModel bucketSelector;

- @JsonProperty("predefined_object_metadata")
- private PredefinedObjectMetadata predefinedObjectMetadata;
+ @JsonProperty("object_metadata")
Member

This configuration was already released in Data Prepper 2.9, so this is a breaking change.

Collaborator Author

@dlvenable It is a breaking change, but no one is using it.

Member

I can't say whether that is the case.

@dlvenable dlvenable added this to the v2.10 milestone Oct 11, 2024
@kkondaka kkondaka force-pushed the s3-sink-option-rename branch from 8801b7d to 7144ea0 Compare October 11, 2024 17:08
@@ -142,8 +145,8 @@ public ObjectKeyOptions getObjectKeyOptions() {
return objectKeyOptions;
}

- public PredefinedObjectMetadata getPredefinedObjectMetadata() {
- return predefinedObjectMetadata;
+ public ObjectMetadata getObjectMetadata() {
Member

I think you need to return either the predefined or the new one here.

Also, add an @AssertTrue to be sure they are not both set by the user.
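
The two suggestions above could look roughly like the following sketch. It is deliberately simplified: `S3SinkMetadataOptionsSketch` is a hypothetical stand-in for the real config class, each option is reduced to the single key name it carries, and the validation annotation is mentioned only in a comment so that the example has no dependencies.

```java
/**
 * Hypothetical, simplified stand-in for the sink config. Each option is reduced to
 * the single key name it carries; the real options are objects, not strings.
 */
final class S3SinkMetadataOptionsSketch {
    // Key configured under the already released predefined_object_metadata option (may be null).
    private final String predefinedNumberOfObjectsKey;
    // Key configured under the new object_metadata option (may be null).
    private final String numberOfEventsKey;

    S3SinkMetadataOptionsSketch(final String predefinedNumberOfObjectsKey,
                                final String numberOfEventsKey) {
        this.predefinedNumberOfObjectsKey = predefinedNumberOfObjectsKey;
        this.numberOfEventsKey = numberOfEventsKey;
    }

    // In a real config class this method could carry Bean Validation's @AssertTrue
    // (for example jakarta.validation.constraints.AssertTrue), so that a pipeline
    // which sets both options fails validation. The annotation is omitted here to
    // keep the sketch dependency-free.
    boolean isAtMostOneMetadataOptionSet() {
        return predefinedNumberOfObjectsKey == null || numberOfEventsKey == null;
    }

    /** Returns the key from the new option if set, otherwise the key from the old option, otherwise null. */
    String getEffectiveNumberOfEventsKey() {
        return numberOfEventsKey != null ? numberOfEventsKey : predefinedNumberOfObjectsKey;
    }
}
```

Falling back to the old option keeps pipelines written for Data Prepper 2.9 working, while the validity check rejects the ambiguous case where both options are present.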

@@ -70,6 +70,6 @@ public S3GroupIdentifier getS3GroupIdentifierForEvent(final Event event) {
}


- return new S3GroupIdentifier(groupIdentificationHash, fullObjectKey, s3SinkConfig.getPredefinedObjectMetadata(), fullBucketName);
+ return new S3GroupIdentifier(groupIdentificationHash, fullObjectKey, s3SinkConfig.getObjectMetadataConfig(), fullBucketName);
Collaborator

@oeyh oeyh Oct 14, 2024

With this change, this predefinedObjectMetadata option is no longer used anywhere. So even if users configure this predefined_object_metadata option, it will not be used. Is this expected, or am I missing something?

Collaborator Author

Since the option predefined_object_metadata was already released in Data Prepper 2.9, David did not want to just rename it. So, we will deprecate predefined_object_metadata in the future.

@kkondaka kkondaka merged commit 8216fdc into opensearch-project:main Oct 14, 2024
49 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 14, 2024
* Addressed review comments. Introduced a new config, will deprecate the old config

Signed-off-by: Kondaka <[email protected]>

* Addressed review comments. Introduced a new config for metadata

Signed-off-by: Kondaka <[email protected]>

* Addressed review comments. Created a separate class for object metadata

Signed-off-by: Kondaka <[email protected]>

* Addressed review comments.

Signed-off-by: Kondaka <[email protected]>

* Fixed indentation

Signed-off-by: Kondaka <[email protected]>

---------

Signed-off-by: Kondaka <[email protected]>
(cherry picked from commit 8216fdc)
kkondaka added a commit that referenced this pull request Oct 15, 2024
Co-authored-by: Krishna Kondaka <[email protected]>
san81 pushed a commit to san81/data-prepper that referenced this pull request Oct 17, 2024