Rename s3 sink object metadata config options #5041
public class PredefinedObjectMetadata {
@JsonProperty("number_of_objects")
private String numberOfObjects;
public class ObjectMetadata {
I'm not sure I understand the purpose or context for this change.
@@ -43,8 +43,8 @@ public class S3SinkConfig {
@JsonProperty("bucket_selector")
private PluginModel bucketSelector;

@JsonProperty("predefined_object_metadata")
private PredefinedObjectMetadata predefinedObjectMetadata;
@JsonProperty("object_metadata")
I think this is locking our configuration in a way that may prevent future useful expansion.
Customers may want to add other S3 object metadata that is not here. I tend to think this should be inverted.
For example, the user might want a static value named pipeline_name and a more dynamic value like number_of_events.
object_metadata:
  my_pipeline_name: pipeline-123
  my_event_count: ${numberOfEvents}
I think Data Prepper does currently lack a good, consistent way for plugins to provide expressions that are specific to that plugin. But, we could follow the pattern used elsewhere where we look specifically for this string. It allows us to extend this in the future.
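To make the inverted configuration concrete, here is a minimal sketch of how user-defined keys could be resolved at write time. It assumes plain string substitution of the `${numberOfEvents}` placeholder from the example; the class name `ObjectMetadataResolver` and its `resolve` method are hypothetical and are not part of Data Prepper or of this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not an existing Data Prepper class.
final class ObjectMetadataResolver {
    // Placeholder string taken from the example config; assumed to be matched literally.
    private static final String NUMBER_OF_EVENTS = "${numberOfEvents}";

    private final Map<String, String> configuredMetadata;

    ObjectMetadataResolver(final Map<String, String> configuredMetadata) {
        this.configuredMetadata = configuredMetadata;
    }

    // Returns a copy of the configured metadata with the placeholder
    // replaced by the event count of the object being written.
    Map<String, String> resolve(final int eventCount) {
        final Map<String, String> resolved = new LinkedHashMap<>();
        for (final Map.Entry<String, String> entry : configuredMetadata.entrySet()) {
            resolved.put(entry.getKey(),
                    entry.getValue().replace(NUMBER_OF_EVENTS, Integer.toString(eventCount)));
        }
        return resolved;
    }
}
```

With this shape, static values such as `my_pipeline_name` pass through unchanged, and only values containing the placeholder are rewritten, so new placeholders could be added later without changing the configuration schema.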
@dlvenable I do not think it will prevent future "dynamic" metadata. Once we add support for dynamic (expression-based) metadata, the old-style metadata can be deprecated. I am not sure there is any easy way to add dynamic metadata now.
The problem I see is that object_metadata is the place to put dynamic metadata.
I say we keep what we have and then improve it later with the dynamic approach.
@@ -43,6 +43,6 @@ public int hashCode() {

public Map<String, Object> getGroupIdentifierHash() { return groupIdentifierHash; }

public Map<String, String> getMetadata(int eventCount) { return predefinedObjectMetadata != null ? Map.of(predefinedObjectMetadata.getNumberOfObjects(), Integer.toString(eventCount)) : null; }
public Map<String, String> getMetadata(int eventCount) { return objectMetadata != null ? Map.of(objectMetadata.getNumberOfEventsKey(), Integer.toString(eventCount)) : null; }
This approach is coupling the metadata with the group itself. We should have a more extensible approach that allows for getting the metadata elsewhere.
Also, this design is intrinsically connected to the count. But, the metadata may be more than just this.
I think having a class to get metadata for any given S3 Object write would make more sense.
We should have a more extensible approach. We can do that in the future. The current approach is NOT connected to the count. getMetadata() returns a map, which can hold more than the count. Currently, it is just a count.
The S3 PutObjectRequest API takes metadata as a map. The ObjectMetadata class is already being added in this PR. We can give it an API that populates and returns the map instead of creating the map outside the class. I think this can be done in a future PR.
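As a rough illustration of that follow-up, the sketch below moves map construction into the metadata class so that the group identifier no longer builds the map itself. Only `getNumberOfEventsKey()` appears in the diff above; the class name `ObjectMetadataSketch` and the method `toMetadataMap` are assumptions made for this example, not the PR's actual API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Stand-in for the ObjectMetadata config class; names other than
// getNumberOfEventsKey() are hypothetical.
final class ObjectMetadataSketch {
    private final String numberOfEventsKey;

    ObjectMetadataSketch(final String numberOfEventsKey) {
        this.numberOfEventsKey = numberOfEventsKey;
    }

    String getNumberOfEventsKey() {
        return numberOfEventsKey;
    }

    // Hypothetical method: builds the metadata map for one object write.
    // Further entries could be added here without touching the callers.
    Map<String, String> toMetadataMap(final int eventCount) {
        final Map<String, String> metadata = new HashMap<>();
        if (numberOfEventsKey != null) {
            metadata.put(numberOfEventsKey, Integer.toString(eventCount));
        }
        return Collections.unmodifiableMap(metadata);
    }
}
```

The returned `Map<String, String>` has the type that the request's metadata field expects, so a caller could pass it along as-is; returning an empty map rather than `null` when no key is configured is a design choice of this sketch, not behavior taken from the PR.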
@@ -43,8 +43,8 @@ public class S3SinkConfig {
@JsonProperty("bucket_selector")
private PluginModel bucketSelector;

@JsonProperty("predefined_object_metadata")
private PredefinedObjectMetadata predefinedObjectMetadata;
@JsonProperty("object_metadata")
This configuration was already released in Data Prepper 2.9, so this is a breaking change.
@dlvenable It is a breaking change, but no one is using it.
I can't say whether that is the case or not.
…e old config Signed-off-by: Kondaka <[email protected]>
8801b7d to 7144ea0
@@ -142,8 +145,8 @@ public ObjectKeyOptions getObjectKeyOptions() {
return objectKeyOptions;
}

public PredefinedObjectMetadata getPredefinedObjectMetadata() {
return predefinedObjectMetadata;
public ObjectMetadata getObjectMetadata() {
I think you need to return either the predefined or the new one here.
Also, add an @AssertTrue to be sure they are not both set by the user.
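A minimal sketch of that suggestion follows, assuming both options are kept side by side during the deprecation period. To stay self-contained it uses plain strings in place of the two config objects and omits the Bean Validation dependency; in the real config class, the `is...` method would carry the `@AssertTrue` annotation. The class and method names here are hypothetical.

```java
// Hypothetical stand-in for S3SinkConfig; each String field represents
// whether the corresponding config block was set by the user.
final class S3SinkMetadataConfigSketch {
    private final String predefinedObjectMetadata; // predefined_object_metadata (old option)
    private final String objectMetadata;           // object_metadata (new option)

    S3SinkMetadataConfigSketch(final String predefinedObjectMetadata, final String objectMetadata) {
        this.predefinedObjectMetadata = predefinedObjectMetadata;
        this.objectMetadata = objectMetadata;
    }

    // In the actual config class this method would be annotated with @AssertTrue
    // so that validation fails when the user sets both options.
    boolean isObjectMetadataConfiguredAtMostOnce() {
        return predefinedObjectMetadata == null || objectMetadata == null;
    }

    // Returns the new option when present, otherwise falls back to the old one.
    String getEffectiveObjectMetadata() {
        return objectMetadata != null ? objectMetadata : predefinedObjectMetadata;
    }
}
```

Falling back to the old option in the getter keeps pipelines written against Data Prepper 2.9 working, while the validation method rejects ambiguous configurations up front.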
Signed-off-by: Kondaka <[email protected]>
@@ -70,6 +70,6 @@ public S3GroupIdentifier getS3GroupIdentifierForEvent(final Event event) {
}


return new S3GroupIdentifier(groupIdentificationHash, fullObjectKey, s3SinkConfig.getPredefinedObjectMetadata(), fullBucketName);
return new S3GroupIdentifier(groupIdentificationHash, fullObjectKey, s3SinkConfig.getObjectMetadataConfig(), fullBucketName);
With this change, this predefinedObjectMetadata option is no longer used anywhere. So even if users configure this predefined_object_metadata option, it will not be used. Is this expected, or am I missing something?
Since the option predefined_object_metadata was already released in Data Prepper 2.9, David did not want to just rename it. So, we will deprecate predefined_object_metadata in the future.
* Addressed review comments. Introduced a new config, will deprecate the old config
  Signed-off-by: Kondaka <[email protected]>
* Addressed review comments. Introduced a new config for metadata
  Signed-off-by: Kondaka <[email protected]>
* Addressed review comments. Created a separate class for object metadata
  Signed-off-by: Kondaka <[email protected]>
* Addressed review comments.
  Signed-off-by: Kondaka <[email protected]>
* Fixed indentation
  Signed-off-by: Kondaka <[email protected]>
---------
Signed-off-by: Kondaka <[email protected]>
(cherry picked from commit 8216fdc)
Co-authored-by: Krishna Kondaka <[email protected]>
* Addressed review comments. Introduced a new config, will deprecate the old config
  Signed-off-by: Kondaka <[email protected]>
* Addressed review comments. Introduced a new config for metadata
  Signed-off-by: Kondaka <[email protected]>
* Addressed review comments. Created a separate class for object metadata
  Signed-off-by: Kondaka <[email protected]>
* Addressed review comments.
  Signed-off-by: Kondaka <[email protected]>
* Fixed indentation
  Signed-off-by: Kondaka <[email protected]>
---------
Signed-off-by: Kondaka <[email protected]>
Description
Renaming S3 sink config options
Issues Resolved
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.