Write Kafka buffer data in a consistent format with metadata #3620

Closed
dlvenable opened this issue Nov 9, 2023 · 1 comment
Labels: enhancement (New feature or request), plugin - buffer (A plugin for buffering incoming data)

Comments

@dlvenable
Member

dlvenable commented Nov 9, 2023

Is your feature request related to a problem? Please describe.

The Kafka buffer supports data encryption using envelope encryption. In that case, Data Prepper should be able to write the encrypted data key alongside the event data.

Additionally, the kafka buffer only supports input from byte[]. With additional metadata we could track whether it was serialized using bytes or as JSON from the Event model.
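To illustrate why tracking the serialization format helps, here is a minimal Python sketch of a reader that dispatches on a format tag. The `MessageFormat` values mirror the enum proposed in this issue; the `restore` function and its return types are hypothetical and are not Data Prepper APIs.

```python
import json
from enum import IntEnum
from typing import Any


class MessageFormat(IntEnum):
    # Values mirror the MessageFormat enum proposed in this issue.
    UNSPECIFIED = 0
    BYTES = 1
    JSON = 2


def restore(message_format: int, data: bytes) -> Any:
    """Hypothetical reader: choose how to interpret data from its format tag."""
    if message_format == MessageFormat.JSON:
        # The payload was written as JSON from the Event model.
        return json.loads(data.decode("utf-8"))
    if message_format == MessageFormat.BYTES:
        # The payload is raw bytes and is passed through untouched.
        return data
    raise ValueError(f"unsupported message format: {message_format}")
```

Without the tag, a reader would have to guess whether a given payload is raw bytes or serialized JSON.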

Describe the solution you'd like

Update the kafka buffer to write and read using an internal binary protocol. This can include metadata that may not always be serialized along with an event. This would include the encrypted data key.

enum MessageFormat {
  MESSAGE_FORMAT_UNSPECIFIED = 0;
  MESSAGE_FORMAT_BYTES = 1;
  MESSAGE_FORMAT_JSON = 2;
}

message BufferedData {
  /* The format of the message as it was written.
   */
  MessageFormat message_format = 1;

  /* The actual data. This is encrypted if the encrypted field is true.
   * Otherwise, it is unencrypted data.
   */
  bytes data = 2;

  /* Indicates if data is encrypted or not.
   */
  optional bool encrypted = 3;

  /* The data key which encrypted the data field. This will be encrypted.
   * The consuming Data Prepper node must have the ability to decrypt this key.
   */
  optional bytes encrypted_data_key = 4;
}
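
As a rough illustration of what a `BufferedData` record would look like on the wire, the following Python sketch hand-encodes and decodes the four fields using the standard Protobuf wire format (varint and length-delimited fields). It is only a sketch: a real implementation would use classes generated by `protoc`, and the function names here are hypothetical.

```python
from typing import Dict, Optional, Tuple

VARINT, LEN = 0, 2  # Protobuf wire types used by this message


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint, low 7 bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_buffered_data(message_format: int, data: bytes,
                         encrypted: Optional[bool] = None,
                         encrypted_data_key: Optional[bytes] = None) -> bytes:
    out = bytearray()
    if message_format != 0:  # default values are omitted on the wire
        out += tag(1, VARINT) + encode_varint(message_format)
    if data:
        out += tag(2, LEN) + encode_varint(len(data)) + data
    if encrypted is not None:  # optional field: written only when set
        out += tag(3, VARINT) + encode_varint(1 if encrypted else 0)
    if encrypted_data_key is not None:
        out += tag(4, LEN) + encode_varint(len(encrypted_data_key)) + encrypted_data_key
    return bytes(out)


def decode_buffered_data(buf: bytes) -> Dict[str, object]:
    msg: Dict[str, object] = {"message_format": 0, "data": b"",
                              "encrypted": None, "encrypted_data_key": None}
    names = {1: "message_format", 2: "data", 3: "encrypted", 4: "encrypted_data_key"}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == LEN:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire_type}")
        if number in names:
            msg[names[number]] = bool(value) if number == 3 else value
    return msg
```

For example, `encode_buffered_data(2, b'{"a":1}')` yields the four header bytes `08 02 12 07` followed by the seven payload bytes, so the metadata costs only a few bytes per record.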

Describe alternatives you've considered (Optional)

I chose Protobuf for the data format in this design.

There are possible alternatives.

Avro is compact like Protobuf. To support changes in the format over time, each Avro record would need a schema ID attached to it, which would be resolved through a schema registry. Using Avro would thus require a schema registry, which adds complexity to the overall architecture.

Protobuf supports changes to the schema over time through field numbers. Each field number is embedded in the binary data, and the matching schema is compiled into Data Prepper, so no external registry is needed.
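
The following Python sketch shows, under simplifying assumptions, how field numbers allow the format to evolve: a reader that only knows fields 1 and 2 can still parse a message from a newer writer by skipping the field numbers it does not recognize. Field 5 is a hypothetical future field used purely for illustration, and `read_known_fields` is not a real Protobuf or Data Prepper API.

```python
from typing import Dict, List, Set, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position) for the base-128 varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_known_fields(buf: bytes, known: Set[int]
                      ) -> Tuple[Dict[int, Union[int, bytes]], List[int]]:
    """Parse varint and length-delimited fields, keeping only known field numbers."""
    fields: Dict[int, Union[int, bytes]] = {}
    skipped: List[int] = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[number] = value
        else:
            skipped.append(number)  # unknown field: consumed but ignored
    return fields, skipped


# A message from a hypothetical newer writer: field 1 = 1, field 2 = b"abc",
# plus an extra varint field 5 = 42 that an older reader has never seen.
NEWER_MESSAGE = bytes([0x08, 0x01, 0x12, 0x03]) + b"abc" + bytes([0x28, 0x2A])

fields, skipped = read_known_fields(NEWER_MESSAGE, known={1, 2})
```

Because each field carries its own number and wire type, the older reader can step over field 5 without any schema lookup.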

I think Protobuf is preferable to other binary formats such as Thrift because Data Prepper is already making significant use of Protobuf for OTel data.

Non-binary formats are not considered because they would require base64 encoding the binary data that is embedded.
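
To make the base64 cost concrete, here is a small Python sketch that wraps binary data in a JSON document. The envelope shape and the `"data"` field name are assumptions for illustration only.

```python
import base64
import json


def json_envelope(data: bytes) -> str:
    """Wrap binary data in JSON; base64 is needed because JSON strings are text."""
    return json.dumps({"data": base64.b64encode(data).decode("ascii")})


def unwrap(doc: str) -> bytes:
    """Recover the original bytes from the JSON envelope."""
    return base64.b64decode(json.loads(doc)["data"])


# 768 arbitrary bytes, including values that are not valid UTF-8 text.
payload = bytes(range(256)) * 3

# base64 emits 4 output characters for every 3 input bytes.
encoded_len = len(base64.b64encode(payload))
```

Here the 768-byte payload grows to 1024 base64 characters before any JSON framing is added, roughly a one-third size increase that a binary format avoids.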

@dlvenable dlvenable added enhancement New feature or request plugin - buffer A plugin for buffering incoming data labels Nov 9, 2023
@dlvenable dlvenable added this to the v2.6 milestone Nov 9, 2023
@dlvenable dlvenable self-assigned this Nov 9, 2023
dlvenable added a commit to dlvenable/data-prepper that referenced this issue Nov 11, 2023
… topic is wrapped in this and then parsed back into this. Contributes toward opensearch-project#3620.

Signed-off-by: David Venable <[email protected]>
dlvenable added a commit to dlvenable/data-prepper that referenced this issue Nov 13, 2023
… topic is wrapped in this and then parsed back into this. Contributes toward opensearch-project#3620.

Signed-off-by: David Venable <[email protected]>
dlvenable added a commit that referenced this issue Nov 13, 2023
Adds a Protobuf buffer message for the Kafka buffer. Data sent to the topic is wrapped in this and then parsed back into this. Contributes toward #3620.

Correct the Kafka buffer tests to test correctly as bytes, adds bytes tests, fixes some serialization issues with the Kafka buffer.

Signed-off-by: David Venable <[email protected]>
@dlvenable
Member Author

The basic work for this is complete as of #3635. I've created #3655 to track support for the encrypted data key.

@github-project-automation github-project-automation bot moved this from Unplanned to Done in Data Prepper Tracking Board Nov 14, 2023