Is your feature request related to a problem? Please describe.
The kafka buffer supports data encryption with envelope encryption. When encryption is enabled, Data Prepper should be able to write the event data alongside the encrypted data key.
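For context, envelope encryption encrypts each payload with a one-time data key and then encrypts (wraps) that data key with a master key, so the two travel together. Below is a minimal sketch using the JDK's javax.crypto; the AES algorithm choice and the class and method names here are illustrative assumptions, not Data Prepper's actual implementation.

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;

    final class EnvelopeEncryptionSketch {

        /** Holds the two values that must travel together to the topic. */
        static final class Envelope {
            final byte[] encryptedData;
            final byte[] encryptedDataKey;
            Envelope(final byte[] encryptedData, final byte[] encryptedDataKey) {
                this.encryptedData = encryptedData;
                this.encryptedDataKey = encryptedDataKey;
            }
        }

        static Envelope encrypt(final byte[] eventBytes, final SecretKey masterKey) throws Exception {
            // Generate a fresh one-time data key for this payload.
            final KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
            keyGenerator.init(256);
            final SecretKey dataKey = keyGenerator.generateKey();

            // Encrypt the event data with the data key.
            // (Cipher mode and padding choices are elided for brevity.)
            final Cipher dataCipher = Cipher.getInstance("AES");
            dataCipher.init(Cipher.ENCRYPT_MODE, dataKey);
            final byte[] encryptedData = dataCipher.doFinal(eventBytes);

            // Wrap (encrypt) the data key with the master key. Only a
            // consumer holding the master key can unwrap it and decrypt.
            final Cipher keyCipher = Cipher.getInstance("AES");
            keyCipher.init(Cipher.WRAP_MODE, masterKey);
            final byte[] encryptedDataKey = keyCipher.wrap(dataKey);

            return new Envelope(encryptedData, encryptedDataKey);
        }
    }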
Additionally, the kafka buffer only supports byte[] input. With additional metadata, we could track whether the data was serialized as raw bytes or as JSON from the Event model.
Describe the solution you'd like
Update the kafka buffer to write and read using an internal binary protocol. This protocol can carry metadata that may not always be serialized along with an event, such as the encrypted data key. A proposed schema:
enum MessageFormat {
  MESSAGE_FORMAT_UNSPECIFIED = 0;
  MESSAGE_FORMAT_BYTES = 1;
  MESSAGE_FORMAT_JSON = 2;
}

message BufferedData {
  /* The format of the message as it was written. */
  MessageFormat message_format = 1;

  /* The actual data. This is encrypted if encrypted_data_key is
   * present. Otherwise, it is unencrypted data. */
  bytes data = 2;

  /* Indicates whether the data field is encrypted. */
  optional bool encrypted = 3;

  /* The data key which encrypted the data field. This key is itself
   * encrypted. The consuming Data Prepper node must be able to decrypt
   * this key. */
  optional bytes encrypted_data_key = 4;
}
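To illustrate how the buffer could use this schema, here is a rough sketch assuming the BufferedData and MessageFormat classes that protobuf-java would generate from the definition above; the wrap/unwrap names and the surrounding class are hypothetical, not proposed API.

    import com.google.protobuf.ByteString;
    import com.google.protobuf.InvalidProtocolBufferException;

    final class KafkaBufferProtocolSketch {

        // Producer side: wrap the (possibly encrypted) payload and its
        // encrypted data key before writing the bytes to the topic.
        static byte[] wrap(final byte[] payload, final byte[] encryptedDataKey, final boolean isJson) {
            final BufferedData.Builder builder = BufferedData.newBuilder()
                    .setMessageFormat(isJson
                            ? MessageFormat.MESSAGE_FORMAT_JSON
                            : MessageFormat.MESSAGE_FORMAT_BYTES)
                    .setData(ByteString.copyFrom(payload));
            if (encryptedDataKey != null) {
                builder.setEncrypted(true)
                        .setEncryptedDataKey(ByteString.copyFrom(encryptedDataKey));
            }
            return builder.build().toByteArray();
        }

        // Consumer side: parse the envelope, decrypt if needed, and use
        // message_format to decide how to deserialize the inner data.
        static byte[] unwrap(final byte[] recordValue) throws InvalidProtocolBufferException {
            final BufferedData bufferedData = BufferedData.parseFrom(recordValue);
            final byte[] data = bufferedData.getData().toByteArray();
            if (bufferedData.hasEncryptedDataKey()) {
                // Decrypt getEncryptedDataKey() with the master key, then
                // decrypt the data with the recovered data key (omitted).
            }
            // bufferedData.getMessageFormat() tells the caller whether the
            // bytes are raw or JSON from the Event model.
            return data;
        }
    }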
Describe alternatives you've considered (Optional)
I chose Protobuf for the data format in this design, but there are possible alternatives.
Avro is compact like Protobuf. However, to support changes to the format over time, each Avro record would need a schema ID attached to it, which is typically done with a schema registry. Using Avro would thus require a schema registry, which adds complexity to the overall architecture.
Protobuf supports changing the schema over time through field numbers, which are embedded in the binary data and compiled into Data Prepper, so no external registry is needed.
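For example (purely hypothetical), a later revision could add a field such as optional string topic_name = 5; to BufferedData without any registry coordination, because decoding is keyed on field numbers rather than on a schema ID traveling with each record:

    import com.google.protobuf.InvalidProtocolBufferException;

    final class SchemaEvolutionSketch {
        // Bytes from an older writer parse fine with newer generated
        // classes (the added field simply reads as absent), and bytes
        // from a newer writer also parse with older generated classes
        // (field 5 is retained as an unknown field, not a parse error).
        static BufferedData parseAnyRevision(final byte[] recordValue)
                throws InvalidProtocolBufferException {
            return BufferedData.parseFrom(recordValue);
        }
    }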
I think Protobuf is preferable to other binary formats such as Thrift because Data Prepper is already making significant use of Protobuf for OTel data.
Non-binary formats were not considered because they would require Base64-encoding the embedded binary data.
Adds a Protobuf buffer message for the Kafka buffer. Data sent to the topic is wrapped in this message and then parsed back out of it on read. Contributes toward #3620.
Corrects the Kafka buffer tests to properly test as bytes, adds bytes tests, and fixes some serialization issues with the Kafka buffer.
Signed-off-by: David Venable <[email protected]>