Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-14020 - Add Record and Demarcator support to ConsumeGCPubSub #9530

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pvillard31
Copy link
Contributor

@pvillard31 pvillard31 commented Nov 18, 2024

Summary

NIFI-14020 - Add Record and Demarcator support to ConsumeGCPubSub

At the moment, ConsumeGCPubSub will generate one FlowFile per consumed message (the Batch Size property is only specifying the maximum number of messages we may pull from the subscription with one API call). This can be extremely inefficient.

Similarly to the Kafka processors, we should add the option to have multiple Processing Strategies:

  • Flow File - which is the current behavior - where one message is one FlowFile and FlowFile attributes will be used to store the attributes associated with the message as well as some information such as message ID, ack ID, etc.
  • Demarcator - where messages will be appended into a single FlowFile with a custom demarcator between each message. In this case specific attributes associated to messages will be lost. This however is the most efficient strategy if very high throughput is required and message format is allowing this approach.
  • Record - where a reader and writer can be specified to process the messages. This is useful if we want to change message format on the fly or if the message format is not allowing the demarcator strategy. In addition, an output strategy is available with two allowable values:
    • Value - messages are all added in the same flowfile with the specified writer. In this case specific attributes associated to messages will be lost.
    • Wrapper - in this case, we are overriding the schema of the writer to include the metadata of the message as well as a map of its attributes.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@Lehel44
Copy link
Contributor

Lehel44 commented Jan 3, 2025

Thanks @pvillard31 for adding this improvement! I'm currently reviewing. Could you please resolve the merge conflict? I will then test it as well.

@pvillard31
Copy link
Contributor Author

Done, thanks @Lehel44

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants