Running PR to keep track of custom changes to support YB #127

Open
wants to merge 55 commits into base: 2.5.2.Final

Conversation

vaibhav-yb
Collaborator

No description provided.

vaibhav-yb and others added 20 commits March 18, 2024 15:26
Initial changes required for the Debezium Connector for Postgres to work with YugabyteDB source.
…st YugabyteDB (#105)

This PR includes the changes required for the tests so that they can
work against YugabyteDB.

YugabyteDB issue: yugabyte/yugabyte-db#21394
Modified Dockerfile to package custom log4j.properties so that the log
files can be rolled over when their size exceeds 100MB.

Also changed the Kafka Connect JDBC jar being used - this new jar has a
custom change to log every sink record going to the target database.
Changes in this PR:
1. Modification of the Dockerfile to include the Aiven transforms at the
time of Docker image compilation
a. Aiven source:
https://github.com/Aiven-Open/transforms-for-apache-kafka-connect
…BC driver (#107)

## Problem

The Debezium connector for Postgres uses a single-host model where the
JDBC driver connects to a PG instance and continues execution. However,
when we move to YugabyteDB, where we have a multi-node deployment, the
current model can fail if the node it is connected to goes down.

## Solution

To address that, this PR replaces the Postgres JDBC driver with the
[YugabyteDB smart driver](https://github.com/yugabyte/pgjdbc), which allows
us to specify multiple hosts in the JDBC URL so that the connector does not
fail or run into any fatal error when a node goes down, preserving the
high availability aspect of YugabyteDB.

Changes in this PR include:
1. Changing the version in `pom.xml` from `2.5.2.Final` to
`2.5.2.ybpg.20241-SNAPSHOT`
a. This is done to ensure that upon image compilation, the changed code
from the Debezium codebase is picked up.
2. Replacing all packages from `org.postgresql.*` with `com.yugabyte.*`
to comply with the new JDBC driver.
3. Masking the validator method in debezium-core which disallowed
characters like `:` (colon) in the configuration property
`database.hostname`. A sketch of a multi-host connection follows.
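
For illustration, a minimal sketch of what a multi-host connection through the
smart driver could look like; the host names, port, credentials, and the
`load-balance` property are placeholders, not something this PR prescribes:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class MultiHostConnectionSketch {
    public static void main(String[] args) throws SQLException {
        // Multiple hosts in a single JDBC URL; the smart driver can connect to
        // another listed node if the one it is talking to goes down.
        String url = "jdbc:yugabytedb://node1:5433,node2:5433,node3:5433/yugabyte"
                + "?load-balance=true";   // optional smart-driver load balancing
        try (Connection conn = DriverManager.getConnection(url, "yugabyte", "yugabyte")) {
            System.out.println("Connected via: " + conn.getMetaData().getURL());
        }
    }
}
```
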
**Summary**
This PR adds support for a consistent snapshot in the case of an existing
slot.

In this case, the consistent_point hybrid time is determined from the
pg_replication_slots view, specifically from the yb_restart_commit_ht
column.

There is an assumption here that this slot has not been used for
streaming till this point. If this holds, then the history retention
barrier will be in place as of the consistent snapshot time
(consistent_point). The snapshot query will be run as of the
consistent_point and subsequent streaming will start from the
consistent_point of the slot.
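
A rough sketch of how the consistent_point could be read from the slot; the
slot name, connection URL, and credentials are placeholders, and the
connector's actual lookup may differ:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ConsistentPointSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:yugabytedb://node1:5433/yugabyte";   // placeholder
        try (Connection conn = DriverManager.getConnection(url, "yugabyte", "yugabyte");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT yb_restart_commit_ht FROM pg_replication_slots WHERE slot_name = ?")) {
            ps.setString(1, "existing_slot");   // placeholder slot name
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    // HybridTime used as the consistent_point for the snapshot query.
                    long consistentPoint = rs.getLong("yb_restart_commit_ht");
                    System.out.println("consistent_point HybridTime: " + consistentPoint);
                }
            }
        }
    }
}
```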

**Test Plan**
Added new test: `mvn -Dtest=PostgresConnectorIT#initialSnapshotWithExistingSlot test`
**Changes:**
1. Providing the JMX Exporter jar via KAFKA_OPTS so that it is passed on to
the Java options.
2. Modifying `metrics.yaml` to include the correct regex to be scraped as
per the Postgres connector.
…peruser (#115)

**Summary**
This PR adds support for a non-superuser to be configured as the
connector user (database.user).

Such a user is required to have the privileges listed in
https://debezium.io/documentation/reference/2.5/connectors/postgresql.html#postgresql-permissions

Specifically, the changes in this revision relate to how the
consistent_point is specified to the YugabyteDB server in order
to execute a consistent snapshot.

**Test Plan**
Added new test: `mvn -Dtest=PostgresConnectorIT#nonSuperUserSnapshotAndStreaming test`
…nectorTask (#114)

This PR adds a higher-level retry whenever there is a failure while
starting a PostgresConnectorTask (see the retry sketch after this list);
the failures can include, but are not limited to, the following:
1. Failure of creating JDBC connection
2. Failure to execute query
3. Tserver/master restart
4. Node restart
5. Connection failure
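
Retry sketch referenced above - a simplified illustration of a bounded
higher-level retry around task start; the method name, retry count, and delay
are hypothetical and not the connector's actual API:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class TaskStartRetrySketch {

    // Retry a task-start action a bounded number of times before giving up.
    static <T> T startWithRetries(Callable<T> startTask, int maxRetries, long waitMs) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return startTask.call();
            } catch (Exception e) {   // JDBC failures, query failures, node/tserver/master restarts, ...
                last = e;
                System.out.printf("Start attempt %d/%d failed: %s%n", attempt, maxRetries, e.getMessage());
                Thread.sleep(waitMs);
            }
        }
        throw last;                   // retries exhausted
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulate a start that fails twice before succeeding.
        String result = startWithRetries(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new RuntimeException("simulated connection failure");
            }
            return "task started";
        }, 5, 100);
        System.out.println(result);
    }
}
```
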
…streaming (#116)

## Problem

PG connector does not wait for acknowledgement of the snapshot completion
offset before transitioning to streaming. This can lead to an issue if
there is a connector restart in the streaming phase and it goes for a
snapshot on restart. In the streaming phase, as soon as the first GetChanges
call is made on the server, the retention barriers are lifted and so the
server can no longer serve the snapshot records on a restart. Therefore
it is important that the connector waits for acknowledgement of the snapshot
completion offset before it actually transitions to streaming.

## Solution

This PR introduces a waiting mechanism for acknowledgement of snapshot
completion offset before transitioning to streaming.

We have introduced a custom heartbeat implementation that will dispatch a
heartbeat when the forced heartbeat method is called but will dispatch
nothing when a normal heartbeat method is called.

With this PR, the connector will dispatch heartbeats while waiting for the
snapshot completion offset, i.e. during the transition phase. For these
heartbeat calls, there is no need to set `heartbeat.interval.ms`
since we are making forced heartbeat calls which do not rely on this
config. Note, this heartbeat call is only required to support
applications using the Debezium engine/embedded engine. It is not required
when the connector is run with Kafka Connect.
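
A minimal sketch of the idea, using a hypothetical heartbeat interface rather
than Debezium's actual `Heartbeat` API: normal heartbeats become no-ops while
forced heartbeats are dispatched:

```java
import java.util.function.Consumer;

// Hypothetical, simplified heartbeat contract (not Debezium's real API).
interface HeartbeatSketch {
    void heartbeat(Consumer<String> dispatcher);   // periodic heartbeat
    void forcedBeat(Consumer<String> dispatcher);  // explicitly forced heartbeat
}

// Dispatches only forced heartbeats; normal heartbeats are suppressed, so
// heartbeat.interval.ms does not need to be configured for this use case.
class YbForcedOnlyHeartbeat implements HeartbeatSketch {
    @Override
    public void heartbeat(Consumer<String> dispatcher) {
        // Intentionally a no-op.
    }

    @Override
    public void forcedBeat(Consumer<String> dispatcher) {
        // Used while waiting for the snapshot-completion offset to be acknowledged.
        dispatcher.accept("forced heartbeat");
    }
}

class HeartbeatSketchDemo {
    public static void main(String[] args) {
        HeartbeatSketch hb = new YbForcedOnlyHeartbeat();
        hb.heartbeat(System.out::println);   // prints nothing
        hb.forcedBeat(System.out::println);  // prints "forced heartbeat"
    }
}
```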

### Test Plan
Manually deployed connector in a docker container and tested two
scenarios: 0 snapshot records & non-zero snapshot records. Unit tests
corresponding to these scenarios will be added in a separate PR.
#119)

**Summary**
This PR adds support for the INITIAL_ONLY snapshot mode for Yugabyte.

In the case of YugabyteDB as well, the snapshot is consumed by executing a
snapshot query (a SELECT statement). To ensure that the streaming phase
continues exactly from where the snapshot left off, this snapshot query is
executed as of a specific database state. In YB, this database state is
represented by a value of HybridTime. Changes due to transactions with
commit_time strictly greater than this snapshot HybridTime will be
consumed during the streaming phase.

This value for HybridTime is the value of the "yb_restart_commit_ht"
column of the pg_replication_slots view of the associated slot. Thus, in
the case of Yugabyte, even for the INITIAL_ONLY snapshot mode, a slot
needs to be created if one does not exist.

With this approach, a connector can be deployed in INITIAL_ONLY mode to
consume the initial snapshot. This can be followed by the deployment of
another connector in NEVER mode. This connector will continue the
streaming from exactly where the snapshot left off.
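
A hedged sketch of that two-phase deployment expressed as embedded-engine-style
properties; the hosts and slot name are placeholders, and only the
`snapshot.mode` values mirror the description above:

```java
import java.util.Properties;

public class TwoPhaseDeploymentSketch {
    public static void main(String[] args) {
        // Phase 1: consume only the initial snapshot.
        Properties snapshotPhase = new Properties();
        snapshotPhase.setProperty("snapshot.mode", "initial_only");
        snapshotPhase.setProperty("slot.name", "yb_slot");              // created if it does not exist
        snapshotPhase.setProperty("database.hostname", "node1:5433,node2:5433");

        // Phase 2: a second connector streams from where the snapshot left off.
        Properties streamingPhase = new Properties();
        streamingPhase.setProperty("snapshot.mode", "never");
        streamingPhase.setProperty("slot.name", "yb_slot");             // same slot carries the consistent_point
        streamingPhase.setProperty("database.hostname", "node1:5433,node2:5433");

        System.out.println(snapshotPhase);
        System.out.println(streamingPhase);
    }
}
```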

**Test Plan**
1. Added new test - `mvn -Dtest=PostgresConnectorIT#snapshotInitialOnlyFollowedByNever test`
2. Enabled existing test - `mvn -Dtest=PostgresConnectorIT#shouldNotProduceEventsWithInitialOnlySnapshot test`
3. Enabled existing test - `mvn -Dtest=PostgresConnectorIT#shouldPerformSnapshotOnceForInitialOnlySnapshotMode test`
… image (#118)

This PR adds the dependencies for the `AvroConverter` to function in the
Kafka Connect environment. The dependencies will only be added at the
time of building the docker image.
This PR adds a log which will print the IP of the node every time a
connection is created.
Retry in case of failures while a task is restarting. Right now, any kind
of failure will lead to the task throwing a RetriableException, causing a
task restart.
…te source (#120)

**Summary**
This PR enables 30/33 tests in IncrementalSnapshotIT for Yugabyte source

The tests that are excluded are 

1. updates
2. updatesLargeChunk
3. updatesWithRestart

**Test Plan**
`mvn -Dtest=IncrementalSnapshotIT test`
This PR comments out the part of init_database (the startup script used
during tests) where some extensions are installed - it takes more than
2 minutes at this stage and, since it is not needed for the tests we run,
it can be skipped.
Throw a retriable exception for all exceptions. In the future, we will need
to throw a runtime exception for wrong configurations.

Hi @vaibhav-yb, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.

…r image (#129)

This PR only changes the link in the `Dockerfile` to fetch the latest
custom sink connector jar from GitHub.

According to PR yugabyte/kafka-connect-jdbc#3,
changes include the following:
1. Addition of 3 new configuration properties
    * `log.table.balance`:
i. Default is `false`; when set to `true`, the sink connector will
execute a query to get the table balance from the target table.
ii. Note that this is only applicable for consistency-related tests
where the given query is applicable - it will fail if set in any other
tests.
    * `expected.total.balance`
i. Default is `1000000` (1M), which can be changed to whatever value we
expect the total balance in the target table to be.
    * `tables.for.balance`
i. This takes a comma-separated string of all the table names from which
the sink connector is supposed to extract balances.
ii. This property is only valid when `log.table.balance` is set to `true`.
iii. There is no default for this property, so if `log.table.balance` is
set to `true` and `tables.for.balance` is not specified, then we will
throw a `RuntimeException`.
2. Log additions to aid debugging. (A configuration sketch for these
properties follows.)
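
A sketch of how these sink properties could be set together; the table names
and balance value are placeholders and assume the consistency-test setup
described above:

```java
import java.util.Properties;

public class BalanceCheckConfigSketch {
    public static void main(String[] args) {
        Properties sinkProps = new Properties();
        sinkProps.setProperty("log.table.balance", "true");          // default is false
        sinkProps.setProperty("expected.total.balance", "1000000");  // default is 1000000 (1M)
        // Required when log.table.balance=true; otherwise a RuntimeException is thrown.
        sinkProps.setProperty("tables.for.balance", "accounts,savings");
        System.out.println(sinkProps);
    }
}
```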

…is set (#131)

## Problem
PG connector filters out records based on its starting point (WAL
position), which in turn depends on the offset received from Kafka. So,
in case the starting point corresponds to a record in the middle of a
transaction, the PG connector will filter out the records of that
transaction with LSN < starting point.

This creates a problem when the downstream pipeline expects consistency of
data. Filtering of records leads to the PG connector shipping transactions
with missing records. When such a transaction is applied on the sink,
consistency breaks.

## Solution
When the 'provide.transaction.metadata' connector configuration is set, the
PG connector ships the transaction boundary records BEGIN/COMMIT. Based on
these boundary records, the sink connector writes data maintaining
consistency. Therefore, when this connector config is set, we will
disable filtering records based on the WAL resume position (see the sketch
below).
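
A minimal sketch of the relevant configuration flag; this mirrors the
description above and is not a complete connector config:

```java
import java.util.Properties;

public class TransactionMetadataConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Emit BEGIN/COMMIT boundary records; with this change, WAL-resume-position
        // filtering is disabled whenever this flag is set.
        props.setProperty("provide.transaction.metadata", "true");
        System.out.println(props);
    }
}
```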

## Testing
Manual testing - Ran the connector with this fix in our QA runs where
the issue was first discovered. All 10 runs triggered passed.
Unit testing - Cannot reproduce the above-mentioned scenario in a unit
test.

This PR adds a configuration, `yb.consistent.snapshot`, to let the user
disable consistent snapshot; it is enabled by default. Setting consistent
snapshot means setting/establishing the boundary between snapshot records
(records that existed at the time of stream creation) and streaming records.

If `yb.consistent.snapshot` is disabled, i.e. set to `false` (see the sketch after this list):
- We will not be setting a boundary for snapshot
- If the connector restarts after taking a snapshot but before
acknowledging the streaming LSN, the snapshot will be taken again. This
can result in some records being received both during the snapshot phase
and the streaming phase.
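
Sketch referenced above - a minimal illustration of disabling the consistent
snapshot boundary; this is not a complete connector configuration:

```java
import java.util.Properties;

public class ConsistentSnapshotToggleSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Default is true; setting it to false skips establishing the
        // snapshot/streaming boundary described above.
        props.setProperty("yb.consistent.snapshot", "false");
        System.out.println(props);
    }
}
```
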
## Problem

The Postgres JDBC driver jar currently being used has a `CRITICAL`
vulnerability, `CVE-2024-1597`, as identified by `trivy` in version
`42.6.0`.

## Solution

For the fix, this PR upgrades the jar to driver version `42.6.1`, which
does not have the vulnerability.

This PR adds a GitHub action to the repository which can be run manually
to publish a Docker image to Quay and create a GitHub release draft with
a fat jar.

This PR fixes the path for the connector jar file to pick up the correct
artefact to be packaged in the Docker image.


This PR adds the following tests:
1. Streaming for a table with composite PK
2. Streaming with different replica identities with both plugins i.e.
`yboutput` and `pgoutput`
3. Streaming changes for a table with foreign key

## Problem

Suppose we have a column with a non-basic value like `timestamp` - the
code for converting the values here is the following:
```java
if (connectorConfig.plugin().isYBOutput()) {
    if (value != null && !UnchangedToastedReplicationMessageColumn.isUnchangedToastedValue(value)) {
        value = converter.convert(((Object[]) value)[0]);
        Struct cell = new Struct(fields[i].schema());
        cell.put("value", value);
        cell.put("set", true);
        result.put(fields[i], cell);
    } else {
        result.put(fields[i], null);
    }
} else {
    result.put(fields[i], value);
}
```

The `else` block above is missing a line which should apply a converter
to the value.

## Solution

This PR modifies the `PostgresSchema#getTableSchemaBuilder` method to
return the default `TableSchemaBuilder` when the plugin is NOT
`yboutput`.

Additionally, this PR fixes the `else` block in `PGTableSchemaBuilder`
with the required conversion and adds tests verifying the same for
`pgoutput` by adding the following logic:

```java
...
else {
  value = converter.convert(value);
  result.put(fields[i], value);
}
...
```

This PR adds the logic to throw an exception if a connector is deployed
with `decimal.handling.mode=PRECISE`.

This PR adds a test which works with both `pgoutput` and `yboutput` and
verifies that if we update a column of any type, we get the relevant
change event.

This PR introduces the following change:
* The version of the pom in the jar file will now be the same as the one
specified in the input form

This PR changes the following:
1. Adds a plugin to the pom.xml file to package the connector according
to Confluent specifications
2. Adds a GitHub pipeline to automate the above packaging process

## Problem

For very large tables, the default `SELECT *` query can take a very long
time to complete, leading to longer snapshot times.

## Solution

This PR aims to implement snapshotting the table in parallel using the
built-in function `yb_hash_code` to only run the query for a given hash
range. The following 2 configuration properties are introduced with this
PR:
1. A new `snapshot.mode` called `parallel` - this will behave exactly
like `initial_only` but we will have the ability to launch multiple
tasks.
2. `primary.key.hash.columns` - this config takes a comma-separated list
of the columns forming the hash component of the table's primary key.

> **Note:** When `snapshot.mode` is set to `parallel`, we will not
support providing regex in the property `table.include.list` and the
user will need to specify the full name of the table in the property.
Additionally, we will only allow one table in the `table.include.list`
if `snapshot.mode` is `parallel`.
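
A sketch of how per-task hash-range snapshot queries could look, assuming
`yb_hash_code` values fall in [0, 65535] and the range is split evenly across
tasks; the table name and hash column are placeholders and the connector's
actual query generation may differ:

```java
public class ParallelSnapshotQuerySketch {
    public static void main(String[] args) {
        int totalHashRange = 65536;   // yb_hash_code values fall in [0, 65535]
        int taskCount = 4;
        int step = totalHashRange / taskCount;

        for (int taskId = 0; taskId < taskCount; taskId++) {
            int lower = taskId * step;
            int upper = (taskId == taskCount - 1) ? totalHashRange : lower + step;
            // "id" stands in for the columns listed in primary.key.hash.columns.
            String query = String.format(
                "SELECT * FROM accounts WHERE yb_hash_code(id) >= %d AND yb_hash_code(id) < %d",
                lower, upper);
            System.out.println("task_" + taskId + ": " + query);
        }
    }
}
```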

…on (#163)

## Problem

With the introduction of the parallel snapshot model, we can have
multiple tasks when the snapshot mode is set to `parallel`. This
introduces a problem at the underlying layer when the connector stores
the sourceInfo for its partitions i.e. `PostgresPartition` objects in
Kafka.

The `PostgresPartition` is identified by a map which has a structure
`{"server", topicPrefix}` - currently this is the same for all the
`PostgresPartition` objects which are created by the tasks when
`snapshot.mode` is `parallel` and hence they all end up referring to the
same source partition in the Kafka topic. Subsequently, the following
happens (assume that we have 2 tasks, i.e. task_0 and task_1):
1. One task (task_0) completes the snapshot while the other is yet to
start.
a. After completion, `task_0` updates the `sourceInfo` saying that its
snapshot is completed.
2. When task_1 starts up, it reads the same `sourceInfo` object and
concludes that the snapshot is completed so it skips its snapshot.

The above situation will cause a data loss since task_1 will never
actually take a snapshot.

## Solution

This PR implements a short-term solution where we simply add the task ID
to the partition so that each `PostgresPartition` can identify a
source partition uniquely; the identifying map will now become
`{"server", topicPrefix_taskId}`.

**Note:**
This solution is a quick fix for the problem, given that the number of
tasks in the connector remains the same.

This partially fixes yugabyte/yugabyte-db#24555
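
A small sketch of the change to the identifying map; the topic prefix is a
placeholder and the real `PostgresPartition` code is more involved:

```java
import java.util.Collections;
import java.util.Map;

public class SourcePartitionSketch {
    // Before: every task used {"server": topicPrefix} and collided in the offsets topic.
    static Map<String, String> sharedPartition(String topicPrefix) {
        return Collections.singletonMap("server", topicPrefix);
    }

    // After: the task ID is appended so each task owns a distinct source partition.
    static Map<String, String> perTaskPartition(String topicPrefix, int taskId) {
        return Collections.singletonMap("server", topicPrefix + "_" + taskId);
    }

    public static void main(String[] args) {
        System.out.println(sharedPartition("dbserver1"));
        System.out.println(perTaskPartition("dbserver1", 0));
        System.out.println(perTaskPartition("dbserver1", 1));
    }
}
```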

…query (#164)

This PR makes the changes to set the transaction isolation level to
`SERIALIZABLE, READ ONLY, DEFERRABLE` before executing the snapshot read
query.
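
A minimal JDBC sketch of the intended sequence, with a placeholder URL and
snapshot query; the connector sets this on its snapshot connection rather
than in standalone code like this:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SnapshotIsolationSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:yugabytedb://node1:5433/yugabyte";   // placeholder
        try (Connection conn = DriverManager.getConnection(url, "yugabyte", "yugabyte");
             Statement stmt = conn.createStatement()) {
            conn.setAutoCommit(false);
            // Set the transaction characteristics before running the snapshot read.
            stmt.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE");
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM accounts")) {   // placeholder snapshot query
                while (rs.next()) {
                    // process snapshot rows
                }
            }
            conn.commit();
        }
    }
}
```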

## Problem

With the current retry model, it was noticed that the connector ended up
retrying infinitely whenever an exception was thrown. This could lead to
a false impression that the connector is still running while the retries
keep failing.

## Solution

This PR addresses the issue by adding a check to the task layer: if the
retry count reaches the maximum value, the connector will exit and the
task will reach a failed state - this will help the end user know the
status of the task and act accordingly.

This PR also redefines the following properties and changes their
default values:
1. `errors.max.retries` - new default value is 60
2. `retriable.restart.connector.wait.ms` - new default value is 30000
(30s)

With the above change, the complete retry duration with the above
default configuration will now be 30 minutes. This effectively means
that if the connector's task fails after exhausting all the retries, then
it will go into a `FAILED` state.

For example, if the connector needs to retry for a total of 30 minutes,
we can handle it in 2 ways:
1. **By fixing the number of retries:** Let's say we want the number of
retries to be fixed at 15, so we can configure our retry delay
accordingly, i.e. `30 / 15 = 2 minutes = 120 s = 120000 ms`, so the
configuration will now add:

```json
"retriable.restart.connector.wait.ms":"120000",
"errors.max.retries":"15"
```

2. **By fixing the retry delay:** If we want to have a retry delay of a
minute, then we will configure the number of retries accordingly, i.e. `30
/ 1 = 30 retries`, so the configuration will now be:

```json
"retriable.restart.connector.wait.ms":"60000",
"errors.max.retries":"30"
```

This PR fixes a bug which was causing one fewer retry than configured; for
example, if `errors.max.retries` was set to 5, the connector was only
retrying 4 times.

Additionally, this PR adds a test to verify the logic.

…d structure (#168)

This PR adds a transformer `PGCompatible` with full path
`io.debezium.connector.postgresql.transforms.yugabytedb`. This will be
helpful in converting the structure of the emitted events to match the
one emitted by standard Debezium connectors.

**Example:**
Consider the following schema for a table `test`: `(id INT PRIMARY KEY,
name TEXT, age INT)`
- If a record is inserted having values `(1, 'John Doe', 25)` then after
using the above transformer, the `payload` of the record would look
like:
```json
"payload": {
  "id": 1,
  "name": "John Doe",
  "age": 25
}
```
- If the same record is now updated and age is changed to 30 i.e.
`UPDATE test SET age = 30 WHERE id = 1;` then the `payload` would look
like:

```json
"payload": {
  "id": 1,
  "name": null,
  "age": 30
}
```

> **NOTE:** The above example assumes that the replica identity of the
table is `CHANGE`, which is why the `UPDATE` event will not contain the
value for the fields which were not updated. For more information on
replica identity, see [YugabyteDB
docs](https://docs.yugabyte.com/preview/explore/change-data-capture/using-logical-replication/key-concepts/#replica-identity).
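
A sketch of how the transformer might be enabled; the fully qualified class
name is assumed from the package path mentioned above and the transform alias
is a placeholder:

```java
import java.util.Properties;

public class PGCompatibleTransformSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("transforms", "pgcompatible");   // placeholder alias
        // Class name assumed from the package path mentioned above.
        props.setProperty("transforms.pgcompatible.type",
                "io.debezium.connector.postgresql.transforms.yugabytedb.PGCompatible");
        System.out.println(props);
    }
}
```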
