Connect to Kafka PL/PSC service from TiDB Cloud #19323

Merged
merged 66 commits into from
Jan 2, 2025
Changes from 4 commits
Commits
5e94493
changefeed: add private link access steps
grovecai Oct 17, 2024
45491b0
changefeed: setup Kafka private link service in AWS
grovecai Oct 28, 2024
35d80d6
restruct to make it simple for reconfigure exsting Kafka cluster
grovecai Nov 5, 2024
359866d
add FAQ for AWS part
grovecai Nov 5, 2024
9629427
add output values to network section
grovecai Nov 6, 2024
14247e6
rafactor doc since we rename/remove fields
grovecai Nov 6, 2024
da07007
Add setup self hosted Kafka PSC for GCP
grovecai Nov 7, 2024
c9c3d91
Fix typo and images
grovecai Nov 11, 2024
d06e84c
fix terms
grovecai Nov 11, 2024
5b29bae
Apply suggestions from code review
hfxsd Nov 27, 2024
c0049dc
Update setup-self-hosted-kafka-pls.md
hfxsd Nov 27, 2024
c909031
Update setup-self-hosted-kafka-psc.md
hfxsd Nov 28, 2024
ecfb6df
refined wording and format
hfxsd Dec 5, 2024
f2d7676
Update TOC-tidb-cloud.md
hfxsd Dec 5, 2024
9de77a7
refined - ongoing
hfxsd Dec 5, 2024
408b9ad
nearly finished
hfxsd Dec 5, 2024
ec2533b
fix formats
hfxsd Dec 6, 2024
b7ff092
Update changefeed-sink-to-apache-kafka.md
hfxsd Dec 6, 2024
c7ed9bc
removed level5 and level6 headings
hfxsd Dec 6, 2024
acfb472
fix formats
hfxsd Dec 6, 2024
d936516
Update setup-self-hosted-kafka-psc.md
hfxsd Dec 9, 2024
dc11dbf
fix term and remove uneccessary steps
grovecai Dec 9, 2024
3504c09
Apply suggestions from code review
hfxsd Dec 10, 2024
1939d7c
Apply suggestions from code review
hfxsd Dec 10, 2024
b1d58a8
refined
hfxsd Dec 10, 2024
52bfa77
Apply suggestions from code review
hfxsd Dec 10, 2024
65a5b18
Update tidb-cloud/changefeed-sink-to-apache-kafka.md
hfxsd Dec 19, 2024
9b44054
fix spelling errors
hfxsd Dec 19, 2024
d47a249
Use simpletab in Step 2
hfxsd Dec 19, 2024
aa0f3d7
refine wording
hfxsd Dec 19, 2024
db140d2
refine lists in tables
hfxsd Dec 19, 2024
063aef0
Apply suggestions from code review
hfxsd Dec 24, 2024
84c2dc3
changefeed: support more partition dispatchers and more data formats …
grovecai Dec 26, 2024
d526581
Apply suggestions from code review
hfxsd Dec 26, 2024
bead4c1
Update tidb-cloud/changefeed-sink-to-apache-kafka.md
grovecai Dec 26, 2024
ae30ab7
Apply suggestions from code review
hfxsd Dec 26, 2024
a3a747c
add version restrictions
grovecai Dec 27, 2024
b329918
Apply suggestions from code review
hfxsd Dec 31, 2024
a0e89ac
Update tidb-cloud/tidb-cloud-billing-ticdc-rcu.md
grovecai Jan 2, 2025
aca9214
Update tidb-cloud/setup-self-hosted-kafka-pls.md
grovecai Jan 2, 2025
32e79e1
Update tidb-cloud/setup-self-hosted-kafka-pls.md
grovecai Jan 2, 2025
24b2664
Update tidb-cloud/setup-self-hosted-kafka-pls.md
grovecai Jan 2, 2025
2e9e05d
Update tidb-cloud/setup-self-hosted-kafka-pls.md
grovecai Jan 2, 2025
b91ddec
Update tidb-cloud/setup-self-hosted-kafka-pls.md
grovecai Jan 2, 2025
102b916
Update tidb-cloud/setup-self-hosted-kafka-pls.md
grovecai Jan 2, 2025
9cd4466
Update tidb-cloud/setup-self-hosted-kafka-pls.md
grovecai Jan 2, 2025
57c81a3
Update tidb-cloud/setup-self-hosted-kafka-pls.md
grovecai Jan 2, 2025
d26b381
Update tidb-cloud/setup-self-hosted-kafka-pls.md
grovecai Jan 2, 2025
80a1346
Update tidb-cloud/setup-self-hosted-kafka-pls.md
grovecai Jan 2, 2025
1c626b7
Update tidb-cloud/setup-self-hosted-kafka-psc.md
grovecai Jan 2, 2025
39ce3d2
Apply suggestions from code review
grovecai Jan 2, 2025
031db20
tidb-cloud: use keywords in file name
lilin90 Jan 2, 2025
d7f765f
Merge branch 'changefeed-pl' of https://github.com/grovecai/docs into…
lilin90 Jan 2, 2025
3751006
Apply suggestions from code review
lilin90 Jan 2, 2025
2244cf0
change images
grovecai Jan 2, 2025
26ca6d2
Apply suggestions from code review
grovecai Jan 2, 2025
ad04e26
Revert "tidb-cloud: use keywords in file name"
grovecai Jan 2, 2025
a1974bb
Revert "Revert "tidb-cloud: use keywords in file name""
grovecai Jan 2, 2025
13f438a
Update tidb-cloud/setup-self-hosted-kafka-private-link-service.md
grovecai Jan 2, 2025
d86cd20
Update tidb-cloud/setup-self-hosted-kafka-private-link-service.md
grovecai Jan 2, 2025
6c53953
Update tidb-cloud/setup-self-hosted-kafka-private-service-connect.md
grovecai Jan 2, 2025
f0d1cb2
Update tidb-cloud/setup-self-hosted-kafka-private-service-connect.md
grovecai Jan 2, 2025
4130552
Update tidb-cloud/setup-self-hosted-kafka-private-service-connect.md
grovecai Jan 2, 2025
a39f316
fix link
hfxsd Jan 2, 2025
89efb57
Update tidb-cloud/setup-self-hosted-kafka-private-link-service.md
hfxsd Jan 2, 2025
3614d23
Apply suggestions from code review
hfxsd Jan 2, 2025
3 changes: 3 additions & 0 deletions TOC-tidb-cloud.md
@@ -287,6 +287,9 @@
- [To Kafka Sink](/tidb-cloud/changefeed-sink-to-apache-kafka.md)
- [To TiDB Cloud Sink](/tidb-cloud/changefeed-sink-to-tidb-cloud.md)
- [To Cloud Storage](/tidb-cloud/changefeed-sink-to-cloud-storage.md)
- Reference
- [Set Up Self-Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-self-hosted-kafka-pls.md)
- [Set Up Self-Hosted Kafka Private Service Connect in Google Cloud](/tidb-cloud/setup-self-hosted-kafka-psc.md)
- Disaster Recovery
- [Recovery Group Overview](/tidb-cloud/recovery-group-overview.md)
- [Get Started](/tidb-cloud/recovery-group-get-started.md)
61 changes: 51 additions & 10 deletions tidb-cloud/changefeed-sink-to-apache-kafka.md
@@ -18,6 +18,11 @@ This document describes how to create a changefeed to stream data from TiDB Clou
- Currently, TiDB Cloud does not support uploading self-signed TLS certificates to connect to Kafka brokers.
- Because TiDB Cloud uses TiCDC to establish changefeeds, it has the same [restrictions as TiCDC](https://docs.pingcap.com/tidb/stable/ticdc-overview#unsupported-scenarios).
- If the table to be replicated does not have a primary key or a non-null unique index, the absence of a unique constraint during replication could result in duplicated data being inserted downstream in some retry scenarios.
- If you choose **Private Link** or **Private Service Connect** as the network connectivity method, make sure that your TiDB cluster version meets the following requirements:
    - For v6.5.x: v6.5.9 or later
    - For v7.1.x: v7.1.4 or later
    - For v7.5.x: v7.5.1 or later
    - For v8.1.x and later: all versions are supported
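
The version requirements above can be expressed as a small check. The following is a minimal Python sketch, not part of TiDB Cloud: `supports_private_connect` and `MIN_PATCH` are hypothetical names, and treating release series that the list does not mention (for example, 7.2.x) as unsupported is an assumption made here for safety.

```python
# Minimum patch version per release series, copied from the list above.
MIN_PATCH = {(6, 5): 9, (7, 1): 4, (7, 5): 1}


def supports_private_connect(version: str) -> bool:
    """Return True if a TiDB version such as "v7.5.1" meets the listed requirements."""
    # Drop an optional leading "v" and any suffix after "-", then split into numbers.
    core = version.lstrip("v").split("-")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    series = (major, minor)
    if series >= (8, 1):
        # All versions of 8.1.x and later are supported.
        return True
    if series in MIN_PATCH:
        return patch >= MIN_PATCH[series]
    # Assumption: series not named in the list are treated as unsupported.
    return False


if __name__ == "__main__":
    for v in ("v6.5.8", "v6.5.9", "v7.1.4", "v7.5.0", "v8.1.0"):
        print(v, supports_private_connect(v))
```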

## Prerequisites

@@ -28,7 +33,21 @@ Before creating a changefeed to stream data to Apache Kafka, you need to complet

### Network

Make sure that your TiDB cluster can connect to the Apache Kafka service.
Make sure that your TiDB cluster can connect to the Apache Kafka service. You can choose one of the following three network connection methods:

1. Private Connect
2. VPC Peering
3. Public IP

If you want a quick try, choose **Public IP**. If you want a cost-effective option, choose **VPC Peering**, but be aware of the trade-offs: possible VPC CIDR conflicts and security considerations. If you want to avoid VPC CIDR conflicts and meet security compliance requirements, choose **Private Connect**, which incurs an additional [Private Data Link Cost](/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md#private-data-link-cost).

#### Private Connect

Private Connect leverages the **Private Link** or **Private Service Connect** technologies provided by cloud vendors, which allow resources in your VPC to connect to services in other VPCs using private IP addresses, as if those services were hosted directly in your VPC.

Currently, TiDB Cloud supports Private Connect only for self-hosted Kafka.

1. If your Apache Kafka service is already set up or will be set up in AWS, follow [Set Up Self-Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-self-hosted-kafka-pls.md) to make sure that the network connection is configured properly.
2. If your Apache Kafka service is already set up or will be set up in Google Cloud, follow [Set Up Self-Hosted Kafka Private Service Connect in Google Cloud](/tidb-cloud/setup-self-hosted-kafka-psc.md) to make sure that the network connection is configured properly.

#### VPC Peering

If your Apache Kafka service is in an AWS VPC that has no internet access, take the following steps:

@@ -39,7 +58,7 @@ If your Apache Kafka service is in an AWS VPC that has no internet access, take

3. If the Apache Kafka URL contains hostnames, you need to allow TiDB Cloud to be able to resolve the DNS hostnames of the Apache Kafka brokers.

1. Follow the steps in [Enable DNS resolution for a VPC peering connection](https://docs.aws.amazon.com/vpc/latest/peering/modify-peering-connections.html#vpc-peering-dns).
1. Follow the steps in [Enable DNS resolution for a VPC peering connection](https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-dns.html).
2. Enable the **Accepter DNS resolution** option.

If your Apache Kafka service is in a Google Cloud VPC that has no internet access, take the following steps:
@@ -49,6 +68,10 @@ If your Apache Kafka service is in a Google Cloud VPC that has no internet acces

You must add the CIDR of the region where your TiDB Cloud cluster is located to the ingress firewall rules. The CIDR can be found on the **VPC Peering** page. Doing so allows the traffic to flow from your TiDB cluster to the Kafka brokers.
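
Before editing the firewall rules, it can help to check whether an existing ingress source range already covers the TiDB Cloud CIDR. The following is a minimal Python sketch that uses only the standard `ipaddress` module. The function name `is_covered` and the CIDR values in the usage example are placeholders, not values from TiDB Cloud. The check only tests whether a single rule contains the whole CIDR, so it does not detect coverage by a combination of smaller ranges.

```python
import ipaddress


def is_covered(tidb_cloud_cidr: str, allowed_ranges: list[str]) -> bool:
    """Return True if one of the allowed source ranges contains the whole CIDR."""
    target = ipaddress.ip_network(tidb_cloud_cidr)
    for raw in allowed_ranges:
        allowed = ipaddress.ip_network(raw)
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so compare versions first.
        if target.version == allowed.version and target.subnet_of(allowed):
            return True
    return False


if __name__ == "__main__":
    # Placeholder values: copy the real CIDR from the VPC Peering page.
    print(is_covered("10.250.0.0/16", ["10.0.0.0/8", "192.168.0.0/16"]))
```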

#### Public IP

If you want to provide Public IP access to your Apache Kafka service, assign public IP addresses to all your Kafka brokers. For security reasons, it is not recommended to use Public IP in a production environment.

### Kafka ACL authorization

To allow TiDB Cloud changefeeds to stream data to Apache Kafka and create Kafka topics automatically, ensure that the following permissions are added in Kafka:
@@ -65,17 +88,35 @@ For example, if your Kafka cluster is in Confluent Cloud, you can see [Resources

## Step 2. Configure the changefeed target

1. Under **Brokers Configuration**, fill in your Kafka brokers endpoints. You can use commas `,` to separate multiple endpoints.
2. Select an authentication option according to your Kafka authentication configuration.

1. For **Kafka Provider**, only the **Self-hosted Kafka** option is currently available. More options will be supported later.

    > **Note:**
    > Currently, TiDB Cloud treats all Apache Kafka services as self-hosted because it has no special integration with specific Kafka providers, such as Amazon MSK or Confluent. This does not mean that TiDB Cloud cannot connect to Amazon MSK or Confluent Kafka. If the Kafka provider offers a standard network connection method, such as VPC Peering, Public IP, Private Link, or Private Service Connect, TiDB Cloud can connect to it. Connecting to Amazon MSK through multi-VPC connectivity, which is powered by Private Link technology, is not supported yet because it is not a standard Private Link setup. It might be supported later.

2. Select a **Connectivity Method** according to your Apache Kafka service setup.

    1. If you select **VPC Peering** or **Public IP**, fill in your Kafka broker endpoints. You can use commas `,` to separate multiple endpoints.
    2. If you select **Private Link**, do the following:
        1. Authorize the AWS account of TiDB Cloud so that it can create an endpoint for your endpoint service. The AWS account of TiDB Cloud is shown in the tip on the web page.
        2. Select the same **Kafka Type** and **Suggested Kafka Endpoint Service AZ**, and fill in the same unique ID in **Kafka Advertised Listener Pattern**, that you used when following [Set Up Self-Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-self-hosted-kafka-pls.md) in the **Network** section.
        3. Double-check the **Kafka Advertised Listener Pattern** by clicking **Check usage and generate**, which shows a message to help you validate the unique ID.
        4. Fill in the **Endpoint Service Name** that you configured in [Set Up Self-Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-self-hosted-kafka-pls.md).
        5. Fill in **Bootstrap Ports**. It is recommended to provide at least one port for each AZ. You can use commas `,` to separate multiple ports.
    3. If you select **Private Service Connect**, do the following:
        1. Fill in the same unique ID in **Kafka Advertised Listener Pattern** that you used when following [Set Up Self-Hosted Kafka Private Service Connect in Google Cloud](/tidb-cloud/setup-self-hosted-kafka-psc.md) in the **Network** section.
        2. Double-check the **Kafka Advertised Listener Pattern** by clicking **Check usage and generate**, which shows a message to help you validate the unique ID.
        3. Fill in the **Service Attachment** that you configured in [Set Up Self-Hosted Kafka Private Service Connect in Google Cloud](/tidb-cloud/setup-self-hosted-kafka-psc.md).
        4. Fill in **Bootstrap Ports**. It is recommended to provide more than one port. You can use commas `,` to separate multiple ports.
2. Select an **Authentication** option according to your Kafka authentication configuration.
- If your Kafka does not require authentication, keep the default option **Disable**.
- If your Kafka requires authentication, select the corresponding authentication type, and then fill in the user name and password of your Kafka account for authentication.
- If your Kafka requires authentication, select the corresponding authentication type, and then fill in the **user name** and **password** of your Kafka account for authentication.

3. Select your Kafka version. If you do not know that, use Kafka V2.
4. Select a desired compression type for the data in this changefeed.
3. Select your **Kafka Version**. If you do not know that, use Kafka V2.
4. Select a desired **Compression** type for the data in this changefeed.
5. Enable the **TLS Encryption** option if your Kafka has enabled TLS encryption and you want to use TLS encryption for the Kafka connection.
6. Click **Next** to check the configurations you set and go to the next page.

6. Click **Validate Connection and Next** to test the network connection. If the test succeeds, you are directed to the next page.

    > **Note:**
    > If you select **Private Link** or **Private Service Connect** as the network connectivity method, there are extra steps compared with **Public IP** and **VPC Peering**:
    > 1. After you click the button, TiDB Cloud attempts to create an endpoint on its side for **Private Link** or **Private Service Connect**. This might take several minutes.
    > 2. After the endpoint is created, log in to your cloud vendor console with your account and accept the connection request.
    > 3. Go back to the TiDB Cloud console and confirm that you have accepted the connection request. The console then navigates to the next page.

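
The comma-separated values used in this step can be sanity-checked locally before you paste them into the console. The following is a minimal Python sketch, assuming that each broker endpoint has the `host:port` form. That form is an assumption here, and the console's own validation rules might differ. `parse_endpoints` and `parse_ports` are hypothetical helper names, and IPv6 literals are not handled.

```python
def _to_port(text: str) -> int:
    """Convert text to a TCP port number, raising ValueError if it is invalid."""
    if not text.isdigit() or not 1 <= int(text) <= 65535:
        raise ValueError(f"invalid port: {text!r}")
    return int(text)


def parse_ports(raw: str) -> list[int]:
    """Parse a comma-separated port list such as "9092,9093"."""
    return [_to_port(item.strip()) for item in raw.split(",") if item.strip()]


def parse_endpoints(raw: str) -> list[tuple[str, int]]:
    """Parse comma-separated endpoints, assumed to look like "host:port"."""
    endpoints = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so that the host part stays intact.
        host, sep, port = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {item!r}")
        endpoints.append((host, _to_port(port)))
    return endpoints


if __name__ == "__main__":
    # Placeholder host names for illustration only.
    print(parse_endpoints("b1.example.com:9092, b2.example.com:9093"))
    print(parse_ports("9092,9093,9094"))
```
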
## Step 3. Set the changefeed

1. Customize **Table Filter** to filter the tables that you want to replicate. For the rule syntax, refer to [table filter rules](/table-filter.md).