
[Bug] org.apache.hadoop.fs.FileAlreadyExistsException when loading data from mysql to paimon using paimon cdc-ingestion #4900

Open

kwokhh opened this issue Jan 14, 2025 · 2 comments
Labels: bug (Something isn't working)

kwokhh commented Jan 14, 2025

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.9

Compute Engine

flink 1.20

Minimal reproduce step

1. Deploy flink-mysql-cdc-paimon (Synchronizing Databases) with config: changelog-producer: input
2. Write data to the source tables in the MySQL database

Config:
"mysql_sync_database",
"--warehouse", "",
"--database" , "",
"--mysql_conf", "hostname=",
"--mysql_conf", "port=",
"--mysql_conf", "username=USER",
"--mysql_conf", "password=PASSWORD",
"--mysql_conf", "database-name=DATABASE_NAME",
"--mysql_conf", "server-id=",
"--mysql_conf", "server-time-zone=",
"--including_tables", "",
"--catalog_conf", "metastore=filesystem",
"--catalog_conf", "case-sensitive=false",
"--table_conf", "bucket=4",
"--table_conf", "changelog-producer=input",
"--table_conf", "changelog.time-retained=730d"
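For context, the arguments above correspond to Paimon's `mysql_sync_database` action. Below is a minimal sketch of how such a job is typically submitted with `flink run`; the jar path, version, warehouse path, and connection values are placeholders of mine, not the redacted settings from this report:

```shell
# Hypothetical submission of the Paimon MySQL sync-database action.
# All paths, versions, and connection values below are placeholders.
flink run \
    /opt/flink/lib/paimon-flink-action-0.9.0.jar \
    mysql_sync_database \
    --warehouse hdfs:///paimon/warehouse \
    --database my_db \
    --mysql_conf hostname=mysql-host \
    --mysql_conf port=3306 \
    --mysql_conf username=USER \
    --mysql_conf password=PASSWORD \
    --mysql_conf database-name=DATABASE_NAME \
    --catalog_conf metastore=filesystem \
    --catalog_conf case-sensitive=false \
    --table_conf bucket=4 \
    --table_conf changelog-producer=input \
    --table_conf changelog.time-retained=730d
```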

What doesn't meet your expectations?

I expect the job to write to the Paimon database successfully.

Anything else?

Error log:

java.lang.RuntimeException: java.io.UncheckedIOException: org.apache.hadoop.fs.FileAlreadyExistsException: /alex/dataproject/flink/data/paimon/warehouse/classroom.db/student/changelog/changelog-1000 for client 123.123.123 already exists
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:389)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2703)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2596)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:799)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:494)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1043)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:971)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2976)

at org.apache.paimon.table.sink.TableCommitImpl.expire(TableCommitImpl.java:318)
at org.apache.paimon.table.sink.TableCommitImpl.commitMultiple(TableCommitImpl.java:212)
at org.apache.paimon.flink.sink.StoreCommitter.commit(StoreCommitter.java:112)
at org.apache.paimon.flink.sink.CommitterOperator.commitUpToCheckpoint(CommitterOperator.java:221)
at org.apache.paimon.flink.sink.CommitterOperator.notifyCheckpointComplete(CommitterOperator.java:198)
at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:104)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.notifyCheckpointComplete(RegularOperatorChain.java:145)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:478)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointComplete(SubtaskCheckpointCoordinatorImpl.java:411)
at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:1565)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointCompleteAsync$20(StreamTask.java:1506)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$23(StreamTask.java:1545)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:101)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMail(MailboxProcessor.java:414)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:383)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:368)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:229)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:973)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:917)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:970)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:949)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:763)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.base/java.lang.Thread.run(Unknown Source)

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@kwokhh kwokhh added the bug Something isn't working label Jan 14, 2025
yangjf2019 (Contributor) commented:
Hi @kwokhh, could you please provide more detailed code? Also, please enable debug mode to print more logs.

yangjf2019 (Contributor) commented:
What are your paimon sink parallelism and flink parallelism settings? Can you set them both to 1?
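For reference, one way to pin both at submission time is sketched below: `-p` is the standard Flink CLI flag for global job parallelism, and `sink.parallelism` is the Paimon table option for the sink. The jar path and warehouse/database values are placeholders:

```shell
# Hypothetical: cap global Flink parallelism via -p, and pin the Paimon
# sink with the table option sink.parallelism (both set to 1 here).
flink run -p 1 \
    /opt/flink/lib/paimon-flink-action-0.9.0.jar \
    mysql_sync_database \
    --warehouse hdfs:///paimon/warehouse \
    --database my_db \
    --table_conf sink.parallelism=1
```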
