Failed to create new Ensemble #38

Open
hangc0276 opened this issue Jul 13, 2023 · 0 comments

Steps to reproduce

  1. Set up a Pulsar cluster with 1 ZooKeeper, 2 bookies (bookie0 -> 127.0.0.1:3181, bookie1 -> 127.0.0.1:3182), and 1 broker.
  2. Enable RackAwarePlacementPolicy for the bookies and do not configure any rack info.
  3. Set the broker's E-W-A (ensemble size, write quorum, ack quorum) to 2-2-2.
  4. Use the pulsarctl-bookie-rackinfo command to set the rack info for bookie0: pulsarctl-bookie_rackinfo -b 127.0.0.1:3181 -z 127.0.0.1:2281 -r /test-region/test-rack (Note: with RackAwarePlacementPolicy, the rack name /test-region/test-rack is rejected by bin/pulsar-admin but accepted by the pulsarctl-bookie-rackinfo command, so we use pulsarctl-bookie-rackinfo to set the rack info.)
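The two rack names in this setup differ in path depth: /default-rack has one component, while /test-region/test-rack has two. A minimal sketch of a pre-check that flags such a mix before any rack info is written, assuming depth is simply the number of non-empty path components. The class and method names (RackDepthCheck, rackDepth, sameDepth) are hypothetical and not part of Pulsar, BookKeeper, or pulsarctl.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RackDepthCheck {
    // Counts the non-empty path components of a rack string:
    // "/default-rack" has 1, "/test-region/test-rack" has 2.
    static int rackDepth(String rack) {
        int depth = 0;
        for (String part : rack.split("/")) {
            if (!part.isEmpty()) {
                depth++;
            }
        }
        return depth;
    }

    // Returns true when every bookie's rack has the same depth
    // (an empty map counts as consistent).
    static boolean sameDepth(Map<String, String> rackByBookie) {
        return rackByBookie.values().stream()
                .mapToInt(RackDepthCheck::rackDepth)
                .distinct()
                .count() <= 1;
    }

    public static void main(String[] args) {
        Map<String, String> racks = new LinkedHashMap<>();
        // State after step 4: bookie0 has a two-level rack, bookie1 is still on the default rack.
        racks.put("127.0.0.1:3181", "/test-region/test-rack");
        racks.put("127.0.0.1:3182", "/default-rack");
        System.out.println("consistent after step 4: " + sameDepth(racks));

        // Both bookies on the same two-level rack.
        racks.put("127.0.0.1:3182", "/test-region/test-rack");
        System.out.println("consistent with both set: " + sameDepth(racks));
    }
}
```

The first call reports an inconsistent layout and the second a consistent one. Note that a consistent layout alone is not sufficient for an already running broker, as the later steps show.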

After running the command in step 4, the broker throws the following exception, but new ledgers can still be created.

2023-07-13T18:33:03,327+0800 [main-EventThread] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Bookie rack info updated to Optional[{default={127.0.0.1:3181=BookieInfoImpl(rack=/test-region/test-rack, hostname=null)}}]. Notifying rackaware policy.
2023-07-13T18:33:03,406+0800 [main-EventThread] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3181
2023-07-13T18:33:03,438+0800 [main-EventThread] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3181> at depth 3 to topology:
Number of racks: 1
Expected number of leaves:1
/default-rack/127.0.0.1:3182

2023-07-13T18:33:03,441+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to update bookie rack info: 127.0.0.1:3181
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.lambda$onBookieRackChange$0(TopologyAwareEnsemblePlacementPolicy.java:754) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onBookieRackChange(TopologyAwareEnsemblePlacementPolicy.java:746) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onBookieRackChange(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping.lambda$handleUpdates$3(BookieRackAffinityMapping.java:265) ~[io.streamnative-pulsar-broker-common-2.10.4.3.jar:2.10.4.3]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:244) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:188) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
  5. Restart bookie1. The broker throws the following exception, and new ledgers can no longer be created.
2023-07-13T18:34:48,787+0800 [pulsar-registration-client-33-1] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3182
2023-07-13T18:34:56,751+0800 [main-EventThread] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 127.0.0.1:3182 -> BookieServiceInfo{properties={}, endpoints=[
EndpointInfo{id=bookie, port=3182, host=127.0.0.1, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-07-13T18:34:56,764+0800 [main-EventThread] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 127.0.0.1:3182 -> BookieServiceInfo{properties={}, endpoints=[
EndpointInfo{id=bookie, port=3182, host=127.0.0.1, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-07-13T18:34:56,765+0800 [pulsar-registration-client-33-1] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3182> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0

2023-07-13T18:34:56,765+0800 [pulsar-registration-client-16-1] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3182> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0

2023-07-13T18:34:56,765+0800 [pulsar-registration-client-33-1] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Unexpected exception while handling joining bookie 127.0.0.1:3182
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.handleBookiesThatJoined(TopologyAwareEnsemblePlacementPolicy.java:719) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.handleBookiesThatJoined(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.handleBookiesThatJoined(RackawareEnsemblePlacementPolicy.java:249) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onClusterChanged(TopologyAwareEnsemblePlacementPolicy.java:665) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onClusterChanged(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.onClusterChanged(RackawareEnsemblePlacementPolicy.java:92) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.BookieWatcherImpl.processWritableBookiesChanged(BookieWatcherImpl.java:197) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.BookieWatcherImpl.lambda$initialBlockingBookieRead$1(BookieWatcherImpl.java:233) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient.lambda$updatedBookies$6(PulsarRegistrationClient.java:183) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.87.Final.jar:4.1.87.Final]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-07-13T18:34:56,765+0800 [pulsar-registration-client-16-1] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Unexpected exception while handling joining bookie 127.0.0.1:3182
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.handleBookiesThatJoined(TopologyAwareEnsemblePlacementPolicy.java:719) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.handleBookiesThatJoined(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.handleBookiesThatJoined(RackawareEnsemblePlacementPolicy.java:249) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onClusterChanged(TopologyAwareEnsemblePlacementPolicy.java:665) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onClusterChanged(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.onClusterChanged(RackawareEnsemblePlacementPolicy.java:92) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.BookieWatcherImpl.processWritableBookiesChanged(BookieWatcherImpl.java:197) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.BookieWatcherImpl.lambda$initialBlockingBookieRead$1(BookieWatcherImpl.java:233) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient.lambda$updatedBookies$6(PulsarRegistrationClient.java:183) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]

Ledger creation fails with the following logs:

2023-07-13T18:35:08,412+0800 [bookkeeper-ml-scheduler-OrderedScheduler-9-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/tt/persistent/t2-partition-0] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 116, 116, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 116, 50, 45, 112, 97, 114, 116, 105, 116, 105, 111, 110, 45, 48], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
2023-07-13T18:35:08,412+0800 [bookkeeper-ml-scheduler-OrderedScheduler-9-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
2023-07-13T18:35:08,412+0800 [bookkeeper-ml-scheduler-OrderedScheduler-9-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
  6. Use the following command to set bookie1's rack info in the same format as bookie0's: pulsarctl-bookie_rackinfo -b 127.0.0.1:3182 -z 127.0.0.1:2281 -r /test-region/test-rack. The broker throws the following exception, and new ledgers still can't be created.
2023-07-13T19:12:10,051+0800 [main-EventThread] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Bookie rack info updated to Optional[{default={127.0.0.1:3181=BookieInfoImpl(rack=/test-region/test-rack, hostname=null), 127.0.0.1:3182=BookieInfoImpl(rack=/test-region/test-rack, hostname=null)}}]. Notifying rackaware policy.
2023-07-13T19:12:10,061+0800 [main-EventThread] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3181
2023-07-13T19:12:10,061+0800 [main-EventThread] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3181> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0

2023-07-13T19:12:10,062+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to update bookie rack info: 127.0.0.1:3181
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.lambda$onBookieRackChange$0(TopologyAwareEnsemblePlacementPolicy.java:754) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onBookieRackChange(TopologyAwareEnsemblePlacementPolicy.java:746) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onBookieRackChange(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping.lambda$handleUpdates$3(BookieRackAffinityMapping.java:265) ~[io.streamnative-pulsar-broker-common-2.10.4.3.jar:2.10.4.3]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:244) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:188) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
2023-07-13T19:12:10,075+0800 [main-EventThread] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Bookie rack info updated to Optional[{default={127.0.0.1:3181=BookieInfoImpl(rack=/test-region/test-rack, hostname=null), 127.0.0.1:3182=BookieInfoImpl(rack=/test-region/test-rack, hostname=null)}}]. Notifying rackaware policy.
2023-07-13T19:12:10,078+0800 [main-EventThread] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/127.0.0.1:3181
2023-07-13T19:12:10,078+0800 [main-EventThread] ERROR org.apache.bookkeeper.net.NetworkTopologyImpl - Error: can't add leaf node <Bookie:127.0.0.1:3181> at depth 3 to topology:
Number of racks: 0
Expected number of leaves:0

2023-07-13T19:12:10,078+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to update bookie rack info: 127.0.0.1:3181
org.apache.bookkeeper.net.NetworkTopologyImpl$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.bookkeeper.net.NetworkTopologyImpl.add(NetworkTopologyImpl.java:416) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.lambda$onBookieRackChange$0(TopologyAwareEnsemblePlacementPolicy.java:754) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
        at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onBookieRackChange(TopologyAwareEnsemblePlacementPolicy.java:746) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onBookieRackChange(RackawareEnsemblePlacementPolicyImpl.java:80) ~[org.apache.bookkeeper-bookkeeper-server-4.14.7.jar:4.14.7]
        at org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping.lambda$handleUpdates$3(BookieRackAffinityMapping.java:265) ~[io.streamnative-pulsar-broker-common-2.10.4.3.jar:2.10.4.3]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:244) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:188) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490) ~[io.streamnative-pulsar-metadata-2.10.4.3.jar:2.10.4.3]
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[org.apache.zookeeper-zookeeper-3.6.4.jar:3.6.4]
  7. Restart the broker. The issue is resolved and new ledgers can be created.

Broker logs
pulsar-broker-MacBook-Pro-3.lan.log
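
One possible reading of the logs, not confirmed against the BookKeeper source: the broker's in-memory topology records the depth of the first leaf it sees and rejects any later leaf at a different depth, and that recorded depth survives even after every node has been removed (the errors in steps 5 and 6 print "Number of racks: 0"). The sketch below is a simplified model of that assumed behavior. LeafDepthModel and all of its members are hypothetical names, and the model returns false where the real code throws InvalidTopologyException.

```java
import java.util.HashSet;
import java.util.Set;

public class LeafDepthModel {
    private final Set<String> leaves = new HashSet<>();
    // Depth of the first leaf ever added; -1 until a leaf is added.
    // Assumption: this value is never reset when leaves are removed.
    private int recordedLeafDepth = -1;

    // Counts the non-empty components of a path such as "/default-rack/127.0.0.1:3181".
    static int depth(String path) {
        int d = 0;
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                d++;
            }
        }
        return d;
    }

    // Adds a leaf; returns false if its depth differs from the recorded depth.
    boolean add(String leafPath) {
        int d = depth(leafPath);
        if (recordedLeafDepth == -1) {
            recordedLeafDepth = d;
        } else if (recordedLeafDepth != d) {
            return false;
        }
        leaves.add(leafPath);
        return true;
    }

    // Removes a leaf but keeps the recorded depth (the assumed behavior).
    void remove(String leafPath) {
        leaves.remove(leafPath);
    }

    int size() {
        return leaves.size();
    }

    public static void main(String[] args) {
        LeafDepthModel running = new LeafDepthModel();
        running.add("/default-rack/127.0.0.1:3181");
        running.add("/default-rack/127.0.0.1:3182");

        // Step 4: bookie0 moves to a two-level rack, so its leaf is at depth 3.
        running.remove("/default-rack/127.0.0.1:3181");
        System.out.println("step 4 add accepted: " + running.add("/test-region/test-rack/127.0.0.1:3181"));

        // Steps 5 and 6: the topology is now empty, yet depth-3 leaves are still rejected.
        running.remove("/default-rack/127.0.0.1:3182");
        System.out.println("leaves left: " + running.size());
        System.out.println("step 6 add accepted: " + running.add("/test-region/test-rack/127.0.0.1:3182"));

        // Step 7: a restarted broker starts from a fresh topology.
        LeafDepthModel restarted = new LeafDepthModel();
        System.out.println("after restart accepted: " + restarted.add("/test-region/test-rack/127.0.0.1:3181"));
    }
}
```

Under this model only the fresh instance accepts the depth-3 leaf, which matches the observation that restarting the broker in step 7 clears the problem.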
