KAFKA-17877: Only call maybeSendResponseCallback once for each marker #17619
Conversation
Thanks for finding this. LGTM
@jolshan Can you help take a look? I updated the fix for the second case.
LGTM as long as tests pass
@@ -2431,8 +2431,12 @@ class KafkaApis(val requestChannel: RequestChannel,
        }

        val markerResults = new ConcurrentHashMap[TopicPartition, Errors]()
        def maybeComplete(): Unit = {
          if (partitionsWithCompatibleMessageFormat.size == markerResults.size) {
        val numPartitions = new AtomicInteger(partitionsWithCompatibleMessageFormat.size)
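For context, a minimal self-contained sketch of the counting pattern the numPartitions line introduces: each append result is recorded and decrements an atomic counter exactly once, and the response is sent only when the counter reaches zero. Placeholder String types stand in for Kafka's TopicPartition and Errors, and the helper name is illustrative; this is not the actual KafkaApis code.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of the "decrement exactly once per result" pattern;
// String stands in for Kafka's TopicPartition and Errors types.
object MarkerCompletionSketch extends App {
  val partitions = Seq("__consumer_offsets-0", "foo-0")
  val markerResults = new ConcurrentHashMap[String, String]()
  val numPartitions = new AtomicInteger(partitions.size)

  def sendResponse(results: ConcurrentHashMap[String, String]): Unit =
    println(s"sending response exactly once: $results")

  // Every append callback funnels through here once per partition,
  // so sendResponse can fire at most once, when the last result arrives.
  def addResultAndMaybeComplete(partition: String, error: String): Unit = {
    markerResults.put(partition, error)
    if (numPartitions.decrementAndGet() == 0) {
      sendResponse(markerResults)
    }
  }

  addResultAndMaybeComplete("__consumer_offsets-0", "NOT_LEADER_OR_FOLLOWER")
  addResultAndMaybeComplete("foo-0", "NONE")
}
```

Because each partition funnels through the callback exactly once, there is no size-based completion check that can remain true after the first completion.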
To understand the first case:
we call replicaManager#appendRecords (with empty records) while simultaneously appending via the new coordinator. This incorrectly decrements the numAppends counter and thus sends a premature response.
What both cases have in common is that we prematurely decrement this numAppends counter.
Is my understanding correct?
That is correct. It commonly happens when the new coordinator returns an error for the partition (NOT_LEADER_OR_FOLLOWER) quickly enough to get ahead of the replicaManager#appendRecords callback.
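To make the double completion concrete, here is a simplified sketch of the pre-fix flow described above (placeholder types, hard-coded values; the names mirror the snippet in the diff rather than the real KafkaApis code): once markerResults is full, the size check stays true, so the early coordinator callback and the later callback from the empty replicaManager#appendRecords call both pass it and both decrement numAppends.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Simplified sketch of the pre-fix logic; String stands in for TopicPartition/Errors.
object FirstCaseSketch extends App {
  // Only one partition needs a marker append for producer-0.
  val partitionsWithCompatibleMessageFormat = Seq("__consumer_offsets-0")
  val markerResults = new ConcurrentHashMap[String, String]()
  val numAppends = new AtomicInteger(2) // two markers outstanding: producer-0 and producer-1

  def maybeComplete(): Unit = {
    // Once markerResults is full, this condition stays true,
    // so every later call "completes" the marker again.
    if (partitionsWithCompatibleMessageFormat.size == markerResults.size) {
      println(s"maybeSendResponseCallback would run; numAppends is now ${numAppends.decrementAndGet()}")
    }
  }

  // The coordinator callback returns NOT_LEADER_OR_FOLLOWER immediately.
  markerResults.put("__consumer_offsets-0", "NOT_LEADER_OR_FOLLOWER")
  maybeComplete() // first decrement: numAppends 2 -> 1

  // replicaManager#appendRecords is still invoked with empty records;
  // its callback later calls maybeComplete() again.
  maybeComplete() // second decrement: numAppends 1 -> 0, premature response
}
```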
@jeffkbkim @dajac Any further comments? Looks like the build is green.
LGTM. Thanks for the find and fix!
LGTM, thanks for the fix.
…apache#17619)

We should only call `maybeSendResponseCallback` once for each marker while handling the WriteTxnMarkersRequest. Consider the following two cases:

First: we have two markers to append, one for producer-0 and one for producer-1. When we first process producer-0, it appends a marker to __consumer_offsets. The __consumer_offsets append finishes very quickly because the group coordinator is no longer the leader, so the coordinator directly returns NOT_LEADER_OR_FOLLOWER. In its callback it calls maybeComplete() for the first time, and because there is only one partition to append, it goes on to call maybeSendResponseCallback() and decrement numAppends. It then calls the replica manager append with nothing to append; in that callback it calls maybeComplete() a second time, which decrements numAppends again.

Second: we have two markers to append, one for producer-0 and one for producer-1. When we first process producer-0, it appends a marker to __consumer_offsets and to a data topic foo. The two appends are handled asynchronously by the group coordinator and the replica manager. There is a race where both appends finish together: they can fill `markerResults` at the same time and then each call `maybeComplete`. Because the `partitionsWithCompatibleMessageFormat.size == markerResults.size` condition is satisfied, both `maybeComplete` calls go through to decrement `numAppends`, causing a premature response.

Note: the problem only happens with the KIP-848 coordinator enabled.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>
https://issues.apache.org/jira/browse/KAFKA-17877
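For the second case, a hedged sketch of the interleaving (placeholder types; a latch forces the "both appends finish together" timing instead of relying on real scheduling): both append callbacks insert their result before either checks the map size, so both see the size condition satisfied and both decrement the counter.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical reproduction of the second race; not Kafka code, just the interleaving.
object SecondCaseSketch extends App {
  val expected = 2 // one __consumer_offsets partition plus the data topic "foo"
  val markerResults = new ConcurrentHashMap[String, String]()
  val numAppends = new AtomicInteger(2) // two markers outstanding
  val bothInserted = new CountDownLatch(2)

  def appendCallback(partition: String): Unit = {
    markerResults.put(partition, "NONE")
    bothInserted.countDown()
    bothInserted.await() // force both results to be recorded before either size check
    if (markerResults.size == expected) { // both threads see the condition satisfied
      println(s"premature complete; numAppends is now ${numAppends.decrementAndGet()}")
    }
  }

  val coordinatorAppend = new Thread(() => appendCallback("__consumer_offsets-0"))
  val replicaManagerAppend = new Thread(() => appendCallback("foo-0"))
  coordinatorAppend.start(); replicaManagerAppend.start()
  coordinatorAppend.join(); replicaManagerAppend.join()
}
```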