[Questions] Classic queue overdelivers to closed channel with global prefetch #13229
Unanswered · gomoripeti asked this question in Questions · Replies: 0 comments
Community Support Policy
RabbitMQ version used
other (please specify)
Erlang version used
26.2.x
Operating system (distribution) used
ubuntu
How is RabbitMQ deployed?
Debian package
rabbitmq-diagnostics status output
See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
Logs from node 1 (with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 2 (if applicable, with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 3 (if applicable, with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
rabbitmq.conf
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location
Steps to deploy RabbitMQ cluster
deb package
Steps to reproduce the behavior in question
_
advanced.config
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location
Application code
# PASTE CODE HERE, BETWEEN BACKTICKS
Kubernetes deployment file
What problem are you trying to solve?
Note: this issue happens when a channel uses global QoS, which is a deprecated feature (it is, however, still permitted on latest main and 4.0.x).
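For context, a channel opts into the channel-wide ("global") prefetch with `basic.qos` and the `global` flag set to true. A minimal sketch with the Erlang AMQP 0.9.1 client, assuming `Channel` is an already open channel process:

```erlang
%% Minimal sketch (assumes the amqp_client library and an open channel).
-include_lib("amqp_client/include/amqp_client.hrl").

set_global_prefetch(Channel) ->
    %% With global = true the prefetch_count applies to the whole channel,
    %% i.e. all consumers on the channel share one unacked-message budget.
    #'basic.qos_ok'{} =
        amqp_channel:call(Channel, #'basic.qos'{prefetch_count = 10,
                                                global         = true}).
```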
When AMQP 0.9.1 global QoS is used, the queue process asks the limiter process of the consumer channel whether it can deliver the next message. However, if the limiter process does not exist or is shutting down, `can_send` allows the delivery. This puzzling behaviour results in the queue delivering messages to a dead consumer, beyond the original prefetch count, until it runs out of credit (200 by default). While the queue process is executing `run_message_queue` in a loop, it does not handle `DOWN` messages from terminated consumer channels. The many `can_send` calls also add extra cost if the queue and channel processes are on different nodes, and the unacknowledged messages have to be loaded and kept in memory, so the queue process memory gets bloated.
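To illustrate the mechanism described above, here is a rough sketch of the fallback pattern in question. It is not the actual `rabbit_limiter.erl` code, and every name other than `can_send` and `ExitValue` is made up:

```erlang
%% Illustrative sketch only -- not the real rabbit_limiter.erl code.
%% The queue asks the channel's limiter whether one more message may be
%% delivered. If the limiter process is already gone (the consumer
%% channel died), the exit raised by the call is swallowed and ExitValue
%% is returned instead. With ExitValue = true the queue keeps delivering
%% to the dead consumer until it runs out of credit.
can_send(LimiterPid, QueuePid, AckRequired) ->
    ExitValue = true,  %% the value the questions below ask about flipping to false
    try
        gen_server2:call(LimiterPid, {can_send, QueuePid, AckRequired}, infinity)
    catch
        exit:_ -> ExitValue
    end.
```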
The problem we have observed in production is a classic queue getting into a "bad" state: it has a long (~10K) internal gen_server2 message queue, it holds a lot of messages (let's say 100K), and more and more of them go into the unacked state although the consuming client side has received none. Normally there are 60 consumers on the queue (each from a separate connection/channel). The client application tries to scale out by adding a few more consumers, but `basic.consume` always times out, the connections are closed and the subscription is retried. Because the queue is always far behind on processing the `basic.consume` and channel-down events, it can never recover (or is very slow to recover after all the clients are stopped from retrying). The stacktrace of the queue process is dominantly:
My questions:
1. Should the `ExitValue = true` in rabbitmq-server/deps/rabbit/src/rabbit_limiter.erl (line 221 in a87036b) be `false`, to prevent sending messages to dead consumers? (I think that `true` is a leftover from 2008, when the limiter process was shut down when it was not used, whereas today the limiter is always running next to the channel process and is merely marked by the queue as `dormant` if unused.)
2. Does `ExitValue = true` affect any other code path than AMQP 0.9.1 consumers with global qos (e.g. AMQP 1.0 consumers)?
3. Would a PR changing `ExitValue` from `true` to `false` be accepted?