[Questions] Messages stuck in unack after consumers scaled down, TPS dropped to 0 #14611
-
Community Support Policy
RabbitMQ version used: 4.0.7
Erlang version used: 27.3.x
Operating system (distribution) used: RHEL 9
How is RabbitMQ deployed? RPM package
rabbitmq-diagnostics status output: See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
Logs from node 1 (with sensitive values edited out): See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 2 (if applicable, with sensitive values edited out): See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 3 (if applicable, with sensitive values edited out): See https://www.rabbitmq.com/docs/logging to learn how to collect logs
rabbitmq.conf: See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location
Steps to deploy RabbitMQ cluster: We run RabbitMQ as a 3-node bare-metal cluster on RHEL 9 virtual machines, fronted by a load balancer. Each VM has 24 CPUs, 48 GiB RAM, and 1 TB storage. The load balancer exposes ports 5672, 15672, and 15692.
Steps to reproduce the behavior in question: The issue is intermittent, but the observed pattern is:
advanced.config: See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location
Application code:
Kubernetes deployment file:
What problem are you trying to solve?
During message processing in RabbitMQ, the TPS for delivery/ack suddenly drops to 0 while incoming messages continue. This leads to a backlog. When consumers are scaled down, some messages remain stuck in unack instead of moving back to ready. Even with no active consumers, RabbitMQ did not release those messages. Only after restarting RabbitMQ services one by one did the messages resume flowing, but database write queues still had a pending backlog.
What underlying issue could cause messages to remain stuck in unack after consumers are scaled down (no consumers present), and the TPS (delivery/ack rate) to drop to 0 while incoming messages continue?
Replies: 4 comments
-
Are you confident all consuming connections have been closed? There may be clues in your broker logs or in the management UI.
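One way to check this without the UI is to ask the management HTTP API what the queue itself still reports. The following is a minimal sketch, assuming the management plugin is reachable on port 15672 and that the per-queue response carries the `consumers`, `messages_ready`, `messages_unacknowledged`, and `consumer_details` fields. The host, credentials, and queue name are placeholders, and `fetch_queue` and `summarize_queue` are hypothetical helpers written for this example, not part of any RabbitMQ client.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_queue(base_url, vhost, queue, user, password):
    """GET /api/queues/{vhost}/{queue} from the management plugin."""
    url = "{}/api/queues/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(vhost, safe=""),  # the default vhost "/" is sent as %2F
        urllib.parse.quote(queue, safe=""),
    )
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize_queue(info):
    """Reduce a queue description to the fields relevant to stuck unacked messages."""
    details = info.get("consumer_details") or []
    # Field names below are assumed from the management API response shape.
    connections = sorted(
        {(d.get("channel_details") or {}).get("connection_name", "unknown") for d in details}
    )
    consumers = info.get("consumers", 0)
    unacked = info.get("messages_unacknowledged", 0)
    return {
        "consumers": consumers,
        "ready": info.get("messages_ready", 0),
        "unacked": unacked,
        "consumer_connections": connections,
        # Unacked deliveries with zero consumers is the state described in the question.
        "unacked_without_consumers": unacked > 0 and consumers == 0,
    }


# Example call (placeholder values, requires a reachable node):
# info = fetch_queue("http://localhost:15672", "/", "my-queue", "guest", "guest")
# print(summarize_queue(info))
```

If `consumers` is non-zero, or `consumer_connections` lists connections you believed were gone, the unacked messages are most likely still held by those channels rather than stuck in the broker.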
-
That's not evidence of a bug in RabbitMQ. We see consumers that do not acknowledge deliveries all the time, and a few years ago we had to introduce a protection mechanism because, for quorum queues at the moment, such unacknowledged deliveries become a problem beyond a certain number. It can also be a matter of consumer connections not being detected as closed immediately. Automatic requeueing should eventually take care of it.
-
|
RabbitMQ |
-
This sounds like you have a bug in your consumers. I've seen cases exactly like this where users think there must be something wrong with RabbitMQ when in fact their consumers process a message, hit an exception, and then just stop, because they don't handle the exception correctly.
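To illustrate the failure mode, here is a minimal sketch of a consumer callback that always settles a delivery, even when processing raises. It assumes a pika-style client, where the callback receives `(channel, method, properties, body)` and the channel offers `basic_ack` and `basic_nack`. `make_callback` and `process` are hypothetical names for this example, and choosing `requeue=False` is an assumption about what you want to happen to failed messages.

```python
import logging

logger = logging.getLogger("consumer")


def make_callback(process):
    """Wrap process(body) so every delivery is either acked or nacked."""

    def on_message(channel, method, properties, body):
        try:
            process(body)
        except Exception:
            # Log and reject instead of leaving the delivery unacked forever.
            logger.exception("processing failed, delivery_tag=%s", method.delivery_tag)
            # requeue=False avoids a tight redelivery loop; the message is
            # dropped, or dead-lettered if the queue has a dead letter exchange.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
        else:
            channel.basic_ack(delivery_tag=method.delivery_tag)

    return on_message
```

With pika this would be registered along the lines of `channel.basic_consume(queue="my-queue", on_message_callback=make_callback(handle))`, where `my-queue` and `handle` are placeholders. A consumer written without the `except` branch keeps its connection open after the first failure, so the broker has no reason to requeue what it is holding.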
I seriously doubt that you had 0 consumers. RabbitMQ's behavior since day one is to re-enqueue unacked messages when consumers disappear.
As @michaelklishin said, the version of RabbitMQ you're using isn't eligible for free community support. If you'd like to receive free support from the RabbitMQ maintainers, you must do the following -