I’ve been trying to find this in the documentation, but without luck: how consistent is the broker’s behavior here?
I have a pattern where I process the message asynchronously and, once processing succeeds, ACK it.
The process that launches the async work needs to ensure that if the worker thread gets stuck, OOMs, etc., the outcome is set to FAILED, NACKing the message. I can’t synchronize those two threads, so I end up with a race condition:
If Solace receives ACK and then NACK, what will it do with the message?
If Solace receives NACK and then ACK, what will it do with the message?
Hi there, I’m Solly AI. I generate responses to your questions using the Solace Agent Mesh.
Based on the provided documentation, I’ll analyze what happens in race conditions between ACK and NACK operations on the same message for a software event broker using the JCSMP API.
Reasoning:
The documentation doesn’t explicitly state what happens in these race conditions
However, we can understand the following key points about acknowledgments:
- ACKs and NACKs are settlement outcomes that tell the event broker how to handle a message
- ACCEPTED (ACK) tells the event broker to remove the message from its queue
- FAILED (NACK) tells the event broker to attempt redelivery
- REJECTED (NACK) tells the event broker to remove the message from the queue and move it to the DMQ if configured
- The documentation mentions that NACKs can be lost in transit due to networking issues
- The API and event broker use a windowed acknowledgment system with timeouts and retries
- The event broker ensures de-duplication of retransmitted messages
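For reference, these settlement outcomes are expressed in JCSMP through the message `settle()` API and the flow’s required-settlement-outcomes negotiation. The fragment below is a rough sketch from memory, not verified against a broker; method and enum names (`addRequiredSettlementOutcomes`, `XMLMessage.Outcome`) should be checked against the JCSMP Javadoc for your version, and `"q1"` is a placeholder queue name:

```java
// Sketch only: requires the JCSMP jar and a running broker; verify names
// against the JCSMP Javadoc before relying on this.
ConsumerFlowProperties flowProps = new ConsumerFlowProperties();
flowProps.setEndpoint(JCSMPFactory.onlyInstance().createQueue("q1"));
flowProps.setAckMode(JCSMPProperties.SUPPORTED_MESSAGE_ACK_CLIENT);
// Negotiate FAILED/REJECTED support with the broker at flow bind time.
flowProps.addRequiredSettlementOutcomes(XMLMessage.Outcome.FAILED,
                                        XMLMessage.Outcome.REJECTED);

// Later, in the message listener:
msg.settle(XMLMessage.Outcome.ACCEPTED);  // ACK: remove from queue
// msg.settle(XMLMessage.Outcome.FAILED);   // NACK: request redelivery
// msg.settle(XMLMessage.Outcome.REJECTED); // NACK: remove, move to DMQ if configured
```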
Given this information, I cannot definitively answer what happens in these specific race conditions as the documentation does not explicitly cover these scenarios. Making assumptions about the behavior could lead to incorrect application design.
For reliable message handling, the documentation suggests:
- Use client acknowledgment mode appropriately
- Handle redelivered messages properly
- Acknowledge messages as soon as processing is complete
- Consider that NACKs can be lost in transit
- Be aware that messages can be redelivered even after being acknowledged, due to network issues
For the specific race condition scenarios you asked about, I recommend:
- Consulting the Solace documentation for any updates that might address these scenarios
- Testing the behavior in a development environment if needed
- Designing applications to handle message processing and acknowledgment atomically to avoid race conditions
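One way to make that last recommendation concrete is to guarantee client-side that at most one settlement is ever sent per message, so the broker never sees a conflicting ACK/NACK pair at all. A minimal self-contained sketch (my own illustration, not Solace code; the `settle` callback stands in for the real JCSMP ack/nack call):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Guards a message so that only the first settlement attempt wins;
// the losing thread's outcome is dropped client-side, so the broker
// never receives conflicting ACK/NACK for one message.
final class OneShotSettlement {
    enum Outcome { ACCEPTED, FAILED }

    private final AtomicBoolean settled = new AtomicBoolean(false);
    private final Consumer<Outcome> settle; // e.g. msg::settle in real code

    OneShotSettlement(Consumer<Outcome> settle) { this.settle = settle; }

    /** Returns true if this call actually performed the settlement. */
    boolean trySettle(Outcome outcome) {
        if (settled.compareAndSet(false, true)) {
            settle.accept(outcome);
            return true;
        }
        return false; // another thread settled first
    }
}
```

The worker calls `trySettle(ACCEPTED)` on success and the watchdog calls `trySettle(FAILED)` on timeout; whichever wins the compare-and-set is the only outcome the broker sees.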
Good point on the “lost in transit” outcome: this applies to both NACK and ACK.
I’m curious about the happy path here. Assuming both the NACK and the ACK are delivered, is there a documented state machine, and are there consistency guarantees that ensure a NACKed (or ACKed) message can’t change its state?
This is a good question. There is no outcome state stored directly with the message. However, the outcome does affect the message, so this is indirectly a state.
Specifically:
SOLCLIENT_OUTCOME_ACCEPTED: This causes the message to be removed from the queue, so any subsequent settlement is ignored because the message no longer exists. So ACK then NACK (of either type) will not reverse the acknowledgement; the message no longer exists.
SOLCLIENT_OUTCOME_REJECTED: This causes the message to be removed from the queue (optionally moved to the dead message queue (DMQ) if so configured), so any subsequent settlement is ignored because the message no longer exists. So REJECT then ACK/NACK will not reverse the acknowledgement; the message no longer exists.
SOLCLIENT_OUTCOME_FAILED: This is the interesting one. It causes the message to be prepared for retransmission and consequently renumbered at the protocol level (though not renumbered in the message spool). Therefore if you send a FAILED outcome followed by an ACCEPT or REJECT, the subsequent settlement is ignored, as it will not carry the correct msgID, and the message will still be resent at the first opportunity if it has not been resent already.
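To make this first-settlement-wins behavior concrete, here is a toy model (my own simplification for illustration, not broker code) of a queue where ACCEPTED/REJECTED delete the message and FAILED renumbers it, so that a stale second settlement no longer matches anything:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of broker-side settlement handling, per the explanation above:
// - ACCEPTED / REJECTED remove the message; later settlements find nothing.
// - FAILED renumbers the message at the protocol level, so a later
//   ACCEPT/REJECT carrying the old msgID is ignored and the message
//   is still redelivered.
final class BrokerModel {
    enum Outcome { ACCEPTED, FAILED, REJECTED }

    private final Map<Long, String> inFlight = new HashMap<>(); // msgID -> payload
    private long nextId = 1;

    long deliver(String payload) {
        long id = nextId++;
        inFlight.put(id, payload);
        return id;
    }

    /** Returns what the broker did: "removed", "renumbered", or "ignored". */
    String settle(long msgId, Outcome outcome) {
        String payload = inFlight.remove(msgId);
        if (payload == null) return "ignored";   // stale or unknown msgID
        switch (outcome) {
            case ACCEPTED:
            case REJECTED:
                return "removed";                // message gone for good
            case FAILED:
            default:
                deliver(payload);                // renumbered for redelivery
                return "renumbered";
        }
    }
}
```

Running ACK-then-NACK against this model shows the second settlement ignored; FAILED-then-ACCEPT likewise leaves the message queued under its new id.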
@Ragnar , follow-up question: “removed from the queue” - does it have the same effect as running /action/msgVpns/%s/queues/%s/msgs/%d/delete via the SEMP endpoint, or is there some other difference/side effect? My only guess is that the SEMP call may be logged/audited, since it is more of an admin interface.
As you have noticed, the SEMP call is an admin action. You need to provide the admin credentials to take this action and the role therefore has a great deal more power than a client connection. Although admin privileges may be scoped to a vpn, it still has more privilege than a client.
There are other issues as well. The ‘msgid’ in /action/msgVpns/%s/queues/%s/msgs/’msgid’/delete is the broker unique message Id assigned when the message is first spooled, also known as the spooler-msg-id. This id is not visible on all flows, and definitely not on flows that have enabled FAIL or REJECT settlement. So even if your background message processor wanted to, it could not create the correct SEMP command to delete the message. SEMP message delete is intended for an administrative action where the spooler message id is known through some other administrative display, or via a browser flow (browser flows do report the spooler-msg-id to the application).
Solace does not have a message acknowledgement mechanism that will allow you to remove messages from the queue independent of the flow on which the message is received. Unfortunately this means that in your current design your asynchronous message processor will need to remain tightly coupled to the message receiver. In which case you might as well generate the ack/nack in one place to avoid the race condition: either by passing the flow information to the ‘message processor’ and having the ack/nack generated there when it’s done, or by having the message processor signal the receiver when it should ack/nack the received message.
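The second option (processor signals the receiver) can be sketched with a future per message: the receiver thread owns the flow and is the only thread that ever settles, while the async processor merely completes the future. A timeout on the receiver side covers stuck or crashed workers. Names here are illustrative, not from any Solace API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: the receiver thread is the sole owner of ack/nack; the async
// processor only reports success/failure through a future.
final class ReceiverOwnedSettlement {
    enum Outcome { ACCEPTED, FAILED }

    static Outcome awaitOutcome(CompletableFuture<Void> processing,
                                long timeoutMillis) {
        try {
            processing.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.ACCEPTED;      // worker finished -> ACK on the flow
        } catch (TimeoutException | ExecutionException e) {
            return Outcome.FAILED;        // stuck or failed worker -> NACK
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.FAILED;
        }
    }
}
```

Because only the receiver thread translates the outcome into a settlement call on the flow, the ACK/NACK race from the original question cannot occur.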
Note that the default number of unacknowledged messages per flow is quite large, so not sending an acknowledgement at all will not be a problem, but it does prevent you from triggering an immediate retransmission that occurs when the FAIL settlement is used. If a message is not acknowledged at all, it is only retransmitted when the flow is destroyed and re-created.
@Ragnar , thanks for the response. I agree that SEMP has wider scope; I’ve asked in another thread about the reasoning why the SEMP API can’t be protected more granularly.
I’m having trouble with the semantics here: when I receive a message on a flow (with FAIL/REJECT settlement enabled), I get a string/long messageId that looks and increments like an offset. Is this the spooler-msg-id? Because based on it, I was able to remove messages using SEMP.
Thanks for reviewing the design and suggesting changes. I’m aware that the best way forward is to ACK the message in the process that owns the receiver that received the message. At the moment I’m exploring whether there are any alternatives.
I am surprised this worked for you. I have just run some tests and there is definitely a difference between the spooler-msg-id and the msg-id seen on a flow. Typically, when you create a flow, you start receiving messages numbered from 1.
However, if you look at the queue in the admin portal, or at the CLI, the messages are not numbered from 1, unless you are testing on a freshly installed broker, which may be the case in your testing. So if you publish 10 messages that are added to queue ‘q1’ and 10 more to queue ‘q2’, the second queue will have messages 11-20. This number increments forever until the broker is re-installed (on my broker I see message ids like 144233694).
However, when you connect a flow to the broker, the message id you see on the flow starts at 1. This is because, to support things like settlement reject, non-exclusive queues, and partitioned queues, the protocol needs to renumber the messages to get a continuously increasing sequence. Therefore the message id seen on a flow is local to that flow and not valid outside it.
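A toy illustration of why the two ids diverge (my own simplification, not broker internals): the spooler-msg-id is a broker-global ever-increasing counter across all queues, while each flow renumbers its deliveries from 1.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: the broker assigns a single ever-increasing spooler-msg-id
// across all queues, while each flow renumbers its deliveries from 1.
final class IdModel {
    private static long spoolerId = 0;               // broker-global counter

    /** Simulates binding a flow and receiving messageCount messages. */
    static List<long[]> bindFlow(int messageCount) {
        List<long[]> deliveries = new ArrayList<>(); // {flowMsgId, spoolerMsgId}
        long flowMsgId = 0;                          // restarts at 1 per flow
        for (int i = 0; i < messageCount; i++) {
            deliveries.add(new long[] { ++flowMsgId, ++spoolerId });
        }
        return deliveries;
    }
}
```

With two flows of 10 messages each, the second flow again sees flow ids 1-10 while the spooler ids continue at 11-20, matching the q1/q2 example above.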
Is this something that can be changed? Is it a limitation/restriction of the broker or of the library?
The protocol has not been defined to do this, so it would require both broker and library changes to implement.