Random "failed to retrieve" error in streaming scenario

Hello Ragnar,
thank you for the quick response.
Let me try to summarize the architecture behind my project. The overall goal is to build a real-time backend that uses stream processing to aggregate arbitrary kinds of data.
My data generator is a self-designed POS application that sends out transaction data. This data goes through a Traefik load balancer to a validation API, which checks the schema and makes some corrections to the data (e.g., checking that the item prices add up to the total price). The validation service only lets validated data through, and that data is sent to a Solace topic. Behind that topic is a queue. A Solace consumer receives the data from that queue and emits it to my Apache Flink job (which doesn't work at the moment; I also tried it with Bytewax).
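For illustration, the total-price check mentioned above could look roughly like the following minimal sketch. It assumes a hypothetical payload with an `items` list (each item has a `price` and an optional `quantity`) and a `total` field, plus a one-cent tolerance; the real POS schema and the validation service's actual logic are not shown in this thread.

```python
from decimal import Decimal


# Hypothetical transaction shape; the real POS payload schema may differ.
def totals_match(transaction: dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True if the sum of price * quantity over all items equals the stated total."""
    # Convert via str() so float inputs like 1.99 become exact decimals.
    items_sum = sum(
        (
            Decimal(str(item["price"])) * Decimal(str(item.get("quantity", 1)))
            for item in transaction["items"]
        ),
        Decimal("0"),
    )
    return abs(items_sum - Decimal(str(transaction["total"]))) <= tolerance


tx = {"items": [{"price": 2.50, "quantity": 2}, {"price": 1.99}], "total": 6.99}
print(totals_match(tx))  # True
```

Using `Decimal` instead of floats avoids rounding noise when summing currency amounts.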
Self-designed POS application: scaled to 5 replicas in the Docker Compose setup

Traefik load balancer: one instance

Validation service: scaled to 5 replicas in the Docker Compose setup

Solace consumer (and therefore the Flink job): one instance; the Flink job runs with parallelism 4 on 2 TaskManager replicas with 2 task slots (numOfTask) each.
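As a quick sanity check on the Flink sizing: with Flink's default slot sharing, a job needs as many task slots as the highest parallelism of any of its operators, so 2 TaskManagers × 2 slots = 4 slots covers a parallelism of 4. The helper below is a hypothetical illustration of that arithmetic, not part of the project.

```python
def enough_slots(taskmanagers: int, slots_per_taskmanager: int, job_parallelism: int) -> bool:
    """Hypothetical check: do the TaskManagers offer enough slots for the job?

    Assumes Flink's default slot sharing, where a job needs as many slots
    as the highest parallelism of any operator in it.
    """
    available_slots = taskmanagers * slots_per_taskmanager
    return available_slots >= job_parallelism


print(enough_slots(taskmanagers=2, slots_per_taskmanager=2, job_parallelism=4))  # True
```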

All services are written in Python; for Flink I use PyFlink as the bridge to the underlying Java components.

The Solace producer and consumer are also written in Python (using SMF), with the Solace HowTo examples on GitHub as a reference.
For now, the complete pipeline just passes data through, because the Flink job keeps getting interrupted and does nothing. I looked into this, and the errors I received may trace back to the Flink job being interrupted and not receiving the messages correctly (my acknowledgement handling is not good). That might lead to a scenario where the Flink job somehow receives duplicates and, because of the resulting errors, the broker behaves the way you see in my error output. (Just a guess, but maybe it helps.)
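To make the acknowledgement guess more concrete, here is a minimal sketch of the usual at-least-once pattern: acknowledge a message only after it has been handed downstream successfully, and drop redeliveries by message ID so duplicates do not reach the job twice. `Message`, `forward` and `ack` are hypothetical stand-ins, not the Solace Python API, and the sketch assumes every message carries a unique ID (for example one set by the producer).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:  # hypothetical stand-in for a broker message
    message_id: str
    payload: dict


@dataclass
class IdempotentConsumer:
    """Hands each message downstream once and acks only after the hand-off succeeds."""

    forward: Callable[[dict], None]  # hypothetical hand-off into the stream job
    ack: Callable[[Message], None]  # hypothetical call that settles the message on the broker
    seen_ids: set = field(default_factory=set)

    def on_message(self, msg: Message) -> None:
        if msg.message_id in self.seen_ids:
            # Redelivery of something already forwarded: ack again so the broker stops resending it.
            self.ack(msg)
            return
        # If forward() raises, no ack is sent and the broker may redeliver the message later.
        self.forward(msg.payload)
        self.seen_ids.add(msg.message_id)
        self.ack(msg)
```

The `seen_ids` set lives in memory and grows without bound, so a restart forgets it; in a real pipeline the deduplication state would need to be bounded or kept somewhere durable.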

I hope this sheds some light on it.

If you need more information, let me know.

BR Jonas