Aaron.
Thanks again for the feedback.
Like you, I expected the 1 Gbps upgrade to speed things up, so I was very surprised when it seemed to introduce degradation. We will try again once we have a new switch and Cat 6 cables.
My goal is to replicate the performance that you yourself have seen… 250,000 msg/sec is our target now.
We are making progress, and will be sure to highlight anything we learn on this post.
At any rate, your team is quite responsive. It's hugely appreciated.
@dsargrad I’ve seen behaviour similar to what you’re seeing (a good rate for a few seconds, then rapid degradation) when running in the cloud. There, it’s because the network interfaces have “tokens” that are consumed during high-bandwidth periods. This allows bursts of activity, but when testing you see a high rate at the start; then all the tokens are consumed and the rate drops. So perhaps you’re running the broker in a shared virtualised environment, like VMware, and the administrator has enabled network throttling on the VMs.
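To make the token-bucket idea concrete, here’s a toy simulation in plain bash. All the numbers are invented for illustration (they don’t come from any real NIC or hypervisor): the sender gets its full rate while the burst allowance lasts, then falls back to the sustained refill rate.

```shell
tokens=5000        # initial burst allowance
refill=1000        # tokens added per interval (sustained rate)
demand=4000        # messages we try to send per interval
for interval in 1 2 3 4 5; do
  tokens=$((tokens + refill))
  if [ "$tokens" -ge "$demand" ]; then
    sent=$demand           # enough tokens: full rate
  else
    sent=$tokens           # throttled: only what the bucket holds
  fi
  tokens=$((tokens - sent))
  echo "interval $interval: sent $sent msgs"
done
```

Running it shows the pattern TomF describes: 4000, then 3000, then the rate settles at the 1000/interval refill rate — a high rate at the start, then a sharp drop once the burst tokens are gone.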
Guys,
I have a follow up concern.
We just purchased a new 16-port gigabit Netgear switch to replace our 10/100 Netgear switch and have swapped it into our test configuration. Additionally, every box is cabled with a brand-new Cat 6 cable. We are using CentOS 7.
Querying the NIC, I see that the embedded NIC supports gigabit speeds.
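For anyone following along, this is the kind of check I mean (the interface name `eno1` is just an example; use `ip link` to find yours):

```shell
# Show the NIC's supported modes and the speed/duplex actually negotiated
ethtool eno1 | grep -E 'Supported link modes|Speed|Duplex'
```

On a healthy gigabit link this should report `Speed: 1000Mb/s` and `Duplex: Full`; a negotiated 100Mb/s or half-duplex link would point at a cable or switch autonegotiation problem rather than the broker.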
The unfortunate fact is that sdkperf is once again running in an extremely degraded mode. Though configured for 4,000 msg/sec, we are seeing less than 1/10th of that, with huge variability.
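For context, our SdkPerf invocation is roughly of this shape (the host IP and topic are placeholders, and the flags are as we understand them from the SdkPerf help output, so double-check against your version):

```shell
# Publish and subscribe on the same topic at a target rate of 4,000 msg/sec
sdkperf_c -cip=192.168.1.50 \
          -ptl=perf/test -stl=perf/test \
          -mn=1000000 \
          -mr=4000 \
          -msa=100
# -mr=4000  target publish rate (msg/sec)
# -msa=100  message payload size in bytes
# -mn       total number of messages to publish
```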
TomF… your insight is something we will look at further. We are not in a cloud environment, nor are we on virtual machines. However, your thoughts do give us an idea or two for experiments we can run to isolate this further.
I’ll be sure to follow up as soon as we learn more.
We seem to have isolated our performance degradation problem with Solace. We set up Solace PubSub+ in two environments:
1. A plain Docker container using the provided image
2. A Kubernetes cluster using the Solace-provided Helm charts
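Roughly how the two environments were stood up. The image name is the standard PubSub+ image on Docker Hub; the Helm repo URL and chart name are from memory, so verify them against the Solace Kubernetes quickstart before copying:

```shell
# Environment 1: plain Docker container
docker run -d --name=solace --shm-size=1g \
  -p 55555:55555 -p 8080:8080 \
  --env username_admin_globalaccesslevel=admin \
  --env username_admin_password=admin \
  solace/solace-pubsub-standard

# Environment 2: Kubernetes via the Solace Helm charts
helm repo add solacecharts https://solaceproducts.github.io/pubsubplus-kubernetes-quickstart/helm-charts
helm install my-pubsubplus solacecharts/pubsubplus
```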
We ran the tests with SdkPerf on both instances, pub/subbing at a rate of 4,000 messages per second with a message size of 100 bytes, and we saw no performance degradation with the broker running in a plain Docker container. We were able to reach the capacity of our gigabit network on the first broker.
On the second broker, however, we see the same performance degradation mentioned above. To us it seems the problem lies with Kubernetes and not our network hardware. There appears to be some issue between the 1 Gbps network and how we have set up Solace on Kubernetes, or our cluster as a whole.
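One experiment that could separate the Kubernetes network (CNI/overlay) from the broker itself is a raw pod-to-pod throughput test, e.g. with iperf3. This is only a sketch of the idea, not something we have run yet; `networkstatic/iperf3` is one example public iperf3 image:

```shell
# Start an iperf3 server pod and wait for it
kubectl run iperf-server --image=networkstatic/iperf3 --port=5201 -- -s
kubectl wait --for=condition=Ready pod/iperf-server

# Run a client pod against the server's pod IP for 30 seconds
SERVER_IP=$(kubectl get pod iperf-server -o jsonpath='{.status.podIP}')
kubectl run iperf-client --rm -it --image=networkstatic/iperf3 -- -c "$SERVER_IP" -t 30
```

If pod-to-pod throughput comes in well below ~1 Gbit/s, the overlay network (or CPU limits on the pods) is the bottleneck rather than the broker, the switch, or the cabling.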