As I mentioned in my previous article, I have been busy running performance tests of WebSphere MQ and Apache ActiveMQ (also see the video of the benchmark). I am now ready to publish results, but only for the persistent tests. Unfortunately, the networking setup of my server could not keep up with WebSphere MQ for non-persistent messages: my VMware virtual network adapter became saturated with traffic before I could use all 8 cores on my VM. Hence I have two choices: either cap the number of cores at something less than 8, or find hardware with a faster network. I will be doing both in the next few weeks.
Test setup and configuration
A detailed description of the test and topology, hardware specs, software setup instructions and tuning options for WMQ and AMQ can all be found in my previous article. In short, the benchmark used persistent transactional JMS messaging for both WebSphere MQ 7.5 and Apache ActiveMQ 5.9. The load driver was IBM Performance Harness for JMS. You can download the benchmark scripts and inspect the configuration details from my Dropbox folder.
Persistent messaging performance comparison
The persistent tests did not run into the network limitation because messages are stored on disk, so the disks and the CPU become the bottleneck before network capacity is used up. In my tests WebSphere MQ ran 3 to 4 times faster with non-persistent messages than it did with persistent messages. But, again, the network was the bottleneck in the non-persistent tests, so I expect the real non-persistent advantage to be higher than the 3 to 4 times I observed.
The results you can see in the diagram below were obtained by iterative tuning of both Apache ActiveMQ and WebSphere MQ. I tried different settings on the server and on the client while looking for the maximum rate of messages per second for both systems. It turned out that for my configuration (your own mileage will vary), the best message rate was obtained with 80 concurrent client threads running IBM Performance Harness for JMS. I tested different message sizes (from 256 bytes up to 1 MB). As discussed in part 1 of this series, I had 8 cores per VM and 4 SSDs. Because I had 4 SSDs, the best performance came from running 4 instances of WebSphere MQ Queue Managers and 4 instances of Apache ActiveMQ Brokers. Each instance had its own SSD to write data to. In the case of WebSphere MQ, each Queue Manager had one SSD for its transaction log file and another SSD for its queue data file. Unfortunately, with only 4 SSDs, each queue data file had to share a disk with the transaction log of another Queue Manager. If I had 8 SSDs, I could have run WebSphere MQ even faster, because all transaction logs and queue data files could have been placed on separate disks.
The measurements shown above are averages across six runs; each test run was 20 minutes long. Each test run used one of the 5 message sizes (256 bytes, 1K, 10K, 100K and 1MB). A production workload would normally have many different message sizes, but it is easy to approximate its performance using the numbers above as a reference. It is possible to change individual tuning options in WMQ and AMQ to make certain message sizes run a bit faster at the cost of slowing down other message sizes. After many hundreds of different permutations of tuning options I settled on the one that gave the best overall performance for both WMQ and AMQ, the reason being that a production workload usually has a mix of different message sizes.
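To make the "approximate a mixed workload" idea concrete, here is a rough sketch of one way to do it. It assumes the time spent per message simply adds up across sizes, so the blended rate is the weighted harmonic mean of the per-size rates. That is a simplification, not something I measured. The function name `blended_rate` is mine, and the rates in the example are made-up placeholders, not results from this benchmark.

```python
def blended_rate(mix, rates):
    """Estimate overall msgs/sec for a mixed workload.

    mix   -- fraction of messages of each size (fractions must sum to 1)
    rates -- measured msgs/sec for each size when tested on its own

    Assumption: processing time per message adds up across sizes, so the
    average time per message is sum(fraction / rate).
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("message mix fractions must sum to 1")
    seconds_per_message = sum(frac / rates[size] for size, frac in mix.items())
    return 1.0 / seconds_per_message


# Placeholder numbers for illustration only (NOT benchmark results).
example_rates = {"256B": 10000.0, "10K": 5000.0}
example_mix = {"256B": 0.5, "10K": 0.5}
print(round(blended_rate(example_mix, example_rates), 1))
```

Note that the estimate is lower than the plain average of the two rates, because the slower message size takes up a larger share of the total time.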
As you can see, on average in my tests WebSphere MQ 7.5 was 60% to 90% faster than Apache ActiveMQ 5.9 for persistent messaging. What does this mean? You would need 60% to 90% more resources to run the same workload on ActiveMQ as you would on WebSphere MQ, which in turn means:
- 60% to 90% more hardware cost
- 60% to 90% more data center space
- 60% to 90% more cooling
- 60% to 90% more power
- 60% to 90% more software installed (and additional cost if you buy ActiveMQ support)
- 60% to 90% more administration cost to manage all of the above
Even if all other functions of WebSphere MQ and ActiveMQ were equal (and they are not), the performance gains alone may pay for the cost of the WebSphere MQ licenses and be worth the investment. You do not have to take my word for it. I have made all of the scripts and tuning options available for anyone to inspect (see part 1).
What is next?
I just got a new server with 10 Gb Ethernet cards and will run the non-persistent tests on that server. I may also try capping the number of cores per VM at 2 or 4 (down from 8) and publish results for those configurations.
How credible is this work?
I have done my best to tune WMQ and AMQ and have read the performance manuals from IBM, Apache and Red Hat, but there is no guarantee that I did the best possible job tuning either product. I am sure that someone in the IBM UK Hursley lab, or the Apache ActiveMQ performance experts who do this every day as a full-time job, could make things go a bit faster. But given that I have spent a considerable amount of time tuning both systems, I would not expect the average end user of WMQ or AMQ to get better results than I did. I therefore believe this is a credible result in the sense that it is representative of what a typical user is likely to see, even though it may not be optimal. However, if anybody looks at my scripts and suggests how to improve upon this work, I am happy to listen and rerun the tests.
Automatic software tuning
Performance testing can be a slow process, mainly because it is so iterative and manual. However, I believe it is possible to solve the performance tuning problem automatically. I have not gotten around to implementing this yet, but here is my idea of how it could be done quickly and efficiently using neural networks or some sort of genetic algorithm for optimization. Both WMQ and AMQ have many independent tuning and load parameters that influence the message rate. I would like to develop an automated method that finds the maximum rate of messages per second on given hardware.
Given:
- The number of CPUs, memory, network and other hardware parameters are constant (or at least there is a known maximum).
- The versions of the OS and the messaging products are constant (RHEL 6.5 and WebSphere MQ 7.5, as an example)
- Message size
- Between 1 and 20 minutes to complete one test measurement
Goal:
- Maximum rate of messages per second
Variables:
- OS tuning options (kernel, TCP and other network and memory settings)
- Messaging software tuning settings, of which there are several dozen (buffer sizes of all sorts, number of threads, number of Queue Managers, number of queues, different kinds of bindings for clients, security settings, log types, number and sizes of log and data files, etc.)
- Number of message producers and consumers (say a range from 1 to 10,000)
Currently performance tuning is done as an iterative process: change one variable at a time, measure msg/sec, rinse and repeat. I have developed a script framework that can iterate over different settings for WMQ and AMQ, but if I do this as a nested loop that is 50 levels deep and iterates over every possible value of each variable, the process will take hundreds of years to complete. Including a random factor might help, but what I would like to do is use a neural network, a genetic algorithm or some other method to find the optimum settings for those tuning variables. I found some research papers that talk about this kind of thing, but could not find any real implementations. Has anybody done it? Any recommendations on how to proceed?
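To show roughly what I have in mind, here is a minimal sketch of a genetic algorithm over tuning settings. Everything in it is an assumption for illustration: the parameter names and value ranges in `SEARCH_SPACE` are hypothetical and are not real WMQ or AMQ options, and `fake_benchmark` is a toy scoring function standing in for a real 1 to 20 minute test run that would return msg/sec.

```python
import random

# Hypothetical tuning knobs and candidate values (NOT real WMQ/AMQ settings).
SEARCH_SPACE = {
    "client_threads": list(range(1, 201)),
    "log_buffer_pages": [64, 128, 256, 512, 1024, 2048, 4096],
    "instances": [1, 2, 4, 8],
    "tcp_buffer_kb": [64, 128, 256, 512, 1024],
}


def random_config(rng):
    """Pick one random value for every knob."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Build a child by taking each knob from one of the two parents."""
    return {name: rng.choice((a[name], b[name])) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """Re-roll each knob with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in config.items()
    }


def fake_benchmark(config):
    """Toy stand-in for a real test run; higher is better.

    A real version would apply the settings, run the load driver and
    return the measured msg/sec. The shape of this function is made up.
    """
    score = 1000.0
    score -= abs(config["client_threads"] - 80)
    score -= abs(config["instances"] - 4) * 10
    score -= abs(config["log_buffer_pages"] - 512) / 64
    score -= abs(config["tcp_buffer_kb"] - 256) / 32
    return score


def optimize(measure, generations=30, population_size=20, elite=4, seed=0):
    """Return (best_config, best_score, number_of_distinct_test_runs)."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    cache = {}  # every real measurement is expensive, so never repeat one

    def fitness(config):
        key = tuple(sorted(config.items()))
        if key not in cache:
            cache[key] = measure(config)
        return cache[key]

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:elite]  # the best configs survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - elite)
        ]
        population = parents + children

    best = max(population, key=fitness)
    return best, fitness(best), len(cache)


if __name__ == "__main__":
    best, score, runs = optimize(fake_benchmark)
    print(best, score, runs)
```

With these defaults the search performs at most 500 measurements, compared with 28,000 combinations for an exhaustive sweep of even this tiny four-knob space. Whether a real message rate surface is smooth enough for this to work well is exactly what I would need to find out.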