Portico Performance Reports

For those interested in the performance that can be expected from a Portico-based federation, this page contains the results of benchmarking tests run during May 2009. The tests compared performance results for Portico v0.9, Portico v1.0 and RTI-NGv6.

The Benchmarking Suite
For these tests, a special federation known as PGauge was developed. This test suite contains both Java and C++ federates, whose algorithms are as close to identical as possible. If you are interested, you can obtain PGauge from the Portico source code repository (see the above link for more information).

Throughput Tests
This test is designed to measure the raw throughput capabilities of an RTI. Each federation contains two types of federates: senders and listeners.

The sender federates register x number of objects (each containing a single attribute) and then attempt to send updates for that attribute as fast as they can. The time taken to complete the test is measured (from before the first update until after the last) and the throughput is calculated as the number of updates sent per second. PGauge also calculates the throughput in terms of KB/s as:

Throughput = (total updates * size of individual update) / duration in seconds
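As a rough illustration of the formula above, the calculation could be sketched as follows. This is not PGauge code: the function name is hypothetical, the 4-second duration in the example is made up rather than a measured result, and the use of 1024 bytes per KB is an assumption (the page does not say which convention PGauge uses).

```python
def throughput(total_updates, update_size_bytes, duration_seconds, bytes_per_kb=1024):
    """Return (updates per second, KB per second) for one test run.

    bytes_per_kb=1024 is an assumption; the source does not state the convention.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    updates_per_second = total_updates / duration_seconds
    kb_per_second = (total_updates * update_size_bytes) / bytes_per_kb / duration_seconds
    return updates_per_second, kb_per_second


# Hypothetical run: 100,000 updates of 256 bytes completing in 4 seconds.
updates_per_second, kb_per_second = throughput(100_000, 256, 4.0)
print(updates_per_second, kb_per_second)
```

The payload size and update count match the test configuration described on this page; only the duration is invented for the example.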

The configuration of the federation used in these tests was as follows:

Computer One (Windows XP SP2, Core 2 Duo 2.33GHz, 2GB RAM)

This computer contained a single sender federate, sending 100,000 updates of a single 256-byte attribute.

Computer Two (Windows XP SP2, Pentium 4 2.8GHz, 2GB RAM)

This computer contained a variable number of listener federates (from 1-4) as seen in the results.

Both computers were connected via a standard 100 Mbit/s network switch.

The Test Runs
Each test was completed 5 times, with the average of the 5 runs being taken. For each RTI, this meant that a total of 15 test runs were made: 5 in each configuration (1 sender, 1/2/4 listeners). None of the federates were time regulating or constrained.

For those interested, the precise command lines used to start each of the federates were:

throughput-sender -payload=256b -iterations=100000 -printInterval=10000
throughput-listener -printInterval=10000

RTI Configurations

Where possible, each RTI was configured to have message bundling enabled. Further discussion of the effect of this is included below.

Throughput Results
The following graph shows the results obtained for the Java interfaces:



For all tests, Portico v1.0 and RTI-NG remained very close to one another, deviating only by small margins. These results also clearly demonstrate the dramatically increased performance of Portico from version 0.9 to version 1.0, averaging an improvement of around 500%. This increase is the result of the optimization work undertaken between releases and the use of message bundling services provided by JGroups. This service allows multiple smaller messages to be bundled together and sent in larger blocks, making more efficient use of network resources. RTI-NG also employs such a strategy.

The following graph shows the results obtained for the C++ interfaces:



Again, the performance increase between Portico v0.9 and Portico v1.0 is significant, averaging well over 500%. However, RTI-NGv6 throughput performance in C++ is consistently stronger than that of Portico v1.0 (by an average of 45%). This is to be expected, as C++ is the native language that RTI-NG is written in, whereas Portico is written in Java. There are still places where Portico performance can be improved with continued refinement. Given the considerable gains made between Portico v0.9 and Portico v1.0, the eventual goal is to achieve throughput performance equal to that of RTI-NG.

Results without Bundling
Message bundling works by delaying the sending of messages until one of two things happens:

 1) The size of stored messages reaches a certain threshold
 2) The time messages have been waiting to be sent exceeds a threshold

By storing up messages and sending them in one larger block, an RTI is able to make more efficient use of network resources. However, due to the need to delay messages, bundling can have an adverse effect on performance in many scenarios. In an ideal situation, the size limit would be reached quickly, meaning that messages are only delayed for a minimal amount of time. However, in the absence of a constant stream of larger amounts of data, performance can be slowed as messages are potentially queued unnecessarily.
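The two flush conditions described above can be sketched in a few lines. This is only an illustration of the general technique, not the JGroups or RTI-NG implementation: the class name, method names and default thresholds are all hypothetical.

```python
import time


class Bundler:
    """Buffers messages and flushes when a size or age threshold is reached."""

    def __init__(self, send, max_bytes=64_000, max_delay=0.030, clock=time.monotonic):
        self.send = send              # callable that transmits a list of messages
        self.max_bytes = max_bytes    # size threshold in bytes (hypothetical default)
        self.max_delay = max_delay    # age threshold in seconds (hypothetical default)
        self.clock = clock            # injectable so the timing rule can be tested
        self.pending = []
        self.pending_bytes = 0
        self.oldest = None            # time the first pending message was queued

    def submit(self, message):
        """Queue a message; flush at once if the size threshold is reached."""
        if not self.pending:
            self.oldest = self.clock()
        self.pending.append(message)
        self.pending_bytes += len(message)
        if self.pending_bytes >= self.max_bytes:
            self.flush()

    def poll(self):
        """Call periodically; flushes if the oldest message has waited too long."""
        if self.pending and self.clock() - self.oldest >= self.max_delay:
            self.flush()

    def flush(self):
        """Send everything queued so far as one bundle."""
        if self.pending:
            self.send(self.pending)
            self.pending = []
            self.pending_bytes = 0
            self.oldest = None


bundles = []
bundler = Bundler(bundles.append, max_bytes=1024)
for _ in range(4):
    bundler.submit(b"x" * 256)    # the fourth message reaches 1024 bytes
print(len(bundles), len(bundles[0]))
```

With a steady stream of 256-byte updates the size rule fires almost immediately, which is the ideal case described above. A lone message, by contrast, sits in the queue until poll() notices that max_delay has elapsed, which is where the latency cost comes from.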

Both Portico v1.0 and RTI-NGv6 (and its descendants) allow you to configure the various size and time thresholds to suit your individual requirements. However, given the variable effect bundling can have on throughput performance, it was considered prudent to re-run all the throughput tests with bundling disabled.

The following graph shows the results obtained for the Java interfaces with bundling disabled:



When forced to send each message individually, Portico was able to perform significantly better than RTI-NGv6, with throughput results averaging 60.74% faster. Portico was also able to better withstand the increased load of having to send each message across the network individually. Across the 15 test runs, the throughput performance of Portico dropped 42.91% compared to 66.12% for RTI-NGv6. As with the message bundling enabled tests, Portico v1.0 shows considerably increased performance over Portico v0.9, although not as much as with bundling enabled (this is expected as Portico v0.9 doesn't support bundling).

The following graph shows the results obtained for the C++ interfaces with bundling disabled:



Unlike the results of the C++ throughput tests with bundling enabled, in this case, Portico v1.0 proved to be the fastest RTI, despite the interface being different from the native language of the RTI implementation. Further, as the number of federates increased, so did the percentage margin by which Portico outperformed RTI-NG (17.13%, 25.73% and 47.65% for the 1, 2 and 4 listener tests respectively). Once again, the performance increase from Portico v0.9 to Portico v1.0 was clear, with the latest version performing approximately 2.5 times faster over the 15 test runs.

Conclusions
What conclusions can we draw from these results?

Firstly, bundling can have a significant impact on throughput performance. Although it can adversely affect latency, it is worth experimenting with different threshold values to find the best bundling configuration to meet individual requirements.

Secondly, from the results of the bundling-disabled tests, Portico appears to make better use of the network than RTI-NG when forced to send a large volume of messages. During profiling tests carried out as part of the performance tuning for Portico, it was noticed that in extremely high-throughput scenarios, Portico becomes CPU bound rather than network bound. Portico throughput performance currently appears to be limited by its ability to package/un-package messages rather than to send them over the network. In the bundling-enabled tests, RTI-NG was better at handling this type of task and better able to get messages ready for transmission, consequently giving it better results.

Latency Tests
This test is designed to measure how quickly a message can be sent from one federate to another. In PGauge, this test is implemented by having a federation with two federates: one requester and one responder.

The requester sends a Ping interaction to the federation, at which point the responder should receive it and send back an acknowledgement. The time it takes to send the interaction and for a response to be received is recorded and defines the latency. This test therefore measures roundtrip latency (how long it takes to send and get a response). To get an idea of one-way latency, you can halve the provided values.

The test runs complete this process x number of times and then present a number of statistics about those runs. The results presented here used the "80% average" results. These are generated by discarding the best and worst 10% of results, and taking the average of the middle 80%.
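The "80% average" described above is a trimmed mean, and a minimal sketch of it is shown below. The function name and the sample values are hypothetical, and rounding the 10% cut down to a whole number of samples is an assumption; the page does not say how PGauge handles sample counts that are not a multiple of ten.

```python
def eighty_percent_average(samples):
    """Average of the middle 80% of samples (best and worst 10% discarded)."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    trim = len(ordered) // 10                    # 10% from each end, rounded down (assumption)
    middle = ordered[trim:len(ordered) - trim]
    return sum(middle) / len(middle)


# Hypothetical roundtrip times in microseconds, including one large outlier.
roundtrips = [500, 510, 490, 505, 495, 502, 498, 501, 499, 5000]
roundtrip_average = eighty_percent_average(roundtrips)
one_way_estimate = roundtrip_average / 2         # halve the roundtrip for a one-way estimate
print(roundtrip_average, one_way_estimate)
```

Discarding both tails keeps a single slow roundtrip, such as the 5000 microsecond sample here, from dominating the reported figure.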

The configuration of the federation used in these tests was as follows:

Computer One (Windows XP SP2, Core 2 Duo 2.33GHz, 2GB RAM)

This computer contained the requester, sending an interaction with a 256-byte parameter.

Computer Two (Windows XP SP2, Pentium 4 2.8GHz, 2GB RAM)

This computer contained the responder, sending back acknowledgements with a 256-byte parameter.

Both computers were connected via a standard 100 Mbit/s network switch.

The Test Runs
Each test was completed 5 times, with the average of the 5 runs being taken. All results are in microseconds.

For those interested, the precise command lines used to start each of the federates were:

latency-requester -payload=256b -iterations=10000 -printInterval=1000
latency-responder -payload=256b -printInterval=1000

RTI Configurations

Where possible, each RTI was configured to have message bundling disabled. In a situation like this, we want the messages to be sent as quickly as possible, without unnecessary delay.

Latency Results
The following graph shows the results obtained for the Java interfaces:



These results show that on average, Portico v1.0 had the lowest latency of all three RTIs. It has improved by 45% from Portico v0.9 and is 13.5% faster than RTI-NG.

The following graph shows the results obtained for the C++ interfaces:



In this case, Portico v1.0 has again significantly improved from version 0.9 (36.5%). However, when compared to RTI-NG, the results have flipped from the Java tests and Portico is slightly slower (27%). These results can be explained by the fact that Java and C++ are the native implementation languages for Portico and RTI-NG respectively. In both cases, the observed gaps between Portico v1.0 and RTI-NG are quite small.

Conclusions
These tests were all executed on a local-area network, where significant transmission delays are rare. In those situations where latency is a crucial aspect of performance, the networks in use are typically less ideal, perhaps geographically dispersed or limited in their connection speed and reliability.

In a clean-room environment it is clear that the differences between Portico v1.0 and RTI-NG are minimal, with each gaining a small advantage depending on which interface you are using.

Performance Tuning
For more information on how you can tweak the Portico configuration in an effort to tune it to suit your individual requirements, please see the Portico Performance Tweaking page.