Speed has been losing importance in the programming world over the years. The success of Java is a clear example of this: who would want to run any enterprise software on what was basically an interpreted language 30 years ago? Even today, when an application really needs to squeeze as much as possible from hardware, it is written in C/C++ (games, for example).
And yet, speed can still be a problematic business requirement. It is not very prevalent when it comes to billing systems, with a few exceptions: invoice generation, payment processing and mediation (processing of CDRs).
For invoice generation and payment processing, jBilling has been performing very well for years. Even the most demanding large corporation has been well served with relatively modest hardware when it comes to processing thousands of invoices and payments per day with jBilling.
The processing of mediation records is another story, a story that does not go back in time that far. After all, the mediation module is only a couple of years old. The design of this module prioritized flexibility over anything else. That is why jBilling uses a full-blown rules engine like jBoss Rules rather than hard-coding logic, inventing some cumbersome new 'mediation language,' or condemning the user to deal with tons of XML configuration.
But performance suffered quite a bit, to the point that, when it became clear that real-time mediation was the next goal, the performance of the mediation module had to be reviewed.
We quickly found that we were wasting a lot of CPU by calling the rules engine three times for each CDR. The solution was two-fold: make it possible to do mediation, rating, and item management in one call to the rules engine, and then go to the next level and allow for multiple CDRs at the same time with just one call to the rules engine.
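The second half of that change, handing several CDRs to the rules engine in one call, can be sketched roughly as follows. This is only an illustration and not jBilling's actual code: the `RulesCall` interface, the `BatchedMediation` class and the use of plain strings as CDRs are all hypothetical stand-ins, and no real rules engine API is invoked.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedMediation {

    /** Hypothetical stand-in for a single call into the rules engine. */
    interface RulesCall {
        void fire(List<String> cdrs);
    }

    /** Splits the records into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += batchSize) {
            int end = Math.min(i + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(i, end)));
        }
        return batches;
    }

    /** Makes one engine call per batch and returns how many calls were made. */
    static int process(List<String> cdrs, int batchSize, RulesCall engine) {
        int calls = 0;
        for (List<String> batch : partition(cdrs, batchSize)) {
            engine.fire(batch); // mediation, rating and item management would all happen in here
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> cdrs = List.of("cdr-1", "cdr-2", "cdr-3", "cdr-4", "cdr-5");
        int calls = process(cdrs, 2, batch -> System.out.println("engine call with " + batch));
        System.out.println("engine calls: " + calls);
    }
}
```

Keeping the batch size as a parameter matters here, since the right value is something to be measured rather than assumed.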
To get this done, we had to work very hard: many months went into achieving this. It was the type of work that makes you feel that you are going backwards instead of progressing: break something, and then try to put it back together so that it does the exact same thing (but hopefully faster). And how much faster it would end up being was anyone’s guess.
To make a long story short, the results were great, but also surprising. For example, having many CDRs processed in one call to the rules engine does not necessarily make it faster. The rules engine takes longer when it has to handle multiple records in memory. A batch size of two instead of one is much faster, because you halve the number of calls to the rules engine without paying much of a penalty for managing two records at the same time. But a batch of 1000 is slower than a batch of one.
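One way to picture this trade-off is a toy cost model: a fixed overhead per engine call that is shared by the whole batch, plus a penalty that grows as more records sit in memory together. The sketch below is only that, a toy. The linear shape of the penalty and all three constants are assumptions chosen for illustration, not measurements from jBilling or from any rules engine.

```java
public class BatchCostModel {

    // All three constants are made-up illustration values, not measurements.
    static final double CALL_OVERHEAD = 10.0; // fixed cost of one engine call, shared by the batch
    static final double PER_RECORD = 5.0;     // cost of evaluating one record on its own
    static final double CROWDING = 0.05;      // assumed extra cost per record for each record in the batch

    /** Modeled cost per CDR, in arbitrary units, for a given batch size. */
    static double costPerRecord(int batchSize) {
        return CALL_OVERHEAD / batchSize + PER_RECORD + CROWDING * batchSize;
    }

    public static void main(String[] args) {
        for (int b : new int[] {1, 2, 10, 100, 1000}) {
            System.out.printf("batch %4d -> %.2f units per CDR%n", b, costPerRecord(b));
        }
    }
}
```

With these made-up numbers, a batch of two costs less per CDR than a batch of one, while a batch of 1000 costs more. Where the sweet spot really lies depends on the rules and the engine, so it has to be found by testing.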
The eureka moment came when we were working on a modest notebook: I saw a test of 10,000 CDRs taking about 300 seconds. That was an average of more than 30 CDRs per second. This was before optimizing the rules for the long distance rating (which could double the output), and on hardware that would never be used in a production system.
After the 2.1.0 release, I saw deployments with a performance of hundreds of records per second on servers that are far from the best hardware money can buy. Now we can relax: there is a clear open source option when it comes to processing call records!