This post highlights the performance improvements in the Sitecore Publishing Service v2, covering the utilization of hardware resources, the speed of the publishing process, and the mitigation of network latency.
The new Sitecore Publishing Service brings massive performance improvements to customers compared to the old publishing. The publishing component was completely redesigned with the following performance goals in mind:
Efficiency in Consuming Hardware Resources
The more efficient the service is, the cheaper it is to run in the cloud. The new publishing service is designed to consume as little CPU and memory as possible. A considerable amount of effort has been invested in making sure that the code is efficient; this included multiple iterations of CPU and memory profiling, followed by redesigning and refactoring the code. This led to many optimizations, such as making sure that the service releases any unused references as early as possible. The service is even clever enough to deduplicate identical string objects stored in each item, such as language strings (e.g. “en”), so that only one object is referenced. Such optimizations have massively decreased the memory footprint.
The publishing service has been load tested using a large dataset that included more than 1 million item variants (languages and versions).
During the test, CPU usage was about 17% and memory usage didn’t exceed 400 MB.
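The string deduplication described above can be illustrated with a minimal sketch. This is not the service's actual code (which is not shown in this post); `StringPool` and `load_variants` are hypothetical names, and the rows are made-up data standing in for values read from a database.

```python
class StringPool:
    """Keeps one canonical object for each distinct string value."""

    def __init__(self):
        self._pool = {}

    def canonical(self, value: str) -> str:
        # setdefault stores the first object seen for a value and
        # returns that same object for every later equal string.
        return self._pool.setdefault(value, value)

    def __len__(self):
        return len(self._pool)


def load_variants(rows, pool):
    """Build item-variant dicts whose language strings share one object."""
    return [
        {"item_id": item_id, "language": pool.canonical(language), "version": version}
        for item_id, language, version in rows
    ]


pool = StringPool()
# Strings are built at runtime here, as they would be when read from a database.
rows = [(i, "".join(["e", "n"]), 1) for i in range(1000)]
variants = load_variants(rows, pool)

# Every variant references the single pooled "en" object.
shared = all(v["language"] is variants[0]["language"] for v in variants)
```

With a pool like this, a million item variants in the same language hold a million references to one string rather than a million separate copies.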
Speeding Up the Publishing Process
One of the main differences between the new and old publishing is the data layer. Unlike the old publishing, the publishing service doesn’t talk to the databases via the Sitecore item APIs. Instead, it uses its own data layer, which performs database operations only in bulk. The bulk operations improve performance dramatically by mitigating the network latency problems (see the next section).
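To see why bulk operations matter, here is a rough sketch that counts database round trips. It assumes one operation per item for simplicity. The batch size of 500 is a hypothetical value chosen for illustration, not the service's default, and `batches` and `round_trips` are illustrative helpers rather than part of any Sitecore API.

```python
import math
from typing import Iterator, List, Sequence


def batches(items: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def round_trips(operation_count: int, batch_size: int) -> int:
    """Number of database calls needed when operations are sent in batches."""
    return math.ceil(operation_count / batch_size)


item_ids = list(range(4628))                   # size of the default Sitecore database
per_item_calls = round_trips(len(item_ids), 1)
bulk_calls = round_trips(len(item_ids), 500)   # 500 is a hypothetical batch size
```

Under these assumptions, 4,628 single-item calls collapse into 10 bulk calls, so the network latency is paid 10 times instead of 4,628 times.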
Another difference is that most of the processing in the publishing service is done using a parallel, pipelined approach. For example, while a batch of items is being evaluated for restrictions, another batch is being retrieved from the database at the same time.
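The pipelined idea can be sketched with two threads and a bounded queue. This is only an illustration of the general technique, not the service's implementation: `fetch_batches` is a stand-in for a bulk database read, and `is_publishable` is a hypothetical restriction check.

```python
import queue
import threading

SENTINEL = None  # marks the end of the batch stream


def fetch_batches(item_ids, batch_size, out_queue):
    """Stage 1: stand-in for reading batches of items from the database."""
    for start in range(0, len(item_ids), batch_size):
        out_queue.put(item_ids[start:start + batch_size])
    out_queue.put(SENTINEL)


def evaluate_batches(in_queue, is_publishable, results):
    """Stage 2: evaluate restrictions while stage 1 keeps fetching."""
    while True:
        batch = in_queue.get()
        if batch is SENTINEL:
            break
        results.extend(i for i in batch if is_publishable(i))


def run_pipeline(item_ids, batch_size, is_publishable):
    # A bounded queue keeps the fetcher at most two batches ahead.
    handoff = queue.Queue(maxsize=2)
    results = []
    fetcher = threading.Thread(
        target=fetch_batches, args=(item_ids, batch_size, handoff))
    evaluator = threading.Thread(
        target=evaluate_batches, args=(handoff, is_publishable, results))
    fetcher.start()
    evaluator.start()
    fetcher.join()
    evaluator.join()
    return results


publishable = run_pipeline(list(range(10)), 3, lambda item_id: item_id % 2 == 0)
```

Because there is a single consumer reading from a FIFO queue, results keep their original order. The bounded queue also stops the fetch stage from loading the whole dataset into memory ahead of the evaluation stage.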
The new publishing service also utilizes in-memory indexed trees, which capture the state of the source and target databases at the start of each publish job. Those indexes allow the service to do as much work as possible efficiently (in memory) without talking to the database.
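As a loose illustration of what an in-memory tree index makes possible, the sketch below builds a parent-to-children map from a single bulk read and then answers tree questions without further database calls. The post does not describe the service's real index structure, so `TreeIndex`, its row format, and the sample item names are all assumptions.

```python
from collections import defaultdict


class TreeIndex:
    """In-memory parent -> children index built from one bulk read."""

    def __init__(self, rows):
        # rows: iterable of (item_id, parent_id); parent_id is None for the root
        self.children = defaultdict(list)
        self.ids = set()
        for item_id, parent_id in rows:
            self.ids.add(item_id)
            if parent_id is not None:
                self.children[parent_id].append(item_id)

    def descendants(self, item_id):
        """Return all descendants of item_id without any database call."""
        found, stack = [], [item_id]
        while stack:
            current = stack.pop()
            kids = self.children.get(current, [])
            found.extend(kids)
            stack.extend(kids)
        return found


source = TreeIndex([("home", None), ("news", "home"), ("a1", "news"), ("about", "home")])
target = TreeIndex([("home", None), ("about", "home")])

# Items under "home" that exist in the source but not yet in the target.
missing = sorted(i for i in source.descendants("home") if i not in target.ids)
```

Here the comparison between source and target runs entirely in memory, which is the kind of work the indexes are meant to keep away from the database.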
Network Latency and Old Publishing
Network latency can have a negative impact on the performance of any client/server communication. The more operations you can do in one call, the less network latency you are going to experience.
In a typical setup where Sitecore and SQL servers are located in the same network, the average network latency is about 18 ms. Here are some stats about publishing the default Sitecore database in such a setup using the old publishing:
- Total number of items: 4,628 (default Sitecore database)
- Total number of database calls: 64,079 (monitored using MS SQL Profiler)
- Total time: 22 minutes
- Total latency = number of database calls * network latency = 64,079 * 18 ms = ~19 minutes
- This is 87% of the total publishing time (wasted doing nothing!)
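The arithmetic behind these figures can be checked with a few lines of Python, using only the measured numbers quoted above. The variable names are just for illustration.

```python
database_calls = 64_079
latency_seconds = 0.018       # 18 ms per round trip
total_publish_minutes = 22

latency_minutes = database_calls * latency_seconds / 60
latency_share = latency_minutes / total_publish_minutes

print(f"{latency_minutes:.1f} minutes of latency, {latency_share:.0%} of the publish")
# prints: 19.2 minutes of latency, 87% of the publish
```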
Can the Total Latency be Estimated?
Yes, it can. Here is how to do it:
Profiling the database calls and analyzing the Item API operations shows that the old publishing makes the following database calls per item/language/target:
- One call to the links database
- One call to select item and fields from the source database
- One call to delete the fields from the target database
- One call per field to insert into the target database => this is 8 standard fields + the item's custom fields
- One call to update/insert the item into the target database
Excluding custom fields, that is 1 + 1 + 1 + 8 + 1 = 12 calls per item, so the total number of database calls (Tc) can be estimated as follows:
Tc = Languages * Targets * ((12 + avg. number of custom fields) * number of items)
Total Latency = Tc * Latency
You can use the above equation to find out how much the network latency is affecting your existing publishing process. If it’s a lot, start using the new publishing service straight away 🙂
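The estimate can be wrapped in a small helper, sketched below. The function names are made up for illustration, and the example site (50,000 items, 8 custom fields on average, 2 languages, 1 target, 18 ms latency) is hypothetical, not a measured result.

```python
def estimate_db_calls(items, avg_custom_fields, languages=1, targets=1):
    """Tc = Languages * Targets * ((12 + avg. custom fields) * number of items)."""
    return languages * targets * (12 + avg_custom_fields) * items


def estimate_latency_minutes(db_calls, latency_ms):
    """Total Latency = Tc * Latency, converted from milliseconds to minutes."""
    return db_calls * latency_ms / 1000 / 60


# Hypothetical site: 50,000 items, 8 custom fields on average,
# 2 languages, 1 publishing target, 18 ms network latency.
calls = estimate_db_calls(50_000, 8, languages=2, targets=1)
minutes = estimate_latency_minutes(calls, 18)
```

For this made-up site the estimate comes to 2,000,000 database calls and 600 minutes (10 hours) of pure network latency for a full publish.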
Now, let’s see how fast the publishing process can be if you switch to the new publishing service.
The following is a comparison of the speed of the new vs the old publishing using different network setups. The test involved publishing the default Sitecore database and was repeated multiple times with different locations, and here is the result.
As the results suggest, the network latency is mitigated: publishing time went down from hours to seconds!
Please note that the test was done using the default batch sizes for bulk operations, and there is still room for more performance tuning by tweaking the service configuration if needed.
- Customers with large datasets who have multiple geo-located target databases will get the most benefit from using the new publishing service.
- The new publishing service is cheaper to run, as it consumes fewer hardware resources and doesn’t require buying a new license.
- No need to use SQL replication to move data across multiple geo-locations.
- The new publishing service can run as an Azure cloud application, which is again cheaper than using VMs.