Trade Simple November 2011 Product Update - Part 3

Infrastructure Activities

Database housekeeping/management:

For many months, the infrastructure team has been dealing with serious issues around the lack of space available for the trade simple databases on both the Hospitality and eparts platforms (see also Document Repository management below). The following steps were taken to address these issues:

  1. All non-critical data was identified and purged from the system. This was done out of hours to minimise possible performance issues. Ongoing housekeeping procedures ensure that this is a continuous process

  2. This freed up hundreds of gigabytes within the database. For comparison, approximately 1GB (10^9 bytes) is required to store about 1,000 average-length novels in English or 7 minutes of HDTV!

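  3. Unfortunately, however, SQL Server did not make this free space available on the disks for other databases to use, so the problems continued. Deleting data frees space inside a database's own files, but by default those files do not shrink on disk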

  4. After much research and testing, the Infrastructure team now understand far more about why this happened and have begun a project to release all of this space back to the disks

This has removed a huge issue from the trade simple platforms.

Message-retry management:

The trade simple Hospitality platform currently processes approximately 25 million separate messages per year. Most of these are handled successfully but, for various reasons, a percentage of them fail to be processed first time.

In most cases, any failing message will automatically be retried 10 times by the trade simple engine before being moved to an error queue and flagged up to the Message Monitoring Team, via their console, for action. The trade simple engine currently uses Microsoft's Message Queuing services (MSMQ) to manage all message-processing, including these retries.

However, there seem to be two key problems with this approach:

  1. A bug in the current functionality denies the team the flexibility that they need in order to configure and manage parameters, like the number of retries and the retry wait interval, to suit particular categories of failure. This one-size-fits-all constraint is very limiting

  2. The retry functionality creates a huge additional load on MSMQ and the associated servers, and it is suspected to be a direct contributor to certain regular live support issues that occur on trade simple

Therefore, a significant rework of this retry functionality has been completed, moving it from MSMQ to a purpose-built solution using the SQL database. This has allowed the removal of a third of all existing queues (581 now remain) and has halved the number of polls of these queues, a reduction of approximately 24 million polls per week. In addition, retry intervals can now be configured to allow for delays, meaning that reports will be reprocessed automatically. This is a great step forward for the support team.

The team introduced this to live on 17 November and things are looking good.
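To illustrate the idea of per-category retry settings held in a SQL table, here is a minimal sketch. It is not the trade simple implementation: the table name, column names, failure categories and policy values are all hypothetical, and Python's built-in sqlite3 module stands in for the platform's SQL database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int   # retries allowed after the first failed attempt
    wait_seconds: int  # delay before the next retry


# Hypothetical categories and values, for illustration only.
POLICIES = {
    "timeout": RetryPolicy(max_retries=10, wait_seconds=60),
    "report_not_ready": RetryPolicy(max_retries=3, wait_seconds=3600),
}
DEFAULT_POLICY = RetryPolicy(max_retries=10, wait_seconds=300)


def open_store():
    """Create an in-memory retry table (stand-in for the real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE message_retry (
               message_id      TEXT PRIMARY KEY,
               category        TEXT NOT NULL,
               failures        INTEGER NOT NULL,
               next_attempt_at INTEGER NOT NULL,
               status          TEXT NOT NULL
           )"""
    )
    return conn


def record_failure(conn, message_id, category, now):
    """Record a failed attempt; schedule a retry or mark the message as an error."""
    policy = POLICIES.get(category, DEFAULT_POLICY)
    row = conn.execute(
        "SELECT failures FROM message_retry WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    failures = (row[0] if row else 0) + 1
    # The first failure plus max_retries failed retries exhausts the policy.
    status = "error" if failures > policy.max_retries else "pending"
    conn.execute(
        "INSERT OR REPLACE INTO message_retry VALUES (?, ?, ?, ?, ?)",
        (message_id, category, failures, now + policy.wait_seconds, status),
    )
    return status


def due_messages(conn, now):
    """Return the ids of pending messages whose retry time has arrived."""
    rows = conn.execute(
        "SELECT message_id FROM message_retry "
        "WHERE status = 'pending' AND next_attempt_at <= ? "
        "ORDER BY next_attempt_at, message_id",
        (now,),
    )
    return [r[0] for r in rows]
```

In a design like this, a single query over one table finds everything that is due, rather than many separate queues each being polled, and changing the behaviour for one category of failure only means changing that category's policy.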

Performance improvements:

This is an ongoing activity but a recent example shows the difference it can make:

  • A single commonly-used stored procedure was discovered to be responsible for about 70% of all database reads in an average day!

  • Initial work has already improved this and it is now using less than 10% of the previous resource

  • More improvements are in the pipeline…

Catalogue functionality split:

Nearly 4 years ago we introduced the concept of the ‘new’ trade simple catalogue functionality and database. This was a wholesale rewrite of the Online Ordering and Catalogue Management functionality using .NET. Not only was this easier to support and manage, but the usability and performance of the new functionality were a huge improvement on many aspects of the old one.

All new trade simple customers were put on to the new functionality and we have also migrated most of the existing customers over to use the new website functionality as well. However, we have still not removed the dependency of these customers on the ‘old’ catalogue database and functionality for the processing of their messages. Therefore, Fourth has not been able to realise the full benefits of these changes for our management of the platform.

However, the work to complete this separation is currently under way and should result in both performance improvements around the processing of messages and further reduction of the load on the Hospitality database servers.

Document Repository management:

To provide the high degree of resilience that trade simple offers to our customers, the platform needs, in addition to its significant database requirements, to retain a huge repository of documents and document versions on disk.

It has been a constant challenge over the past few years for the Infrastructure team to manage these repositories. They were always struggling to find hard disk space for them to grow into and, more significantly, the sheer amount of data made it very complicated to back it up for business continuity purposes.

Due to some key enhancements, we now have the ability to create multiple document repository archives as required. This means the team can now actively manage them by, for example, introducing new storage when the existing capacity is close to full. It also means that every archive can be easily and (relatively) rapidly backed up for off-site storage.

There is still work to do here but this is a huge step forward.
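As a rough illustration of how multiple archives make capacity manageable, the sketch below rolls over to a new archive once the current one would pass a fill threshold. The class names, archive naming scheme and 90% threshold are assumptions made for the example, not details of the trade simple repository.

```python
from dataclasses import dataclass


@dataclass
class Archive:
    name: str
    capacity_bytes: int
    used_bytes: int = 0


class Repository:
    """Stores documents in a series of archives, opening a new one when needed."""

    def __init__(self, archive_capacity_bytes, full_threshold=0.9):
        self.archive_capacity_bytes = archive_capacity_bytes
        self.full_threshold = full_threshold  # hypothetical "close to full" limit
        self.archives = []
        self._open_new_archive()

    def _open_new_archive(self):
        name = f"archive-{len(self.archives) + 1:03d}"
        archive = Archive(name=name, capacity_bytes=self.archive_capacity_bytes)
        self.archives.append(archive)
        return archive

    def store(self, document_size):
        """Account for a document and return the name of the archive holding it."""
        if document_size > self.archive_capacity_bytes:
            raise ValueError("document is larger than a whole archive")
        current = self.archives[-1]
        limit = current.capacity_bytes * self.full_threshold
        # Roll over rather than let the current archive pass the threshold.
        if current.used_bytes > 0 and current.used_bytes + document_size > limit:
            current = self._open_new_archive()
        current.used_bytes += document_size
        return current.name
```

Because older archives stop changing once a new one is opened, each can be backed up once as a unit of bounded size, which is one plausible reason smaller archives are quicker to copy off-site.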

Message monitoring and management:

A project for the future: Nick wants to look at evolving the next generation of trade simple message-monitoring and management tools. These would be focused around removing the warnings or notifications that are provided today and concentrating on the actual alerts that need action. This would eliminate the need for a human being to be constantly watching the monitors, which in turn would allow the automation of a lot of the alert handling and greatly improve the out-of-hours service that the platform can provide to the customer.
