When implementing a new system it is common to make high-level assumptions about how it will be used, based on what the business community says it expects. This input tends to be coarse, such as the number of transactions per day expected once the system is fully operational. It is a good starting point for working out the likely level of demand on the system, a process that combines estimates and inferred values to arrive at a likely behaviour profile. These figures are then used as the basis for performance and load testing of the system. It is worth briefly considering the future at this stage as well, since a little forethought in the specification and design of the original system can deliver a huge benefit later.
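As an illustration of the kind of inference involved, the sketch below turns a transactions-per-day figure into a peak rate per second for load testing. The function name, the length of the active window and the peak-to-average ratio are all assumptions made for the example; real values would have to come from the business or from a comparable system.

```python
def peak_rate_per_second(transactions_per_day, active_hours=8.0, peak_to_average=3.0):
    """Estimate a peak transaction rate from a daily volume.

    Assumptions (illustrative only):
    - all traffic arrives within `active_hours` of the day
    - the busiest period runs at `peak_to_average` times the average rate
    """
    if active_hours <= 0:
        raise ValueError("active_hours must be positive")
    average_per_second = transactions_per_day / (active_hours * 3600.0)
    return average_per_second * peak_to_average


# 28,800 transactions spread over an 8-hour day average 1 per second,
# so a 3x peak factor gives a target of 3 per second.
target_rate = peak_rate_per_second(28_800, active_hours=8.0, peak_to_average=3.0)
```

Each parameter here is an estimate layered on an estimate, which is why such a model benefits from being recalibrated against real usage data once it exists.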
Once a system has been implemented it is likely either to require an upgrade or to experience performance problems at some point, and by then the basis for modelling can be much stronger. There is a system in place with real users performing real transactions, and so detailed information should be available. This is especially important when investigating performance problems, because how often certain operations occur, and what else was happening on the system at the same time, can be critical in understanding the likely root cause.
It is perhaps surprising how rare it is to be able to gather meaningful data from a system on how it is being used and on the experience users are having. Unless this is considered early in a system’s design, or in the evaluation of a product, information on the system’s usage in production is often either completely unavailable or only available indirectly, through analysis techniques that are difficult to apply. There are many ways to make this information available, but it has to be implemented and tested as part of the original system design. This kind of data gathering is very difficult to retrofit to a system, but generally easy to add during development.
So what should the requirements for a system be?
The following is a starting list:
- Information should be recorded about all operations that users perform, and the experience they obtain from those operations. This should not be limited to cases where an error occurs.
- The system should have standard reports from this data that provide key metrics on a regular basis. These metrics should be used directly to monitor the service level, and to detect problems in the system as early as possible.
- Such a data gathering system can generate large volumes of data, and so a data archiving and management mechanism must be in place. Without this it is likely that the growth of monitoring data will itself cause performance problems in the future.
- The solution for monitoring and reporting on user operations should not negatively impact the user experience.
- It should be possible to recover retrospectively what was happening at a certain point in time on the system, down to user IDs and arguments used in operations.
It is worth noting that having this sort of information will be valuable throughout the delivery of a system anyway, in particular during load testing and performance model calibration.
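To make the requirements above a little more concrete, here is a minimal sketch of what a recorded operation and a standard report might look like. The `OperationRecord` fields, the `summarise` report and the `active_between` lookup are hypothetical names chosen for illustration, not a prescribed schema; a real system would keep these records in a store with an archiving policy rather than in memory.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class OperationRecord:
    timestamp: float      # seconds since the epoch
    user_id: str
    operation: str
    arguments: dict       # arguments used in the operation
    duration_ms: float    # what the user experienced
    succeeded: bool       # recorded for successes as well as errors


def percentile(values, fraction):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def summarise(records):
    """Per-operation key metrics: volume, error rate, 95th percentile time."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.operation].append(record)
    report = {}
    for operation, rows in grouped.items():
        failures = sum(1 for row in rows if not row.succeeded)
        report[operation] = {
            "count": len(rows),
            "error_rate": failures / len(rows),
            "p95_ms": percentile([row.duration_ms for row in rows], 0.95),
        }
    return report


def active_between(records, start, end):
    """Recover what was happening on the system in a given time window."""
    return [r for r in records if start <= r.timestamp <= end]
```

The report covers the regular service-level metrics, while the time-window lookup covers the retrospective requirement, down to user IDs and arguments.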
The next question is how such information gets collected. There are many different ways, and the most effective one varies with the technology being used. Where possible the information should be gathered as a side effect of something else. In web systems, for example, it is sometimes possible to gather this sort of data from server log files. In systems where the code is generated it may be possible to add data gathering to the templates. If all else fails then place timers and system logging throughout the code, and make sure you have the necessary log processing tools available.
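For the fallback option of placing timers and logging in the code, one possible shape is a decorator that wraps each user-facing operation. This is a sketch using only the Python standard library; the logger name `operations`, the `timed_operation` decorator, the convention that the user ID is the first argument, and the `lookup_order` function are all assumptions made for the example.

```python
import functools
import json
import logging
import time

# Dedicated logger so operation records can be routed to their own file.
operation_log = logging.getLogger("operations")


def timed_operation(name):
    """Log one JSON line per call: who, what, with which arguments, how long."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(user_id, *args, **kwargs):
            started = time.perf_counter()
            outcome = "ok"
            try:
                return func(user_id, *args, **kwargs)
            except Exception:
                outcome = "error"
                raise  # the caller still sees the original exception
            finally:
                elapsed_ms = (time.perf_counter() - started) * 1000.0
                operation_log.info(json.dumps({
                    "ts": time.time(),
                    "operation": name,
                    "user_id": user_id,
                    "args": repr(args),
                    "kwargs": repr(kwargs),
                    "outcome": outcome,
                    "duration_ms": round(elapsed_ms, 3),
                }))
        return wrapper
    return decorate


@timed_operation("lookup_order")
def lookup_order(user_id, order_id):
    # Hypothetical business operation standing in for real work.
    return {"order_id": order_id, "owner": user_id}
```

Successful calls are logged as well as failures, and one JSON object per line keeps later log processing simple. Writing the log synchronously adds a small cost to every call; if that matters, the standard library's `logging.handlers.QueueHandler` can move the write onto a separate thread.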
If you need help in working out an effective approach to collecting data, or in framing the non-functional requirements necessary to manage the performance of a system then e-mail me. It can be surprising how often having access to this information can help to resolve issues – and not just those in the scope of system performance problems.