So far in this bulletin I have concentrated mainly on Capacity Management as a whole. In client engagements, however, I am quite often called in only after a performance problem has already been identified. The good news is that the best approach to resolving an existing performance problem is very similar to the Capacity Management approach discussed elsewhere in this bulletin. The bad news is that it means going through similar processes on a much shorter timescale. The chances of success are much better if the original system project used a capacity management approach during implementation.
This Managing Performance Problems Presentation was created as an overview of the techniques to use. The best advice is to start with a clear statement of the symptoms being experienced. The most obvious symptoms will be user complaints that the system is too slow or, in many cases, that it is unstable. A system that is performing badly tends to exhibit near-random instability as different parts of it run into race conditions or timeouts. This problem statement matters because it is all too easy to find a potential performance issue and rush to solve it. Unless that issue is actually likely to be causing the symptoms, however, resolving it will not resolve the symptoms.
When investigating the source of the technical issues, therefore, it is important to relate each one back to the symptoms being experienced. Evaluating whether an identified issue is likely to be causing those symptoms requires many of the same techniques and data sets used to build a capacity plan. For example, suppose an issue is found whereby two functions executed on the same data at the same time cause both executions to hang. This is the kind of performance problem that stress testing of the system can uncover. If the production environment is logged, you can check whether such hangs occur around the times of the reported symptoms. If they do, there is a strong case that this is the root cause of the problem. If such logs are not available, detailed knowledge of the business processes combined with a performance model allows you to estimate the probability of the collision occurring. If it seems probable, there is a weaker case that this is the root cause. If the probability is very low, add the issue to a list of things to fix and keep looking for the root cause.
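The two tests described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: it assumes, hypothetically, that hang events and user-reported symptoms are both available as timestamps, and, for the probability estimate, that the two functions are started independently as Poisson arrivals spread uniformly over the data records. The function names `hangs_near_symptoms` and `expected_collisions`, the five-minute matching window and all of the example figures are illustrative assumptions.

```python
import math
from datetime import datetime, timedelta


def hangs_near_symptoms(hang_times, symptom_times, window=timedelta(minutes=5)):
    """Return the symptom reports that have a logged hang within +/- window.

    Many matches support a strong case that the hang is the root cause;
    few or none suggest the search should continue elsewhere.
    """
    return [s for s in symptom_times
            if any(abs(h - s) <= window for h in hang_times)]


def expected_collisions(rate_a, rate_b, duration_a, duration_b, n_records, period):
    """Estimate how often functions A and B overlap on the same record.

    Hypothetical model: A and B are started independently as Poisson
    processes (rate_a, rate_b calls per second), each call touches one of
    n_records chosen uniformly at random, and calls last duration_a and
    duration_b seconds. Two calls overlap when their start times differ by
    less than the duration of whichever started first, which gives an
    overlap rate of rate_a * rate_b * (duration_a + duration_b) / n_records.
    """
    rate = rate_a * rate_b * (duration_a + duration_b) / n_records
    expected = rate * period
    # Treating collisions as rare Poisson events: P(at least one) = 1 - e^-expected.
    return expected, 1.0 - math.exp(-expected)


# Illustrative figures only: two logged hangs and two user complaints.
hangs = [datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 14, 30)]
reports = [datetime(2024, 1, 8, 9, 3), datetime(2024, 1, 8, 11, 0)]
print(len(hangs_near_symptoms(hangs, reports)))  # 1 (the 09:03 report)

# 0.5 A-calls/s, 0.2 B-calls/s, 2 s and 3 s durations, 10,000 records, one hour.
expected, p_any = expected_collisions(0.5, 0.2, 2.0, 3.0, 10_000, 3600)
print(round(expected, 2), round(p_any, 2))  # 0.18 0.16
```

If the estimated collision rate is far lower than the rate at which users report the symptom, that corresponds to the "very low probability" case: the issue goes on the fix list and the search for the root cause continues.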
Using this sort of approach, it is likely that the cause of the current symptoms will be identified and a way to resolve them found. It is then important to consider whether the capacity management approach being used for the system is sufficient. System performance problems are not like hardware failures: they often build up gradually over time rather than having a single point-in-time cause. If processes are in place to manage the performance of an operational system proactively, it should be possible to identify a potential performance problem before it starts to impact the business. If an IT department does not have this capability for all of its core operational systems, it is worth considering developing it. Developing this capability is usually expensive, but fast IT systems also deliver significant business value. It is worth looking closely at the business case for investing in improving and maintaining the performance of your systems.
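One simple way to spot a performance problem that is building over time is to fit a trend to a daily performance measure and project when it will cross an agreed limit. The sketch below assumes, for illustration only, that one peak response time is recorded per day and that a response-time threshold has been agreed with the business. The function name `days_until_breach`, the straight-line least-squares trend and the sample figures are all assumptions; a full capacity plan would normally model load and resource utilisation rather than extrapolate a single metric.

```python
def days_until_breach(daily_peaks, threshold):
    """Project how many days remain before a metric's trend reaches threshold.

    Fits a least-squares straight line y = intercept + slope * day to the
    samples (day 0 is the first sample). Returns None if the trend is flat
    or falling; otherwise the number of days after the last sample at which
    the line reaches threshold (0.0 if it already has).
    """
    n = len(daily_peaks)
    if n < 2:
        raise ValueError("need at least two daily samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_peaks) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_peaks))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # not trending towards the threshold
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - (n - 1))


# Illustrative data: peak response time in seconds, creeping up 0.1 s a day.
peaks = [1.0, 1.1, 1.2, 1.3, 1.4]
print(round(days_until_breach(peaks, threshold=2.0), 1))  # 6.0
```

A projection like this is best treated as an early-warning signal that prompts a proper capacity review, rather than as a forecast in its own right.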
If you need help in resolving performance problems that you have identified, or in working out how to improve your performance management processes, then e-mail me. Alternatively, if you are involved with or aware of a business struggling with slow systems, please feel free to forward this bulletin to them.