If you have read “Principles of Capacity Management” then you should understand the place and value of a performance model. In the last bulletin I made a generic performance model available online. The model, however, is relatively complex and may be hard to use without appropriate documentation. In this bulletin, therefore, I provide a quick tour of the model to support its use. The tour provides a “tab by tab” description:
- Title sheet: Simply a record sheet for information such as the system name and model version number. I do ask you to leave the Sarquol message in place, and to print it when keeping a record of the model's results.
- Function Definition: Allows the definition of up to 30 functions to be modelled. These should usually be at “Use case” level rather than individual operation level. As far as possible keep the functions relevant to the business, so a suitable candidate would be “Generate daily activity report”, rather than a set of functions such as “Click on Submit button of the Daily activity report form”. Each function may have a short “Name” and a longer description. The short names will be displayed elsewhere in the sheet.
- Location Definition: Complex systems are normally geographically dispersed and so the model allows up to 30 locations to be defined. A later sheet allows their network connections to be defined, and so a field is provided to connect each location into the network model. The style is the same as for the function definitions.
- Behaviour Type definition: Different classes of users of the system will behave differently. This sheet begins to separate the user population by defining the different classes of behaviour that are present. Suitable behaviour classes may include “Administrators”, “Call centre staff”, “Customer manager” and so forth. Again there can be up to 30 classes of behaviour, with just a name and description being provided at this point.
- User Section Definition: Further separates the users into up to 30 segments, each with a location and behaviour type. These segments deserve careful thought, since they are used to define the number of users in each segment, together with their working times and their usage profile for the system. If there is shift working, for example, each shift may need its own user segment.
- Usage profile: These pages define the variation in activity across various time periods: Intra-hour, Hourly, Daily, Weekly, and Monthly. This allows analysis of the usage peaks within the system, so that average behaviour can be compared with peak behaviour more effectively. The intra-hour value is possibly the hardest to estimate, since it requires the ratio of peak to average activity within an hour to be entered. The other profiles require a level of activity to be defined for each of their periods. The simplest way to define these is to choose a representative function and enter the expected number of uses of that function in each period. Note that a “default” can be defined at the top of each sheet, and only modified with individual values where more detailed information is available. It is worth noting that the units of the values are not too important, since the values are normalised before being applied.
- Usage profile – Annual: This page is significantly different from the other usage profile pages. It defines the number of users in each section on a year by year basis. The starting year for the model is also defined on this page, and the expected number of “working days” in a year. These figures are important in “scaling up” the values of the model, and hence in projecting how the system is likely to respond in the future.
- Behaviour Type function usage: This sheet defines the number of times each function will be used each day by behaviour type. Thus the behaviour types are listed in the rows and functions in columns. The number does not have to be an integer. If, for example, a function was used once a month then a suitable value might be (1/30) or 0.033. Because the figures are used in the scaling up process these fractions of a use per day can have a significant impact on overall performance.
- Server Definition: This is used to define the servers present in the overall system architecture, including where they are placed and their relative capacity compared to a “baseline” server. The relative scales can be used to provide an estimate of the performance that will be achieved where the power of test machines varies from those used in production. If the values are not filled in they will be assumed to be “1.0”, meaning that the modelled system is equal in capability to the calibration system.
- Service Characterisation: This is where the model starts to get distinctly technical. It defines the amount of CPU and disk activity that each of the functions uses. The data entry is done in two parts: each function is assigned a “Usage Profile” for each server, together with a count of the number of times the profile is invoked. The profiles are then defined in terms of the resources that they use: CPU, disk, and network traffic to and from the server.
- Daily function demand growth and Estimated Function demand: These are intermediate calculation sheets, used to estimate how users demand the various functions, and thus how function demand varies over time. This includes the average daily function usage, which is then followed through to average and peak operations per second.
- Annual Function Demand: This calculation sheet provides the number of functions per year that are estimated to be performed by the user populations. These figures are also added up to give a cumulative function usage year on year.
- Data Definition: This allows the key data in the system to be defined, along with any initial volume and the way that it accumulates with function usage. Growth of data volume can be a significant factor in the degradation of performance over time, and this sheet allows that effect to be estimated.
- Daily Data Growth by function and Data Growth: These calculation sheets estimate the data growth over time, based on the Annual Function Demand and Data Definition sheets. The figures are primarily used as intermediate information in other calculations, but can be of use themselves in various circumstances.
- Resource utilisation factors: This page provides technical calibration data. The model uses a quadratic approximation to estimate the degree of performance loss caused by resource utilisation. It is worth noting that the model allows estimated utilisation above 100%, though this will cause significant performance degradation. This is where the model differs significantly from a queuing theory based model, which would not provide an estimate in these circumstances. The meaning of this and the justification for using it is a topic in its own right. The figures provided by default represent a reasonable first approximation based on past experience. Improving on these figures is probably best approached from detailed load testing data.
- Network definition: Provides the outline of the networking that the system uses, in terms of the nodes used and the segments that connect them. This includes a definition of the routing used between different network nodes, the capacity of the network segments and any background utilisation of the segments.
- Network loading and Network service times: These sheets are used to estimate the way that the network segments will be utilised, and hence the likely service times to be exhibited by the different network segments. These are intermediate calculations, but can be important information in their own right. If the peak utilisation of a network segment is estimated to be high then it would be a serious bottleneck in the system. The meaning of “high” in this context varies with network technology – an Ethernet based system, as an example, begins to fail above about 80% utilisation.
- Server demand, server loading and server service times: Further intermediate calculations that estimate the likely demand on the different servers defined within the architecture, the loading that those servers would experience, and hence the service times they are likely to deliver.
- Performance Summary: Finally, the likely performance of the application at a key location is provided as an estimate. These figures are in terms of the estimated “best”, “average” and “peak” response times that users will experience over time. Only one location is presented in order to limit the amount of data shown. Other locations can be examined by changing the “key location” and then re-calculating the model.
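A few of the calculations above are worth sketching in code to make them concrete. First, the normalisation mentioned under “Usage profile”: the sketch below is a minimal illustration of the idea, not the spreadsheet's actual formula, and the hourly figures are invented.

```python
def normalise_profile(weights):
    """Scale raw activity values so that they average 1.0.

    The absolute units of the input do not matter: only the shape of
    the profile (each period relative to the average) is preserved,
    which is why any convenient unit can be entered on the sheet.
    """
    average = sum(weights) / len(weights)
    return [w / average for w in weights]

# Invented counts of a representative function across six periods.
hourly = [10, 40, 80, 80, 40, 10]
profile = normalise_profile(hourly)

# max(profile) is the peak-to-average ratio for this profile.
peak_ratio = max(profile)
```

Normalising against the average (rather than the sum) means a value of 1.0 always represents average activity, so the peak-to-average ratio drops straight out as `max(profile)`.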
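Second, the “scaling up” from per-user daily usage to operations per second, as performed on the Estimated Function demand sheet, amounts to the arithmetic below. The segment size, usage rate, working hours and peak factors are all invented for illustration.

```python
def ops_per_second(users, uses_per_user_per_day, working_hours=8.0):
    """Average operations per second for one user segment and one function."""
    daily_ops = users * uses_per_user_per_day
    return daily_ops / (working_hours * 3600.0)

# Invented segment: 200 call-centre staff, each using a function 25 times/day.
average_rate = ops_per_second(users=200, uses_per_user_per_day=25)

# A peak rate can then be estimated by applying the profile peaks,
# e.g. an hourly peak factor of 1.5 and an intra-hour ratio of 2.0.
peak_rate = average_rate * 1.5 * 2.0
```

This is also why small fractional values in the Behaviour Type function usage sheet matter: multiplied by hundreds of users and scaled to peak rates, a “once a month” function still contributes measurable load.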
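Third, data growth can be followed through in the same spirit: an initial volume plus a per-use increment multiplied by cumulative function usage. A hypothetical sketch, with all figures invented:

```python
def data_volume_gb(initial_gb, growth_per_use_mb, cumulative_uses):
    """Estimated data volume after a cumulative number of function uses."""
    return initial_gb + (growth_per_use_mb * cumulative_uses) / 1024.0

# Invented figures: 10 GB to start, 0.5 MB added per use, 2 million
# cumulative uses taken from an annual-demand style calculation.
volume = data_volume_gb(10.0, 0.5, 2_000_000)  # ≈ 986.6 GB
```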
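Fourth, the quadratic approximation on the “Resource utilisation factors” sheet can be contrasted with a queuing formula as below. The coefficients are placeholders, not the sheet's defaults; the point is that a polynomial still returns a (steeply worsening) estimate above 100% utilisation, where an M/M/1-style queuing formula simply breaks down.

```python
def quadratic_slowdown(utilisation, a=1.0, b=4.0):
    """Service-time multiplier as a quadratic in utilisation.

    a and b are calibration coefficients (placeholder values here);
    in practice they are best fitted from detailed load-testing data.
    """
    return 1.0 + a * utilisation + b * utilisation ** 2

def mm1_slowdown(utilisation):
    """Classic M/M/1 response-time multiplier: 1 / (1 - u)."""
    if utilisation >= 1.0:
        raise ValueError("queuing formula undefined at or above 100% utilisation")
    return 1.0 / (1.0 - utilisation)

quadratic_slowdown(0.5)   # 2.5: modest degradation at 50% utilisation
quadratic_slowdown(1.2)   # 7.96: still defined above 100%, just very slow
```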
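Finally, the network loading estimate amounts to offered traffic divided by segment capacity, added to any background utilisation. A rough sketch with invented segment figures, using the 80% Ethernet threshold noted above:

```python
def segment_utilisation(offered_mbps, capacity_mbps, background=0.0):
    """Estimated utilisation of one network segment (1.0 = saturated)."""
    return background + offered_mbps / capacity_mbps

# Invented segment: 30 Mbps of modelled traffic on a 100 Mbps link
# that already carries 20% background load.
u = segment_utilisation(offered_mbps=30.0, capacity_mbps=100.0, background=0.2)
bottleneck = u > 0.8  # the rough failure threshold noted for Ethernet
```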
As mentioned in the last bulletin, the model is complicated. It needs to be, in order to support useful modelling of performance in a variety of different scenarios. Used appropriately, however, it can aid in the effective management of the capacity of a system.
If you need help in managing the performance of a complex system, or cannot readily understand how the use of modelling might aid you, then please e-mail me. There are many resources that can help, including those that I have provided in these bulletins, but when an issue is urgent there is not always sufficient time to develop the necessary skills yourself. This is where specialist expertise can add significant value in accelerating an organisation along the learning curve.