It is standard wisdom that if you want a database to perform well you carefully design a set of indices for its tables, based on their contents and the most common queries. Indices make such a difference that, without them, queries against tables with large numbers of rows can be unusably slow. When this approach works it works really well, but there are environments where it runs into real problems, though I won’t go into detail here.
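The effect of an index can be seen with a quick sketch. This is a hypothetical example using an in-memory SQLite database; the table, column, and index names are all invented for illustration:

```python
import sqlite3
import time

# Hypothetical table of 500,000 orders spread across 10,000 customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    ((i, i % 10_000, float(i)) for i in range(500_000)),
)

def timed_lookup():
    """Count one customer's orders and report how long the query took."""
    start = time.perf_counter()
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (42,)
    ).fetchone()
    return count, time.perf_counter() - start

count_scan, t_scan = timed_lookup()      # no index: full table scan
conn.execute("CREATE INDEX idx_cust ON orders(customer_id)")
count_idx, t_index = timed_lookup()      # same query via the index

print(f"scan: {t_scan:.4f}s  indexed: {t_index:.4f}s")
```

On tables of this size the indexed lookup is typically orders of magnitude faster than the scan, and the gap widens as the table grows.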
This is where technologies such as Netezza and its competition approach the problem from a different angle. Their solution is to break a data warehouse into pieces spread across a large number of processing elements, each with its own processing capacity and storage. Each element computes a fragment of the answer to a query and passes its part back to a process that performs the merge. The fragments aren’t necessarily indexed, because the fragment each element works on can be small enough that an index offers no performance improvement. The time to process a query improves roughly linearly with the number of processing elements – so pick the number of elements that achieves the required response time. There are many systems that would find this sort of solution very useful.
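The scatter/gather pattern described above can be sketched in a few lines. This is a simplified, hypothetical model (the partitioning scheme and the per-key sum query are invented for illustration), not how Netezza itself is implemented:

```python
from collections import Counter

def partition(rows, n):
    """Hash-partition (key, value) rows across n processing elements."""
    shards = [[] for _ in range(n)]
    for key, value in rows:
        shards[hash(key) % n].append((key, value))
    return shards

def local_query(shard):
    """Each element answers the query over its own small fragment only,
    with no index needed: the fragment is cheap to scan in full."""
    partial = Counter()
    for key, value in shard:
        partial[key] += value
    return partial

def merge(partials):
    """A coordinating process combines the partial answers."""
    result = Counter()
    for partial in partials:
        result.update(partial)
    return result

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
shards = partition(rows, n=4)
answer = merge(local_query(shard) for shard in shards)
print(dict(answer))
```

Because the shards are independent, the `local_query` calls can run in parallel on separate elements, which is where the roughly linear speed-up comes from.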
There are potential pitfalls here too, though, especially where there are large numbers of parallel queries, or the queries are relatively complex. As usual there is “no silver bullet” and the solution needs to fit the needs of the problem. If you have practical experience of these technologies then please share your experience of the approach with me at firstname.lastname@example.org, or call on +44 7887 536083.