Update Nov 27, 2014: Just posted this YouTube video that shows how to easily identify top SharePoint Performance Problems: SharePoint Performance Analysis in 15 Minutes

Have you ever wondered what is really going on under the hood when you index a column in SharePoint? Everybody keeps talking about indexing columns once your lists exceed the 2000 items limit. Knowing that everything in SharePoint is basically stored in one single table made me wonder how indexing individual columns is actually implemented. You might be surprised!

Scenario 7: Boosting performance with indexed columns?

Let’s first have a look at the main table in the SharePoint Content Database: AllUserData is the table that stores all items in any list of the SharePoint site. For backward compatibilty, a SQL View with the name UserData exists that returns data from AllUserData.
Besides the main columns in AllUserData that are used to identify the list, site, version, author, … we also see multiple columns for every possible data type that SharePoint columns can have. We can for instance see 64 nvarchar columns used to e.g.: store text columns, 16 ints, 12 floats, 8 datetime, …

Parts of the AllUserData Table Schema in the SharePoint Configuration Database
Parts of the AllUserData Table Schema in the SharePoint Configuration Database

This table schema makes it impossible to define database column indices on the custom columns of a SharePoint list. Why? Because every list is stored in this table and every list would have different columns with different index settings.

How does the index work?

Once you decide to index a column in SharePoint you will see that the table NameValuePair[_{SQLCollation}]has been filled with the values of the columns that you indexed. Lets have a look at the table schema.

Each indexed column value is stored in this table with a reference to its list item
Each indexed column value is stored in this table with a reference to its list item

The table will be filled with the current column values once you decide to index a column. When editing, deleting or updating values in your SharePoint list – the changed values of the index column will also be updated in the NameValuePair table.

How is the index table accessed?

If you define a view or if you use the SharePoint object model to execute a CAML query the NameValuePair table will be joined with the UserData view in case you specify a filter on your indexed column. I defined a list where I indexed one of my text columns. I also defined a view that specified a filter on this column. Browsing to this SharePoint view now only shows me the items in the list that match my filter leveraging the NameValuePair table. Here is a sample SQL Statement that was executed when accessing my view:

NameValuePair table JOIN's UserData and is used in WHERE clause
NameValuePair table JOIN

You can see the join as well as the WHERE clause that was defined on the indexed column in the NameValuePair table.

What is the overhead?

As we now know – every list item that has an indexed column will create a new row in the NameValuePair table. The table also needs to be updated whenever indexed columns change their value or when items are added or deleted. Having a list with 100000 items and an index column for instance means that you also add 100000 items to the NameValuePair table. And – every time you update your list – NameValuePair need to be updated as well.

Conclusion

The SharePoint Content database schema doesn’t allow indexing in the AllUserData table itself. To overcome this limitation the NameValuePair table was introduced. Using SharePoint indices will definitely speed up selective access to items stored in large lists or when the overall numbers of items in your content database is high. You should however consider the side-effects of indexed columns. Every indexed value is stored in a separate table and those values must be kept up-to-date when items are changed.

As SharePoint is built on the .NET Platform you might also be interested in my latest White Papers about Continuous Application Performance for Enterprise .NET Systems