We are using Confluence for both our internal Wiki as well as for our external Community Portal. I just came across a very nasty performance bug in the version we are running on our external system. We run 3.2 and the User Search Feature keeps me waiting several minutes each time I search for users. I analyzed the problem and want to give some answers to folks that have similar problems with either confluence or other search features in any type of software. It turns out to be the typical N+1 Query and Requesting Too Much Data Problem Pattern that is responsible for this performance problem.
While writing this article I reached out to a Performance Engineer at Atlassian to confirm this behavior and get some guidance. It seems they were already aware of this problem and a solution for this is on its way with Confluence 3.5 – That’s Good News :-). But now to the problem we ran into and how I analyzed it.
We almost have 25k users on our external Confluence installation. I admit – it is a lot – but Confluence works really well for us. When using the User Search Feature though, I always have to wait more several minute until I get to the empty User Search dialog where I can enter a filter on username or group membership:
When I search for a specific user or users of a specific group (where I know there is only a handful), the response of this dialog again takes about the same time to come back to me.
Analyzing Slow Request Time
We run dynaTrace on our Confluence installation, allowing me to analyze the internals of the request that opens the blank search dialog:
The PurePath shows that the Servlet first executes a query that returns all usernames and then executes the same SELECT statement for the individual usernames. Looking only at the database activity gives as an overview of how many statements are actually executed:
The database view not only verifies my assumption that there is one SQL statement executed for each user in the database. It turns out that the complete user database is queried twice, totaling to almost 50k SQL Statement executions. Remember – this is just opening the dialog that shows no results, just filter options. We can also look at the individual bind variables to verify that a different username gets passed to the SELECT * FROM userbase queries:
When I enter filter criteria, either a username or a group name, I assumed the search would return must faster. It however turns out that – even though the search result only returns the single user or the users in the specified group – it takes almost as long for the dialog to return as when I first opened it. Analyzing the PurePath for the request where I entered a specific username shows that there is one specific query for that username – which should be enough. After that query we can however see all the same queries for each user in the database again – twice!
Looking at all this I see the following problems:
Data is requested when it would not be necessary
When opening the Search Dialog it should not be necessary to execute any query against the user accounts as the dialog comes up with an empty result anyway prompting the user for filter credentials. Also – when querying for a specific username or a specific group it should not be necessary to fetch user details for every user in the database.
Queries are executed twice
There might be some duplicated code or methods that are called twice while processing the page request. If this is really necessary data caching would definitely help here
Lazy vs. Eager Loading (N+1 Query Problem)
The N+1 Query Problem means that a set of object identifiers is requested with one statement (list of usernames in this example). For every object identifier a separate statement is executed to retrieve details. We often see this behavior when using O/R Mappers with Lazy Loading fetching strategy where Eager Loading would make more sense. For more details on this read the O/R Mapper related articles (Hibernate, JPA, …) from Alois Reitbauer
Confluence 3.5 addresses this problem
The online documentation indicates that performance might not be the best if you have a large number of users in the database. As I mentioned in the initial paragraph I reached out to Atlassian to ask for some guidance on solving this problem. They promptly got back to me with the answer that Confluence 3.5 is going to use a new user management system that will significantly improve performance for scenarios like ours. If anybody is interested check out the Development Releases that you can download for testing purposes.
We have written many blogs about these types of problems as we see them with our clients and summarized the Top 10 Performance Problems with follow up links on detailed articles. Any feedback on this blog is welcome. Thanks for reading