Do you have a PHP application running and have to deal with inconveniences like lack of scalability, complexity of debugging, and low performance? That’s bad enough! But trust me: you are not alone!
I’ve been developing Spelix, a system for cave management, for more than 20 years. It originated from a single user DOS application, and has now emerged into a web application with hundreds of users, used nationwide as the official cave directory in Austria.
Recently I applied for a job at Compuware. For a presentation during my job interview I prepared a case study about how to monitor and improve performance of Spelix with dynaTrace, a tool in Compuware APM’s suite. I found more hotspots than expected, and it was much easier than expected to resolve them. I also killed two birds with one stone: Spelix is really performing now and I’ve got a cool, new job!
Let me share with you my experiences in that process and the best practices I’ve applied to bring spelix.at to its current stage.
The Challenge of Software Evolution
In Spelix I have identified 6 major scopes for performance optimization:
- Server Side Data Caching
- Client Side Data Caching
- Network Traffic
- Browser / CDN Cache
- Server Side Session Handling
To be more digestible, this blog is split into two. This post focuses on the first 3 performance improvement steps and the next one focuses on the last 3.
Step #1: Optimize Database Access
In the early stages of an application, it may not be relevant how your database access is designed. As long as the amount of data is low, a poorly designed query or missing indexes may not really affect overall response times. Therefore, database performance is rarely a topic that comes up in many cases. Once it has become an issue, it may be rather complicated to be handled. It’s important to place enough value in your database design right from the start. Here are some of my lessons learned and best practices:
Interaction of Views & Indexes
I don’t want to get into too much detail about creating indexes as many other articles have covered this. But it’s rather important to understand when an index is not used: be careful when using views in MySQL!
When should you use views? Views are perfect to create complex queries and store them for further use. Views are commonly used to prepare data for presentation, or even for data pre-selection based on user access rights.
When should you avoid views? While a WHERE clause on a simple view may cause an index to be used, this could fail in complex view, even though the WHERE clause is on the primary key for the primary table. Once your query gets too complex, MySQL creates a temporary table for the result of the view, and then applies the query on top of the view, without any indexes to be used. Be alert when your view contains commands like GROUP_BY, ORDER_BY, or UNION. So what? My key advice on this is: when you create a view, define possible WHERE clauses and check the execution plan in the database by using the EXPLAIN command. If your WHERE clause is on a column in a table marked as select type “primary”, you are on the save side to use the view. If it’s “derived” or “dependent subquery”, your query might not use existing indexes. It could be better to execute the query code from the application. If you have executed the SQL statements directly from your business logic, create a data access layer that contains your query code. Consider executing multiple SQL statements and merge the data in your PHP code rather than using complex joins that may spoil your indexes.
Check and Eliminate Redundant Statements
Check if your SQL executions are really necessary! You might have executed your statement in an earlier stage, is there really a requirement to run it again? Would it make sense to keep the data in your current context instead of re-requesting it? The following screenshot shows the database statements executed by the PHP Application.
What to look for? Find your query invocations in your code and avoid repetitive executions.
Optimizations? Depending on the type of information, consider caching your data in your transaction, session storage or overall server side cache, as described in the next section.
Step #2: Use Server Side Data Caching
Why always connect to the database, join multiple tables and create the final data structure in your application? Just build up your structure once and store it in memory. No matter what technology you use, APC, Memcache, Redis, et cetera, the principal is always the same: you check the store in memory for existence of your data. If found, just use them. If not, execute your database queries to retrieve the data and store it in cache. Not only do you get much faster access to your data, you even avoid blocked database access due to too many parallel queries.
As an example, in Spelix we had a recurring statement to get a user’s real name for a given userid.
Even within one request, that SQL got executed multiple times, for different userIDs (creation, modification, approval, etc.). By keeping the userIDs together with the user’s real name in the application cache, we were able to avoid quite a number of database requests.
Tip: It’s not just data that can be cached, you can even render complete sections of your webpage in HTML and cache the result.
Data caching is a rather easy process to be integrated into your data access layer. All the different technologies offer convenient APIs for integration.
Step#3: Client Side Data Caching: Utilize HTML5 LocalStorage
An essential function of Spelix is to organize caves into regionally limited cave groups and display the caves and group data on a map. The base data for most of the caves in the archive do not change regularly, the group data practically never. Querying for these data and transferring to list to the client for map display is quite redundant.
It was quite a performance boost to store these data in the browser’s Local Storage, which was introduced in HTML5, and to use the local data to display the caves and groups on the map. In addition to the data we store a timestamp, when then the last synchronization with the server occurred, and load only those data that have changed since then.
Thus the system can display data for the chosen map extent in a reasonable time while checking for updates in the background.
Tip: Cache output data that very infrequently changes. In my case this dramatically improved performance!
Want to learn more?
Watch out for part 2 of this series, which will cover networking, browser cache and session handling.
Meanwhile, start tracing your application and identify the hotspots that might cause bad performance measures. Download our free trial of dynaTrace or the free dynaTrace Ajax edition and start monitoring your performance. And don’t forget to share your experiences with us in our forum!