Week 15 – Optimizing Data Intensive Web Pages by Example

Recently I was checking out ShowSlow. The site is really great: it combines YSlow and PageSpeed metrics and visualizes them in a really nice way. When I clicked on the URLs Measured tab, I had to wait quite some time until the page finished downloading. While this page does display a lot of information, I still wondered why it took so long to load, so I analyzed it with Dynatrace AJAX Edition. The reason for the long load time quickly became obvious: as you can see below, the page has an overall download time of 17 seconds, which results from a total of 5 MB of HTML data.

Metrics for the All URLS Tab in ShowSlow

Although the page does display a lot of data, the size still seemed too big to me. A closer look at the HTML source revealed a large number of inline style definitions like the one shown below:

<td style="padding-left:10px; overflow: hidden; white-space: nowrap;">

As the table displays a row for each measured URL, it gets really big, resulting in a very high number of style definitions. To better understand the impact, I wrote a little Java program that detects every inline style definition, counts its occurrences and calculates the total size. The table below shows the result for this page:

Style Definition | Occurrences | Total Size (bytes)
width: 55px; height: 0.7em; background-color: #C6EE00 | 27 | 1431
width: 52px; height: 0.7em; background-color: #EEEE00 | 17 | 901
width: 68px; height: 0.7em; background-color: #9FEE00 | 98 | 5194
width: 73px; height: 0.7em; background-color: #77EE00 | 203 | 10759
width: 88px; height: 0.7em; background-color: #28EE00 | 314 | 16642
float: right | 1 | 12
width: 91px; height: 0.7em; background-color: #28EE00 | 249 | 13197
width: 1px; height: 0.7em; background-color: #EE0000 | 1 | 52
width: 72px; height: 0.7em; background-color: #77EE00 | 209 | 11077
padding-left:10px; text-align: left | 1 | 35
width: 57px; height: 0.7em; background-color: #C6EE00 | 39 | 2067
width: 80px; height: 0.7em; background-color: #4FEE00 | 299 | 15847
width: 67px; height: 0.7em; background-color: #9FEE00 | 120 | 6360
width: 77px; height: 0.7em; background-color: #77EE00 | 363 | 19239
width: 86px; height: 0.7em; background-color: #28EE00 | 228 | 12084
text-align: right; padding:0 10px 0 10px; white-space: nowrap; | 9280 | 575360
width: 59px; height: 0.7em; background-color: #C6EE00 | 49 | 2597
width: 49px; height: 0.7em; background-color: #EEEE00 | 10 | 530
width: 63px; height: 0.7em; background-color: #9FEE00 | 68 | 3604
width: 71px; height: 0.7em; background-color: #77EE00 | 170 | 9010
width: 44px; height: 0.7em; background-color: #EEC600 | 1 | 53
width: 76px; height: 0.7em; background-color: #77EE00 | 270 | 14310
width: 50px; height: 0.7em; background-color: #EEEE00 | 4 | 212
width: 81px; height: 0.7em; background-color: #4FEE00 | 312 | 16536
width: 93px; height: 0.7em; background-color: #28EE00 | 232 | 12296
width: 65px; height: 0.7em; background-color: #9FEE00 | 96 | 5088
width: 95px; height: 0.7em; background-color: #00EE00 | 158 | 8374
width: 90px; height: 0.7em; background-color: #28EE00 | 220 | 11660
width: 99px; height: 0.7em; background-color: #00EE00 | 195 | 10335
width: 89px; height: 0.7em; background-color: #28EE00 | 249 | 13197
width: 53px; height: 0.7em; background-color: #EEEE00 | 17 | 901
width: 79px; height: 0.7em; background-color: #4FEE00 | 304 | 16112
width: 75px; height: 0.7em; background-color: #77EE00 | 216 | 11448
width: 62px; height: 0.7em; background-color: #C6EE00 | 74 | 3922
width: 64px; height: 0.7em; background-color: #9FEE00 | 87 | 4611
width: 56px; height: 0.7em; background-color: #C6EE00 | 23 | 1219
width: 96px; height: 0.7em; background-color: #00EE00 | 188 | 9964
width: 87px; height: 0.7em; background-color: #28EE00 | 217 | 11501
width: 54px; height: 0.7em; background-color: #EEEE00 | 24 | 1272
width: 92px; height: 0.7em; background-color: #28EE00 | 256 | 13568
width: 94px; height: 0.7em; background-color: #00EE00 | 162 | 8586
width: 66px; height: 0.7em; background-color: #9FEE00 | 123 | 6519
width: 60px; height: 0.7em; background-color: #C6EE00 | 58 | 3074
width: 97px; height: 0.7em; background-color: #00EE00 | 360 | 19080
width: 101px; height: 0.7em; background-color: #00EE00 | 313 | 16902
width: 51px; height: 0.7em; background-color: #EEEE00 | 22 | 1166
width: 98px; height: 0.7em; background-color: #00EE00 | 234 | 12402
width: 82px; height: 0.7em; background-color: #4FEE00 | 302 | 16006
width: 47px; height: 0.7em; background-color: #EEC600 | 3 | 159
width: 70px; height: 0.7em; background-color: #9FEE00 | 153 | 8109
width: 85px; height: 0.7em; background-color: #4FEE00 | 233 | 12349
width: 100px; height: 0.7em; background-color: #00EE00 | 357 | 19278
width: 40px; height: 0.7em; background-color: #EEC600 | 1 | 53
width: 61px; height: 0.7em; background-color: #C6EE00 | 72 | 3816
width: 69px; height: 0.7em; background-color: #9FEE00 | 155 | 8215
background-color: silver; width: 101px | 9280 | 352640
width: 46px; height: 0.7em; background-color: #EEC600 | 3 | 159
width: 83px; height: 0.7em; background-color: #4FEE00 | 325 | 17225
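The Java program behind this table isn't included in the post. As a rough illustration only, the TypeScript sketch below counts identical style attribute values and multiplies each count by the value's length, which is how the Total Size column adds up (27 occurrences × 53 characters = 1431 for the first row). The function name and the regular expression are assumptions; the regex only handles double-quoted style attributes, so a real tool would rather use an HTML parser.

```typescript
// Rough sketch of an inline-style counter (not the author's Java tool).
// Assumption: style attributes are written as style="..." with double quotes.

interface StyleStat {
  style: string;       // the style attribute value, verbatim
  occurrences: number; // how often exactly this value appears in the page
  totalSize: number;   // occurrences * value length, in characters (= bytes for ASCII)
}

function countInlineStyles(html: string): StyleStat[] {
  const counts = new Map<string, number>();
  const pattern = /\sstyle="([^"]*)"/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const value = match[1];
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }

  const stats: StyleStat[] = [];
  counts.forEach((occurrences, style) => {
    stats.push({ style, occurrences, totalSize: occurrences * style.length });
  });
  // Biggest contributors first
  stats.sort((a, b) => b.totalSize - a.totalSize);
  return stats;
}
```

Sorting by total size immediately surfaces the two definitions that dominate the table above.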

This means that about 2 MB of the total 5 MB are used for inline style definitions, some of which are repeated more than 9,000 times. All the definitions of width, height and background-color belong to the small bar charts you see on the page. Normally one would not move such per-value styles into a stylesheet; in this case, however, it makes sense to download all possible bar-chart styles once and then just reference them via classes. The major contributors, however, are the background-color: silver; width: 101px and text-align: right; padding:0 10px 0 10px; white-space: nowrap; definitions, which together make up nearly 1 MB of text.
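A minimal sketch of the "download all possible bar-chart styles" idea: generate one class per possible bar width once, instead of repeating the inline style in every row. The width range of 1 to 101 pixels comes from the table above; the chartN class naming mirrors the streamlined markup later in this post, and colorFor is a hypothetical stand-in, since ShowSlow's actual width-to-color thresholds aren't documented here.

```typescript
// Sketch: emit one CSS rule per possible bar width so rows can use
// class="chartN" instead of repeating the inline style.
// Assumption: colorFor() is a hypothetical width-to-color mapping.

type ColorFn = (widthPx: number) => string;

function buildBarChartCss(minWidth: number, maxWidth: number, colorFor: ColorFn): string {
  const rules: string[] = [];
  for (let w = minWidth; w <= maxWidth; w++) {
    rules.push(`.chart${w} { width: ${w}px; height: 0.7em; background-color: ${colorFor(w)} }`);
  }
  return rules.join("\n");
}

// Hypothetical two-color mapping, for illustration only
const simpleColor: ColorFn = (w) => (w > 50 ? "#00EE00" : "#EE0000");
```

Calling buildBarChartCss(1, 101, simpleColor) yields 101 rules, only a few kilobytes in total, after which each bar needs just a short class attribute.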

I then looked at how many URLs this page displays. ShowSlow has clearly become a victim of its own success: there are over 9,000 of them. So, did the developers do something wrong? Not really; what happened is quite typical for an iterative development approach. You build an application that satisfies the current requirements and take care of further requirements later. Additionally, you normally don’t expect to be this successful this fast. So let’s look at how we can optimize this page.

Externalizing Styles

Since I always feel challenged to optimize performance, I looked for a way to improve this page. I extended my little program to replace all inline style definitions: it adds a style tag with the required class definitions and references those classes from the HTML elements. The 64 class definitions take up 4 KB of text, while the class attributes on the HTML elements still add up to 553 KB, simply because of the large number of elements. Nevertheless, this reduced the download size by about 1.7 MB. We could additionally move the style definitions into a separate file, which the browser can cache, reducing the download size even further on subsequent visits.
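The post doesn't show how the program performs the replacement. A minimal sketch, assuming double-quoted style attributes and no existing class attribute on the same elements (those would have to be merged), hands out one generated class per distinct declaration and collects the rules into a single style block. The classN naming is an assumption, chosen to resemble the class names in the streamlined markup shown later.

```typescript
// Sketch: replace every inline style="..." with class="classN" and collect
// the distinct declarations into one <style> block.
// Assumptions: double-quoted style attributes; elements with a style
// attribute carry no class attribute of their own.

interface ExternalizedPage {
  html: string;     // markup with style attributes replaced by class attributes
  styleTag: string; // <style> block with one rule per distinct declaration
}

function externalizeStyles(html: string): ExternalizedPage {
  const classByStyle = new Map<string, string>();

  const rewritten = html.replace(/(\s)style="([^"]*)"/gi, (_m, space: string, style: string) => {
    let cls = classByStyle.get(style);
    if (cls === undefined) {
      // First time we see this declaration: allocate the next class name
      cls = `class${classByStyle.size + 1}`;
      classByStyle.set(style, cls);
    }
    return `${space}class="${cls}"`;
  });

  const rules: string[] = [];
  classByStyle.forEach((cls, style) => rules.push(`.${cls} { ${style} }`));
  return { html: rewritten, styleTag: `<style>\n${rules.join("\n")}\n</style>` };
}
```

Writing the collected rules to an external .css file instead of an inline style tag would make them cacheable, as suggested above.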

Another interesting side effect of this change is a reduction in layout calculation time: as you can see below, it dropped from 2.8 to 2.5 seconds.

Load Time Comparison with CSS Classes (ShowSlowFast)

Content Only

While this is quite a good improvement, I was not yet satisfied. We could implement paging here; however, I felt much more challenged by finding a way to make the page faster while still sending all of the data.

So what can we do? Most of the page content is caused by the large number of URLs. Below you can see the already streamlined HTML for a single URL:

<td class="class4">B (83)</td>
<td><div class="class5" title="Current YSlow grade: B (83)"><div class="chart84"></div></div></td>
<td class="class4">B (84)</td>
<td><div class="class5" title="Current YSlow grade: B (84)"><div class="chart85"></div></div></td>
<td class="class1"><a href="details/?url=http%3A%2F%2Fwww.amazon.com%2F">http://www.amazon.com/</a></td>

The actual payload, however, is rather small: the URL plus the YSlow and PageSpeed ranking information. We see a lot of redundant text such as “Current YSlow grade”, and the URL is written twice, once for the link target and once for the link text. So we clearly send much more data than we need. The way to avoid this is to send just the payload and generate the markup dynamically on the client.

Following this approach, I extracted the entire payload as JSON and put it into a JavaScript file. The results look very promising: the HTML page itself now has only 12 KB and the JSON is 863 KB. While this is still a large amount of data, we managed to get below 1 MB compared to the initial 5.2 MB. Below you can see a sample JSON record:

{"yslow":"B (83)","pagespeed":"B (84)","text":"http://www.amazon.com/"}
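One way to produce such a data file, sketched under the assumption that the records are already available as objects: serialize them with JSON.stringify and assign the result to a global variable in a .js file that the page includes via a script tag. The UrlRecord type mirrors the sample record above; the variable name SHOWSLOW_URLS is hypothetical.

```typescript
// Sketch: the payload for one measured URL, matching the sample record.
interface UrlRecord {
  yslow: string;     // e.g. "B (83)"
  pagespeed: string; // e.g. "B (84)"
  text: string;      // the measured URL
}

// Wraps all records in a JavaScript file body that defines one global array.
// Assumption: SHOWSLOW_URLS is a hypothetical variable name.
function toDataScript(records: UrlRecord[], varName: string = "SHOWSLOW_URLS"): string {
  return `var ${varName} = ${JSON.stringify(records)};`;
}
```

Because the file contains only data, it could also be cached separately from the page markup.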

The problem we now have to solve is how to populate the table. Writing all rows at once via JavaScript would make the page completely unresponsive, so I decided to render only the first 200 rows and then render another 100 rows whenever the user scrolls to the end of the page.
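The batching rule described above (200 rows first, then 100 more at the end of the page) can be sketched as a small helper. RowBatcher and nearBottom are hypothetical names, and the 50-pixel threshold is an arbitrary assumption.

```typescript
// Sketch of the batching logic: an initial batch, then fixed-size batches.
// Names and the scroll threshold are hypothetical.

class RowBatcher<T> {
  private readonly rows: T[];
  private readonly initialCount: number;
  private readonly batchSize: number;
  private rendered = 0;

  constructor(rows: T[], initialCount = 200, batchSize = 100) {
    this.rows = rows;
    this.initialCount = initialCount;
    this.batchSize = batchSize;
  }

  /** Returns the next slice of rows to render; empty once everything is shown. */
  next(): T[] {
    const count = this.rendered === 0 ? this.initialCount : this.batchSize;
    const slice = this.rows.slice(this.rendered, this.rendered + count);
    this.rendered += slice.length;
    return slice;
  }

  get done(): boolean {
    return this.rendered >= this.rows.length;
  }
}

// True when the bottom of the viewport is within `threshold` pixels of the document end.
function nearBottom(scrollTop: number, viewportHeight: number, documentHeight: number, threshold = 50): boolean {
  return scrollTop + viewportHeight >= documentHeight - threshold;
}
```

In the browser, a scroll handler would call nearBottom(window.pageYOffset, window.innerHeight, document.documentElement.scrollHeight) and, while the batcher is not done, render whatever next() returns.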

The important part here is how we insert these rows. If we do it in an unclever way, it will take forever and the browser will freeze, so we had better do it in a clever way :-). The fastest approach is to build one string containing the markup for all rows and then insert it into the DOM in a single step. We only have to be careful when doing this in Internet Explorer 7; Andi wrote an interesting blog post about string concatenation problems in IE.
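A sketch of the string-building approach: the markup for a whole batch is assembled in an array and joined once, a common workaround for slow repeated string concatenation in older Internet Explorer versions. The class names follow the streamlined markup shown earlier; deriving the bar class as chart(score + 1) is inferred from "B (83)" mapping to chart84, and the tooltip labels and helper names are illustrative assumptions.

```typescript
// Sketch: build the markup for a batch of rows as one string.
// Assumptions: class names mirror the streamlined markup; bar class = chart(score + 1).

interface UrlRecord { yslow: string; pagespeed: string; text: string; }

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Extracts the numeric score from a grade such as "B (83)"; 0 if absent.
function scoreOf(grade: string): number {
  const m = /\((\d+)\)/.exec(grade);
  return m ? parseInt(m[1], 10) : 0;
}

function gradeCells(grade: string, label: string): string {
  const g = escapeHtml(grade);
  return `<td class="class4">${g}</td>` +
    `<td><div class="class5" title="${label}: ${g}"><div class="chart${scoreOf(grade) + 1}"></div></div></td>`;
}

function buildRowsHtml(records: UrlRecord[]): string {
  const parts: string[] = []; // collect fragments, join once at the end
  for (const r of records) {
    parts.push(
      "<tr>",
      gradeCells(r.yslow, "Current YSlow grade"),
      gradeCells(r.pagespeed, "Current Page Speed grade"),
      `<td class="class1"><a href="details/?url=${encodeURIComponent(r.text)}">${escapeHtml(r.text)}</a></td>`,
      "</tr>"
    );
  }
  return parts.join("");
}
```

The resulting string is then inserted with a single DOM operation. Be aware that older Internet Explorer versions treat innerHTML on table sections as read-only, so there the whole table typically has to be written through a surrounding element.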

This works pretty well. The page is now significantly faster and also more responsive from an end-user perspective. The figure below compares all three pages: the original page, the one with CSS class definitions and the JSON approach. As you can see, the page using JSON is massively faster. We also see that JavaScript execution time has increased considerably. This is because JavaScript code now builds the markup and inserts the rows into the DOM.

Overview of Performance Improvement Impact

What are our takeaways? First, avoid repetitive markup on pages as much as possible. In our case, just getting rid of the inline style definitions reduced the payload by 1.7 MB. As we have seen, this is especially important for data-driven sites, because every additional byte of markup is multiplied by the number of data items presented.

Second, pay attention to the ratio between markup and actual data. In this case, only a small fraction of the page was actual data. Admittedly, this will be different for other web pages; however, you should always keep it in mind. Especially for data-driven services, consider sending only the actual data and building the markup on the client side. In that case, you additionally have to keep an eye on JavaScript execution time.
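To put a number on the markup-to-data ratio, the sketch below compares the sample row from earlier (with the bar divs closed explicitly) against its JSON record. String length stands in for byte size, which holds for this ASCII-only text; the function name is an assumption.

```typescript
// Sketch: how much of one table row is actual data?
// .length counts UTF-16 code units, which equals bytes for ASCII-only text.

function dataToMarkupRatio(markup: string, payload: string): number {
  return payload.length / markup.length;
}

const rowMarkup =
  '<td class="class4">B (83)</td>' +
  '<td><div class="class5" title="Current YSlow grade: B (83)"><div class="chart84"></div></div></td>' +
  '<td class="class4">B (84)</td>' +
  '<td><div class="class5" title="Current YSlow grade: B (84)"><div class="chart85"></div></div></td>' +
  '<td class="class1"><a href="details/?url=http%3A%2F%2Fwww.amazon.com%2F">http://www.amazon.com/</a></td>';

const record = JSON.stringify({ yslow: "B (83)", pagespeed: "B (84)", text: "http://www.amazon.com/" });

const ratio = dataToMarkupRatio(rowMarkup, record);
```

For this row the ratio comes out at roughly 0.2: about four fifths of every row is markup, and that overhead is repeated for each of the more than 9,000 URLs.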

From an application design perspective, this means separating the application into the three classic MVC layers:

  • The model layer – meaning the “pure” data – which in our case was put into the JSON file.
  • The controller layer – which in our case is the HTML page itself and the JavaScript logic that dynamically manipulates the DOM.
  • The view layer – in our case the style definitions – which are kept separate from the other two layers.

Especially in web applications, mixing up these layers easily results in poor performance, in addition to other problems such as poor maintainability.

If you want to read more articles like this, visit the dynaTrace 2010 Application Performance Almanac.

If you want to learn more about the free Dynatrace AJAX Edition, please visit /topics/ajax-edition/

Alois is Chief Technology Strategist of Dynatrace. He is fanatic about monitoring, DevOps and application performance. He spent most of his professional career in building monitoring tools and speeding up applications. He is a regular conference speaker, blogger, book author and Sushi maniac.