Monday, April 7, 2008

White Spaces in HTML Source - Be Careful

Hmm, interesting. I am going to show you some very interesting figures about white space in the HTML source of our ASP.NET pages.

I was a little puzzled when I saw these figures and realized that IIS does nothing to remove or otherwise optimize this white space.

Suppose we have a page with the following markup:

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
<title>Untitled Page</title>
</head>
<body>
<form id="form1" runat="server">
<div>
<% for (int i = 0; i < 20000; i++)
{
%>test <%
} 
%>
</div>
</form>
</body>
</html>

The page simply writes the word "test" 20,000 times to its output. Note that in this code I did not indent and format every nested element as we normally do.

Then I saved the source of the page to disk and opened the properties of the text file: the size was 98.1 KB.

Now let's rewrite the page the way we normally write it, with indentation and so on:

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title>Untitled Page</title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
        <% for (int i = 0; i < 20000; i++)
           {
        %>
                                        test
        <%
            } 
        %>
    </div>
    </form>
</body>
</html>

I saved the source and the size is ... 1.06 MB!!!

Amazing: just indenting the word "test" with several tab characters makes such a huge difference!

The new page is roughly 10 times larger than the previous one.
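
Both sizes can be roughly checked by counting the literal text each loop iteration emits. The Python sketch below is only a back-of-the-envelope estimate: it assumes the indentation is stored as spaces exactly as in the listing above, that line endings are CRLF, and it ignores the markup around the loop. None of that was measured here, and the name `loop_output_bytes` is made up for the example.

```python
# Literal text emitted on each loop iteration of the two pages.
# Assumptions: CRLF line endings, indentation stored as spaces as in the listings.
COMPACT = "test "
INDENTED = "\r\n" + " " * 40 + "test" + "\r\n" + " " * 8

ITERATIONS = 20000


def loop_output_bytes(literal, iterations=ITERATIONS):
    """Bytes produced by writing `literal` once per iteration (ASCII text)."""
    return len(literal.encode("ascii")) * iterations


compact_bytes = loop_output_bytes(COMPACT)
indented_bytes = loop_output_bytes(INDENTED)

print(f"compact:  {compact_bytes / 1024:.1f} KB")
print(f"indented: {indented_bytes / 1024 / 1024:.2f} MB")
print(f"ratio:    {indented_bytes / compact_bytes:.1f}x")
```

Under these assumptions the loop alone accounts for 5 bytes per iteration in the compact page and 56 bytes per iteration in the indented one, which is close to the file sizes reported above.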

 

Let's see if this actually applies to network traffic. I will use the Firebug add-on for Mozilla Firefox to measure it:

Here is the request for the first, "optimized" page:

[image: Firebug screenshot of the request for the first page]

And here is the network request for the second page:

[image: Firebug screenshot of the request for the second page]

The response is also about 5 times faster in the first case. And this is on my local machine! What if I had to load the page over the internet?

I was interested enough in the problem to start looking at the source code of popular sites. And ... they've got a secret! It seems they know about this very well.

Just try it yourself: go to microsoft.com and view the source of the page :) Then go to google.com and do the same.
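
To get a feel for how much a compact source saves, here is a deliberately naive Python sketch that collapses every run of white space into a single space. It is not how any of those sites actually produce their markup (that is unknown here), and the function name `collapse_whitespace` is made up. A real tool would have to leave `<pre>`, `<textarea>` and inline scripts alone.

```python
import re


def collapse_whitespace(html):
    """Replace each run of white space with one space and trim the ends.

    Naive on purpose: it would also change <pre>, <textarea> and script content.
    """
    return re.sub(r"\s+", " ", html).strip()


# The indented loop output, assuming space indentation and CRLF line endings.
indented = ("\r\n" + " " * 40 + "test" + "\r\n" + " " * 8) * 20000
compact = collapse_whitespace(indented)

print(len(indented), "->", len(compact), "characters")
```

Applied to the indented loop output, this brings the text back to about the size of the compact page.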

Why is this not a well-documented fact?

Have you heard about it before?

3 comments:

Anonymous said...

Just use any compression module out there and your problem is solved: you won't need to worry about white space anymore...

Miron
http://www.mironabramsom.com/blog

Tom Pridham said...

The secret is that browsers have supported GZIP as a transport mechanism for many years. When an HTTP request comes in, one of the header values states whether the browser making the request supports GZIP (99.9% of requests do). So in Java Enterprise land, I install a filter on the HTTP response: if GZIP is supported, a compressed HTML document is sent and the end user's browser does the work of unzipping. Google has been using this technology since day 1 with great success.
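
The negotiation described in this comment can be sketched in a few lines of Python. This is only an illustration, not the Java filter the commenter uses: the function name `encode_response` and the plain dict of headers are hypothetical, and the sample page assumes the indented loop output with space indentation and CRLF line endings. The request header that announces GZIP support is `Accept-Encoding`, and the response is marked with `Content-Encoding`.

```python
import gzip


def encode_response(body, request_headers):
    """Return (payload, extra_response_headers) for a bytes `body`.

    Compresses only when the client's Accept-Encoding header lists gzip.
    Simplified: q-values are ignored and the header name is matched exactly.
    """
    accepted = request_headers.get("Accept-Encoding", "")
    encodings = [token.split(";")[0].strip().lower() for token in accepted.split(",")]
    if "gzip" in encodings:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


# The white-space-heavy loop output from the indented page (assumed layout).
page = (("\r\n" + " " * 40 + "test" + "\r\n" + " " * 8) * 20000).encode("ascii")

payload, headers = encode_response(page, {"Accept-Encoding": "gzip, deflate"})
print(len(page), "->", len(payload), "bytes", headers)
```

Because the indentation is extremely repetitive, the compressed payload is a tiny fraction of the original, which is why compression makes the extra white space almost free on the wire.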

Kirill Chilingarashvili said...

Thanks Miron, Tom, for the suggestions. I am already interested in compression modules, but have not implemented one yet. I use AJAX almost everywhere in my latest project and have not had a chance to investigate every aspect of using such a module with it.
Have you had any problems using AJAX with compression modules? I mean using compression with UpdatePanels, service calls, etc.?

Thanks.