Optimizing HTML

Creating good HTML will improve both the speed and search engine ranking of a website.

If we also take advantage of how data is sent across the internet and how modern browsers render HTML files, we can improve the speed even further.

What is good HTML?

Good HTML uses the correct elements to mark up the data being displayed. It goes beyond syntactically correct markup, with properly closed tags and a doctype declaration at the top of the file, and considers what the data is and how it should be presented.

To demonstrate this, the following code shows good syntax but has a number of problems:

01. <?xml version="1.0" encoding="UTF-8"?>
02. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
03. <html xmlns="http://www.w3.org/1999/xhtml">
04.
05.   <head>
06.     <title>Demonstration of Bad HTML</title>
07.     <!-- File header data goes here -->
08.   </head>
09.
10.   <body>
11.   <!-- body data goes here -->
12. 	 <div class="the_page" id="page">
13.
14.
15.       <div id="header">
16.         <!-- page header data goes here -->
17.         <h1>Demonstration of Bad HTML</h1>
18.       </div>
19.
20.       <div class="the_content" id="content_wrapper">
21.         <!-- content data goes here -->
22.
23.         <h2>Shopping list</h2>
24.
25.
26.         <p>
27.           Please buy the following, dont forget the milk!
28.           <br />
29.           <br />
30.           Cheese is on offer at the moment
31.         </p>
32.
33.         <div id="list_wrapper">
34.
35.          <!-- a list will go here -->
36.          <ul id="the_list">
37.
38.      	   <li>
39.      	     <span class="list_number">1.</span>
40.               Milk
41.      	   </li>
42.
43.      	   <li>
44.      	     <span class="list_number">2.</span>
45.      	     Cheese
46.      	   </li>
47.
48.      	   <li>
49.      	     <span class="list_number">3.</span>
50.      	     Eggs
51.      	   </li>
52.
53.      	   <li>
54.      	     <span class="list_number">4.</span>
55.      	     Tomatoes
56.      	   </li>
57.
58.       	  </ul>
59.
60.       	</div><!-- end list_wrapper -->
61.
62.         <p>
63.           So what is wrong with this <span class="ABR" title="Hyper Text Markup Language">HTML</span>?
64.         </p>
65.
66.       </div><!-- end content_wrapper -->
67.
68.     </div><!-- end page -->
69.
70.   </body>
71.
72. </html>

File size: 1.6KB

At first glance nothing appears to be wrong. However, closer inspection shows:

  • There are 2 <br /> tags on lines 28 & 29, the second added only to create extra space between the 2 lines of text. This is unnecessary: the same effect can be achieved by setting the height of a single <br /> in a CSS file, although browsers apply styles to <br /> inconsistently, so a margin on the surrounding element is a more dependable way to get the same gap.
  • The <ul> is wrapped in a <div> on lines 33 & 60. Wrapping one block-level element in another just adds code to the HTML. It is only necessary when 2 or more elements need to be grouped, e.g. if this example had a to-do list as well as a shopping list.
  • <span> is used on lines 39, 44, 49 & 54 to number each item in the list. A human reader would understand this, but a search engine would not treat the numbers as list order. For search engines to understand order in a list, an <ol> (ordered list) needs to be used. Browsers number the items of an ordered list automatically, so we can lose the <span> tags.
  • On line 63 a <span> is used to add meaning and style to the letters HTML. This lets the user mouse over the letters and see that HTML stands for Hyper Text Markup Language, but it would take extra styling to give the user a visual clue that the tooltip exists. A search engine won't necessarily distinguish this from any other content wrapped in a <span>, or understand that it is an abbreviation. An <abbr> tag is styled by most browsers to give users a visual clue, and search engines can recognize it as an abbreviation.
  • The <span> on line 63 also needs styling of its own. Rather than writing styles inline in the markup, a tidier option is to put all styling in a separate file, which separates content from design. The benefits are that search engines can spider the content without wading through style data, and that the style sheet can be requested alongside the content and cached by the browser for reuse on other pages.

Below is the good HTML and a style sheet:

01. <?xml version="1.0" encoding="UTF-8"?>
02. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
03. <html xmlns="http://www.w3.org/1999/xhtml">
04.
05.   <head>
06.     <title>Demonstration of Good HTML</title>
07.     <!-- File header data goes here -->
08.     <link rel="stylesheet" type="text/css" href="style.css" />
09.   </head>
10.
11.   <body>
12.   <!-- body data goes here -->
13.
14.     <div class="the_page" id="page">
15.
16.       <div id="header">
17.         <!-- page header data goes here -->
18.         <h1>Demonstration of Good HTML</h1>
19.       </div>
20.
21.       <div class="the_content" id="content_wrapper">
22.         <!-- content data goes here -->
23.
24.         <h2>Shopping list</h2>
25.
26.         <p>
27.           Please buy the following, dont forget the milk!
28.           <br />
29.           Cheese is on offer at the moment
30.         </p>
31.
32.         <!-- a list will go here -->
33.         <ol id="the_list">
34.
35.      	   <li>
36.      	     Milk
37.      	   </li>
38.
39.      	   <li>
40.      	     Cheese
41.      	   </li>
42.
43.      	   <li>
44.      	     Eggs
45.       	  </li>
46.
47.       	  <li>
48.       	    Tomatoes
49.       	  </li>
50.
51.         </ol>
52.
53.         <p>
54.
55.           So what is wrong with this <abbr title="Hyper Text Markup Language">HTML</abbr>?
56.         </p>
57.
58.       </div><!-- end content_wrapper -->
60.
61.     </div><!-- end page -->
62.
63.   </body>
64.
65. </html>

style.css

01. /* style.css */
02. br {
03. height:25px;
04. }
05.
06. abbr {
07. font-weight:bold;
08. }

File size: 1.4KB (the CSS file is 47 bytes)

As we can see in any browser, the list is displayed in the same way, and the abbreviation HTML now has a visual clue that invites a mouseover, so its meaning can be displayed.

 

So what has been achieved by writing good HTML?

  • File size is reduced by 7 lines of code, or 0.2KB (approx 13% going by the rounded sizes above). This is only a small file, but scaled up to a real website the saving is significant.
  • Web browsers now present the abbreviation with a visual clue, while the list looks the same as before.
  • Search engines understand the meaning of the content better, so the new page is likely to rank better in search results than the original.

Optimizing techniques

Comments:

Comments in code are fantastic tools for software developers: they let the programmer describe in natural language what the program is doing.

However, they are of no benefit in the final program. In compiled languages such as C and Java, the compiler (the tool used to turn the program code into a working program) ignores them, so they never reach the released program.

In web development, comments are also ignored by browsers and search engines, but nothing strips them automatically, so web developers often leave them in and they are sent to every visitor.

Removing them from the example reduces the file size of the good HTML from 1.4KB to 1.1KB, a reduction of approx 21% going by those rounded figures.
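
A minimal sketch of this step in Python, assuming the page is available as a string. The function name `strip_comments` and the sample markup are illustrative only. A regular expression is a rough tool here: it would also strip comment-like text inside <script> or <style> blocks, so a real build step may prefer a proper HTML parser. Internet Explorer conditional comments (which start with `<!--[if`) are deliberately left in place, since removing them changes behavior.

```python
import re

# Matches an HTML comment, including one that spans several lines.
# The negative lookahead skips conditional comments such as <!--[if IE]>.
COMMENT_RE = re.compile(r"<!--(?!\[if).*?-->", re.DOTALL)


def strip_comments(html: str) -> str:
    """Return html with ordinary comments removed."""
    return COMMENT_RE.sub("", html)


sample = (
    '<div id="header"><!-- page header data goes here -->'
    "<h1>Demonstration</h1></div><!-- end header -->"
)
stripped = strip_comments(sample)
print(stripped)
print(len(sample) - len(stripped), "characters saved")
```

Running this over a page before it is published keeps the comments in the source files, where developers need them, while visitors receive the smaller version.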

Whitespace:

Using whitespace in development lets the programmer see how each HTML element relates to the others around it. However, web browsers and search engines don't need indentation or line breaks to work out the relationships between elements or to render content.

To understand this we need to consider how files are transmitted across the internet and how browsers process HTML tags.

  • Text files are transmitted as a stream of characters, arriving in order a piece at a time.
  • Web browsers receive these streams and render them by recognizing tokens and drawing the result to the display.
  • Once the browser finds a token that tells it to render HTML, runs of whitespace characters are collapsed or dropped (outside elements such as <pre>); everything else is either an instruction to process the HTML or content to be displayed.
  • As tokens are being read, they can be processed straight away. This can be seen in the browser by a page being revealed from top to bottom (albeit very fast).
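
The streaming behavior described above can be imitated with Python's standard `html.parser` module. This is only an illustration: a browser's parser is far more elaborate, and the `TokenLogger` class and the chunk boundaries are assumptions made for the example. The markup is fed in as two chunks, as if it had arrived in two pieces, and each token is handled as soon as it is complete.

```python
from html.parser import HTMLParser


class TokenLogger(HTMLParser):
    """Records each token as soon as the parser recognizes it."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Whitespace-only text between tags carries no content, so skip it.
        if data.strip():
            self.tokens.append(("text", data.strip()))


parser = TokenLogger()
# The second chunk starts in the middle of the closing </li> tag.
chunks = ["<ol>\n  <li>Milk</l", "i>\n</ol>"]
seen_after_each_chunk = []
for chunk in chunks:
    parser.feed(chunk)
    seen_after_each_chunk.append(len(parser.tokens))
parser.close()

for token in parser.tokens:
    print(token)
```

The tokens from the first chunk are available before the second chunk has been fed in, which is the same property that lets a browser start drawing the top of a page while the rest is still arriving.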

Removing whitespace from HTML files reduces file size and speeds up download and processing, letting users see the page sooner and search engines spider the site faster.

As most modern websites are generated by server-side scripts written in languages such as PHP or Ruby, removing whitespace is very easy to do, but it is often neglected by developers.

If every whitespace character is a byte, the example file reduces in size from 1.1KB to 0.7KB, or approx 36%. That leaves the file at approx 44% of the size of the original example, a total reduction of approx 56%.
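
A rough sketch of whitespace removal in Python, assuming the page is held in a string. The function name `minify_whitespace` is made up for this example. It is deliberately cautious about text: a space between a word and a tag is kept, because browsers treat it as a real space. It is not safe for markup containing <pre> or <textarea>, where whitespace is significant, and removing the gap between two inline elements (such as adjacent links) can change how they are spaced on screen.

```python
import re


def minify_whitespace(html: str) -> str:
    """Collapse runs of whitespace and drop whitespace between tags."""
    # Collapse every run of spaces, tabs and newlines into a single space.
    collapsed = re.sub(r"\s+", " ", html)
    # Drop the space left between two adjacent tags, e.g. "</li> <li>".
    return re.sub(r">\s+<", "><", collapsed).strip()


sample = "<ol>\n  <li>\n    Milk\n  </li>\n\n  <li>\n    Eggs\n  </li>\n</ol>"
minified = minify_whitespace(sample)
print(minified)
print(len(sample), "characters before,", len(minified), "after")
```

The same function could be applied to the output of a server-side script just before it is sent, so the templates themselves stay indented and readable.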

Advanced Optimization:

Further optimizations can be made by shortening the class and id names used in the HTML, or by using a URL shortener such as bit.ly to shorten external links.
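
Name shortening can be automated. The sketch below is one possible approach in Python, not a complete tool: the function name `shorten_names` is invented for the example, and the `RENAMES` table simply pairs the long names from the earlier listings with the short names used in the final one. It only rewrites double-quoted `id` and `class` attributes, so the same table would also have to be applied to any selectors in the style sheet and to any scripts that refer to those names.

```python
import re

# Long names from the example page mapped to their shortened forms.
RENAMES = {
    "the_page": "pg",
    "page": "pg",
    "header": "hdr",
    "the_content": "cntnt",
    "content_wrapper": "cntntWrpr",
    "the_list": "lst",
}

# Matches id="..." or class="..." but not attributes such as data-id="...".
ATTR_RE = re.compile(r'(?<![\w-])(id|class)="([^"]*)"')


def shorten_names(html: str, renames: dict) -> str:
    """Replace each known name inside id and class attributes."""

    def replace(match):
        attr, value = match.group(1), match.group(2)
        # A class attribute may hold several space-separated names.
        short = " ".join(renames.get(name, name) for name in value.split())
        return f'{attr}="{short}"'

    return ATTR_RE.sub(replace, html)


sample = '<div class="the_page" id="page"><ol id="the_list"></ol></div>'
print(shorten_names(sample, RENAMES))
```

Names that are not in the table are left untouched, which makes it possible to shorten a site gradually.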

The final listing shows name shortening with partial whitespace removal:

01. <?xml version="1.0" encoding="UTF-8"?>
02. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
03. <html xmlns="http://www.w3.org/1999/xhtml">
04. <head>
05. <title>Demonstration of Good HTML</title>
06. <link rel="stylesheet" type="text/css" href="s.css" />
07. </head>
08. <body>
09. <div class="pg" id="pg">
10. <div id="hdr"><h1>Demonstration of Good HTML</h1></div>
11. <div class="cntnt" id="cntntWrpr">
12. <h2>Shopping list</h2>
13. <p>Please buy the following, dont forget the milk!<br />Cheese is on offer at the moment</p>
14. <ol id="lst">
15. <li>Milk</li>
16. <li>Cheese</li>
17. <li>Eggs</li>
18. <li>Tomatoes</li>
19. </ol>
20. <p>So what is wrong with this <abbr title="Hyper Text Markup Language">HTML</abbr>?</p>
21. </div>
22. </div>
23. </body>
24. </html>

File size: 0.8KB (0.6KB with full whitespace removal)
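
The savings at each stage can be recomputed from the file sizes quoted in this article. The short Python sketch below does that arithmetic. The helper name `percent_reduction` is an illustrative choice, and because the quoted sizes are rounded to one decimal place, the results may differ by a few points from percentages worked out on exact byte counts.

```python
def percent_reduction(before_kb: float, after_kb: float) -> float:
    """Percentage by which the size shrinks going from before_kb to after_kb."""
    return (before_kb - after_kb) / before_kb * 100


# (description, size before in KB, size after in KB), as quoted in the article
stages = [
    ("correct elements", 1.6, 1.4),
    ("comments removed", 1.4, 1.1),
    ("whitespace removed", 1.1, 0.7),
    ("original to whitespace-free", 1.6, 0.7),
    ("original to shortened names, full whitespace removal", 1.6, 0.6),
]

for label, before, after in stages:
    print(f"{label}: {percent_reduction(before, after):.1f}% smaller")
```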

Conclusion

By using the right elements to get the desired result, and then optimizing, we can obtain these benefits:

  • File size is reduced significantly.
  • Download and rendering speed increase.
  • Development time reduces as there is less work to get the desired effect.
  • Search results improve as search engines know how to handle specific elements better than generic elements.
