Sample page using semantic mark-up as viewed in Internet Explorer 6. As you can see, the banner breaks in IE 6 but everything else looks pretty much as expected. The page also breaks in IE 7 and IE 8 though the IE 8 version looks different. I've included these samples as a reminder to always check sites in multiple browsers.
The other day, Aaron Rester posed this question on Twitter:
Webbies: any advice on explaining semantic HTML to non-webbies?
I wrote back that I usually show them some source code and walk them through things like using h1 and so forth for headers. That is what I usually do. In fact I'd just done that the other day when I was showing someone the changes I would recommend for search engine optimization (SEO).
But the question made me wonder if my explanations were adequate. If someone doesn't know anything about HTML or other mark-up languages, will such an explanation make sense? Or is there a better way to explain the differences between structural and presentational mark-up to clients and new Web designers?
This topic comes up most often in discussions of search optimization and accessibility. If a client needs to enhance their site for SEO, I may be recommending changes to the code that they won't even see when looking at the page in their browser. Understandably they will want to know why they should pay me to do things to their site that they won't notice. The changes we make may not be visually apparent, but they will convey additional information to Web browsers and search engines that can aid accessibility, usability and searchability.
Semantics is the study of meaning. Merriam-Webster's Online Dictionary provides us with a definition that relates closely to how the term is applied to HTML: "3 a : the meaning or relationship of meanings of a sign or set of signs; especially : connotative meaning." HTML uses elements that convey structural meaning to Web browsers and other user agents such as search engine crawlers.
Writing a page in semantic HTML simply means that you are applying the appropriate structural elements to the various bits of content on a page. Huh? Code elements act like labels that tell the Web browser what each section of content is. HTML gives us structural elements to indicate headers, paragraphs, lists, tables and so forth. If I want to tell the browser to start a new paragraph, I'll type <p>. If I want to start a new subhead, I'll type <h5>.
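As a minimal sketch of that labeling, here is a short fragment in which each element names the kind of content it wraps. The text inside the elements is placeholder wording made up for illustration, not taken from a real page.

```html
<!-- Each element acts as a label for the content it wraps -->
<h5>Why structure matters</h5>            <!-- a subhead -->
<p>This sentence is an ordinary paragraph of body text.</p>  <!-- a paragraph -->
<ul>                                      <!-- an unordered list -->
  <li>Headers</li>
  <li>Paragraphs</li>
  <li>Lists</li>
</ul>
```

A browser or other user agent reading this fragment knows, without looking at any styling, which part is a subhead, which is a paragraph and which items belong together as a list.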
You would think so, but no. Web browsers can be both fussy and forgiving. I can code the same content in multiple ways that will each look very similar when viewed, but will actually convey differing amounts of information to user agents.
Instead of using an <h5> I could use <p><strong> in front of my subhead and make it look the same as it would using <h5>. Such usage would be considered presentational mark-up. It can affect how the header looks, but it is not semantically correct because it doesn't let user agents know that this is a subhead. If a Web designer applies the incorrect elements to page content, the site may look perfectly acceptable. But it is not passing on vital information that user agents may need to:
Sample page using semantic mark-up as viewed in the text-based browser, Lynx. Note how the page retains a sense of order, similar to an outline. This is more apparent when you view the enlarged version of the page.
Sample page using non-semantic mark-up as viewed in the text-based browser, Lynx. When you view the enlarged version of the page you really see the difference. This version seems more like a plain text file without any obvious formatting.
To illustrate this point I've created two very simple Web pages, one using semantic mark-up and one using non-semantic mark-up. Basically the non-semantic version uses <p> for just about everything. When you view the pages through a regular browser you'll see that the semantic and non-semantic versions look pretty similar. They both look normal in Firefox, Safari and Opera, and they both break in various versions of Internet Explorer. You don't see the difference visually until you look at the pages in the text browser, Lynx.
When viewing the semantic page in Lynx, we can see that there is order to the page; it looks a bit like an outline. The menu looks like a menu and the headers stand out to provide an introduction to the other text.
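The two approaches can be sketched side by side. This is not the code from the actual sample pages; it is a simplified illustration, and the placeholder text and the class name "sitename" are assumptions invented for the example.

```html
<!-- Semantic version: the elements say what each piece of content is -->
<h1>Example Site</h1>
<h3>First header</h3>
<p>Some body text.</p>

<!-- Non-semantic version: everything is a paragraph, styled to look like a header -->
<p class="sitename">Example Site</p>
<p><strong>First header</strong></p>
<p>Some body text.</p>
```

With a little CSS both versions can be made to look alike in a graphical browser, but only the first one tells a user agent which lines are headers.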
Text readers for the visually impaired and search engine spiders are getting even more information than we can see in the Lynx semantic HTML example. They know that each menu link should be distinguished from the next. The use of an unordered list for the menu tells user agents to separate these links in a way that use of <p> does not, and allows users of screen readers to jump through or skip these elements to proceed to the main text. In the non-semantic version this is not clear; a text reader may speak all of those links together, making it more difficult for the user to navigate.
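A rough sketch of the two menu styles follows. The link names and URLs are placeholders assumed for illustration, not the menu from the sample pages.

```html
<!-- Semantic menu: each link is its own list item -->
<ul>
  <li><a href="/">Home</a></li>
  <li><a href="/about/">About</a></li>
  <li><a href="/contact/">Contact</a></li>
</ul>

<!-- Non-semantic menu: the links simply run together inside one paragraph -->
<p><a href="/">Home</a> <a href="/about/">About</a> <a href="/contact/">Contact</a></p>
```

In the first version the list structure itself marks where one link ends and the next begins. In the second, nothing in the mark-up groups the links as a menu or separates them from one another.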
These user agents can also tell that the first header is more important than the second because it has been coded as an <h3> while the second header is an <h4>. Headers are ranked in order of importance from 1 to 6. Here we're using an <h1> for the site name, the most important header on the page. Search engine crawlers will see words in an <h1> as being more descriptive of the page as a whole. This is useful for search engine optimization, because we can include our keywords and phrases in our various headers to let the search engines know the core topics we are covering on the page. Thus on this page you'll notice that I've used the phrase "semantic HTML" in both the text and subheads (which in this case are <h5>s). That said, I've not used it in every subhead, because having these headers make sense to you, the reader, is still more important than SEO. People come first, then robots.
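The ranking described above can be sketched as a short outline. The header levels follow the ones mentioned in the text; the wording inside each header is a placeholder assumed for the example.

```html
<h1>Site name</h1>                      <!-- rank 1: most important header on the page -->
<h3>First header</h3>                   <!-- ranked above the h4 that follows -->
<p>Text introduced by the first header.</p>
<h4>Second header</h4>                  <!-- less important than the h3 -->
<p>Text introduced by the second header.</p>
<h5>A subhead about semantic HTML</h5>  <!-- keyword phrase placed in a subhead -->
<p>Text introduced by the subhead.</p>
```

A user agent can compare the numbers in the element names to work out which headers matter most, something it cannot do when every header is a bold paragraph.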
In this example I've focused on just a few of the many HTML elements that are important to semantic mark-up, but hopefully these will give you a clearer sense of how such usage can help SEO and accessibility. Other elements such as address, cite and blockquote can add further meaning to a page's code. You can learn more about other elements and related issues in the reference links below.
It's always a good practice to validate your code to check for errors and potential problems, but site validation doesn't guarantee that you've used the best mark-up for the site. While the validator can make sure you've used allowed elements, it has no way of knowing if you've used them in the most appropriate manner. Both the semantic and non-semantic page samples used in this post were produced using valid, W3C standards-compliant XHTML and CSS. One is clearly better formed than the other, but both also break in Internet Explorer.
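For a sense of how those three elements are typically used, here is a small sketch. The quotation, title, URL and e-mail address are all placeholders invented for illustration.

```html
<!-- blockquote marks an extended quotation; its cite attribute can point to the source -->
<blockquote cite="http://example.com/source">
  <p>The quoted passage goes here.</p>
</blockquote>

<!-- cite marks the title of the work being referenced -->
<p>From <cite>Title of the Work</cite>.</p>

<!-- address marks contact information for the page or its author -->
<address>
  Contact: <a href="mailto:author@example.com">author@example.com</a>
</address>
```

Browsers often render a blockquote indented and cite and address in italics, but as with headers, the point is what the elements say about the content, not how they look.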
Thus it's also important to check sites in multiple browsers and to simply think carefully about how visitors will read the site. If my samples were for real sites, I'd fix the IE problem, but I used it here to remind us that using valid semantic code is just the beginning. There will always be additional details we must consider.
It's also worth noting that, when it comes to SEO, a semantically well-formed site is not a substitute for good content. Search engines such as Google are designed to help users, like you and me, find the most relevant pages for the information we seek. With that goal in mind they have to accommodate a wide variety of coding differences. If your competitor has great content and plenty of good inbound links, while yours does not, then his/her site will still win out, even if the code is atrocious. But if you can produce great content and present it in the appropriate format you will be off to a good start.