SEO

How to build and publish a sitemap

by Niklas Waller on May 6, 2008

in SEO

There are many things you can do to make a web site search-engine friendly. One of them is to create a sitemap for your site.

A sitemap is a page that lists all (or at least the most important) pages on a site and the path to them.

Some say it is enough to just list the pages as links (the HTML way), either as a flat list or as a structured list that shows on which level each page is located. Others say the sitemap should follow a special protocol, a specific XML structure.
Both will do your site good, but the latter is better: it gives web crawlers more hints, in a structured way, so they can do a better job crawling your site. It is also the official way to create and use sitemaps.

Everything about sitemaps can be found on sitemaps.org. What a sitemap should look like is specified in the protocol (the Sitemaps XML format).

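As far as I know, the sitemap can be named anything, although the common name is sitemap.xml. Its location does matter, though: the protocol only lets a sitemap list URLs in its own directory and below, which is why the root of the site is the usual place. You can create a static sitemap named sitemap.xml and place it in the root of your site, or you can create a dynamic one that reflects every change on the site immediately. The latter is how we have done it here at Wohill.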

According to the protocol (where details, examples and further explanations can be found) the sitemap should have the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/sublink</loc>
    <lastmod>2008-04-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>

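Your sitemap should contain one url element (tag) for each page on your site. Each url element can contain four child elements: loc (the page's URL, which must start with the protocol, e.g. http://; this is the only required element), lastmod (when the page was last modified, in W3C Datetime format such as 2008-04-01), changefreq (how often the page is likely to change: always, hourly, daily, weekly, monthly, yearly or never) and priority (the page's importance relative to the other pages on your site, from 0.0 to 1.0, with 0.5 as the default).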
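To make these field rules concrete, here is a small Python sketch that checks one url entry against them. The helper check_url_entry is hypothetical, not part of any sitemap tool. It parses lastmod with date.fromisoformat, so it only accepts date-only values such as 2008-04-01; the time-stamped W3C Datetime forms the protocol also allows would be reported as problems.

```python
from datetime import date

# The seven <changefreq> values defined by the Sitemaps 0.9 protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def check_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems with one <url> entry; an empty list means it looks OK.

    Only loc is required. lastmod is parsed with date.fromisoformat, so this
    sketch accepts only date-only values such as "2008-04-01".
    """
    problems = []
    # loc must be a full URL that starts with the protocol.
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must be an absolute URL")
    # The protocol limits loc to fewer than 2,048 characters.
    if len(loc) >= 2048:
        problems.append("loc must be shorter than 2,048 characters")
    if lastmod is not None:
        try:
            date.fromisoformat(lastmod)
        except ValueError:
            problems.append("lastmod is not a YYYY-MM-DD date")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq is not one of the allowed values")
    # priority is relative to your other pages and must lie in 0.0..1.0.
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


print(check_url_entry("http://www.example.com/sublink", "2008-04-01", "daily", 0.5))  # prints []
```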

Create a static sitemap:

Either write it yourself according to the protocol specification or have it done for you by a sitemap generator, for example this one. A generator crawls your site for public pages and creates a sitemap using default values that you specify beforehand. Name the file sitemap.xml and place it in the root of your site.

Create a dynamic sitemap:

Write code that outputs the sitemap, according to the protocol specification, each time it is requested.
Wohill's sitemap is named sitemap.php; have a look at it if you want to. It passes several validators, so you can safely use it as a reference for what a sitemap could look like.

Each time the sitemap is accessed, it is regenerated with the most recent content from the database.
More technically, the sitemap is built by querying the database for categories, tags, entries and comments and writing out the results with PHP loops.
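The output step of a dynamic sitemap boils down to one url element per row. Here is a minimal sketch of that step in Python using the standard library's xml.etree.ElementTree; it is not Wohill's PHP code. The hard-coded list stands in for the database queries, and the function name build_sitemap and the dict keys are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_FIELDS = ("lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Return sitemap XML for an iterable of dicts with a "loc" key and optional extra fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Write the optional fields only when the page actually has them.
        for field in OPTIONAL_FIELDS:
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    # encoding="unicode" returns a str; ElementTree escapes & and < in the text.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical rows standing in for the database queries.
    pages = [
        {"loc": "http://www.example.com/", "changefreq": "daily", "priority": 1.0},
        {"loc": "http://www.example.com/sublink", "lastmod": "2008-04-01"},
    ]
    print(build_sitemap(pages))
```

Letting an XML library do the writing, rather than concatenating strings, means characters such as & in query-string URLs are escaped as the protocol requires.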

Making your sitemap visible:

You can submit your sitemap to several places, for example to Google, Yahoo and Ask. Google has Webmaster Tools and Yahoo has Site Explorer; both allow more advanced settings, Google's in particular. Ask provides a special URL that you can use:

http://submissions.ask.com/ping?sitemap=SitemapUrl

MSN has no formal submission interface either. To submit your sitemap to the MSN search index, use the following URL:

http://api.moreover.com/ping?u=http://yourdomain.com/yoursitemap.xml
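The sitemaps.org protocol describes these pings as ordinary HTTP GET requests and asks that the sitemap URL be URL-encoded when it is passed as a parameter. A minimal Python sketch that only builds the ping URLs (it sends nothing) might look like this; the function name ping_url is hypothetical, and the endpoints and parameter names (sitemap for Ask, u for Moreover) are the ones quoted above.

```python
from urllib.parse import quote


def ping_url(endpoint, sitemap_url, param="sitemap"):
    """Build (but do not send) a ping URL of the form endpoint?param=<encoded sitemap URL>."""
    # safe="" makes quote() encode "/" and ":" too, so the whole URL fits in one parameter.
    return f"{endpoint}?{param}={quote(sitemap_url, safe='')}"


sitemap = "http://www.example.com/sitemap.xml"
print(ping_url("http://submissions.ask.com/ping", sitemap))
print(ping_url("http://api.moreover.com/ping", sitemap, param="u"))
```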

You can, and should, also put a reference to your sitemap in robots.txt for auto-discovery. The major crawlers (Google, Yahoo, MSN and Ask) have agreed on a Sitemap directive for robots.txt. So create a robots.txt in the root of your site (unless you already have one) and add a line in the following format:

Sitemap: http://www.example.com/YOUR_SITEMAP

Well-behaved bots read robots.txt before crawling your site, so this line helps them find the sitemap quickly.
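Python's standard library can read the Sitemap line back out of a robots.txt file, which is a quick way to check that crawlers will find it. This sketch assumes Python 3.8 or later, where urllib.robotparser.RobotFileParser gained site_maps(); the robots.txt content and the helper name find_sitemaps are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (made up for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: http://www.example.com/sitemap.xml
"""


def find_sitemaps(robots_txt):
    """Return the Sitemap URLs declared in robots.txt text (requires Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when there is no Sitemap line.
    return parser.site_maps() or []


print(find_sitemaps(ROBOTS_TXT))
```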

Sitemap validation:

It is also important that your sitemap validates. Several validators are available, and some of them are listed below. If your sitemap fails validation and you're not sure why, take a close look at the protocol for details on how the structure should be formed.
- XML-Sitemaps.com
- Smart IT Consulting
- The W3C Markup Validation Service
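Before, or in addition to, running an online validator, a quick local self-check can catch the most common mistakes. This Python sketch is not a schema validator: it only checks that the file is well-formed XML, that the root is urlset in the sitemap 0.9 namespace, that each url has a non-empty loc, and that there are at most 50,000 URLs, the per-file limit in the protocol. The function name basic_sitemap_check is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000  # per-file URL limit in the Sitemaps protocol


def basic_sitemap_check(xml_text):
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    # ElementTree stores namespaced tags as "{namespace}name".
    if root.tag != NS + "urlset":
        return ["root element must be <urlset> in the sitemap 0.9 namespace"]
    problems = []
    urls = root.findall(NS + "url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs; the limit is {MAX_URLS} per file")
    for i, url in enumerate(urls, start=1):
        loc = url.find(NS + "loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> #{i} has no <loc>")
    return problems
```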


Rewrite URLs with htaccess

by Niklas Waller on April 8, 2008

in SEO

If your website runs on an Apache server, you can use an .htaccess file (a hypertext access file) to change the server's configuration for a directory and its subdirectories. This file can be useful in several ways: it is often used for authorization and authentication, to customize error pages (e.g. 404) and to rewrite URLs. I have modified our .htaccess file to rewrite our URLs in a specific way, for specific reasons.

Those reasons are mainly usability and search engine optimization (SEO).
From an SEO standpoint, it is good to have only one address for your site; otherwise search engines may treat the variants as separate sites and split links between them. Usually at least two addresses work, one with "www" and one without, so both http://www.wohill.com and http://wohill.com lead to the same site.
We want all of our visitors to use one address, and we can make everyone end up at the same address no matter what they type in the address bar. This takes two lines in the .htaccess file.

RewriteCond %{HTTP_HOST} !^(.*)\.wohill\.com$ [NC]
RewriteRule ^(.*)$ http://www.wohill.com/$1 [R=301,L]

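In other words: if the host name in the request does not end in ".wohill.com" (for example plain wohill.com), these lines send a permanent (301) redirect to the same page on www.wohill.com. Host names that already end in ".wohill.com", including www.wohill.com itself, are left alone; the [NC] flag makes the comparison case-insensitive.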
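To make the logic of the two lines concrete, here is a small Python sketch that evaluates the same condition. It only illustrates what Apache decides; it is not how Apache processes the file. The function name canonical_redirect is hypothetical, the host pattern is copied from the RewriteCond, and query strings (which Apache passes along unchanged by default) are ignored.

```python
import re

# The same host pattern as the RewriteCond; [NC] means case-insensitive.
HOST_PATTERN = re.compile(r"^(.*)\.wohill\.com$", re.IGNORECASE)


def canonical_redirect(host, path):
    """Return the 301 target URL, or None when the redirect does not fire.

    path is the request path without its leading slash, which is what
    ^(.*)$ captures as $1 in a root-level .htaccess file.
    """
    # The condition is negated (!), so a matching host is left alone.
    if HOST_PATTERN.match(host):
        return None
    return f"http://www.wohill.com/{path}"


print(canonical_redirect("wohill.com", "about.html"))       # http://www.wohill.com/about.html
print(canonical_redirect("www.wohill.com", "about.html"))   # None
print(canonical_redirect("blog.wohill.com", "about.html"))  # None
```

As the last line shows, any host ending in .wohill.com, such as a hypothetical blog.wohill.com, is not redirected. A stricter condition such as !^www\.wohill\.com$ would send those hosts to www as well.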

Another thing you can do is make your URLs more readable. There are two reasons to do this:
- It is easier for the visitor to remember and it looks nicer.
- It matters to search engines (good SEO)

Say that you have a front page, 'main.php', that runs a few queries against the database and presents different content depending on the results. For example, you might have a blog that queries the database for all blog posts in March. These are displayed as links, and visitors can click one of them to display only that blog post. The URL would then look something like this:

http://www.wohill.com/main.php?id=33

If you want to display comments as well, you have to add another parameter to the query string:

http://www.wohill.com/main.php?id=33&comments=1

And so on… This URL format is widely used, but it is neither pretty nor readable. There is an easy way to fix it: instead of displaying those ugly URLs on your pages, you display readable ones and let the server translate them behind the scenes. The readable equivalent of the first URL above could look like this:

http://www.wohill.com/design/33/the-name-of-the-post.html

Once again we use the RewriteRule directive:

# Internally map design/<id>/<title>.html to main.php?id=<id>
RewriteRule ^design/([0-9]+)/(.*)\.html$ /main.php?id=$1 [L]

This means that if the requested path (the part after http://www.wohill.com/) is 'design/' followed by one or more digits, a slash, any text, and finally '.html', the request is passed, behind the scenes, to the real address http://www.wohill.com/main.php?id=theNumber. The visitor keeps seeing the readable URL in the address bar.
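You can try such a pattern offline with Python's re module, whose syntax for this pattern is the same as Apache's. This sketch is only an emulation for testing patterns; the function name rewrite is hypothetical. Note that the dot before html is escaped here: unescaped, '.' matches any character, so the pattern would also accept a path like 'design/33/postxhtml'.

```python
import re

# Same pattern as the RewriteRule, with the dot before "html" escaped.
POST_RULE = re.compile(r"^design/([0-9]+)/(.*)\.html$")


def rewrite(path):
    """Return the internal target for a request path (no leading slash), or None if no match."""
    m = POST_RULE.match(path)
    if m is None:
        return None
    # Group 1 is the numeric id; the title part (group 2) is ignored, just like in the rule.
    return f"/main.php?id={m.group(1)}"


print(rewrite("design/33/the-name-of-the-post.html"))  # /main.php?id=33
print(rewrite("blog/33/the-name-of-the-post.html"))    # None
```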

Create a rule like this for each URL format you need on your site.
If you click on any blog post on Wohill and look at its URL, you will see that it is formed like this, and that the part before '.html' is the title of the post with a '-' in place of each space.
This is easily done with a JavaScript or PHP function, but I'll save that one as a tip for a day when I'm out of ideas.
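Since the actual function is saved for later, the following is only a hypothetical sketch of such a title-to-URL conversion, written in Python rather than JavaScript or PHP. The name slugify is an assumption, and the handling of accents and punctuation is a design choice of this sketch, not necessarily what Wohill does.

```python
import re
import unicodedata


def slugify(title):
    """Turn a post title into a URL-friendly name such as 'how-to-build-and-publish-a-sitemap'."""
    # Split accented letters into base letter + accent, then drop anything non-ASCII (é -> e).
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Lowercase, turn every run of non-alphanumerics into one hyphen, trim hyphens at the ends.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


print(slugify("How to build and publish a sitemap"))  # how-to-build-and-publish-a-sitemap
```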

NOTE!
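For the rewrite rules to work, mod_rewrite must be enabled on the server and the file must switch on the rewrite engine (RewriteEngine On) before the first rule. Here is an example of a complete .htaccess file that could be used as a small template; the FrontPage comment, the IndexIgnore line and the Limit sections are not needed for URL rewriting itself: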

# -FrontPage-

IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti*

# Enable mod_rewrite, start rewrite engine
RewriteEngine On

# Redirect hosts that don't end in .wohill.com (e.g. plain wohill.com) to www first
RewriteCond %{HTTP_HOST} !^(.*)\.wohill\.com$ [NC]
RewriteRule ^(.*)$ http://www.wohill.com/$1 [R=301,L]

# Then map readable post URLs to main.php internally
RewriteRule ^design/([0-9]+)/(.*)\.html$ /main.php?id=$1 [L]
RewriteRule ^design/([0-9]+)/(.*)\.html/comments$ /main.php?id=$1&comments=1 [L]

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

Read more about regular expressions for the Apache Web Server here and about htaccess here.
