Sitemaps

Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Sitemaps

Post by Albert Wiersch »

Does anyone here use sitemaps? If so, what do you think about them?

https://www.sitemaps.org/
Albert Wiersch, CSS HTML Validator Developer • Download CSS HTML Validator FREE Trial
PhilMorris
Rank 0 - Newcomer
Posts: 1
Joined: Wed Oct 31, 2007 1:42 pm

Post by PhilMorris »

Yes, extensively! Google uses them, but dropping pages from the sitemap doesn't mean that Google drops them from its cache. Google now has a feature where you can tell it which pages to drop from the cache, but I haven't used that yet.

Because our sites (job banks) share a common code base (the look and feel, as well as the content, is determined by the website URL), we have several sitemaps. We originally listed them all in robots.txt because the major search engines have agreed to use that approach to identify sitemaps. However, not knowing whether specifying multiple sitemaps in a single robots.txt file is okay, we recently implemented an HTTP handler for txt files and now generate a site-specific robots.txt file on the fly. Only Google has a decent interface for submitting sitemaps (at least that I'm aware of), and both Yahoo and MSN seem much slower than Google to cache website pages, so I'm hoping the site-specific robots.txt file does the job.
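Each generated robots.txt ends up looking roughly like this (the host name is a placeholder; it changes per site):

User-agent: *
Disallow:

Sitemap: http://www.example-jobbank.com/sitemap.xml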

Anyway, there's nothing to lose from using sitemaps (except time) and everything to gain.
MikeGale
Rank VI - Professional
Posts: 726
Joined: Mon Dec 13, 2004 1:50 pm
Location: Tannhauser Gate

Sitemaps have a place

Post by MikeGale »

I recently did some work on a site that was having trouble getting indexed. The site is a technical magazine for a specialist audience, with new material published monthly and much of the content only available to subscribers.

(The structure and design were not something I touched. Navigation had been disrupted by some alterations to the directory structure, and the navigation itself was generated browser-side by script, which, I think, the search engines don't follow.)

The attempted solution was to create a sitemap and let the search engines know about it.

The tool used was the Google sitemap generator (a Python program) fired by a batch file, so it's a one-click process that probably takes less than a second.

Search engines were informed by submitting the sitemap directly and by listing it in robots.txt. The submission process worked with Google; I'm not sure about the others.

I only followed progress for a few days. The results, as I recall them, were:

1) Live and Yahoo picked up the sitemap within 24 hours.
2) Google was reading it within 3 days.
3) Soon after reading the sitemap, the listed pages were spidered.
4) Appearance in search results not checked yet. (I may check it later when I've settled into my new country.)

Conclusions:
1. Sitemaps really do work.
2. The results are quick: a few days.
3. Getting into the search results takes a little longer; the sequence is a) read sitemap, b) spider listed pages, c) add to index (I haven't investigated what determines the timings).
4. List the sitemap in robots.txt.

I've also written a custom sitemap generator (using PowerShell). It was needed to implement some requirements that the standard generators don't handle (as far as I know). One of those requirements is honouring a robots noindex instruction on individual pages. This takes more effort and skill than just latching onto an existing program.
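For anyone curious, here is a minimal sketch of the idea in Python rather than PowerShell (my real script does more; the root path, site URL, and the noindex detection below are simplified assumptions):

import os
import re
from datetime import datetime, timezone

SITE_ROOT = r"C:\inetpub\wwwroot"    # assumed document root
BASE_URL = "http://www.example.com"  # placeholder site URL

# Very rough test for <meta name="robots" ... noindex ...>
NOINDEX_RE = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]*noindex', re.IGNORECASE)

entries = []
for dirpath, _dirs, files in os.walk(SITE_ROOT):
    for name in files:
        if not name.lower().endswith((".htm", ".html")):
            continue
        path = os.path.join(dirpath, name)
        with open(path, encoding="utf-8", errors="ignore") as f:
            html = f.read()
        if NOINDEX_RE.search(html):
            continue  # honour the per-page robots noindex instruction
        rel = os.path.relpath(path, SITE_ROOT).replace(os.sep, "/")
        lastmod = datetime.fromtimestamp(
            os.path.getmtime(path), tz=timezone.utc).strftime("%Y-%m-%d")
        entries.append(
            "  <url>\n"
            f"    <loc>{BASE_URL}/{rel}</loc>\n"
            f"    <lastmod>{lastmod}</lastmod>\n"
            "  </url>")

with open("sitemap.xml", "w", encoding="utf-8") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    out.write("\n".join(entries) + "\n")
    out.write("</urlset>\n")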

In a world where the engines no longer index everything they find, I suspect that site owners who help the engines get some sort of edge. (If anyone has some thorough research on that, I'd love to see a link!)
MikeGale
Rank VI - Professional
Posts: 726
Joined: Mon Dec 13, 2004 1:50 pm
Location: Tannhauser Gate

Check of what's listed

Post by MikeGale »

I just ran a check, after the sitemap was introduced, to see how many pages are now listed.

The specific target was to get an entire subset of publicly available pages listed in the search engines. Previously only some (about 9%) of these pages were listed.

It appears that all of the pages are now listed in Google.

Conclusion: For this project sitemaps were completely successful.
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Check of what's listed

Post by Albert Wiersch »

MikeGale wrote: Conclusion: For this project sitemaps were completely successful.
Thanks Mike! I would like to implement them on htmlvalidator.com but haven't had a chance to yet. It looks like it might be worth finding the time, but I think all or most of my site is already indexed, so I'm not really sure. I should probably go ahead and do it anyway to learn more about sitemaps and the related processes.
Albert Wiersch, CSS HTML Validator Developer • Download CSS HTML Validator FREE Trial
MikeGale
Rank VI - Professional
Posts: 726
Joined: Mon Dec 13, 2004 1:50 pm
Location: Tannhauser Gate

Post by MikeGale »

Hi Albert,

For a quick approach I suggest the Google sitemap generator. The process might be:

1) Install Python if not already there.
2) Install sitemap generator program.
3) Configure it. I suggest always disabling the feature that contacts Google each time a new map is generated. The filters are order-dependent, and a little thought goes a long way. Interactively check the results until you're happy.
4) I find a batch file (or equivalent) that does the job is a good finishing touch.
5) List the map in robots.txt; the entry is shown just below. (There are other ways to submit, but I won't use them next time I do this.)
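The robots.txt entry is a single line (the URL is a placeholder):

Sitemap: http://www.example.com/sitemap.xml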

The default maps contain priority values, typically all the same. Deleting all these settings is more satisfying, but not essential.
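For reference, a typical generated entry looks like this; the <priority> element is the optional part I mean (URL and date are placeholders):

<url>
  <loc>http://www.example.com/page.html</loc>
  <lastmod>2008-06-01</lastmod>
  <priority>0.5</priority>
</url>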

If the site is all public pages, the job is easy. If not, it can take a bit of extra work.
Rudy102
Rank 0 - Newcomer
Posts: 1
Joined: Thu Jun 26, 2008 8:14 pm
Location: US

Post by Rudy102 »

We have used sitemaps for quite some time in both XML and text format. Both appear to have worked well for getting sites indexed in Google, Yahoo, and MSN. There are a number of free online XML sitemap creators that we've tested. A validator for sitemaps is a great idea. Keep us posted.
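For anyone who hasn't seen the text format: it is just one full URL per line and nothing else (placeholder URLs):

http://www.example.com/
http://www.example.com/jobs.html
http://www.example.com/contact.html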
cadomagenge
Rank I - Novice
Posts: 10
Joined: Thu Oct 01, 2015 5:37 am

Re: Sitemaps

Post by cadomagenge »

I think an XML sitemap is useful to Googlebot, and an HTML sitemap is useful to users, especially for large websites. It can help Google index more pages.