robots.txt permissions

Technical support for all editions of CSE HTML Validator, including bug reports.

robots.txt permissions

Postby Dandello2000 » Mon Sep 12, 2011 12:36 pm

Edited: sorry, I have .htaccess on the brain at the moment.
I have some sub-directories blocked from being spidered - but that also means that CSE is blocked from accessing those sub-directories as well when checking the site online, so it throws 401 errors all over the place.
How is CSE HTML Validator identified as a spider, so I can give it the proper permissions in my robots.txt file?
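
For reference, the kind of per-agent rule I'm thinking of would look roughly like this (the directory and agent names are just placeholders; an empty Disallow means no restrictions for that agent):

# Block the protected directory for all spiders
User-agent: *
Disallow: /private/

# Hypothetical validator agent gets no restrictions
User-agent: SomeValidatorAgent
Disallow: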
Dandello2000
Rank I - Novice
Posts: 12
Joined: Wed Nov 04, 2009 7:59 pm

Re: robots.txt permissions

Postby Albert Wiersch » Mon Sep 12, 2011 2:00 pm

Hello,

CSE HTML Validator does not pay attention to the robots.txt file, so it doesn't matter what is in there.

You can, however, set the 'Agent' (the user agent string) that CSE HTML Validator uses when making HTTP requests. By default it is "CSE HTML Validator Professional (http://www.htmlvalidator.com/)", so you should be able to tell that a request comes from CSE HTML Validator by checking for that string. The agent can be set in a target's properties or when adding a URL target to the Batch Wizard.
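
If the 401 errors come from HTTP authentication configured in .htaccess, one rough sketch (Apache 2.2-style directives; the auth settings, .htpasswd path, and environment variable name are placeholders, and keep in mind that an agent string can be spoofed) would be to exempt requests whose User-Agent contains the validator's agent string:

# Flag requests whose User-Agent contains the validator's agent string
SetEnvIf User-Agent "CSE HTML Validator" cse_validator

# Existing password protection (placeholder values)
AuthType Basic
AuthName "Protected area"
AuthUserFile /path/to/.htpasswd
Require valid-user

# Let flagged requests through without a password; everyone else must authenticate
Order Allow,Deny
Allow from env=cse_validator
Satisfy Any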

I hope this helps.
Albert Wiersch
Site Admin
Posts: 2435
Joined: Sat Dec 11, 2004 10:23 am
Location: Near Dallas, TX

Re: robots.txt permissions

Postby Dandello2000 » Mon Sep 12, 2011 2:34 pm

I was right the first time *grin* - it's an .htaccess issue
I'll try it and see how it goes.
Dandello2000
Rank I - Novice
Posts: 12
Joined: Wed Nov 04, 2009 7:59 pm

