validation generating server errors

AQiccl135
Rank 0 - Newcomer
Posts: 6
Joined: Wed Aug 14, 2013 3:20 pm
Location: Florida

Post by AQiccl135 » Wed Aug 14, 2013 3:39 pm

When I point 11.02 Pro at some of my company's websites, another programmer gets an Unhandled Exception error claiming that "/WebResource.axd" cannot be found.

Main question: can I tell CSE to ignore that file extension? If not, what else can I do to stop getting that error?

Additional info if needed: as far as I understand from the other programmer, .NET looks at that extension and knows not to treat it as a page; he doesn't know if CSE understands that. He is also wondering if the request method being "head" is part of the problem.

Albert Wiersch
Site Admin
Posts: 3451
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: validation generating server errors

Post by Albert Wiersch » Thu Aug 15, 2013 7:58 am

Hello,

Because it's a HEAD request, it must be the link checker trying to check that link. It must have been linked to somewhere in your website.

In the Batch Wizard, go to the Target List Options tab. Try entering "*.axd" (without the quotes) in the "Don't check these links" field and see if that prevents the problem. It tells the Batch Wizard not to link check any URLs with an axd extension.

I hope this helps. Please let me know if you have any more questions.
Albert Wiersch

Re: validation generating server errors

Post by AQiccl135 » Thu Aug 15, 2013 9:21 am

In that same tab, is it possible to make CSE ignore code in <noindex> or <nofollow> tags? If not this tab, where else can I save those sections of code?

(I was just informed that I deleted a small DB yesterday afternoon :oops:)

Re: validation generating server errors

Post by Albert Wiersch » Thu Aug 15, 2013 9:33 am

AQiccl135 wrote:In that same tab, is it possible to make CSE ignore code in <noindex> or <nofollow> tags? If not this tab, where else can I save those sections of code?
Perhaps. Can you send me a small sample or demo page that demonstrates exactly what you want ignored and when?
AQiccl135 wrote:(I was just informed that I deleted a small DB yesterday afternoon :oops:)
Sorry! :(
Albert Wiersch

RSteinwand
Rank VI - Professional
Posts: 600
Joined: Mon Jun 09, 2008 2:12 pm
Location: Fargo, ND

Re: validation generating server errors

Post by RSteinwand » Thu Aug 15, 2013 9:36 am

I bet that "other programmer" gets a LOT of messages and yours are just a drop in the bucket. :D

If I look in our IIS logs, it's scary all the hacking attempts I see there on a daily basis.
Rick

Re: validation generating server errors

Post by Albert Wiersch » Thu Aug 15, 2013 9:48 am

RSteinwand wrote:If I look in our IIS logs, it's scary all the hacking attempts I see there on a daily basis.
Same here... our server is under constant attack, and our online validator service at http://www.OnlineWebCheck.com/ is constantly abused (mostly by bots, I think), even though most of the abusive requests are just denied. They don't care about the results; they just want to post a URL to it. I guess they think it's going to increase their search engine rankings or traffic. :roll:
Albert Wiersch

Re: validation generating server errors

Post by AQiccl135 » Thu Aug 15, 2013 10:05 am

Albert Wiersch wrote: Perhaps. Can you send me a small sample or demo page that demonstrates exactly what you want ignored and when?

<html>
     <head>
           <meta name="robots" content="noindex, nofollow" />
     </head>
</html>
We think that CSE saw the user-specific "delete" link for an ad page and followed it.

Re: validation generating server errors

Post by Albert Wiersch » Thu Aug 15, 2013 10:17 am

Thanks.

Do you want CSE HTML Validator to abort the validation when it encounters that meta tag? The document would not be validated and the links in it would not be followed or checked.

If that's what you want, then I think it should be possible with a custom user function... please let me know if so and I'll give you more details on how to do this.
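
To make the idea concrete, here is a rough sketch of the check itself, written in Python rather than as a CSE HTML Validator user function. The names `RobotsMetaParser` and `should_skip_document` are made up for illustration and are not part of CSE HTML Validator; the sketch only shows one way to detect the robots meta tag from the sample page and decide that the document should be skipped.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here by default.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def should_skip_document(html_text):
    """Return True when the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    # "none" is the shorthand for "noindex, nofollow".
    return "nofollow" in parser.directives or "none" in parser.directives


SAMPLE_PAGE = """
<html>
     <head>
           <meta name="robots" content="noindex, nofollow" />
     </head>
</html>
"""

if __name__ == "__main__":
    print(should_skip_document(SAMPLE_PAGE))
```

The real user function would need to express the same decision using whatever the user function documentation provides; this is only the logic, under the assumption that "nofollow" is the directive that matters here.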
Albert Wiersch

Re: validation generating server errors

Post by RSteinwand » Thu Aug 15, 2013 10:27 am

AQiccl135 wrote:

<html>
     <head>
           <meta name="robots" content="noindex, nofollow" />
     </head>
</html>
Can't you use a robots.txt file instead (or better yet, a sitemap.xml file)?
Rick

Re: validation generating server errors

Post by AQiccl135 » Thu Aug 15, 2013 10:56 am

Albert Wiersch wrote: Do you want CSE HTML Validator to abort the validation when it encounters that meta tag? The document would not be validated and the links in it would not be followed or checked.
That is what I believed to be the easiest fix, but I really don't want to test the scope of the fix on any other DBs.
Albert Wiersch wrote:I think it should be possible with a custom user function... please let me know if so and I'll give you more details on how to do this.
I'll need a bit more direction, please. I am currently the "expert" on CSE in my department, and my experience is limited to pointing CSE at a site, telling it where to generate the report, and clicking the Start button.
RSteinwand wrote: Can't you use a robots.txt file instead (or better yet, a sitemap.xml file)?
As far as I am aware, all of our sites do use one or both of the above. The problem is that I have no idea how to make CSE make use of them. The next problem is that there are X-number of methods for protecting a page or its code. If I tell CSE to just ignore the ones of which I am currently aware, I'll have to break something else to be able to learn about new ones.

Basically, I need the broadest fix possible, or advice from people who have fixed this type of problem before.

~~~~~~~~~~~~~~~~~~

Since my last post, I have noticed some patterns and thought of a few questions.

As far as we are currently aware, the only things that got deleted were the ones that had the delete option be a link instead of JavaScript. We are now changing known links to be JS, but we are worried that we might be missing something due to an unfamiliarity with the inner workings of CSE.

My question is how the delete link was activated. Does CSE actually load the page to be able to validate it?

My co-worker's question is if there is any way to give a regular expression to CSE and tell it to ignore links that match the expression.

Re: validation generating server errors

Post by Albert Wiersch » Thu Aug 15, 2013 3:32 pm

Oh, I see. It sounds like CSE HTML Validator requested a URL that deleted the database?

CSE HTML Validator certainly needs to request URLs in order to validate them.
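
One consequence is that anything reachable through a plain link can be triggered by any client that requests the URL, whether that is a validator, a link checker, or a search engine crawler. HTTP defines GET and HEAD as "safe" methods that are not supposed to change anything on the server. Below is a minimal sketch of a server-side guard along those lines, in Python. The names `DESTRUCTIVE_PATHS` and `is_request_allowed` and the `/ads/delete` path are hypothetical and not taken from your site, from ASP.NET, or from CSE HTML Validator.

```python
# Methods that HTTP defines as safe, i.e. they should never change server state.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

# Hypothetical URL prefixes that perform destructive actions.
DESTRUCTIVE_PATHS = ("/ads/delete",)


def is_request_allowed(method, path):
    """Reject destructive URLs when they are requested with a safe method."""
    is_destructive = any(path.startswith(prefix) for prefix in DESTRUCTIVE_PATHS)
    if is_destructive and method.upper() in SAFE_METHODS:
        return False
    return True


if __name__ == "__main__":
    for method, path in [("HEAD", "/ads/delete/42"), ("POST", "/ads/delete/42")]:
        print(method, path, is_request_allowed(method, path))
```

With a guard like this, a link checker's HEAD request to a delete URL would be refused, while a deliberate form submission (POST) could still go through after the usual permission checks.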

In the Target List Options Tab of the Batch Wizard, there is one option for not checking links and one option for not checking targets. You can specify strings to match against and use the "*" character at the beginning and/or end of each string. Perhaps these would work? However, they don't support regular expressions (but see below).
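
As a rough illustration of how strings with a leading and/or trailing "*" can behave, here is a small Python sketch. It is not the Batch Wizard's actual matching code; `matches_simple_wildcard` is a made-up name, and details such as case sensitivity and how query strings are treated are assumptions of the sketch.

```python
def matches_simple_wildcard(pattern, url):
    """Match url against a string with an optional '*' at the start and/or end."""
    starts_open = pattern.startswith("*")
    ends_open = pattern.endswith("*") and len(pattern) > 1
    core = pattern[1:] if starts_open else pattern
    core = core[:-1] if ends_open else core
    if starts_open and ends_open:
        return core in url            # "*text*" matches anywhere
    if starts_open:
        return url.endswith(core)     # "*text" matches at the end
    if ends_open:
        return url.startswith(core)   # "text*" matches at the beginning
    return url == pattern             # no wildcard: exact match


if __name__ == "__main__":
    print(matches_simple_wildcard("*.axd", "http://example.com/WebResource.axd"))
    print(matches_simple_wildcard("*delete*", "http://example.com/ads/delete/42"))
```

In this sketch, "*.axd" would not match a URL that has a query string after the extension, but "*.axd*" would. Whether the same is true in the Batch Wizard is something to confirm by testing.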

You can further tune what to check using custom user functions like these:
http://www.htmlvalidator.com/htmlval/v1 ... kcheck.htm
http://www.htmlvalidator.com/htmlval/v1 ... canadd.htm

There are some example functions on the above pages.

You can use this function in the above event functions to use regular expression matching:
http://www.htmlvalidator.com/htmlval/v1 ... hregex.htm
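
For your co-worker, the kind of decision such a function would make can be sketched in Python as below. This is not the CSE HTML Validator user function API; `SKIP_PATTERNS` and `should_check_link` are hypothetical names, and the patterns are only guesses at what "dangerous" links might look like on your site.

```python
import re

# Hypothetical patterns for links that should never be requested.
SKIP_PATTERNS = [
    re.compile(r"\.axd(\?|$)", re.IGNORECASE),               # .axd resources
    re.compile(r"[/?&=](delete|remove)\b", re.IGNORECASE),   # destructive actions
]


def should_check_link(url):
    """Return False when the URL matches any of the skip patterns."""
    return not any(pattern.search(url) for pattern in SKIP_PATTERNS)


if __name__ == "__main__":
    for url in ["/WebResource.axd?d=abc", "/ads/delete/42", "/ads/view/42"]:
        print(url, should_check_link(url))
```

A deny-list like this only covers the patterns you already know about, so it is best treated as a second line of defense rather than the only one.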

You just need to write the "user functions" in a normal text file (though UTF-8 encoded if you can), and specify it in the Target List Options Tab, as the "user functions file". The Batch Wizard will then load it when you run the Batch Wizard job.

If you can do what you want without custom user functions, that would be easiest. If you need something more sophisticated, you can likely do it by writing custom user functions, putting them in a text file, and then specifying that file in the Target List Options Tab.

If you put or "hide" links in JavaScript, then it may or may not be seen by CSE HTML Validator. Generally, it won't, but in the future CSE HTML Validator may be able to extract more links from JavaScript.

Please let me know if you need more assistance.
Albert Wiersch

Re: validation generating server errors

Post by RSteinwand » Thu Aug 15, 2013 4:11 pm

If CSE deleted your database, it sounds like it wasn't very secure. I would think any search engine (or spam bot) could have done that as well.

The only way to delete a database should be by a domain admin, which I would hope would be a short list of people. Ideally, that database shouldn't even be on your web server or the same domain. Ours aren't.
Rick
