
validation generating server errors

Posted: Wed Aug 14, 2013 3:39 pm
by AQiccl135
When I point 11.02 Pro at some of my company's websites, another programmer gets an Unhandled Exception error claiming that "/WebResource.axd" cannot be found.

Main question: can I tell CSE to ignore that file extension? If not, what else can I do to stop getting that error?

Additional info if needed: as far as I understand from the other programmer, .NET looks at that extension and knows not to treat it as a page; he doesn't know if CSE understands that. He is also wondering if the request method being "head" is part of the problem.

Re: validation generating server errors

Posted: Thu Aug 15, 2013 7:58 am
by Albert Wiersch
Hello,

Because it's a HEAD request, it must be the link checker trying to check that URL; it must be linked to somewhere in your website.

In the Batch Wizard, go to the Target List Options tab. Try entering "*.axd" (without the quotes) in the "Don't check these links" field and see if that prevents the problem. That tells the Batch Wizard not to link check any URLs with an .axd extension.

I hope this helps. Please let me know if you have any more questions.

Re: validation generating server errors

Posted: Thu Aug 15, 2013 9:21 am
by AQiccl135
In that same tab, is it possible to make CSE ignore code in <noindex> or <nofollow> tags? If not in this tab, where else can I exclude those sections of code?

(I was just informed that I deleted a small DB yesterday afternoon :oops:)

Re: validation generating server errors

Posted: Thu Aug 15, 2013 9:33 am
by Albert Wiersch
AQiccl135 wrote:In that same tab, is it possible to make CSE ignore code in <noindex> or <nofollow> tags? If not in this tab, where else can I exclude those sections of code?
Perhaps. Can you send me a small sample or demo page that demonstrates exactly what you want ignored and when?
AQiccl135 wrote:(I was just informed that I deleted a small DB yesterday afternoon :oops:)
Sorry! :(

Re: validation generating server errors

Posted: Thu Aug 15, 2013 9:36 am
by RSteinwand
I bet that "other programmer" gets a LOT of messages and yours are just a drop in the bucket. :D

If I look in our IIS logs, it's scary all the hacking attempts I see there on a daily basis.

Re: validation generating server errors

Posted: Thu Aug 15, 2013 9:48 am
by Albert Wiersch
RSteinwand wrote:If I look in our IIS logs, it's scary all the hacking attempts I see there on a daily basis.
Same here... our server is under constant attack, and our online validator service at http://www.OnlineWebCheck.com/ is constantly abused (mostly by bots, I think), even though most of the abusive requests are simply denied. They don't care about the results; they just want to post a URL to it. I guess they think it will increase their search engine rankings or traffic. :roll:

Re: validation generating server errors

Posted: Thu Aug 15, 2013 10:05 am
by AQiccl135
Albert Wiersch wrote: Perhaps. Can you send me a small sample or demo page that demonstrates exactly what you want ignored and when?

Code:

<html>
    <head>
        <meta name="robots" content="noindex, nofollow" />
    </head>
</html>
We think that CSE saw the user-specific "delete" link for an ad page and followed it.

Re: validation generating server errors

Posted: Thu Aug 15, 2013 10:17 am
by Albert Wiersch
Thanks.

Do you want CSE HTML Validator to abort the validation when it encounters that meta tag? The document would not be validated and the links in it would not be followed or checked.

If that's what you want, then I think it should be possible with a custom user function... please let me know if so and I'll give you more details on how to do this.

Re: validation generating server errors

Posted: Thu Aug 15, 2013 10:27 am
by RSteinwand
AQiccl135 wrote:

Code:

<html>
    <head>
        <meta name="robots" content="noindex, nofollow" />
    </head>
</html>
Can't you use a robots.txt file instead (or better yet, a sitemap.xml file)?
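For reference, a minimal robots.txt along those lines might look like this. The paths here are hypothetical, and note this only helps if the crawler in question actually honors robots.txt:

```
# Hypothetical example: keep well-behaved crawlers away from
# action URLs and admin pages
User-agent: *
Disallow: /ads/delete/
Disallow: /admin/
```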

Re: validation generating server errors

Posted: Thu Aug 15, 2013 10:56 am
by AQiccl135
Albert Wiersch wrote: Do you want CSE HTML Validator to abort the validation when it encounters that meta tag? The document would not be validated and the links in it would not be followed or checked.
That is what I believed to be the easiest fix, but I really don't want to test the scope of the fix on any other DBs.
Albert Wiersch wrote:I think it should be possible with a custom user function... please let me know if so and I'll give you more details on how to do this.
I'll need a bit more direction, please. I am currently the "expert" on CSE in my department, and my experience is limited to pointing CSE at a site, telling it where to generate the report, and clicking the Start button.
RSteinwand wrote: Can't you use a robots.txt file instead (or better yet, a sitemap.xml file)?
As far as I am aware, all of our sites use one or both of the above. The problem is that I have no idea how to make CSE use them. The next problem is that there are any number of methods for protecting a page or its code. If I tell CSE to ignore just the ones I am currently aware of, I'll have to break something else to learn about new ones.

Basically, I need the broadest fix possible, or advice from people who have fixed this type of problem before.

~~~~~~~~~~~~~~~~~~

Since my last post, I have noticed some patterns and thought of a few questions.

As far as we are currently aware, the only things that got deleted were the ones whose delete option was a plain link instead of JavaScript. We are now changing known links to JS, but we are worried that we might be missing something due to unfamiliarity with the inner workings of CSE.

My question is how the delete link was activated. Does CSE actually load the page to be able to validate it?

My co-worker's question is if there is any way to give a regular expression to CSE and tell it to ignore links that match the expression.

Re: validation generating server errors

Posted: Thu Aug 15, 2013 3:32 pm
by Albert Wiersch
Oh, I see. It sounds like CSE HTML Validator requested a URL that deleted the database?

CSE HTML Validator certainly needs to request URLs in order to validate them.
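As a toy sketch (not real server code; all names and paths here are made up) of why a checker that merely requests URLs can still cause damage when a site performs actions on plain GET/HEAD links:

```python
# Toy illustration: a link checker has to request every URL it finds,
# so if the server ties a destructive action to a plain GET/HEAD URL,
# "just checking the link" triggers the action.
database = {"ad_1": "For sale: bike", "ad_2": "For sale: couch"}

def handle_request(method: str, path: str) -> int:
    """A server that (unsafely) performs a delete on a GET/HEAD request."""
    if method in ("GET", "HEAD") and path.startswith("/ads/delete/"):
        # Side effect on a supposedly "safe" method!
        database.pop(path.rsplit("/", 1)[-1], None)
    return 200

# A link checker innocently "validating" every link it found:
for link in ["/index.html", "/ads/delete/ad_1"]:
    handle_request("HEAD", link)

print(database)  # ad_1 is gone: {'ad_2': 'For sale: couch'}
```

This is why HTTP treats GET and HEAD as "safe" methods that must not have side effects; destructive actions belong behind POST (or at least behind authentication).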

In the Target List Options Tab of the Batch Wizard, there is one option for not checking links and one option for not checking targets. You can specify strings to match against and use the "*" character at the beginning and/or end of each string. Perhaps these would work? However, they don't support regular expressions (but see below).
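As a rough illustration only (not CSE's actual implementation), matching behavior of the kind described, where "*" is allowed at the beginning and/or end of each string, might work like this:

```python
def matches(pattern: str, url: str) -> bool:
    """Illustrative sketch of wildcard matching where '*' may appear
    only at the beginning and/or end of the pattern."""
    starts_wild = pattern.startswith("*")
    ends_wild = pattern.endswith("*")
    core = pattern.strip("*")
    if starts_wild and ends_wild:
        return core in url             # *foo* -> substring match
    if starts_wild:
        return url.endswith(core)      # *foo  -> suffix match
    if ends_wild:
        return url.startswith(core)    # foo*  -> prefix match
    return url == core                 # no '*' -> exact match

# e.g. "*.axd" would match any URL ending in ".axd"
print(matches("*.axd", "http://example.com/WebResource.axd"))  # True
```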

You can further tune what to check using custom user functions like these:
http://www.htmlvalidator.com/htmlval/v1 ... kcheck.htm
http://www.htmlvalidator.com/htmlval/v1 ... canadd.htm

There are some example functions on the above pages.

You can use this function in the above event functions to use regular expression matching:
http://www.htmlvalidator.com/htmlval/v1 ... hregex.htm
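As a rough sketch of the kind of regex-based filtering being asked about (Python purely for illustration; CSE user functions are written in its own scripting language, and the actual function names and signatures are on the pages linked above; the patterns here are hypothetical):

```python
import re

# Hypothetical patterns for URLs that should never be requested,
# e.g. user-specific "delete" links on ad pages, or .axd resources.
SKIP_PATTERNS = [
    re.compile(r"/delete(\?|/|$)", re.IGNORECASE),
    re.compile(r"\.axd$", re.IGNORECASE),
]

def should_check(url: str) -> bool:
    """Return False for URLs the link checker should skip entirely."""
    return not any(p.search(url) for p in SKIP_PATTERNS)

print(should_check("http://example.com/ads/delete?id=42"))  # False
print(should_check("http://example.com/index.html"))        # True
```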

You just need to write the "user functions" in a plain text file (UTF-8 encoded if you can) and specify it as the "user functions file" in the Target List Options Tab. The Batch Wizard will then load it when you run the job.

If you can do what you want without custom user functions, that would be easiest, but if you need something more sophisticated, you can likely do it by writing custom user functions, putting them in a text file, and specifying that file in the Target List Options Tab.

If you put or "hide" links in JavaScript, they may or may not be seen by CSE HTML Validator. Generally they won't be, but in the future CSE HTML Validator may be able to extract more links from JavaScript.

Please let me know if you need more assistance.

Re: validation generating server errors

Posted: Thu Aug 15, 2013 4:11 pm
by RSteinwand
If CSE deleted your database, it sounds like it wasn't very secure. I would think any search engine (or spam bot) could have done that as well.

The only way to delete a database should be by a domain admin, which I would hope is a short list of people. Ideally, that database shouldn't even be on your web server or the same domain. Ours aren't.