Batch Wizard doesn't propagate jsessionid

Posted: Wed Sep 30, 2009 3:45 pm
by bknights
I am trying to link-check a site that uses JSESSIONID. I'm not sure what's wrong with how the links are being generated, but I'm getting lists of hundreds of pages when the whole site probably contains fewer than 50. The pages are generated dynamically, so I don't have an exact count. Is there a way to make the batch link checker ignore the JSESSIONID when deciding whether it has already crawled a link? I have version 9.0302.

Re: Batch Wizard doesn't propagate jsessionid

Posted: Wed Sep 30, 2009 5:56 pm
by Albert Wiersch
I'm sorry for the trouble.

Could you send me your target list and a sample Batch Wizard report? Send them to support at htmlvalidator dot com. Thank you.

Re: Batch Wizard doesn't propagate jsessionid

Posted: Thu Oct 01, 2009 1:53 pm
by bknights
I can't send a target report because so far the Batch Wizard has never completed. I've just set it off to run again.
I'll send the site by private email.

...
After running the link checker again for 23 minutes (with a 1-second delay between requests), the list had grown past 1400 links and pages had started to time out, so I canceled the run. I'll send what I have.

Re: Batch Wizard doesn't propagate jsessionid

Posted: Thu Oct 01, 2009 3:20 pm
by Albert Wiersch
Thanks. I got it and plan on checking it out tomorrow. A canceled Batch Wizard report should be fine to help in finding the problem.

Re: Batch Wizard doesn't propagate jsessionid

Posted: Fri Oct 02, 2009 10:30 am
by Albert Wiersch
Hi Brett,

I've reviewed the report files. It seems there is more than one session ID. This could possibly be solved with an option to strip the jsessionid out of the URL when comparing whether a link has already been checked. Do you think this would address the problem? If so, I may be able to add this in the next major release, but I can't make any guarantees.
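
To illustrate the idea of stripping the jsessionid before comparing URLs, here is a minimal Python sketch. It is not how the Batch Wizard works internally; the names `comparison_key` and `unique_links` are hypothetical, and it assumes the session ID appears as a `;jsessionid=...` path parameter, which is the usual form when a Java servlet container rewrites URLs.

```python
import re

# Assumption: the session ID is appended as a ";jsessionid=..." path parameter
# and runs until the next '?', '#', or ';' (or the end of the URL).
_JSESSIONID = re.compile(r";jsessionid=[^?#;]*", re.IGNORECASE)


def comparison_key(url: str) -> str:
    """Return the URL with any ;jsessionid=... path parameter removed."""
    return _JSESSIONID.sub("", url)


def unique_links(links):
    """Keep the first occurrence of each link, ignoring the session ID."""
    seen = set()
    unique = []
    for link in links:
        key = comparison_key(link)
        if key not in seen:
            seen.add(key)
            unique.append(link)  # keep the original URL for the actual request
    return unique


if __name__ == "__main__":
    crawled = [
        "http://example.com/a.jsp;jsessionid=AAA111?x=1",
        "http://example.com/a.jsp;jsessionid=BBB222?x=1",
        "http://example.com/b.jsp",
    ]
    for link in unique_links(crawled):
        print(link)
```

With this kind of comparison, the same page reached under two different session IDs counts as one link. A session ID passed as an ordinary query-string parameter is not handled by this sketch.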

A solution you could use now would be to add each link you want to check manually to the Batch Wizard and not have it follow links.

Re: Batch Wizard doesn't propagate jsessionid

Posted: Mon Oct 05, 2009 7:27 pm
by bknights
Albert,
Yes, stripping out or ignoring the jsessionid would work quite well.
Regarding your workaround: can I load a list of links, or do I have to add them one by one?

Re: Batch Wizard doesn't propagate jsessionid

Posted: Tue Oct 06, 2009 9:07 am
by Albert Wiersch
OK, I will consider an option to strip "jsessionid" when comparing URLs for a future version, though I am unable to make any guarantees. Thanks for the suggestion!

As for loading a list of links, yes, you can make your own Batch Wizard target list file. For more information, please see:
http://www.htmlvalidator.com/htmlval/v9 ... ormats.htm

The easiest format is something like this:

URL : http://www.htmlvalidator.com/page1.html
URL : http://www.htmlvalidator.com/page2.html
URL : http://www.htmlvalidator.com/page3.html
URL : http://www.htmlvalidator.com/page4.html
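
If the list of pages is long, a small script can generate a file in this format. The Python sketch below is only an illustration: the `URL : ` line prefix comes from the example above, while the function names and the output file name `targets.txt` are assumptions. Check the target list formats documentation for the file extension and any header the Batch Wizard expects.

```python
def format_target_list(urls):
    """Return one 'URL : <address>' line per non-empty URL."""
    cleaned = (u.strip() for u in urls)
    return "".join(f"URL : {u}\n" for u in cleaned if u)


def write_target_list(urls, path):
    """Write the formatted target list to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_target_list(urls))


if __name__ == "__main__":
    pages = [
        "http://www.htmlvalidator.com/page1.html",
        "http://www.htmlvalidator.com/page2.html",
    ]
    # "targets.txt" is a placeholder file name, not a documented convention.
    write_target_list(pages, "targets.txt")
```

The resulting file can then be loaded as a target list, with link following turned off as suggested earlier in the thread.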