Batch wizard performance

anv2010
Rank II - Novice
Posts: 34
Joined: Wed May 05, 2010 4:41 pm

Batch wizard performance

Post by anv2010 »

Hello,

It takes more than 18 hours for the Batch Wizard to crawl my website.

Is there a feature I can disable to speed up the crawl?

Thank you,
Alex.
Conversions and Calculations
https://www.aqua-calc.com
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Batch wizard performance

Post by Albert Wiersch »

Hi Alex,

Great question. I have mulled over some possibilities for increasing the Batch Wizard speed for large jobs like yours.

Here is the documentation page that I'm working on:
https://www.htmlvalidator.com/2020/docs ... peedup.htm

There are several things to try. Do you know what is causing the most delay? Is a CPU core stuck at 100% validating documents? Is the computer doing a lot of waiting for the HTTP requests to finish before it can validate? Is the link checking causing a long delay?
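One rough way to tell "CPU stuck at 100%" apart from "waiting on HTTP requests" is to compare CPU time with wall-clock time for a piece of work. The sketch below is only an illustration in Python: `cpu_fraction`, `waiting_task`, and `busy_task` are hypothetical names, and the two tasks are stand-ins for network waits and validation work, not anything the Batch Wizard exposes.

```python
import time

def cpu_fraction(task):
    """Return CPU time divided by wall-clock time for one call of task().

    A value near 1.0 suggests the work is CPU-bound; a value near 0.0
    suggests it mostly waits (for example, on network responses).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall if wall > 0 else 0.0

def waiting_task():
    # Stand-in for waiting on an HTTP response: sleeps without using the CPU.
    time.sleep(0.2)

def busy_task():
    # Stand-in for validating a document: pure computation.
    sum(i * i for i in range(2_000_000))

print(f"waiting task CPU fraction: {cpu_fraction(waiting_task):.2f}")
print(f"busy task CPU fraction:    {cpu_fraction(busy_task):.2f}")
```

For a real crawl, a task manager or system monitor gives the same kind of answer: a core pinned near 100% points at validation cost, while low CPU use points at network or link-checking waits.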

If there were a way to validate only the URLs that have changed since the last validation, would that be acceptable? That could significantly speed things up by greatly reducing the number of documents that need to be checked.
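To illustrate the "only changed URLs" idea, here is a minimal Python sketch that compares a hash of each page's content with the hash recorded on the previous run. It is an assumption about how such a filter could work, not a description of the Batch Wizard: `content_fingerprint`, `select_changed`, and the example URLs are all hypothetical.

```python
import hashlib

def content_fingerprint(body: bytes) -> str:
    """Return a stable fingerprint of a page's raw content."""
    return hashlib.sha256(body).hexdigest()

def select_changed(pages: dict[str, bytes], previous: dict[str, str]) -> list[str]:
    """Return URLs whose content is new or differs from the last run.

    pages    maps URL -> content fetched in this run
    previous maps URL -> fingerprint stored after the last run
    """
    changed = []
    for url, body in pages.items():
        if previous.get(url) != content_fingerprint(body):
            changed.append(url)
    return changed

# Hypothetical data: page "a" is unchanged, page "b" is new.
previous = {"https://example.com/a": content_fingerprint(b"<p>old</p>")}
pages = {
    "https://example.com/a": b"<p>old</p>",
    "https://example.com/b": b"<p>new</p>",
}
print(select_changed(pages, previous))  # ['https://example.com/b']
```

Note that this approach still downloads every page, so it saves validation time rather than request time. Skipping the download as well would need something like HTTP conditional requests, which only help if the server sends usable `ETag` or `Last-Modified` headers.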
Albert Wiersch, CSS HTML Validator Developer • Download CSS HTML Validator FREE Trial

Re: Batch wizard performance

Post by anv2010 »

Hello,

I've disabled the following checks, and the elapsed time decreased from 18 to 14 hours:

JSHint, JSLint, PHP Checker, Security Messages, Search Engine Messages and the keyword density message, and spell checking.

That's a very good improvement for me.
Thank you, Albert, for publishing the "wizard_speedup.htm" document!

Albert, how expensive are the checks for duplicate HTML titles and meta descriptions?

Thank you!
Alex

Re: Batch wizard performance

Post by Albert Wiersch »

Hi Alex,

That's great. That's a 22% improvement if my calculations are correct. :D
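For anyone checking the arithmetic, going from 18 hours to 14 hours works out as follows (a small Python sketch; the function name is just for illustration):

```python
def percent_reduction(before_hours: float, after_hours: float) -> float:
    """Return the reduction in elapsed time as a percentage of the original."""
    return (before_hours - after_hours) / before_hours * 100

print(f"{percent_reduction(18, 14):.1f}%")  # 22.2%
```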

I have not done any tests to figure out how expensive the duplicate page title and duplicate meta description tests are. You process a large number of documents so they could be "expensive". If you want to find out though, then I can probably add some timing information to the Batch Wizard progress window that will spit out some processing times for these reports. It would have to be for the next major release of CSS HTML Validator which will be 2020/v20. If you'd like to test this out and help test a BETA version of the upcoming new major release then please let me know and I will put it on my to-do list.
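As a rough feel for what a duplicate-title check costs, here is a Python sketch that groups pages by title with a dictionary and times the pass. This is an assumption about one reasonable way to do it, not the validator's actual implementation; `find_duplicate_titles` and the generated URLs and titles are hypothetical.

```python
import time
from collections import defaultdict

def find_duplicate_titles(titles_by_url: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs by normalized title and keep titles used by more than one URL."""
    urls_by_title = defaultdict(list)
    for url, title in titles_by_url.items():
        urls_by_title[title.strip().lower()].append(url)
    return {title: urls for title, urls in urls_by_title.items() if len(urls) > 1}

# Synthetic data: 100,000 pages where every title is shared by exactly two pages.
pages = {f"https://example.com/page{i}": f"Title {i % 50000}" for i in range(100000)}

start = time.perf_counter()
duplicates = find_duplicate_titles(pages)
elapsed = time.perf_counter() - start
print(f"{len(duplicates)} duplicated titles found in {elapsed:.3f} s")
```

With a dictionary, the work grows linearly with the number of pages, so a check written this way would be cheap compared with fetching and validating each document. A check that compared every title against every other title would scale much worse.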

Re: Batch wizard performance

Post by anv2010 »

Hi Albert,

Yes, by all means. I'd like timing information:-)

Alex.

Re: Batch wizard performance

Post by Albert Wiersch »

Hi Alex,

Great. I will work on that in the next week or two and get a BETA to you for testing when ready.

Re: Batch wizard performance

Post by Albert Wiersch »

Hi Alex,

I hope to get a BETA to you by the end of this week.

I also wanted to let you know that I forgot to put HTML Tidy checking on that page. You're probably not using it, but if you are, then disabling HTML Tidy checking could save a significant amount of time. The next version of the documentation page will include this.

Re: Batch wizard performance

Post by anv2010 »

Hi Albert,

No, I wasn't using HTML Tidy checking.

I'm looking forward to testing the beta release.

Best,
Alex.

Re: Batch wizard performance

Post by Albert Wiersch »

In case anyone following this topic is interested, adding the timing information for the duplicate title and meta description reports and then doing some test runs did not show that those reports were taking an inordinate amount of time.