Batch wizard performance

Posted: Tue Oct 01, 2019 12:44 pm
by anv2010
Hello,

It takes more than 18 hours for batch wizard to crawl my website.

Is there a feature I can disable to speed up the crawl?

Thank you,
Alex.

Re: Batch wizard performance

Posted: Tue Oct 01, 2019 4:45 pm
by Albert Wiersch
Hi Alex,

Great question. I have mulled over some possibilities for increasing the Batch Wizard speed for large jobs like yours.

Here is the documentation page that I'm working on:
https://www.htmlvalidator.com/2020/docs ... peedup.htm

There are several things to try. Do you know what is causing the most delay? Is a CPU core stuck at 100% validating documents? Is the computer doing a lot of waiting for the HTTP requests to finish before it can validate? Is the link checking causing a long delay?

If there were a way to validate only the URLs that have changed since the last validation, would that be acceptable? That could significantly speed things up by greatly reducing the number of documents that need to be checked.
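
To illustrate the "only validate what changed" idea, here is a minimal Python sketch. It is not how the Batch Wizard works or will work; the function names and the fingerprint map are hypothetical, and it assumes the page bodies have already been fetched. It stores a SHA-256 fingerprint per URL from the previous run and returns only the URLs whose content differs or is new.

```python
import hashlib


def content_fingerprint(body: bytes) -> str:
    """Return a stable fingerprint for a document body (hypothetical helper)."""
    return hashlib.sha256(body).hexdigest()


def urls_needing_validation(previous, current_bodies):
    """Compare fingerprints saved by the last run with the current bodies.

    previous:       dict mapping URL -> fingerprint from the last run
    current_bodies: dict mapping URL -> freshly fetched body (bytes)

    Returns (urls_to_validate, updated_fingerprints).
    """
    updated = {}
    to_validate = []
    for url, body in current_bodies.items():
        fingerprint = content_fingerprint(body)
        updated[url] = fingerprint
        # New URLs and URLs whose content changed both need validation.
        if previous.get(url) != fingerprint:
            to_validate.append(url)
    return to_validate, updated


# Example: page "a" is unchanged, page "b" is new.
last_run = {"https://example.com/a": content_fingerprint(b"<p>old</p>")}
todo, fingerprints = urls_needing_validation(
    last_run,
    {
        "https://example.com/a": b"<p>old</p>",
        "https://example.com/b": b"<p>new</p>",
    },
)
print(todo)
```

Note that this approach still downloads every page, so it would only save validation time, not HTTP time. If the server supports conditional requests (If-Modified-Since or ETag headers), unchanged pages could be skipped without downloading them at all.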

Re: Batch wizard performance

Posted: Thu Oct 03, 2019 11:07 am
by anv2010
Hello,

I've disabled the following checks, and the elapsed time decreased from 18 to 14 hours:

JSHint, JSLint, PHP Checker, Security Messages, Search Engine Messages and the keyword density message, and spell checking.

That's a very good improvement for me.
-- Thank you, Albert, for the published "wizard_speedup.htm" document!

Albert, how expensive are the checks for duplicate HTML titles and meta descriptions?

Thank you!
Alex

Re: Batch wizard performance

Posted: Thu Oct 03, 2019 2:33 pm
by Albert Wiersch
Hi Alex,

That's great. That's a 22% improvement if my calculations are correct. :D
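
For anyone who wants to check the figure, the calculation can be sketched in a few lines of Python (the function name is just for illustration): the reduction is the time saved divided by the original time.

```python
def percent_reduction(before_hours, after_hours):
    """Percentage by which the elapsed time dropped."""
    return (before_hours - after_hours) / before_hours * 100


# 18 hours before, 14 hours after disabling the checks.
print(round(percent_reduction(18, 14), 1))  # 22.2
```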

I have not done any tests to figure out how expensive the duplicate page title and duplicate meta description tests are. You process a large number of documents, so they could be "expensive". If you want to find out, I can probably add some timing information to the Batch Wizard progress window that reports the processing times for these reports. It would have to be for the next major release of CSS HTML Validator, which will be 2020/v20. If you'd like to try this out and help test a BETA version of the upcoming major release, please let me know and I will put it on my to-do list.

Re: Batch wizard performance

Posted: Thu Oct 03, 2019 4:05 pm
by anv2010
Hi Albert,

Yes, by all means. I'd like timing information:-)

Alex.

Re: Batch wizard performance

Posted: Thu Oct 03, 2019 4:18 pm
by Albert Wiersch
Hi Alex,

Great. I will work on that in the next week or two and get a BETA to you for testing when ready.

Re: Batch wizard performance

Posted: Sun Oct 06, 2019 10:15 am
by Albert Wiersch
Hi Alex,

I hope to get a BETA to you by the end of this week.

I also wanted to let you know that I forgot to put HTML Tidy checking on that page. You're probably not using it, but if you are, then disabling HTML Tidy checking could save a significant amount of time. The next version of the documentation page will include this.

Re: Batch wizard performance

Posted: Sun Oct 06, 2019 12:11 pm
by anv2010
Hi Albert,

No, I wasn't using HTML Tidy checking.

I'm looking forward to testing the beta release.

Best,
Alex.

Re: Batch wizard performance

Posted: Mon Oct 21, 2019 6:32 pm
by Albert Wiersch
In case anyone following this topic is interested: after adding the timing information for the duplicate title and meta description reports and doing some test runs, I found that those reports were not taking an inordinate amount of time.