Batch wizard performance

For technical support for all editions of CSS HTML Validator. Includes bug reports.
anv2010
Rank II - Novice
Posts: 34
Joined: Wed May 05, 2010 4:41 pm

Batch wizard performance

Post by anv2010 » Tue Oct 01, 2019 12:44 pm

Hello,

It takes more than 18 hours for batch wizard to crawl my website.

Is there a feature I can disable to speed up the crawl?

Thank you,
Alex.
Conversions and Calculations
https://www.aqua-calc.com

Albert Wiersch
Site Admin
Posts: 3425
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Batch wizard performance

Post by Albert Wiersch » Tue Oct 01, 2019 4:45 pm

Hi Alex,

Great question. I have mulled over some possibilities for increasing the Batch Wizard speed for large jobs like yours.

Here is the documentation page that I'm working on:
https://www.htmlvalidator.com/2020/docs ... peedup.htm

There are several things to try. Do you know what is causing the most delay? Is a CPU core stuck at 100% validating documents? Is the computer doing a lot of waiting for the HTTP requests to finish before it can validate? Is the link checking causing a long delay?
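For anyone who wants to answer the CPU-versus-waiting question for their own scripts, here is a minimal Python sketch. It is not part of CSS HTML Validator, and the function names (`cpu_fraction`, `wait_like_network`, `busy_like_validation`) are made up for illustration. It compares CPU time with wall-clock time: a ratio near 1 suggests the work is CPU-bound, and a ratio near 0 suggests most of the time is spent waiting. For the Batch Wizard itself, watching CPU usage in Task Manager gives the same kind of hint.

```python
import time


def cpu_fraction(work):
    """Run work() and return CPU time divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall if wall > 0 else 0.0


def wait_like_network():
    # Stand-in for waiting on an HTTP response: sleeping uses almost no CPU.
    time.sleep(0.2)


def busy_like_validation():
    # Stand-in for CPU-heavy validation work.
    sum(i * i for i in range(2_000_000))


waiting_ratio = cpu_fraction(wait_like_network)
busy_ratio = cpu_fraction(busy_like_validation)
print(f"waiting: {waiting_ratio:.2f}  busy: {busy_ratio:.2f}")
```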

If there were a way to validate only the URLs that have changed since the last validation, would that be acceptable? That could significantly speed things up by greatly reducing the number of documents that need to be checked.
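To make the "only changed URLs" idea concrete, here is a rough Python sketch. It is only an illustration of the general technique and not how the Batch Wizard works or will work; the function names and the sample URLs are hypothetical. It stores a SHA-256 digest of each page from the previous run and selects only the pages whose digest is new or different.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a hex SHA-256 digest of a page body."""
    return hashlib.sha256(body).hexdigest()


def updated_digests(pages: dict) -> dict:
    """Map each URL to the digest of its current content."""
    return {url: content_digest(body) for url, body in pages.items()}


def urls_needing_validation(pages: dict, previous: dict) -> list:
    """Return URLs that are new or whose content differs from the last run."""
    return sorted(
        url for url, body in pages.items()
        if previous.get(url) != content_digest(body)
    )


# Hypothetical first crawl: remember what every page looked like.
pages = {"/index.htm": b"<title>Home</title>", "/about.htm": b"<title>About</title>"}
previous = updated_digests(pages)

# Hypothetical second crawl: one page edited, one page added.
pages["/about.htm"] = b"<title>About us</title>"
pages["/new.htm"] = b"<title>New</title>"
changed = urls_needing_validation(pages, previous)
print(changed)
```

Note that this approach still downloads every page, so it saves validation time but not HTTP time. Skipping the download as well would need something like conditional HTTP requests (ETag or Last-Modified), which depends on server support.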
Albert Wiersch

Re: Batch wizard performance

Post by anv2010 » Thu Oct 03, 2019 11:07 am

Hello,

I've disabled the following checks, and the elapsed time decreased from 18 to 14 hours:

JSHint, JSLint, PHP Checker, Security Messages, Search Engine Messages, the keyword density message, and spell checking.

That's a very good improvement for me.
Thank you, Albert, for publishing the "wizard_speedup.htm" document!

Albert, how expensive are the checks for duplicate HTML titles and meta descriptions?

Thank you!
Alex

Re: Batch wizard performance

Post by Albert Wiersch » Thu Oct 03, 2019 2:33 pm

Hi Alex,

That's great. That's a 22% improvement if my calculations are correct. :D
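For reference, the arithmetic behind that figure, as a small Python check (the function name is just for illustration):

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the reduction from before to after as a percentage of before."""
    return (before - after) / before * 100


# 18 hours before disabling the checks, 14 hours after.
reduction = percent_reduction(18, 14)
print(round(reduction, 1))  # 22.2
```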

I have not done any tests to figure out how expensive the duplicate page title and duplicate meta description tests are. You process a large number of documents so they could be "expensive". If you want to find out though, then I can probably add some timing information to the Batch Wizard progress window that will spit out some processing times for these reports. It would have to be for the next major release of CSS HTML Validator which will be 2020/v20. If you'd like to test this out and help test a BETA version of the upcoming new major release then please let me know and I will put it on my to-do list.
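As a rough idea of what per-report timing could look like, here is a generic Python sketch. It is not the Batch Wizard's code; `timed`, `duplicate_titles`, and the sample page data are hypothetical names chosen for illustration. It accumulates elapsed time per label around a simple duplicate-title check.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated seconds per report label.
totals = defaultdict(float)


@contextmanager
def timed(label):
    """Add the elapsed wall-clock time of the enclosed block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start


def duplicate_titles(titles):
    """Group URLs by title and keep only titles used by more than one URL."""
    by_title = defaultdict(list)
    for url, title in titles.items():
        by_title[title].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if len(urls) > 1}


# Hypothetical sample data: URL -> page title.
pages = {"/a": "Home", "/b": "About", "/c": "Home"}

with timed("duplicate titles"):
    duplicates = duplicate_titles(pages)

for label, seconds in totals.items():
    print(f"{label}: {seconds:.6f} s")
```

Grouping by title in a dictionary keeps the check roughly linear in the number of pages, so for a report built this way the cost would mostly scale with how many documents are in the job.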
Albert Wiersch

Re: Batch wizard performance

Post by anv2010 » Thu Oct 03, 2019 4:05 pm

Hi Albert,

Yes, by all means. I'd like timing information:-)

Alex.

Re: Batch wizard performance

Post by Albert Wiersch » Thu Oct 03, 2019 4:18 pm

Hi Alex,

Great. I will work on that in the next week or two and get a BETA to you for testing when ready.
Albert Wiersch

Re: Batch wizard performance

Post by Albert Wiersch » Sun Oct 06, 2019 10:15 am

Hi Alex,

I hope to get a BETA to you by the end of this week.

I also wanted to let you know that I forgot to put HTML Tidy checking on that page. You're probably not using that but if you are then disabling HTML Tidy checking could save a significant amount of time. The next version of the documentation page will include this.
Albert Wiersch

Re: Batch wizard performance

Post by anv2010 » Sun Oct 06, 2019 12:11 pm

Hi Albert,

No, I wasn't using HTML Tidy checking.

I'm looking forward to testing the beta release.

Best,
Alex.