"Don't process these targets" in Batch Wizard issue

Post by rag_gupta » Tue Nov 30, 2010 11:36 pm

I'm using v10 pro.

I have some 30,000 pages on my website, with two categories of URLs:

1) http://localhost*com_content*.htm
2) http://localhost*com_content*com_download*.htm

What I want is to validate only the number (2) links and skip all URLs without com_download. How do I do that?
Also, the com_download pages are at the third level of depth; to reach them we need to follow the (1) pages.

I've already verified all pages without com_download in the URL.

The reason is that CSE runs out of memory at around 24,000 pages, so I need to split the URLs into two separate runs.

Also, can't we get a consolidated broken-link report like Xenu's? I can't go through hundreds of report pages to find all the broken links.
I'll need to run Xenu for them.
rag_gupta
Rank II - Novice
Posts: 28
Joined: Tue Nov 30, 2010 11:17 pm

Re: "Don't process these targets" in Batch Wizard issue

Post by Albert Wiersch » Wed Dec 01, 2010 9:20 am

rag_gupta wrote:
1) http://localhost*com_content*.htm
2) http://localhost*com_content*com_download*.htm

What I want is to validate only the number (2) links and skip all URLs without com_download. How do I do that?


Hello,

You should be able to do this by excluding the links you don't want. Please try entering "*com_content*" in the "Don't process these targets" option in the Target List Tab of the Batch Wizard. Another way might be to specify "*com_download*" in the "Process ONLY these targets" field.
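
To see how the two filter patterns behave against the two URL categories, here is a minimal sketch. It assumes the Batch Wizard's `*` wildcard behaves like a shell-style glob, which this thread does not confirm, and it uses hypothetical sample URLs and a hypothetical `select` helper. Under that assumption, `*com_content*` also matches the category (2) URLs, because they contain com_content as well, so the "Process ONLY these targets" pattern is the one that separates the two groups.

```python
from fnmatch import fnmatchcase

# Hypothetical sample URLs standing in for the two categories in the question.
urls = [
    "http://localhost/site/com_content/page1.htm",                # category (1)
    "http://localhost/site/com_content/com_download/form1.htm",   # category (2)
]


def select(urls, only=None, skip=None):
    """Keep URLs that match `only` (if given) and do not match `skip` (if given).

    Assumption: glob-style matching, as done by fnmatchcase. The real
    Batch Wizard matching rules may differ.
    """
    kept = []
    for url in urls:
        if only is not None and not fnmatchcase(url, only):
            continue
        if skip is not None and fnmatchcase(url, skip):
            continue
        kept.append(url)
    return kept


# Excluding *com_content* drops both categories, since both contain com_content.
skip_result = select(urls, skip="*com_content*")

# Keeping only *com_download* leaves just the category (2) URL.
only_result = select(urls, only="*com_download*")

print(skip_result)
print(only_result)
```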

rag_gupta wrote:
Also, the com_download pages are at the third level of depth; to reach them we need to follow the (1) pages.


This may be a problem. So you want to validate only the com_download pages, which are linked from com_content pages, but not the com_content pages themselves? You may need to find another way to extract the com_download links. Perhaps dividing the job up differently might work better.
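
One possible way to extract the com_download links outside the validator is sketched below using only the Python standard library. This is an illustration, not a CSE HTML Validator feature: the `LinkCollector` class, the `download_links` helper, and the sample HTML are all hypothetical, and the glob-style pattern is an assumption.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href values from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they were found on.
                self.links.append(urljoin(self.base_url, value))


def download_links(html, base_url, pattern="*com_download*"):
    """Return the unique links in `html` that match `pattern`, in first-seen order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    # dict.fromkeys removes duplicates while keeping the original order.
    return [u for u in dict.fromkeys(collector.links) if fnmatchcase(u, pattern)]


# Hypothetical content of one com_content page.
sample_html = """
<a href="/com_content/page2.htm">next</a>
<a href="/com_content/com_download/form1.htm">form 1</a>
<a href="/com_content/com_download/form1.htm">form 1 again</a>
"""

found = download_links(sample_html, "http://localhost/com_content/page1.htm")
print(found)
```

Run over the saved com_content pages, a list like `found` could then feed a separate run that covers only the com_download pages, provided the tool accepts an explicit list of targets.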

rag_gupta wrote:
I've already verified all pages without com_download in the URL.

The reason is that CSE runs out of memory at around 24,000 pages, so I need to split the URLs into two separate runs.


Yes, that is a lot of pages to do at once and memory resources could definitely be an issue.

rag_gupta wrote:
Also, can't we get a consolidated broken-link report like Xenu's? I can't go through hundreds of report pages to find all the broken links.
I'll need to run Xenu for them.


Do you want CSE HTML Validator to generate a Xenu type report? I am not familiar with Xenu reports. I could possibly look into different reporting formats for a future major version.

Also, if you only want to check links, then I recommend changing the "Tool to Use" to "link check". You could then only generate link reports for documents with bad links (see the "Link Report" options page in the Batch Wizard Options).
Albert Wiersch
Site Admin
Posts: 2361
Joined: Sat Dec 11, 2004 10:23 am
Location: Near Dallas, TX

Re: "Don't process these targets" in Batch Wizard issue

Post by Albert Wiersch » Mon Dec 06, 2010 12:24 pm

This topic got me thinking and I looked at the code to see if there were any memory efficiency improvements I could make. I believe I have found some areas where memory efficiency could be improved and thus have worked to improve those areas.

The improvements could result in significant memory savings (at least that's what I am hoping). If you'd like, please email me at support at htmlvalidator dot com and I can send you a pre-release build to try.

I'd really like to know if you can see any improvements due to these changes when you check such large numbers of documents. Thanks!

Re: "Don't process these targets" in Batch Wizard issue

Post by rag_gupta » Fri Dec 17, 2010 7:43 am

Unfortunately, I only peeked into this thread just now, by chance. But I'm done for now: my scheduled time for this website (http://www.downloadformsindia.com) is over, and the site is now live.

Had I seen your answer I'd definitely have tried your pre-release version.

I also want to point out that in your forum, "Notify me when a reply is posted" is off by default. It should be ON by default, as in other forums; because of this I missed the opportunity.
CSE has helped me a lot in correcting many HTML, CSS and JavaScript errors, despite the memory issues.
I'm happy I have it. Many thanks for this tool!

Anurag

Re: "Don't process these targets" in Batch Wizard issue

Post by Albert Wiersch » Fri Dec 17, 2010 8:50 am

rag_gupta wrote:
Had I seen your answer I'd definitely have tried your pre-release version.

I also want to point out that in your forum, "Notify me when a reply is posted" is off by default. It should be ON by default, as in other forums; because of this I missed the opportunity.
CSE has helped me a lot in correcting many HTML, CSS and JavaScript errors, despite the memory issues.
I'm happy I have it. Many thanks for this tool!


Sorry you missed the offer. I wanted to get your feedback to see if the improvements made a noticeable difference for you. :(

I think the reply notification defaults to on, but since it was your first post, which needs to be approved, it may not have been set.

I'm glad that CSE HTML Validator was still able to be a big help though!

Re: "Don't process these targets" in Batch Wizard issue

Post by rag_gupta » Thu Mar 24, 2011 4:44 am

As I mentioned just now, I'll be checking websites containing 100,000 pages in the Batch Wizard, and I'm not sure whether CSE's memory usage will allow this. I currently have the latest version.
On a 32-bit system with 4 GB of RAM it could check up to 17,000 pages before crashing.
I just want your opinion about its memory usage.

Re: "Don't process these targets" in Batch Wizard issue

Post by Albert Wiersch » Thu Mar 24, 2011 11:13 pm

rag_gupta wrote:
As I mentioned just now, I'll be checking websites containing 100,000 pages in the Batch Wizard, and I'm not sure whether CSE's memory usage will allow this. I currently have the latest version.
On a 32-bit system with 4 GB of RAM it could check up to 17,000 pages before crashing.
I just want your opinion about its memory usage.


Hello, please email me at support at htmlvalidator dot com if you'd like to test a pre-release of the enterprise edition. I'd like to get your feedback on it, especially if you are going to give the Batch Wizard a hard time :D . It is possible the upcoming version might be able to check that many targets, but I have not tested it that high (high being 100,000 pages in one job).

Ideally, with the new feature in the enterprise edition to use the file system for temporary storage and the new efficiency improvements planned for 10.02, I would hope for it to be able to check about 1 target per 1K of memory, which means 100,000 targets would ideally use only about 100MB of memory (in addition to a base amount of memory)... but this will vary depending on the site and validation settings, and on everything working right. :D

UPDATE: A more accurate measure of "ideal" memory use in 10.02 (with the enterprise edition option to use a working directory) might be 1K per target, plus a base amount of memory for the software/code, plus up to about 300 download URLs saved in memory for the job queue (if an average of 100K per page, then that could be up to about 30MB), plus more for link checking if enabled (the more links, the more memory). If not using the enterprise edition option to use a working directory or if using the professional edition, then the size of the validation reports would also need to be added because they are stored in memory - and this will vary according to how many messages are in each report.
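
The arithmetic behind these "ideal" figures can be sketched as follows. It covers only the two terms that are given as numbers above (1K per target, and up to about 300 queued pages at an average of 100K each). The base memory, link-checking data, and in-memory reports are left out because no figures are given for them. The `estimate_bytes` helper and its parameter names are hypothetical.

```python
KB = 1024
MB = 1024 * KB


def estimate_bytes(targets, per_target=1 * KB, queued_urls=300, avg_page=100 * KB):
    """Return (per-target total, job-queue total) in bytes.

    Excludes the base memory of the program, link-checking data, and any
    validation reports held in memory, since those are not quantified.
    """
    per_target_total = targets * per_target
    queue_total = queued_urls * avg_page
    return per_target_total, queue_total


per_target_total, queue_total = estimate_bytes(100_000)

print(f"per-target bookkeeping: {per_target_total / MB:.1f} MB")  # about 100MB
print(f"job queue (300 pages):  {queue_total / MB:.1f} MB")       # about 30MB
```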

