
the same old wish list

Posted: Sat Mar 16, 2013 2:29 am
by roedygr
Once a week I revalidate everything. This takes quite a long time. I run 16 different scripts.

When it finds a bad file, I have to manually copy/paste the filename from the browser into CSE.

I have quite a skookum computer: 6 cores, 16 GB of RAM, all files on SSD. CSE seems designed to run on something much smaller and does not exploit it.

I would like to be able to validate all my 32,000 files in one script in one go.

I would like CSE to keep track of whether a file has changed since its last validation, so it can skip revalidating unchanged files.

I would like CSE to use all available RAM and all available cores, perhaps letting me configure a reserve of resources for other work.
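Until CSE does this itself, a batch script could spread the work over the cores: split the full target list into one list per validator instance and hold some cores in reserve. A minimal Python sketch, assuming the targets are already collected as file paths; `worker_count`, `partition` and `reserve` are hypothetical names, and how each list is handed to a separate CSE instance is not shown.

```python
import os
from typing import List, Sequence


def worker_count(reserve: int = 1) -> int:
    """Number of parallel jobs, keeping `reserve` cores free for other work."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cores - reserve)


def partition(targets: Sequence[str], n: int) -> List[List[str]]:
    """Split targets into n contiguous lists whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(targets), n)
    chunks: List[List[str]] = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` lists get one more
        chunks.append(list(targets[start:start + size]))
        start += size
    return chunks


if __name__ == "__main__":
    files = [f"page{i}.html" for i in range(32000)]
    lists = partition(files, 5)  # e.g. 6 cores with 1 held in reserve
    print([len(x) for x in lists])  # [6400, 6400, 6400, 6400, 6400]
```

With 6 cores and one in reserve, 32,000 files split into five lists of 6,400. Contiguous slices keep files from the same folder together; a round-robin split would be an alternative if some folders take much longer to check than others.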

It would be nice if, in batch mode, CSE set bad files aside in a list that it loads at the end, so all the files I need to correct are sitting ready to go. Maybe it could even let me correct them while the batch runs.

Re: the same old wish list

Posted: Sat Mar 16, 2013 2:55 am
by roedygr
How to track?
1. Put the names of files that validated OK, along with the date/time of validation, in a database.

2. Create 0-length files in a tree mirroring all files that validated successfully. Each file's timestamp is the time of its last successful validation. Files that failed validation are not represented. The OS will cache the part of the tree you are working on, and shared folder paths are stored only once.
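Option 2 could look roughly like this in Python. It is a sketch under assumptions: the stamp tree lives under its own root folder, file modification times are trustworthy, Python 3.8+ is available (for `unlink(missing_ok=True)`), and `stamp_path`, `needs_validation` and `record_result` are hypothetical helpers, not part of CSE HTML Validator.

```python
import os
from pathlib import Path


def stamp_path(source: Path, source_root: Path, stamp_root: Path) -> Path:
    """Map a source file to its zero-length stamp file in the mirror tree."""
    return stamp_root / source.relative_to(source_root)


def needs_validation(source: Path, source_root: Path, stamp_root: Path) -> bool:
    """True unless a stamp exists that is not older than the source file."""
    stamp = stamp_path(source, source_root, stamp_root)
    if not stamp.exists():
        return True  # never validated, or its last validation failed
    return source.stat().st_mtime > stamp.stat().st_mtime


def record_result(source: Path, source_root: Path, stamp_root: Path,
                  ok: bool, started_at: float) -> None:
    """Create or refresh the stamp on success; remove it on failure."""
    stamp = stamp_path(source, source_root, stamp_root)
    if ok:
        stamp.parent.mkdir(parents=True, exist_ok=True)
        stamp.touch()  # zero-length file
        # Stamp with the time validation *started*, so an edit made while the
        # file was being checked still looks newer than the stamp next time.
        os.utime(stamp, (started_at, started_at))
    else:
        stamp.unlink(missing_ok=True)  # failed files are not represented
```

A weekly sweep would then walk the site and queue only the files for which `needs_validation` returns True, so unchanged files cost one `stat` call each instead of a full validation.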

The idea is to avoid redoing successful validations. Redoing unsuccessful ones is OK. In the normal case nearly everything is fine; you are just trying to find a few needles of error in that haystack of perfection. Most of the time files have not changed since the last validation, so this should drastically speed up a grand check-everything sweep.
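Option 1 needs nothing heavier than an embedded database. A minimal sketch with Python's built-in `sqlite3` module; the table layout, the `validated_at` column and the helper names are assumptions for illustration, and the validation step itself is left out.

```python
import os
import sqlite3
from typing import Iterable, List


def open_db(db_path: str) -> sqlite3.Connection:
    """Open (or create) the table of successful validations."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS validated ("
        " path TEXT PRIMARY KEY,"
        " validated_at REAL NOT NULL)"  # time the successful validation started
    )
    return conn


def files_to_check(conn: sqlite3.Connection, paths: Iterable[str]) -> List[str]:
    """Keep only files that are new or modified since their last successful validation."""
    todo = []
    for p in paths:
        row = conn.execute(
            "SELECT validated_at FROM validated WHERE path = ?", (p,)
        ).fetchone()
        if row is None or os.path.getmtime(p) > row[0]:
            todo.append(p)
    return todo


def record_result(conn: sqlite3.Connection, path: str, ok: bool,
                  started_at: float) -> None:
    """Remember successes; forget failures so they are checked again next sweep."""
    with conn:  # commits on success, rolls back on error
        if ok:
            conn.execute(
                "INSERT OR REPLACE INTO validated (path, validated_at) VALUES (?, ?)",
                (path, started_at),
            )
        else:
            conn.execute("DELETE FROM validated WHERE path = ?", (path,))
```

Only successes are stored, matching the rule above that redoing unsuccessful validations is fine.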

Re: the same old wish list

Posted: Sat Mar 16, 2013 2:09 pm
by Lou
You know how this goes, I hate being a curmudgeon but, ...

You want CSE to expand to include a database management system to help you deal with your 32,000 files and fully utilize your hardware? Why do I see this as a request to make CSE as big, and as hard to maintain, as any of Windows' recent (last 10-20 years) efforts to be all things to all people?

Yes, yes, the art of building operating systems has made great strides since DOS, and they are easier for many people to use. They are also bigger than the applications being run, unstable, and full of capabilities that most users don't use, need, understand, or know exist.

Given the interfaces between CSE and other applications that I know of, why don't you build a script that calls CSE from your CMS and processes the results? That would give you what you need and keep the validator's overhead reasonable for those of us who don't have the management issues you do.


Re: the same old wish list

Posted: Mon Mar 18, 2013 3:04 pm
by Albert Wiersch
This is an interesting problem, and I'm willing to work on a solution, but Lou is right that it would be a bad idea to "bloat up" CSE HTML Validator. Adding a database engine to CSE HTML Validator is not going to be feasible, at least for now.

I have some thoughts & questions.

1. When you say you have 16 scripts, do you mean 16 different Batch Wizard target lists that you process sequentially? If so, it should be possible to run some of them simultaneously in separate instances of CSE HTML Validator. You would use the "-msgwindow <name>" command line option to run multiple instances of CSE HTML Validator and run more than one job at a time, one in each instance. Given your hardware, this alone will probably give you a big boost.

2. Are these 32,000 files local files, or are they being downloaded with HTTP? This will affect which other solutions, besides #1, make sense.

3. It should be easy to use custom "user functions" to create a text file containing a list of targets that have errors or warnings in them. Would this be useful to you? You'd still need to open them manually, perhaps by copying a line (which contains the full path) and then opening it in CSE HTML Validator. However, you might be able to automate this by writing your own script that takes these lines and opens the files in CSE HTML Validator. Or, you could open all the files at once by passing the file to the command line processor using the "-f" option. If there are not too many files to open, this might work well. However, I don't know how CSE HTML Validator will behave with more than 100 or so files open; I have not done much testing with that many open files.

4. The idea of using a folder tree as a "poor man's" database is intriguing and might work if you are checking local files (not HTTP). Let me know if you are and I'll think about this option more.
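For item 3, a small helper could turn the error list into batches small enough to open at once. This sketch assumes the list file holds one full path per line, as described above, and uses batches of 100 because more open files than that has not been tested much. The file names and helper names are hypothetical, and whether a batch file can be passed straight to "-f" should be checked against the CSE HTML Validator documentation.

```python
from pathlib import Path
from typing import List


def read_error_list(list_file: Path) -> List[str]:
    """Read one full path per line, skipping blank lines and duplicates, keeping order."""
    seen = set()
    paths = []
    for line in list_file.read_text(encoding="utf-8").splitlines():
        p = line.strip()
        if p and p not in seen:
            seen.add(p)
            paths.append(p)
    return paths


def write_batches(paths: List[str], out_dir: Path, batch_size: int = 100) -> List[Path]:
    """Write the paths into numbered list files of at most batch_size entries each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(0, len(paths), batch_size):
        batch_file = out_dir / f"batch_{i // batch_size + 1:03d}.txt"
        batch_file.write_text("\n".join(paths[i:i + batch_size]) + "\n", encoding="utf-8")
        written.append(batch_file)
    return written
```

Each batch can then be opened and fixed in turn, which keeps the number of files open in one instance within the range that has been tried.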