Speeding up Batch Validate revisited

For topics about current BETA or future releases, including feature requests.
Albert Wiersch
Site Admin
Posts: 3237
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Speeding up Batch Validate revisited

Post by Albert Wiersch » Thu Apr 10, 2014 10:42 pm

Here is "v2", which uses empty files and the date of the cache file's last write to decide whether or not to validate. It's probably a little more efficient, but I doubt it will make a noticeable difference in practice (though maybe it will).

Code:

(removed because updated code posted later)

roedygr
Rank V - Professional
Posts: 370
Joined: Fri Feb 17, 2006 5:22 am
Location: Victoria BC Canada

Post by roedygr » Fri Apr 11, 2014 9:42 am

It is now creating files in the cache of the form:
E__mindprod_jgloss__zip.html
{"lastwrite_year":2014,"lastwrite_month":4,"lastwrite_day":9,"lastwrite_hour":8,"lastwrite_min":16,"lastwrite_sec":37}

Looking good.

Post by roedygr » Fri Apr 11, 2014 9:51 am

It is working fine. If this were a production version, I would want something on the batch report to indicate which files were previously checked OK. But for my own use this is wonderful. It will save so much time. I imagined a directory tree in the cache, but your _ method is probably simpler to implement and, with modern hashed directory lookup, no slower. Thanks very much.

Post by Albert Wiersch » Fri Apr 11, 2014 9:57 am

Great! I'm glad it is working. You may want to try the newer version ("v2") that uses empty files and doesn't have to encode and decode JSON; just replace the original code with the new code I posted above (and update the $cachefolder variable). You shouldn't have to do anything with the cache files from the old version as they should still work with the new version.

If you do change the config and want to revalidate everything, then you will have to remember to manually delete all the cache files in the cache folder.

Also note that the "time resolution" is one minute, so if a file is changed within one minute of writing the cache file, then it may not be detected as changed. If that's an issue then the script could easily be adjusted to work down to the second.
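
To illustrate the one-minute resolution, here is a minimal sketch in Python (not TNPL; the function name is_newer and the sample timestamps are made up for illustration). It compares two last-write times once with the seconds dropped, as the script effectively does, and once down to the second:

```python
from datetime import datetime

def is_newer(target: datetime, cache: datetime, resolution: str = "minute") -> bool:
    """Return True when the target was written after the cache file."""
    if resolution == "minute":
        # Drop the seconds, mirroring a comparison that stops at the minute field.
        target = target.replace(second=0, microsecond=0)
        cache = cache.replace(second=0, microsecond=0)
    else:
        # Compare down to the second.
        target = target.replace(microsecond=0)
        cache = cache.replace(microsecond=0)
    return target > cache

# Hypothetical timestamps: the target is edited 35 seconds after the cache write.
cache_written = datetime(2014, 4, 11, 11, 27, 5)
target_edited = datetime(2014, 4, 11, 11, 27, 40)

missed = is_newer(target_edited, cache_written, "minute")  # change not detected
caught = is_newer(target_edited, cache_written, "second")  # change detected
print(missed, caught)
```

An edit made in the same minute as the cache write is only noticed once the seconds are part of the comparison.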

I will see if I can get this into the documentation as a useful example of what can be done with TNPL.

Post by roedygr » Fri Apr 11, 2014 2:28 pm

There is a bug in the way you transform filenames into names for the cache files. It collapses similar names into one, or for some reason does not build an entry for every file.

For example, here are the contents of my cache:

2014-04-11 11:27 118 E__mindprod_kjv_Luke_1.html
2014-04-11 11:27 118 E__mindprod_kjv_Luke_1_.html
2014-04-11 11:27 118 E__mindprod_kjv_Luke_10.html
2014-04-11 11:27 118 E__mindprod_kjv_Luke_11.html
2014-04-11 11:27 119 E__mindprod_kjv_Luke_foot_1.html
2014-04-11 11:45 120 E__mindprod_kjv_Luke_foot_1_.html
2014-04-11 11:27 118 E__mindprod_kjv_Luke_foot_10.html
2014-04-11 11:27 119 E__mindprod_kjv_Luke_foot_11.html
2014-04-11 11:27 119 E__mindprod_kjv_Luke_foot_index.html
2014-04-11 11:45 120 E__mindprod_kjv_Luke_foot__.html
2014-04-11 11:27 120 E__mindprod_kjv_Luke_foot__0.html
2014-04-11 11:27 120 E__mindprod_kjv_Luke_foot__1.html
2014-04-11 11:47 120 E__mindprod_kjv_Luke_foot___.html
2014-04-11 11:27 118 E__mindprod_kjv_Luke_index.html
2014-04-11 11:27 118 E__mindprod_kjv_Luke__.html
2014-04-11 11:27 118 E__mindprod_kjv_Luke__0.html
2014-04-11 11:27 118 E__mindprod_kjv_Luke__1.html
2014-04-11 11:27 118 E__mindprod_kjv_Luke___.html

Here are the files in the corresponding E:\mindprod\kjv\Luke\ directory:
2014-04-09 1:16 18,672 1.html
2014-04-09 1:16 14,977 2.html
2014-04-09 1:16 13,797 3.html
2014-04-09 1:16 14,121 4.html
2014-04-09 1:16 13,633 5.html
2014-04-09 1:16 15,352 6.html
2014-04-09 1:16 15,385 7.html
2014-04-09 1:16 16,798 8.html
2014-04-09 1:16 17,100 9.html
2014-04-09 1:16 14,031 10.html
2014-04-09 1:16 16,184 11.html
2014-04-09 1:16 16,835 12.html
2014-04-09 1:16 13,058 13.html
2014-04-09 1:16 12,717 14.html
2014-04-09 1:16 12,126 15.html
2014-04-09 1:16 12,406 16.html
2014-04-09 1:16 12,782 17.html
2014-04-09 1:16 13,449 18.html
2014-04-09 1:16 14,468 19.html
2014-04-09 1:16 14,156 20.html
2014-04-09 1:16 12,943 21.html
2014-04-09 1:16 17,231 22.html
2014-04-09 1:16 15,396 23.html
2014-04-09 1:16 14,887 24.html
2014-04-09 1:16 9,249 index.html

If you need the actual files, see http://mindprod.com/kjv/Luke/*.*

Post by roedygr » Fri Apr 11, 2014 5:22 pm

I posted a message showing how cache entries are missing. It disappeared. Did you remove it, or did it not post for some reason?

Post by roedygr » Fri Apr 11, 2014 5:23 pm

roedygr wrote: There is a bug in the way you transform filename into names for the cache file. It collapses similar names onto one or for some reason does not build an entry for every file.

Odd, now it is back.

Post by Albert Wiersch » Fri Apr 11, 2014 5:42 pm

Oops! The regular expression only allows the digits 0-1 when it should allow 0-9. Below is the fix. Don't forget to update $cachefolder.

Or you can just change the regular expression in your current code from '[^a-zA-Z0-1\.\-\_]' to '[^a-zA-Z0-9\.\-\_]'
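
The collapsed names in the cache listing follow directly from that character class. As a rough illustration, here is a Python sketch (not TNPL; re.sub stands in for replaceRegEx, and the helper cache_name is hypothetical) that applies both patterns to the 25 files reported for the Luke directory:

```python
import re

BUGGY = r"[^a-zA-Z0-1.\-_]"  # digits 2-9 fall outside the class and become '_'
FIXED = r"[^a-zA-Z0-9.\-_]"  # all digits are kept

def cache_name(target: str, pattern: str) -> str:
    """Flatten a target path into a cache file name, as the script does."""
    return re.sub(pattern, "_", target)

LUKE_DIR = "E:\\mindprod\\kjv\\Luke\\"
targets = [f"{LUKE_DIR}{n}.html" for n in range(1, 25)] + [LUKE_DIR + "index.html"]

buggy_names = {cache_name(t, BUGGY) for t in targets}
fixed_names = {cache_name(t, FIXED) for t in targets}

# 2.html through 9.html all collapse onto the same cache name with the buggy class.
print(cache_name(LUKE_DIR + "2.html", BUGGY))
print(len(buggy_names), len(fixed_names))
```

With the 0-1 class, the 25 targets share only nine distinct cache names, which matches the nine non-foot Luke entries in the listing; with 0-9, every target gets its own name.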

Code:

/*********************
 * Set $cachefolder in onBeforeMainStart() to the folder to use to store data (it must end in a backslash)
 *********************/

function onBeforeMainStart() {
 $cachefolder='T:\\cache\\';
}

function onTargetCanAdd() {
 if !$otca_add return;

 // Build a flat cache file name by replacing every character outside the allowed set with '_'
 $cachefile=$cachefolder+replaceRegEx($otca_target,'[^a-zA-Z0-9\.\-\_]','_');
// ProgressMessage('$cachefile: '+$cachefile);
 $cfinfo=getFileInfo($cachefile,1);

 if $cfinfo.isSet() {
  $tinfo=getFileInfo($otca_target,1);
  
  if $tinfo.isSet() {
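   // Cache entry exists: skip the target unless it was written after the cache file (checked down to the minute)
   $otca_add=false;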

   if $tinfo.lastwrite_year>$cfinfo.lastwrite_year { $otca_add=true; }
   else { if $tinfo.lastwrite_year==$cfinfo.lastwrite_year {
   if $tinfo.lastwrite_month>$cfinfo.lastwrite_month { $otca_add=true; }
   else { if $tinfo.lastwrite_month==$cfinfo.lastwrite_month {
   if $tinfo.lastwrite_day>$cfinfo.lastwrite_day { $otca_add=true; }
   else { if $tinfo.lastwrite_day==$cfinfo.lastwrite_day {
   if $tinfo.lastwrite_hour>$cfinfo.lastwrite_hour { $otca_add=true; }
   else { if $tinfo.lastwrite_hour==$cfinfo.lastwrite_hour {
   if $tinfo.lastwrite_min>$cfinfo.lastwrite_min { $otca_add=true; }
   }}}}}}}}
  }
 }
}

function onTargetProcessed() {
 $cachefile=$cachefolder+replaceRegEx(getValueString(5),'[^a-zA-Z0-9\.\-\_]','_');

 if getValueInt(1) || getValueInt(2) { // if has errors or warnings
  deleteFile($cachefile);
 }
 else {
  writeFile($cachefile,'',2); // create an empty file
 }
}

I'm not sure why your post would have disappeared and come back... perhaps a caching issue? I did not delete or move it.

Post by roedygr » Fri Apr 11, 2014 7:00 pm

My recent messages have disappeared again.

Post by Albert Wiersch » Fri Apr 11, 2014 7:36 pm

roedygr wrote: My recent messages have disappeared again.
Sorry, I have no idea what's going on. Did you see the messages posted and then they disappeared or did you never see the messages posted? And are you sure you posted the message with 'Submit' and didn't accidentally just preview it?

Post by roedygr » Thu Apr 17, 2014 5:28 am

There is something not quite right with the cache:

117 E__mindprod_kjv_1_Peter_1.html
0 E__mindprod_kjv_1_Peter_2.html
0 E__mindprod_kjv_1_Peter_3.html
0 E__mindprod_kjv_1_Peter_4.html
0 E__mindprod_kjv_1_Peter_5.html
119 E__mindprod_kjv_1_Peter_foot_1.html
118 E__mindprod_kjv_1_Peter_foot_index.html
120 E__mindprod_kjv_1_Peter_foot__.html
117 E__mindprod_kjv_1_Peter_index.html
117 E__mindprod_kjv_1_Peter__.html
0 E__mindprod_kjv_2_Peter_1.html
0 E__mindprod_kjv_2_Peter_2.html
0 E__mindprod_kjv_2_Peter_3.html
0 E__mindprod_kjv_2_Peter_index.html
118 E__mindprod_kjv___Peter_1.html
118 E__mindprod_kjv___Peter_foot_1.html
120 E__mindprod_kjv___Peter_foot_index.html
120 E__mindprod_kjv___Peter_foot__.html
118 E__mindprod_kjv___Peter_index.html
118 E__mindprod_kjv___Peter__.html
They are all dated today.

I have only two directories with the string "Peter" in them:
E:\mindprod\kjv\1_Peter
and
E:\mindprod\kjv\2_Peter
You should be ignoring the . and .. entries in the directories.
Perhaps the problem is leftover cache entries from an earlier version. But the dates are today!

There should not be any names in the cache without a 1 or 2.

It keeps retesting files that have already been proven good: not all of them, just some.

Post by Albert Wiersch » Thu Apr 17, 2014 7:38 am

Hello,

Please replace your onTargetProcessed() function with the one below (assuming you are using "v2" of the code - the version that uses empty files):

Code:

function onTargetProcessed() {
 $cachefile=$cachefolder+replaceRegEx(getValueString(5),'[^a-zA-Z0-9\.\-\_]','_');

 if getValueInt(1) || getValueInt(2) { // if has errors or warnings
  deleteFile($cachefile);
 }
 else {
//  writeFile($cachefile,'',2); // create an empty file
  writeFile($cachefile,getValueString(5),2); // for debugging
 }
}

This writes the actual target name into each cache file, so you can see which target a cache file belongs to. You can then delete the questionable cache files and re-run your job. Then please look in the cache files at what the actual target name is. This should help find out where the problem is.
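
If there are too many cache files to inspect by hand, the check can be automated outside the validator. The following is only a sketch in Python (not TNPL), assuming the debugging version above has written each target name into its cache file as plain text; the helper find_mismatches is hypothetical:

```python
import re
from pathlib import Path

# Same character class as the corrected script.
PATTERN = r"[^a-zA-Z0-9.\-_]"

def find_mismatches(cache_dir: Path) -> list[tuple[str, str]]:
    """Return (cache file name, recorded target) pairs whose names disagree."""
    mismatches = []
    for entry in sorted(cache_dir.iterdir()):
        if not entry.is_file():
            continue
        # The debug version stores the target name as the file's content.
        recorded = entry.read_text(encoding="utf-8", errors="replace").strip()
        expected = re.sub(PATTERN, "_", recorded)
        # Empty files and old JSON-style entries are reported too, because
        # their content does not flatten to their own file name.
        if expected != entry.name:
            mismatches.append((entry.name, recorded))
    return mismatches

# Example: for name, target in find_mismatches(Path(r"T:\cache")): print(name, target)
```

Any entry it reports was either written by a different version of the script or named with a different regular expression than the one currently in use.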

Post by roedygr » Fri Apr 18, 2014 11:10 pm

I installed the 14.0300 version of HTMLValidator. My cache script has stopped working.

Post by Albert Wiersch » Sat Apr 19, 2014 7:39 am

roedygr wrote: I installed the 14.0300 version of HTMLValidator. My cache script has stopped working.
Could the option to enable writeFile() have been reset to the default when you installed? The option is set to its default (disabled) on new installations, and also if you uninstalled a previous version and did a 'full uninstall'.

You may need to go to the Validator Engine Options and the 'Config File' page and enable the option that "enables potentially destructive functions like writeFile()" before the script will be able to create any files.

Is that option still set?

If not, I will need more details. The v14.03 update shouldn't have stopped the script from working. I did a quick test and didn't notice any issues.

Post by roedygr » Sun Apr 20, 2014 9:07 am

I know this is still experimental, but pushing it toward wider use: it should check at the beginning whether the cache dir exists. If it does not, it should complain and/or create it, and complain if the create failed.

Otherwise all your cache entries just quietly go into nul:
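
The check described here could look something like the following Python sketch. It is not TNPL, and the thread does not show whether TNPL offers an equivalent call, so treat the function ensure_cache_dir as a hypothetical illustration of the behaviour rather than a drop-in fix:

```python
import os

def ensure_cache_dir(path: str) -> str:
    """Create the cache folder if needed and complain loudly if that fails."""
    try:
        # exist_ok=True makes this a no-op when the folder is already there.
        os.makedirs(path, exist_ok=True)
    except OSError as exc:
        raise RuntimeError(f"cannot create cache folder {path!r}: {exc}") from exc
    if not os.access(path, os.W_OK):
        raise RuntimeError(f"cache folder {path!r} is not writable")
    return path

# Example: call once before the batch starts, e.g. ensure_cache_dir("T:\\cache\\")
```

Running this once at startup turns a silently missing cache folder into an immediate, visible error.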
