Batch compare files against a master?

For technical support and bug reports for all editions of CSS HTML Validator, including htmlval for Linux and Mac.
jtramontana
Rank I - Novice
Posts: 13
Joined: Thu Feb 23, 2023 12:35 pm

Batch compare files against a master?

Post by jtramontana »

Is there a way to run a batch process that will look at a folder with "starter files" and compare them to "completed files"?
Basically looking for an easy way to check 125 websites that started out with a boilerplate and were turned in as "complete" by students.

Below is an example. I gave the page below to students and told them to use the correct tags and classes to format the page.

I'd like an easy way to find the slackers who never touched the assignment.

Last modified timestamp won't work because they work in OneDrive and the syncing messes up the last modified date.

I need a way to compare the files, if possible.

<!DOCTYPE html>
<html lang="en">
<head>
</head>

<body>
<main>
<div>
<section>
TCB Turner

Greetings! I am TCB Turner and welcome to my website. I have
dedicated my life to helping people understand the weird and
wonderful world of temporal displacement (commonly know as time
travel).

It all started in 1992, when I had a chance encounter with Elvis
Presley (a man who supposedly died in 1977). Elvis hadn’t aged a day
and after a brief, strange conversation, he disappeared in a purple
1959 Cadillac convertible.

From that moment on, I became obsessed with time travel. Join me on
my journey as I share my findings, experiments, and theories on the
nature of time and how we may one day be able to control it!

<div><a href="concepts.html">Learn More</a></div>
</section>

<section>
<img src="img/tcbTurner.png" alt="TCB Turner" />
</section>
</div>
</main>
<footer>
<p>&copy; TCB Turner 2023</p>
</footer>
</body>
</html>
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Batch compare files against a master?

Post by Albert Wiersch »

Hello,

CSS HTML Validator isn't a file comparison tool, so it doesn't offer much for comparing files.

However, it should be possible to use user functions to detect specific pieces of text in the document and generate an error message when that text is found. So if a student didn't change the text, you'd see an error message.

I tested it with the functions below, which look for pieces of boilerplate text, and it seems to work well. If a student never touched a boilerplate document, it should detect it easily.

More information about the onEndTag_(tagname)() function is here:
https://www.htmlvalidator.com/current/d ... agname.htm

Code:

function onEndTag_footer() {
 if getValueString(13)=='&copy; TCB Turner 2023' {
  Message(1,MSG_ERROR,'Boilerplate text detected in <'+CurrentTagName+'>.',getLocation(4));
 }
}

function onEndTag_section() {
 if stripos(getValueString(13),'I am TCB Turner and welcome to my website.')>=0 {
  Message(1,MSG_ERROR,'Boilerplate text detected in <'+CurrentTagName+'>.',getLocation(4));
 }
}
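Outside CSS HTML Validator, the same idea (flag any file that still contains known boilerplate text) can be sketched as a short standalone Python script. This is a minimal sketch, not a validator feature: the phrase list is copied from the starter page above, and the function names and the folder-scanning approach are assumptions for illustration. Whitespace is collapsed and case is ignored before matching, so re-indented or re-wrapped text is still found.

```python
from pathlib import Path

# Hypothetical phrase list copied from the starter page; edit it for each assignment.
BOILERPLATE_PHRASES = (
    "I am TCB Turner and welcome to my website.",
    "&copy; TCB Turner 2023",
)


def collapse_ws(text: str) -> str:
    """Collapse each run of whitespace to a single space and lowercase the result."""
    return " ".join(text.split()).lower()


def find_boilerplate(html: str, phrases=BOILERPLATE_PHRASES) -> list[str]:
    """Return the phrases that still appear in html (case- and whitespace-insensitive)."""
    doc = collapse_ws(html)
    return [p for p in phrases if collapse_ws(p) in doc]


def scan_folder(folder: str) -> dict[str, list[str]]:
    """Map each .html file under folder (recursively) to the boilerplate phrases it still contains."""
    results = {}
    for path in sorted(Path(folder).rglob("*.html")):
        hits = find_boilerplate(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            results[str(path)] = hits
    return results
```

For example, `scan_folder("submissions")` (a hypothetical folder holding the student sites) returns only the files that still contain one of the phrases. Like the user functions above, it only flags files whose text was left unchanged.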
Albert Wiersch, CSS HTML Validator Developer • Download CSS HTML Validator FREE Trial
jtramontana

Re: Batch compare files against a master?

Post by jtramontana »

I added the code to a config file called "slacker.cfg" and saved it in the directory with my htmlvalV239.cfg (screenshot: slacker.png).
I added the path to the Validator Engine Config User Function setting (screenshot: user.PNG).
Is there something else I need to do to make it read the config and apply it when the batch runs? It looks like it just ran the files through the default validation process.

Sorry if this is a n00b question. I'm still learning the interface and being pulled in a thousand different directions.
Albert Wiersch

Re: Batch compare files against a master?

Post by Albert Wiersch »

Hello,

That looks right, but I can't see which file you actually specified or confirm what's in it.

You do need to reload the config for the user functions to take effect. Alternatively, you can just exit and restart CSS HTML Validator; it reloads the config on startup, including any user function files you have specified.

If you're running CSS HTML Validator using different user accounts then that will complicate things a bit as the settings are per-user.

To test it you could throw in this function which should generate an error every time the <h1> element is used:

Code:

function onStartTag_h1() {
 Message(1,MSG_ERROR,'TEST - <h1> element used - TEST');
}
jtramontana

Re: Batch compare files against a master?

Post by jtramontana »

index1 is the incomplete page they were given.
index2 is a completed one.

Do I need to specify the original file/folder in the cfg file so it knows what to compare?
Attachments: index2.zip (1.26 KiB), index.zip (888 Bytes)
Albert Wiersch

Re: Batch compare files against a master?

Post by Albert Wiersch »

jtramontana wrote (Fri Feb 24, 2023 10:57 am):
> index1 is the incomplete page they were given.
> index2 is a completed one.
>
> Do I need to specify the original file/folder in the cfg file so it knows what to compare?

Hello,

No, you will have to manually hard-code some of the boilerplate text in the incomplete pages into the user functions so that CSS HTML Validator can check documents for the boilerplate text.

Looking at your examples, the text itself is the same in both files. I had assumed the students would also need to change the boilerplate text. If the students only need to mark up the existing text with proper elements, without changing the text itself, then this technique won't work. :(

Because CSS HTML Validator is not a comparison tool, there is no way to specify a comparison file or folder. Unfortunately, if checking for boilerplate text won't work, I can't think of an easy solution right now.
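If a separate tool is acceptable, the comparison itself can be sketched in Python with the standard library's difflib. This is a minimal sketch under assumptions, not part of CSS HTML Validator: it assumes the starter file and the submissions are local .html files, collapses whitespace so line-ending or indentation changes don't count as edits, and scores each submission against the starter file with `difflib.SequenceMatcher`. Because it compares the whole markup rather than just the text, a file where only tags and classes were added still scores lower than an untouched one. The 0.98 threshold and the function names are hypothetical and would need tuning against a few known-untouched and known-finished files.

```python
import difflib
from pathlib import Path


def normalize(html: str) -> str:
    """Collapse each run of whitespace to a single space."""
    return " ".join(html.split())


def similarity(starter: str, submission: str) -> float:
    """Return a 0.0-1.0 similarity score between two HTML documents after normalization."""
    # autojunk=False keeps common characters (spaces, '<', 'e') from being ignored in long files.
    matcher = difflib.SequenceMatcher(None, normalize(starter), normalize(submission), autojunk=False)
    return matcher.ratio()


def flag_untouched(starter_path: str, submissions_dir: str, threshold: float = 0.98) -> list[tuple[str, float]]:
    """List (path, score) for submissions at least `threshold` similar to the starter file."""
    starter = Path(starter_path).read_text(encoding="utf-8", errors="replace")
    flagged = []
    for path in sorted(Path(submissions_dir).rglob("*.html")):
        score = similarity(starter, path.read_text(encoding="utf-8", errors="replace"))
        if score >= threshold:
            flagged.append((str(path), round(score, 3)))
    return flagged
```

For example, `flag_untouched("starter/index.html", "submissions")` (hypothetical paths) lists the likely untouched files with their scores. Keep the starter file outside the submissions folder, or it will flag itself.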
jtramontana

Re: Batch compare files against a master?

Post by jtramontana »

Ok. Not a biggie at all. The big deal is the cool validation stuff it does. Still love it!
Albert Wiersch

Re: Batch compare files against a master?

Post by Albert Wiersch »

jtramontana wrote (Fri Feb 24, 2023 11:35 am):
> Ok. Not a biggie at all. The big deal is the cool validation stuff it does. Still love it!

Glad you still like it! :D

I do have another idea, though it may require some modifications to the program: check the document size based on the title text. You'd have to hard-code the title text and file sizes into a user function. For example, if the title text is "TCB Turner" and the boilerplate file is 1401 bytes, you could generate an error message saying that the document size is the same as or similar to the boilerplate document size.
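The same title-plus-size heuristic can also be checked outside the validator. A minimal Python sketch, assuming each starter file has a distinct `<title>` and that you record each starter file's byte size by hand: the `STARTER_SIZES` table (using the 1401-byte example above), the 100-byte `MARGIN`, and the function names are illustrative assumptions, not validator settings. It uses the standard library's `html.parser` to pull out the title text and `os.path.getsize` for the size on disk.

```python
import os
from html.parser import HTMLParser
from typing import Optional

# Hypothetical lookup table: starter-file <title> text -> starter-file size in bytes.
STARTER_SIZES = {"TCB Turner": 1401}
MARGIN = 100  # extra bytes a file may have and still count as "not added to"


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def get_title(html: str) -> Optional[str]:
    """Return the stripped text of the first <title>, or None if there isn't one."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


def looks_like_boilerplate(html: str, size_in_bytes: int) -> bool:
    """True if the title matches a known starter file and the size is below starter size + MARGIN."""
    starter_size = STARTER_SIZES.get(get_title(html))
    return starter_size is not None and size_in_bytes < starter_size + MARGIN


def check_file(path: str) -> bool:
    """Apply looks_like_boilerplate to a file on disk, using its size in bytes."""
    with open(path, encoding="utf-8", errors="replace") as f:
        html = f.read()
    return looks_like_boilerplate(html, os.path.getsize(path))
```

Like any size check, this can miss a student who deleted text while adding markup, so it is a screening aid rather than proof.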

If you want to pursue this, then let me know and I will investigate further.
jtramontana

Re: Batch compare files against a master?

Post by jtramontana »

That's a cool idea. This is only an issue because of OneDrive. If I could just look at the last modified date, I would know who the slackers are. But since OneDrive syncs in giant batches, it overwrites the last modified date. Maybe just file size or character count would work?
Or is there another source of metadata in the file that I can pull?
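A plain .html file doesn't normally record when or by whom it was edited, so one timestamp-free alternative is a content fingerprint. A minimal Python sketch, assuming untouched submissions are copies of the starter file that differ at most in whitespace: it hashes each file's whitespace-normalized contents with SHA-256 (standard library `hashlib`) and lists files whose hash matches the starter's. The function names and paths are hypothetical. Any real edit, even one added class attribute, changes the hash, so this only catches completely untouched files.

```python
import hashlib
from pathlib import Path


def fingerprint(html: str) -> str:
    """SHA-256 hex digest of the document with each whitespace run collapsed to one space."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def read_html(path) -> str:
    """Read a file as text, replacing any bytes that aren't valid UTF-8."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def unchanged_copies(starter_path: str, submissions_dir: str) -> list[str]:
    """Return submissions whose whitespace-normalized content is identical to the starter file."""
    target = fingerprint(read_html(starter_path))
    return [
        str(p)
        for p in sorted(Path(submissions_dir).rglob("*.html"))
        if fingerprint(read_html(p)) == target
    ]
```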
Albert Wiersch

Re: Batch compare files against a master?

Post by Albert Wiersch »

OK, here is a basic user function that checks the title text. If the title text is 'TCB Turner', it generates an error when the document size is small enough that the boilerplate apparently wasn't added to.

For document size, do not check against the exact byte size of the boilerplate document. Because of the way the document size is calculated, add at least 100 bytes to it and use 'less than' instead of 'equal to' (see examples).

You can add more elseif branches to check the document sizes of other boilerplate documents, based on the title of the document being checked.

Code:

function onEndTag_title() {
 $titletext=getValueString(13);
 $docsize=getValueInt(17);
 
 if $titletext=='TCB Turner' {
  if $docsize<1600 {
   Message(1,MSG_ERROR,'Possible boilerplate document detected by title text and document size.',getLocation(4));
  }
 }
 elseif $titletext=='Title of doc2' {
  if $docsize<2000 {
   Message(1,MSG_ERROR,'Possible boilerplate document detected by title text and document size.',getLocation(4));
  }
 }
}
Hope this works!