/************************************************************************/
Copyright (C)2000-2009 AI Internet Solutions
Last Update: July 11, 2009
Original: October 26, 2004

2009-07-11: Changed CSEGetInteger5EZ() to CSEGetInteger5EZW()
2009-07-10: Added note about using CSEJOBTYPELINKCHECK instead of CSEJOBTYPEVALIDATE to extract the links
2008-06-17: Getting absolute paths in v9.0+
2008-06-16: Added "Clearing the link cache"
2008-06-13: More improvements (including "Aborting the link check")
2008-06-12: Minor fixes and improvements
2008-06-12: Updated to use wide strings (v8.0 and above). For the old "ANSI" version, see
            http://www.htmlvalidator.com/htmlval/developerdlllinkcheckerpseudocodeANSI.txt
/************************************************************************/

/************************************************************************/
Overview
/************************************************************************/

This C pseudo-code will help you use the CSE HTML Validator DLL for link checking. This document assumes that you have already loaded the validator DLL and have access to the DLL functions.

To enable the link status CSELINKSTATUSMAYBEOK (recommended), do this once after loading the DLL:

CSESetFlag2(CSECFGGLOBALFLAGS, CSEGLOBALFLAGENABLELINKMAYBEOK, 1);

To use the link checker, a list of links must be loaded into the validator. When the links are loaded, tell the DLL to check the links. You'll then need to get the results of the link check.

Normally you will want to load the link checker with a list of links extracted from a document. When an HTML document is validated with the included configuration file, the links are placed into a variable array. You can read that array to get the links.

/************************************************************************/
Getting extracted links from the results of a validated document
/************************************************************************/

Example: Assume that you have a resulthandle variable from a validation, containing the results of the validation.

NOTE: You do not have to run a validation to extract the links. A jobtype of CSEJOBTYPELINKCHECK instead of CSEJOBTYPEVALIDATE extracts the links without doing a validation.

// get the links that the validator extracted from the document during the validation
// these links are stored in the arrays "links" and "linkstypes"
// the "links" array contains the actual link
// the "linkstypes" array contains information about where the link was extracted, like "A HREF" or "IMG SRC"
int linksindex, linkstypesindex;
linksindex=CSEGetInteger4EZ(resulthandle, CSERESULTSYMBOLNAMEINDEX, "links");
linkstypesindex=CSEGetInteger4EZ(resulthandle, CSERESULTSYMBOLNAMEINDEX, "linkstypes");

int numlinks; // number of links

// get the number of links (entries in array "links")
if (linksindex>=0) numlinks=CSEGetInteger2EZ(resulthandle, CSERESULTNUMSYMBOLVALUES, linksindex);
else numlinks=0;

// now store the links somewhere
const wchar_t *link, *linktype;
for (int i=0; i0) { // if link added successfully
  // set the user agent string to use when checking the link - here it's set to "CSE HTML Validator"
  CSESetString7W(CSELINKAGENT, linkindex, L"CSE HTML Validator");

  if (using a username and password) {
    CSESetString7W(CSELINKUSERNAME, linkindex, username); // set username to use when checking link
    CSESetString7W(CSELINKPASSWORD, linkindex, password); // set password to use when checking link
  }
  else CSESetString7W(CSELINKUSERNAME, linkindex, L""); // disable username and password when checking the link
}
}
}

/************************************************************************/
Adding links in version 8.9913 or greater
/************************************************************************/

For version v8.9913 and above (CSEGetInteger3EZ(CSEPROGRAMVERSIONINT)>=89913), links to be checked can be added even when a link check is already running if CSEADDABSOLUTELINKNOENTEX or CSEADDABSOLUTELINKEX is used instead (notice the "EX"). See the above code and modify it appropriately with the below information.

linkindex=CSESetString3W(CSEADDABSOLUTELINKNOENTEX, link); // add the link with CSEADDABSOLUTELINKNOENTEX instead of CSEADDABSOLUTELINKNOENT

Now, it is important to release the link for link checking once any other options have been set, like username, password, and agent. Be sure to do this when using these "EX" functions or the link will never be cleared for checking!

CSESetFlag6(CSELINKLINKFLAGS, linkindex, CSELINKFLAGNOTREADYFORCHECK, 0); // set ready to check

/************************************************************************/
Tell the DLL to start checking the links
/************************************************************************/

When the list of absolute links is loaded into the validator, the link check can be done.

if (CSEGetInteger3EZ(CSELINKCHECKINGTHREADS)==0) { // if link check not already running
  int jobhandle=CSEGetNewHandle(CSEGETNEWHANDLEJOB); // get a new job handle
  if (jobhandle<0) return; // error if jobhandle is less than 0 so return

  // NOTE: The jobhandle of a CSEJOBTYPECHECKLINKS type job is automatically freed when done, so do not free it
  if (CSESetInteger(jobhandle, CSEJOBTYPE, CSEJOBTYPECHECKLINKS)<0) return; // set the jobtype - error if function returns less than 0 so return

  // if you want the link checker to send messages to your application with the progress of the link check,
  // do the following to add two lines to the job buffer with CSEJOBADDLINETOBUF

  // windowhandle is the handle of the window (CONVERTED TO A WIDE STRING PTR) to send the status messages to
  if (CSESetStringW(jobhandle, CSEJOBADDLINETOBUF, windowhandle)) return retval;

  // windowmessage is the message to send to the window (CONVERTED TO A WIDE STRING PTR) (CSE uses WM_APP+1)
  // the LPARAM of the message will be a pointer to a global atom containing a text string (created with GlobalAddAtom())
  // messages will be sent approximately after every 5% increment in progress
  // a global atom containing "-LinkCheckMessage x" will be sent, where x is an integer (0-100) indicating the link check percentage complete
  // the WPARAM of the message will be an integer (0-100) also indicating the percentage complete
  // when the link checker is completely finished, the global atom will contain "-LinkCheckMessage 101" and the WPARAM argument of the message will be 101
  // it is the responsibility of the code receiving the message to free the global atom after it has been read
  if (CSESetStringW(jobhandle, CSEJOBADDLINETOBUF, windowmessage)) return retval;

  // now start the link checking
  if (CSERunJob(Handle, confighandle, jobhandle, 0)<0) return retval;
}

/************************************************************************/
Get the status of the links after link checking is complete
/************************************************************************/

When the link checking is done, the status of the links can now be obtained. You can try to obtain the status of links before the link checker has completed, but not all links will be checked.

If you'd like, you can verify that the link checker has completed:

if (CSEGetInteger3EZ(CSELINKCHECKINGTHREADS)==0) { // if link checker has completed and is not currently checking any links
  // do something here when the link checker is not running
}

// assume numlinks is the number of links in your link array that you want the result for
// linkstatusint should return one of the following if the link is found (or <0 if not found)
//   CSELINKSTATUSOK - link is OK
//   CSELINKSTATUSNOTOK - link is not OK (bad)
//   CSELINKSTATUSNOTCHECKED - link has not been checked yet
//   CSELINKSTATUSMAYBEOK - link might be OK or might be bad (usually displayed as a warning link in CSE);
//                          redirected links are links that can be considered "maybe OK"
int linkstatusint;
const wchar_t *linkstatusstring, *linkcommentstring;
for (int i=0; i0) // if succeeds then link check is running

/************************************************************************/
Clearing the link cache
/************************************************************************/

The link cache stores all the links and their status (bad, good, etc) until the DLL is unloaded or until the link cache is explicitly cleared. To explicitly clear the link cache, use the below code.

Please note that the link cache cannot be cleared while a link check is running. If you try to do this, then a message like "cannot reset link cache while link check is running" is displayed and the link cache is not cleared, so you may want to make sure the link check is not running before clearing it.

CSESetInteger3(CSELINKCACHENUMENTRIES, 0); // clear link cache

/************************************************************************/
Getting absolute paths in v9.0+
/************************************************************************/

Version 9.0 and above make it easy to calculate absolute paths that support the new path mapping feature.

if (CSEGetInteger3EZ(CSEPROGRAMVERSIONINT)>=89920) {
  // set the base path used to calculate the absolute links from the relative links
  // should contain no character references (per the "NOENT"); only needs to be done once per resulthandle
  CSESetStringW(resulthandle, CSERESULTBASEPATHNOENT, basepath);

  // returns the absolute path given the relative path; character references are converted
  // relpath can be exactly what is retrieved by CSEGetString4W(resulthandle, CSERESULTSYMBOLVALUE, linksindex, i);
  abspath=CSEGetString8W(resulthandle, CSERESULTGETABSPATHFROMRELPATH, relpath);

  // now, if desired, abspath can be added to the link checker to be checked
}
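
/************************************************************************/
Parsing the progress messages (sketch)
/************************************************************************/

The "Tell the DLL to start checking the links" section describes the text carried by each progress message: "-LinkCheckMessage x", where x is 0-100 while the check is running and 101 when it has finished. The following is a minimal sketch of how the receiving code might interpret that text. It is not part of the DLL or of the original pseudo-code: parse_link_check_message() is a hypothetical helper name, and the sketch assumes the atom text has already been copied into a narrow (char) string, for example with the Win32 function GlobalGetAtomNameA(), before the atom is freed with GlobalDeleteAtom().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a CSE DLL function.
   Parses progress text of the form "-LinkCheckMessage x".
   Returns 0-100 for the percentage complete, 101 when the link check
   has finished, or -1 if the text is not a valid progress message. */
int parse_link_check_message(const char *text)
{
    static const char prefix[] = "-LinkCheckMessage ";
    const size_t prefixlen = sizeof(prefix) - 1;
    char *end;
    long value;

    if (text == NULL || strncmp(text, prefix, prefixlen) != 0)
        return -1;                      /* wrong or missing prefix */

    text += prefixlen;
    if (*text < '0' || *text > '9')
        return -1;                      /* number must start with a digit */

    value = strtol(text, &end, 10);
    if (*end != '\0' || value < 0 || value > 101)
        return -1;                      /* trailing junk or out of range */

    return (int)value;
}
```

A return value of 101 would be the point at which to read the link statuses. A return value of -1 means the text was not a progress message and should probably be ignored.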