Accented characters in URL/URI ?

For technical support and bug reports for all editions of CSS HTML Validator, including htmlval for Linux and Mac.
ktp
Rank III - Intermediate
Posts: 84
Joined: Sat Oct 29, 2016 10:34 am

Accented characters in URL/URI ?

Post by ktp »

Hello,

There is a message with ID 2003031702:
"Using unencoded space characters in URLs may cause problems. If space characters must be used in URLs then they should be encoded as "%20" (without the quotes). However, avoid spaces in URLs whenever possible and consider using the underscore (_) or hyphen (-) character instead of space characters in folder names and filenames."

Does CSS HTML Validator also check for accented characters in URLs/URIs?
According to RFC 3986 (https://www.ietf.org/rfc/rfc3986.txt),
URIs are restricted to a subset of ASCII characters.

I have several URIs with accented characters and I would like to find them using CSS HTML Validator/Batch Wizard.
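
As a stopgap outside the validator, the check described above can be sketched in a few lines of Python. This is only a minimal illustration, not how CSS HTML Validator works: the class name `NonAsciiLinkFinder`, the helper `find_non_ascii_links`, and the sample markup are all hypothetical, and only `href` and `src` attributes are inspected.

```python
from html.parser import HTMLParser


class NonAsciiLinkFinder(HTMLParser):
    """Collect href/src values that contain any non-ASCII character."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value and not value.isascii():
                self.flagged.append(value)


def find_non_ascii_links(html_text):
    """Return the link targets in html_text that are not pure ASCII."""
    finder = NonAsciiLinkFinder()
    finder.feed(html_text)
    return finder.flagged


# Hypothetical sample markup, for illustration only.
sample = '<a href="café/menu.html">Menu</a> <a href="about.html">About</a>'
print(find_non_ascii_links(sample))  # ['café/menu.html']
```

The sketch reports only links that contain non-ASCII characters; it does not check spaces or any other characters.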

Thank you for your support.
Albert Wiersch
Site Admin
Posts: 3847
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Accented characters in URL/URI ?

Post by Albert Wiersch »

Hello,

Thank you for bringing this to my attention. I will see if I can address/improve this for the next update.
Albert Wiersch, CSS HTML Validator Developer

Re: Accented characters in URL/URI ?

Post by ktp »

I ran a quick test with URIs containing some special characters. Here are the results:

- Left/right parentheses, curly braces, and square brackets "( ), { }, [ ]": reported as a bad link with "Can't compute abs path". OK.
- Space: warning with message ID 2003031702. OK.
- Single quote: currently no warning => needs to be addressed.
- Accented characters: currently no warning => needs to be addressed.
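
For comparison, the test cases above can be run against the character sets defined in RFC 3986, section 2. The following is a rough sketch under stated assumptions: the function name `disallowed_chars` and the sample file names are hypothetical, and the check looks at individual characters only. It does not verify that "%" is followed by two hex digits or that square brackets appear only where the RFC permits them.

```python
import string

# Character sets from RFC 3986, section 2.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
# "%" is accepted here because it introduces a percent-encoded byte.
ALLOWED = UNRESERVED | GEN_DELIMS | SUB_DELIMS | {"%"}


def disallowed_chars(uri):
    """Return the characters of uri outside the RFC 3986 sets, in order, without repeats."""
    found = []
    for ch in uri:
        if ch not in ALLOWED and ch not in found:
            found.append(ch)
    return found


# Hypothetical file names, one per case from the list above.
for uri in ["my file.html", "it's.html", "café.html", "a{b}.html", "a(b)[c].html"]:
    print(repr(uri), disallowed_chars(uri))
```

With this character-level check, the space, the accented character, and the curly braces are flagged. The single quote and parentheses belong to the RFC's sub-delims set and the square brackets to gen-delims, so they are not flagged here.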

Re: Accented characters in URL/URI ?

Post by ktp »

Albert Wiersch wrote: Thu Mar 06, 2025 12:58 pm
Hello,

Thank you for bringing this to my attention. I will see if I can address/improve this for the next update.

In the meantime, while waiting for the next update, I found that the Batch Wizard report helps:
in its list of processed documents, the links with accented characters can be found
by searching for the "%" string.

This works because Batch Wizard percent-encodes each accented character as its UTF-8 bytes in hex, which for these characters is two bytes: %XX%YY.
For example, "é" (e acute, HTML5 &eacute;) is replaced with %C3%A9.
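
The encoding can be reproduced with Python's standard `urllib.parse` module. Searching for "%" alone would also match encoded spaces (%20), so the sketch below decodes each URL and keeps only those that contain a non-ASCII character. It assumes the URLs from the report are available as plain strings; the function name `has_encoded_non_ascii` and the sample URLs are hypothetical.

```python
from urllib.parse import quote, unquote

# quote() percent-encodes the UTF-8 bytes of the character by default.
print(quote("é"))  # %C3%A9


def has_encoded_non_ascii(url):
    """True if url decodes to text that contains a non-ASCII character."""
    return not unquote(url).isascii()


# Hypothetical URLs as they might appear in a report.
report_urls = ["caf%C3%A9/menu.html", "my%20file.html", "about.html"]
print([u for u in report_urls if has_encoded_non_ascii(u)])  # ['caf%C3%A9/menu.html']
```

Only the first URL is kept, because %20 decodes to an ordinary ASCII space.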