HTML Encoder

This is a helper object of HTML Parser Light. It encodes and decodes text using HTML/XML entities. The entity set is configurable, so anything missing from the defaults can be added. The default configuration follows common HTML practice: only a small number of special characters are encoded as named entities (for example &quot;), while the rest are encoded as numeric character references of the form &#<charcode>;

Creation:

ProgID: newObjects.utilctls.HTMLEncoder
ClassID: {E01B7474-2E63-4683-8A7C-FA41C5AB83D7}
Free-threaded version:
ProgID: newObjects.utilctls.HTMLEncoder.free
ClassID: {0D46936D-3EE7-494a-9C7B-DC63AA3E968B}

Members reference

Member Syntax Description
Encode str = obj.Encode(string) HTML-encodes a string.
Decode str = obj.Decode(string) HTML-decodes a string.
Configuration members
LoadDefaultEntities obj.LoadDefaultEntities Loads the default set of entities recognized/used by the decoder/encoder. They are only:
" - &quot;
& - &amp;
< - &lt;
> - &gt;
  - &nbsp; (non-breaking space)
© - &copy;
® - &reg;

Any other entity you want recognized or used must be configured through the entity property.
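
The default table can be pictured with a short Python sketch. This is an illustration only, not the component's code: the names DEFAULT_ENTITIES and encode are hypothetical, &quot; is mapped to the double-quote character (code 34) as in standard HTML, named entities are assumed to be in use, and it is assumed that characters above the printable ASCII range are the ones written numerically.

```python
# Default entity table from the list above: character code -> entity name.
DEFAULT_ENTITIES = {
    34: "quot",   # double quote
    38: "amp",    # ampersand
    60: "lt",     # less-than sign
    62: "gt",     # greater-than sign
    160: "nbsp",  # non-breaking space
    169: "copy",  # copyright sign
    174: "reg",   # registered sign
}


def encode(text, entities=DEFAULT_ENTITIES):
    """Encode text with named entities where configured.

    Assumption: characters above printable ASCII that have no named
    entity become decimal numeric references; the rest pass through.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code in entities:
            out.append("&" + entities[code] + ";")
        elif code > 126:
            out.append("&#%d;" % code)
        else:
            out.append(ch)
    return "".join(out)


print(encode('<a href="x">\u00a9 caf\u00e9</a>'))
# &lt;a href=&quot;x&quot;&gt;&copy; caf&#233;&lt;/a&gt;
```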

codePage obj.CodePage = x
x = obj.CodePage
The code page used for conversions from Unicode (the strings you pass to Encode when forceANSI = True) or to Unicode (the numerically encoded characters in decoded strings when forceANSI = True). See the Remarks section for more details.
entity obj.entity(charcode) = string
x = obj.entity(charcode) 
Gets or sets the entity name for a particular character code. Through this property a named representation can be configured for any character. For instance, to do this for the & character:
obj.entity(38) = "amp"
You set the entity by giving only its name; in the HTML-encoded text the entity appears as &<entity_name>; so the amp from the sample above will appear as &amp;
Named entities are defined by the HTML/XML standards for special characters only (including some accented letters). Using non-standard entities for encoding should be avoided, but for decoding you can configure any entities needed to cope with the input string, even if it violates the standards.

When forceANSI is True, charcode is assumed to be a character code from the specified code page. Otherwise it is assumed to be a Unicode character code. Still, when decoding, if charcode is > 256 the Unicode character with that code will be substituted and no error will occur.
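
The entity property described above behaves like a table from character codes to names. The following Python sketch mimics that idea for decoding; it is not the component's implementation. The names entities and decode_named are hypothetical, the euro entry is only an example, and character codes are treated as Unicode (the forceANSI = False case).

```python
import re

# Hypothetical stand-in for the object's entity table: charcode -> name.
entities = {38: "amp", 60: "lt", 62: "gt"}

# Comparable in spirit to obj.entity(8364) = "euro"
entities[8364] = "euro"


def decode_named(text, table):
    """Replace &name; with the configured character; leave others as is."""
    by_name = {name: chr(code) for code, name in table.items()}
    return re.sub(
        r"&([A-Za-z][A-Za-z0-9]*);",
        lambda m: by_name.get(m.group(1), m.group(0)),
        text,
    )


print(decode_named("5 &euro; &amp; up", entities))  # 5 € & up
```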

useEntities obj.useEntities = boolvalue
x = obj.useEntities 
Affects the Encode method only. When set to False the configured entities are not used and the numerical representation is used instead (for example, &#38; will be generated instead of &amp;). (default is False)
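
A minimal Python sketch of the difference the useEntities setting makes, under the assumption that it simply switches between the named and the decimal numeric form. ENTITIES and encode_char are hypothetical names used for illustration only.

```python
# Subset of the default table: character code -> entity name.
ENTITIES = {38: "amp", 60: "lt", 62: "gt"}


def encode_char(ch, use_entities):
    """Encode one character that needs escaping."""
    code = ord(ch)
    if use_entities and code in ENTITIES:
        return "&" + ENTITIES[code] + ";"
    # Named entities disabled or not configured: numeric form.
    return "&#%d;" % code


print(encode_char("&", use_entities=True))   # &amp;
print(encode_char("&", use_entities=False))  # &#38;
```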
forceANSI obj.forceANSI = boolvalue
x = obj.forceANSI 
When set to True it is assumed that the HTML content is to be encoded to/decoded from a string where the numerically encoded characters will represent ANSI character codes from the specified codePage. When it is False it is assumed that the numerically encoded characters are UNICODE characters. The non-encoded characters (Decode method) are always converted through the specified codePage.

(default is False)
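
The effect of forceANSI together with codePage can be illustrated in Python. This is a sketch under stated assumptions, not the component's code: the function names are hypothetical, the code page is given as a Python codec name ("cp1251" stands for Windows code page 1251), and only single-byte code pages are considered.

```python
def numeric_code(ch, force_ansi=False, code_page="cp1251"):
    """Return the number that would appear in &#nnn; for ch."""
    if force_ansi:
        # Single-byte code page assumed: use the byte value.
        return ch.encode(code_page)[0]
    return ord(ch)


def char_from_code(code, force_ansi=False, code_page="cp1251"):
    """Turn a decoded numeric reference back into a character."""
    if force_ansi and code < 256:
        # Codes that fit in one byte are read through the code page.
        return bytes([code]).decode(code_page)
    # Larger codes are taken as Unicode code points.
    return chr(code)


ya = "\u042f"  # Cyrillic capital letter YA
print(numeric_code(ya))                   # 1071
print(numeric_code(ya, force_ansi=True))  # 223
```

With forceANSI off, the letter is written as &#1071; (its Unicode code); with it on and code page 1251 selected, the same letter is written as &#223;.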

useHex obj.useHex = boolvalue
x = obj.useHex
Specifies how to encode characters numerically. If True, the &#xnnn; form is used; if False, the &#nnn; form is used.
(default is False)
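
The two numeric forms selected by useHex, shown as a small Python sketch. The function name is hypothetical, and the use of uppercase hex digits is an assumption; the component may emit lowercase.

```python
def numeric_reference(code, use_hex=False):
    """Format a character code as a numeric character reference."""
    if use_hex:
        return "&#x%X;" % code
    return "&#%d;" % code


print(numeric_reference(233))                # &#233;
print(numeric_reference(233, use_hex=True))  # &#xE9;
```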
ignoreUnknownEntities obj.ignoreUnknownEntities = boolvalue
x = obj.ignoreUnknownEntities
Specifies how to deal with unknown entities on Decode. If set to False, the Decode method will fail if an unrecognized entity is found; otherwise the entity will be copied to the output as is.
(default is True)
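
The two behaviours of ignoreUnknownEntities can be sketched in Python as follows. This only mirrors the description above: KNOWN and decode are hypothetical names, and raising ValueError stands in for whatever error the component actually reports.

```python
import re

# Entity names the sketch recognizes: name -> character.
KNOWN = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}


def decode(text, ignore_unknown=True):
    """Decode named entities; unknown ones pass through or raise."""
    def repl(match):
        name = match.group(1)
        if name in KNOWN:
            return KNOWN[name]
        if ignore_unknown:
            return match.group(0)  # keep the entity text unchanged
        raise ValueError("unknown entity: &%s;" % name)

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


print(decode("a &lt; b &foo; c"))  # a < b &foo; c
```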
maxEntityLen obj.maxEntityLen = n
x = obj.maxEntityLen
The maximum length of a named entity. It is unlikely that this will need changing in any application.
(default is 32)
encodeSpecial obj.encodeSpecial = boolvalue
x = obj.encodeSpecial
If set to True, the Encode method will also encode <CR>, <LF> and space characters.
(default is False)

Remarks

HTML encoding can sometimes be a problem. Some HTML editors and functions produce incorrect encoding, and occasionally the HTML encode/decode facilities at hand cannot cope with certain data. This object can be configured in many ways, which lets you deal with such problems.

Aside from problem solving, this object is needed when working with HTML Parser Light in programming environments that do not offer HTML encoding facilities of their own. For example, in ASP/ALP you have Server.HTMLEncode, but in VB or NSBasic, which are desktop oriented, you will not have such a function at hand.