- UTF-8, a transformation format of ISO 10646
The UTF-8 encoding represents UCS-4 characters as a sequence of octets, using
between 1 and 6 for each character.
It is backwards compatible with ASCII,
so 0x00-0x7f refer to the ASCII character set.
The multibyte encoding of non-ASCII characters
consists entirely of bytes whose high order bit is set.
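
The two properties above can be checked with a short sketch. It relies on
Python's built-in "utf-8" codec, which implements the later, restricted form of
UTF-8 (at most 4 octets per character) rather than the full 1 to 6 octet range
described here; the helper name high_bit_set_everywhere is hypothetical and
chosen only for this illustration.

```python
def high_bit_set_everywhere(text):
    """Return True if every octet of the UTF-8 encoding of text has bit 0x80 set."""
    return all(octet & 0x80 for octet in text.encode("utf-8"))


# ASCII text is unchanged: each character becomes the single octet 0x00-0x7f.
ascii_text = "plain ASCII"
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Non-ASCII characters become multibyte sequences with the high bit set
# in every octet, so they can never be mistaken for ASCII octets.
for ch in ("\u00e9", "\u20ac", "\U0001f600"):
    print(hex(ord(ch)), ch.encode("utf-8").hex(" "), high_bit_set_everywhere(ch))
```

Because no octet of a multibyte sequence falls in 0x00-0x7f, code that scans
for ASCII delimiters such as '/' or NUL keeps working on UTF-8 data.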
The actual encoding is represented by the following table:

    UCS-4 range (hex)          UTF-8 octet sequence (binary)
    0x00000000 - 0x0000007f    0bbbbbbb
    0x00000080 - 0x000007ff    110bbbbb 10bbbbbb
    0x00000800 - 0x0000ffff    1110bbbb 10bbbbbb 10bbbbbb
    0x00010000 - 0x001fffff    11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
    0x00200000 - 0x03ffffff    111110bb 10bbbbbb 10bbbbbb 10bbbbbb 10bbbbbb
    0x04000000 - 0x7fffffff    1111110b 10bbbbbb 10bbbbbb 10bbbbbb 10bbbbbb 10bbbbbb

If more than a single representation of a value exists (for example,
0x00; 0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always
used.
Longer ones are detected as an error as they pose a potential
security risk, and destroy the 1:1 character:octet sequence mapping.
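
The shortest-representation rule can be illustrated with a minimal sketch of an
encoder and a strict decoder for the 1 to 6 octet scheme. This is not the
implementation used by any particular C library; the names encode_ucs4,
decode_ucs4, _LIMITS and _LEAD are hypothetical, and the decoder is assumed to
receive exactly one encoded character.

```python
# Largest value (inclusive) that fits in a sequence of 1..6 octets.
_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
# Marker bits carried by the first octet of a sequence of 1..6 octets.
_LEAD = [0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC]


def encode_ucs4(value):
    """Encode one UCS-4 value using the shortest possible octet sequence."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value outside the UCS-4 range")
    n = next(i for i, limit in enumerate(_LIMITS) if value <= limit) + 1
    octets = []
    for _ in range(n - 1):
        # Continuation octets carry 6 payload bits each: 10bbbbbb.
        octets.append(0x80 | (value & 0x3F))
        value >>= 6
    octets.append(_LEAD[n - 1] | value)
    return bytes(reversed(octets))


def decode_ucs4(data):
    """Decode exactly one character; reject non-shortest representations."""
    if not data:
        raise ValueError("empty input")
    first = data[0]
    if first < 0x80:
        n, value = 1, first
    elif first < 0xC0:
        raise ValueError("unexpected continuation octet")
    else:
        for n in range(2, 7):
            mask = (0xFF << (7 - n)) & 0xFF  # marker bits of an n-octet lead
            if first & mask == _LEAD[n - 1]:
                value = first & ~mask & 0xFF
                break
        else:
            raise ValueError("invalid lead octet")  # 0xFE and 0xFF
    if len(data) != n:
        raise ValueError("wrong number of octets")
    for octet in data[1:]:
        if octet & 0xC0 != 0x80:
            raise ValueError("invalid continuation octet")
        value = (value << 6) | (octet & 0x3F)
    # A value that would have fit in n - 1 octets was encoded too long.
    if n > 1 and value <= _LIMITS[n - 2]:
        raise ValueError("non-shortest representation")
    return value


print(encode_ucs4(0x20AC).hex(" "))
for candidate in (b"\x00", b"\xc0\x80", b"\xe0\x80\x80"):
    try:
        print(candidate.hex(" "), "->", decode_ucs4(candidate))
    except ValueError as err:
        print(candidate.hex(" "), "-> rejected:", err)
```

Of the three candidate encodings of zero given in the text, only the
single octet 0x00 is accepted. Rejecting the longer forms matters because a
filter that looks for a specific octet, such as NUL or '/', would otherwise
miss the same character smuggled in as an overlong sequence.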