UTF-8 to Unicode Converter
Get Unicode Character Codes
This tool converts UTF-8 strings to sequences of hexadecimal Unicode character codes (code points). That is handy whenever a program or file format accepts only escaped character codes rather than the characters themselves.
As an example, Adobe (or was it Macromedia then?) Flash did not originally support non-European characters in the development interface, although it would display them in the runtime environment. The workaround was to put the characters into Flash as long strings of escaped Unicode hexadecimal codes.
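The kind of output described above can be sketched in a few lines of Python. This is only an illustration: the function name to_unicode_escapes is made up here, and the \uXXXX escape format (with \UXXXXXXXX for code points above FFFF) is an assumption, since the exact format a given tool expects may differ.

```python
def to_unicode_escapes(text: str) -> str:
    """Return each character of text as an escaped hexadecimal code point.

    The \\uXXXX / \\UXXXXXXXX format is an assumption for illustration;
    adjust it to whatever the target environment expects.
    """
    parts = []
    for ch in text:
        cp = ord(ch)  # Unicode code point of this character
        if cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")
        else:
            parts.append(f"\\U{cp:08X}")
    return "".join(parts)


# The input arrives as UTF-8 bytes, so decode it first.
raw = "日本".encode("utf-8")
print(to_unicode_escapes(raw.decode("utf-8")))  # prints \u65E5\u672C
```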
UTF-8 Conversion Notes
The following are shorthand notes for the conversion. For a thorough discussion (and for my notes below to mean anything), I recommend the Wikipedia article on UTF-8.
Rules for determining number of bytes used by character:
if first_byte >= 1111 0000 (F0h, 240d) then there will be 4 bytes in this character
else if first_byte >= 1110 0000 (E0h, 224d) then there will be 3 bytes in this character
else if first_byte >= 1100 0000 (C0h, 192d) then there will be 2 bytes in this character
else if first_byte >= 1000 0000 (80h, 128d) then ERROR, this is a continuation byte!
else there will be 1 byte in this character
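As a rough sketch, the rules can be written as a small Python function. The thresholds are treated as inclusive, because the byte F0h itself already starts a 4 byte character. The function name utf8_sequence_length is hypothetical, and the extra rejection of bytes from F8h upward goes beyond the rules listed here: those bytes do not match the 1111 0xxx pattern and never start a valid character.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return how many bytes the character starting with first_byte uses."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("not a byte value")
    if first_byte >= 0xF8:
        # 1111 1xxx: not a valid lead byte in UTF-8
        raise ValueError("invalid lead byte")
    if first_byte >= 0xF0:  # 1111 0xxx
        return 4
    if first_byte >= 0xE0:  # 1110 xxxx
        return 3
    if first_byte >= 0xC0:  # 110x xxxx
        return 2
    if first_byte >= 0x80:  # 10xx xxxx
        raise ValueError("continuation byte cannot start a character")
    return 1                # 0xxx xxxx (plain ASCII)


for byte in (0x41, 0xC3, 0xE2, 0xF0):
    print(f"{byte:02X}h -> {utf8_sequence_length(byte)} byte(s)")
```

A strict decoder would also reject C0h, C1h, and F5h through F7h, which this sketch lets through.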
AND MASK
Byte                                       Bin      Hex   Dec
-----------------------------------------------------------------
1st byte of 4 byte character (1111 0xxx)   111      7     7
1st byte of 3 byte character (1110 xxxx)   1111     F     15
1st byte of 2 byte character (110x xxxx)   11111    1F    31
2nd, 3rd, or 4th byte (10xx xxxx)          111111   3F    63
1st byte of 1 byte character (0xxx xxxx)   n/a      n/a   n/a
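To show what the masks do, here is a minimal Python sketch that applies them to the three bytes of the euro sign (E2h 82h ACh). The constant names LEAD_MASKS and CONTINUATION_MASK are invented for this example. ANDing each byte with its mask strips the marker bits and leaves only the bits that belong to the character code.

```python
# AND masks from the table, keyed by the length of the character in bytes.
LEAD_MASKS = {2: 0x1F, 3: 0x0F, 4: 0x07}
CONTINUATION_MASK = 0x3F

euro = "€".encode("utf-8")  # three bytes: E2 82 AC

# The first byte uses the lead mask for a 3 byte character;
# every following byte uses the continuation mask.
payloads = [euro[0] & LEAD_MASKS[len(euro)]]
payloads += [b & CONTINUATION_MASK for b in euro[1:]]

for byte, payload in zip(euro, payloads):
    print(f"{byte:08b} -> {payload:b}")
# prints:
# 11100010 -> 10
# 10000010 -> 10
# 10101100 -> 101100
```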
Multiplier
Byte                             Bin                      Hex     Dec
------------------------------------------------------------------------
1 of 4                           1 000000 000000 000000   40000   262144
1 of 3; 2 of 4                   1 000000 000000          1000    4096
1 of 2; 2 of 3; 3 of 4           1 000000                 40      64
1 of 1; 2 of 2; 3 of 3; 4 of 4   n/a                      n/a     n/a
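Putting the pieces together, a decoder can mask each byte, multiply it by the multiplier for its position, and add up the results. The Python below is a minimal sketch of that idea, not the code behind this page. The names decode_utf8, MASKS, and MULTIPLIERS are made up for illustration, the "n/a" multiplier is written as 1, and overlong or out-of-range sequences are not checked.

```python
from typing import List

MASKS = {2: 0x1F, 3: 0x0F, 4: 0x07}   # AND masks for the first byte
MULTIPLIERS = (1, 64, 4096, 262144)   # indexed by position from the last byte


def decode_utf8(data: bytes) -> List[int]:
    """Convert UTF-8 bytes to a list of Unicode code points."""
    code_points = []
    i = 0
    while i < len(data):
        first = data[i]
        if first >= 0xF8:
            raise ValueError(f"invalid lead byte at offset {i}")
        elif first >= 0xF0:
            length = 4
        elif first >= 0xE0:
            length = 3
        elif first >= 0xC0:
            length = 2
        elif first >= 0x80:
            raise ValueError(f"unexpected continuation byte at offset {i}")
        else:
            length = 1
        chunk = data[i:i + length]
        if len(chunk) < length:
            raise ValueError("truncated character at end of input")
        # Strip the marker bits from every byte of the character.
        values = [first & MASKS[length]] if length > 1 else [first]
        for b in chunk[1:]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"expected continuation byte after offset {i}")
            values.append(b & 0x3F)
        # The leftmost byte gets the largest multiplier, the last byte gets 1.
        cp = sum(v * MULTIPLIERS[length - 1 - k] for k, v in enumerate(values))
        code_points.append(cp)
        i += length
    return code_points


print([f"U+{cp:04X}" for cp in decode_utf8("A€😀".encode("utf-8"))])
# prints ['U+0041', 'U+20AC', 'U+1F600']
```

For the euro sign this works out to 2 x 4096 + 2 x 64 + 44 = 8364, which is 20ACh.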
This Page
To make sure the page displays correctly and receives your text input as UTF-8, I use PHP's header() function to set the following field in the HTTP header returned by the server:
header("Content-type: text/html; charset=utf-8");