UTF-8 Conversion Notes
The following are shorthand notes for converting UTF-8 byte sequences to Unicode code points. For a thorough discussion (and for my notes below to mean anything), I recommend the Wikipedia article on UTF-8.
Rules for determining the number of bytes used by a character (test in this order):
if first_byte >= 1111 1000 (F8h, 248d) then ERROR, this byte never appears in UTF-8
if first_byte >= 1111 0000 (F0h, 240d) then there will be 4 bytes in this character
if first_byte >= 1110 0000 (E0h, 224d) then there will be 3 bytes in this character
if first_byte >= 1100 0000 (C0h, 192d) then there will be 2 bytes in this character
if first_byte >= 1000 0000 (80h, 128d) then ERROR, this is a continuation byte!
else there will be 1 byte in this character
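The rules above can be sketched in Python roughly as follows. This is only an illustration: the function name utf8_char_length is my own, the comparisons are assumed to be inclusive (>=) so that F0h, E0h and C0h themselves count as lead bytes, and the sketch does not try to reject overlong or out-of-range encodings.

```python
def utf8_char_length(first_byte: int) -> int:
    """Return how many bytes the character starting with first_byte uses.

    Hypothetical helper; the checks run from the highest threshold down,
    so each test only sees bytes that failed the ones before it.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("not a byte value")
    if first_byte >= 0xF8:  # 1111 1xxx: not used by UTF-8
        raise ValueError("byte never appears in UTF-8")
    if first_byte >= 0xF0:  # 1111 0xxx
        return 4
    if first_byte >= 0xE0:  # 1110 xxxx
        return 3
    if first_byte >= 0xC0:  # 110x xxxx
        return 2
    if first_byte >= 0x80:  # 10xx xxxx
        raise ValueError("continuation byte cannot start a character")
    return 1                # 0xxx xxxx
```

For example, the first byte of the euro sign (E2h) falls in the 3 byte range, and a plain ASCII letter such as 41h falls through to the 1 byte case.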
AND MASK
Byte                                        Bin       Hex   Dec
-----------------------------------------------------------------
1st byte of 4 byte character (1111 0xxx)    111       7     7
1st byte of 3 byte character (1110 xxxx)    1111      F     15
1st byte of 2 byte character (110x xxxx)    11111     1F    31
2nd, 3rd, or 4th byte        (10xx xxxx)    111111    3F    63
1st byte of 1 byte character (0xxx xxxx)    n/a       n/a   n/a
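As a rough illustration of the AND masks, the sketch below strips the marker bits from each byte of one encoded character and returns what is left. The names LEAD_MASK, CONTINUATION_MASK and strip_markers are made up for this example, and the input is assumed to be exactly one well-formed character.

```python
# Masks from the table, keyed by the total number of bytes in the character.
LEAD_MASK = {2: 0x1F, 3: 0x0F, 4: 0x07}
CONTINUATION_MASK = 0x3F


def strip_markers(char_bytes: bytes) -> list:
    """Return the payload bits of each byte of ONE encoded character.

    Hypothetical helper; assumes char_bytes is a single valid character.
    """
    n = len(char_bytes)
    if n == 1:
        # A 1 byte character has no marker bits, so no mask is needed.
        return [char_bytes[0]]
    lead = char_bytes[0] & LEAD_MASK[n]
    rest = [b & CONTINUATION_MASK for b in char_bytes[1:]]
    return [lead] + rest
```

For the euro sign, whose bytes are E2h 82h ACh, this gives 02h, 02h and 2Ch.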
Multiplier
Byte                              Bin                       Hex      Dec
------------------------------------------------------------------------
1 of 4                            1 000000 000000 000000    40000    262144
1 of 3; 2 of 4                    1 000000 000000           1000     4096
1 of 2; 2 of 3; 3 of 4            1 000000                  40       64
1 of 1; 2 of 2; 3 of 3; 4 of 4    n/a                       n/a      n/a
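Putting the masks and multipliers together, a minimal sketch of decoding one character might look like this. The names MULTIPLIERS and decode_char are my own, the last byte (the n/a row) is treated as having a multiplier of 1, and the input is assumed to be exactly one well-formed character with no validation.

```python
# Multipliers indexed by how many bytes FOLLOW the current one:
# 0 -> 1 (the n/a row), 1 -> 40h, 2 -> 1000h, 3 -> 40000h.
MULTIPLIERS = (1, 0x40, 0x1000, 0x40000)
LEAD_MASK = {2: 0x1F, 3: 0x0F, 4: 0x07}
CONTINUATION_MASK = 0x3F


def decode_char(char_bytes: bytes) -> int:
    """Return the code point of ONE encoded character (no validation).

    Hypothetical helper built only from the two tables above.
    """
    n = len(char_bytes)
    if n == 1:
        return char_bytes[0]
    # Lead byte: mask off the length marker, then scale by its position.
    code_point = (char_bytes[0] & LEAD_MASK[n]) * MULTIPLIERS[n - 1]
    # Continuation bytes: mask off the 10 marker, scale, and add.
    for i, b in enumerate(char_bytes[1:], start=1):
        code_point += (b & CONTINUATION_MASK) * MULTIPLIERS[n - 1 - i]
    return code_point
```

For the euro sign (E2h 82h ACh) this works out to 2 * 4096 + 2 * 64 + 44 = 8364, which is 20ACh. Multiplying by these values is the same as shifting left by 6, 12 or 18 bits, so a shift-based version would behave identically.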

