Plain text, ASCII, ANSI, Unicode, UTF-8, UTF-16: confused?

I came across this wonderful post by Joel Spolsky on characters, character sets, ANSI, ASCII, Unicode and much, much more. I have to admit that until now I thought plain text was ASCII was ANSI, all within 8 bits, and that anything taking up 2 bytes was Unicode. Well, I couldn’t have been more wrong! Here are a few excerpts from his post:

"All that stuff about "plain text = ascii = characters are 8 bits" is not only wrong, it’s hopelessly wrong, and if you’re still programming that way, you’re not much better than a medical doctor who doesn’t believe in germs."

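A quick way to see why "characters are 8 bits" falls apart is to decode the same byte under different encodings. The sketch below is only an illustration: the byte value 0xE9, the choice of code pages and the helper name decode_byte are my own assumptions, not something taken from Joel's post.

```python
def decode_byte(value, encoding):
    """Decode one byte value with the given codec; return None if the codec rejects it."""
    try:
        return bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        return None

# ASCII only defines values 0-127, so 0xE9 is not ASCII at all.
# The "ANSI" code page cp1252 and the old DOS code page cp437 both
# accept the byte, but they disagree about which character it is.
for encoding in ("ascii", "cp1252", "cp437"):
    print(encoding, repr(decode_byte(0xE9, encoding)))
```

Running it should show None for ascii, 'é' for cp1252 and 'Θ' for cp437: one byte, three different answers, depending entirely on which encoding you assume.
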
"Some people are under the misconception that Unicode is simply a 16-bit code where each character takes 16 bits and therefore there are 65,536 possible characters. This is not, actually, correct. It is the single most common myth about Unicode, so if you thought that, don’t feel bad."

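The 16-bit myth is just as easy to check. Here is a minimal sketch, assuming Python 3; the four sample characters and the helper name encoded_lengths are picked by me for illustration. It prints each character's Unicode code point next to the number of bytes it takes in UTF-8 and in UTF-16.

```python
def encoded_lengths(ch):
    """Return (code point, UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-le" is used so that no byte order mark is added to the count.
    return (ord(ch), len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))

# Sample characters are illustrative only.
for ch in ["A", "é", "€", "😀"]:
    code_point, utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{code_point:04X}: {utf8_len} byte(s) in UTF-8, {utf16_len} byte(s) in UTF-16")
```

The code point is just a number; how many bytes it takes depends on the encoding. The emoji sits above U+FFFF, so it does not fit in 16 bits: UTF-16 needs two 16-bit units (4 bytes) for it, and UTF-8 uses anywhere from 1 to 4 bytes across these samples.
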
Don’t miss this one. Give it a read:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Better to learn late than never (:
