A Primer on Unicode and the Generator
The Generator
If you scroll down you'll come across what I've dubbed the Unicode Character Generator. Clicking either a Unicode category or the button beside it loads a new page in your browser that reveals (or, more accurately, tests your browser's capacity to display) the Unicode characters in that category.
Go ahead and do some clicking right now. Check out the category Basic Latin (it contains the English alphabet) for starters to get a feel for what the Generator does and should do.
The Generator is a JavaScript program, so if your browser doesn't understand that language or if you've switched off its JavaScript option then clicking on the names and buttons won't trigger anything.
Note that this document is designed to run off-line. There's no need to connect to the Internet. Take note as well that some categories, like the East Asian language sets for instance, have hundreds or even tens of thousands of characters. The Generator may therefore take many seconds before it displays anything and finishes its task. Be patient. Else, run this page on a 50 GHz supercomputer.
What is the Unicode standard and what are HTML Decimal References?
Simply put, Unicode is the set of numbers that identifies each and every character a computer can understand. In the words of the Unicode Consortium: "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." To see these numbers (in hexadecimal, i.e., base 16) and their associated glyphs you can download the Consortium's Unibook. It's sort of a PDF file but doesn't require Acrobat Reader. It's a rather large program at nearly 1 MB and will run only on Windows NT, Me and XP.
The HTML decimal reference system is one implementation of the Unicode standard for Hypertext Markup Language (HTML). For example, the capital letter A has been assigned the Unicode number 65 (base 10). Its HTML decimal reference is &#65;. When your browser encounters that code it will print A on your screen. The ampersand and hash symbol prefix are mandatory, and so is the semicolon suffix. In short, an HTML decimal reference is simply the Unicode number (in base 10) accompanied by the prefix and suffix just mentioned.
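The prefix-number-suffix rule can be sketched in a few lines of JavaScript. This is only an illustration, not the Generator's own code: the helper names toDecimalReference and fromDecimalReference are hypothetical, and the sketch assumes a modern JavaScript engine that provides codePointAt and String.fromCodePoint.

```javascript
// Hypothetical helpers illustrating the rule above; not taken from the Generator.

// Build the HTML decimal reference for the first character of a string:
// "&#" prefix, the Unicode number in base 10, ";" suffix.
function toDecimalReference(ch) {
  const codePoint = ch.codePointAt(0);
  return "&#" + codePoint + ";";
}

// Turn a decimal reference back into its character.
// Returns null when the text is not of the form &#<digits>; or when the
// number lies outside the Unicode range (0 to 1114111).
function fromDecimalReference(ref) {
  const match = /^&#(\d+);$/.exec(ref);
  if (!match) return null;
  const codePoint = Number(match[1]);
  if (codePoint > 0x10ffff) return null;
  return String.fromCodePoint(codePoint);
}

console.log(toDecimalReference("A"));       // &#65;
console.log(toDecimalReference("€"));       // &#8364;
console.log(fromDecimalReference("&#65;")); // A
console.log(fromDecimalReference("&#65"));  // null (the semicolon is missing)
```

The last call shows why the semicolon matters in this sketch: without the suffix the text is not treated as a reference at all.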
So What?
Need Greek characters like pi (π), sigma (Σ), or omega (Ω) without relying on Microsoft-specific Greek TrueType fonts? Or, want an authentic en-dash (–) or em-dash (—) instead of two (--) or three (---) hyphens? Any use for this pair: ♀ ♂ ? Converting your francs to euros? Then you'll need this euro symbol. That'll be €10.00 please.
You can display all these symbols without relying on any special font face. Incidentally, you can highlight and copy any of the foregoing characters and paste them into your email or website. Unfortunately, word processors are a different lot, so test whether they can recognize the characters by copying from your email or browser and pasting into your word processor document. If a rectangle or a question mark appears, or if nothing happens at all, then your software urgently needs to take a remedial language course.
Here's how to benefit from the Generator: Make sure you have an HTML-enabled email program. Create a new message. Entitle it "Special Characters and Symbols" or some descriptive name that'll make sense to you. Run the Generator and browse through the characters generated. When you come across a character you need or just fancy, highlight and copy both it and its decimal reference. Paste them into the email. When you're done, edit if need be, then save the 'message' in your drafts box. You can always add to this list in the future. Now, whenever you need a special character you can just go to the email and copy the character. Including the decimal reference is a good idea just in case you need to generate the character again. You can also print the page displayed by the Generator. Beware, though: some pages may contain hundreds or thousands of lines!
The Catch
Unfortunately, browsers and email software are not all equal. Old (pre-millennium) editions probably have poor or no support for decimal references beyond 255 (the upper limit of the 8-bit character sets that extend ASCII, which itself stops at 127). The latest versions of Internet Explorer and Netscape, on the other hand, have been hard-coded to contain a very large set of symbols and special characters which can be accessed through decimal references. Both browsers, for instance, support the entire Chinese language. That's about 20,000 ideographs! But you'll have to install the Chinese language pack to access them.
As you will discover, most of the categories listed below will not generate any characters. No browser to date displays the Cherokee and Tagbanwa sets. For now that's how limited technology is. The bottom line is that the more decimal references your browser recognizes, the more characters it can output to screen. In time, as the Unicode Consortium adds more character sets and as browsers become more powerful, the number of characters that can be displayed by the Generator will certainly grow.
Tips
- If you're reading this page on the web you can save it on your computer. It's self-contained. All the style sheet specs and JavaScripts are hard-coded in this single HTML document. No external files are accessed.
- Netscape 6.2 provides a much greater set of characters than Internet Explorer 5.5 unless you download the language packs. Opera 6 has a good set as well.
- The fastest JavaScript processing browser is Internet Explorer, followed by Netscape (takes 2 to 3 times as long as IE), followed by Opera (approx. 100 times as long as IE). Opera takes forever to interpret JavaScript. Note that this ranking reflects the browsers themselves and is largely independent of computer hardware speed.
The Unicode Character Generator
How to Use
- Choose a font face the characters will appear in by clicking one of the radio buttons below. Default font is Arial should you not pick any. Font faces in parentheses are alternates in case your computer does not support the named types.
- Choose a Unicode category by clicking either the name of the category or the buttons beside the category. Placing your mouse over the categories will highlight them. Older browsers will probably not support this feature.
- This page will be replaced by a new one containing three columns: the Unicode numbers (in hexadecimal), the HTML decimal references, and corresponding characters as rendered by your browser. Your browser will display either a blank space, a question mark or an empty rectangle for those character codes which it does not (yet) understand.
- Copy the characters you need by clicking and dragging your mouse over them then pressing Ctrl+C. Paste the selection (Ctrl+V) on a blank email and save.
- Click your browser's BACK or REFRESH / RELOAD button to return to this page.
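The three-column page described in the steps above can be approximated with a short JavaScript sketch. It is not the Generator's actual code: the function name buildRows, the row field names, and the "U+" hexadecimal formatting are all assumptions made for illustration, and the sketch assumes a modern JavaScript engine.

```javascript
// Hypothetical sketch of the three-column listing; not the Generator's source.
// For each Unicode number from first to last (inclusive) it produces:
//   hex       - the Unicode number in hexadecimal (the "U+" form is an assumption)
//   reference - the HTML decimal reference
//   character - the character itself, which the browser may or may not render
function buildRows(first, last) {
  const rows = [];
  for (let cp = first; cp <= last; cp++) {
    rows.push({
      hex: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      reference: "&#" + cp + ";",
      character: String.fromCodePoint(cp),
    });
  }
  return rows;
}

// The first three capital letters of Basic Latin.
const rows = buildRows(0x41, 0x43);
for (const row of rows) {
  console.log(row.hex, row.reference, row.character);
}
// U+0041 &#65; A
// U+0042 &#66; B
// U+0043 &#67; C
```

Building the rows as plain data before printing them keeps the sketch independent of any page layout. A large category simply means a longer loop, which is one reason the bigger sets take a while to appear.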