This article describes a class to read Unicode character names. The names are read from the file UnicodeData.txt, which is one of the files that make
up the Unicode Character Database. A copy of the file is included with the demo project, but it can also be obtained
from the aforementioned link.
A demo application is also provided. The application allows the user to enter a decimal or hexadecimal code point, type a character, or search for a character name.
While the application is useful for touring the available names, it is overly complex for learning how to use the classes themselves.
The UnicodeNames class, which this article describes, relies on some supporting classes that I described in previous articles.
Specifically, it uses the CsvReader class described in the article Flexible CSV reader/writer with progress reporting.
It also uses the PreviewTextBox control described in the article WPF TextBox with PreviewTextChanged event for filtering.
To simply use the UnicodeNames class described in this article, neither of the other articles is required reading.
There are two name fields available in UnicodeData.txt. The Name field, located in the second column of this file, is used preferentially.
However, for control characters, the Name is always <control>. The alternative name field Unicode_1_Name is used, in these instances, when it is available.
For example, code point 10 (decimal) has a Name of <control> and a Unicode_1_Name of LINE FEED (LF). In this case, the latter is used.
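The selection rule above can be sketched in a few lines. This is only an illustration, not the code inside UnicodeNames: the NameFieldSketch class and its ChooseName method are hypothetical names, and the sketch assumes the standard semicolon-separated layout of UnicodeData.txt, in which Name is the second field and Unicode_1_Name is the eleventh.

```csharp
using System;

// Hypothetical helper for illustration; not part of the UnicodeNames class.
public static class NameFieldSketch
{
    // Picks a display name from one semicolon-separated row of UnicodeData.txt.
    public static string ChooseName(string row)
    {
        string[] fields = row.Split(';');

        // Name is the second field (index 1).
        string name = fields[1];

        // Unicode_1_Name is the eleventh field (index 10); it may be empty.
        string oldName = fields.Length > 10 ? fields[10] : string.Empty;

        // Fall back to the Unicode 1.0 name only for <control> rows that have one.
        if (name == "<control>" && oldName.Length > 0)
        {
            return oldName;
        }
        return name;
    }
}
```

Passing the row for code point 10 yields the Unicode_1_Name, while a row for an ordinary character such as LATIN CAPITAL LETTER A yields its Name field unchanged.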
Additionally, there are large ranges of code points that do not have individual names. However, each such range has a named starting and ending code point. The names of these
two code points are suffixed with First and Last, respectively. The UnicodeNames class will return this same name (without the suffix)
for all characters between these two code points.
Examples of these code points (in hexadecimal) are 100000 (First), 100001, and 10FFFD (Last). All are within the range <Plane 16 Private Use>.
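One way to picture the range handling is the sketch below. It is a minimal illustration under stated assumptions, not the actual implementation: RangeNameSketch, AddRow, and Lookup are hypothetical names, and the sketch assumes the range rows carry names of the form <Plane 16 Private Use, First> and <Plane 16 Private Use, Last>, with each First row immediately followed by its Last row.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of resolving names for First/Last ranges.
public sealed class RangeNameSketch
{
    private sealed class Range
    {
        public int First;
        public int Last;
        public string Name;
    }

    private const string FirstSuffix = ", First>";
    private const string LastSuffix = ", Last>";

    private readonly List<Range> ranges = new List<Range>();
    private int pendingFirst;

    // Feed each (code point, name) pair in file order.
    public void AddRow(int codePoint, string name)
    {
        if (name.EndsWith(FirstSuffix, StringComparison.Ordinal))
        {
            // Remember where the range starts until its Last row arrives.
            pendingFirst = codePoint;
        }
        else if (name.EndsWith(LastSuffix, StringComparison.Ordinal))
        {
            // Strip the suffix and restore the closing bracket.
            string shared = name.Substring(0, name.Length - LastSuffix.Length) + ">";
            ranges.Add(new Range { First = pendingFirst, Last = codePoint, Name = shared });
        }
    }

    // Returns the shared range name, or null when the code point is in no range.
    public string Lookup(int codePoint)
    {
        foreach (Range r in ranges)
        {
            if (codePoint >= r.First && codePoint <= r.Last)
            {
                return r.Name;
            }
        }
        return null;
    }
}
```

After adding the rows for 100000 and 10FFFD, looking up 100001 returns the same suffix-free name as the two endpoints. A linear scan is enough here because UnicodeData.txt contains only a small number of such ranges.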
Using the Code
The code could not be much simpler to use. Creating an instance of the class is done as follows:
string path = "UnicodeData.txt";
UnicodeNames names = new UnicodeNames(path);
Getting the name for a character is as simple as the following:
int codePoint = 10;
string name = names[codePoint];
You should also plan on disposing of the UnicodeNames instance when you are no longer going to use it.
Finally, if you have concerns about the time required to load the file into memory, there are events, properties, and methods that make it easy to monitor and cancel the load.
The RowEnd event is raised each time a row of the file is read. To further assist, the ProgressPercentage property describes the percentage
of the file that has been loaded. Lastly, the CancelAsync method is available to abort the load operation.
Points of Interest
This was my first full-fledged foray into WPF development. I did cheat a little bit and use WinForms for its AboutBox. To the WPF purists out there, I apologize.
- 8/3/2011 - The original version was uploaded.