
Class to read Unicode character names and a tool to display/search them

3 Aug 2011 · CPOL · 3 min read
A class to read Unicode character names and a tool to display/search them.

[Screenshot: UnicodeNames.png]

Introduction

This article describes a class that reads Unicode character names. The names are read from the file UnicodeData.txt, one of the files that make up the Unicode Character Database (UCD). A copy of the file is included with the demo project, and the latest version can also be downloaded from the Unicode Consortium's website.

A demo application is also provided. It lets the user enter a decimal or hexadecimal code point, type a character, or search for a character name. While the application is useful for browsing the available names, it is more complex than necessary if you only want to learn how to use the classes themselves.

Background

The UnicodeNames class, which this article describes, relies on some supporting classes that I described in previous articles. Specifically, it uses the CsvReader class described in the article Flexible CSV reader/writer with progress reporting. It also uses the PreviewTextBox control described in the article WPF TextBox with PreviewTextChanged event for filtering.

You do not need to read either of those articles to use the UnicodeNames class described here.

UnicodeData.txt contains two name fields. The Name field, in the second column of the file, is used preferentially. For control characters, however, Name is always <control>; in those cases the class uses the alternative Unicode_1_Name field (the eleventh column) when it is not empty.

For example, code point 10 (decimal) has a name of <control> and a Unicode_1_Name of LINE FEED (LF). In this case, the latter is used.
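The selection rule can be illustrated with a short sketch. This is an assumption about how such logic might look, not the code from the download: `UnicodeDataLine`, `Parse`, and `ChooseName` are hypothetical names, and the sketch relies only on the published layout of UnicodeData.txt, whose 15 semicolon-separated fields hold the code point in field 0, Name in field 1, and Unicode_1_Name in field 10 (zero-based).

```csharp
using System;

// Hypothetical helper illustrating the name-selection rule; not part of UnicodeNames.
public static class UnicodeDataLine
{
    // Zero-based field positions in a UnicodeData.txt record.
    private const int CodePointField = 0;
    private const int NameField = 1;
    private const int Unicode1NameField = 10;

    // Parses one record and returns its code point and preferred display name.
    public static (int CodePoint, string Name) Parse(string line)
    {
        string[] fields = line.Split(';');
        if (fields.Length < 15)
            throw new FormatException("Expected 15 semicolon-separated fields.");

        // The code point is stored as hexadecimal text, e.g. "000A".
        int codePoint = Convert.ToInt32(fields[CodePointField], 16);
        return (codePoint, ChooseName(fields[NameField], fields[Unicode1NameField]));
    }

    // Prefer Name; fall back to Unicode_1_Name only when Name is "<control>"
    // and the alternative field is not empty.
    public static string ChooseName(string name, string unicode1Name)
    {
        if (name == "<control>" && !string.IsNullOrEmpty(unicode1Name))
            return unicode1Name;
        return name;
    }
}
```

For the record `000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;`, `Parse` returns code point 10 and the name LINE FEED (LF). A control character whose Unicode_1_Name is empty keeps the name <control>.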

Additionally, large ranges of code points have no individual entries in the file. Instead, each range is represented by two named entries, one for its first and one for its last code point, whose names carry the suffixes First and Last (for example, <Plane 16 Private Use, First>). The UnicodeNames class returns the shared name, without the suffix, for every code point from the first through the last.

Examples of these code points (in hexadecimal) are 100000 (First), 100001, and 10FFFD (Last). All are within the range <Plane 16 Private Use>.
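A minimal sketch of the range behavior follows, assuming names are fed in file order and that range entries use the `<Name, First>` / `<Name, Last>` form found in UnicodeData.txt. `NameTable`, `Add`, and `Lookup` are hypothetical names for illustration; the article's class may store and search ranges differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table illustrating First/Last range handling.
public sealed class NameTable
{
    private readonly Dictionary<int, string> _names = new Dictionary<int, string>();
    private readonly List<(int Start, int End, string Name)> _ranges =
        new List<(int Start, int End, string Name)>();
    private int _pendingStart = -1;
    private string _pendingName;

    // Adds one entry in file order. A name such as "<Plane 16 Private Use, First>"
    // opens a range; the matching ", Last>" entry closes it.
    public void Add(int codePoint, string name)
    {
        const string firstSuffix = ", First>";
        const string lastSuffix = ", Last>";

        if (name.EndsWith(firstSuffix, StringComparison.Ordinal))
        {
            _pendingStart = codePoint;
            // Strip the suffix but keep the closing bracket: "<Plane 16 Private Use>".
            _pendingName = name.Substring(0, name.Length - firstSuffix.Length) + ">";
        }
        else if (name.EndsWith(lastSuffix, StringComparison.Ordinal) && _pendingStart >= 0)
        {
            _ranges.Add((_pendingStart, codePoint, _pendingName));
            _pendingStart = -1;
            _pendingName = null;
        }
        else
        {
            _names[codePoint] = name;
        }
    }

    // Returns the name for a code point, or null if the file does not cover it.
    public string Lookup(int codePoint)
    {
        if (_names.TryGetValue(codePoint, out string name))
            return name;

        // Range endpoints are inclusive, so First and Last get the shared name too.
        foreach (var range in _ranges)
        {
            if (codePoint >= range.Start && codePoint <= range.End)
                return range.Name;
        }
        return null;
    }
}
```

A linear scan over the ranges is adequate here because UnicodeData.txt contains only a small number of First/Last pairs; individual code points are found through the dictionary first.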

Using the Code

The class is simple to use. Create an instance and load the file as follows:

C#
string path = "UnicodeData.txt";
UnicodeNames names = new UnicodeNames(path);
names.LoadFile();

Getting the name for a character is as simple as the following:

C#
int codePoint = 10; // LINE FEED (LF)
string name = names[codePoint];

When you no longer need the UnicodeNames instance, dispose of it:

C#
names.Dispose();

Finally, if you are concerned about the time required to load the file into memory, the class provides methods and properties that make it easy to run the load on a BackgroundWorker component.

The RowEnd event is raised each time a row of the file is read, and the ProgressPercentage property reports the percentage of the file loaded so far. The CancelAsync method aborts the load operation.
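The article does not show the signatures of RowEnd, ProgressPercentage, or CancelAsync, so the sketch below does not call them. Instead it demonstrates the general BackgroundWorker pattern such a class is meant to plug into, using a hypothetical stand-in (`BackgroundLoadDemo.LoadWithProgress`) that counts the lines of a file: the DoWork handler reports a percentage through ReportProgress and stops when cancellation has been requested. In the real class, RowEnd and ProgressPercentage would presumably supply the progress value and CancelAsync would trigger the early exit.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in for a long-running load; this is NOT the UnicodeNames API,
// only the BackgroundWorker pattern such a class is designed to plug into.
public static class BackgroundLoadDemo
{
    // Counts the lines of 'path' on a BackgroundWorker while reporting progress.
    // Returns the line count, or -1 if the load was cancelled or failed.
    public static int LoadWithProgress(string path)
    {
        var completion = new TaskCompletionSource<int>();
        using (var worker = new BackgroundWorker
        {
            WorkerReportsProgress = true,      // required before calling ReportProgress
            WorkerSupportsCancellation = true  // required before calling CancelAsync
        })
        {
            worker.DoWork += (sender, e) =>
            {
                var bw = (BackgroundWorker)sender;
                using (var stream = File.OpenRead(path))
                using (var reader = new StreamReader(stream))
                {
                    int lines = 0, lastPercent = -1;
                    while (reader.ReadLine() != null)
                    {
                        // Honor a cancellation request between rows.
                        if (bw.CancellationPending) { e.Cancel = true; return; }
                        lines++;
                        // Approximate: Position advances in buffer-sized steps.
                        int percent = (int)(stream.Position * 100 / stream.Length);
                        if (percent != lastPercent) { bw.ReportProgress(percent); lastPercent = percent; }
                    }
                    e.Result = lines;
                }
            };
            worker.ProgressChanged += (sender, e) => Console.WriteLine($"Loaded {e.ProgressPercentage}%");
            worker.RunWorkerCompleted += (sender, e) =>
                completion.SetResult(e.Error != null || e.Cancelled ? -1 : (int)e.Result);

            worker.RunWorkerAsync();
            // Blocking is acceptable only in this console demo; on a UI thread it would
            // deadlock, so a form should react in RunWorkerCompleted instead.
            return completion.Task.Result;
        }
    }
}
```

In a WPF or WinForms application, the ProgressChanged and RunWorkerCompleted handlers run on the UI thread, so they can update a progress bar or enable controls directly.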

Points of Interest

This was my first full-fledged foray into WPF development. I did cheat a little and used a WinForms AboutBox. My apologies to the WPF purists out there.

History

  • 8/3/2011 - The original version was uploaded.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
United States
Eric is a Senior Software Engineer with 30+ years of experience working with enterprise systems, both in the US and internationally. Over the years, he’s worked for a number of Fortune 500 companies (current and past), including Thomson Reuters, Verizon, MCI WorldCom, Unidata Incorporated, Digital Equipment Corporation, and IBM. While working for Northeastern University, he received co-author credit for six papers published in the Journal of Chemical Physics. Currently, he’s enjoying a little time off to work on some of his own software projects, explore new technologies, travel, and write the occasional article for CodeProject or ContentLab.

Comments and Discussions

 
Question: Need Help UNICODE
Nazim Iqbal, 5-Oct-12 13:29
Hi,

I need your help. I am working on a project based on MySQL and PHP, running on WAMP Server. I have a field, storytext, that holds data collected in Urdu. Its details are:

Field       Type   Collation         Null   Default
storytext   text   utf8_unicode_ci   Yes    NULL

Retrieving the Urdu data from MySQL 5.5.8 on the WAMP server with PHP works fine. The software is for internal use only; it is not web-based.
------------------------------------------------------------------------------------------
HERE IS THE PROBLEM:

One part of the project is built with Visual Studio 2010. It has a DataGridView; clicking a row shows the full story text in a text box. That text should be Urdu, but both the text box and the grid show garbled text instead.

I get this:

<b>DESP. ITEM

پروگرام رانا مب
شر ايٹ پرائم ٹائم ميں
گ٠تگو کرتے Û ÙˆØ¦Û’ وزير٠اعظم Ú©Û’
وکيل Ú†ÙˆÛ Ø¯Ø±ÙŠ اعتزاز احسن
کا Ú©Û Ù†Ø§ تھا Ú©Û ØµØ¯Ø± Ú©Û’ Ø¹Û Ø¯Û’
Ú©Ùˆ استشني حاصل Û Û’
وزير٠اعظم Ø</b>
-------------------------------------------------------------------------------------------------------
In Visual Studio I also use this connection string:

cnString = "datasource=localhost;username=root;password=;database=spidernews;charset=utf8;"

How can I solve this problem? The Urdu text is not showing. Can you help me out?

Here is the whole Visual Studio code I am using:

VB.NET
Imports MySql.Data.MySqlClient

Public Class Form1

    Dim conn As Common.DbConnection
    Dim da As Common.DbDataAdapter
    Dim ds As DataSet = New DataSet
    Dim cnString As String
    Dim sqlQRY As String
    'Dim dt As DataTable
    'Dim strBuilder As Char

    '''' SCROLLING TEXT START
    Dim scrollingText As String = "Spider NEWS "
    Dim txtStr(scrollingText.Length - 1) As String
    Dim txtPos As Integer = -1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load

        'strBuilder.ToString ("tblnews" & char(34))

        cnString = "datasource=localhost;username=root;password=;database=spidernews;charset=utf8;"

        sqlQRY = "SET NAMES UTF8; SET CHARACTER SET UTF8;Select slugid,slug,storytext,storyduration,segmentdescription,prioritydescription,mosstatus from tblnews"
        'sqlQRY = "SET NAMES UTF8; SET CHARACTER SET latin1;Select slugid,slug,storytext,storyduration,segmentdescription,prioritydescription,mosstatus from tblnews"
        conn = New MySqlConnection(cnString)

        Try
            conn.Open()
            da = New MySqlDataAdapter(sqlQRY, conn)
            Dim cb As MySqlCommandBuilder = New MySqlCommandBuilder(da)

            da.Fill(ds, "tblnews")

            DataGridView1.DataSource = ds
            DataGridView1.DataMember = "tblnews"

        Catch ex As Common.DbException
            MsgBox(ex.ToString)
        Finally
            conn.Close()
        End Try

        '''' SCROLLING TEXT START
        For idx As Integer = 0 To UBound(txtStr)
            Dim workedString As String = ""
            workedString = scrollingText.Substring(idx) & " " & scrollingText.Substring(0, idx)
            txtStr(idx) = workedString
        Next
        Timer1.Interval = 5000
        Timer1.Enabled = True
        Timer1.Start()
        '''' SCROLLING TEXT END
        Label2.Text = System.DateTime.Now

    End Sub

    Private Sub Save_Click(ByVal sender As System.Object, ByVal e As System.EventArgs)

        da.Update(ds, "" + TextBox1.Text + "")
        MsgBox("Data sent", MsgBoxStyle.OkOnly, "Success")

    End Sub

    Private Sub DataGridView1_CellContentClick(ByVal sender As System.Object, ByVal e As System.Windows.Forms.DataGridViewCellEventArgs) Handles DataGridView1.CellContentClick

        Dim i As Integer
        i = DataGridView1.CurrentRow.Index
        RichTextBox1.Text = DataGridView1.Item(2, i).Value

    End Sub

    Private Sub Timer1_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer1.Tick

        Dim i As Integer
        i = DataGridView1.CurrentRow.Index

        '''' SCROLLING TEXT START
        txtPos += 1
        Dim timerStr As String
        timerStr = txtStr(txtPos)
        RichTextBox1.Text = DataGridView1.Item(2, i).Value

        Label1.Text = timerStr
        If txtPos = UBound(txtStr) Then txtPos = -1
        '''' SCROLLING TEXT END

    End Sub

End Class
