Dynamically add diacritics to characters

Question

Hey friends!

I'm trying to create a function that adds a diacritic to a character:

C#

private static char AddDiacritics(char Letter, char Character)
{
   // should return the modified letter; placeholder until a solution is found
   throw new System.NotImplementedException();
}

For example, if I call AddDiacritics('e', '\"'), I want the function to return the character ë.
If I call AddDiacritics('u', '^'), I want it to return û.

Is there some kind of support in the framework for this?

Thanks,

Eduard

Posted 31-Mar-11 4:52am

Eduard Keilholz

Updated 31-Mar-11 6:07am

Dalek Dave

v2

Comments

Dalek Dave 31-Mar-11 12:07pm

Edited for Readability

5 solutions

Solution 5

Although this is not really a solution to the problem, it might be useful to know and might even help build the table used by the other solutions.

A string can be normalized to different forms. In my case, I want to do the opposite (remove diacritics), and I have used:

string stFormD = fileName.Normalize(NormalizationForm.FormD);
for (int ich = 0; ich < stFormD.Length; ich++)
{
    char currentChar = stFormD[ich];

    System.Globalization.UnicodeCategory uc =
        System.Globalization.CharUnicodeInfo.GetUnicodeCategory(currentChar);

    if (uc != System.Globalization.UnicodeCategory.NonSpacingMark)
    {
        //...
    }
}
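
For completeness, here is a minimal sketch of how the loop above could be turned into a full method. The class and method names (`DiacriticStripper`, `RemoveDiacritics`) are made up for illustration, and the final recomposition to form C is my own assumption about what a caller would want, not something from the original post.

```csharp
using System.Globalization;
using System.Text;

public static class DiacriticStripper
{
    // Hypothetical helper: returns the text with combining marks removed.
    public static string RemoveDiacritics(string text)
    {
        // Form D splits precomposed characters such as 'é' into 'e' + U+0301.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var result = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep everything except the non-spacing (combining) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                result.Append(c);
            }
        }

        // Assumption: recompose whatever is left into form C.
        return result.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Characters that have no canonical decomposition, such as œ or ø, pass through unchanged, which matches the limitation described in this post.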

I haven't tried it, but it might be possible to normalize each possible char (as a one-character string) to form D and build the lookup information from the result.
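
A rough sketch of that idea follows. It is untested against the full character range: the class name `DiacriticTableBuilder`, the tuple-keyed dictionary (which needs C# 7 or later) and the choice to scan only a caller-supplied range are all assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DiacriticTableBuilder
{
    // Hypothetical helper: maps (base letter, combining mark) to the
    // precomposed character, for every char in [first, last] that
    // decomposes to exactly one base char plus one combining mark.
    public static Dictionary<(char Letter, char Mark), char> Build(char first, char last)
    {
        var table = new Dictionary<(char Letter, char Mark), char>();

        for (int code = first; code <= last; code++)
        {
            char candidate = (char)code;

            // Lone surrogates are not valid strings and cannot be normalized.
            if (char.IsSurrogate(candidate))
            {
                continue;
            }

            string decomposed = candidate.ToString().Normalize(NormalizationForm.FormD);

            if (decomposed.Length == 2 &&
                CharUnicodeInfo.GetUnicodeCategory(decomposed[1]) == UnicodeCategory.NonSpacingMark)
            {
                table[(decomposed[0], decomposed[1])] = candidate;
            }
        }

        return table;
    }
}
```

Calling `DiacriticTableBuilder.Build('\u00C0', '\u017F')` would cover the Latin-1 Supplement letters and Latin Extended-A. Characters that carry two marks decompose to three chars and are skipped by this sketch.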

However, I haven't found a way to split a character like œ into oe, so in my case I handle a few characters manually.

Posted 18-Jul-11 16:51pm

Philippe Mori

Updated 18-Jul-11 16:53pm

v2

Comments

Eduard Keilholz 19-Jul-11 6:00am

Good contribution! I like people actually reading the forum's history, finding their way to a solution and, even more important, sharing it!

Thanks a lot!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Accepted Answer · 2011-03-31T05:06:00

Solution 1

AFAIK, there is no built-in support for this.

Normally, you would expect the user's keyboard to take care of it when the text is entered. If you have to apply diacritics after the fact, it becomes quite fiddly, because you have to take account of which characters can have which diacritic added.

Posted 31-Mar-11 5:06am

OriginalGriff

Comments

Olivier Levrey 31-Mar-11 11:34am

You are right. There is no support for this, and there is no algorithm able to deal with all cases. The only option is to deal with all the possibilities one by one...

Dalek Dave 31-Mar-11 12:08pm

True

Eduard Keilholz 1-Apr-11 2:37am

Thanks for your answer, my 5

BobJanova · Accepted Answer · 2011-03-31T05:44:00

I'm not aware of an inbuilt solution for this. Normally such a thing would be done by the IME as Griff says.

You probably need to create a lookup table, which is logically a two-dimensional map and which you probably would implement as Dictionary<char, Dictionary<char,char>>. Populate it in a static constructor like so:

// the field must be static so the static constructor can populate it
static readonly Dictionary<char, Dictionary<char,char>> lookupTable = new Dictionary<char, Dictionary<char,char>>();
static MyClass(){
 lookupTable['a'] = new Dictionary<char,char>();
 lookupTable['a']['"'] = 'ä';
 lookupTable['a']['o'] = 'å';
 // ... etc
}

I'm not sure there is even a standard set of ASCII characters for representing diacritical marks, and there's not much point using the ones in Unicode, since a user who can enter those can enter the character they want directly. Those you need for ANSI (Western Europe) are grave (è), acute (é), umlaut (ä), circumflex (â), ring (å), slash (ø), caron (š), tilde (ñ) and cedilla (ç); sensible characters to use to mark them would be `, ', ", ^, o, /, v, ~ and c. The ANSI set also contains eth and thorn, which are not really diacritics of a normal character; the oe and ae ligatures; and the German ß, to think about.
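
To make the two-dimensional lookup concrete, here is a small sketch that fills in a handful of entries and falls back to the unmodified letter when no mapping exists. The class name `MarkerTable` and the fallback behaviour are assumptions for illustration, the table is deliberately incomplete, and the index initializers need C# 6 or later.

```csharp
using System.Collections.Generic;

public static class MarkerTable
{
    // Outer key: base letter. Inner key: ASCII marker. Value: accented letter.
    // Only a few entries are filled in; extend as needed.
    private static readonly Dictionary<char, Dictionary<char, char>> Lookup =
        new Dictionary<char, Dictionary<char, char>>
        {
            ['a'] = new Dictionary<char, char>
            {
                ['`'] = 'à', ['\''] = 'á', ['"'] = 'ä', ['^'] = 'â', ['o'] = 'å', ['~'] = 'ã'
            },
            ['e'] = new Dictionary<char, char>
            {
                ['`'] = 'è', ['\''] = 'é', ['"'] = 'ë', ['^'] = 'ê'
            },
            ['u'] = new Dictionary<char, char>
            {
                ['`'] = 'ù', ['\''] = 'ú', ['"'] = 'ü', ['^'] = 'û'
            }
        };

    // Returns the accented letter, or the letter itself if the
    // combination is not in the table (an assumed design choice).
    public static char AddDiacritics(char letter, char marker)
    {
        Dictionary<char, char> byMarker;
        char result;
        if (Lookup.TryGetValue(letter, out byMarker) && byMarker.TryGetValue(marker, out result))
        {
            return result;
        }
        return letter;
    }
}
```

With this table, `MarkerTable.AddDiacritics('e', '"')` and `MarkerTable.AddDiacritics('u', '^')` cover the two cases from the question.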

Olivier Levrey · Accepted Answer · 2011-03-31T05:51:00

OriginalGriff is right: there is no algorithm able to deal with all cases. The only option is to deal with all the possibilities one by one.

You could use an enum to list all diacritics, and dictionaries for each supported letter. For example:

C#

enum Diacritic
{
    //"é"
    Acute,
    //"è"
    Grave,
    //"ê"
    Circonflexe,
    // ...
}

//supported letters for e
static Dictionary<Diacritic, char> letterE;
//supported letters for a
static Dictionary<Diacritic, char> letterA;
//supported letters
static Dictionary<char, Dictionary<Diacritic, char>> letters;

//call this function to initialize dictionaries
//or put its code in a static constructor
static void Init()
{
    //all possibilities for "e"
    letterE = new Dictionary<Diacritic, char>();
    letterE[Diacritic.Acute] = 'é';
    letterE[Diacritic.Grave] = 'è';
    letterE[Diacritic.Circonflexe] = 'ê';
    // ...

    //all possibilities for "a"
    letterA = new Dictionary<Diacritic, char>();
    letterA[Diacritic.Grave] = 'à';
    letterA[Diacritic.Circonflexe] = 'â';
    // ...

    //supported letters
    letters = new Dictionary<char, Dictionary<Diacritic, char>>();
    letters.Add('a', letterA);
    letters.Add('e', letterE);
    // ...
}

//will throw a KeyNotFoundException if the requested character doesn't exist
static char AddDiacritics(char letter, Diacritic diacritic)
{
    return letters[letter][diacritic];
}

This code is not very elegant but I am not sure if a more elegant solution exists for that problem...
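
One possible variation, sketched here under my own assumptions rather than taken from the post: flatten the two levels into a single dictionary keyed by a (letter, diacritic) tuple, and expose a Try-style method so that an unsupported combination does not throw. The names `DiacriticLookup` and `TryAddDiacritic` are hypothetical, and the tuple key needs C# 7 or later.

```csharp
using System.Collections.Generic;

public enum Diacritic
{
    Acute,
    Grave,
    Circonflexe
}

public static class DiacriticLookup
{
    // One flat table instead of one dictionary per letter.
    // Only a few entries are listed; extend as needed.
    private static readonly Dictionary<(char Letter, Diacritic Mark), char> Letters =
        new Dictionary<(char Letter, Diacritic Mark), char>
        {
            { ('e', Diacritic.Acute), 'é' },
            { ('e', Diacritic.Grave), 'è' },
            { ('e', Diacritic.Circonflexe), 'ê' },
            { ('a', Diacritic.Grave), 'à' },
            { ('a', Diacritic.Circonflexe), 'â' }
        };

    // Returns false (and '\0' in result) when the combination is unknown,
    // instead of throwing a KeyNotFoundException.
    public static bool TryAddDiacritic(char letter, Diacritic diacritic, out char result)
    {
        return Letters.TryGetValue((letter, diacritic), out result);
    }
}
```

A caller can then decide what to do with unsupported combinations, for example keep the original letter.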

Albin Abel · Accepted Answer · 2011-03-31T06:07:00

Solution 4

As OriginalGriff said, there is no built-in support. As others said, you can do it with a lookup table. I found this handy class on the net:

C#

using System.Collections.Generic;
using System.Linq; // needed for Zip and ToArray

public class DiacritMerger
{
    static readonly Dictionary<char, char> _lookup = new Dictionary<char, char>
                     {
                         {'\'', '\u0301'},
                         {'"', '\u0308'},
                         {'^', '\u0302'}
                     };
    public static string AddDiacritics(string asciiBase, string diacrits)
    {
        var combined = asciiBase.Zip(diacrits, (ascii, diacrit) => DiacritVersion(diacrit, ascii));
        return new string(combined.ToArray());
    }
    private static char DiacritVersion(char diacrit, char character)
    {
        char combine;
        return _lookup.TryGetValue(diacrit, out combine) ? new string(new[] { character, combine }).Normalize()[0] : character;
    }
}

You can then use it like this:

C#

MessageBox.Show(DiacritMerger.AddDiacritics("u", "^"));
MessageBox.Show(DiacritMerger.AddDiacritics("e", "\""));

Posted 31-Mar-11 6:07am

Albin Abel

Comments

Dalek Dave 31-Mar-11 12:09pm

good Answer.

Albin Abel 31-Mar-11 12:24pm

Thanks Dalek Dave

Eduard Keilholz 1-Apr-11 2:37am

Thanks a lot! My 5

Albin Abel 1-Apr-11 3:37am

You are welcome, Thanks too Eduard Keilholz

Eduard Keilholz 4-Apr-11 8:10am

Wow, I implemented this code, it works like a charm! Thanks once more!

Albin Abel 4-Apr-11 9:44am

You are most welcome, Eduard Keilholz. Let the credit go to the original author :)

Antonio Barros 26-Mar-16 10:42am

Hi, Albin Abel.
Thank you for your post. It was useful to understand how we can resolve this problem. I've tried to add to your class the use of a tilde above a "w", but I couldn't obtain the result, and I don't know why. I only added the last line of the following:

static readonly Dictionary<char, char> _lookup = new Dictionary<char, char>
{
{'\'', '\u0301'},
{'"', '\u0308'},
{'^', '\u0302'},
{'˜', '\u02DC'}
};
and to test I add the last line in the following:
MessageBox.Show(DiacriticMerger.AddDiacritics("u", "^"));
MessageBox.Show(DiacriticMerger.AddDiacritics("e", "\""));
MessageBox.Show(DiacriticMerger.AddDiacritics("w", "˜"));
But the message only displays "w" without the tilde.
What could be the problem? Thank you for all.

António Barros
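
A likely explanation, offered as a sketch rather than a verified diagnosis: U+02DC is the spacing "small tilde", not a combining mark, so normalization never merges it with the preceding letter. The combining tilde is U+0303. Even with U+0303, Unicode has, as far as I know, no single precomposed "w with tilde" character, so a method that returns only the first char of the normalized string still yields a plain "w". Returning the whole string keeps the combining mark. The class name `CombiningMarks` and the use of the ASCII `~` as the marker are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CombiningMarks
{
    private static readonly Dictionary<char, char> Lookup = new Dictionary<char, char>
    {
        { '\'', '\u0301' }, // combining acute accent
        { '"',  '\u0308' }, // combining diaeresis
        { '^',  '\u0302' }, // combining circumflex accent
        { '~',  '\u0303' }  // combining tilde (not U+02DC, the spacing small tilde)
    };

    // Returns a string, not a char: when no precomposed character exists,
    // the result is the base letter followed by the combining mark.
    public static string AddDiacritic(char letter, char marker)
    {
        char mark;
        if (!Lookup.TryGetValue(marker, out mark))
        {
            return letter.ToString();
        }

        // Form C composes the pair into one char where Unicode defines one.
        return new string(new[] { letter, mark }).Normalize(NormalizationForm.FormC);
    }
}
```

Here `AddDiacritic('n', '~')` composes to the single character ñ, while `AddDiacritic('w', '~')` stays a two-char sequence that a font with combining-mark support should render as a w with a tilde above it.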
