Hey friends!

I'm trying to create a function that adds diacritics to a character:
C#
private static char AddDiacritics(char Letter, char Character)
{
   // should return Letter with the diacritic given by Character applied
   throw new System.NotImplementedException();
}


For example, if I call AddDiacritics('e', '\"'), I want the function to return the character ë.
If I call AddDiacritics('u', '^'), I want it to return û.

Is there some kind of support in the framework for this?

Thanks,

Eduard
Updated 31-Mar-11 6:07am
Comments
Dalek Dave 31-Mar-11 12:07pm    
Edited for Readability

AFAIK, there is no built-in support for this.

Normally you would expect the user's keyboard (or input method) to take care of this when the text is entered. If you have to apply the diacritics after the fact, it becomes quite fiddly, because you have to account for which characters can take which diacritics.
 
Comments
Olivier Levrey 31-Mar-11 11:34am    
You are right. There is no support for this, and there is no algorithm able to deal with all cases. The only option is to deal with all the possibilities one by one...
Dalek Dave 31-Mar-11 12:08pm    
True
Eduard Keilholz 1-Apr-11 2:37am    
Thanks for your answer, my 5

I'm not aware of a built-in solution for this. Normally such a thing would be done by the IME (input method editor), as Griff says.

You probably need to create a lookup table, which is logically a two-dimensional map and which you would probably implement as Dictionary<char, Dictionary<char,char>>. Populate it in a static constructor like so:
C#
// inside class MyClass (requires using System.Collections.Generic;)
static readonly Dictionary<char, Dictionary<char, char>> lookupTable = new Dictionary<char, Dictionary<char, char>>();
static MyClass()
{
    lookupTable['a'] = new Dictionary<char, char>();
    lookupTable['a']['"'] = 'ä'; // umlaut
    lookupTable['a']['o'] = 'å'; // ring
    // ... etc
}


I'm not sure there is even a standard set of ASCII characters for marking diacritics, and there's not much point using the ones in Unicode (the combining diacritical marks), since if the user can enter those, they can enter the character they want directly. The ones you need for ANSI (Western European) are grave (è), acute (é), umlaut (ä), circumflex (â), ring (å), slash (ø), caron (š), tilde (ñ) and cedilla (ç); sensible characters to mark them would be ` (grave), ' (acute), " (umlaut), ^ (circumflex), o (ring), v (caron), ~ (tilde) and c (cedilla). The ANSI set also contains eth (ð) and thorn (þ), which are not really diacritics on a normal character, the oe and ae ligatures (œ, æ), and the German ß, which you also need to think about.
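Putting the marker characters above into that table gives something like the following minimal sketch. It assumes lowercase letters only and fills in just a few entries; the class name DiacriticTable and the choice to return the letter unchanged (rather than throw) for unsupported combinations are assumptions of this example, not part of the answer:

```csharp
using System.Collections.Generic;

// Hypothetical helper: base letter -> marker character -> accented letter.
// Only a handful of lowercase entries are filled in for illustration.
public static class DiacriticTable
{
    private static readonly Dictionary<char, Dictionary<char, char>> Lookup =
        new Dictionary<char, Dictionary<char, char>>
        {
            ['a'] = new Dictionary<char, char> { ['`'] = 'à', ['\''] = 'á', ['"'] = 'ä', ['^'] = 'â', ['o'] = 'å', ['~'] = 'ã' },
            ['e'] = new Dictionary<char, char> { ['`'] = 'è', ['\''] = 'é', ['"'] = 'ë', ['^'] = 'ê' },
            ['u'] = new Dictionary<char, char> { ['`'] = 'ù', ['\''] = 'ú', ['"'] = 'ü', ['^'] = 'û' },
            ['n'] = new Dictionary<char, char> { ['~'] = 'ñ' },
            ['c'] = new Dictionary<char, char> { ['c'] = 'ç' },
            ['s'] = new Dictionary<char, char> { ['v'] = 'š' },
        };

    // Returns the accented letter, or the letter unchanged when the pair is not in the table.
    public static char AddDiacritics(char letter, char marker)
    {
        return Lookup.TryGetValue(letter, out var marks) && marks.TryGetValue(marker, out var result)
            ? result
            : letter;
    }
}
```

For example, AddDiacritics('e', '"') returns 'ë' and AddDiacritics('x', '^') returns 'x'. A real table needs an entry for every letter/marker pair you want to support, including the uppercase forms.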
 
Comments
Olivier Levrey 31-Mar-11 11:55am    
Have my 5. I was writing my solution at the same time but you were faster than me ;)
Dalek Dave 31-Mar-11 12:08pm    
Good answer
Eduard Keilholz 1-Apr-11 2:35am    
Thanks for helping, my 5
El_Codero 6-Mar-12 7:22am    
Elegant solution, my 5!

OriginalGriff is right: there is no algorithm able to deal with all cases. The only option is to deal with all the possibilities one by one.

You could use an enum to list all diacritics, and a dictionary for each supported letter. For example:

C#
using System.Collections.Generic;

enum Diacritic
{
    //"é"
    Acute,
    //"è"
    Grave,
    //"ê"
    Circonflexe,
    // ... other diacritics
}

//supported letters for e
static Dictionary<Diacritic, char> letterE;
//supported letters for a
static Dictionary<Diacritic, char> letterA;
//supported letters
static Dictionary<char, Dictionary<Diacritic, char>> letters;

//call this function to initialize dictionaries
//or put its code in a static constructor
static void Init()
{
    //all possibilities for "e"
    letterE = new Dictionary<Diacritic, char>();
    letterE[Diacritic.Acute] = 'é';
    letterE[Diacritic.Grave] = 'è';
    letterE[Diacritic.Circonflexe] = 'ê';
    // ...

    //all possibilities for "a"
    letterA = new Dictionary<Diacritic, char>();
    letterA[Diacritic.Grave] = 'à';
    letterA[Diacritic.Circonflexe] = 'â';
    // ...

    //supported letters
    letters = new Dictionary<char, Dictionary<Diacritic, char>>();
    letters.Add('a', letterA);
    letters.Add('e', letterE);
    // ...
}

//will throw a KeyNotFoundException if the letter or the diacritic is not in the dictionaries
static char AddDiacritics(char letter, Diacritic diacritic)
{
    return letters[letter][diacritic];
}


This code is not very elegant, but I'm not sure a more elegant solution exists for this problem...
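If you would rather not catch KeyNotFoundException, a non-throwing variant of this approach might look like the minimal sketch below. It also adds a hypothetical marker-to-enum map so that the question's AddDiacritics(char, char) signature can sit on top of the enum. The names DiacriticLookup, Markers and TryAddDiacritic, and the table contents, are assumptions for illustration:

```csharp
using System.Collections.Generic;

public enum Diacritic { Acute, Grave, Circonflexe }

public static class DiacriticLookup
{
    // Hypothetical mapping from the marker characters in the question to enum values.
    private static readonly Dictionary<char, Diacritic> Markers = new Dictionary<char, Diacritic>
    {
        ['\''] = Diacritic.Acute,
        ['`'] = Diacritic.Grave,
        ['^'] = Diacritic.Circonflexe,
    };

    // Same shape as letters/letterA/letterE in the answer, built with collection initializers.
    private static readonly Dictionary<char, Dictionary<Diacritic, char>> Letters =
        new Dictionary<char, Dictionary<Diacritic, char>>
        {
            ['a'] = new Dictionary<Diacritic, char> { [Diacritic.Grave] = 'à', [Diacritic.Circonflexe] = 'â' },
            ['e'] = new Dictionary<Diacritic, char> { [Diacritic.Acute] = 'é', [Diacritic.Grave] = 'è', [Diacritic.Circonflexe] = 'ê' },
        };

    // Returns false (with result set to the unchanged letter) instead of throwing.
    public static bool TryAddDiacritic(char letter, Diacritic diacritic, out char result)
    {
        if (Letters.TryGetValue(letter, out var forms) && forms.TryGetValue(diacritic, out result))
            return true;
        result = letter;
        return false;
    }

    // Overload matching the question's signature: translate the marker, then look it up.
    public static char AddDiacritics(char letter, char marker)
    {
        return Markers.TryGetValue(marker, out var diacritic) && TryAddDiacritic(letter, diacritic, out var accented)
            ? accented
            : letter;
    }
}
```

With these tables, AddDiacritics('e', '^') returns 'ê', while AddDiacritics('a', '\'') returns 'a' because á was never added.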
 
Comments
Dalek Dave 31-Mar-11 12:08pm    
Good Call.
Eduard Keilholz 1-Apr-11 2:35am    
Thanks for helping, my 5

As OriginalGriff said, there is no built-in support, and as others said, you can do it with a lookup table. I found this handy class on the net:

C#
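using System.Collections.Generic;
using System.Linq;

public class DiacritMerger
{
    // Marker character -> Unicode combining mark
    static readonly Dictionary<char, char> _lookup = new Dictionary<char, char>
                     {
                         {'\'', '\u0301'}, // combining acute accent
                         {'"', '\u0308'},  // combining diaeresis (umlaut)
                         {'^', '\u0302'}   // combining circumflex accent
                     };

    // Combines each base character with the marker at the same position (Zip stops at the shorter string).
    public static string AddDiacritics(string asciiBase, string diacrits)
    {
        var combined = asciiBase.Zip(diacrits, (ascii, diacrit) => DiacritVersion(diacrit, ascii));
        return new string(combined.ToArray());
    }

    // Appends the combining mark and normalizes (FormC by default) to get the precomposed character.
    // If no precomposed character exists, [0] is just the base character, so the marker is dropped.
    private static char DiacritVersion(char diacrit, char character)
    {
        char combine;
        return _lookup.TryGetValue(diacrit, out combine) ? new string(new[] { character, combine }).Normalize()[0] : character;
    }
}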



Then you can use it like this:

C#
MessageBox.Show(DiacritMerger.AddDiacritics("u", "^"));
MessageBox.Show(DiacritMerger.AddDiacritics("e", "\""));
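The class above relies on Unicode combining characters: it appends a combining mark (U+0301, U+0308 or U+0302) to the base letter and calls Normalize(), whose default form (FormC) composes the pair into a single precomposed character when one exists. A single-character sketch closer to the question's signature follows; the class name CombiningMarks is hypothetical, and mapping the extra markers from the lookup-table answer (`, ~, o, v, c) to combining marks is an assumption of this example:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CombiningMarks
{
    // Marker -> combining character. The first three match DiacritMerger; the rest are added for illustration.
    private static readonly Dictionary<char, char> Combining = new Dictionary<char, char>
    {
        ['\''] = '\u0301', // COMBINING ACUTE ACCENT
        ['"']  = '\u0308', // COMBINING DIAERESIS
        ['^']  = '\u0302', // COMBINING CIRCUMFLEX ACCENT
        ['`']  = '\u0300', // COMBINING GRAVE ACCENT
        ['~']  = '\u0303', // COMBINING TILDE
        ['o']  = '\u030A', // COMBINING RING ABOVE
        ['v']  = '\u030C', // COMBINING CARON
        ['c']  = '\u0327', // COMBINING CEDILLA
    };

    public static char AddDiacritics(char letter, char marker)
    {
        if (!Combining.TryGetValue(marker, out var mark))
            return letter; // unknown marker: leave the letter alone

        // Canonical composition turns "letter + combining mark" into one char when Unicode has one.
        string composed = new string(new[] { letter, mark }).Normalize(NormalizationForm.FormC);

        // No precomposed form: FormC leaves two chars, so fall back to the plain letter.
        return composed.Length == 1 ? composed[0] : letter;
    }
}
```

With this, AddDiacritics('u', '^') returns 'û', and a pair with no precomposed form, such as '1' with an acute accent, comes back as the plain '1'.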
 
Comments
Dalek Dave 31-Mar-11 12:09pm    
good Answer.
Albin Abel 31-Mar-11 12:24pm    
Thanks Dalek Dave
Eduard Keilholz 1-Apr-11 2:37am    
Thanks a lot! My 5
Albin Abel 1-Apr-11 3:37am    
You are welcome, Thanks too Eduard Keilholz
Eduard Keilholz 4-Apr-11 8:10am    
Wow, I implemented this code, it works like a charm! Thanks once more!

Although this is not really a solution to the problem, it might be useful to know, and it might even help build the table used by the other solutions.

A string can be normalized to different forms. In my case I wanted to do the opposite (remove diacritics), and I used:

C#
string stFormD = fileName.Normalize(System.Text.NormalizationForm.FormD);
for (int ich = 0; ich < stFormD.Length; ich++)
{
    char currentChar = stFormD[ich];

    System.Globalization.UnicodeCategory uc =
        System.Globalization.CharUnicodeInfo.GetUnicodeCategory(currentChar);

    if (uc != System.Globalization.UnicodeCategory.NonSpacingMark)
    {
        // currentChar is not a combining (non-spacing) mark: this is the part to keep
        //...
    }
}
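Filled out as a complete helper, the same FormD technique could look like the minimal sketch below. DiacriticStripper and RemoveDiacritics are hypothetical names, and the final FormC pass is an extra step (not in the snippet above) that puts whatever remains back into the usual composed form:

```csharp
using System.Globalization;
using System.Text;

public static class DiacriticStripper
{
    // Decompose to FormD, drop every non-spacing mark, then recompose what is left.
    public static string RemoveDiacritics(string text)
    {
        string formD = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(formD.Length);

        foreach (char c in formD)
        {
            // Keep base characters; skip the combining marks FormD split off.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For example, RemoveDiacritics("Crème brûlée") gives "Creme brulee", while œ and ø pass through unchanged because they have no decomposition.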


I'm not sure, but it might be possible to normalize every possible char (as a one-character string) to form D and build the table from the result.
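One way to try that idea is sketched below: normalize every BMP char to form D, keep the ones that split into exactly one base character plus one non-spacing mark, and record (base, mark) → original char. This is a rough sketch of the suggestion rather than code from the thread; the class name DiacriticTableBuilder is hypothetical, and the round-trip check back to FormC is an added assumption to skip characters such as the Angstrom sign (U+212B), which decomposes exactly like Å (U+00C5):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DiacriticTableBuilder
{
    // Builds (base char, combining mark) -> precomposed char by decomposing every BMP char.
    public static Dictionary<(char Base, char Mark), char> Build()
    {
        var table = new Dictionary<(char Base, char Mark), char>();
        for (int code = 0; code <= 0xFFFF; code++)
        {
            char c = (char)code;
            if (char.IsSurrogate(c))
                continue; // a lone surrogate is not a valid string to normalize

            string original = c.ToString();
            string formD;
            try
            {
                formD = original.Normalize(NormalizationForm.FormD);
            }
            catch (ArgumentException)
            {
                continue; // skip anything the runtime refuses to normalize
            }

            // Keep only "one base + one non-spacing mark" that composes back to this exact char.
            if (formD.Length == 2
                && CharUnicodeInfo.GetUnicodeCategory(formD[1]) == UnicodeCategory.NonSpacingMark
                && formD.Normalize(NormalizationForm.FormC) == original)
            {
                table[(formD[0], formD[1])] = c;
            }
        }
        return table;
    }
}
```

The keys are (letter, combining mark) pairs, so to serve the question's AddDiacritics(char, char) you would still map each ASCII marker to its combining mark, as DiacritMerger's _lookup does.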

However, I haven't found a way to split a character like œ into oe, so I handle a few characters manually.
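Characters like œ, æ, ß and ø have no canonical decomposition, so FormD leaves them alone. A small hand-written map is one way to handle them; the class name LigatureExpander and the entries below are an illustrative assumption, not the list the poster used:

```csharp
using System.Collections.Generic;
using System.Text;

public static class LigatureExpander
{
    // Characters that FormD does not split; a hand-written, deliberately incomplete list.
    private static readonly Dictionary<char, string> Manual = new Dictionary<char, string>
    {
        ['œ'] = "oe", ['Œ'] = "OE",
        ['æ'] = "ae", ['Æ'] = "AE",
        ['ß'] = "ss",
        ['ø'] = "o",  ['Ø'] = "O",
    };

    // Replaces each listed character with its expansion and copies everything else as-is.
    public static string Expand(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (Manual.TryGetValue(c, out var replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, Expand("cœur") returns "coeur" and Expand("Straße") returns "Strasse".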
 
Comments
Eduard Keilholz 19-Jul-11 6:00am    
Good contribution! I like people actually reading the forum's history, finding a way to their solution and, even more importantly, sharing it!

Thanks a lot!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


