NHunspell - Hunspell for the .NET platform

4.73/5 (23 votes)
21 Jul 2014, LGPL3
The spell checking and hyphenation features of OpenOffice for the .NET platform.

Introduction

I was looking for a good spell checking and hyphenation library for .NET and found Hunspell and Hyphen, the free (LGPL-licensed) spell checker and hyphenation libraries used in OpenOffice. Hunspell wasn't available for the .NET platform, so I decided to write a wrapper/port. Conveniently, many of the OpenOffice dictionaries are LGPL licensed as well and can be used in proprietary applications.

Interop code for the native Hunspell functions

I wrote the wrapper/port in Managed C++ (C++/CLI) because that let me use the original Hunspell and Hyphen source code directly. Writing the interop code between the managed classes and the unmanaged Hunspell and Hyphen libraries was straightforward. The original source code is almost unchanged, so new versions of Hunspell or Hyphen can be adopted easily. Because Hunspell and Hyphen allocate unmanaged memory, the wrapper classes implement the IDisposable interface, so this memory can be freed early and deterministically instead of waiting for the garbage collector.

Because Hunspell works with UTF-8 encoded strings, while .NET strings are UTF-16, I had to provide conversion functions to and from UTF-8:

MC++
char * NHunspell::MarshalHelper::AllocUTF8FromString(String ^value)
{
    // Encode the UTF-16 .NET string as UTF-8 bytes
    array<Byte> ^ byteArray = Encoding::UTF8->GetBytes(value);
    // One extra byte for the terminating '\0'. Unlike indexing
    // byteArray[0], this also works for an empty string.
    int size = byteArray->Length + 1;
    IntPtr buffer = Marshal::AllocHGlobal(size);
    Marshal::Copy(byteArray, 0, buffer, byteArray->Length);
    Marshal::WriteByte(buffer, size - 1, 0);
    // The caller must release the buffer with Marshal::FreeHGlobal
    return (char *) buffer.ToPointer();
}

String ^ NHunspell::MarshalHelper::AllocStringFromUTF8( char * value )
{
    // Number of UTF-8 bytes before the terminating '\0'
    int size = (int) strlen(value);
    array<Byte> ^ byteArray = gcnew array<Byte>(size);
    Marshal::Copy(IntPtr(value), byteArray, 0, size);
    // Decode the UTF-8 bytes back into a UTF-16 .NET string
    return Encoding::UTF8->GetString(byteArray);
}
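The buffer size above comes from the encoded byte array, not from the string length, because one UTF-16 character can need up to three UTF-8 bytes, and a surrogate pair needs four. The following standard C++ sketch illustrates what the encoding step does. It is not part of NHunspell, which relies on the .NET Encoding::UTF8 encoder, and the names Utf16ToUtf8 and Utf8BufferSize are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch (hypothetical name): encode UTF-16 code units, as stored in a .NET
// System.String, into UTF-8 bytes. Unpaired surrogates become U+FFFD.
std::string Utf16ToUtf8(const std::u16string& input)
{
    std::string out;
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        char32_t cp = input[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < input.size() &&
            input[i + 1] >= 0xDC00 && input[i + 1] <= 0xDFFF)
        {
            // High + low surrogate pair -> one code point above U+FFFF
            cp = 0x10000 + ((cp - 0xD800) << 10) + (input[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // lone surrogate: substitute the replacement character
        }
        assert(cp <= 0x10FFFF); // largest Unicode code point

        if (cp < 0x80) // 1 byte: ASCII
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800) // 2 bytes, e.g. U+00FC
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000) // 3 bytes, e.g. U+20AC
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else // 4 bytes: code points outside the Basic Multilingual Plane
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Size of a NUL-terminated UTF-8 copy: encoded bytes plus one, which is
// what AllocUTF8FromString allocates with AllocHGlobal.
std::size_t Utf8BufferSize(const std::u16string& input)
{
    return Utf16ToUtf8(input).size() + 1;
}
```

For example, the German "ü" (U+00FC) is a single UTF-16 code unit but two UTF-8 bytes, so Utf8BufferSize returns 3 for it: two encoded bytes plus the terminating zero.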

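The other major task is handling the unmanaged memory. In C++/CLI, the destructor (~Hunspell) becomes the IDisposable Dispose method, and the finalizer (!Hunspell) is called by the garbage collector if the object was never disposed. Both therefore go through the same cleanup code: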

MC++
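// Destructor: runs when Dispose() is called (for example by a C# using block)
NHunspell::Hunspell::~Hunspell()
{
    this->!Hunspell();
}

// Finalizer: run by the garbage collector if Dispose() was never called
NHunspell::Hunspell::!Hunspell()
{
    if( handle != 0 )
    {
        // Free the native Hunspell object exactly once
        delete handle;
        handle = 0;
    }
}

// True once the native object has been freed
bool NHunspell::Hunspell::IsDisposed::get()
{
    return handle == 0;
}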
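Readers more familiar with native code may find the same idea easier to follow in standard C++. In this sketch all names are hypothetical, and NativeChecker merely stands in for the native Hunspell object. Dispose() frees the native object early, like ~Hunspell(), and the C++ destructor acts as the safety net that the finalizer !Hunspell() provides in the managed class. Both share one cleanup path that checks and resets the handle, so the native object is freed exactly once no matter how often cleanup runs.

```cpp
#include <cassert>

// Hypothetical stand-in for the native Hunspell object. It counts live
// instances so the effect of cleanup can be observed.
struct NativeChecker
{
    static int live;
    NativeChecker() { ++live; }
    ~NativeChecker()
    {
        assert(live > 0); // never more destructions than constructions
        --live;
    }
};
int NativeChecker::live = 0;

// Sketch of the dispose/finalize idea in standard C++ (hypothetical names):
// Dispose() frees the native object early, the destructor is the safety net,
// and both share one idempotent cleanup path.
class CheckerWrapper
{
public:
    CheckerWrapper() : handle(new NativeChecker()) {}
    ~CheckerWrapper() { Dispose(); }

    // Two wrappers holding the same raw pointer would delete it twice.
    CheckerWrapper(const CheckerWrapper&) = delete;
    CheckerWrapper& operator=(const CheckerWrapper&) = delete;

    void Dispose()
    {
        if (handle != nullptr)
        {
            delete handle;    // free the native object exactly once
            handle = nullptr; // later calls become no-ops
        }
    }

    bool IsDisposed() const { return handle == nullptr; }

private:
    NativeChecker* handle;
};
```

Calling Dispose() twice is harmless, and a wrapper that is never disposed explicitly still frees its native object when it goes out of scope.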

NHunspell spell checking and hyphenation sample

This is a short demo of how to use NHunspell for spell checking, suggestions, and hyphenation:

C#
using System;
using System.Collections.Generic;
using NHunspell;

class Program
{
    static void Main()
    {
        Console.WriteLine("NHunspell functions and classes demo");

        Console.WriteLine("");
        Console.WriteLine("Spell check with Hunspell");

        // Important: Hunspell uses unmanaged memory, so you have to follow
        // the IDisposable pattern. Here this is done with a using block,
        // but you can also call hunspell.Dispose() explicitly.
        using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
        {
            Console.WriteLine("Check if the word 'Recommendation' is spelled correctly");
            bool correct = hunspell.Spell("Recommendation");
            Console.WriteLine("Recommendation is spelled " +
                      (correct ? "correctly" : "incorrectly"));

            Console.WriteLine("");
            Console.WriteLine("Make suggestions for the word 'Recommendatio'");
            List<string> suggestions = hunspell.Suggest("Recommendatio");
            Console.WriteLine("There are " + suggestions.Count + " suggestions");
            foreach (string suggestion in suggestions)
            {
                Console.WriteLine("Suggestion is: " + suggestion);
            }
        }

        Console.WriteLine("");
        Console.WriteLine("Hyphenation with Hyphen");

        // Important: Hyphen also uses unmanaged memory, so you have to follow
        // the IDisposable pattern. Here this is done with a using block,
        // but you can also call hyphen.Dispose() explicitly.
        using (Hyphen hyphen = new Hyphen("hyph_en_us.dic"))
        {
            Console.WriteLine("Get the hyphenation of the word 'Recommendation'");
            HyphenResult hyphenated = hyphen.Hyphenate("Recommendation");
            Console.WriteLine("'Recommendation' is hyphenated as: " +
                              hyphenated.HyphenatedWord);
        }

        Console.WriteLine("");
        Console.WriteLine("Press any key to continue...");
        Console.ReadKey();
    }
}
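To make the difference between Spell and Suggest concrete, here is a toy model in standard C++. It is deliberately simplistic and is not how Hunspell works: Hunspell also applies the affix rules from the .aff file, replacement tables and n-gram scoring. In this sketch a word counts as correct if it appears in a plain word set, and the suggestions are the dictionary words exactly one edit away (insertion, deletion, replacement, or swap of two adjacent characters). The names ToySpell and ToySuggest are hypothetical, and only lowercase ASCII letters are tried.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only, NOT Hunspell's algorithm (hypothetical names):
// a word is correct if it is in the word set.
bool ToySpell(const std::set<std::string>& dict, const std::string& word)
{
    return dict.count(word) > 0;
}

// Suggestions = dictionary words exactly one edit away from `word`.
// Only lowercase ASCII letters are tried for insertions and replacements.
std::vector<std::string> ToySuggest(const std::set<std::string>& dict,
                                    const std::string& word)
{
    const std::string letters = "abcdefghijklmnopqrstuvwxyz";
    std::set<std::string> candidates;
    for (std::size_t i = 0; i <= word.size(); ++i)
    {
        for (char c : letters)
        {
            // Insert c before position i
            candidates.insert(word.substr(0, i) + c + word.substr(i));
            // Replace the character at position i with c
            if (i < word.size())
                candidates.insert(word.substr(0, i) + c + word.substr(i + 1));
        }
        // Delete the character at position i
        if (i < word.size())
            candidates.insert(word.substr(0, i) + word.substr(i + 1));
        // Swap the characters at positions i and i + 1
        if (i + 1 < word.size())
        {
            std::string swapped = word;
            std::swap(swapped[i], swapped[i + 1]);
            candidates.insert(swapped);
        }
    }

    // Keep only real dictionary words; std::set yields them in sorted order
    std::vector<std::string> suggestions;
    for (const std::string& candidate : candidates)
        if (candidate != word && dict.count(candidate) > 0)
            suggestions.push_back(candidate);
    return suggestions;
}
```

With a dictionary containing "Recommendation", ToySuggest proposes "Recommendation" for the misspelling "Recommendatio" used in the demo, because inserting a single "n" turns one into the other.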

Because Hunspell is native C++ code, NHunspell.dll is platform specific, and you must deploy the assembly that matches your platform: for 32-bit (x86) processes, use NHunspell.dll from the X86 folder; for 64-bit (x64) processes, use NHunspell.dll from the X64 folder.
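What matters is the bitness of the running process, not of the operating system: a 32-bit process on 64-bit Windows still needs the X86 build. In .NET this can be checked with IntPtr.Size (8 in a 64-bit process) or, from .NET Framework 4.0 on, Environment.Is64BitProcess. The standard C++ sketch below makes the same decision from the pointer size; the function name is hypothetical, and the returned strings are simply the folder names mentioned above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the NHunspell folder name ("X86" or "X64")
// matching the bitness of the current process. A 32-bit process gets
// "X86" even when it runs on 64-bit Windows.
std::string NHunspellFolderForThisProcess()
{
    // Pointers are 4 bytes in a 32-bit process and 8 bytes in a 64-bit one
    static_assert(sizeof(void*) == 4 || sizeof(void*) == 8,
                  "expected a 32-bit or 64-bit process");
    return sizeof(void*) == 8 ? "X64" : "X86";
}
```

The decision is made at compile time, so the same source compiled for x86 and for x64 yields the matching folder name in each build.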

License

This article, along with any associated source code and files, is licensed under the GNU Lesser General Public License (LGPLv3).


Written By
CEO
Germany
I'm a Senior Software Consultant
Thomas Maierhofer Consulting

Comments and Discussions

Re: NHunspell ExtenderProvider
William Winner, 26-Jan-10 12:50

Re: NHunspell ExtenderProvider
William Winner, 26-Jan-10 13:15

Two new NHunspell Articles for Version 0.9.2 and above are available
Thomas Maierhofer (Tom), 16-Dec-09 1:54

Problem when loading a dictionary...
Knight_Rider, 23-Nov-09 6:10

Re: Problem when loading a dictionary...
Thomas Maierhofer (Tom), 23-Nov-09 6:30

Re: Problem when loading a dictionary...
Knight_Rider, 23-Nov-09 9:06

Re: Problem when loading a dictionary...
Thomas Maierhofer (Tom), 23-Nov-09 19:37

Re: Problem when loading a dictionary...
Aleksei Karimov, 13-Feb-10 9:13
Hello! First, I want to say that you did great work!

After digging into the code, I realized that the original data is damaged during the conversion from wide characters to multibyte characters (WideCharToMultiByte) and vice versa. At least I noticed this behavior in the following function:

inline char * AllocMultiByteBuffer( wchar_t * unicodeString )
{
	size_t buffersize = WideCharToMultiByte(CP_UTF8,0,unicodeString,-1,0,0,0,0); 
	char * buffer = (char *) malloc( buffersize );
	WideCharToMultiByte(CP_UTF8,0,unicodeString,-1,buffer,buffersize,0,0);
	return buffer;
}


I am using Russian Windows 7 x64, and all the Russian letters are transformed into something unreadable. I am now looking for a workaround, and the easiest way seems to be falling back to the ANSI encoding instead of Unicode. At least, that is the first thing I thought of.

Maybe this information will be useful to programmers who face the same issue.

Good luck!


[EDIT]

I found a solution; at least it worked for me. In the function above, I replaced CP_UTF8 with CP_ACP, which stands for the ANSI code page.

Regards,
Alex
Alex KraS
modified on Saturday, February 13, 2010 3:22 PM

NHunspell 0.9.2 is released - Thesaurus is available
Thomas Maierhofer (Tom), 27-Oct-09 21:02

NHunspell as ASP.NET App: http://www.spell-check-thesaurus.com/
Thomas Maierhofer (Tom), 26-Oct-09 10:09

NHunspell Development Status Report: Stemming, Thesaurus, Multithreading, other improvements
Thomas Maierhofer (Tom), 20-Oct-09 2:10

Dictionaries
Thomas Maierhofer (Tom), 20-Oct-09 2:03

Simulating OpenOffice
Anthony Daly, 22-Jun-09 4:16

Re: Simulating OpenOffice
Thomas Maierhofer (Tom), 24-Jun-09 23:58

Re: Simulating OpenOffice
Anthony Daly, 25-Jun-09 4:21

Re: Simulating OpenOffice
William Winner, 3-Feb-10 9:02

Re: Simulating OpenOffice
Anthony Daly, 3-Feb-10 9:04

NHunspell 0.6.2 is released - Location of the Hunspellx86.dll and Hunspellx64.dll files
Thomas Maierhofer (Tom), 28-May-09 22:22

Re: NHunspell 0.6.2 is released - Location of the Hunspellx86.dll and Hunspellx64.dll files
CCPOSTON, 11-Aug-09 11:39

Re: NHunspell 0.6.2 is released - Location of the Hunspellx86.dll and Hunspellx64.dll files
Thomas Maierhofer (Tom), 11-Aug-09 21:51

Re: NHunspell 0.6.2 is released - Location of the Hunspellx86.dll and Hunspellx64.dll files
CCPOSTON, 12-Aug-09 4:24

Re: NHunspell 0.6.2 is released - Location of the Hunspellx86.dll and Hunspellx64.dll files
W4Rl0CK47, 29-Sep-09 21:05

NHunspell 0.6.0 released
Thomas Maierhofer (Tom), 27-Apr-09 0:53

Re: NHunspell 0.6.0 released
jazzmoney, 19-May-09 12:43

Re: NHunspell 0.6.0 released
Thomas Maierhofer (Tom), 24-May-09 23:50
