Posted 16 May 2021

Dlús : Irish Language Word Processor

19 May 2021 | CPOL | 29 min read
A fully functional Irish Language Word Processor written in C#
Dlús was written in C# and incorporates Ireland's best on-line Dictionary, Foclóir Gaeilge-Béarla (Ó Dónaill, 1977) and the great Irish Language resource Líonra Séimeantach na Gaeilge to provide a fast and easy to use Thesaurus.

Image 1

Introduction

Although I myself do not speak Gaeilge, I was proud to be invited by Seanán Ó Coistín to build this app for him after he had read my previous article Creative Writer's Word Processor. (You can't blame him for the bad jokes I may write in this article but you should, however, thank him for conceiving of the plan that resulted in this awesome Irish Language Word Processor.) Since I had written several other similar projects before, I was happy to inform him that I could bring it to life (without promising the moon). We drew up an outline of what he wanted and I had to cut out any ideas he might have had about grammatically correct sentences before they germinated (the fact that my A.I. coding experience is limited to a single Mini-Max Tic-Tac-Toe experiment made my lack of Gaeilge language skills moot on that matter). My only condition for agreeing to help promote the Irish Language by writing this app was that it must be made free and available to open-source developers when I'm done.

So, here we are.

Launching the App

When you download Dlús intending to launch the app, you'll have to find the executable file on your hard-drive. Download the files "Dlús Word processor", then extract them onto your hard-drive. Once you have done this, locate the app's executable file (the one with the Irish-flag painted-face icon and the .exe file extension, highlighted in the image below) and double-click on it. I'd encourage you to select this file in your Windows File Manager, then right-click and select 'Create shortcut'. This gives you a shortcut icon which you can keep on your Desktop and use to launch the app more easily.

The image below should help you find the executable file on your hard-drive. Just remember that in this example, the downloaded files were extracted directly in the C:\ root directory and this may not reflect where you have chosen to extract your files.

Image 2

How to Write and Edit Text in Dlús: A User's Guide

Main Working Area

The Dlús word processor was written with the average-to-novice computer user in mind. We did not want to encumber the user with too many options that would clutter the work area. For this reason then, the interface is easy-to-use and fancy free.

All the tools you need to edit your text are available in the Context-Menu (shown below) which you can summon by clicking your right-mouse-button in the main work-area.

Image 3

In the image below, you can see the tool bar and the ruler located above the working text area. Both are as similar to existing word-processors as imaginable making the transition from MS-Word to Dlús as painless as possible. The ToolBar has all the important commands from Spell-Check and Thesaurus to File Options and Font-Styles right there for you to use. The only difference between Dlús and other writing apps is what makes Dlús such a great application for writing in Irish, the Gaeilge Word List found on the right of the main working area.

Image 4

The word-list will find the exact word you are looking for as you type in the main work area. Once you find the word you need in the Word-List, you can then insert that word into your text by clicking on the word in this Word-List when your mouse cursor takes the form of the 'word insert' icon Image 5. If the word you've typed is a complete word that appears in the word-list, its definition will appear in the top-right corner of the app after you've stopped typing for a brief pause. Alternately, should you wish to read the definition of a word that you scrolled to in the list using the mouse-wheel or the Word-List scroll-bar, just click on that word when your mouse cursor takes this appearance Image 6 over the word you wish to explore. If there's a word you want to look up that's already written in your text, you can move your mouse cursor over it and click on the mouse-button there to summon that word's definition. These definitions are all taken from the Irish government's Teanglann.ie website, Foclóir Gaeilge-Béarla and can be accessed on your home computer even when you're off-line.

Typing Accented Vowels

You can type the accented vowels of the Irish language even if your personal computer's keyboard does not provide you with easy access to these special characters. Although European keyboards are more likely to provide this feature, North-American and many other keyboards around the world do not. Since Dlús was intended to be accessible to all users, the application enables you to easily include all Irish accented vowels in your writing projects. To type an accented vowel into your text, you simply precede that vowel with the back-slash character. The word-processor will recognize the two-character combination of 'back-slash + vowel' and replace both these characters in your text with the single accented vowel you intended to write.

e.g., if you wanted to write the word Tá

you would actually type T\a

and immediately see this text be replaced with the correctly accented word Tá.
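For the curious, here is a minimal sketch of how that 'back-slash + vowel' replacement could be done. It is not taken from the Dlús source: the class name, the method name and the lookup table are all hypothetical, and it converts a whole string in one pass rather than reacting to each keystroke as the app does.

```csharp
using System.Collections.Generic;
using System.Text;

public static class AccentSketch
{
    // Hypothetical lookup table: plain vowel -> the same vowel with an acute accent.
    static readonly Dictionary<char, char> Accented = new Dictionary<char, char>
    {
        { 'a', 'á' }, { 'e', 'é' }, { 'i', 'í' }, { 'o', 'ó' }, { 'u', 'ú' },
        { 'A', 'Á' }, { 'E', 'É' }, { 'I', 'Í' }, { 'O', 'Ó' }, { 'U', 'Ú' }
    };

    // Replaces every "\vowel" pair with the accented vowel; all other text is untouched.
    public static string ReplaceEscapes(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\\' && i + 1 < text.Length
                && Accented.TryGetValue(text[i + 1], out char accented))
            {
                sb.Append(accented);
                i++; // skip the vowel that was just consumed
            }
            else
            {
                sb.Append(text[i]);
            }
        }
        return sb.ToString();
    }
}
```

Calling `AccentSketch.ReplaceEscapes(@"T\a")` returns `Tá`, while a back-slash followed by anything other than a vowel is left alone.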

Using the Thesaurus

The Thesaurus also has a quick-access feature. All you need to do is select a word in your text by putting the textbox's cursor on it and then press the Ctrl-T key combination. You can also use the Thesaurus button Image 7 in the Tool-Bar above the ruler near the top of your workspace. The word's Thesaurus entry may be derived from the root spelling of the word you clicked and will appear in the area at the bottom right corner of the app. Note, however, that some words do not have entries in the thesaurus and will not appear in the Teasáras box. If you've found a word in the Teasáras which you'd like to insert into your text, you can click on it in the Teasáras box and it will appear where the textbox cursor is, right in your work-project where you need it.

Spell-Checking

Spell checking your work is very important and easy to do when you're working with Dlús. Whenever you're ready to spell-check your work, press F7, click the Spell-Check icon Image 8 in the ToolBar or use the right-mouse button to call up the Context Menu and select the Spell-Checker option there. You'll have options like Ignore, Ignore All, Add or Replace to help you along. Effort was made to re-create common existing spell-checker products to make it easier for the user to jump right in and be comfortable with the familiarity of this tool's appearance and functionality. Spell-Checker will quit automatically when it has gone through your text, so if you call the Spell-Checker and there are no detected spelling errors in your text, it will simply quit and let you get on with your work.

N.B. Whenever you 'Add' a word to the Spell-Checker, what you're really doing is adding that word to the Word-List that appears on the right of the main working area of your screen. The Spell-Checker only accepts the words that are in its Word-List as correctly spelled words. All these words are derived from the dictionary and have explanatory 'Word-info' tags attached to them explaining the root of the word as well as the type of word it is but when you 'Add' a word using the Spell-Checker, Dlús will not have any word-information for it even though it is accepted as a correctly spelled word and does appear in the Word-List.

As you can see in the example below:

Image 9

The word 'teaghaisí' is the plural form of the head-word 'teaghais'. You can see that information in both the Dictionary entry above and the light-blue 'Word-Info' tag attached to it in the Word-List below.

Editing the Word List

Should you find an incorrect entry in Dlús's Word-List you want to change, correct or delete, you simply click the Lexicon Editor icon Image 10 in the ToolBar above. Whenever you choose to do that, a new form will appear with a green (sometimes red) textbox on the upper-left side of the screen and the word-list below it. You'll have two check boxes asking whether or not you want to see deleted information. This is important because, when you check these boxes words that have been removed from the Word-List will appear in red in their alphabetical ranking with a line struck across them. Each word in the word-list here can be removed from the list of correctly spelled words by selecting it and then un-checking the checkbox beside its spelling in the green text-box above the list. In this way, you can remove words from the Word-List if you feel they are incorrectly spelled or inappropriate entries. You can also use the Lexicon Editor to add words by typing them in the green text box near the top left (note that the color of this textbox will change from green to red depending on whether or not that word is already a correctly spelled word). When you're satisfied that the new word entry you've typed in this textbox is spelled correctly (it will be in red if it has not yet been included in the word list), check the small box beside it (the box will turn to green) and it will then be added to the Word-List and the Spell-Checker.

Image 11

Similarly, in the middle of this form, you will see the list of Word-Info tags associated with this particular spelling of this particular word. The 'New' button to the right of the 'View Deleted Word-Info' check-box will add a new 'un-checked' entry to the Word-Info list for this word. You can use it to add a new 'Word-Info' tag for this word then fill out the information appropriate for that spelling by selecting the FGB file in the app's Dictionary directories. The selected file will be the 'root word' (or 'head word') for this particular spelling and will therefore be the definition that will appear in the top-right Dictionary area of Dlús's main form when you select this word in the Word-List. The combo-box to the right of the Word-Info tag lets you select the kind of word it is. When you're sure the information is correct, check the box on the right of the new info-tag you just created and it will be added to the Word-List.

About the Code: Stuff Users Don't Need to Care About

Why I didn't use HunSpell (or Gaelspell)

Gaelspell hates me. There, I said it. It just hates me. I downloaded a dozen different versions and tried all kinds of files and they all told me to 'Go F!#k' myself. It was really annoying. I don't know why this happened, as I have used HunSpell in several different projects and it has never given me any issues before.

So, I made my own Spell-Checker.

Initially, I wanted to use the popular EXTENDED Version of Extended Rich Text Box (RichTextBoxEx) which has all the features you need to build a word-processor but I couldn't figure out how to change the language files and after I had created my own spell-checker using the on-line dictionary, which I'm going to tell you about in a minute, this Extended Rich Textbox suddenly had systemic heart-failure and consistently rendered the ghost in my app's 'program.cs' file like a reincarnating Brahman doll with cheap Dollarama batteries. This was most disappointing as it made the crash non-debuggable without venturing into the .DLL's Visual Basic source code. Something I was loath to do.

So, I made my own Extended RichTextBox.

Then, I was hungry. I hadn't eaten anything for hours except for a dried Wonderbread crust. The local Subway restaurant was closed and the supermarket was far, far away ...

So, I made my own Sandwich.

The Dictionary - Foclóir Gaeilge-Béarla (Ó Dónaill, 1977)

Scraping the commercial-free website

If you want to create a Spell-Checker what you need is a bunch of words. Like, lots of em. As many as you can get. All of them, really. And then you put them all together and tell your Spell-Checker to pick out words in your text that aren't in that list of 'all the words in the universe' that you've collected.

"If it ain't there then it ain't a word." ~Spell Checker

And that's essentially how a Spell-Checker works.
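To make that concrete, here's a minimal sketch of the 'if it ain't in the list, it ain't a word' idea. It is not the Dlús spell-checker (which stores its Word-List in a binary-tree); it assumes the whole word list fits in an in-memory HashSet, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SpellCheckSketch
{
    // Returns the words in 'text' that are missing from 'wordList', in order of appearance.
    public static List<string> FindUnknownWords(string text, HashSet<string> wordList)
    {
        var unknown = new List<string>();
        char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')', '"' };

        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!wordList.Contains(word))
                unknown.Add(word);
        }
        return unknown;
    }
}
```

Building the set with `StringComparer.OrdinalIgnoreCase` lets a capitalised word at the start of a sentence match its lower-case entry in the list.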

So... I had to get myself a bunch of words. To do that, I went to the Teanglann.ie website. Conquered it, scraped it and left.

Veni, Vidi, Vici. ~Julius Caesar

If you want to learn more about how to scrape a website, have a look at a previous article I wrote about How I Scraped Merriam Webster's Dictionary.

As far as I know, there are no actual laws prohibiting someone from using electronic means to acquire publicly available information on the internet. Laws are only broken when you bypass passcodes, steal login information and post the names of Ashley Madison clients and their spouse-cheating ways. But that's for another day...

I won't be writing a second article on the subject of scraping-websites but if you're interested in seeing the Source Code I wrote to do it for this project, here it is Foclair_Gaeilg_Bearla_ScrapingTool.zip

... and I did send them an email telling them about my scraping ... I even sent them a link to a copy of the RTF files which were derived from their website and are now included in my Creative Writer's Word Processor.

All in the name of Gaeilge.

Converting the Source Files From HTML to Useable RTF

The source files which I scraped from the FGB website were HTML files. Unless I was planning on incorporating a web-browser into this Word-Processor, they needed to be parsed out and re-written into RichTextFile format. That can be a difficult and daunting process. Take, for example, this website you're looking at now. If you right-mouse-click and call your web-browser's context-menu (mine, anyway) then click on the 'View Page Source' menu item you'll get to see what the page's Mark-Up Language looks like. The generic HTML code is intended to be interpreted by any web-browser on any computer using any operating system. It's essentially comprised of instructions telling the browser you're using how to draw this page. Which fonts to use where and all that. HTML works great because it's universal that way. But now, imagine picking through it and trying to figure out how to draw it yourself and then producing RichTextFiles with that information.

That's what I did.

I used a tool that I wrote for the Merriam Webster's files that are now all happily converted and baptised into the RTF hall of prayer. You can have a look at the source code here HTML-ator_20210329_2333.zip

Once I had figured out which HTML tags isolate the different parts of the Dictionary word entry files and knew how to format the text, I wrote the app that converted all the HTML files to RTF. Writing this particular tool took about a week and then processing all the files took another 10-12 hours (on my slow and emotionally troubled laptop computer, anyway). This had to be done 3 or 4 times as errors in the final output RTF files were discovered and the code needed to be modified to correct the flaws that were appearing in the output files.

It was important for the RTF files to use a specific color of green that was unique for each of the file's 'pop-up tips' (same unique color for all tips) which could later be used to identify a 'tip' when the user's mouse hovers over the dictionary entry in the final word-processor and expects a 'pop-up' explanation for every abbreviation in the word's definition.

Here's the source code that I wrote to convert the HTML files to RTF format Foclair_Gaeilg_Bearla_HTML_to_RTF.zip.

Creating a Data-Base by Generating Variant Word Spellings

So, here's where the real Gaeilge lives. Or so you'd think, but not really. I didn't need to know anything about the Irish language to interpret the English language definitions. Nor did I have to understand the grammatical usage of any words and their variant spellings to generate these aberrant lexicographic logophilious transmographications. It was relatively easy. Having acquainted myself with the website's source files in those ten days it took me to sift through the HTML in order to produce the long sought out RTF files, I had a better idea of how to pick out the information I needed in order to transform those files into a database of all the variant spellings.

First, I scoured all the files for specific HTML tags that identified the 'word type' information. Found every one of them and stored them in a file for later use TipList.zip. These were all the abbreviations that 'pop-up' on the Teanglann.ie web-site with their complete spellings.

Since variant spellings (in this on-line dictionary) are described inside round brackets, I wrote an app which used each file's 'head-word' as the starting mold, then isolated the round brackets in the definition. Each of these round brackets was then tested for 'pop-up' abbreviations. When a 'pop-up' abbreviation was found, the variant spelling's word-type was located. The tilde (~) and the dash (-) symbols on this web-site were interpreted to mean 'replace with head-word' or 'replace end of head-word', respectively.
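Here's a rough sketch of how the tilde and dash patterns might be expanded against a head-word. The tilde rule follows the description above. The dash rule is only an ASSUMPTION made for illustration: the text says 'replace end of head-word' without saying how much of the end, so this sketch cuts the head-word at the last occurrence of the pattern's first letter. All names are hypothetical and the sample patterns in the usage note are illustrative, not quoted from the dictionary files.

```csharp
using System;

public static class VariantSketch
{
    // Expands a bracketed variant pattern against its head-word.
    //   "~x"   : the tilde stands for the whole head-word
    //   "-xyz" : the end of the head-word is replaced (ASSUMED rule: cut the
    //            head-word at the last occurrence of the pattern's first letter)
    public static string Expand(string headWord, string pattern)
    {
        if (pattern.Contains("~"))
            return pattern.Replace("~", headWord);

        if (pattern.StartsWith("-", StringComparison.Ordinal) && pattern.Length > 1)
        {
            string ending = pattern.Substring(1);
            int cut = headWord.LastIndexOf(ending[0]);
            if (cut >= 0)
                return headWord.Substring(0, cut) + ending;
            return headWord + ending; // no overlap found: simply append the ending
        }

        return pattern; // already a full spelling
    }
}
```

With these rules, `Expand("teaghais", "~í")` gives `teaghaisí` and `Expand("bádóir", "-óra")` gives `bádóra`.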

Because I knew running through all 53,104 word entries was going to result in many 'round-brackets' not being interpreted by this Data-base creating app, I had it ask me what I wanted it to do with text it didn't know how to interpret (all the non-variant spelling round brackets in the dictionary confused this algorithm, along with some other mis-formatted file entries). But doing that would mean I would have to tell it what to do with every questionable bracket every time I had to start again, and I knew that that was going to happen ... a lot. And it did. So I had it record whatever text it found confusing, along with my instructions on how to handle it, in a separate 'data-base building data-base'. As I ran through all these files explicitly telling it what to do every time it was confused, it recorded my instructions and never bothered me with those entries again until, by the end, it had it all figured out (or memorized) and left me alone (with few exceptions recurring regardless of my best efforts to train it to leave me alone).
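That 'ask once, remember forever' trick can be sketched in a few lines. This is not the code from the download: the class name is hypothetical, the instructions are plain strings, and the saving and loading of the remembered answers between runs is left out.

```csharp
using System;
using System.Collections.Generic;

public class BracketInstructionMemo
{
    // Maps a confusing bracket's text to the instruction given the first time it appeared.
    private readonly Dictionary<string, string> _remembered = new Dictionary<string, string>();
    private readonly Func<string, string> _askUser;

    public BracketInstructionMemo(Func<string, string> askUser)
    {
        _askUser = askUser;
    }

    // Returns the stored instruction, asking the user only for text not seen before.
    public string Resolve(string bracketText)
    {
        if (!_remembered.TryGetValue(bracketText, out string instruction))
        {
            instruction = _askUser(bracketText);
            _remembered[bracketText] = instruction;
        }
        return instruction;
    }
}
```

In a real run, the dictionary would be written to disk after each new answer so that a restart picks up where the last one left off.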

Some Beta-Testers have been using the app and providing me with changes they want to see but none of those changes have been with regards to any of the Word-List entries, so I'm thinking this process worked out pretty good.

Here's the code I wrote to produce the Dictionary's complete Word-List Foclair_Gaeilg_Bearla_-_Alt-Spellings_20210330_0012.zip.

Using the Data-Base to Find 'Eclipsed' Words

Although the on-line dictionary provides ~all~ the information you need to generate the variant spellings of each word, you have to realize that when we say ~all~ ... the word ~all~ can be open for interpretation. You see, the thing is, the Irish tend to muck things up when it comes time to writing their language. It's really just to keep you honest. I won't make any jokes about slurring drunks or fighting Irish men speaking through bruised, bloody or fattened-lips that distort the sounds of what they're saying because the Real-IRA may come around and ask me whether I'm a Catholic-Atheist or a Protestant-Atheist and I really don't know which way to go on that one.

But, as I was saying, the Irish language is a little blurry about spelling. They have a thing called the 'eclipse' which means that they insert a letter sometimes before and sometimes after the first character of a word depending on how it's used in a sentence. Latin has declensions, French has conjugations and the Irish Eclipse them all with distorted twists in their spelling just to muck you up.

Here's an example: the 'Séimhiú' (strictly speaking a lenition rather than an eclipse, but I lump them all together here) inserts an 'h' after a 'b', 'c' or .. (there's a bunch) to soften the sound because ... why not?

There are a few of them: 'Séimhiú', 'Urú' (the eclipse proper) and some mysterious thing I call 'other'.

To manage this and still have a functioning spell-checker, I had to write methods that test the first and second letters of a word every time the app fails to find it in the Word-List as it is spelled in the user's text.

Here's the method that detects an eclipse which adds one or two characters in front of the word's leading character.

C#
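static string RootWord_PeelEclipse_Urú(string strWord)
{
    // Each entry is an eclipsing prefix followed by the root word's own first letter,
    // e.g. "mb" = eclipsing 'm' + root initial 'b', "n-a" = eclipsing "n-" + root initial 'a'.
    string[] arrEclipses =
        {
        "n-a",
        "mb",
        "gc",
        "nd",
        "n-e",
        "bhf",
        "ng",
        "n-i",
        "n-o",
        "bp",
        "dt",
        "n-u"
        };

    if (strWord.Length > 2)
    {
        for (int intEclipseCounter = 0; 
             intEclipseCounter < arrEclipses.Length; 
             intEclipseCounter++)
        {
            string strEclipse = arrEclipses[intEclipseCounter];
            if (strWord.Length > strEclipse.Length
                && 
                string.Compare(strWord.Substring(0, strEclipse.Length), strEclipse) == 0)
                // Drop the prefix but keep the root's first letter (the entry's last character).
                return strWord.Substring(strEclipse.Length - 1);
        }
    }
    // No eclipse found: an empty string means there was nothing to peel.
    return "";
}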

This method peels off the leading 'eclipse' and returns the word's natural spelling which is then tested against the app's Word-List to see if it is an acceptable spelling. There are three of these methods and they are found in the classDlús_BinaryTree.cs file. They are used whenever a word cannot be found in the Word-List, because its spelling may have been distorted by one of these Irish lexicographer's nightmares. If none of these methods produces a valid spelling, then the search quits trying and returns a null 'Not-Found' result indicating that the word is not in the Word-List and is therefore assumed to be misspelled.
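The overall fallback can be sketched like this: try the word as typed, then try each 'peeled' spelling in turn, and give up with null if nothing matches. This is a simplified illustration and not the code in classDlús_BinaryTree.cs. It uses a HashSet in place of the binary-tree, only two peelers instead of three, and the names PeelSeimhiu, PeelUru and FindRoot are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class MutationLookupSketch
{
    // Hypothetical helper: undoes a séimhiú by removing an 'h' found in second position.
    static string PeelSeimhiu(string word)
    {
        return word.Length > 2 && word[1] == 'h'
            ? word.Substring(0, 1) + word.Substring(2)
            : "";
    }

    // Simplified stand-in for the urú method: handles the consonant prefixes only.
    static string PeelUru(string word)
    {
        string[] prefixes = { "mb", "gc", "nd", "bhf", "ng", "bp", "dt" };
        foreach (string prefix in prefixes)
        {
            if (word.Length > prefix.Length && word.StartsWith(prefix, StringComparison.Ordinal))
                return word.Substring(prefix.Length - 1); // keep the root's first letter
        }
        return "";
    }

    // Returns the spelling found in the word list, or null when every attempt fails.
    public static string FindRoot(string word, HashSet<string> wordList)
    {
        if (wordList.Contains(word))
            return word;

        Func<string, string>[] peelers = { PeelSeimhiu, PeelUru };
        foreach (var peel in peelers)
        {
            string candidate = peel(word);
            if (candidate.Length > 0 && wordList.Contains(candidate))
                return candidate;
        }
        return null; // not found: treated as a misspelling
    }
}
```

With a word list containing `bád` and `fuil`, the inputs `mbád`, `bhád` and `bhfuil` resolve to `bád`, `bád` and `fuil` respectively.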

Displaying an Interactive Word-List Onto the Screen

The code I used to draw the Word-List is a nasty bit of a mess that I've been working on for some time now. It started out with the need to reduce the number of Microsoft objects native to the C# language in my Animation Editor project (which has made giant leaps since the last publishing and will require more attention before I'm ready to show the world what monster I have wrought). I had so many objects on the screen for the User to interface with I was convinced that that was the reason why it was slogging along at such an annoyingly slow pace (I found a few other reasons later and fixed those too). The objective was to draw the Graphical User Interface onto a single MS PictureBox using a Sweep-and-Prune algorithm, thereby whittling away at the unnecessary overhead that resulted from having all those versatile, memory-burdened and event-laden objects I assumed were cluttering my project, down to a single PictureBox.

Here's an example of what I mean about a cluttered work area ...

Image 12

The image above is a small part of the UI for the Animation Editor which allows the user to sample a rectangle of a source image from a start size and location to an end size and location at given start and end frames and then draw that sampled image on the screen for each frame from a start size/location to an end size/location for any number of animation frames in an animation project (this produces a scrolling effect where the camera pans/zooms across an image or video). There are 87 of these objects that each have events, methods and properties, 95% of which this particular app never uses. The objects native to C# are tried and tested, versatile and bug-free (mostly) but they come with a lot of overhead (or so I figure). My solution to this was intended to reduce this memory overhead and alleviate the processor's work when handling them (Jury is still out on whether this objective was achieved). However, the SPObjects class has grown, changed excessively. I've debugged and tinkered with it so much that all of my projects for the last year have different versions of this same class which keeps getting better (although my SPObjects.TextBox is a crying flop of a disaster which may take much time yet to domesticate properly before it can be brought to the park and play with others without embarrassing me too much).

The class is so difficult to use I sometimes take to drink.

I will eventually write an article about it and show the world but for now let me just say ... ouch. It has been one of the most difficult challenges I have set for myself and, although I am pleased with the results, it is far from finished (there really is no hope for that TextBox)...

But the SPObjects class does have its advantages.

Without going too deeply into it ... Essentially, there is an imaginary rectangular region which defines the space where objects can be placed. That region can be as big as you like and of any shape you like (as long as it's a rectangle) anywhere in the Cartesian plane. The 'Visible Region' can also be placed anywhere in the Cartesian plane and, depending on what is to be shown to the user at any time, scroll-bars will automatically appear if necessary. That means, for this word processor, I can create a space large enough to contain the entire Word-List of the Irish Dictionary, let the user move the scroll bar and then interrupt the SPContainer before it draws itself whenever the Visible Region changes and move the dozen or so SPObjects.Labels I am re-cycling onto the screen into the visible region where they will be displayed with the text & color appropriate for their dance recital in the 'Visible Region'.

Make sense?

Ok, I'll try again.

I created a tall rectangular region for the SPContainer (the Sweep'n'Prune area) which is much much larger than the rectangular space the user sees on the screen. When this Visible Region needs to be drawn, any labels already in the SPContainer are removed and kept in a side list to be re-cycled and used again. The Visible Region is then compared to the Word-List which has an ascending ranking order of all the words that is used as an index. This indexed list of words is then consulted to figure out what needs to be drawn on the screen, the existing labels are pulled from the side-list (where we just put them a minute ago), they are told what costumes & makeup to wear and when they are dressed and ready for their next performance they run to their intended positions in the SPContainer so that they appear on the screen as they should for the user to see.

This is kind of a convoluted way to draw the Word-List because normally I would just add all the objects into the SPContainer wherever they belong and not worry about re-cycling them as the Visible Region changes, since that's one of the few advantages of using this class. But since we're talking about more than 74 000 unique word spellings in the dictionary, doing it that way would stall Dlús during load time while it builds the SPContainer's region and then hamper it with unnecessary memory requirements that are best left in Binary-Files on the hard-drive.
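The arithmetic behind that re-cycling can be sketched without any of the SPObjects machinery. The sketch below is not the Dlús code: it assumes every row has the same pixel height, it stands in plain strings for the recycled labels, and every name in it is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class WordListViewportSketch
{
    // Returns the index of the first visible row and how many rows are on screen.
    // rowHeight, scrollTop and viewportHeight are all measured in pixels.
    public static (int First, int Count) VisibleRows(int totalRows, int rowHeight,
                                                      int scrollTop, int viewportHeight)
    {
        int first = Math.Max(0, scrollTop / rowHeight);
        // Row containing the last visible pixel, so a partly visible bottom row is included.
        int last = Math.Min(totalRows - 1, (scrollTop + viewportHeight - 1) / rowHeight);
        return (first, Math.Max(0, last - first + 1));
    }

    // Re-labels a small pool of rows with only the words that are currently visible.
    public static List<string> FillPool(IReadOnlyList<string> sortedWords, int rowHeight,
                                        int scrollTop, int viewportHeight)
    {
        var (first, count) = VisibleRows(sortedWords.Count, rowHeight, scrollTop, viewportHeight);
        var pool = new List<string>(count);
        for (int i = 0; i < count; i++)
            pool.Add(sortedWords[first + i]);
        return pool;
    }
}
```

For a list of 74,000 rows that are each 20 pixels tall, a 240-pixel window scrolled down by 1,000 pixels only ever needs 12 labels (rows 50 to 61), no matter how long the list is.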

The SPObjects.cs classes are included in this project's source-code. It's the latest and greatest version and does cut down on all the memory overhead involved in using too many objects with all the versatility Microsoft put into each one, but it is still deficient in its ease of use. You really have to will-it-to-life in order to make it work and despite the advantage of being able to make a scrolling container of any proportion that will automatically add Scroll-Bars for you ... it really still is a major pain if you haven't experienced the masochistic joys of implementing it yourself first.

Displaying a Dictionary Word-Entry Onto the Screen

In order to load RTF files into the dictionary window at the top-right of the screen, what I did was put two RichTextBoxes in the same panel and then alternated between them like an animator might draw on a side plate before putting it on the screen. There are two properties with the names...

C#
// The hidden box, free to load the next definition.
RichTextBox RTX_Next { get { return rtx[(intRtxCounter + 1) % 2]; } }
// The box currently shown to the user.
RichTextBox RTX_Current { get { return rtx[intRtxCounter]; } }

...which I cycle between using a method that changes the value of the current RichTextBox being referenced in either of them.

C#
void RTX_Cycle() 
{
    intRtxCounter = (intRtxCounter + 1) % 2;
    RTX_Current.BringToFront();
    if (formDlús.Debugging)
        RTX_Current.ContextMenu = cmnu;
}

Then, when I want to load a new definition, the RTX_Next RichTextBox is the one that actually loads the file before the RTX_Cycle() method is called and puts it in front of the previous one.
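Stripped of the RichTextBoxes, the flip-flop pattern looks something like the sketch below. It is a generic illustration with a hypothetical class name, written so it can run without a Windows Form; the real app holds two RichTextBox controls and calls BringToFront() on the one that becomes current.

```csharp
public class TwoSlotFlipper<T>
{
    private readonly T[] _slots;
    private int _current; // index of the slot the user currently sees

    public TwoSlotFlipper(T first, T second)
    {
        _slots = new[] { first, second };
    }

    // The visible slot.
    public T Current { get { return _slots[_current]; } }

    // The hidden slot, safe to load new content into.
    public T Next { get { return _slots[(_current + 1) % 2]; } }

    // Makes the hidden slot the visible one and returns it.
    public T Cycle()
    {
        _current = (_current + 1) % 2;
        return Current;
    }
}
```

The point of loading into `Next` before calling `Cycle()` is that the user never watches a file being loaded; the finished page simply replaces the old one.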

A timer is set to test whether the user's mouse is hovering over the Dictionary display area. This timer is reset whenever the mouse moves and then quits altogether when the mouse leaves that part of the screen. There's a method in the CK_Objects.cs file which I use to measure the generic 'on-screen' MousePosition relative to any control in my app. It asks Windows where the Mouse is on the screen, then subtracts the Location of each parent control back to the form that contains the app. Have a look:

C#
public class classMouseOnControl
{
    public static Point MouseRelTo(Control ctrl)
    {
        Point ptRetVal = System.Windows.Forms.Control.MousePosition;

        while (ctrl != null && ctrl.Parent != null)
        {
            ptRetVal.X -= ctrl.Location.X;
            ptRetVal.Y -= ctrl.Location.Y;
            ctrl = ctrl.Parent;
        }

        return ptRetVal;
    }
}

If the user lets the mouse cursor rest anywhere over the Dictionary display long enough, the timer event is triggered. This tells the app to check what word is under the mouse cursor and bring up whatever information is appropriate, then displays that in a PopUp text box near where the mouse cursor is located on the screen.

What gets displayed in the PopUp textbox depends on what is under the mouse cursor. The first thing it asks is 'what color is this text' because if it's that 'unique green color' that was used to paint all the abbreviated 'tips' mentioned earlier then it knows that the text under the mouse-cursor is an abbreviation and what it needs to display is the full spelling of that abbreviation. Otherwise, it looks through the Word-List database (being sure to test for the Eclipses I mentioned in the previous section). If it finds a word in the Word-List that matches what appears underneath the mouse-cursor, then it puts that on the screen.

Initially, I had argued to include the same Merriam-Webster's English Dictionary files I added to my Creative Writer's Word-Processor but the intention to "keep it Irish" shillelagh-ed that plan and I took the MW dictionary out.

C#
void PopUpText()
{
    string strPopUp = "";

    if (intIndex_Start <= intIndex_End && intIndex_Start >= 0)
    {
        RTX_Current.Select(intIndex_Start, intIndex_End - intIndex_Start+1);
        if (RTX_Current.SelectionColor.R == clrTip.R
            && RTX_Current.SelectionColor.G == clrTip.G
            && RTX_Current.SelectionColor.B == clrTip.B)
        {
            // this is an abbreviation and needs to be matched with its 'tip' 
            strPopUp = classTip.Search_PopUpKey(_strWordUnderMouse);
            panelPopUpDefinition.Abbreviation(strPopUp);
        }
        else
        {
            List<classDlús_LLItem> lstLL = classDlús_BinaryTree.Search(WordUnderMouse);
            if (lstLL != null && lstLL.Count > 0)
            {
                panelPopUpDefinition.Definition(lstLL);

            }
        }
    }
}

The Thesaurus - Líonra Séimeantach na Gaeilge (LSG for those in the know)

Converting one lone PDF file into 32,728 RTF files

I went to their web-site and had a look at the XML-ish file they offered.

didn't like it.

Tried to use their Latex_Source file

didn't like it..

GaelSpell ...

didn't like it ...

Essentially, I've decided to quit relying on other people and discovering 3rd party whatcha-call'ems that don't do what they're supposed to for me ... so I downloaded the LSG PDF file thinking I could use that to generate the thesaurus database. I went about searching on the internet for a (free) app that would convert the PDF to RTF and wound up giving my credit-card information to two different companies who promised they could do it. I logged into each of my new accounts in turn to discover that Gaeilge is NOT a common language and they had no idea what to do with this file. Thankfully, both accounts were 'Free Trial' accounts and I haven't seen any money come out of my depleted (red-lined) bank statement... yet.

So, there I was with a pretty PDF and no gas in the tank to take her anywhere... hmmm, let me reminisce how often this has happened to me.

Well, we had fun anyway.

Let me introduce you to my date, her name is 'Cut N. Paste'. We had hours of fun. Dropped 21 young'uns and gave them all names from A to Z.

Let me show you a family picture:

Image 13

I would have jumped in the photo, but I had to hold the camera.

Next, I set to work on the grand-kids.

Since the kids all look like their mom, I knew that each word-entry started with bold text. So by scanning each character one at a time looking for bold letters (and ignoring bold numerals), I would be able to cut each RTF file up into the separate word-entries and save them individually to generate all the 'grand-kids'. Which is what I did. It took me about an hour to write the code, 9 hours to process all the files and ten minutes to decide to take a nap while the rest of our descendants looked up their history on Ancestry.com to find this proud family picture:
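The bold-letter scan described above can be sketched roughly as follows. This is not the code from the build app: it works on a plain list of (character, is-bold) pairs instead of a RichTextBox so that it can run on its own, and EntrySplitter, Split and Run are hypothetical names. It assumes a new entry begins at a bold letter that follows non-bold text, and that bold numerals never start an entry.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: cut styled text into word-entries at each bold headword.
public static class EntrySplitter
{
    // Helper that tags every character of a string with the same bold flag.
    public static IEnumerable<(char Ch, bool Bold)> Run(string s, bool bold)
    {
        foreach (char ch in s) yield return (ch, bold);
    }

    public static List<string> Split(IEnumerable<(char Ch, bool Bold)> text)
    {
        var entries = new List<string>();
        var current = new StringBuilder();
        bool inHeadword = false; // true while we are still inside a bold headword

        foreach (var c in text)
        {
            bool boldLetter = c.Bold && char.IsLetter(c.Ch);

            // A bold letter after non-bold text marks the start of the next entry.
            if (boldLetter && !inHeadword) Flush(entries, current);

            // Bold digits (sense numbers) neither start nor extend a headword on their own.
            inHeadword = boldLetter || (inHeadword && c.Bold);
            current.Append(c.Ch);
        }

        Flush(entries, current);
        return entries;
    }

    // Stores the collected text as one entry, skipping whitespace-only fragments.
    private static void Flush(List<string> entries, StringBuilder current)
    {
        string entry = current.ToString().Trim();
        if (entry.Length > 0) entries.Add(entry);
        current.Clear();
    }
}
```

Feeding it a bold "abair", plain " say, tell ", a bold "1", plain " speak ", a bold "abhaile" and plain " home" yields two entries, one per headword, with the bold "1" left inside the first entry. In a WinForms setting, the bold flag for each character would presumably come from selecting one character at a time and reading RichTextBox.SelectionFont.Bold, which would also go some way to explaining the nine-hour run time.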

Image 14

So, at this point, I had all the RichText files I needed to build the Thesaurus.

Here is the app I wrote to generate all the RTF Thesaurus files: Dl_s_Thesaurus_Build_RTF_Files_20210330_0806.zip

Finding Words in the Thesaurus for the User

To provide the user with the Thesaurus information for a given word, the spelling of the requested word is used to search for a file name in the appropriate LSG sub-directory. If a file with the exact spelling of the requested word is found, then that file's content is drawn to the screen. When a variant spelling is requested, no such file exists, so the app scours a binary-tree for the requested word and reports back with the word's root spelling. That root spelling is then used as the file name, the hard drive is searched once again, the file is found and its content is put to the screen. Skipping the first step in this process would likely simplify the algorithm, but why bother fixing what isn't broken? When I decided it was working as I had written it... I just moved on and gave it no further thought.

Here's the code:

C#
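public void Thesaurus_Search()
{
    RichTextBox rtx = rtxMain.rtx;
    string strWordUnderMouse = TextAtCursor(ref rtx);
    if (strWordUnderMouse.Length > 0)
    {
        // Step 1: look for a file named after the exact spelling,
        // in the sub-directory for the word's (de-accented) first letter.
        string strDir = classDlús_BinaryTree.WorkingDirectory + "lsg\\Letter" + 
        StringLibrary.classStringLibrary.Deaccent(strWordUnderMouse)[0] + 
                      "\\" + strWordUnderMouse + ".rtf";

        if (System.IO.File.Exists(strDir))
        {
            Thesaurus_Show(strDir);
            return;
        }

        // Step 2: no exact match, so ask the binary tree for this spelling
        // and retry with the leaf's key (the root spelling) as the file name.
        bool bolValid = false;
        classDlús_BinaryTree.classBTLeaf cBTLeaf = 
        classDlús_BinaryTree.classBTLeaf.Get(strWordUnderMouse, ref bolValid, true);

        // Defensive guard: nothing to show if the tree has no usable entry.
        if (cBTLeaf == null || string.IsNullOrEmpty(cBTLeaf.key))
            return;

        strDir = classDlús_BinaryTree.WorkingDirectory + "lsg\\Letter" + 
        StringLibrary.classStringLibrary.Deaccent(cBTLeaf.key)[0] + 
                      "\\" + cBTLeaf.key + ".rtf";
        if (System.IO.File.Exists(strDir))
        {
            Thesaurus_Show(strDir);
        }
    }
}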
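For readers who want to experiment with the two-step lookup outside of the app, here is a minimal, self-contained sketch of the same idea. Everything in it is an assumption made for illustration: ThesaurusLookup, PathFor, Deaccent and Resolve are hypothetical names, a dictionary stands in for the binary tree, a delegate stands in for the hard-drive check, and the accent stripping uses Unicode normalization rather than the app's own Deaccent routine.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical sketch of the exact-spelling-then-root-spelling lookup.
public static class ThesaurusLookup
{
    // Strips combining accents, e.g. "á" becomes "a" (stand-in for the app's Deaccent).
    public static string Deaccent(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Builds <root>/lsg/Letter<first letter>/<word>.rtf for a non-empty word.
    public static string PathFor(string root, string word)
    {
        char first = Deaccent(word)[0];
        return Path.Combine(root, "lsg", "Letter" + first, word + ".rtf");
    }

    // Returns the path of the file to show, or null when nothing matches.
    public static string Resolve(string root, string word,
                                 Func<string, bool> fileExists,
                                 IReadOnlyDictionary<string, string> variantToRoot)
    {
        if (string.IsNullOrEmpty(word)) return null;

        // Step 1: a file named after the exact spelling.
        string direct = PathFor(root, word);
        if (fileExists(direct)) return direct;

        // Step 2: map the variant to its root spelling and try again.
        if (variantToRoot.TryGetValue(word, out string rootWord)
            && !string.IsNullOrEmpty(rootWord))
        {
            string viaRoot = PathFor(root, rootWord);
            if (fileExists(viaRoot)) return viaRoot;
        }
        return null;
    }
}
```

In real use, fileExists would simply be File.Exists. Passing it in as a delegate keeps the sketch testable without touching the disk, and returning null for "not found" mirrors the way the app quietly does nothing when no file turns up.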

Conclusion

I started working on this project in early January. Since I always have a dozen projects on slow-burners at a time, I've been finding it difficult to actually complete anything without being distracted by something else. My Still has been a distraction. My Animation Editor project often has me spending time actually making animation videos and in the process of doing that, I often discover issues with my Sprite Editor. I play with micro-controllers and now I'm writing my next novel which calls attention to fixes & new features for my Creative Writer's Word Processor. All of these distractions are great fun and a lot of time-consuming work but since Dlús was a project which I was commissioned to write, I put extra diligence in ensuring it was done properly and as user-friendly as I could conceive it. There may still be updates to it in the future ... but for now "tá sé iomlán".

History

  • 16th May, 2021: Initial version
  • 7th June, 2021: Fixed Save menu option (was treated as a Save As)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Christ Kennedy
CEO unemployable
Canada
Christ Kennedy grew up in the suburbs of Montreal and is a bilingual Quebecois with a bachelor’s degree in computer engineering from McGill University. He is unemployable and currently living in Moncton, N.B. writing his next novel.

Comments and Discussions

 
Question: Dlús means Density?
Eek Ten Bears, 18-May-21 0:34

Answer: Re: Dlús means Density?
Christ Kennedy, 18-May-21 6:50

I did not choose the name.
I would have gone with the generic "Irish Word Processor" but the guy who commissioned me to write this app came up with the name.
When you run this app and call up the definition for the word 'Dlús' (or go to the Foclóir Gaeilge-Béarla website), the 2nd and 3rd entries for this word's definition are 'Fullness, abundance' and 'Close application; expedition, speed', respectively (but the first is 'closeness, compactness; density').
I'm thinking 'fullness' and 'speed' were probably what he was going for.

Thank you for your interest.
