Inserts Tabular Text from Text Files into Microsoft Word Table

26 Aug 2010, CPOL, 4 min read
Inserts tabular text into a Microsoft Word document

Introduction

In our company, an automatic product scanner reads device serial numbers (XXXX; XXXXS; XXXX, where X is a digit) and saves them in a tabular text file.
After about an hour, more than 1000 serial numbers (SNs) have been saved in the text file in tabular format. At the end of the day, a production employee must insert some of the scanned SNs into a Microsoft Word template and send a PDF copy to the customer.

This tool helps the production employee insert the tabular data into a Microsoft Word document and convert the merged document to PDF.

This project presents a tabular text merger that copies data from a given tabular text file, inserts it on the fly into a table in a given Microsoft Word document, and finally exports the merged document to PDF.

The Main Form

The main form controls the application and lets the user run it in whatever way they choose. When the program runs, as shown in Fig 3, you will see three text boxes, each with a browse button.

The first one is for the tabular text file.
The second one is for the Microsoft Word document.
The third one is for the output path and is optional.

Remarks

  • The original Microsoft Word document is used as a template. The tool does not change the original doc file. If you want it to, set:
    C#
    // object saveWordChanges = true;
  • In the Config file, you can define the table and the row in the given Microsoft Word document where writing begins. For example:
    XML
    <add key="TableIndex" value="2" />
    <add key="RowIndex" value="2" />
  • With these values, the tool opens the second table in the original Microsoft Word document, jumps to its second row, and begins writing the copied tabular text there.
  • If the table is smaller than the copied tabular text, the user receives a TextInjector format error.
  • If the copied tabular text does not match the given regular expression pattern, the user receives a format error.
  • You can define how many columns are copied from the given text file.
  • The text is copied cell by cell. The start cell is defined in the Config file.
  • Microsoft Word 2003 cannot save to PDF format. I solved this problem by using an external tool.

How to use TextInjector?

Run the TextInjector.

  • Navigate to the tabular text file, as shown in Fig 3.

    image001.png

    Figure 1 Example for a tabular text file
  • Navigate to the Microsoft Word document that contains the tables, as shown in Fig 3.

    image002.jpg

    Figure 2 Example for Microsoft-Word Document
  • Browse to the desired output path, as shown in Fig 3.

    image003.png

    Figure 3 TextInjector
  • Finally, click the Create File button. The PDF file is created, see Fig 4 and Fig 5.

    image004.jpg

    Figure 4 The created PDF file

    image005.jpg

    Figure 5 The test directory

How Does the Code Work?

The first method called in the constructor is RetrieveAppSettings().

C#
/// <summary>
/// Reads the settings from the Config file.
/// </summary>
private void RetrieveAppSettings()
{
    var configurationAppSettings = new AppSettingsReader();

    // the tabular text file path
    _inputTextFileTextBox.Text = (string)configurationAppSettings.GetValue(
        "TextFilePath", typeof(string));

    // the Microsoft Word document path
    _inputWordFileTextBox.Text = (string)configurationAppSettings.GetValue(
        "WordFilePath", typeof(string));

    // ... (the rest of the method is not shown in this excerpt)
}

App.config File

XML
<configuration>
     <appSettings>
           <add key="TextFilePath" value="C:\" />
           <add key="WordFilePath" value="C:\" />
           <add key="WordOutputPath" value="C:\" />
           <add key="TextFilesFilter" value="txt files (*.txt)|*.txt" />
           <add key="WordFilesFilter" value="doc files (*.doc)|*.doc" />
           <add key="RegularExpressionFilter" value="^[0-9]{5}[- ;.][0-9]{5}S[- ;.][0-9]{4}$" />
           <add key="NumberOfCloumnsInTextFile" value="3" />
           <add key="RowSplitter" value=";" />
           <add key="TableIndex" value="1" />
           <add key="RowIndex" value="1" />
     </appSettings>
</configuration>
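To get a feeling for what the RegularExpressionFilter value accepts, here is a minimal sketch that tests a few lines against the same pattern. The class name SerialFormatDemo, the method IsValidRow, and the sample serial numbers are hypothetical and only serve as illustration. They are not part of TextInjector.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SerialFormatDemo
{
    // Same pattern as the RegularExpressionFilter key in App.config:
    // five digits, one separator, five digits plus 'S', one separator, four digits.
    public const string Pattern = @"^[0-9]{5}[- ;.][0-9]{5}S[- ;.][0-9]{4}$";

    public static bool IsValidRow(string line)
    {
        return Regex.IsMatch(line, Pattern);
    }

    public static void Main()
    {
        // Hypothetical sample lines, not taken from a real scanner file.
        Console.WriteLine(IsValidRow("12345;67890S;0001"));   // True
        Console.WriteLine(IsValidRow("12345; 67890S; 0001")); // False: two separator characters in a row
        Console.WriteLine(IsValidRow("1234;67890S;0001"));    // False: first field has only four digits
    }
}
```

Note that the character class [- ;.] matches exactly one separator character, so a semicolon followed by a space is rejected by this pattern.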

RetrieveAppSettings() reads the application's App.config file, which contains the configuration data. Each time you browse to a new path, the new path is written back to the App.config file as follows:

C#
/// <summary>
/// Save the config data
/// </summary>
/// <param name="propertyName">App.config property name</param>
/// <param name="filePath">App.config file path.</param>
private static void SaveConfig(string propertyName, string filePath)
{
    //open the configuration file for the current application
    var config = ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.None);
    
    //set the value of the App.config property
    config.AppSettings.Settings[propertyName].Value = filePath;
    // Causes only modified properties to be written to the configuration file,
    // even when the value is the same as the inherited value.
    config.Save(ConfigurationSaveMode.Modified);

    //----------------------------- here it's funny ----------------------
    // The two commented-out lines below are a workaround for this problem:
    // when I run the application from Visual Studio, I can't save the
    // configuration properties in the App.config.
    //var configFile = System.Windows.Forms.Application.StartupPath +
    //                 Path.DirectorySeparatorChar + ConfigFileName;
    //config.SaveAs(configFile, ConfigurationSaveMode.Modified);
    
    // Refreshes the named section so the next time that it is retrieved 
    // it will be re-read from disk.
    ConfigurationManager.RefreshSection("appSettings");
}

The application is now ready to work.

CreateDocumentsButtonClick is the core method of the application. It is called when the user clicks the Convert button.

First, it checks that the given paths exist. If any of them is missing, it shows an error message and returns:

C#
if (!File.Exists(_inputTextFileTextBox.Text))
{
    MessageBox.Show("Input Text file doesn't exist!!");
    return;
}

if (!File.Exists(_inputWordFileTextBox.Text))
{
    MessageBox.Show("Input Word file doesn't exist!!");
    return;
}

if (!Directory.Exists(_outputDocsPathTextBox.Text))
{
    MessageBox.Show("Output Word directory doesn't exist!!");
    return;
}
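The same three checks can also be separated from the message boxes, which makes them easier to test. The following is only a sketch under that assumption. The class PathChecksDemo and the method FindPathError are hypothetical names and do not exist in TextInjector.

```csharp
using System;
using System.IO;

public static class PathChecksDemo
{
    // Returns null when all three paths exist, otherwise the first error message.
    // The messages are the ones used in the checks above.
    public static string FindPathError(string textFile, string wordFile, string outputDir)
    {
        if (!File.Exists(textFile))
            return "Input Text file doesn't exist!!";

        if (!File.Exists(wordFile))
            return "Input Word file doesn't exist!!";

        if (!Directory.Exists(outputDir))
            return "Output Word directory doesn't exist!!";

        return null;
    }

    public static void Main()
    {
        // Hypothetical, deliberately missing input files.
        var error = FindPathError("missing-input.txt", "missing-template.doc", Path.GetTempPath());
        Console.WriteLine(error ?? "All paths exist.");
    }
}
```

In the button handler, the result could then be passed to MessageBox.Show whenever it is not null.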

Next, it retrieves the data from the given tabular text file. If a format error is found and the user cancels the process, it returns:

C#
if (!RetrieveRows())
    return;

Below, I explain the most important code sections, which open the Microsoft Word document and insert the copied table.

The object that represents the Microsoft Office Word application.

C#
_Application wordApp = new Word.Application();

The Microsoft Word document.

C#
object wordFile = _inputWordFileTextBox.Text;

A placeholder object for the optional arguments that are not set.

C#
object missing = Missing.Value;

The document is opened in read-only mode.

C#
object readOnly = true;

PDF format.

C#
object fileFormat = WdSaveFormat.wdFormatPDF;

Here I get the Microsoft Word document file name without the extension.

C#
var fileNameWithoutExt = Path.GetFileNameWithoutExtension(_inputWordFileTextBox.Text);

The PDF file has the same name as the Microsoft Word document.

C#
object wordPdfFile = _outputDocsPathTextBox.Text + Path.DirectorySeparatorChar +
                      fileNameWithoutExt + pdfExt;
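As an alternative to concatenating the parts with Path.DirectorySeparatorChar, the output path could also be built with Path.Combine. This is only a sketch: the class PdfPathDemo and the method BuildPdfPath are hypothetical, and I assume here that pdfExt has the value ".pdf", which is not shown in the snippet above.

```csharp
using System;
using System.IO;

public static class PdfPathDemo
{
    // Assumption: this corresponds to the pdfExt field used in the article.
    private const string PdfExt = ".pdf";

    // Builds "<outputDir><separator><Word file name without extension>.pdf".
    public static string BuildPdfPath(string wordFilePath, string outputDir)
    {
        var fileNameWithoutExt = Path.GetFileNameWithoutExtension(wordFilePath);
        return Path.Combine(outputDir, fileNameWithoutExt + PdfExt);
    }

    public static void Main()
    {
        // Hypothetical paths for illustration only.
        Console.WriteLine(BuildPdfPath(Path.Combine("templates", "report.doc"), "out"));
    }
}
```

Path.Combine inserts the separator only when it is needed, so an output path that already ends with a separator does not produce a doubled one.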

wordApp.Documents is a collection of all the Microsoft.Office.Interop.Word.Document objects that are currently open in Word.
wordApp.Documents.Open opens the given Microsoft Word document in the Microsoft Word application.

C#
_Document wordDoc = wordApp.Documents.Open(ref wordFile,
	ref missing, ref readOnly, ref missing, ref missing, ref missing,
	ref missing, ref missing, ref missing, ref missing, ref missing,
	ref missing, ref missing, ref missing, ref missing, ref missing);

Check that the Microsoft Word document contains the configured table and that this table has at least the configured number of rows.

C#
if (wordDoc.Tables.Count >= _tableIndex && 
	wordDoc.Tables[_tableIndex].Rows.Count >= _rowIndex)
{
    var destinationTable = wordDoc.Tables[_tableIndex];

    object readOnlyRecommended = false;

    for (var rowNrInList = 0; rowNrInList < _extractedRow.Count; rowNrInList++)
    {
        var row = _extractedRow[rowNrInList];

Inserts the tabular text on the fly into the destination table.

C#
for (var columnNrInList = 0; columnNrInList < row.Count; columnNrInList++)
{
    // Word cell indices begin with 1; writing starts at the configured row (_rowIndex)
    destinationTable.Cell(_rowIndex + rowNrInList, columnNrInList + 1).Range.Text =
        row[columnNrInList];
}
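The index arithmetic can be tried without Word. The sketch below maps the zero-based list positions to one-based table coordinates and checks that the data fits into the table. It assumes, as described in the Remarks, that writing starts at the row given by RowIndex. The class CellMappingDemo, the method MapToCells, and the sample values are hypothetical and not part of TextInjector.

```csharp
using System;
using System.Collections.Generic;

public static class CellMappingDemo
{
    // Returns (row, column, text) triples with one-based table coordinates.
    // startRow is one-based, like the RowIndex setting.
    public static List<Tuple<int, int, string>> MapToCells(
        List<List<string>> rows, int startRow, int tableRowCount, int tableColumnCount)
    {
        var cells = new List<Tuple<int, int, string>>();

        for (var r = 0; r < rows.Count; r++)
        {
            var targetRow = startRow + r;
            if (targetRow > tableRowCount)
                throw new InvalidOperationException("The table has too few rows for the text data.");

            for (var c = 0; c < rows[r].Count; c++)
            {
                var targetColumn = c + 1; // list index 0 goes to column 1
                if (targetColumn > tableColumnCount)
                    throw new InvalidOperationException("The table has too few columns for the text data.");

                cells.Add(Tuple.Create(targetRow, targetColumn, rows[r][c]));
            }
        }

        return cells;
    }

    public static void Main()
    {
        // Hypothetical sample data: two rows with three columns each.
        var rows = new List<List<string>>
        {
            new List<string> { "12345", "67890S", "0001" },
            new List<string> { "12346", "67891S", "0002" }
        };

        // Start at row 2 of a table with 10 rows and 3 columns.
        foreach (var cell in MapToCells(rows, 2, 10, 3))
            Console.WriteLine("Cell({0}, {1}) = {2}", cell.Item1, cell.Item2, cell.Item3);
    }
}
```

With Word, each triple would correspond to one destinationTable.Cell(row, column).Range.Text assignment.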

Export the modified document to PDF.

C#
wordDoc.SaveAs(ref wordPdfFile, ref fileFormat,
	ref missing, ref missing, ref missing,
	ref missing, ref readOnlyRecommended,
	ref missing, ref missing, ref missing,
	ref missing, ref missing, ref missing,
	ref missing, ref missing, ref missing);

RetrieveRows() is called from CreateDocumentsButtonClick() to read the text data. It does the following:

  • Extracts the lines from the tabular text file.
  • Checks each line to see whether it matches the given regular expression pattern.
  • Splits each matching line into columns and adds the result to _extractedRow.
  • Returns true unless a format error is found and the user chooses not to continue.
C#
/// <summary>
/// Retrieves the data from the given tabular text file.
/// </summary>
/// <returns>False if a format error was found and the user cancelled; otherwise true.</returns>
private bool RetrieveRows()
{
    // clear old data
    _extractedRow.Clear();

    // return flag
    var readSuccessfullyCompleted = true;

    // regular expression pattern
    var formatRegex = new Regex(_regularExpressionText);

    // read the text file
    var fileText = File.ReadAllText(_inputTextFileTextBox.Text);

    // split the tabular text file into lines
    _rowTextLine = new List<string>(fileText.Split(new[] { Environment.NewLine },
                                                   StringSplitOptions.RemoveEmptyEntries));

    // for each line
    foreach (var row in _rowTextLine)
    {
        // if the line matches the regular expression
        if (formatRegex.IsMatch(row))
        {
            // split the row on the _rowSplitter char, which is
            // given in the Config file
            var columns = row.Split(_rowSplitter);

            // add the columns as a new row
            _extractedRow.Add(new List<string>(columns));
        }
        else
        {
            // error message
            var dlgResult = MessageBox.Show("Format error found in the tabular text file!",
                                            "Continue?", MessageBoxButtons.YesNo,
                                            MessageBoxIcon.Question);

            if (dlgResult == DialogResult.No)
            {
                readSuccessfullyCompleted = false;
                break;
            }
        }
    }

    return readSuccessfullyCompleted;
}
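The parsing logic of RetrieveRows() can also be written without the dialog, which makes it usable outside of the form. The following is a minimal sketch and not the TextInjector code: the class RowParserDemo and the method ParseRows are hypothetical, lines that do not match are collected in a list instead of asking the user, and the lines are split on both "\r\n" and "\n" instead of Environment.NewLine.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowParserDemo
{
    // Splits the text into lines, keeps the lines that match the pattern,
    // and splits each of them into columns. Lines that do not match are
    // added to invalidLines.
    public static List<List<string>> ParseRows(
        string fileText, string pattern, char splitter, List<string> invalidLines)
    {
        var formatRegex = new Regex(pattern);
        var rows = new List<List<string>>();

        var lines = fileText.Split(new[] { "\r\n", "\n" },
                                   StringSplitOptions.RemoveEmptyEntries);

        foreach (var line in lines)
        {
            if (formatRegex.IsMatch(line))
                rows.Add(new List<string>(line.Split(splitter)));
            else
                invalidLines.Add(line);
        }

        return rows;
    }

    public static void Main()
    {
        // Hypothetical file content with one valid and one invalid line.
        var text = "12345;67890S;0001\r\nnot a serial number\r\n";
        var invalid = new List<string>();

        var rows = ParseRows(text, @"^[0-9]{5}[- ;.][0-9]{5}S[- ;.][0-9]{4}$", ';', invalid);

        Console.WriteLine("Valid rows: " + rows.Count);
        Console.WriteLine("Invalid lines: " + invalid.Count);
    }
}
```

The caller can then decide what to do with the invalid lines, for example show them all in one message instead of one dialog per line.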

History

  • 26th August, 2010: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Agilent
Germany