Click here to Skip to main content
15,878,814 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Sorry this is a bit complex to explain but its really a very simple question....just point me at the right web-site.

I am using SSIS to write a set of n data files from a database. Simplifying a hell of lot I divide very big SQL data sets comprising different related objects into smaller chunks and write them to text files. So for example I might have 17 customer files, 17 aligned customer address files and so on. At a set point in the process I ask SSIS to work through a ForEach loop which configures the SQL for that file and then writes the file. This is all complicated by other requirements but it works well. So today I ask for customers, tomorrow I ask for vendors, on Monday its materials, Tuesday Service History etc etc. Brilliant application if I say so myself, one problem...

The "writes the file" bit using C# has an issue with Unicode. When I hit it with volume, it has difficulty detecting if a file needs to be saved in UNICODE or not.

So for example the first 1 of 17 files for customers all have names which are in the standard base character set (they do its old UK and NW European customers) but when it gets to file 5 it find lots of our far eastern customers. The C# code seems to be stuck not in Unicode and translates the characters as ????s. Since there are lots of customers in one place like this it stands out, but checking back I've had errors previously just not in file 1. Moreover if I adjust the order of my extract to start with the far-eastern customers its fine again. I'm a SQL coder with lots of C# experience but I am missing something fundamentally important here. A short simple explanation now will prevent future disaster - thank-you guys...

My code is below, I think what is happening is when I use
DataTable dtD = new DataTable();
it isn't actually giving me a new DataTable its just clearing down the old one. So if it decided to be not Unicode at the first dataset it continues to be that way. Remember the script is called through SSIS which controls the next iteration.

The supplied parameters are:

nTiledExtractHeaders: A set of headers for the file, its an SQL query that returns a single row. We create the file with this and append the data underneath. So could be something like
CustomerID, CustomerName
ID, Name


nTiledExtractData: This is our data-set a simple set of made up names might be
ID, Name
C000001, Ace Limited
C000002, Acme Inc
...
C045123, 中國客戶
...
C050001, 日本のお客様
...
C092345, Zyppy Zip Company

(using a comma in each list to represent the end of each data field)

nTiledFileName: Just a file name


public void Main()
{
    string pos = "00";
    try
    {
        pos = "01";
        string filename = Dts.Variables["User::nTiledFileName"].Value.ToString();
        pos = "02";
 //       DataSet ds = null;
        pos = "03";
        DataTable dtH = new DataTable();
        DataTable dtD = new DataTable();

        pos = "04";
        System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter();
        pos = "05";
        adapter.Fill(dtH, Dts.Variables["User::nTiledExtractHeaders"].Value);
        pos = "06";
        adapter.Fill(dtD, Dts.Variables["User::nTiledExtractData"].Value);
        pos = "07";
 //       ds.Tables.Add(dtH);
 //       ds.Tables.Add(dtD);
        pos = "08";
        System.Text.Encoding e = System.Text.Encoding.Unicode;
        System.Collections.Generic.List<string> header = null;

        pos = "09";
        if (dtD.ExtendedProperties.ContainsKey("Unicode") && (bool)dtD.ExtendedProperties["Unicode"])
        {
            pos = "10";
            e = System.Text.Encoding.Unicode;
        }

        pos = "11";
        using (System.IO.StreamWriter file = new System.IO.StreamWriter(System.IO.File.Open(filename, System.IO.FileMode.Create), e))
        {
            pos = "12";
            header = new System.Collections.Generic.List<string>(dtH.Columns.Count);
            pos = "13";
            foreach (System.Data.DataColumn heading in dtH.Columns)
            {
                pos = "14";
                header.Add(heading.ColumnName);
            }
            pos = "15";
            file.WriteLine(string.Join("\t", header)); // write the headings
            pos = "16";
            foreach (System.Data.DataRow headingRow in dtH.Rows) //write the subheadings
            {
                pos = "17";
                System.Collections.Generic.IEnumerable<string> hFields = (headingRow.ItemArray).Select(Field => Field.ToString());
                pos = "18";
                file.WriteLine(string.Join("\t", hFields));
            }
            foreach (System.Data.DataRow dataRow in dtD.Rows) //write the data
            {
                pos = "19";
                System.Collections.Generic.IEnumerable<string> dFields = (dataRow.ItemArray).Select(Field => Field.ToString());
                pos = "20";
                file.WriteLine(string.Join("\t", dFields));
            }

        }

        pos = "21";
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception ex)
    {
        //An error occurred.
        Dts.Events.FireError(0, "Write TDF Files - " + pos, ex.Message + "\r" + ex.StackTrace, String.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Failure;
    }

}


What I have tried:

The unicode or not test is a nice to have I don't really need so I commented it out and just let everything get encoded as Unicode. My solution works, and this is a good enough work-around for this application. The fact that all the files will be Unicode isn't a problem but I am worried about why its happening...and for potential similar situations in the future how could I avoid this.
if (dtD.ExtendedProperties.ContainsKey("Unicode") && (bool)dtD.ExtendedProperties["Unicode"])
{
    pos = "10";
    e = System.Text.Encoding.Unicode;
}
Posted
Updated 5-Jun-20 5:09am
v2

1 solution

I just wonder why you are trying to "detect" unicode anyway. .NET strings are internally using unicode representation. As long as the names are stored in the database using unicode as well (nvarchar), you shouldn't even bother detecting anything at all.
Store it as unicode, retrieve it from C# and save it back to an unicode file. This would be an huge improvement on your code since it would allow you to avoid spaghetti-code.
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900