Hi developers,

This is my code to copy the data from multiple files into a single one:

public void generateMonthReport()
       {
           string[] inputFilePaths = Directory.GetFiles("//Path to directory for multiple files","*.txt.2018-"+DateTime.Now.ToString("MM")+"-*");

           using (var outputStream = File.Create("//Path to save the new file" + DateTime.Now.ToString("MM-yyyy") + "-Statistics.txt"))
           {
               foreach (var inputFilePath in inputFilePaths)
               {
                   using (var inputStream = File.OpenRead(inputFilePath))
                   {
                       inputStream.CopyTo(outputStream);
                   }

               }
           }
       }



Photo of the output (please see):
https://image.ibb.co/etL5T9/Capture.jpg

In that photo there is a highlighted date. That line is where the content of a new file starts being written.

The problem is that the highlighted date is written differently from the others, and when I read it with the regex an error is thrown.

I pasted that highlighted line into regex101 as the test string, and it was shown with a bullet.
Image: https://image.ibb.co/hF7YgU/Capture1.jpg


UPDATE AS REQUESTED:
This is my regex (multiline):
^(?<date>[^ ]+) (?<time>[^A-Z]+) (?<errorMessage>[^[]+) \[1\] (?<programName>[^.]+)[.](?<formName>[^.]+)[.](?<event>[^ ]+)[^a-z]+(?<username>[^:]+):(?<message>[^.]+).+$


In the hex editor I found that the line begins with these bytes (maybe this helps):
 -> EF BB BF

This is the code to read the data from the file:
var MyTextFileDataSet = new TextFileDataSet.TextFileDataSet();
          using (var filestream = new FileStream("//path of file to read.", FileMode.Open, FileAccess.Read,FileShare.ReadWrite))
          {
              MyTextFileDataSet.ContentExpression = new Regex(@"^(?<date>[^ ]+) (?<time>[^A-Z]+) (?<errorMessage>[^[]+) \[1\] (?<programName>[^.]+)[.](?<formName>[^.]+)[.](?<event>[^ ]+)[^a-z]+(?<username>[^:]+):(?<message>[^.]+).+$", RegexOptions.Multiline);

              MyTextFileDataSet.Fill(filestream);

          }


          int counterError = 0, counterFatal = 0, counterWarning = 0;



          var rows = MyTextFileDataSet.Tables[0].AsEnumerable();
          string errorMessage = "";
          string transactionsMessage = "";
          int counter = 0;
          foreach (var row in rows)
          {
              errorMessage = row.Field<string>("errorMessage");
              var date = DateTime.Parse(row.Field<string>("date"));
              var name = row.Field<string>("username").Trim();
              transactionsMessage = row.Field<string>("message");
              string transactionsString = "";
              if (transactionsMessage.Contains("generated"))
              {
                  transactionsString = getBetween(transactionsMessage, "generated", "transactions");
              }
              if (transactionsString != "")
              {
                  var transactions = Convert.ToInt32(transactionsString);
                  var logItem = logItems
                                            //.Where(n => n.Date == date)
                                            .Where(n => n.Name == name)
                                            .FirstOrDefault();
                  if (logItem == null)
                  {
                      logItems.Add(new LogItemGeneration
                      {
                          Date = date,
                          Name = name,
                          Transactions = transactions
                      });
                  }
                  else
                  {
                      logItem.Transactions += transactions;
                  }
              }


              switch (errorMessage)
              {
                  case "ERROR":
                      counterError++;
                      break;
                  case "FATAL":
                      counterFatal++;
                      break;
                  case "WARN":
                      counterWarning++;
                      break;

                  default:
                      break;
              }
              //GENERATING CHART

              counter++;
          }

This line:
var date = DateTime.Parse(row.Field<string>("date"));
throws a FormatException when it reaches the line where the date is written differently.



If something is unclear or you want me to clarify anything, do not hesitate to comment and I will answer :)

What I have tried:

When I manually changed that line to look like the others, everything worked fine.

I searched the internet for other methods but did not find any.
Updated 10-Aug-18 2:22am
Comments
Richard MacCutchan 10-Aug-18 6:33am    
The difference you see is only present in the application that displays the data. The actual file content is the same for all the text. Maybe if you edit your question and show the actual content and explain what regex you are using and what error you receive, people will be able to help.
Joe Doe234 10-Aug-18 8:06am    
Question updated, maybe it helps more. I'm really sorry, but English is not my first language.
Richard MacCutchan 10-Aug-18 8:20am    
No need to apologise for your English, my Maltese is terrible.

"In the hex editor I found that the line begins with (maybe it helps): -> EF BB BF"

Those bytes are the UTF-8 byte order mark. They only identify the content as UTF-8 encoded and do not affect the format. Most editors and text-handling routines skip over them when they appear at the very start of a file. However, if they appear in the middle of a file they are not skipped and end up in the text as an extra, usually invisible, character. You should also check the file that contains the source of that text to see whether the extra character(s) come from the original file.
Joe Doe234 10-Aug-18 8:26am    
First of all, thanks for understanding me. I think it ends up in the middle of my file because I am combining multiple files into one and copying each whole file, including these characters. I will look into StreamReader and StreamWriter; maybe I can work with them.
Richard MacCutchan 10-Aug-18 8:29am    
See my latest comment above.

I think the problem is that you're trying to merge lines of text files, which are composed of characters, without worrying about character encoding.

The bytes that you mention look like a Byte Order Mark (BOM) for UTF-8, see:

Byte order mark - Wikipedia

Because you are treating the files as byte streams, you are copying the byte order marks verbatim. This works fine for the first file, since it's OK to have a BOM at the beginning of a file, where it belongs.

However, for the second and subsequent files, it does not work. This is because you are copying the BOM into the middle of the file, where it does not belong.
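
To illustrate the point, here is a minimal sketch that concatenates two in-memory "files" byte for byte and then decodes the result as UTF-8. The class name BomDemo, the method name, and the sample log lines are made up for illustration; only the EF BB BF bytes come from the question.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: builds two UTF-8 "files" that each start with a BOM,
    // joins them byte for byte (as Stream.CopyTo would), and decodes the result.
    public static string ConcatenateAsBytesThenDecode(string first, string second)
    {
        var utf8WithBom = new UTF8Encoding(true); // true = provide the EF BB BF preamble
        byte[] a = utf8WithBom.GetPreamble().Concat(utf8WithBom.GetBytes(first)).ToArray();
        byte[] b = utf8WithBom.GetPreamble().Concat(utf8WithBom.GetBytes(second)).ToArray();
        byte[] merged = a.Concat(b).ToArray();

        // StreamReader consumes a BOM only at the very start of the stream.
        using (var reader = new StreamReader(new MemoryStream(merged), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string text = ConcatenateAsBytesThenDecode("2018-08-01 first\n", "2018-08-02 second\n");
        string secondLine = text.Split('\n')[1];

        // The second file's BOM survives as the character U+FEFF in front of the date.
        Console.WriteLine(secondLine[0] == '\uFEFF'); // True
    }
}
```

The leftover U+FEFF sits directly in front of the date text, which is consistent with the regex capturing it as part of the date group and DateTime.Parse then rejecting the value.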

You need to look into using a StreamReader / StreamWriter, which handle the character encoding for you. See:

How to: Read Text from a File | Microsoft Docs
How to: Write Text to a File | Microsoft Docs

Alternatively, the following can simplify reading the lines:

File.ReadLines Method (System.IO) | Microsoft Docs

There is an equivalent for writing lines, but it is not well suited to merging multiple files:

File.WriteAllLines Method (System.IO) | Microsoft Docs
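
As a rough sketch of how the merge could look with these classes, assuming the inputs are UTF-8 text files: the class name LogMerger, the method MergeTextFiles, and the temporary file names in Main are hypothetical, and the asker's directory paths are not reproduced here.

```csharp
using System;
using System.IO;
using System.Text;

public static class LogMerger
{
    // Merges text files line by line, so each input's BOM is consumed by the
    // decoder instead of being copied into the output.
    public static void MergeTextFiles(string[] inputFilePaths, string outputPath)
    {
        // UTF8Encoding(false) writes no BOM at all; pass true to write one
        // BOM at the start of the merged file.
        using (var writer = new StreamWriter(outputPath, false, new UTF8Encoding(false)))
        {
            foreach (string inputFilePath in inputFilePaths)
            {
                // File.ReadLines decodes as UTF-8 by default and skips a leading BOM.
                foreach (string line in File.ReadLines(inputFilePath))
                {
                    writer.WriteLine(line);
                }
            }
        }
    }

    public static void Main()
    {
        // Throwaway sample files in a temp directory, each saved with a BOM.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string a = Path.Combine(dir, "a.txt");
        string b = Path.Combine(dir, "b.txt");
        string merged = Path.Combine(dir, "merged.txt");
        File.WriteAllText(a, "2018-08-01 first\n", new UTF8Encoding(true));
        File.WriteAllText(b, "2018-08-02 second\n", new UTF8Encoding(true));

        MergeTextFiles(new[] { a, b }, merged);

        // No U+FEFF is left anywhere in the merged text.
        Console.WriteLine(File.ReadAllText(merged).IndexOf('\uFEFF') >= 0); // False
    }
}
```

Writing with WriteLine also guarantees a line break between files, so the last line of one input cannot run into the first line of the next when an input lacks a trailing newline.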
 
Comments
Richard MacCutchan 10-Aug-18 8:44am    
That definitely appears to be the issue. See my comments and OP's replies above.
Start by looking at the files you are opening: use a hex editor to look at the end of the first one and the start of the next. It's important to use a hex editor, because a text editor (like Notepad) will make assumptions about the file content that will "hide" what you need to look for.
What should the "combined file" look like? Is there anything in there other than plain letters, numbers, punctuation, and newlines (0x0D '\r', 0x0A '\n', or a combination)? If so, what?
Then look at the "combined" file using the same hex editor: what does the "join" look like? Is everything exactly what you would expect from your observations of the inputs?

You need to gather info on exactly what is happening - and we can't do that for you, we have no access to your file system!
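
If a hex editor is not at hand, the same inspection can be approximated in code. The following is only a sketch, assuming the thing to look for is the EF BB BF sequence mentioned in the question; the class name BomScanner and the method FindBomOffsets are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BomScanner
{
    // Returns the byte offset of every EF BB BF sequence found in the data.
    public static List<long> FindBomOffsets(byte[] data)
    {
        var offsets = new List<long>();
        for (int i = 0; i + 2 < data.Length; i++)
        {
            if (data[i] == 0xEF && data[i + 1] == 0xBB && data[i + 2] == 0xBF)
            {
                offsets.Add(i);
            }
        }
        return offsets;
    }

    public static void Main(string[] args)
    {
        // Expects the path of the file to inspect as the only argument.
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: BomScanner <file>");
            return;
        }

        foreach (long offset in FindBomOffsets(File.ReadAllBytes(args[0])))
        {
            Console.WriteLine("EF BB BF at byte offset " + offset);
        }
    }
}
```

An offset of 0 is the normal place for a BOM; any other offset marks a "join" where one file's leading bytes were copied into the middle of the combined file.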
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


