I understand that using a regex to parse a DateTime string is inappropriate: DateTime.Parse or TryParse is much better for that, since it takes into account local formatting, time zones, and anything else that may be appropriate. The problem is that if there are any spurious leading or trailing characters, those methods just fail.

So what I need to do is regex the document to find the candidate strings, and then use the appropriate method to parse each matched substring.

i.e.
C#
Regex re = new Regex(....);  // Somewhere in there will be (?<datetime>...)
MatchCollection matches = re.Matches(document);
foreach (Match m in matches) {
    DateTime dt;
    if (DateTime.TryParse(m.Groups["datetime"].Value, out dt))
        OperateOn(dt);
}


I realise I may have to limit the acceptable matches, but if I could get all the standard DateTime output formats for en-AU, en-GB, and en-US (except the "M" or "Y" formats, as they do not produce a full date), I'd be a happy man.
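For reference, the standard output patterns of a culture can be listed from its DateTimeFormatInfo. The following is only a sketch of that idea: the class and method names are made up for illustration, and the set of format specifiers kept is an assumption based on the request above (everything except "M" and "Y").

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the original question.
public static class StandardPatternSketch
{
    // Standard format specifiers that yield a full date and/or a time.
    // 'M' and 'Y' are left out because they omit part of the date.
    private static readonly char[] KeptFormats =
        { 'd', 'D', 'f', 'F', 'g', 'G', 'O', 'R', 's', 't', 'T', 'u', 'U' };

    // Returns the distinct custom patterns behind the kept specifiers
    // for the given culture, e.g. CultureInfo.GetCultureInfo("en-AU").
    public static List<string> GetFullPatterns(CultureInfo culture)
    {
        var patterns = new List<string>();
        foreach (char format in KeptFormats)
        {
            foreach (string pattern in culture.DateTimeFormat.GetAllDateTimePatterns(format))
            {
                if (!patterns.Contains(pattern))
                    patterns.Add(pattern);
            }
        }
        return patterns;
    }
}
```

The resulting list could serve as a checklist when writing the regex, or be passed to DateTime.TryParseExact if stricter parsing is wanted.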
Comments
ridoy 29-Nov-15 2:08am    
Could you provide your sample string format from which you will extract datetime?
Midi_Mick 29-Nov-15 2:30am    
There are quite a few. The thing is, the user will be entering them in the document manually, so they may like 29 Nov 2015 4:15PM, or the full word November, or 29/11/15. Or they may prefer Nov 29, 2015 or 11/29/15 if they're in the US. Then again, it may be a technical document, and the user has used the RFC standard "2015-11-29 16:15:00Z". Basically, anything that DateTime can parse, I want to be able to find.
ridoy 29-Nov-15 2:36am    
Have a try with below solutions.
phil.o 29-Nov-15 3:25am    
Just my 2 cents: regexes are the worst choice when it comes to validating datetimes. They are not meant for that; they cannot handle the internal logic a datetime carries.
Midi_Mick 29-Nov-15 4:10am    
I'm not looking to validate - TryParse will do that for me - I'm just looking to find the text within the document to pass to TryParse.

What surprises me is how "robust" DateTime.TryParse can be (note: my only experience using DateTime.TryParse is with standard English char encodings); example:
C#
string strDate = "\t 12 .  12  /2015 14:34";
DateTime realDate;
DateTime.TryParse(strDate, out realDate);
That will give you a plausible result.

I am not clear about the range of possible culture-contexts and char-whatevers you may be working with, but it might be valuable to see if DateTime.TryParse may be as robust in those other contexts.

Edit: Using Midi_Mick's test data with DateTime.TryParse:
C#
List<string> test = new List<string>
{
    "29/11/15",
    "29 November 2015 6:27PM",
    "2015-11-29T18:27:45.50+10:00",
    "2015-11-29 18:27:45.50Z",
    "NOV 29, 2015",
    "21:15"
};

private void parseTest()
{
    DateTime testDate;
    
    foreach (string str in test)
    {
        if (DateTime.TryParse(str, out testDate))
        {
            Console.WriteLine(testDate);
        }
        else
        {
            Console.WriteLine("could not parse: {0}", str);
        }
    }
}

/* results of test
    could not parse: 29/11/15
    11/29/2015 6:27:00 PM
    11/29/2015 3:27:45 PM
    11/30/2015 1:27:45 AM
    11/29/2015 12:00:00 AM
    11/30/2015 9:15:00 PM (reflects local time when test was run at GMT+7)
*/
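The first test string most likely fails because the test ran under a culture that expects the month first, so 29 is rejected as a month. A minimal sketch of passing the culture explicitly instead of relying on the current one follows; the helper name is hypothetical, and it assumes the named cultures are available on the machine.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class CultureParseSketch
{
    // Tries to parse text using the date conventions of the named culture
    // (for example "en-AU" or "en-US") rather than the current culture.
    public static bool TryParseWith(string text, string cultureName, out DateTime result)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return DateTime.TryParse(text, culture, DateTimeStyles.None, out result);
    }
}
```

With this, "29/11/15" should parse under "en-AU" (day first) while still failing under "en-US" (month first), so a document of unknown origin could be tried against each expected culture in turn.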
 
This regular expression should take care of most of the variants. If you run into yet another format, you can just add it to the expression.

C#
Regex regex = new Regex(@"([0-9]{2}\s+[a-v]{3,9}\s+[0-9]{4}\s+[0-9]{1,2}:[0-9]{2}(AM|PM))|([a-v]{3,9}\s+[0-9]{2},\s*[0-9]{4})|([0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2})|([0-9]{4}-[0-9]{2}-[0-9]{2}(T| )[0-9]{2}:[0-9]{2}:[0-9]{2})", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Multiline);

MatchCollection matchCollection = regex.Matches( [Target_string] );
foreach (Match match in matchCollection)
{
    DateTime dt;
    if (DateTime.TryParse(match.Value, out dt))
    {
        // Do what you are supposed to do.
    }
    else
    {
        // Well, what should you do if you can't convert a matched string?
    }
}


Tested on
29 Nov 2015 4:15PM
29 November 2015 4:15PM
29/11/15
Nov 29, 2015
November 29, 2015
11/29/15
2015-11-29 16:15:00Z
 
Comments
Midi_Mick 29-Nov-15 12:01pm    
That one was close, and had generally the right sort of idea. It was missing stand-alone times, GMT offsets, and seconds/milliseconds in the times, though.

It took me hours, but I'm posting below what I eventually came up with.
George Jonsson 29-Nov-15 18:38pm    
Well, it was intended as a nudge in the right direction, not as the perfect solution.
That is impossible without either seeing your sample texts or going through a lot of different formats, which is too much of an effort.
BillWoodruff 29-Nov-15 22:00pm    
+5 Good work, George; it may interest you to know that DateTime.TryParse can handle five out of the six test-cases that Midi_Mick lists ... you can see the results in my revised solution here.
George Jonsson 29-Nov-15 22:16pm    
Thanks Bill.
I am a big fan of the Parse and TryParse methods as well. Using them instead of writing my own validation has saved me a lot of time.
In this case, however, the task was to find possible date-time strings within a text, and for that regex is a good candidate.

But as you say, the actual validation is better done using TryParse.
Have a try with:
C#
Regex reg = new Regex(@"(\d+)[.](\d+)[.](\d+)");

OR,
C#
var regex = new Regex(@"\b\d{2}\.\d{2}\.\d{4}\b");

OR,
C#
Regex regex = new Regex(
      ";(?<date>.+?)",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );

 
This is what I finally arrived at:

C#
string test;

Regex re = new Regex(@"\b(?<datetime>" +
		// Date part
		@"((" +
			@"(\d{1,2}[\/\-\.]\d{1,2}[\/\-\.]\d{2}(\d{2})?)" +
			@"|(\d{1,2}\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-zA-Z]*,?\s+\d{2}(\d{2})?)" + 
			@"|((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-zA-Z]*\s+\d{1,2},?\s+\d{2}(\d{2})?)" +
			@"|(\d{4}\-\d{1,2}-\d{1,2})" +
		@")" +
		// Optional time part
		@"(" +
			@"[T\s]?\d{1,2}:\d{1,2}(:\d{1,2}(\.\d+)?)?(([AP]M)|([\+\-][12]?\d:\d{1,2})|Z)?" +
		@")?)" +
		// Stand alone time
		@"|(\d{1,2}:\d{1,2}(:\d{1,2}(\.\d+)?)?(([AP]M)|([\+\-][12]?\d:\d{1,2})|Z)?)" +
    @")\b", RegexOptions.IgnoreCase);

test = " 29/11/15 xxx 29 November 2015 6:27PM xxx 2015-11-29T18:27:45.50+10:00 xxx 2015-11-29 18:27:45.50Z xxx NOV 29, 2015 xxxx 21:15";

MatchCollection matches = re.Matches(test);

foreach (Match m in matches) {
	Console.WriteLine("{0}\t\t{1}", m.Groups["datetime"].Value, DateTime.Parse(m.Groups["datetime"].Value).ToString("O"));
}
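Because the regex only finds candidates, some matches may not be real dates, and DateTime.Parse throws on those. A sketch of the same loop using TryParse is below. To keep it short it uses a deliberately simplified stand-in pattern (ISO-style dates only), not the full expression above; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ExtractSketch
{
    // Stand-in pattern: ISO-style dates such as 2015-11-29.
    // The full expression above could be substituted here.
    private static readonly Regex Candidate = new Regex(@"\b\d{4}-\d{1,2}-\d{1,2}\b");

    // Returns every candidate that TryParse accepts for the given culture;
    // candidates that look like dates but are not (e.g. month 13) are skipped.
    public static List<DateTime> ExtractDates(string document, CultureInfo culture)
    {
        var found = new List<DateTime>();
        foreach (Match m in Candidate.Matches(document))
        {
            DateTime dt;
            if (DateTime.TryParse(m.Value, culture, DateTimeStyles.None, out dt))
                found.Add(dt);
        }
        return found;
    }
}
```

For example, calling ExtractSketch.ExtractDates on "due 2015-11-29, not 2015-13-45" with CultureInfo.InvariantCulture keeps the first candidate and silently drops the second, with no try-catch needed.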
 
Comments
BillWoodruff 29-Nov-15 21:51pm    
Using DateTime.TryParse, five out of the six strings in your test can be successfully parsed into DateTime objects. See my revised solution for the code :)
George Jonsson 29-Nov-15 22:20pm    
And by using TryParse, you don't get an exception if the format cannot be parsed.
Instead you can check if TryParse returned true or false and take appropriate action instead of a try-catch mechanism.
Midi_Mick 29-Nov-15 23:13pm    
I actually get 6/6 - but I'm in Australia, where we use dd/MM/yy rather than MM/dd/yy. And because TryParse picks up on the localisation, it works when the format is localised.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


