Thanks. Did you mean that I should replace the .NET RegEx with a 3rd-party (DFA-based) RegEx, or that I should have people use DFAs instead of RegEx?
Backtracking's not really an issue. Most of the time I'm just looking for one of the following:
- String Contains Pattern (e.g., "Pattern")
- String Contains Pattern A or Pattern B or ... (e.g., "PatternA|PatternB|...")
- String Starts with Pattern (e.g., "^Pattern") or Ends with Pattern (e.g., "Pattern$")
- String Exactly matches Pattern (e.g., "^Pattern$")
Sometimes I use more complex options:
- String Contains Pattern A or Pattern B (e.g., "Pattern(A|B)")
- String Matches product code (e.g., "P\d+(-\d+)?")
That's about as complex as it gets.
And the ones that RegEx doesn't support:
- String doesn't match one of the above (implemented by me as "!...")
- Numeric or Date comparison (implemented by me as "<10" or ">=1/1/2000" ...)
- Within range (currently not implemented except via regex starts with).
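The extra operators listed above ("!..." negation and "<10" / ">=1/1/2000" comparisons) can be layered in front of the regex as a small parser that turns each filter string into a predicate. This is a minimal sketch, not the actual implementation: the m/d/yyyy date format, case-insensitive regex matching, and the literal fallback for half-typed patterns are all assumptions, and the function names are hypothetical.

```python
import operator
import re
from datetime import datetime

# Comparison prefixes for "<10" / ">=1/1/2000" style filters.
# Two-character operators come first so ">=" is not read as ">".
_COMPARISONS = [(">=", operator.ge), ("<=", operator.le),
                (">", operator.gt), ("<", operator.lt)]


def _parse_value(text):
    """Read text as a number, else as a date in m/d/yyyy form (assumed format).

    Raises ValueError if it is neither.
    """
    try:
        return float(text)
    except ValueError:
        return datetime.strptime(text.strip(), "%m/%d/%Y")


def _make_comparison(op, target):
    def compare(value):
        try:
            parsed = _parse_value(value)
        except ValueError:
            return False  # field value is not a number or a date
        if type(parsed) is not type(target):
            return False  # never compare a date with a number
        return op(parsed, target)
    return compare


def build_predicate(filter_text):
    """Turn one user-entered filter (hypothetical syntax above) into a str -> bool function."""
    if filter_text.startswith("!"):
        inner = build_predicate(filter_text[1:])
        return lambda value: not inner(value)
    for prefix, op in _COMPARISONS:
        if filter_text.startswith(prefix):
            try:
                target = _parse_value(filter_text[len(prefix):])
            except ValueError:
                break  # operand is not a number/date: treat the text as a regex
            return _make_comparison(op, target)
    try:
        regex = re.compile(filter_text, re.IGNORECASE)
    except re.error:
        # Half-typed patterns such as "Pattern(" are common in live filtering;
        # match them literally until they become valid regexes.
        regex = re.compile(re.escape(filter_text), re.IGNORECASE)
    return lambda value: regex.search(value) is not None
```

For example, `build_predicate(">=1/1/2000")("3/15/2001")` is true, and `build_predicate("!^P\\d+$")` accepts anything that is not a bare product code.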
The main performance concern is that I'm using it for ad hoc live filtering of up to 3,000-4,000 records (the filter changes with every character typed), potentially with filters on multiple fields, and I'm trying to keep it responsive (< 2 seconds worst case, preferably < 1/10 second).
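One way to keep multi-field filtering cheap on every keystroke is to build each field's matcher once per filter change and then make a single pass over the records, requiring every filtered field to match. This sketch assumes records are dictionaries of string field values and that filters on different fields combine with AND; the sample data and the 100 ms figure printed by the demo only illustrate the budget described above.

```python
import re
import time


def filter_records(records, field_patterns):
    """Keep records where every filtered field matches its pattern (AND across fields).

    Patterns are compiled once per call (i.e. once per keystroke), not once per record.
    Assumes field values are strings; a missing field is treated as "".
    """
    compiled = {field: re.compile(p, re.IGNORECASE) for field, p in field_patterns.items()}
    return [rec for rec in records
            if all(rx.search(rec.get(field, "")) for field, rx in compiled.items())]


if __name__ == "__main__":
    # Hypothetical sample data: 4,000 records with two text fields.
    records = [{"code": f"P{i}-{i % 7}", "name": f"Widget {i}"} for i in range(4000)]
    start = time.perf_counter()
    hits = filter_records(records, {"code": r"^P\d+(-\d+)?$", "name": "widget 1"})
    elapsed = time.perf_counter() - start
    print(f"{len(hits)} matches in {elapsed * 1000:.1f} ms (target: < 100 ms)")
```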
So far, the performance is reasonable, if not ideal (using the native .NET Regex), so I've not been highly motivated to change. A 3rd-party drop-in engine might work. My thoughts were more along the lines of recognizing the simple cases and hard-coding them: it's hard to beat string.IndexOf and other string intrinsics, which can easily handle three of the first four cases.
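A sketch of the "recognize the simple cases" idea: if a pattern, after stripping a leading `^` and trailing `$`, contains no regex metacharacters, dispatch to plain string operations; a `|`-separated list of literals with no anchors becomes a loop of substring checks. Everything else falls back to the regex engine with the original pattern, so a misclassification can only ever cost speed on the regex path. It is written in Python; in .NET the corresponding calls would be `string.IndexOf`, `StartsWith`, `EndsWith` and `string.Equals`, each with a `StringComparison` argument. The function name is hypothetical, and using `casefold()` for case-insensitivity is an assumption: it is close to, but not identical to, the regex engine's ignore-case rules for some Unicode characters.

```python
import re

# Characters that give a pattern regex meaning (outside a character class).
_META = set(".^$*+?()[]{}|\\")


def compile_filter(pattern):
    """Return a case-insensitive str -> bool matcher, avoiding regex for plain literals.

    Assumes single-line field values (regex "$" also matches before a trailing newline).
    """
    anchored_start = pattern.startswith("^")
    anchored_end = pattern.endswith("$") and not pattern.endswith("\\$")
    body = pattern[(1 if anchored_start else 0):len(pattern) - (1 if anchored_end else 0)]
    alternatives = body.split("|")
    is_literal = all(alt and not (set(alt) & _META) for alt in alternatives)

    # "A|B|..." with no anchors: any literal appears as a substring.
    # (With anchors, "^A|B$" means "(^A)|(B$)", so that case is left to the regex.)
    if is_literal and len(alternatives) > 1 and not (anchored_start or anchored_end):
        needles = [a.casefold() for a in alternatives]
        return lambda s: any(n in s.casefold() for n in needles)

    if is_literal and len(alternatives) == 1:
        needle = body.casefold()
        if anchored_start and anchored_end:
            return lambda s: s.casefold() == needle            # "^Pattern$"
        if anchored_start:
            return lambda s: s.casefold().startswith(needle)   # "^Pattern"
        if anchored_end:
            return lambda s: s.casefold().endswith(needle)     # "Pattern$"
        return lambda s: needle in s.casefold()                # "Pattern"

    # Anything more complex goes to the regex engine unchanged.
    regex = re.compile(pattern, re.IGNORECASE)
    return lambda s: regex.search(s) is not None
```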
I also use RegExes for backend filtering (before it gets to the UI), and there I'm limited to what the database engine supports. Performance is generally pretty good; however, I wonder if there would be value in detecting simple cases up front and converting them to different operations before sending them to the backend. For example:
- Instead of 'Field matches regex "Pattern"' generate 'Field contains "Pattern"'.
- Instead of 'Field matches regex "^Pattern$"' generate 'Field == "Pattern"'.
These patterns are also typically entered by users, but not live (they have to 'submit' the query). I already do some simple pattern manipulation, mostly adding a (?i) to the front, as the engine is case-sensitive by default and I'd rather it not be. This would be a bit more complex, as I'd have to manipulate the operation, not just the pattern.
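The backend rewrite could be a small function that inspects the pattern before the query is built and returns an operation plus operand rather than just a pattern. In this sketch the operation names (`contains`, `equals`, `matches_regex`) and the `BackendFilter` type are hypothetical placeholders for whatever the database engine's query layer actually offers. It also assumes the engine's contains/equality comparisons can be made case-insensitive; if they are case-sensitive by default, like its regex matching, rewriting a `(?i)` regex into a plain `contains` would change which rows match.

```python
from typing import NamedTuple

# Characters that give a pattern regex meaning (outside a character class).
_META = set(".^$*+?()[]{}|\\")


class BackendFilter(NamedTuple):
    op: str       # hypothetical operation name: "contains", "equals" or "matches_regex"
    operand: str


def rewrite_regex_filter(pattern: str) -> BackendFilter:
    """Downgrade 'matches regex' to a cheaper operation when the pattern is a plain literal."""
    # 'Field matches regex "Pattern"' -> 'Field contains "Pattern"'
    if pattern and not (set(pattern) & _META):
        return BackendFilter("contains", pattern)
    # 'Field matches regex "^Pattern$"' -> 'Field == "Pattern"'
    if (len(pattern) > 2 and pattern.startswith("^") and pattern.endswith("$")
            and not (set(pattern[1:-1]) & _META)):
        return BackendFilter("equals", pattern[1:-1])
    # Anything else stays a regex, with (?i) prepended as is already done today.
    if not pattern.startswith("(?i)"):
        pattern = "(?i)" + pattern
    return BackendFilter("matches_regex", pattern)
```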