Don't forget the characters that include diacritical marks.
E.g., ö Å ç
A positive attitude may not solve every problem, but it will annoy enough people to be worth the effort.
|
Is there a way to check for that without having to list every Unicode character? I didn't see any accented names in our database but that certainly doesn't mean it can't happen in the future.
I'd prefer not to include all Unicode characters, just the ones with a high likelihood of showing up. I imagine that would only be characters that are accepted by Active Directory.
|
At least with the .NET Regex class (I don't know about other engines), you can specify a Unicode character category such as "Letter":
http://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#CategoryOrBlock
Your regex would then be:
^[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\-\s']+$
possibly even just
^[\p{L}\-\s']+$
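For anyone testing this idea outside .NET, here is a rough sketch in Python. It rests on an assumption worth stating loudly: Python's standard `re` module does not support `\p{L}`, so the sketch swaps in `[^\W\d_]` (a word character that is neither a digit nor an underscore) as an approximation of "any letter". The function name `is_valid_name` is made up for illustration.

```python
import re

# Assumption: standard-library `re`, which has no \p{L} category syntax.
# [^\W\d_] = "word character that is not a digit or underscore", which
# approximates a Unicode letter for str patterns (not an exact \p{L}).
NAME_PATTERN = re.compile(r"(?:[^\W\d_]|[-\s'])+")


def is_valid_name(name: str) -> bool:
    """Return True if name holds only letters, hyphens, whitespace or apostrophes."""
    # fullmatch plays the role of the ^...$ anchors in the .NET pattern.
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Jones-Smith", "O'Leary", "john1"]:
        print(candidate, is_valid_name(candidate))
```

Accented letters such as ö, Å and ç pass this check, while digits and other symbols do not. Whether Active Directory itself would accept them is a separate question that this sketch does not answer.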
|
After looking at that link, a person could go crazy trying to catch every possibility. Looks like regex can be very thorough!
Thanks for the help!
|
Yes!!
There's a reason the "Mastering Regular Expressions" book is 496 pages!!!
|
How would you allow a period only at the end of the string, for cases where a name ends in Jr. or Sr.? A period wouldn't normally appear in any other position in a last name. I'm going with the pattern below so far. I'm double-checking names in Active Directory, but I'm reasonably sure you can't use diacritical characters. I need to research that to be certain.
^[a-zA-Z\-\s']+$
|
^[a-zA-Z\-\s']+\.$
Add the \. right before the $
|
That works perfectly. I'm really starting to get the hang of this.
|
Check out the Expresso tool (free) to explore regular expressions!
|
Right, that is actually the tool I'm using. I bumped into it a couple of years ago but this is the first time I ever used regex.
|
Well, this pattern was working yesterday on a different computer at work. I installed Expresso on my personal computer so I could work on my project over the weekend and now the pattern is not working.
^[a-zA-Z\-\s']+\.$
john1 = no matches
I expected the pattern to flag the number one, because numbers are not allowed, but the results are blank when I run it. I could have sworn that this was working yesterday.
EDIT:
I did some further testing and discovered that the \. is breaking the pattern. If there is no period at the end, then count = 0. This pattern requires the period at the end, and only then does it work correctly. The period should be allowed 0 or 1 times at the end of the string.
So the pattern below is working the way I want it to in Expresso, but not when I use it in an HTA that uses VBScript to do the pattern matching. VBScript throws an error at the line where the pattern is executed.
^[a-zA-Z\-\s']+?\.$
I'm not sure how to make a pattern that works in Expresso also work with VBScript.
SOLUTION:
^[a-zA-Z\-\s']+?\.$ This pattern works when testing in Expresso but doesn't work with VBScript, although it may work in other languages.
^[a-zA-Z\-\s']+\.{0,1}$ This is the pattern that gives the behavior I was after (the period is allowed 0 or 1 times at the end) and also works with VBScript.
MATCHES:
Jones
Jones-Smith
Jones Smith (no hyphen)
O'Leary
Van Allen (no hyphen)
Vander Ark (no hyphen)
Jones Sr.
Although this doesn't address diacritical characters, a few conversations with colleagues resulted in the decision that the risk of them being used in Active Directory is very low. We currently have only 3 techs making entries into AD, so informing them of how this pattern works will reduce the risk even further. I have worked for my organization for 14 years and no diacritical characters have been used until now, so I feel pretty safe in not testing for them. It may not be the ideal approach for something like a product sold to the public, but it does meet the specifications that were given to me.
Thank you! I'd like to give a shout-out to everyone who helped me out with this project! I really appreciate all of you taking the time to steer me in the right direction! I would go as far as to say that CodeProject could be just as valuable as sitting in any classroom. You may not get a certification here, but the knowledge gained is invaluable. I was able to gain a solid understanding of regex in a matter of a few hours. I watched several videos, but I would say this forum helped the most because it dealt specifically with the problem that I was trying to solve.
modified 12-Oct-14 10:25am.
|
robwm1 wrote: ^[a-zA-Z\-\s']+?\.$
This was so close.
When I suggested the \. I forgot that the dot at the end needs to be optional. (Sorry.)
Just move the ? so that it comes after the \.
^[a-zA-Z\-\s']+\.?$
The ? means exactly the same thing as {0,1}.
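To see the difference between the three patterns side by side, here is a small sketch in Python. One caveat up front: Python's `re` engine is not the VBScript RegExp object or the .NET engine behind Expresso, so this only illustrates how the quantifiers behave in general and says nothing about the VBScript error mentioned earlier. The sample list and helper name are made up for illustration.

```python
import re

# Patterns copied from the thread.
REQUIRED_PERIOD = re.compile(r"^[a-zA-Z\-\s']+?\.$")  # lazy +, period still mandatory
OPTIONAL_PERIOD = re.compile(r"^[a-zA-Z\-\s']+\.?$")  # period allowed 0 or 1 times
BRACE_FORM = re.compile(r"^[a-zA-Z\-\s']+\.{0,1}$")   # same meaning as \.?

# Illustrative inputs: names from the MATCHES list plus two that should fail.
samples = ["Jones", "Jones-Smith", "O'Leary", "Van Allen", "Jones Sr.", "john1", "Jo.nes"]


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches the text."""
    return pattern.search(text) is not None


# One row per name: (required period, optional period, brace form).
results = {
    name: (
        matches(REQUIRED_PERIOD, name),
        matches(OPTIONAL_PERIOD, name),
        matches(BRACE_FORM, name),
    )
    for name in samples
}

if __name__ == "__main__":
    for name, row in results.items():
        print(f"{name!r}: {row}")
```

In this engine the `+?` version only accepts names that end in a period, because `+?` makes the `+` lazy rather than making the period optional. The `\.?` and `\.{0,1}` versions agree on every sample.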
|
|
I never thought to move the ? to the end. You're right though, it is the same result as {0,1}.
Thanks again!
|
I'd be awfully surprised if the only characters allowed in Active Directory worldwide are the basic ASCII-ish letters.
|
I know we have at least one person who has an accented 'e' in their last name, but it's not entered that way in Active Directory. I don't know whether that is because the person making the entry didn't know how to type the accented character or because it was disallowed. I'll definitely research this to be sure before I make a final decision to leave it out. I will post my findings here.
|
Hi,
I created an HTA that requires First Name, Last Name, and username to be entered. I am working on the First Name validation first.
The First Name should only be alpha characters but may include a hyphen. No numbers or symbols (besides hyphen) should be found in any position of the string being tested. I did find one user with a hyphen in the first name though so I need to allow that symbol. My approach has been to look for matches that are not alpha characters. If there is a match, I display a warning that tells the user to enter only alpha characters. Here is the regex pattern that I am testing:
[^a-zA-Z]+$
When I test this pattern, it is unable to detect a number or symbol (including hyphen) if it is in any position other than the end of the string. The pattern I posted here doesn't allow for a hyphen so I need to fix that as well.
What should this regex pattern look like if I want to detect anything other than an alpha character regardless of where it occurs in the string being tested?
Thanks,
Rob
modified 9-Oct-14 19:04pm.
|
You can try this
^[^a-zA-Z\-]+$
The hyphen has a special meaning inside a character class (it denotes a range), so you need to escape it with \-
This is a pretty good site to learn about regex: Regular-Expressions.info
|
That's only going to detect a string that contains nothing but the disallowed characters.
For example, "1.2" will match, but "1.2a" will not.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
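A quick way to check this claim is the sketch below, written in Python under the assumption that its `re` engine treats this simple pattern the same way as the engines discussed in the thread. The helper name `only_disallowed` is made up for illustration.

```python
import re

# The anchored, negated pattern suggested above.
ALL_DISALLOWED = re.compile(r"^[^a-zA-Z\-]+$")


def only_disallowed(text: str) -> bool:
    """Return True only when every character in text is a disallowed one."""
    return ALL_DISALLOWED.search(text) is not None


if __name__ == "__main__":
    for text in ["1.2", "1.2a", "10fred"]:
        print(text, only_disallowed(text))
```

Because of the `^` and `$` anchors, a single allowed letter anywhere in the input stops the whole pattern from matching, so an input like `10fred` slips through undetected.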
|
You are right. Forgot to check that.
I usually stay away from negations like that. They usually contain traps.
Your solution is probably better.
|
What would be considered the best approach then? I thought it made sense to look for what is disallowed and look at the count property. If the count property is > 0, then the data entered needs to be corrected.
This is my first time using regex, so I am unaware of what would be considered best practice. I spent all day yesterday studying regex and used Expresso to play around with the possibilities. Like most programmers, I would prefer to follow best practices.
|
To match the characters that aren't allowed, try:
[^a-zA-Z\-]
To validate that the string doesn't contain any disallowed characters, use:
^[a-zA-Z\-]+$
For the HTML5 pattern attribute, use:
<input type="text" name="FirstName" required pattern="[a-zA-Z\-]+" />
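The two approaches above can be compared with a short sketch in Python. It assumes Python's `re` module rather than the VBScript RegExp object used in the HTA, and the function names `find_disallowed` and `is_valid` are made up for illustration.

```python
import re

# Unanchored: finds each individual disallowed character.
DISALLOWED = re.compile(r"[^a-zA-Z\-]")
# Anchored: matches only when the whole string is made of allowed characters.
VALID = re.compile(r"^[a-zA-Z\-]+$")


def find_disallowed(text: str) -> list[str]:
    """Return every disallowed character, in order of appearance."""
    return DISALLOWED.findall(text)


def is_valid(text: str) -> bool:
    """Return True if text contains only letters and hyphens."""
    return VALID.search(text) is not None


if __name__ == "__main__":
    entry = "10fred$erick jones"
    if not is_valid(entry):
        print("Please remove these characters:", find_disallowed(entry))
```

The first pattern is handy when the warning should name the offending characters. The second gives a simple yes or no answer, which is the same logic the HTML5 pattern attribute applies.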
|
I tried ^[a-zA-Z\-]+$ using a string like:
10fred$erick jones
The search results didn't catch any of the disallowed characters. Am I not understanding how this pattern should work? It should have matched 10 $ and the space between the names. I'm using a utility called Expresso to pretest for results.
|
The expression ^[a-zA-Z\-]+$ will only match the string if it doesn't contain any disallowed characters.
Since the string 10fred$erick jones contains disallowed characters, it will not match that pattern.
|
So in this case, if count = 0, then I should warn the user, correct? Is this approach considered best practice? I did try a proper, expected input and the pattern matches every character, so I see what you're talking about.
|
Yes, if the string doesn't match that expression, then it's not valid.
The HTML5 pattern attribute works in the same way: if the entered string doesn't match the pattern, then it's not valid.