Hi all,

I want to remove Unicode characters like ð from my data.

I used this expression:

preg_replace('/[^(\x20-\x7F)]*/','', $row['xmlFeed'])

but it didn't remove all the Unicode characters. Please help me remove all the Unicode characters from my data.

1 solution

Based on an answer to a similar question, I think what you want (the closest match to what you've written) is:
PHP
preg_replace('/[^\x{20}-\x{7F}]/u','', $row['xmlFeed'])
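A minimal sketch of that expression wrapped in a function, assuming PHP 7 or later (for the "\u{...}" string escape) and UTF-8 input. The function name stripNonAscii is hypothetical. With the /u modifier, preg_replace() returns null instead of a string when the subject is not valid UTF-8, so the sketch checks for that rather than silently passing null along.

```php
<?php
// Hypothetical helper: remove every character outside U+0020..U+007F.
// Assumes $text is valid UTF-8; otherwise preg_replace() with /u returns null.
function stripNonAscii(string $text): string
{
    $result = preg_replace('/[^\x{20}-\x{7F}]/u', '', $text);
    if ($result === null) {
        // preg_last_error() tells you why, e.g. PREG_BAD_UTF8_ERROR.
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

echo stripNonAscii("abc\u{F0}def"), "\n"; // prints "abcdef" ("\u{F0}" is ð)
```

In your code this would be called as stripNonAscii($row['xmlFeed']).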


However, if you truly want to keep all ASCII characters, note that the control characters 0x00-0x1F (tab, newline, carriage return and so on) are valid ASCII as well, so you might want /[^\x{00}-\x{7F}]/u.
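To see the difference between the two ranges, here is a small comparison on a hypothetical sample string containing a tab, a newline and ð (PHP 7+ and UTF-8 data assumed). json_encode() is used only so the tab and newline show up visibly in the output.

```php
<?php
// Hypothetical sample: a tab, the letter ð (U+00F0) and a trailing newline.
$sample = "id\tname\u{F0}\n";

// Keeps only U+0020..U+007F: the tab and newline are removed along with ð.
$printableOnly = preg_replace('/[^\x{20}-\x{7F}]/u', '', $sample);

// Keeps the full ASCII range U+0000..U+007F: only ð is removed.
$fullAscii = preg_replace('/[^\x{00}-\x{7F}]/u', '', $sample);

echo json_encode($printableOnly), "\n"; // prints "idname"
echo json_encode($fullAscii), "\n";     // prints "id\tname\n"
```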

Also, some general regex tips: don't put parentheses inside [] unless you mean to include or exclude the parenthesis characters themselves, because inside a character class they lose their grouping meaning and are matched literally. (For example, /[(a-z)]+/ matches all lowercase English letters plus ( and ), so it matches the entire string "a(b)c".)

In this case you also don't need the *, because you want to replace each single-character occurrence, and preg_replace() already replaces every match by default (unless you pass a $limit); if it didn't, the * would only get you the first run of them anyway. Either way it doesn't help, and since nothing else in the pattern constrains it, it can also match the empty string, which can make the replacement much slower on some regex implementations: every empty string it encounters counts as a match and receives the replacement text too. For example, replacing /a*/ with "c" in the string "aaaaabba" gives "ccbcbcc" in PHP (some other engines give "cbcbc"), because empty strings, such as the one between the two b's, count as valid matches.
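The points above can be checked directly in PHP; a short sketch, assuming PHP 7+ for the "\u{...}" escape:

```php
<?php
// 1) Parentheses inside [...] are literal characters, not a group.
preg_match('/[(a-z)]+/', 'a(b)c', $m);
echo $m[0], "\n"; // prints "a(b)c"

// 2) A pattern ending in * also matches the empty string, and
//    preg_replace() replaces those empty matches as well.
echo preg_replace('/a*/', 'c', 'aaaaabba'), "\n"; // prints "ccbcbcc"

// 3) Without *, every offending character is still replaced, because
//    preg_replace() replaces all matches unless a $limit is given.
echo preg_replace('/[^\x{20}-\x{7F}]/u', '', "\u{F0}a\u{F0}\u{F0}b"), "\n"; // prints "ab"
```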
 
 
Comments
kirthikaganesamurthy 27-Jun-13 1:10am    
The specified expression replaces my entire data, but I want to remove only the Unicode characters from my data. Please provide an optimal way to do this.
lewax00 27-Jun-13 9:58am    
Are you sure your data isn't entirely non-ASCII characters ("non-ASCII" is the more accurate term for what you want; technically, all of them are Unicode)? Also, which Unicode encoding is it: UTF-8, UTF-16...?
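One possible explanation for the whole result disappearing, assuming the data is not valid UTF-8 (for example ISO-8859-1, where ð is the single byte 0xF0): with the /u modifier, preg_replace() returns null on invalid UTF-8 and preg_last_error() reports PREG_BAD_UTF8_ERROR. The hypothetical helper below detects that case and falls back to a byte-wise pattern without /u. It is only a sketch; converting the data to a known encoding first is usually the cleaner fix.

```php
<?php
// Hypothetical helper: strip non-ASCII characters, falling back to a
// byte-wise pattern when the input is not valid UTF-8.
function stripNonAsciiChecked(string $text): string
{
    $result = preg_replace('/[^\x{00}-\x{7F}]/u', '', $text);
    if ($result === null && preg_last_error() === PREG_BAD_UTF8_ERROR) {
        // No /u: drop every byte >= 0x80, which removes each non-ASCII
        // character in single-byte encodings such as ISO-8859-1.
        $result = preg_replace('/[^\x00-\x7F]/', '', $text);
    }
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

$latin1 = "caf\xE9 \xF0"; // "café ð" encoded as ISO-8859-1: not valid UTF-8
var_dump(preg_replace('/[^\x{00}-\x{7F}]/u', '', $latin1)); // prints NULL
echo stripNonAsciiChecked($latin1), "\n";                   // prints "caf "
```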

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


