|
Fully agree! This is mission impossible. How can one know that "BOS" should be "BOX" and not "BOSS" or "BOSSA NOVA"? Keep it simple and no risk, no fun!
|
|
|
|
|
Maximilien wrote: You really need to parse addresses? If you start doing that, there will always be outliers that you will miss.
Sadly yes. And outliers are acceptable as we're trying to fill in some form fields that break out address, PO Box, and Rural Routes, and if everything fails, the address just gets put into the Address1 field.
We're aiming for improvement rather than perfection.
Marc
Latest Article - Create a Dockerized Python Fiddle Web App
Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
Artificial intelligence is the only remedy for natural stupidity. - CDP1802
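For anyone following along, here is a rough Python sketch of the "break it out, otherwise dump it into Address1" idea described above. Only Address1 comes from the thread; the POBox and RuralRoute field names, the regular expressions, and the function name are assumptions made up for illustration, and real data will need more patterns.

```python
import re

# Hypothetical patterns: they cover only a few common spellings.
PO_BOX_RE = re.compile(
    r"^\s*(?:P\.?\s*O\.?\s*|POST\s+OFFICE\s+)BOX\s*#?\s*(?P<box>\w+)\s*$",
    re.IGNORECASE,
)
ROUTE_RE = re.compile(
    r"^\s*(?P<kind>RR|R\.R\.|RURAL\s+ROUTE|HC)\s*#?\s*(?P<route>\d+)"
    r"(?:\s*,?\s*BOX\s*#?\s*(?P<box>\w+))?\s*$",
    re.IGNORECASE,
)


def classify_address_line(line):
    """Best-guess split of one address line into form fields.

    Field names other than Address1 are assumptions for this sketch.
    """
    fields = {"Address1": "", "POBox": "", "RuralRoute": ""}

    match = PO_BOX_RE.match(line)
    if match:
        fields["POBox"] = match.group("box").upper()
        return fields

    match = ROUTE_RE.match(line)
    if match:
        # Normalize "R.R." and "Rural Route" to RR; HC (highway contract) stays HC.
        kind = "HC" if match.group("kind").upper().startswith("H") else "RR"
        route = f"{kind} {match.group('route')}"
        if match.group("box"):
            route += f" BOX {match.group('box').upper()}"
        fields["RuralRoute"] = route
        return fields

    # Nothing matched: fall back to putting the whole line into Address1.
    fields["Address1"] = line.strip()
    return fields
```

With these patterns, "RR#1" lands in RuralRoute as "RR 1", while something like "P O BOS 123" matches nothing and ends up unchanged in Address1, which is the "improvement rather than perfection" behavior.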
|
|
|
|
|
When we put our mail on vacation hold, it validates and 'normalizes' the address, so I do understand what you're working with.
Where I grew up, our address was RR#1; it wasn't until I was in my teens that we had an address with a number and street name.
So.. consider this.. are you only dealing with P.O. and its variants or do you have R.R. addresses as well?
|
|
|
|
|
RR, CR, HC, etc., as well as regular street addresses (as best as those are).
Perfect accuracy is not necessary, just best guess.
Marc
|
|
|
|
|
Well, then just parse the city & state/province and geocode to the center of that.
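A minimal sketch of the parsing half of that suggestion, assuming US-style "City, ST 12345" last lines. The regular expression and function name are made up for illustration, Canadian provinces and postal codes are not handled, and the geocoding lookup itself is left out because it depends on whichever service you use.

```python
import re

# Assumption: the last line looks like "City, ST 12345" or "City ST 12345-6789".
CITY_STATE_RE = re.compile(
    r"^\s*(?P<city>[A-Za-z .'-]+?)\s*,?\s+(?P<state>[A-Za-z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def parse_city_state(line):
    """Return city, state and ZIP from a US-style last line, or None."""
    match = CITY_STATE_RE.match(line)
    if match is None:
        return None
    return {
        "city": match.group("city").strip().title(),
        "state": match.group("state").upper(),
        "zip": match.group("zip"),
    }
```

Anything that returns None here would still need the fallback handling discussed earlier in the thread.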
|
|
|
|
|
Tim Carmichael wrote: RR#1
Rolls Royce #1?
Homeless billionaire?
[sidebar]
Reminds me of The Bumpkin Billionaires which I used to read as a kid.
|
|
|
|
|
|
I think the counties try to eliminate RR addresses when they implement 911 emergency service.
Ambulance dispatcher: Code red, RR 23, box 99
...
Ambulance navigator: We are on the correct Route... 1 mailbox, 2 mailbox, etc...
|
|
|
|
|
Excellent point. Are there services that allow you to force user input validation of addresses against the USPS databases?
|
|
|
|
|
Marc Clifton wrote: The one with the 'K' is interesting. 'K' is on the opposite side of the keyboard -- I can understand the 'S'.
Maybe this happened to your user?
Just kidding - O and K are nearby, so he probably hit K accidentally along with BO and missed the X
Or maybe he went for BOKS and missed the S, who knows?
Cheers,
विक्रम
"We have already been through this, I am not going to repeat myself." - fat_boy, in a global warming thread
|
|
|
|
|
See, isn't programming fun!?
Jeremy Falcon
|
|
|
|
|
I smell OCR in the mix - hence the BOK, BOS, B0X, etc.
Software Zen: delete this;
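If OCR is the culprit, one hedged way to catch BOK, BOS and B0X without also turning BOSS into BOX is to repair the word only when it sits between "PO" and a number. The sketch below is an assumption-laden illustration: the lookalike table, the 0.6 threshold and the function name are all invented here, and it uses Python's standard difflib for the similarity score.

```python
import re
from difflib import SequenceMatcher

# Digits that OCR commonly confuses with letters (an assumed, partial table).
OCR_LOOKALIKES = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B"})

# Only consider a short word that sits between "PO" and a box number.
PO_CONTEXT_RE = re.compile(
    r"^\s*P\.?\s*O\.?\s+(?P<word>[A-Za-z0-9]{2,4})\s+#?(?P<number>\d+)\s*$",
    re.IGNORECASE,
)


def repair_po_box(line, threshold=0.6):
    """Return a normalized 'PO BOX n' string, or None if it is not a likely PO box."""
    match = PO_CONTEXT_RE.match(line)
    if match is None:
        return None
    word = match.group("word").upper().translate(OCR_LOOKALIKES)
    # ratio() is 1.0 for identical strings and falls as the strings diverge.
    if SequenceMatcher(None, word, "BOX").ratio() >= threshold:
        return f"PO BOX {match.group('number')}"
    return None
```

With the threshold at 0.6, "PO BOK 12" and "P.O. B0X 7" are repaired, while "PO BOSS 9" is left alone. The threshold is a judgment call and would need tuning against real data.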
|
|
|
|
|
Gary Wheeler wrote: I smell OCR in the mix - hence the BOK, BOS, B0X, etc.
Ah - excellent point!
Marc
|
|
|
|
|
|
|
And I thought parsing dates that had been defined as strings or ints or decimals was a nightmare.
Maybe there is a need for an AI to second guess what the user might have meant.
We're philosophical about power outages here. A.C. come, A.C. go.
|
|
|
|
|
Woah... haven't seen you in a long time Chris. How's it going these days?
Cheers,
विक्रम
|
|
|
|
|
i'm here occasionally. not constantly, as previously.
it goes... on and on and on and on.
|
|
|
|
|
I still remember your old profile pic - with hand on your thoughtful face. Got it somewhere?
Cheers,
विक्रम
|
|
|
|
|
Vikram A Punathambekar wrote: Got it somewhere
He probably has his face at the same body place you have yours.
|
|
|
|
|
Smarty pants
Cheers,
विक्रम
|
|
|
|
|
Randomly throw them to various fields. They might not be bright enough to notice.
"It is easy to decipher extraterrestrial signals after deciphering Javascript and VB6 themselves.", ISanti
|
|
|
|
|
We have several times received paper mail where the entire name/address is no more than an alphabet soup - yet it is delivered to us no more than one day delayed.
The first time this happened we were really puzzled: How could the mailman know that the mail was intended for us? (It was!) Finally we realized that a keyboard "left shift" operation would give our name and address correctly. Since then, we have seen both right and left shifts, of one hand or both hands. I asked a mail guy about it, and he confirmed that it is well known: If a name/address looks like alphabet soup, chances are 9 in 10 that a keyboard shift changes it to a sensible address.
Maybe you should include full and partial (i.e. one-hand) right and left shifts in your user input parsing. But don't expect the shift machine instructions to be of great help for this task.
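A rough sketch of how such keyboard shifts could be undone, assuming a US QWERTY layout and a displacement of exactly one key. The row strings, the left/right hand split and the function names are assumptions for illustration; digits and anything not in the three letter rows pass through unchanged.

```python
# Assumed US QWERTY letter rows, including the punctuation keys at the right edge.
ROWS = ("qwertyuiop[]", "asdfghjkl;'", "zxcvbnm,./")
LEFT_HAND = set("qwertasdfgzxcvb")
RIGHT_HAND = set("".join(ROWS)) - LEFT_HAND
HANDS = {"both": LEFT_HAND | RIGHT_HAND, "left": LEFT_HAND, "right": RIGHT_HAND}


def unshift(text, offset, hand="both"):
    """Undo a typing shift of `offset` keys (+1 = hands sat one key to the right).

    With hand="left" or hand="right", only keys that hand would normally
    press are corrected; everything else is kept as typed.
    """
    allowed = HANDS[hand]
    out = []
    for ch in text:
        fixed = ch
        lower = ch.lower()
        for row in ROWS:
            pos = row.find(lower)
            if pos != -1:
                src = pos - offset  # the key the typist meant to press
                if 0 <= src < len(row) and row[src] in allowed:
                    fixed = row[src].upper() if ch.isupper() else row[src]
                break
        out.append(fixed)
    return "".join(out)


def shift_candidates(text):
    """All one-key left/right, one-hand/both-hands reinterpretations of text."""
    return {
        (offset, hand): unshift(text, offset, hand)
        for offset in (-1, 1)
        for hand in ("both", "left", "right")
    }
```

For example, `unshift("jr;;p", 1)` gives "hello", and `unshift("je;;p", 1, hand="right")` gives "hello" as well. Deciding which candidate is the "sensible address" still needs a lookup against known names or streets, which is not shown here.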
|
|
|
|
|
I did a mailing list cleanup like this in the Jurassic era using dBase ][. I ended up trimming excess blanks, doing upper/lower case normalization, and using a translation table lookup to translate common variants. I don't remember how I identified exceptions back then, but now I'd use a dialog with options to manually correct, ignore (add to the lookup as an IGNORE string), or add a translation record.
Then there is the problem of dealing with addresses foreign to your country ... whew!
Yup, this is a problem to be managed, not solved, if unfiltered inputs are continuously added.
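The cleanup steps above (trim blanks, normalize case, translation table with IGNORE entries) might look something like this in Python. The table contents, the word-by-word lookup and the function name are assumptions for illustration only; the review dialog for exceptions is not shown.

```python
IGNORE = "IGNORE"

# Hypothetical starter table: variant -> preferred form, or IGNORE.
TRANSLATIONS = {
    "STREET": "ST",
    "STR": "ST",
    "AVENUE": "AVE",
    "ROAD": "RD",
    "N/A": IGNORE,
}


def clean_line(line, table):
    """Collapse blanks, uppercase, then translate known variants word by word.

    Returns None when the whole line is marked IGNORE in the table.
    """
    collapsed = " ".join(line.split()).upper()
    if table.get(collapsed) == IGNORE:
        return None
    words = []
    for token in collapsed.split():
        replacement = table.get(token, token)
        if replacement != IGNORE:  # drop tokens marked as noise
            words.append(replacement)
    return " ".join(words)
```

Adding a translation record is then just `TRANSLATIONS["BOK"] = "BOX"`, after which `clean_line("po  bok 9", TRANSLATIONS)` returns "PO BOX 9". Because the table keeps growing as new variants show up, this fits the "managed, not solved" point.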
|
|
|
|
|
My guess on the "K" is that some robot filled it in based on a record created via OCR.
The United States Post Office has a service you can use to "normalize" addresses. I suspect that each country has something similar.
There is probably a service provider that aggregates all of these normalization services into one spot. (Amazon?)
|
|
|
|