So the real problem is the OCR solution itself.
Realistically, you are unlikely to be able to handle every error case, and any correction logic you add may introduce new errors of its own.
So it is a trade-off.
If reducing errors is considered a significant issue, then it may be better to get a second, different OCR solution and run both. Then compare the outputs from the two and only apply fixes where they differ.
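A minimal sketch of the comparison step, assuming you already have the plain-text output of both OCR engines as strings (the OCR calls themselves are left out). The helper name `find_disagreements` and the sample strings are hypothetical. It uses the standard library's `difflib.SequenceMatcher` to align the two outputs word by word and returns only the spans where they disagree, which are the places worth reviewing or fixing.

```python
import difflib


def find_disagreements(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Hypothetical helper: align two OCR outputs word by word and return
    (engine_a_words, engine_b_words) pairs for every span where they differ."""
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False so common words are not silently treated as junk
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'equal' spans are where both engines agree; leave those alone
        if tag != "equal":
            disagreements.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return disagreements


# Illustrative (made-up) outputs from two engines reading the same line
ocr_a = "The quick brown fox jumps over the lazy dog"
ocr_b = "The quick brovvn fox jumps over the 1azy dog"
for a, b in find_disagreements(ocr_a, ocr_b):
    print(f"engine A: {a!r}  engine B: {b!r}")
```

Here only "brown"/"brovvn" and "lazy"/"1azy" come back, so any correction logic would touch just those two spans. Aligning on words rather than characters keeps the report readable; where one engine drops or adds a word, the other side of the pair is simply an empty string.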
If, as I said, it really is a significant problem, then the additional cost should not be an obstacle. But if the cost is an obstacle, then perhaps the problem isn't as significant as you thought.