Getting PaddleOCR and PaddlePaddle to work in Windows, Ubuntu and macOS

21 Nov 2022 | CPOL | 4 min read
Working through the combinations to get PaddlePaddle and PaddleOCR installed and working everywhere

Introduction

OCR is one of those standard AI vision-based features we're all familiar with. OCR is now so ubiquitous that it's built into cellphone operating systems and we barely notice. Obviously we needed to add it to CodeProject.AI Server but doing so was a bit of an adventure.

There are a ton of OCR projects and packages, and with Mike Lud leading the charge in picking the most accurate, we settled on PaddleOCR. PaddleOCR is based on the excellent PaddlePaddle (PArallel Distributed Deep LEarning) package. Unfortunately PaddleOCR and PaddlePaddle can be a challenge to get working, so here's a quick rundown of what we did:

Getting PaddlePaddle and PaddleOCR set up in Python

  1. We ignored the docs. PaddleOCR evidently supports CUDA 10.6, but PaddlePaddle (which PaddleOCR needs) evidently only supports CUDA 10.1. Except where it states it supports 10.6. We have it running on CUDA 11.7. This confusion is pretty standard for evolving projects that are trying to target Nvidia. Anyone targeting Nvidia is brave, or they know how to lock their system to a specific set of drivers, toolkits and libraries. It's a mess.
     
  2. We lived in Google Translate for 3 days. PaddlePaddle was developed by Baidu, the Chinese tech giant, and open-sourced in 2016. Since it's based in China, the forums are generally not in English.
     
  3. We experimented, read, tested, and adjusted and ended up with the following setup matrix for installing the Python packages:
     
    OS             CPU                      GPU

    Windows        paddlepaddle==2.3.2      --find-links https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
                   paddleocr>=2.0.1         paddlepaddle-gpu==2.3.2.post116
                                            paddleocr>=2.0.1

    Ubuntu         paddlepaddle==2.4.0rc0   No success
                   paddleocr>=2.6.0.1

    macOS (Intel)  paddlepaddle==2.3.2      Not supported
                   paddleocr>=2.0.1

    macOS (Arm64)  paddlepaddle==2.3.2      Not supported
                   paddleocr>=2.0.1

     
  4. We patched an ugly hack in the PaddlePaddle code to get things working.

    In the paddle package under the site-packages folder of your Python installation (or virtual environment) you'll find a folder named dataset, and within that the file image.py. Line #37 has a FIXME for the ugly hack that works around an issue with numpy when importing OpenCV. They test-import OpenCV in a freshly spawned Python interpreter, and only import it directly if that succeeds:
    Python
     43      interpreter = sys.executable
     44      # Note(zhouwei): if use Python/C 'PyRun_SimpleString', 'sys.executable'
     45      # will be the C++ execubable on Windows
     46      if sys.platform == 'win32' and 'python.exe' not in interpreter:
     47          interpreter = sys.exec_prefix + os.sep + 'python.exe'
     48      import_cv2_proc = subprocess.Popen(
     49          [interpreter, "-c", "import cv2"],
     50          stdout=subprocess.PIPE,
     51          stderr=subprocess.PIPE,
     52          shell=True)
     53      out, err = import_cv2_proc.communicate()
     54      retcode = import_cv2_proc.poll()
     55      if retcode != 0:
     56          cv2 = None
     57      else:
     58          import cv2
     59  else:
     60      try:
     61          import cv2
     62      except ImportEr:...
    

    That's some creative problem solving.

    The issue is that the process is spun up using Popen with shell=False (specifically, the original code lets shell take its default value, which is False). The code sample above already includes the fix: shell=True at line 52. On Windows and Ubuntu you need shell=True, otherwise the import fails. On macOS (both Intel and Arm64) shell=False is fine.

    Evidently this hack was only for Ubuntu, but it's in the Paddle code for all operating systems, so it needs to be fixed for both Windows and Ubuntu.
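To find the file that needs patching without hunting through site-packages by hand, something like the following may help. It is only a sketch: the helper names find_paddle_image_py and popen_uses_shell_true are made up for this example, the "patched" check is a crude text search for shell=True rather than a real parse of the file, and it assumes the dataset/image.py layout described above.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_paddle_image_py() -> Optional[Path]:
    """Return the path to paddle/dataset/image.py, or None if it cannot be found."""
    # find_spec locates the installed package without importing it
    spec = importlib.util.find_spec("paddle")
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "dataset" / "image.py"
        if candidate.is_file():
            return candidate
    return None


def popen_uses_shell_true(path: Path) -> bool:
    """Crude text check (an assumption, not a parser): is 'shell=True' in the file?"""
    return "shell=True" in path.read_text(encoding="utf-8")


if __name__ == "__main__":
    path = find_paddle_image_py()
    if path is None:
        print("paddle/dataset/image.py not found in this environment")
    else:
        state = "looks patched" if popen_uses_shell_true(path) else "not patched"
        print(f"{path}: {state}")
```

Run it with the same interpreter (or virtual environment) that PaddleOCR will use, since each environment has its own copy of the file.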

And with that we have PaddleOCR running. It's not very fast on a CPU: 10-15 seconds for half a page of text. Turn on the GPU and it's 200 ms or so. Totally usable and very accurate.
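For anyone scripting their own install, the setup matrix above can be captured as a small lookup. This is a sketch rather than how CodeProject.AI Server does it: the SETUP_MATRIX and requirements_for names are invented for the example, and it assumes Ubuntu reports itself as "Linux" and macOS as "Darwin" via platform.system(). Combinations the matrix lists as "No success" or "Not supported" are deliberately left out.

```python
import platform
from typing import List, Optional

# Requirement lines copied from the setup matrix in this article.
# Keys are (platform.system() value, "cpu" or "gpu").
SETUP_MATRIX = {
    ("Windows", "cpu"): ["paddlepaddle==2.3.2", "paddleocr>=2.0.1"],
    ("Windows", "gpu"): [
        "--find-links https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html",
        "paddlepaddle-gpu==2.3.2.post116",
        "paddleocr>=2.0.1",
    ],
    ("Linux", "cpu"): ["paddlepaddle==2.4.0rc0", "paddleocr>=2.6.0.1"],
    # Intel and Arm64 macOS share the same CPU requirements in the matrix
    ("Darwin", "cpu"): ["paddlepaddle==2.3.2", "paddleocr>=2.0.1"],
}


def requirements_for(system: Optional[str] = None, gpu: bool = False) -> List[str]:
    """Return the requirements.txt lines for an OS/device pair from the matrix."""
    system = system or platform.system()
    key = (system, "gpu" if gpu else "cpu")
    if key not in SETUP_MATRIX:
        raise ValueError(f"No working combination recorded for {key}")
    return list(SETUP_MATRIX[key])


if __name__ == "__main__":
    # Print the CPU requirements for the current machine
    print("\n".join(requirements_for()))
```

The printed lines can be written to a requirements file and passed to pip install -r.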

The next release of CodeProject.AI Server will include an option to install OCR using PaddleOCR. We're aiming for the first week of December.

Postscript: GPU support for PaddlePaddle in Ubuntu under WSL

This one has defeated us so far. PaddleOCR (CPU only) in Ubuntu 22.04 on WSL works fine. GPU enabled does not. One major issue with WSL is that you need to install CUDA in WSL rather than rely on the Windows CUDA drivers doing their thing. A write-up can be found here. Once you have CUDA installed, test it by opening a Python terminal and entering

Python
import paddle
paddle.utils.run_check()
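Before running the full check, it can be useful to see which build of Paddle actually got installed. The sketch below assumes a Paddle 2.x install, where paddle.device.is_compiled_with_cuda() and paddle.device.get_device() are available. The paddle_device_summary helper is a made-up name for this example, and it returns None when paddle isn't importable.

```python
from typing import Optional


def paddle_device_summary() -> Optional[dict]:
    """Describe the installed Paddle build, or return None if paddle is missing."""
    try:
        import paddle
    except ImportError:
        return None
    return {
        "version": paddle.__version__,
        # True only for the paddlepaddle-gpu wheels
        "compiled_with_cuda": paddle.device.is_compiled_with_cuda(),
        # Typically "cpu" or something like "gpu:0"
        "device": paddle.device.get_device(),
    }


if __name__ == "__main__":
    print(paddle_device_summary())
```

If compiled_with_cuda comes back False, the CPU wheel is installed and no amount of CUDA setup in WSL will change the result.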

If this is all good then you can look at installing the Paddle packages. Our best effort so far uses

-f https://paddlepaddle.org.cn/whl/stable/noavx.html
paddlepaddle-gpu==2.4.0rc0
paddleocr==2.6.0.1

This installs successfully and will at least launch (other combinations crash half a dozen different ways), but you end up with the very issue that the hack (above) was meant to solve: namely, you will get a segfault when trying to run PaddleOCR due to a numpy bug triggered by importing OpenCV. The hack is meant to solve this by importing OpenCV in a spawned Python process. This process isn't successful in Ubuntu 22.04 (at least for us), so it falls back to a classic "import cv2", which then leads to the segfault.
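To see whether the spawned import works on a given machine, independently of Paddle, a probe along these lines may be useful. It is a diagnostic sketch, not Paddle's own code: can_import_in_subprocess is a hypothetical helper, and it passes the arguments as a list with the default shell=False, so it only tells you whether a fresh interpreter can import the module without crashing.

```python
import subprocess
import sys


def can_import_in_subprocess(module_name: str, timeout: float = 60.0) -> bool:
    """Return True if a fresh interpreter can import module_name and exit cleanly."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    # A non-zero code covers both ImportError and a crash in the child (e.g. a segfault)
    return proc.returncode == 0


if __name__ == "__main__":
    print("cv2 importable in a child process:", can_import_in_subprocess("cv2"))
```

Because the import happens in a child process, a segfault there shows up as a False result instead of taking down your own session.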

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Founder CodeProject
Canada
Chris Maunder is the co-founder of CodeProject and ContentLab.com, and has been a prominent figure in the software development community for nearly 30 years. Hailing from Australia, Chris has a background in Mathematics, Astrophysics, Environmental Engineering and Defence Research. His programming endeavours span everything from FORTRAN on Super Computers and C++/MFC on Windows through to high-load .NET web applications and Python AI applications on everything from macOS to a Raspberry Pi. Chris is a full-stack developer who is as comfortable with SQL as he is with CSS.

In the late 1990s, he and his business partner David Cunningham recognized the need for a platform that would facilitate knowledge-sharing among developers, leading to the establishment of CodeProject.com in 1999. Chris's expertise in programming and his passion for fostering a collaborative environment have played a pivotal role in the success of CodeProject.com. Over the years, the website has grown into a vibrant community where programmers worldwide can connect, exchange ideas, and find solutions to coding challenges. Chris is a prolific contributor to the developer community through his articles and tutorials, and his latest passion project, CodeProject.AI.

In addition to his work with CodeProject.com, Chris co-founded ContentLab and DeveloperMedia, two projects focussed on helping companies make their Software Projects a success. Chris's roles included Product Development, Content Creation, Client Satisfaction and Systems Automation.
