AI Form Recognition in Java, Part 2: Adding Image Upload to the Spring Boot App and Processing via Form Recognizer

Dawid Borycki


Apr 19, 2022

CPOL



In this article, we’ll implement file upload functionality and connect it with Azure Form Recognizer.


Synopsis

This three-article series demonstrates how to use Azure Form Recognizer to build a realistic end-to-end form recognition app using Java and Spring Boot.

Part 1 — App Creation and Deployment via Azure App Service
Part 2 — Adding Image Upload to the Spring Boot App and Processing via Form Recognizer
Part 3 — Making Practical Use of Data Returned by Form Recognizer

The complete code repository is available on GitHub.

Introduction

We created our project and infrastructure in the first article in this series.

Here in the second article, we’ll modify the app to handle image uploads. In our case, the images contain receipts that will be sent to the Azure Form Recognizer for processing. We’ll then store three of the recognized fields (MerchantName, TransactionDate, and Total) in our PostgreSQL database, which we prepared in Part 1.

After completing the steps in this article, our app will be able to recognize receipts and output the selected fields to the terminal.

Our application will be capable of applying machine learning to solve a real-world business problem. Usually, business travelers need to provide scans of their receipts to receive reimbursement. Often, these scans are not processed automatically. Instead, accountants need to enter the data into another system. We shorten this process by automating receipt recognition and data ingestion. Let’s continue our app development to achieve this.

Setting up the Form Recognizer

First, we need to provision Form Recognizer in Azure. You can do this using the Azure CLI or the Azure Portal; we’ll use the Azure Portal. Use the search box to look up Form Recognizer, and then click the Create form recognizer button in the center of the Applied AI Services view. This opens the Create Form Recognizer wizard.

In the Create Form Recognizer form, follow these steps:

Select your subscription. We use DB-General.
Select your resource group. We use recognition.
Choose an Azure Region for your instance. We set our region to East US.
Enter the globally unique name you set for your app when deploying to the Azure App Service in Part 1. We set our app name to db-receipt-recognizer-82.
Select a pricing tier. We set our tier to Free F0.

The Free F0 pricing tier lets us use Azure Form Recognizer at no cost, limited to 500 pages per month, which is more than enough for our proof of concept. When setting up Form Recognizer, make sure it is accessible from all networks.

After the service is provisioned, navigate to Keys and Endpoint under Resource Management, and copy the endpoint and one of the two keys.

Then, open application.properties and add the last two lines of the following example, replacing the placeholders with the values you copied:

logging.level.org.springframework.jdbc.core=DEBUG
 
spring.datasource.url=<YOUR_POSTGRES_URL>
spring.datasource.username=<YOUR_USERNAME>
spring.datasource.password=<YOUR_PASSWORD>
 
spring.jpa.hibernate.ddl-auto=update 
server.port=80
 
azure.form.recognizer.key=<YOUR_FORM_RECOGNIZER_KEY>
azure.form.recognizer.endpoint=<YOUR_FORM_RECOGNIZER_ENDPOINT>

This stores your endpoint and one of your two keys.

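Finally, update your pom.xml file. Import azure-sdk-bom in the dependencyManagement section (create the section if your pom.xml does not have one; Maven supports the import scope only there), and add the azure-ai-formrecognizer package to the dependencies section:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.azure</groupId>
            <artifactId>azure-sdk-bom</artifactId>
            <version>1.1.1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- existing dependencies -->
    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-ai-formrecognizer</artifactId>
        <version>3.1.8</version>
    </dependency>
</dependencies>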

Image Upload View

Let’s now modify the upload.html view.

In resources/templates/upload.html, add the code between the <!-- insert code starting here --> and <!-- end of code to insert --> comments in the following example. It creates a form that lets users upload their images:

<body class="w3-black">
    
    <header class="w3-container w3-padding-32 w3-center w3-black">
        <h1 class="w3-jumbo">Form recognizer</h1>
        <p>Upload a new file</p>
    </header>
<!-- insert code starting here -->
    <form class="w3-container w3-padding-32"
          method="POST"
          enctype="multipart/form-data"
          action="/upload"
          style="margin:auto;width:40%">

        <div class="w3-container">
            <input type="file" name="file" />
        </div>
        <div class="w3-container">
            <input type="submit" value="Upload"/>
        </div>
    </form>
<!-- end of code to insert -->
</body>
</html>

After rendering, the upload view shows a file picker and an Upload button.

Here, we have only one form element, which allows us to upload an image. In a real-world scenario, we might supplement the form with elements like a field for the user’s name, and add validation that limits the image size and format.
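The validation mentioned above could start with a server-side check like the following. This is a minimal sketch, not part of the companion code: the class name UploadValidator, the accepted MIME types, and the 4 MB size cap are all assumptions (check the Form Recognizer service limits for your pricing tier).

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical upload checks; the allowed types and size cap are assumptions. */
public class UploadValidator {

    // Assumed whitelist of image MIME types; adjust to the formats you accept.
    private static final Set<String> ALLOWED_TYPES =
        Set.of("image/jpeg", "image/png", "image/bmp", "image/tiff");

    // Assumed 4 MB cap; check the Form Recognizer limits for your pricing tier.
    public static final long MAX_BYTES = 4L * 1024 * 1024;

    /** Returns an error message, or Optional.empty() if the upload looks acceptable. */
    public static Optional<String> validate(String contentType, long sizeInBytes) {
        if (sizeInBytes <= 0) {
            return Optional.of("The file is empty.");
        }
        if (sizeInBytes > MAX_BYTES) {
            return Optional.of("The file exceeds the " + MAX_BYTES + "-byte limit.");
        }
        // The browser declares the content type, so this is a first filter, not proof of format
        if (contentType == null || !ALLOWED_TYPES.contains(contentType.toLowerCase(Locale.ROOT))) {
            return Optional.of("Unsupported content type: " + contentType);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate("image/jpeg", 120_000).orElse("OK"));      // OK
        System.out.println(validate("application/pdf", 120_000).orElse("OK")); // Unsupported content type: application/pdf
        System.out.println(validate("image/png", 10_000_000).orElse("OK"));    // The file exceeds the 4194304-byte limit.
    }
}
```

In handleFileUpload, you could call UploadValidator.validate(file.getContentType(), file.getSize()) before creating the Form Recognizer client, and skip recognition when a message is returned.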

Handling Image Upload

Next, we need to implement the logic that handles the image upload on the controller side. The controller authenticates with the Azure Form Recognizer service and sends the uploaded image to it for recognition.

Rather than hard-coding the Azure Form Recognizer key and endpoint, we’ll read them from application.properties. To obtain these values at runtime, we’ll add two fields to FormRecognitionController. You can refer to the companion code to see the full class.

In the FormRecognitionController class, add the following code:

// Requires: import org.springframework.beans.factory.annotation.Value;
@Value("${azure.form.recognizer.key}")
private String key;

@Value("${azure.form.recognizer.endpoint}")
private String endpoint;

These two fields use the @Value annotation to inject the corresponding values from application.properties. Note that we only need to pass the property name, wrapped in a ${...} placeholder, as the argument of the @Value annotation.

Then, we need to implement the controller method handleFileUpload, which Spring invokes whenever the user submits the image upload form (a POST request to /upload).

Add the following code to FormRecognitionController:

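// Add these imports at the top of FormRecognitionController.java
import com.azure.ai.formrecognizer.FormRecognizerClient;
import com.azure.ai.formrecognizer.FormRecognizerClientBuilder;
import com.azure.ai.formrecognizer.models.FormRecognizerOperationResult;
import com.azure.ai.formrecognizer.models.RecognizedForm;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.multipart.MultipartFile;

@PostMapping("/upload")
public String handleFileUpload(@RequestParam("file") MultipartFile file) {

    // Create FormRecognizerClient
    FormRecognizerClient formRecognizerClient = new FormRecognizerClientBuilder()
        .credential(new AzureKeyCredential(key))
        .endpoint(endpoint)
        .buildClient();

    try (InputStream receiptImage = file.getInputStream()) {
        // Start the long-running receipt recognition operation
        SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller =
            formRecognizerClient.beginRecognizeReceipts(receiptImage, file.getSize());

        // Block until the service finishes, then get the recognized forms
        List<RecognizedForm> recognizedForms = syncPoller.getFinalResult();

        // Check if we have at least one form
        if (!recognizedForms.isEmpty()) {
            // Get recognized form
            final RecognizedForm recognizedForm = recognizedForms.get(0);

            // Extract fields
            RecognitionResult recognitionResult = ExtractFormFields(file, recognizedForm);

            // Store result
            resultsRepository.save(recognitionResult);

            // Debug output
            System.out.println("\n\n--== Recognition result ==--\n\n"
                + recognitionResult.toString());
        }

    } catch (IOException e) {
        e.printStackTrace();
    }

    return "index";
}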

First, the method creates an instance of FormRecognizerClient using FormRecognizerClientBuilder. The builder requires our Azure Form Recognizer credentials, which are already available in the key and endpoint fields.

Then, the handleFileUpload method starts receipt recognition by calling beginRecognizeReceipts on the FormRecognizerClient instance. The beginRecognizeReceipts method takes two arguments: the input stream containing the uploaded image, and the stream length (the file size). It starts a long-running operation and returns a SyncPoller; calling getFinalResult on the poller blocks until recognition completes.

On the back end, beginRecognizeReceipts sends our image to Azure Form Recognizer for processing. The service uses its prebuilt, pre-trained receipt model, which recognizes specific elements of receipt images. In this case, we pass only one image, but the result is a list because a single input document (for example, a multi-page PDF) can contain more than one receipt. When the operation completes, we need to retrieve and interpret the recognition results.
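The "begin, then wait for the final result" flow above follows the long-running operation pattern: start the operation, then check its status until it is done. The following conceptual sketch shows that poll-until-done loop in plain Java against a simulated operation. The names PollingSketch and pollUntilDone are hypothetical, and this is not how the Azure SDK implements SyncPoller; it only illustrates what getFinalResult waits for.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Conceptual sketch of "start an operation, then poll until it is done".
 * Hypothetical names; this is not how the Azure SDK implements SyncPoller.
 */
public class PollingSketch {

    /**
     * Calls checkStatus until it returns a result (Optional.of(...)),
     * waiting `interval` between attempts, for at most maxAttempts checks.
     */
    public static <T> T pollUntilDone(Supplier<Optional<T>> checkStatus,
                                      Duration interval, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = checkStatus.get();    // e.g. ask the service for the operation status
            if (result.isPresent()) {
                return result.get();                    // comparable to SyncPoller.getFinalResult()
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(interval.toMillis());  // wait before asking again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Polling was interrupted", e);
                }
            }
        }
        throw new IllegalStateException("Operation not finished after " + maxAttempts + " checks");
    }

    public static void main(String[] args) {
        // Simulated operation that reports completion on the third status check
        int[] checks = {0};
        String result = pollUntilDone(
            () -> ++checks[0] < 3 ? Optional.<String>empty() : Optional.of("recognized receipt"),
            Duration.ofMillis(10), 5);
        System.out.println(result + " after " + checks[0] + " checks"); // recognized receipt after 3 checks
    }
}
```

Because getFinalResult blocks in the same way, the upload request in handleFileUpload does not return until the service has finished analyzing the receipt.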

Retrieving Recognition Results

The getFinalResult method returns a list of RecognizedForm instances. The RecognizedForm class has several properties, but we are mostly interested in its fields member:

private final Map<String, FormField> fields;

This member stores the names of recognized elements and their values as instances of the FormField class. The FormField class has several members used to interpret the recognized element of our receipt:

public final class FormField {

    private final float confidence;
    private final FieldData labelData;
    private final String name;
    private final FieldValue value;
    private final FieldData valueData;
 
    // Constructor and getters
}

As we can see, we can retrieve the field name, the value, and even the prediction confidence. The confidence is particularly important if a low-quality receipt image causes poor or unreliable predictions. In a case like this, we might want to reject recognitions with confidence below a certain threshold and ask the submitter to re-upload the image.
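A threshold check like the one described above might look like this. It is a hypothetical helper, not part of the SDK or the companion code: it works on a plain map of field names to confidence scores, which you could fill from recognizedForm.getFields() using each FormField’s getConfidence(). The 0.8 threshold and the sample scores are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper: finds recognized fields whose confidence is below a threshold. */
public class ConfidenceCheck {

    /** Returns the names (sorted) of fields whose confidence is below the threshold. */
    public static List<String> lowConfidenceFields(Map<String, Float> confidences, float threshold) {
        return confidences.entrySet().stream()
            .filter(entry -> entry.getValue() < threshold)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample scores; in the app they would come from FormField.getConfidence()
        Map<String, Float> confidences = new TreeMap<>();
        confidences.put("MerchantName", 0.97f);
        confidences.put("TransactionDate", 0.62f);
        confidences.put("Total", 0.88f);

        float threshold = 0.8f; // assumed threshold; tune it for your receipts
        List<String> weakFields = lowConfidenceFields(confidences, threshold);
        if (!weakFields.isEmpty()) {
            // Prints: Please re-upload the image; low confidence for [TransactionDate]
            System.out.println("Please re-upload the image; low confidence for " + weakFields);
        }
    }
}
```

Returning the list of weak fields, rather than a single yes/no flag, lets the upload view tell the user which values could not be read reliably.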

To extract the actual values of the recognized fields, you can proceed in one of two ways.

The first approach is to use a strongly typed Receipt class. This class is not part of the SDK itself; we add it to the project as Receipt.java. Its constructor takes an instance of RecognizedForm and creates an object whose properties map to the corresponding elements in the receipt image:

public final class Receipt {
 
    /**
     * List of recognized field items.
     */
    private List<ReceiptItem> receiptItems;
 
    /**
     * Recognized receipt type information.
     */
    private ReceiptType receiptType;
 
    /**
     * Recognized field merchant name.
     */
    private TypedFormField<String> merchantName;
 
    // …
}
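To see why the strongly typed approach is convenient, here is a simplified, hypothetical illustration of the pattern behind a class like Receipt.java, written in plain Java so it runs without the SDK. The names TypedReceiptSketch and TypedField are invented, and plain Objects stand in for the SDK’s FieldValue. The loosely typed values are type-checked once, in the constructor, and then exposed as typed properties; in this sketch, a field that was not recognized simply stays null.

```java
import java.time.LocalDate;
import java.util.Map;

/**
 * Hypothetical, simplified illustration of the strongly typed receipt pattern.
 * Plain Objects stand in for the SDK's FieldValue; names are invented for this sketch.
 */
public class TypedReceiptSketch {

    /** A typed value plus its confidence, similar in spirit to TypedFormField<T>. */
    public static final class TypedField<T> {
        private final T value;
        private final float confidence;

        public TypedField(T value, float confidence) {
            this.value = value;
            this.confidence = confidence;
        }

        public T getValue() { return value; }
        public float getConfidence() { return confidence; }
    }

    // In this sketch, a field stays null when it was not recognized (or had an unexpected type)
    private TypedField<String> merchantName;
    private TypedField<LocalDate> transactionDate;

    /** Converts loosely typed values into typed properties once, checking each value's type. */
    public TypedReceiptSketch(Map<String, Object> values, Map<String, Float> confidences) {
        Object merchant = values.get("MerchantName");
        if (merchant instanceof String) {
            merchantName = new TypedField<>((String) merchant,
                confidences.getOrDefault("MerchantName", 0f));
        }
        Object date = values.get("TransactionDate");
        if (date instanceof LocalDate) {
            transactionDate = new TypedField<>((LocalDate) date,
                confidences.getOrDefault("TransactionDate", 0f));
        }
    }

    public TypedField<String> getMerchantName() { return merchantName; }
    public TypedField<LocalDate> getTransactionDate() { return transactionDate; }

    public static void main(String[] args) {
        TypedReceiptSketch receipt = new TypedReceiptSketch(
            Map.of("MerchantName", "Contoso", "TransactionDate", LocalDate.of(2022, 4, 19)),
            Map.of("MerchantName", 0.95f));
        System.out.println(receipt.getMerchantName().getValue());         // Contoso
        System.out.println(receipt.getTransactionDate().getValue());      // 2022-04-19
        System.out.println(receipt.getTransactionDate().getConfidence()); // 0.0 (no score supplied)
    }
}
```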

The other approach is to parse the form fields manually:

// Retrieve total                
Map<String, FormField> recognizedFields = recognizedForm.getFields();
FormField totalField = recognizedFields.get("Total");
if (totalField != null) {
    if (FieldValueType.FLOAT == totalField.getValue().getValueType()) { 
        recognitionResult.setTotal(totalField.getValue().asFloat());
    }
}

For this tutorial, we’ll combine both approaches in the ExtractFormFields helper method, which handleFileUpload calls. To do this, we need to add Receipt.java to the project. We use this class to extract the MerchantName and TransactionDate fields, and we parse the form fields manually to extract the Total field.

The ExtractFormFields method looks like this:

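// Additional imports for this method
import com.azure.ai.formrecognizer.models.FieldValueType;
import com.azure.ai.formrecognizer.models.FormField;
import java.util.Map;

private RecognitionResult ExtractFormFields(MultipartFile file,
    final RecognizedForm recognizedForm) {

    RecognitionResult recognitionResult = new RecognitionResult();

    // Receipt.java wraps the recognized form in strongly typed properties
    Receipt receipt = new Receipt(recognizedForm);

    // Set receipt file name based on the upload image name
    recognitionResult.setReceiptFileName(file.getOriginalFilename());

    // Get merchant name and transaction date; a getter may return null
    // when the service did not recognize that field on the receipt
    if (receipt.getMerchantName() != null) {
        recognitionResult.setMerchantName(receipt.getMerchantName().getValue());
    }
    if (receipt.getTransactionDate() != null) {
        recognitionResult.setTransactionDate(receipt.getTransactionDate().getValue());
    }

    // Retrieve total by parsing the form fields manually
    Map<String, FormField> recognizedFields = recognizedForm.getFields();
    FormField totalField = recognizedFields.get("Total");
    if (totalField != null && totalField.getValue() != null
            && FieldValueType.FLOAT == totalField.getValue().getValueType()) {
        recognitionResult.setTotal(totalField.getValue().asFloat());
    }

    return recognitionResult;
}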

The ExtractFormFields helper method returns an instance of the RecognitionResult class, which we implemented in Part 1 of this tutorial. Once we have an instance of RecognitionResult, handleFileUpload stores it in the database with the following line, which is already part of the method shown earlier (see also FormRecognitionController in the companion code):

resultsRepository.save(recognitionResult);

To test the solution, compile your app using mvn clean install. Then, run your app using mvn spring-boot:run.

Next, go to the upload view and upload a receipt image. We used the example image in the images/ folder of the companion GitHub repository. The recognition result appears in the console output (your terminal, or your IDE’s output window if you run the app from the IDE).
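If you prefer to test the endpoint from code instead of the browser, the following sketch posts an image to /upload as multipart/form-data, the same encoding the upload form uses, with a single part named file. It assumes Java 11 or later and that the app runs locally on port 80 (server.port=80 in application.properties). The class name UploadClient is hypothetical, and the image path is passed as a command-line argument.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical test client that posts a receipt image to the /upload endpoint. */
public class UploadClient {

    /** Builds a multipart/form-data body with one part named "file", as the upload form sends. */
    public static byte[] multipartBody(String boundary, String fileName,
                                       String contentType, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String partHeader = "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
            + "Content-Type: " + contentType + "\r\n\r\n";
        out.writeBytes(partHeader.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        // Closing delimiter: CRLF, "--", boundary, "--"
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java UploadClient.java <path-to-receipt-image>");
            return;
        }
        Path image = Path.of(args[0]);
        String contentType = Files.probeContentType(image); // may return null
        if (contentType == null) {
            contentType = "application/octet-stream";
        }
        String boundary = "----boundary" + UUID.randomUUID().toString().replace("-", "");
        byte[] body = multipartBody(boundary, image.getFileName().toString(),
                                    contentType, Files.readAllBytes(image));

        // Assumes the app runs locally on port 80 (server.port=80)
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:80/upload"))
            .header("Content-Type", "multipart/form-data; boundary=" + boundary)
            .POST(HttpRequest.BodyPublishers.ofByteArray(body))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP status: " + response.statusCode());
    }
}
```

With the app running, you could launch it as a single-file program, for example java UploadClient.java path/to/receipt.jpg, and then check the app’s console for the recognition result.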

Summary

In this part of the tutorial, we learned how to extend the Spring Boot app with image uploads. We added the form and the controller method this functionality needs.

Additionally, we used two different approaches to extract fields from the receipt images, which were sent for recognition to an instance of Azure Form Recognizer. The recognition results were stored in the PostgreSQL database deployed to Azure.

In the final part of this tutorial, we will display the recognition results and put them to practical use.

To learn more about the easiest ways to deliver Java code to Azure and other clouds, check out the webinar Azure webinar series - Delivering Java to the Cloud with Azure and GitHub.