PDF Extraction Tool (PET)

3 Dec 2015 | CPOL | 2 min read
PET is a tool that extracts selected information from PDF documents. We developed it to extract an individual's details from E-Aadhar (a unique identity document issued by the Government of India). Users can modify the code to suit other PDF documents.

Introduction

PET is a Java application and runs on any platform that supports the Java Runtime Environment (JRE). It extracts text from password-protected PDF documents and can store the required information in a database. We developed it to extract an individual's information from E-Aadhar and store it in a Microsoft Access database.

Therefore, the main objectives of PET are as follows:

  • Extraction of text and images: PET extracts text and images from password-protected PDF documents.
  • Storing data in a database: PET stores the extracted data in a Microsoft Access database, which is created automatically when the user clicks the Save button.
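
The automatic creation on save can be pictured with a small sketch. It is not taken from the PET source: the class name `DatabaseLocation`, the method `prepare` and the file name used in the usage note are hypothetical. It only prepares the folder and computes the path the database file would have. Creating the Access file itself needs a database library and is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch (hypothetical helper): prepare the folder that will hold the database on first save. */
final class DatabaseLocation {

    private DatabaseLocation() { }

    /**
     * Creates the folder if it is missing and returns the path the database
     * file would have inside it. The database file itself is NOT created here.
     */
    static Path prepare(Path folder, String fileName) throws IOException {
        // createDirectories is a no-op when the folder already exists,
        // so calling this on every save is safe.
        Files.createDirectories(folder);
        return folder.resolve(fileName);
    }
}
```

A call such as `DatabaseLocation.prepare(Paths.get("C:\\Aadhardata"), "aadhar.accdb")` would return the target path, where the file name `aadhar.accdb` is an assumption made for illustration.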

Overview of PET

PET lets the user extract the desired information and images from PDF documents (with or without a password) and store them in a database. The user can also retrieve that information from the database later. We developed it to extract an individual's information from E-Aadhar (which is password protected) and store it in an Access database.
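
Once the text has been extracted from the PDF, picking out individual fields is largely a pattern-matching task. The sketch below is one possible approach, not the PET implementation. It assumes the Aadhar number appears as 12 digits (optionally grouped 4-4-4 with spaces) and the PIN code as a standalone 6-digit number. The class name `AadharTextFields`, both method names and that text layout are assumptions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch (hypothetical helper): pull fields out of text already extracted from a PDF. */
final class AadharTextFields {

    // Assumption: 12 digits, optionally written as three groups of four separated by a space.
    private static final Pattern AADHAR_NO =
            Pattern.compile("(?<!\\d)(\\d{4}) ?(\\d{4}) ?(\\d{4})(?!\\d)");

    // Assumption: a 6-digit PIN code that is not part of a longer run of digits.
    private static final Pattern PIN_CODE =
            Pattern.compile("(?<!\\d)[1-9]\\d{5}(?!\\d)");

    private AadharTextFields() { }

    /** Returns the first Aadhar number found, with any spaces removed. */
    static Optional<String> aadharNumber(String text) {
        Matcher m = AADHAR_NO.matcher(text);
        return m.find() ? Optional.of(m.group(1) + m.group(2) + m.group(3)) : Optional.empty();
    }

    /** Returns the first standalone 6-digit PIN code found. */
    static Optional<String> pinCode(String text) {
        Matcher m = PIN_CODE.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }
}
```

Returning `Optional` keeps the "field not found" case explicit, which matters when a document does not follow the assumed layout.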

Operations of PET

When the user starts the application after installing it, the following interface appears:

Image 1

Fig. i. Main interface of application

The main operations of PET involve the following steps:

  1. Selecting the PDF file: When the user clicks the File button, a dialog box opens in which the user selects the file (Fig. ii).

    Image 2

    Fig. ii. Selecting a PDF file to open

    If the file is password protected, the user must provide the password. In our case, the password is the user's PIN code (Fig. iii).

    Image 3

    Fig. iii. Entering credentials
  2. Digitization of information from the PDF: Once the file is loaded, the application extracts the desired information and shows it in a new window.

    In our case, we extract an individual's information from E-Aadhar, as shown in Fig. iv.

    Image 4

    Fig. iv. Displaying extracted information from the PDF
  3. Storing the information in the database: To save the data, the user clicks the Save button. A database is then created in the data folder (Aadhar Data in our case), as shown in Fig. v.

    Image 5

    Fig. v. Location where the database is created (C:\Aadhardata)
  4. Retrieving the data from the database: To retrieve information, the user enters the unique key value; we use the user's Aadhar number as the key. The user inputs the value (Fig. vi) and the matching data is fetched from the database (Fig. vii).

    Image 6

    Fig. vi
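
Steps 3 and 4 amount to saving a row under a unique key and looking it up again by that key. The sketch below shows the idea with an in-memory map standing in for the Access table, so it is not the storage code PET uses. The class `AadharRecordStore`, the `Person` fields and the method names are hypothetical, since the article does not list the stored columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch: an in-memory stand-in for the database table, keyed by Aadhar number. */
final class AadharRecordStore {

    /** Hypothetical record shape; the real columns are not given in the article. */
    static final class Person {
        final String aadharNumber;
        final String name;
        final String pinCode;

        Person(String aadharNumber, String name, String pinCode) {
            this.aadharNumber = aadharNumber;
            this.name = name;
            this.pinCode = pinCode;
        }
    }

    private final Map<String, Person> rowsByAadharNumber = new LinkedHashMap<>();

    /** Step 3: saves the record, overwriting any earlier row with the same key. */
    void save(Person person) {
        rowsByAadharNumber.put(normalize(person.aadharNumber), person);
    }

    /** Step 4: fetches the record for the key value the user typed in, if any. */
    Optional<Person> findByAadharNumber(String key) {
        return Optional.ofNullable(rowsByAadharNumber.get(normalize(key)));
    }

    // Strip whitespace so "1234 5678 9012" and "123456789012" refer to the same row.
    private static String normalize(String aadharNumber) {
        return aadharNumber.replaceAll("\\s+", "");
    }
}
```

In a database-backed version the same two operations would typically become an insert and a keyed select; normalizing the key before both keeps lookups independent of how the user spaces the digits.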

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Technical Writer KNIT, Sultanpur
India
Dr. Neelendra Badal is an Assistant Professor in the Department of Computer Science & Engineering at Kamla Nehru Institute of Technology (KNIT), Sultanpur (U.P.), India. He received his B.E. (1997) in Computer Science & Engineering from Bundelkhand Institute of Technology (BIET), Jhansi, his M.E. (2001) in Communication, Control and Networking from Madhav Institute of Technology and Science (MITS), Gwalior, and his PhD (2009) in Computer Science & Engineering from Motilal Nehru National Institute of Technology (MNNIT), Allahabad. He is a Chartered Engineer (CE) of the Institution of Engineers (IE), India, and a Life Member of IE, IETE, ISTE and CSI-India. He has published about 30 papers in international and national journals, conferences and seminars. His research interests are distributed systems, parallel processing, GIS, data warehousing and data mining, software engineering and networking.

Written By
Student
India
I am a student of Computer Science and Engineering at Kamla Nehru Institute of Technology, Sultanpur, U.P., India.

Written By
Systems Engineer Tata Consultancy Services
India
I am a Software Engineer specializing in the ability to grasp new technologies quickly and implement them to create applications that are used by people and organizations to achieve their goals. I have a reputation for challenging the prevalent demanding methods of development and coming up with better and simpler solutions to achieve the aim.
