Hi all,
I am starting my career with this post. My task is to count the number of words in a PDF file when the file is uploaded. Can anyone help me with how to do it? I hope someone can help me complete the first task of my career.


Thanks,
Neelam.

Dear Neelam,

Try a Google search on this first: google.com

The solution to your problem is in this link:

http://stackoverflow.com/questions/6734374/get-only-word-count-from-pdf-document

Hope this helps.

Thanks
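Hi neelamrathod,
Welcome to CodeProject. Here is a Windows Forms example that uses iTextSharp to extract the text of a PDF and count its words: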

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text.pdf.parser;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //Expects a file named Input.pdf on the current user's desktop
            string inputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Input.pdf");

            //Get all the text
            string text = ExtractAllTextFromPdf(inputFile);
            //Count the words
            int wordCount = GetWordCountFromString(text);

            //Show the result
            MessageBox.Show("Word count: " + wordCount);
        }

        public static string ExtractAllTextFromPdf(string inputFile)
        {
            //Sanity checks
            if (string.IsNullOrEmpty(inputFile))
                throw new ArgumentNullException("inputFile");
            if (!System.IO.File.Exists(inputFile))
                throw new System.IO.FileNotFoundException("Cannot find inputFile", inputFile);

            //Open the file ourselves (not necessary but I like to control locks and permissions)
            using (FileStream stream = new FileStream(inputFile, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                //Create a reader to read the PDF
                iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(stream);
                try
                {
                    //Create a buffer to store text
                    StringBuilder buf = new StringBuilder();

                    //Use the PdfTextExtractor to get all of the text on a page-by-page basis
                    for (int i = 1; i <= reader.NumberOfPages; i++)
                    {
                        buf.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
                    }

                    return buf.ToString();
                }
                finally
                {
                    //Release the reader's resources even if extraction fails
                    reader.Close();
                }
            }
        }
        public static int GetWordCountFromString(string text)
        {
            //Sanity check
            if (string.IsNullOrEmpty(text))
                return 0;

            //Count the words
            return System.Text.RegularExpressions.Regex.Matches(text, "\\S+").Count;
        }
    }
}
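
One note on what GetWordCountFromString counts: the pattern \S+ matches every run of non-whitespace characters, so a stand-alone dash or bullet that comes out of the PDF text is counted as a word too. If that matters for your task, a stricter count is possible. The sketch below is only an illustration of the difference; the class WordCountSketch and both method names are hypothetical and are not part of iTextSharp or of the code above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class WordCountSketch
{
    // Same rule as GetWordCountFromString: every run of
    // non-whitespace characters counts as one word.
    public static int CountTokens(string text)
    {
        if (string.IsNullOrEmpty(text))
            return 0;
        return Regex.Matches(text, @"\S+").Count;
    }

    // Stricter variant (an assumption about what "word" should mean):
    // a run of non-whitespace characters counts only if it contains
    // at least one letter or digit.
    public static int CountWords(string text)
    {
        if (string.IsNullOrEmpty(text))
            return 0;
        return Regex.Matches(text, @"\S*[\p{L}\p{N}]\S*").Count;
    }
}
```

For the input "Hello - world !", CountTokens returns 4 and CountWords returns 2. Which rule is right depends on how the word count will be used, so treat the second method as a starting point.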

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)