|
I am trying to find a way to read the text out of a PDF file so that I can build a search facility. A client has asked me to quote on this requirement. My application is written in C#.
I am having a lot of difficulty finding any examples, or even .NET components that I could purchase, to do this task.
This application will be running on a server with very limited permissions. I have no ability to install standard COM components.
Hope someone can help.
Enjoy
Craig
|
|
|
|
|
PDFs weren't really made for people to be able to rip text contents. However, I did see something for .NET and PDFs that I "think" can read them. I'm not sure, as I never tested it; I just remember the link.
http://sourceforge.net/projects/pdflibrary/
The project is now defunct, but maybe you can use some of the code.
Hope this helps
Matthew Hazlett
Windows 2000/2003 MCSE
Never got an MCSD, go figure...
|
|
|
|
|
Thanks for your help.
I looked into that product, which was abandoned at a very early stage. There is also itextsharp.sourceforge.net, which seems to be extensive in its generation ability but specifically points out the same as you have: that PDFs aren't made for ripping text.
So the solution I have found that works for me is that Adobe has a filter for Microsoft's indexing server that allows for searching through PDF files. See http://www.adobe.com/support/salesdocs/1043a.htm for more information. I can use Chris Maunder's article on using indexing server as a search facility to code the rest.
Enjoy
Craig
|
|
|
|
|
Actually, PDFs weren't made to stop "ripping of text", but to solve a common problem: to provide a standard format for delivering rich content on the web (among other media).
You can get the text because the text is available in PDFs unless the page is one giant image (which is rare).
There are many ways to get text. One easy way is to install Adobe PDF IFilter 5.0. IFilter is an interface that COM servers implement to facilitate searching of text. Office installs its own implementations, and Windows 2000 and higher have default IFilter implementations for searching text files, HTML documents, and several other common formats.
While this would be easiest to use in C++, you can P/Invoke the necessary APIs and redeclare the interfaces so that you can use them in C#.
There is an example that gets the IFilter for a .doc file (the system provides the right implementation, so you could easily replace the .doc filename with a .pdf filename) here: http://sqljunkies.com/weblog/acencini/posts/716.aspx.
Microsoft MVP, Visual C#
My Articles
|
|
|
|
|
|
I have a library, FPLIB.DLL, which exports the following function:
DLLEXPORT DWORD WINAPI FPMGetImage(BYTE* buffer);
How do I declare it in C#? What is BYTE*?
[DllImport("fplib.dll", EntryPoint="FPMGetImage",
SetLastError=true,
CharSet=CharSet.Ansi,
ExactSpelling=true,
CallingConvention=CallingConvention.StdCall)]
public static extern int FPMGetImage( buffer);
Thanks for any help!
Alexsander "Axia" Antunes
|
|
|
|
|
byte* is a pointer to a byte.
Code that uses pointers needs to be compiled in an unsafe context.
You can try using ref in the declaration to get around the pointer.
Matthew Hazlett
|
|
|
|
|
Do you have an example of a Windows API that uses BYTE*?
I use API Viewer 2003, so I could compare the syntax.
Alexsander "Axia" Antunes
|
|
|
|
|
I asked something like this the other day. Here's what Heath Stewart told me:
>Instead of passing byte* as the parameter, declare your parameter using either ref or out
>for value types (like a Byte). This is the most common method.
>
>For instance, if the C function is declared like so:
>void SomeFunc(byte* b);
>...declare your method like so:
>[DllImport("...")]
>private static extern void SomeFunc(ref byte b);
Matthew Hazlett
|
|
|
|
|
OK!
[DllImport("fplib.dll", EntryPoint="FPMGetImage",
SetLastError=true,
CharSet=CharSet.Ansi,
ExactSpelling=true,
CallingConvention=CallingConvention.StdCall)]
public static extern int FPMGetImage([In, Out] byte[] buffer);
This code runs perfectly! THANKS TO ALL!!
Alexsander "Axia" Antunes
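For anyone who wants a real Windows API to compare against, here is a minimal sketch, assuming Windows and user32.dll. GetKeyboardState is declared in C as BOOL GetKeyboardState(PBYTE lpKeyState), where PBYTE is BYTE*, and it fills a caller-supplied 256-byte buffer. The class and method names (KeyboardStateDemo, ReadKeyboardState) are made up for illustration. It also suggests why byte[] fits a buffer parameter: ref byte marshals a pointer to a single byte, while byte[] passes a pointer to the whole array.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class KeyboardStateDemo
{
    // Win32 prototype: BOOL GetKeyboardState(PBYTE lpKeyState);
    // PBYTE is BYTE*. The function writes 256 bytes, one per virtual key.
    [DllImport("user32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetKeyboardState([In, Out] byte[] lpKeyState);

    // Hypothetical helper: allocates the buffer the API expects and fills it.
    public static byte[] ReadKeyboardState()
    {
        byte[] state = new byte[256];
        if (!GetKeyboardState(state))
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
        return state;
    }

    public static void Main()
    {
        byte[] state = ReadKeyboardState();

        // VK_SHIFT is 0x10; the high-order bit is set while the key is down.
        const int VK_SHIFT = 0x10;
        bool shiftDown = (state[VK_SHIFT] & 0x80) != 0;

        Console.WriteLine("Buffer length: {0}, Shift down: {1}", state.Length, shiftDown);
    }
}
```

The caller must allocate the array before the call. For FPMGetImage the required buffer size is not stated in this thread, so check the library's documentation before choosing one.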
|
|
|
|
|
I have a generic "value" object as a property within a class which can then be later filled from either columns from a DataReader (retrieving) or from a text box, scrollbar or other input control (restoring).
Later on the data is compared with another property which is defined on class construction, a System.Type. So this is basically acting as a datatype constraint: I intend for it to produce an error if the type of "value" does not match the constraint.
Now here's the problem, I can't find a way to create a dynamic type-cast, or a comparison with "is"/"as". Any ideas?
|
|
|
|
|
For dynamic casting, use Convert.ChangeType(value, targetType), where targetType is the System.Type you want to convert to (for example, the constraint type your class holds, or a similar expression). Note that this works only for values that implement IConvertible. I hope this is what you are looking for.
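A minimal sketch of both halves of the question, assuming the constraint is held as a System.Type. Type.IsInstanceOfType performs the check that "is" cannot do with a type known only at run time, and Convert.ChangeType performs the conversion. The TypeConstraint class and its method names are hypothetical, not from the original poster's code.

```csharp
using System;

public static class TypeConstraint
{
    // True when value is an instance of the constraint type (or a subtype).
    // IsInstanceOfType returns false for null.
    public static bool Matches(object value, Type constraint)
    {
        return constraint.IsInstanceOfType(value);
    }

    // Returns value unchanged if it already matches; otherwise tries a
    // conversion. Convert.ChangeType throws (InvalidCastException,
    // FormatException or OverflowException) when the conversion is not possible.
    public static object Coerce(object value, Type constraint)
    {
        if (constraint.IsInstanceOfType(value))
        {
            return value;
        }
        return Convert.ChangeType(value, constraint);
    }
}

public static class TypeConstraintDemo
{
    public static void Main()
    {
        Console.WriteLine(TypeConstraint.Matches(42, typeof(int)));    // True
        Console.WriteLine(TypeConstraint.Matches("42", typeof(int)));  // False

        // Text from a text box can be converted to the constrained type.
        object coerced = TypeConstraint.Coerce("42", typeof(int));
        Console.WriteLine(coerced.GetType());                          // System.Int32
    }
}
```

If a mismatch should always be an error, call Matches alone and throw when it returns false; Coerce is only for cases such as text-box input where a conversion is acceptable.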
|
|
|
|
|
|
HAHAHA_NEXT wrote:
Use the typeof operator
Not much use with casting...
top secret
|
|
|
|
|
Sorry, my error. I did not see that he needed to typecast the object after verifying that it was of the right type.
|
|
|
|
|
Is there a difference between MyMethod1() and MyMethod2()? The first method doesn't use this; the second method does. The same applies in the constructor:
public class MyClass
{
int i1, i2;
MyClass()
{
i1 = 10;
this.i2 = 20;
}
int MyMethod1()
{
int iRet = i1 + i2;
return iRet;
}
int MyMethod2()
{
int iRet = this.i1 + this.i2;
return iRet;
}
}
Regards, mYkel
|
|
|
|
|
|
To explain why RNEELY simply answered "no": the compiler assumes that any unqualified member access uses the this reference anyway. It's an implicit object reference. When compiled, exactly the same Intermediate Language (IL, the language embedded in the modules that make up an assembly) is produced. In both cases, the optimized body of each method would look something like this (ldarg.0 loads the this reference):
ldarg.0
ldfld int32 MyClass::i1
ldarg.0
ldfld int32 MyClass::i2
add
ret
|
|
|
|
|
Okay... Here's a question for you...
[Edit]
Ooops! The original version of this program had an error in it that I didn't intend. Here is the correction, the multiple choice answers were as before.
[/Edit]
What is the output of this program:
01: class App
02: {
03: int i1 = 0;
04: int i2 = 0;
05:
06: public void Method1()
07: {
08: i1 = 2;
09: i2 = 3;
10: }
11:
12: public void Method2(int i2)
13: {
14: i1 = i2;
15: }
16:
17: public static void Main()
18: {
19: App a = new App();
20: a.Go();
21: }
22:
23: public void Go()
24: {
25: Console.WriteLine("i1 = {0}, i2 = {1}", i1, i2);
26: Method1();
27: Console.WriteLine("i1 = {0}, i2 = {1}", i1, i2);
28: Method2(5);
29: Console.WriteLine("i1 = {0}, i2 = {1}", i1, i2);
30: }
31: }
Is it:
a)
i1 = 0, i2 = 0
i1 = 2, i2 = 3
i1 = 3, i2 = 3
b)
i1 = 0, i2 = 0
i1 = 2, i2 = 3
i1 = 5, i2 = 3
c)
i1 = 0, i2 = 0
i1 = 2, i2 = 3
i1 = 5, i2 = 5
d)
None of the above - it generates a compiler error on line 12
EuroCPian Spring 2004 Get Together
"You can have everything in life you want if you will just help enough other people get what they want." --Zig Ziglar
|
|
|
|
|
boogs guesses c) -> arguments take precedence over members.
and no, i didn't test it - that would be cheating
|
|
|
|
|
d0h, it was b). for the same reason.
|
|
|
|
|
E) None of the above
The signature for Main is an invalid one for an entry point. Also, changing it to static would preclude Main from accessing the public members of the App class without first creating an instance of the class.
RageInTheMachine9532
|
|
|
|
|
Okay - let's assume I didn't make that error...
|
|
|
|
|
You know, the this keyword is mostly used when a parameter passed to a method has the same name as a field:
private string name;
public void SetName(string name)
{
this.name = name;
}
Don't forget, that's Persian Gulf not Arabian gulf!
Murphy: Click Here![^] I'm thirsty like sun, more landless than wind...
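To illustrate what happens when this is left out in that situation, here is a minimal sketch. The Person class and its method names are hypothetical. Inside the method the parameter hides the field, so an unqualified assignment only copies the parameter to itself (the compiler reports warning CS1717 for it) and the field is never touched.

```csharp
using System;

public class Person
{
    private string name;

    public string Name
    {
        get { return name; }
    }

    // The parameter hides the field; this.name reaches the field explicitly.
    public void SetNameWithThis(string name)
    {
        this.name = name;
    }

    // Both sides refer to the parameter, so the field keeps its old value.
    // The compiler flags this line with warning CS1717.
    public void SetNameWithoutThis(string name)
    {
        name = name;
    }
}

public static class PersonDemo
{
    public static void Main()
    {
        Person p = new Person();

        p.SetNameWithoutThis("Ada");
        Console.WriteLine(p.Name == null);  // True: the field was never assigned

        p.SetNameWithThis("Ada");
        Console.WriteLine(p.Name);          // Ada
    }
}
```

This is the same rule that decides the quiz earlier in the thread: in Method2(int i2), the name i2 refers to the parameter, so only the field i1 changes.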
|
|
|
|
|
Thanks for your comment... what you say makes total sense!
OT: You should check the link to the Murphy page in your signature: it should be "http://www.codeproject.com/...", not "http://www.thecodeproject.com/...". Glad I could help you too.
Regards, mYkel
|
|
|
|