I was able to encrypt text documents by using StreamReader and StreamWriter to read and change the file content.
For a Word document, I tried renaming it to .txt first, then opening it and encrypting whatever was inside, but after I decrypt it, it stays corrupted.
I need an explanation of how to read the text in a Word document file.

PS: I use my own Caesar cipher method.

Edit:
I don't care whether it's weak or not.
I just want to know the steps to encrypt a Word document file.

For example, say I have a Word document with the text "ABCDEFGHIJKLMNOPQRSTUVWXYZ" inside,
and I'm going to encrypt it using a Vernam cipher.
How?
Updated 16-Nov-19 9:48am
Comments
Member 13566383 18-Nov-19 16:27pm    
If your encryption/decryption software can handle only text files, not binary files, you should convert the binary file stream into Base64 strings or character arrays (every 3 bytes are represented by 4 characters).
The methods to use are Convert.ToBase64String / Convert.ToBase64CharArray.
Decrypting your encrypted file will produce Base64 strings / characters, which you can convert back to binary format by using the corresponding methods Convert.FromBase64...
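
A minimal sketch of the Base64 round trip described in this comment. The class name Base64Bridge and its two helpers are made up for illustration; only Convert.ToBase64String and Convert.FromBase64String come from the framework. It assumes your text cipher restores the Base64 text exactly on decryption.

```csharp
using System;

public static class Base64Bridge
{
    // Hypothetical helper: turn arbitrary binary data into plain ASCII text
    // that a text-only cipher can process.
    public static string ToText(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Hypothetical helper: turn the (decrypted) Base64 text back into the original bytes.
    public static byte[] FromText(string base64)
    {
        return Convert.FromBase64String(base64);
    }
}
```

For the bytes 00, 01, 02, Base64Bridge.ToText returns "AAEC". To process a file you would pass in File.ReadAllBytes(path), encrypt the resulting string, and on the way back decrypt first and write the result of FromText with File.WriteAllBytes.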
The Magical Magikarp 19-Nov-19 17:41pm    
Why convert to Base64? Why not do File.ReadAllBytes(path), then convert the bytes to hexadecimal characters so you can view them in, say, a RichTextBox? :D
The Magical Magikarp 19-Nov-19 17:44pm    
Base64 is hardly ideal for this, because if you displayed the result of Convert.FromBase64String(string) in a RichTextBox, a label, etc., you would see nothing. Hexadecimal lets you view 100% of the bytes, even NUL bytes ("\x0", or "00"), perfectly.
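
A rough sketch of the hexadecimal idea from this comment. HexView, ToHex and FromHex are hypothetical names; the framework calls used are BitConverter.ToString and Convert.ToByte with base 16. Note that hex needs two characters per byte, so the text is larger than the Base64 form.

```csharp
using System;
using System.Linq;

public static class HexView
{
    // Render every byte as two hex digits separated by spaces, e.g. "00 41 FF".
    public static string ToHex(byte[] data)
    {
        return BitConverter.ToString(data).Replace("-", " ");
    }

    // Parse the space-separated hex text back into bytes.
    public static byte[] FromHex(string hex)
    {
        return hex.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                  .Select(h => Convert.ToByte(h, 16))
                  .ToArray();
    }
}
```

HexView.ToHex(new byte[] { 0x00, 0x41, 0xFF }) gives "00 41 FF", which displays fine in a RichTextBox, NUL byte included.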
Member 13566383 20-Nov-19 2:09am    
I have never had any problems reading Base64 characters. A Base64 stream consists of valid ASCII characters only. No problem with NUL bytes; see the MSDN documentation for the Convert.ToBase64String method:

".....The following example demonstrates the ToBase64String method. The input is divided into groups of three bytes (24 bits) each. Consequently, each group consists of four 6-bit numbers where each number ranges from decimal 0 to 63. In this example, there are 85 3-byte groups with one byte remaining. The first group consists of the hexadecimal values 00, 01, and 02, which yield four 6-bit values equal to decimal 0, 0, 4, and 2. Those four values correspond to the base-64 digits "A", "A", "E", and "C" at the beginning of the output."

As everyone keeps pointing out, a Word document is not plain text.

What nobody has asked is which Word document format you are processing.
Are you processing a DOC or a DOCX file? i.e. does it conform to the OpenXML format?

If you are reading a DOC file, then you need to use the correct interop/library to extract the raw text data, which you can then encrypt. (I suggest you start with the Word Interop; an alternative that doesn't require Word to be installed is NPOI, and its 2.0 beta version also supports DOCX files.)

If you are reading a DOCX file, then you need to use an OpenXML library to extract the raw text data, then encrypt it and insert it back into the document. See the Microsoft Help Page.
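
To give an idea of what extracting the text from a DOCX involves, here is a rough sketch. It does not use an OpenXML library; instead it relies on the fact that a DOCX file is a ZIP archive whose main text lives in word/document.xml, and reads it with the framework's ZipArchive and XDocument. DocxText and Extract are made-up names, the sketch assumes the usual (transitional) WordprocessingML namespace, and it only reads text: it does not write encrypted text back, and it does not handle headers, footnotes or text boxes properly.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace (assumed: transitional OOXML).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the document body text, one line per paragraph.
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a DOCX file?");

            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                // Each w:p is a paragraph; its visible text sits in w:t elements.
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

Typical use would be something like using (var fs = File.OpenRead(path)) { string text = DocxText.Extract(fs); }. For real work, a proper OpenXML library as suggested above is the safer route.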
 
Comments
Midnight Ahri 3-Dec-13 3:26am    
Thank you very much for the information.
What if I ignore whether it's DOC, DOCX, or whatever?
I'll just read the file byte by byte and encrypt it using a Vernam cipher.
That way I can encrypt any kind of file.
Is this a good solution?
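
For what it's worth, the byte-by-byte idea can be sketched like this. It treats the file as raw bytes (File.ReadAllBytes / File.WriteAllBytes) and XORs each byte with a key byte, which is the usual way a Vernam cipher is applied to binary data. VernamBytes and its members are hypothetical names, and the sketch assumes a random key as long as the file that is stored separately and never reused.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class VernamBytes
{
    // XOR every data byte with the key byte at the same position.
    // Applying the same key a second time restores the original bytes.
    public static byte[] Xor(byte[] data, byte[] key)
    {
        if (key.Length < data.Length)
            throw new ArgumentException("Key must be at least as long as the data.");

        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ key[i]);
        return result;
    }

    // Generate a random key of the requested length.
    public static byte[] NewKey(int length)
    {
        var key = new byte[length];
        using (var rng = RandomNumberGenerator.Create())
            rng.GetBytes(key);
        return key;
    }

    // Works for any file type (DOC, DOCX, images, ...) because nothing is read as text.
    public static void EncryptFile(string inputPath, string outputPath, string keyPath)
    {
        byte[] plain = File.ReadAllBytes(inputPath);
        byte[] key = NewKey(plain.Length);
        File.WriteAllBytes(keyPath, key);
        File.WriteAllBytes(outputPath, Xor(plain, key));
    }

    public static void DecryptFile(string inputPath, string outputPath, string keyPath)
    {
        byte[] cipher = File.ReadAllBytes(inputPath);
        byte[] key = File.ReadAllBytes(keyPath);
        File.WriteAllBytes(outputPath, Xor(cipher, key));
    }
}
```

The encrypted file will not open in Word, of course; only the decrypted copy will. The important part is that no StreamReader or string conversion is involved at any point.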
Pheonyx 3-Dec-13 3:33am    
It depends on your actual objective. Personally, I don't know that much about encryption in that manner.

If it was me I would probably just use something like this:
Click me
Midnight Ahri 3-Dec-13 3:36am    
Thank you very much for the information, it helps a lot. =D
PS: I use my own encryption method.

There's your big mistake. Usually, unless you're holding a PhD in Math, rolling your own "encryption" method is about the most insecure encryption you can come up with.

Your second mistake is that you haven't posted any code having anything to do with your "encryption" and "decryption" so it's impossible for anyone to tell you what you have done wrong there.
 
Comments
Midnight Ahri 2-Dec-13 22:46pm    
Sorry for the poor information.
I'm using a Caesar cipher that encrypts characters one by one.
It reads the text file, encrypts it, then modifies the text file.
I need an explanation of how to read the text in a Word document file without the XML format.
Dave Kreskowiak 2-Dec-13 23:57pm    
Yeah, well, that's the problem. The Caesar Cipher only works on text files. Word documents are NOT text files. They are binary!

Also, the Caesar Cipher is very weak and easily broken. You're really not encrypting anything with that.
Midnight Ahri 3-Dec-13 1:53am    
Yes, it's true that the Caesar Cipher is very weak.
But my main point is to get the plain text from Word documents.
Can you give me some information about that?
Your approach fails because Word documents are not plain text files; they are binary files.
So, in your process of
- change extension to txt
- open the file as a text one
- encrypt it as if it was a true text file
you are corrupting it.
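
The corruption in the steps above can be reproduced in a few lines. This is only an illustration, assuming the file is read as UTF-8 text (the default for StreamReader); TextRoundTrip and ThroughString are made-up names. Bytes that are not valid UTF-8 get replaced while decoding, so writing the text back cannot restore them.

```csharp
using System;
using System.Text;

public static class TextRoundTrip
{
    // Decode bytes as UTF-8 text and encode the text again,
    // which is what reading and rewriting a binary file "as text" amounts to.
    public static byte[] ThroughString(byte[] data)
    {
        string asText = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(asText);
    }
}
```

Feeding it the first bytes of a legacy DOC file (D0 CF 11 E0 A1 B1 1A E1) returns a different byte sequence, even before any encryption is applied. Pure ASCII input such as "ABC" survives unchanged, which is why the same approach worked for text documents.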

You need a binary encryption process, which is far more complicated than a simple Caesar encryption. Some brilliant mathematicians have worked on the subject and still do.
 
Comments
Midnight Ahri 3-Dec-13 3:00am    
Now I have my Vernam cipher encryption, but I still need a way to read the plain text of a Word document.
I can't simply choose the file and encrypt it, right?
phil.o 16-Nov-19 16:14pm    
What problem?
The Magical Magikarp 19-Nov-19 17:39pm    
By "problem", I mean his original question. Plus, I did answer his question "Encrypt Word Document File", and in C#, too. He commented and asked a question, but no one answered. I answered :)
phil.o 19-Nov-19 17:50pm    
I still wonder what this has to do with me. But never mind, that is not so important anyway.
The Magical Magikarp 2-Dec-19 14:14pm    
No, I wasn't referring to you -- I was talking to Midnight Ahri :) Sorry for the confusion

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


