Introduction
The TextSplit class is useful when you need to split a text file into smaller
sub_text files. The class's constructor takes two parameters: the path/filename
and the number of lines each sub_text file should contain.
Real-world example: I wrote a console application that imports network and
subnet mask information from a Cisco router. I then pipe the results into a
command file, where each line of the command file launches a separate discovery
process. If you have a lot of networks (say 1,000), you may want to
automatically create sub_command files so that only 10 processes are started at
a time. With this class, I can split the original command file into several
smaller ones.
This class has been designed using the standard <fstream> library and should
integrate easily with your console application.
Using the class
- First, create a console application and copy textsplit.h and textsplit.cpp to your current folder.
- Then, in your implementation file, include the header:
#include "textsplit.h"
- Add textsplit.cpp to your project: select the FileView tab, right-click on your project and select "Add Files to Project".
- Create a TextSplit object and call the CreateOutputFiles() method:
TextSplit R(fileName, howManyLines);
R.CreateOutputFiles();
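The fileName and howManyLines values used above have to come from somewhere. In a console application they would typically arrive as command-line arguments. The following is a minimal sketch of that step, not part of the TextSplit class itself: parseArguments is a hypothetical helper name, and the argument order (file name first, line count second) is an assumption made for illustration.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of TextSplit): reads "<fileName> <howManyLines>"
// from the command line. Returns false if the argument count is wrong or the
// second argument is not a whole number that fits in an int.
bool parseArguments(int argc, const char* const argv[],
                    std::string& fileName, int& howManyLines) {
    if (argc != 3) return false;          // expect: program, file name, line count

    fileName = argv[1];
    if (fileName.empty()) return false;

    errno = 0;
    char* end = nullptr;
    long value = std::strtol(argv[2], &end, 10);
    if (end == argv[2] || *end != '\0') return false;   // no digits, or trailing junk
    if (errno == ERANGE || value > INT_MAX || value < INT_MIN) return false;

    // Values below 1 are passed through unchanged; the article states that the
    // class itself falls back to a default in that case.
    howManyLines = static_cast<int>(value);
    return true;
}

// Possible use inside main(), assuming textsplit.h from the article is included:
//
//   std::string fileName;
//   int howManyLines = 0;
//   if (!parseArguments(argc, argv, fileName, howManyLines)) return 1;
//   TextSplit R(fileName, howManyLines);
//   R.CreateOutputFiles();
```

Rejecting input such as "10x" up front keeps a typing mistake from silently turning into an unexpected line count.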
First, the fileName argument is validated. Then, depending on how many lines
there are in the input file and the maximum number of lines you want per file,
the correct number of output files is created, named in the format
x_filename.extension (where x is the number of the output file). If the
remaining lines number fewer than the maximum specified, they are placed in the
last file. The example program included with this article provides an input
file (test.txt) with 10 lines and a maximum of 3 lines in each sub_text file,
so you get the following output:
- 1_test.txt (lines 1-3)
- 2_test.txt (lines 4-6)
- 3_test.txt (lines 7-9)
- 4_test.txt (line 10)
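The arithmetic behind that listing can be sketched as follows. This is an illustration of the x_filename.extension rule as described in this article, not the class's actual source: planOutputs and OutputPlan are hypothetical names, and placing the numeric prefix after any directory part of the path is an assumption.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One planned output file: its name and the 1-based range of input lines it holds.
struct OutputPlan {
    std::string name;
    int firstLine;
    int lastLine;
};

// Hypothetical helper (not part of TextSplit): works out which output files
// would be produced for an input of totalLines lines with maxLines per file.
std::vector<OutputPlan> planOutputs(const std::string& fileName,
                                    int totalLines, int maxLines) {
    std::vector<OutputPlan> plan;
    if (totalLines <= 0 || maxLines <= 0) return plan;

    // Separate an optional directory prefix so the number is attached to the
    // bare file name rather than to the start of the path (assumption).
    std::string::size_type slash = fileName.find_last_of("/\\");
    std::string dir  = (slash == std::string::npos) ? "" : fileName.substr(0, slash + 1);
    std::string base = (slash == std::string::npos) ? fileName : fileName.substr(slash + 1);

    // Ceiling division: a partial final chunk still needs its own file.
    int fileCount = (totalLines + maxLines - 1) / maxLines;

    for (int i = 0; i < fileCount; ++i) {
        int first = i * maxLines + 1;
        int last = std::min(first + maxLines - 1, totalLines);
        plan.push_back({dir + std::to_string(i + 1) + "_" + base, first, last});
    }
    return plan;
}
```

For an input named test.txt with 10 lines and a maximum of 3, the plan contains four entries, the last of which covers only line 10.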
Update (2002/10/15)
Special thanks to Hernan Berguan for pointing out the 1000-line limitation.
I originally used plain arrays to hold each line from the source file
(... and we all know the limitations of arrays), so I switched to the
vector<string> class instead. I tested the demo program with a text file
of 500,000 lines, creating sub_text files of 50,000 lines each. As a final
test, I used one of the sub_text files as the source and created 1,000
sub_"sub"_text files of 50 lines each. 50 lines is also the default if the
user input is < 1 (thanks again to Hernan). Everything seems to be working
smoothly now!
Summary
Here is a list of the public interfaces of the TextSplit class:
TextSplit(string FileName, int numberOfLinesForEachFile);
void CreateOutputFiles();
int GetNumberOfFiles() const;
Note that there is another function worth mentioning: GetNumberOfFiles().
(This function returns the number of lines for the input file.)
I am an IT security consultant who focuses mainly on Oracle, Microsoft, Citrix, RSA, SUN, and Linux security. I do perimeter security design (firewalls, IDS, etc.) as well as internal/external network assessments (penetration testing).
On the programming side I’m sellable on VC++. I’m also strong in C# (ASP.NET), TSQL, VC++.Net, STL, COM, ATL, Java/VBscript / Coldfusion /
Home page: http://www.starbal.net