
TextFileSplitter - A Class to Split Text Files

16 Oct 2002
A class for splitting a large text file into smaller sub_text files, each containing at most a given number of lines


Introduction

The TextSplit class is useful when you need to split a text file into smaller sub_text files. The class's constructor takes two parameters: the path/file name of the input file and the number of lines each sub_text file should contain.

Real-world example: I wrote a console application that imports network and subnet mask information from a Cisco router. I then pipe the results into a command file in which each line launches a separate discovery process. If you have a lot of networks (say 1,000), you may want to create sub_command files automatically so that only 10 processes are started at a time. With this class, I can split the original command file into several smaller ones.

This class has been designed using the standard <fstream> library and should integrate easily into your console application.

Using the class

  1. First, create a console application and copy textsplit.h and textsplit.cpp to your project folder.
  2. In your implementation file, include the header:
     #include "textsplit.h"
  3. Add textsplit.cpp to your project: select FileView, right-click your project and select "Add Files to Project".
  4. Create a TextSplit object and call its CreateOutputFiles() method:
     TextSplit R(fileName, howManyLines);
     R.CreateOutputFiles();

First, the input file name is validated. Then, depending on how many lines the input file contains and the maximum number of lines per file you specified, the appropriate number of output files is created, named x_filename.extension (where x is the file's sequence number). Any remaining lines, fewer than the specified maximum, are written to the last file. The example program included with this article provides an input file (test.txt) with 10 lines and a maximum of 3 lines per sub_text file, which produces the following output:

  • 1_test.txt (lines 1-3)
  • 2_test.txt (lines 4-6)
  • 3_test.txt (lines 7-9)
  • 4_test.txt (line 10)
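The grouping and naming rules described above can be sketched as follows. This is a minimal illustration, not the code in textsplit.cpp: `Chunk`, `OutputFileName` and `PlanOutputFiles` are hypothetical names, and the sketch assumes the numeric prefix goes in front of the file name only, after any directory part of the path.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of one output file: its name and 1-based line range.
struct Chunk {
    std::string name;
    int firstLine;
    int lastLine;
};

// Builds "x_filename.extension" by inserting "x_" in front of the file name,
// after any directory part of the path (an assumption about path handling).
std::string OutputFileName(const std::string& inputPath, int index) {
    std::string::size_type slash = inputPath.find_last_of("/\\");
    std::string::size_type nameStart = (slash == std::string::npos) ? 0 : slash + 1;
    return inputPath.substr(0, nameStart) + std::to_string(index) + "_" +
           inputPath.substr(nameStart);
}

// Splits totalLines into consecutive groups of at most maxLines lines;
// any remainder ends up in the last group. Returns an empty plan if
// maxLines < 1 (apply a default such as 50 before calling).
std::vector<Chunk> PlanOutputFiles(const std::string& inputPath, int totalLines, int maxLines) {
    std::vector<Chunk> chunks;
    if (maxLines < 1) return chunks;
    int index = 1;
    for (int first = 1; first <= totalLines; first += maxLines, ++index) {
        int last = std::min(first + maxLines - 1, totalLines);
        Chunk chunk = {OutputFileName(inputPath, index), first, last};
        chunks.push_back(chunk);
    }
    return chunks;
}
```

For test.txt with 10 lines and a maximum of 3, PlanOutputFiles returns four entries named 1_test.txt to 4_test.txt, the last one covering only line 10. Planning the files up front requires the total line count, which is why the class counts the lines before writing any output.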

Update (2002/10/15)

Special thanks to Hernan Berguan for pointing out the 1,000-line limitation. I originally used a fixed-size array to hold each line of the source file (and we all know the limitations of arrays), so I switched to vector<string>. I tested the demo program with a text file of 500,000 lines, creating sub_text files of 50,000 lines each. As a final test, I used one of those sub_text files as the source and created 1,000 sub_"sub"_text files of 50 lines each. If the user enters a value less than 1, each file defaults to 50 lines (thanks again to Hernan). Everything seems to be working smoothly now!
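The update describes two changes: storing the lines in a vector<string> instead of a fixed-size array, and falling back to 50 lines per file when the requested value is below 1. A minimal sketch of both, assuming plain <fstream> input; `ReadAllLines` and `EffectiveLinesPerFile` are hypothetical names, not functions from textsplit.cpp.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: a requested size below 1 falls back to the
// default of 50 lines per output file, as described in the update.
int EffectiveLinesPerFile(int requested) {
    const int kDefaultLinesPerFile = 50;
    return requested < 1 ? kDefaultLinesPerFile : requested;
}

// Hypothetical helper: reads every line of the file into a vector<string>.
// Unlike a fixed-size array, the vector grows as needed, so there is no
// hard-coded limit such as 1000 lines. Returns an empty vector if the
// file cannot be opened.
std::vector<std::string> ReadAllLines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in(path.c_str());
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

This approach keeps the entire file in memory at once, which is the trade-off raised in the comments.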

Summary

Here is a list of the public members of the TextSplit class:

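// constructor: input file path/name and maximum number of lines per output file
TextSplit(string FileName, int numberOfLinesForEachFile); 

// splits the input file into x_filename.extension output files
void CreateOutputFiles(); 

// returns the number of lines in the input file
int GetNumberOfFiles() const;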

Note that, despite its name, GetNumberOfFiles() returns the number of lines in the input file, not the number of output files.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Web Developer
South Africa
I am an IT security consultant focusing mainly on Oracle, Microsoft, Citrix, RSA, SUN and Linux security. My work covers perimeter security design (firewalls, IDS, etc.) as well as internal and external network assessments (penetration testing).

On the programming side, I'm proficient in VC++. I'm also strong in C# (ASP.NET), T-SQL, VC++.NET, STL, COM, ATL, Java, VBScript and ColdFusion.

Home page: http://www.starbal.net

Comments and Discussions

 
splitting at specific position in a text (dreyfus, 22-Apr-08 1:22)
Re: splitting at specific position in a text (nums, 23-Apr-08 2:37)
TextFileSplitter (shahjayesh15, 2-May-07 19:44)
About text files combination! (snailflying, 15-Apr-07 23:59)
crash (dan o, 30-Jan-04 1:30)
Line limit? (nacnuduk1975, 17-Nov-03 14:20)
Re: Line limit? (nums, 17-Nov-03 20:55)
Re: Line limit? (nacnud_uk, 21-Nov-03 6:07)
Pieter,

More below.

>Hi there,
>
>Thanks for the question... I must admit you have a valid point there. Here is a brief explanation of the choices:

Thank you.

>
>1. Why do you have a line limit? Well, the array I used in the beginning was just hard-coded to 1000 elements, as I never expected someone to use text files with millions of lines. Switching to vector was the easiest fix at the time, with the least amount of code change.
>

Ah, one of a programmer's first lessons, friend: *never*, and I mean *never*, assume anything. The rule is: if it can happen, it *will* happen. Just a bit of a hint. :)

>2. Why do you "store" the file? Not sure what you mean…

It's a hint at my proposed implementation. You need not store the file in any sort of array, as you are doing; with the implementation I suggested, the most you ever have to store is one line at a time. You could get away with one character at a time and cut it down to just 8 bits, but that's a bit dramatic, and slow. If speed became an issue on huge files, you could read 2K blocks at a time, work on them in memory, and then read the next 2K block.
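The one-line-at-a-time approach suggested here can be sketched as follows. This is an illustration under stated assumptions, not code from either poster: `SplitStreaming` is a hypothetical name, it reuses the article's x_filename.extension naming and 50-line default, and it places the numeric prefix after any directory part of the path. Only the current line is held in memory.

```cpp
#include <fstream>
#include <string>

// Hypothetical streaming splitter: reads the input one line at a time and
// writes each line straight to the current output file, so only one line
// is held in memory. Output files are named "x_filename.extension", with
// "x_" placed after any directory part of the path (an assumption).
// Returns the number of files written, or -1 on an I/O error.
int SplitStreaming(const std::string& path, int maxLines) {
    if (maxLines < 1) maxLines = 50;  // the article's default for input < 1

    std::ifstream in(path.c_str());
    if (!in) return -1;

    std::string::size_type slash = path.find_last_of("/\\");
    std::string::size_type nameStart = (slash == std::string::npos) ? 0 : slash + 1;
    const std::string dir = path.substr(0, nameStart);
    const std::string name = path.substr(nameStart);

    std::ofstream out;
    int fileCount = 0;
    int linesInCurrentFile = 0;
    std::string line;
    while (std::getline(in, line)) {
        // Open the next output file before the first line and whenever
        // the current file already holds maxLines lines.
        if (fileCount == 0 || linesInCurrentFile == maxLines) {
            if (out.is_open()) out.close();
            ++fileCount;
            out.open((dir + std::to_string(fileCount) + "_" + name).c_str());
            if (!out) return -1;
            linesInCurrentFile = 0;
        }
        out << line << '\n';
        ++linesInCurrentFile;
    }
    return fileCount;  // the last file is closed by out's destructor
}
```

With this design the number of output files is only known once the last line has been read, so no counting pass is needed.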

>
>3. Why do you use "vectors" or "arrays"? Two reasons: I needed practice with classes and vectors for my practical C++ exam, and I reckoned it would be the perfect project to experiment with.

The implementation could still have been done with a class, as you could have wrapped standard functions in class methods. You could have used a base class and extended it.

>
>The final reason is that I needed to count how many lines there are in the source file before my output_files loop can start.

Why do you need to know how many lines there are before you start? If you wanted to make sure that the "split" files were all the same size, you could do a one-pass scan of the file, count the lines and store their offsets as indexes into the file, then write a routine that puts the same number of lines in each output file.
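The one-pass scan described here, which records where each line starts, might look like this. It is a hypothetical sketch (`IndexLineOffsets` and `ReadLineAt` are illustrative names, not part of TextSplit), and it assumes the file is opened in binary mode so the recorded offsets are plain byte positions that can be handed straight back to seekg().

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical one-pass index: records the stream position at which each
// line starts. The size of the result is the line count.
std::vector<std::streampos> IndexLineOffsets(std::ifstream& in) {
    std::vector<std::streampos> offsets;
    std::string line;
    for (;;) {
        std::streampos start = in.tellg();
        if (!std::getline(in, line)) break;  // end of file (or read error)
        offsets.push_back(start);
    }
    return offsets;
}

// Reads back the line that starts at a previously recorded offset.
std::string ReadLineAt(std::ifstream& in, std::streampos offset) {
    in.clear();  // clear eof/fail flags left by the indexing pass
    in.seekg(offset);
    std::string line;
    std::getline(in, line);
    return line;
}
```

Once the line count and offsets are known, each output file can be given an equal share by seeking to the first offset of its share; this trades a second read of the file for not keeping every line in memory.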



>So I created the private class method to count the lines.

Lines could have been counted either as you went along or, as I said, in a pre-split pass.

>Again, vectors seemed like the easiest choice, and because I have to iterate through the source file once anyway, I just pushed each line into the vector. (The lines can then be retrieved in the CreateOutputFiles method later.)

Of course, I'm not criticising your implementation here, friend. I'm only showing you that there were and are other ways to reach the same end, and that as you become more aware of your own talents as a programmer and gain experience, these things will come naturally to you.

>
>Your pseudocode does make sense to me and will probably work. I guess it's just a different style of programming and the way I designed the class at the time.

Of course. I was just highlighting what some may see as a "limitation" of your design, and of course you, in hindsight, can see that too. Life is for learning and living; it seems you've got the basics there, mate. Keep at it.

>
>Cheers,

No bother, it was my pleasure, thanks for listening.

Peace,
Dunk :)
>Pieter
>

vector<string> (tja01, 29-Oct-03 11:25)
Re: vector<string> (nums, 4-Nov-03 2:58)
Link does not work (User 26942, 1-Oct-02 22:55)
Re: Link does not work (Pieter (nums), 7-Oct-02 0:12)

