Recently I ran into a problem. I've written a multithreaded program, and I'm having an issue with the piece of code that divides up the workload.
My program is rather simple and works in the following steps.
1. Read part of the file into buffer
2. Split the data into 8 parts [this part does not work]
3. Encode/decode the data [this changes the data and causes its size to change]
4. Merge the data into one string
5. Write to file
6. Repeat

From everything I have learned, the issue is with the splitter. I have made two attempts to make it work.
C++
// First try
// Works better but leaves stuff out
void splitData ()
{
    int theShift(theMainBuffer.length()/8);
    cout << theShift << "the divide " << theMainBuffer.length() << endl;
    int place(0);
    for(int i = 0; i != 8; i++)
    {
        if(place == theMainBuffer.length())
        {
            cout << "break" << endl;
        }
        else
        {
            ram[i] = theMainBuffer.substr(place,theShift);
            cout << place << endl;
            place = place + theShift;
        }
    }

    theMainBuffer = "";
}
// Another try
// Works but corrupts data
vector<std::string> splitBreak(const std::string& str)
{
    for(int i = 0; i != 8; i++){ram[i] = "";}
    std::size_t n = 8;
    if( n < 2 ) return { str } ;

    std::vector<std::string> fragments;

    const auto min_sz = str.size() / n ;
    const auto excess_chars = str.size() % n ;
    const auto max_sz = min_sz + 1 ;

    // the first excess_chars fragments have one character more
    for( std::size_t i = 0 ; i < excess_chars ; ++i ) fragments.push_back( str.substr( i*max_sz, max_sz ) ) ;
    for( std::size_t i = excess_chars ; i < n ; ++i ) fragments.push_back( str.substr( excess_chars + i*min_sz, min_sz ) ) ;
    return fragments ;
}

So I am at a bit of a snag with this and wondering if someone has a solution.
Full code if needed
[C++] Full code - Pastebin.com
Comments
[no name] 17-Jan-16 8:03am    
Not difficult - just design your method, code it, and debug it. This means learning to use your debugger. Integer division needs to be thought through.
int theShift(theMainBuffer.length()/8);
What happens if length not divisible by 8? Use your debugger if not sure.
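To make the integer-division point concrete, here is a minimal sketch. The helper name `bytes_dropped` is hypothetical and not part of the original program; it only counts how many characters a plain `length / parts` split never copies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of characters that are never copied when a
// buffer of `length` characters is cut into `parts` pieces of length / parts.
std::size_t bytes_dropped(std::size_t length, std::size_t parts)
{
    assert(parts > 0);                         // avoid division by zero
    const std::size_t shift = length / parts;  // integer division truncates
    return length - shift * parts;             // same value as length % parts
}
```

For a 67-character buffer and 8 parts, `bytes_dropped(67, 8)` is 3: every piece gets 8 characters and the final 3 are left out.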
Jochen Arndt 17-Jan-16 8:13am    
I don't know if I understand your problem.

But it seems that you want to process the data split into eight parts.
Then you should process eight blocks where each block starts at i * block_size and ends at ((i + 1) * block_size) - 1. If the input data size is not divisible by eight, you must also process the remaining data with the last block.

Example for 67 byte input data:
block_size = size / 8 = 8;
1. Block: 0 to 7
2. Block: 8 to 15
...
8. Block: 56 to 63 with 3 additional bytes from 64 to 66
Member 10657083 17-Jan-16 10:26am    
The splitBreak function in the code is meant to do that, but it does not.
Richard MacCutchan 17-Jan-16 11:52am    
You just need to add the code in your first method (splitData) to include any excess characters beyond the last block.
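One way that suggestion could look is sketched below. It is only an illustration: the function name `split_with_remainder` is made up, and it returns the pieces instead of writing to the global `ram` array from the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of splitData with the remainder handled (hypothetical name).
std::array<std::string, 8> split_with_remainder(const std::string& buffer)
{
    std::array<std::string, 8> parts;
    const std::size_t shift = buffer.length() / parts.size();

    // The first seven pieces are exactly `shift` characters long.
    for (std::size_t i = 0; i + 1 < parts.size(); ++i)
        parts[i] = buffer.substr(i * shift, shift);

    // The last piece runs to the end of the buffer, so it also picks up
    // the buffer.length() % 8 characters the original loop dropped.
    parts.back() = buffer.substr((parts.size() - 1) * shift);

    // Sanity check: nothing dropped, nothing duplicated.
    std::size_t total = 0;
    for (const auto& p : parts) total += p.size();
    assert(total == buffer.length());
    return parts;
}
```

With the input "abcdefghij" the first seven pieces hold one character each and the last piece is "hij". The last block can be up to 7 characters longer than the others, which may matter if the threads should get equal work.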
Jochen Arndt 18-Jan-16 3:03am    
Sorry for the late reply.

Your code does not do so. My comment was meant to get you to think about the algorithm and probably write a completely new implementation.

You already got a solution but I will tell you how I would have done it:
Assuming you have a function that decodes a part and stores the decoded string somewhere (e.g. ram[index]):

void decode(int index, const char *s, size_t size);

the code can be:

size_t block_size = theMainBuffer.length() / 8;
for (int i = 0; i < 7; i++)
    decode(i, theMainBuffer.data() + i * block_size, block_size);
// The last block also takes the remainder.
decode(7, theMainBuffer.data() + 7 * block_size, block_size + theMainBuffer.length() % 8);

The decode() function is just an example and may be simply a std::string creation from source and length.

If you haven't got Boost, you could use something like:

C++
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
#include <iostream>

std::vector<std::string> split_into_n( const std::string &input, unsigned n )
{
	std::vector<std::string> strings;

	auto start_of_string = cbegin( input );
	auto characters_left_to_process = input.size();

	while( n && characters_left_to_process )
	{
		auto characters_in_next_string = ( characters_left_to_process + n - 1 ) / n;
		strings.push_back( std::string( start_of_string, start_of_string + characters_in_next_string ) );
		start_of_string += characters_in_next_string;
		characters_left_to_process -= characters_in_next_string;
		n--;
	}

	return strings;
}

auto main()->int
{
	std::string s( "123456789abcdefghijklmnopqrstuvwxyz123456789abcdefghijklmnopqrstuvwxyz" );
	auto strings( split_into_n( s, 8 ) );
	std::copy( begin( strings ), end( strings ), std::ostream_iterator<std::string>( std::cout, "\n" ) );
}


It's not as fussy as the versions you've written - there's only one loop and fewer local variables. You could eliminate the iterator, but when I did, the code looked horrible. Generally, fewer locals and fewer loops make code a lot easier to write and debug.
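To see what the rounded-up division in the loop does, the sketch below computes only the lengths it would produce. `chunk_sizes` is a hypothetical helper written for this illustration, not part of the answer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the lengths the loop above would produce when
// `length` characters are shared between `n` strings.
std::vector<std::size_t> chunk_sizes(std::size_t length, std::size_t n)
{
    std::vector<std::size_t> sizes;
    while (n != 0 && length != 0)
    {
        // Ceiling division: the remaining characters spread over the
        // remaining strings, rounded up.
        const std::size_t next = (length + n - 1) / n;
        assert(next <= length);  // never takes more than is left
        sizes.push_back(next);
        length -= next;
        --n;
    }
    return sizes;
}
```

For the 70-character test string and n = 8 this gives 9, 9, 9, 9, 9, 9, 8, 8, so the larger strings come first. For a 3-character input it gives only three entries of length 1, because the loop stops once no characters are left.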

[EDIT: Made it so the larger strings are at the beginning of the vector, removing my naive handling of a non-divisible number of characters, and then removed my naive method of rounding up integer division - thanks to member(some number) for pointing this out]

I'd be interested to see what other people come up with to solve this. While it's a relatively trivial problem it'd be cool to see what simpler methods people have.
 
Comments
_Asif_ 18-Jan-16 2:36am    
+5ed
Member 10657083 18-Jan-16 3:10am    
I would suggest testing it
Aescleal 18-Jan-16 3:59am    
I did test the code, quite thoroughly - I omitted the full tests as they weren't going to add anything to the conversation. However it was early in the morning and if you've got a breaking test case I'd love to see it so I can correct it.
Aescleal 18-Jan-16 4:09am    
And I've just noticed the bug - managed to replace the max with a min, thanks for highlighting it!
Andreas Gieriet 18-Jan-16 7:17am    
As you requested, I've added a second solution. I noticed that your splitting does not always produce a vector of n entries. If the source string is shorter than n characters, your method cuts the resulting vector to the number of characters.
Cheers
Andi
Another solution could be to not do any of the divisions in the loop but pre-calculate how many of the bins get an excess character each. E.g.
C++
using namespace std;
...
vector<string> split(const string& raw) {
    size_t bins   = 8;
    size_t rawlen = raw.size();
    size_t div    = rawlen / bins;
    size_t excess = rawlen % bins;
    auto pos = raw.cbegin();
    vector<string> fragments(bins);
    for(size_t i = 0; i < bins; ++i) {
        // the first few bins get one of the excess char each
        size_t binlen = div + (i < excess ? 1 : 0); 
        auto nextpos = pos + binlen;
        fragments[i] = string(pos, nextpos);
        pos = nextpos;
    }
    return fragments;
}
This works fine too for any string of any size, including those shorter than 8 characters.
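A claim like this is easy to check with a round trip: split the string, join the fragments, and compare with the input. The sketch below assumes a variant of the function with the bin count as a parameter; the names `split_even` and `round_trips` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same idea as split() above, with the bin count as a parameter
// (an assumption made here so the check can try several values).
std::vector<std::string> split_even(const std::string& raw, std::size_t bins)
{
    assert(bins > 0);
    const std::size_t div    = raw.size() / bins;
    const std::size_t excess = raw.size() % bins;
    std::vector<std::string> fragments;
    fragments.reserve(bins);
    std::size_t pos = 0;
    for (std::size_t i = 0; i < bins; ++i) {
        // The first `excess` bins get one extra character each.
        const std::size_t binlen = div + (i < excess ? 1 : 0);
        fragments.push_back(raw.substr(pos, binlen));
        pos += binlen;
    }
    return fragments;
}

// True if the split yields exactly `bins` fragments that join back to the input.
bool round_trips(const std::string& raw, std::size_t bins)
{
    const auto fragments = split_even(raw, bins);
    std::string joined;
    for (const auto& f : fragments) joined += f;
    return fragments.size() == bins && joined == raw;
}
```

Calling `round_trips` with an empty string, a 3-character string and a 67-character string covers the short-input and non-divisible cases discussed in the comments. The same check can be pointed at any of the splitters in this thread.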
Cheers
Andi
 
Comments
Aescleal 18-Jan-16 7:34am    
Cool, thanks! I'd be tempted to just resize the vector after assigning the strings rather than preallocating the strings if making the vector a certain size was important but it's interesting to see the similarities rather than differences.
Andreas Gieriet 18-Jan-16 7:46am    
Or instead of using push_back, create a vector in the first place with n entries and replace the respective entries by indexing with the calculated ones.
Cheers
Andi

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


