Count Occurrence Of A Word

Question


I'm trying to count how many times a word "file" occurs in the data part. How would I do this? Thank you.

C++

void printRawData(unsigned char *data, int length, int more)
{
	int i, c=0;
	printf("     -------------One Data Begins-------------\n");
	for (i=0; i<length; i++)
	{
		if ((data[i]>30 && data[i]<122) ||
			(((data[i]==10) || (data[i]==13) || (data[i]==123) || (data[i]==125))
			&& (more>0)))
		{
			printf("%c", data[i]);
			c+=1;
		}
		else
		{
			printf("[%i]", data[i]);
			c+=3;
			if (data[i]>9) c++;
			if (data[i]>99) c++;
		}
		if (c>=47)
		{
			printf("\n");
			c=0;
		}
	}
}

Posted 29-Aug-11 6:43am

Member 7766180


Comments

Philippe Mori 29-Aug-11 20:32pm

You should not use hard-coded constants, and to make it worse, they are not even commented. For example, what is character 125? This makes your code "write once" and thus unmaintainable.

Also explain the purpose of the constants 3, 9, 47 and 99. They are all related to the variable c, which I guess is a counter, but their purpose is not clear at all.

What is the purpose of the parameter more? It seems to be used as a boolean filter, so why is it not of type bool?
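To illustrate the point about named constants, here is a minimal sketch of the same formatting logic. Every name in it (kFirstPlain, kLastPlain, kLineWidth, isExtraChar, formatRawData, showExtras) is made up for this example, and the reading of more as a boolean switch is only a guess based on how the posted code uses it. The sketch returns a std::string instead of calling printf so that it is easy to check, and it needs C++11 for std::to_string.

```cpp
#include <string>

// Hypothetical names for the magic numbers in the question's printRawData.
const unsigned char kFirstPlain = 31;   // original test: data[i] > 30
const unsigned char kLastPlain  = 121;  // original test: data[i] < 122
const int kLineWidth = 47;              // wrap once this many columns are used

// The four characters the original prints literally only when more > 0:
// 10 ('\n'), 13 ('\r'), 123 ('{') and 125 ('}').
bool isExtraChar(unsigned char ch)
{
    return ch == '\n' || ch == '\r' || ch == '{' || ch == '}';
}

// Returns the text that the original function would print (without the banner).
std::string formatRawData(const unsigned char* data, int length, bool showExtras)
{
    std::string out;
    int column = 0;  // plays the role of c in the original
    for (int i = 0; i < length; ++i) {
        const unsigned char ch = data[i];
        if ((ch >= kFirstPlain && ch <= kLastPlain) || (showExtras && isExtraChar(ch))) {
            out += static_cast<char>(ch);
            column += 1;
        } else {
            // "[n]" takes 3, 4 or 5 columns depending on the digits in n,
            // which is what the constants 3, 9 and 99 account for.
            const std::string code = "[" + std::to_string(static_cast<int>(ch)) + "]";
            out += code;
            column += static_cast<int>(code.size());
        }
        if (column >= kLineWidth) {
            out += '\n';
            column = 0;
        }
    }
    return out;
}
```

The ranges are kept exactly as posted, so 'z' (122) is still printed as [122]; whether that is intended is something only the original author can say.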

4 solutions

Solution 2

It's probably easier to convert to a string and use existing methods. If using MFC, you can use CString and CString::Find() or you can use a std::string and string::find() if you're not using MFC.

Posted 29-Aug-11 7:46am

Albert Holguin

Comments

Member 7766180 29-Aug-11 14:09pm

Thanks Albert. I tried this and nothing was returned...
printf("%c", data[i]);
////////////////////////////////////////////////////////////////////////////
string str ("data[i]");
string str2 ("content");
size_t found;
found=str.find(str2);
if (found!=string::npos)
cout << "first 'content' found at: " << int(found) << endl;
////////////////////////////////////////////////////////////////////////////

Albert Holguin 29-Aug-11 14:15pm

Your strings are initialized incorrectly. Google how to use std::string.

Member 7766180 29-Aug-11 14:37pm

Looking, but still confusing :))

Philippe Mori 29-Aug-11 20:37pm

Use a debugger to trace your code line by line and inspect the content of each variable. That way it will be very easy to see where your code does not work.

For the initialization of the string, well, it won't hurt to read the documentation, which is one keystroke away in Visual Studio (F1).
The definition should probably be string str(data, length);
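A minimal sketch of the std::string approach, assuming non-overlapping matches are what is wanted. countOccurrences is a hypothetical helper name, not something from the posted code. Because data is an unsigned char*, the sketch casts it before constructing the string; passing the length as well means the buffer does not have to be null-terminated.

```cpp
#include <cstddef>
#include <string>

// Counts non-overlapping occurrences of word in the first length bytes of data.
// countOccurrences is a hypothetical helper name used for illustration only.
std::size_t countOccurrences(const unsigned char* data, std::size_t length,
                             const std::string& word)
{
    if (word.empty())
        return 0;  // an empty word would otherwise match at every position

    // Copy the raw bytes into a string (not the literal text "data[i]").
    const std::string text(reinterpret_cast<const char*>(data), length);

    std::size_t count = 0;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        ++count;
        pos = text.find(word, pos + word.size());  // resume after this match
    }
    return count;
}
```

Call it once with the whole buffer, for example countOccurrences(data, length, "content"), and print the returned value once instead of searching inside the per-character loop.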

Solution 3

Where is the value that counts the words? What is the meaning of more? What should c do for you?
WARNING - this snippet only works because the word "file" doesn't overlap itself:

C++

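#include <stdio.h>
#include <string.h>
#include <tchar.h>   // _tmain, _TCHAR (Microsoft-specific)
#include <conio.h>   // needed for _gettch (Microsoft-specific)

// Returns how many times "file" occurs in data[0..length). The parameter more is unused.
int printRawData(const char *data, int length, int more)
{
  int      i,c;

  printf("     -------------One Data Begins-------------\n");
  for(c=i=0;i<length;)
  {
    // The first character is always consumed. A later mismatching character is
    // not consumed, so it can still start a new match (e.g. the second 'f' in "ffile").
    if('f'!=data[i++]) continue; if(length<=i) break;
    if('i'!=data[i])   continue; if(length<=++i) break;
    if('l'!=data[i])   continue; if(length<=++i) break;
    if('e'!=data[i])   continue;
    ++i;
    ++c;
  }
  return c;
}

int _tmain(int argc, _TCHAR* argv[])
{
  const char*    text = "afile bfile filec file fixle files and so on file fil";
  printf("%d",printRawData(text,(int)strlen(text),1));
  _gettch();
  return 0;
}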

Otherwise you should use a fast search like Boyer-Moore.
Regards.

Posted 29-Aug-11 7:56am

mbue

Comments

Member 7766180 29-Aug-11 14:42pm

Thanks mbue...I have this Boyer-Moore code, but how do I implement it?

# include <algorithm>

int BoyerMooreSearch(char* text, char* muster, int tlen, int mlen)
{
int i, j, skip[256];

for(i=0; i<256; ++i) skip[i]=mlen;
for(i=0; i<mlen; ++i) skip[muster[i]] = mlen-1-i;

for(i=j=mlen-1; j>=0; --i, --j)
while(text[i] != muster[j]) {
i += max(mlen-j,skip[text[i]]);
if(i >= tlen) return -1;
j = mlen-1;
}
return i;
}

mbue 29-Aug-11 14:57pm

This isn't a full Boyer-Moore, because the skip value initialisation isn't complete. Where did you get the code from?
Regards.

Member 7766180 29-Aug-11 16:40pm

The internet :)) I forgot where. Where can I get the full-blown version? Is it really this hard to count the occurrences of a word!? Thank you.

mbue 29-Aug-11 20:50pm

Perhaps you should search this site for:
* Boyer-Moore
* string search
Most search algorithms expect null-terminated strings. You can use them directly if your buffer is null-terminated; otherwise you have to copy the buffer range into a string first. In that case you can also use the other suggestions: strtok, strstr, or functions from a string class library.
Take a look at strtok_s on MSDN. There is an example there too.
Regards.
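For a length-delimited buffer that is not null-terminated, one option is a small search routine that takes explicit lengths. The sketch below is Boyer-Moore-Horspool, a simplified relative of Boyer-Moore that uses only a single shift table, so it is not the full algorithm asked about above. The name countHorspool and its parameters are made up for this example.

```cpp
#include <cstddef>

// Counts occurrences (overlapping ones included) of pattern[0..m) in text[0..n).
// Boyer-Moore-Horspool variant; countHorspool is a hypothetical name.
std::size_t countHorspool(const unsigned char* text, std::size_t n,
                          const unsigned char* pattern, std::size_t m)
{
    if (m == 0 || n < m)
        return 0;

    // shift[c]: how far the window may move when c is the text byte under the
    // pattern's last position. Bytes that do not occur in pattern[0..m-2] allow
    // a full shift of m.
    std::size_t shift[256];
    for (std::size_t c = 0; c < 256; ++c)
        shift[c] = m;
    for (std::size_t k = 0; k + 1 < m; ++k)
        shift[pattern[k]] = m - 1 - k;

    std::size_t count = 0;
    std::size_t pos = 0;  // text index aligned with pattern[0]
    while (pos + m <= n) {
        // Compare the window with the pattern from right to left.
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pattern[j - 1])
            --j;
        if (j == 0)
            ++count;
        pos += shift[text[pos + m - 1]];  // always at least 1
    }
    return count;
}
```

For a four-letter word the speed-up over a plain loop or std::string::find is unlikely to matter; the main benefit here is that both lengths are passed explicitly, so no terminator is required.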

Solution 4

In C++, use the find method of basic_string.
For sample code, see basic_string::find on MSDN.

Update: you can modify the sample code from "C++ Parse Split Delimited String" to search for a word in a string.

Posted 29-Aug-11 7:59am

Sergey Chepurin

Updated 29-Aug-11 8:17am

v2

Comments

Albert Holguin 29-Aug-11 14:18pm

This is just an extension of std::string, which I've already suggested.


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

André Kraak · Accepted Answer · 2011-08-29T07:08:00

Solution 1

Have a look at strtok_s; with it you can iterate through the words of your string.

[Edit] Changed strtok into strtok_s as strtok is deprecated and insecure.

In response to OP comment:

C++

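void printRawData( unsigned char* data, int length, int more )
{
	char* l_sToken = NULL;
	char l_sDelimiter[] = " ,\t\n";	// characters that separate words
	char* l_sNextToken = NULL;	// context pointer strtok_s uses between calls
	int l_nCount = 0;	// number of tokens equal to "Content"

	// Note: strtok_s writes terminators into data, and data must be null-terminated.
	l_sToken = strtok_s( (char*)data, l_sDelimiter, &l_sNextToken );
	while( l_sToken != NULL )
	{
		if( strcmp( l_sToken, "Content" ) == 0 )
			l_nCount++;

		l_sToken = strtok_s( NULL, l_sDelimiter, &l_sNextToken );
	}
	// l_nCount holds the final count here; print or return it after the loop.
}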

Posted 29-Aug-11 7:08am

André Kraak

Updated 29-Aug-11 12:45pm

v5

Comments

Member 7766180 29-Aug-11 13:47pm

Thank you. I tried this and I'm not getting a count. I switched to the word "Content" as it was easier to find:

void printRawData(unsigned char *data, int length, int more)
{
int i, c=0;
printf(" -------------One Data Begins-------------\n");
for (i=0; i<length; i++)
{
if ((data[i]>30 && data[i]<122) ||
(((data[i]==10) || (data[i]==13) || (data[i]==123) || (data[i]==125))
&& (more>0)))
{
printf("%c", data[i]);
///////////////////////////////////////////////////////////
char string[] = "Content";
char seps[] = " ,\t\n";
char *token;
//printf( "Tokens:\n" );
token = strtok(string,seps); // C4996
while( token != NULL )
{
printf( "This Is Token %s\n", token );
token = strtok( NULL,seps); // C4996
}
/////////////////////////////////////////////////////////
c+=1;
}
else
{
printf("[%i]", data[i]);
c+=3;
if (data[i]>9) c++;
if (data[i]>99) c++;
}
if (c>=47)
{
printf("\n");
c=0;
}
}
}

Member 7766180 29-Aug-11 14:59pm

Thank you, did this...but not getting the count.
l_sToken = strtok_s(reinterpret_cast<char *>(data), l_sDelimiter, &l_sNextToken );
while( l_sToken != NULL )
{
if( strcmp( l_sToken, "Content" ) == 0 )
l_nCount++;
l_sToken = strtok_s( NULL, l_sDelimiter, &l_sNextToken );
printf("Answer=%i",l_nCount);

André Kraak 29-Aug-11 18:08pm

I am sorry to hear that. I do not know what might be wrong. Did you try debugging the code to see what happens?

Member 7766180 29-Aug-11 18:40pm

It seems to be data, it's an unsigned char but in the code it's a char.

André Kraak 29-Aug-11 18:45pm

Try casting data by using (char*)data.
I have updated the solution.

Member 7766180 30-Aug-11 11:22am

Thank you Andre. It's working now, but is there a way for it to return the result just once? It seems to be checking each word and returning either a 0 or a 1. Can I get it to just return one result that being either Content(0) or Content(1 or whatever count it is)?
Thank You.
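One way to get a single result is to do the counting in a function that returns the total and to print that value once, after the loop has finished, rather than inside it. The sketch below is a portable stand-in for the strtok_s loop, not the accepted answer's code: it splits on the same delimiters (" ,\t\n"), but works on a copy, so it neither modifies the buffer nor needs a null terminator. The name countWord is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Splits the buffer on " ,\t\n" and counts the tokens that are exactly equal
// to word (case-sensitive). countWord is a hypothetical helper name.
int countWord(const unsigned char* data, std::size_t length, const std::string& word)
{
    const std::string text(reinterpret_cast<const char*>(data), length);
    const std::string delimiters = " ,\t\n";

    int count = 0;
    std::size_t start = text.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        std::size_t end = text.find_first_of(delimiters, start);
        if (end == std::string::npos)
            end = text.size();  // last token runs to the end of the buffer
        if (text.compare(start, end - start, word) == 0)
            ++count;
        start = text.find_first_not_of(delimiters, end);
    }
    return count;  // one result for the whole buffer
}
```

The caller can then print it a single time, for example with printf("Content(%d)\n", countWord(data, length, "Content")).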