Quote:
I need to search one directory (c:\K_txt) of almost 3,000 .txt files
Searching the directory on the fly is a bad idea: your users would have to wait every time they change the query. A better approach is to read your files once and split their contents into tokens (words, in English). This tells your algorithm which file contains which words. You can extend this to whole sentences, say by splitting on periods.
You can then search for words in your own data structure: a tree, a trie, a hash table, you pick. Your users can quickly check which words appear in which files, because your application only has to consult that structure instead of traversing the file system again.
The file system is traversed only once. Your structure holds the data in an ordered, search-friendly way.
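As a rough illustration of the idea above, here is a minimal sketch in Python (the question does not say which language is in use, so treat it as pseudocode for your own framework). The names `tokenize`, `read_directory`, and `build_index` are made up for this example, and the tokenizer is a deliberately simple assumption: lowercase letters, digits, and apostrophes only.

```python
import re
from pathlib import Path

# Assumption: a "word" is a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Split text into lowercase words."""
    return WORD_RE.findall(text.lower())


def read_directory(directory):
    """Read every .txt file in the directory once; return {file name: text}."""
    documents = {}
    for path in sorted(Path(directory).glob("*.txt")):
        documents[path.name] = path.read_text(encoding="utf-8", errors="ignore")
    return documents


def build_index(documents):
    """Map each word to the set of file names that contain it."""
    index = {}
    for name, text in documents.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(name)
    return index


# Hypothetical usage, with the directory from the question:
#   index = build_index(read_directory(r"c:\K_txt"))
#   index.get("file", set())   # names of the files containing "file"
```

A plain dictionary of sets is used here only because it is the shortest thing to write; a trie or a sorted tree would serve the same purpose if you also need prefix or range lookups.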
Quote:
either a word or exact phrase that the user enters,
Exactly my point. What happens when the user wants to search for "file" but types "fole"? Your algorithm would scan the whole directory for "fole", and then scan it all over again for "file" once the user corrects the typo. That is not a good approach, and you need an alternative. One such approach is MapReduce: you read the files one by one, counting the words that occur and their number of occurrences. You can then feed the result into your own structure and query that, which is much better than the approach you are considering.
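To make the counting step concrete, here is a small sketch in plain Python that follows the map/reduce shape on a single machine. It is not Hadoop, and the function names `map_count`, `reduce_counts`, and `suggest` are invented for this example. The `suggest` helper is an extra assumption of mine: once the vocabulary is in memory, a typo such as "fole" can be matched against known words without touching the disk.

```python
import difflib
import re
from collections import Counter
from functools import reduce


def map_count(text):
    """Map step: count the words of one file."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def reduce_counts(counters):
    """Reduce step: merge the per-file counts into overall totals."""
    return reduce(lambda left, right: left + right, counters, Counter())


def suggest(word, vocabulary, n=3):
    """Return known words that look similar to a possibly mistyped word."""
    return difflib.get_close_matches(word.lower(), list(vocabulary), n=n, cutoff=0.6)


# Hypothetical usage with two in-memory "files":
#   totals = reduce_counts(map_count(t) for t in ["the file is here", "another file"])
#   totals["file"]           # number of occurrences across both texts
#   suggest("fole", totals)  # candidate corrections from the known vocabulary
```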
See the following links for more details:
mapreduce - Hadoop searching words from one file in another file - Stack Overflow
algorithms - Hadoop MapReduce Word Counting Example - Computer Science Stack Exchange (you can call it word finding)
Quote:
then load up a listbox with the names of those txt files that contain the word or phrase
Your structure will return everything your users need: it knows which files contain "file" and returns them in a list, or in whatever form you have specified.
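A query against such a structure could look roughly like the sketch below, again in Python and under my own assumptions: `build`, `files_with_word`, and `files_with_phrase` are hypothetical names, and the exact-phrase check simply compares consecutive tokens, so punctuation between words is ignored. The returned lists are what you would load into the listbox.

```python
import re


def build(documents):
    """Return (index, tokens_by_file) for a {file name: text} mapping."""
    index, tokens_by_file = {}, {}
    for name, text in documents.items():
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        tokens_by_file[name] = tokens
        for word in tokens:
            index.setdefault(word, set()).add(name)
    return index, tokens_by_file


def files_with_word(index, word):
    """Sorted names of the files that contain the word."""
    return sorted(index.get(word.lower(), set()))


def files_with_phrase(index, tokens_by_file, phrase):
    """Sorted names of the files that contain the words of the phrase in order."""
    words = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words:
        return []
    # Only files containing every word can contain the phrase.
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    hits = []
    for name in sorted(candidates):
        tokens = tokens_by_file[name]
        n = len(words)
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.append(name)
    return hits
```

The phrase lookup narrows the search to candidate files through the index first, so only a handful of token lists are scanned, and nothing is read from disk at query time.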
Quote:
a text file with each instance of the word or phrase highlighted
That depends on the app framework, so I will leave that part to you. :-)
Good luck.