
Nata1 .NET Search Engine solution

28 Jun 2004

Introduction

Visit http://www.nata1.com/ to contribute to the project; we are currently looking for developers in many roles. Nata1 is the most powerful .NET search engine solution on the market, and it's free for non-commercial use only. Nata1 lets you index large sites, run powerful queries, and configure advanced normalization and search customization features. It includes dozens of advanced server controls that let you drag and drop an advanced search solution into place in minutes, and it lets you switch between Google, Nata1, and Index Server without writing any custom code. This first article demonstrates the Nata1 ASP.NET controls; the next article will discuss adding search capabilities to the TaskVision application, where we will build a site health monitor, check site ranking in Google, and create tasks accordingly.

Details

Four years ago, using Microsoft Index Server to develop a site search engine was a lot of fun. At the time, the flexibility to control noise words and which parts of a web site got indexed was interesting.

Imagine you wanted to find “all good surfing places” in Costa Rica, but the site copy uses surf, not surfing, and words like all and places do not appear in the copy and are not relevant anyway. You can use Index Server, but it's much better to have your own control. Imagine you needed to collect information on what people are searching for on your site, or you wanted to weight pages, exclude directories, and so on.
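To make the surfing example concrete, here is a minimal sketch of the kind of query clean-up a home-grown engine can do: split the query into terms, drop noise words, and map each remaining term to a canonical form. The module name, the noise-word list, and the variant table are all hypothetical illustrations, not Nata1's actual API or data.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical sketch, not Nata1's API: tokenize a query, drop noise words,
' and map each remaining term to a canonical form so that "surfing" can
' match pages whose copy only contains "surf".
Public Module QueryNormalizationSketch

    ' Assumed noise-word list; a real site would manage this in its admin tool.
    Private ReadOnly NoiseWords As New HashSet(Of String) From {"all", "places", "in", "the", "a"}

    ' Assumed normalization table: variant -> canonical term.
    Private ReadOnly CanonicalForms As New Dictionary(Of String, String) From {
        {"surfing", "surf"},
        {"surfer", "surf"}
    }

    Public Function NormalizeQuery(query As String) As List(Of String)
        Dim separators As Char() = {" "c, ","c, "."c}
        Dim terms As New List(Of String)
        For Each token As String In query.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            Dim word As String = token.ToLowerInvariant()
            ' Noise words carry no meaning for matching, so skip them.
            If NoiseWords.Contains(word) Then Continue For
            ' Replace known variants with their canonical term.
            Dim canonical As String = Nothing
            If CanonicalForms.TryGetValue(word, canonical) Then word = canonical
            terms.Add(word)
        Next
        Return terms
    End Function

End Module
```

With these assumed tables, the query “all good surfing places in Costa Rica” reduces to the terms good, surf, costa, and rica, which can then be matched against the indexed copy.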

When I got my first experience with .NET during Beta II of Visual Studio, the possibilities jumped out at me and I began working on Nata1.

Using Nata1, you can drag and drop UI search engine components such as hit results and relevance, and switch between Google, Nata1, or Index Server without writing any custom code.

This article covers the basics needed to build a simple search page. Here is an example: http://www.nata1.com/Photos/Project+Photos/324.aspx

Image 1

For developers who are more interested in customizing the controls and developing more advanced features, this is a good starting point. Computer science students studying algorithms and data structures can also get a good grasp of binary search trees (BSTs) and implement their own data structures. A series of articles by Scott Mitchell is an excellent starting point for understanding and analyzing the differences between data structures such as skip lists, and the properties of balanced trees. Nata1 was first implemented using BSTs on a remote machine; although you can use SQL Server, with little work you can also build a hybrid that uses both SQL Server and a BST.
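For students who want a starting point, the following is a minimal sketch of a word index built on an unbalanced binary search tree: each node is keyed by a word and holds the URLs of the pages that contain it. The class names are hypothetical and this is not Nata1's implementation; lookups cost O(height), which is O(log n) on average for random insertion order but O(n) if words arrive sorted, which is exactly the problem the balanced trees mentioned above solve.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical sketch, not Nata1's implementation: a node in a BST keyed
' by word, holding the URLs of the pages that contain that word.
Public Class WordNode
    Public ReadOnly Word As String
    Public ReadOnly Urls As New List(Of String)
    Public Left As WordNode
    Public Right As WordNode

    Public Sub New(word As String)
        Me.Word = word
    End Sub
End Class

' Unbalanced BST word index; ordinal comparison keeps ordering culture-independent.
Public Class WordIndexBst
    Private _root As WordNode

    ' Records that the page at url contains word (duplicates are ignored).
    Public Sub Add(word As String, url As String)
        Dim node As WordNode = FindOrInsert(word)
        If Not node.Urls.Contains(url) Then node.Urls.Add(url)
    End Sub

    ' Returns the URLs indexed under word, or an empty list if none.
    Public Function Find(word As String) As IList(Of String)
        Dim node As WordNode = _root
        While node IsNot Nothing
            Dim cmp As Integer = String.CompareOrdinal(word, node.Word)
            If cmp = 0 Then Return node.Urls.AsReadOnly()
            node = If(cmp < 0, node.Left, node.Right)
        End While
        Return New List(Of String)().AsReadOnly()
    End Function

    ' Walks down the tree, creating a leaf node if the word is new.
    Private Function FindOrInsert(word As String) As WordNode
        If _root Is Nothing Then
            _root = New WordNode(word)
            Return _root
        End If
        Dim node As WordNode = _root
        Dim cmp As Integer = String.CompareOrdinal(word, node.Word)
        While cmp <> 0
            Dim child As WordNode = If(cmp < 0, node.Left, node.Right)
            If child Is Nothing Then
                child = New WordNode(word)
                If cmp < 0 Then node.Left = child Else node.Right = child
            End If
            node = child
            cmp = String.CompareOrdinal(word, node.Word)
        End While
        Return node
    End Function
End Class
```

Swapping this tree for a red-black tree or a skip list changes only the insert and lookup internals; the word-to-URLs contract stays the same.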

This article will show you how to get up and running with Nata1, but future articles by myself and others will demonstrate developing core search engine components and controls.

Step 1: Add some configuration code to the web.config file. Configuration is taken from the database first; if a setting is not set in the admin tool, Nata1 looks in web.config, and it uses defaults if nothing is found there either. For your search engine, using web.config is fine, but I've found storing this information in the database preferable, and many included admin controls let you alter everything from normalization rules to spidering settings.

XML
<Nata1>
  <binPath>
    <!-- If you want to use BSTs and your site runs at a web host, the BST
         has to be serialized to disk, because ASP.NET restarts frequently. -->
    <add key="filePath" value="C:\siteName\searchEngine\" />
  </binPath>
  <sites>
    <!-- Set "site" if you are indexing just one site. -->
    <add key="site" value="[put your site URL here]" />
    <add key="defaultPage" value="index.aspx" />
  </sites>
  <log>
    <add key="filePath" value="c:\eventLog.txt" />
  </log>
  <database>
    <add key="connectionString" value="[put your connection string here]" />
  </database>
  <indexing>
    <add key="hour" value="4" />
    <add key="intervalType" value="daily" />

    <!--
    <add key="interval" value="hourBased" />
    <add key="intervalHours" value="2" />
    -->
  </indexing>
  <indexRequestTimeOut>
    <add key="seconds" value="5" />
  </indexRequestTimeOut>
  <indexService>
    <add key="provider" value="IndexServer" />
  </indexService>
  <google>
    <add key="licenseKey" value="[put your google license key here]" />
  </google>
</Nata1>
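The indexing section supports a daily plan (run at a given hour) and, in the commented-out alternative, an hour-based plan. Here is a minimal sketch of how those two settings could be turned into the next run time. It assumes that hour is the hour of day (0 to 23) for daily runs and that intervalHours counts from the previous run; the module and function names are hypothetical, since the article does not show Nata1's scheduler.

```vbnet
Imports System

' Hypothetical sketch of how the <indexing> settings above might be
' interpreted; Nata1's real scheduler is not shown in this article.
' "daily": run once a day at atHour (0-23). "hourBased": run every
' intervalHours hours after the previous run.
Public Module IndexScheduleSketch

    Public Function NextIndexRun(current As DateTime, intervalType As String,
                                 atHour As Integer, intervalHours As Integer,
                                 lastRun As DateTime) As DateTime
        Select Case intervalType
            Case "daily"
                Dim todayRun As DateTime = current.Date.AddHours(atHour)
                ' If today's slot has already passed, schedule tomorrow's.
                If current < todayRun Then Return todayRun
                Return todayRun.AddDays(1)
            Case "hourBased"
                Dim nextRun As DateTime = lastRun.AddHours(intervalHours)
                ' A missed run (e.g. the app was down) is scheduled immediately.
                Return If(nextRun > current, nextRun, current)
            Case Else
                Throw New ArgumentException("Unknown interval type: " & intervalType)
        End Select
    End Function

End Module
```

With hour set to 4 and intervalType set to daily, a check at 3:00 schedules indexing for 4:00 the same day, while a check at 5:00 schedules it for 4:00 the next day.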

If you want to publish exceptions through the Microsoft Exception Management Application Block, add this as well:

XML
<exceptionManagement mode="on">
<publisher assembly="Nata1" 
type="Nata1.Engine.Exceptions.ExceptionPublisher" exclude="*" 
include="Nata1.Engine.Exceptions.DataStructureException, Nata1; 
Nata1.Engine.Exceptions.QueryException, Nata1; 
Nata1.Engine.Exceptions.UIException, Nata1" operatorMail="sedgewick@nata1.com" 
filename="c:\Inetpub\wwwroot\SearchEngine\Nata1ErrorLog.txt" /> 
</exceptionManagement>

You'll also need to declare these sections in the configSections element of web.config:

XML
<configuration>
  <configSections>
    <sectionGroup name="Nata1">
      <section name="binPath" type="Nata1.Nata1SectionHandler, Nata1" />
      <section name="sites" type="Nata1.Nata1SectionHandler, Nata1" />
      <section name="log" type="Nata1.Nata1SectionHandler, Nata1" />
      <section name="database" type="Nata1.Nata1SectionHandler, Nata1" />
      <section name="preferedIndexTime" type="Nata1.Nata1SectionHandler, Nata1" />
      <section name="indexRequestTimeOut" type="Nata1.Nata1SectionHandler, Nata1" />
      <section name="indexing" type="Nata1.Nata1SectionHandler, Nata1" />
      <section name="indexService" type="Nata1.Nata1SectionHandler, Nata1" />
      <section name="google" type="Nata1.Nata1SectionHandler, Nata1" />
    </sectionGroup>

    <section name="exceptionManagement"
      type="Microsoft.ApplicationBlocks.ExceptionManagement.ExceptionManagerSectionHandler, Microsoft.ApplicationBlocks.ExceptionManagement" />
  </configSections>
  <!-- The <Nata1> and <exceptionManagement> sections shown above go here. -->
</configuration>
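Step 1 describes a three-level lookup: a setting comes from the database if the admin tool has set it, otherwise from web.config, otherwise from a built-in default. Here is a minimal sketch of that fallback chain, assuming each source can be loaded into a dictionary; the class name and the section/key naming used in the usage note are hypothetical, not Nata1's classes.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical sketch of the Step 1 lookup order: sources are consulted in
' priority order (e.g. database, then web.config), and the first non-empty
' value wins; otherwise the built-in default is used.
Public Class LayeredSettings
    Private ReadOnly _sources As New List(Of IDictionary(Of String, String))
    Private ReadOnly _defaults As IDictionary(Of String, String)

    Public Sub New(defaults As IDictionary(Of String, String),
                   ParamArray sources() As IDictionary(Of String, String))
        _defaults = defaults
        _sources.AddRange(sources)
    End Sub

    ' Returns the highest-priority value for key, or Nothing if no source has it.
    Public Function GetValue(key As String) As String
        Dim value As String = Nothing
        For Each source As IDictionary(Of String, String) In _sources
            If source.TryGetValue(key, value) AndAlso Not String.IsNullOrEmpty(value) Then
                Return value
            End If
        Next
        If _defaults.TryGetValue(key, value) Then Return value
        Return Nothing
    End Function
End Class
```

For example, constructing it as `New LayeredSettings(defaults, database, webConfig)` makes a database value for a hypothetical key such as "indexing/hour" override the web.config value of 4, while keys missing from both fall through to the defaults.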

Step 2: Next, run the database install scripts for SQL Server. The database isn't very complex, so you can easily port it to MySQL. If you don't have a database, you can still use an in-memory binary search tree, but this isn't recommended: you always want to host your data structures remotely.
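If you do keep the index in memory, the binPath comment in Step 1 points out that it must be written to disk, because ASP.NET restarts the application frequently and an in-memory structure would be lost each time. Here is a minimal sketch of saving and reloading a word-to-URLs index as a tab-separated text file. The file format and module name are assumptions for illustration, not Nata1's on-disk format; a dictionary stands in for the tree to keep the example short.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

' Hypothetical sketch: persist an in-memory word index so it survives
' ASP.NET application restarts. Assumed format, one line per word:
' word<TAB>url1<TAB>url2...
Public Module IndexPersistenceSketch

    Public Sub Save(index As IDictionary(Of String, List(Of String)), filePath As String)
        Using writer As New StreamWriter(filePath, False, Encoding.UTF8)
            For Each entry As KeyValuePair(Of String, List(Of String)) In index
                writer.WriteLine(entry.Key & vbTab & String.Join(vbTab, entry.Value.ToArray()))
            Next
        End Using
    End Sub

    ' Reloads the index; a missing file simply yields an empty index.
    Public Function Load(filePath As String) As Dictionary(Of String, List(Of String))
        Dim index As New Dictionary(Of String, List(Of String))
        If Not File.Exists(filePath) Then Return index
        For Each line As String In File.ReadAllLines(filePath, Encoding.UTF8)
            Dim parts As String() = line.Split(ControlChars.Tab)
            ' Skip lines without at least one URL.
            If parts.Length < 2 Then Continue For
            Dim urls As New List(Of String)
            For i As Integer = 1 To parts.Length - 1
                urls.Add(parts(i))
            Next
            index(parts(0)) = urls
        Next
        Return index
    End Function

End Module
```

A natural place to call Load is application startup, with the path taken from the binPath filePath setting, and Save after each indexing run.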

Step 3: Add Nata1.dll to your toolbox. Right-click the toolbox, choose “Add/Remove Items”, click Browse, and select Nata1.dll. The Nata1 controls are now added to your toolbox.

Image 2

There are dozens of controls. Some are container controls, like ResultsRepeater, and others are for individual items: all the ones with a smiley icon, like HitUrl and HitWords, are placed in the Item or Alternating Item template. Controls like QueryTime sit in the header template. You can get creative with your toolbox icons; I've included some neat ones, like Homestar Runner icons. Some controls are specific to a search provider: Google has many controls, such as spelling suggestions, but Index Server has only a couple, so make sure the provider supports the controls you use.

Step 4: We need a form to get from a search box to the search results page. Drag and drop “SearchForm” (the control with the ducky icon) onto any .ascx or .aspx page in your site.

Image 3

To use an image, set SearchButtonText to an image URL (I know, not the most elegant approach), or enter text instead; either way, make sure to set ButtonType as well as SearchPageUrl. As you can see, there is a bug in the designer: the image isn't updating.

Step 5: Now we'll build the search results page. Drag and drop “Search Results Repeater” (the one with the fairy icon) onto an .ascx or .aspx page.

Image 4

The first of the two most important properties is “Query Provider”: here you select Google, Nata1, Index Server, Rss, or ASP.NET Forums. The last two are still in development; if anyone wants to develop them, be my guest.

Image 5
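Switching providers without custom code, and the earlier warning that not every provider supports every control, both suggest a provider abstraction with capability checks. Here is a minimal sketch of that pattern: controls talk to an interface, a factory picks the implementation from a configuration key (compare the indexService provider setting in Step 1), and a control can ask whether a feature is supported before rendering. Every name here is hypothetical, not Nata1's API, and the feature sets only reflect what the article says about Google and Index Server.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical sketch of a provider abstraction with capability checks.
' None of these names are Nata1's API.
Public Enum SearchFeature
    Hits
    SpellingSuggestion
End Enum

Public Interface ISearchProvider
    ReadOnly Property Name As String
    Function Supports(feature As SearchFeature) As Boolean
End Interface

' Stand-in provider that only reports its name and supported features.
Public Class StubProvider
    Implements ISearchProvider

    Private ReadOnly _name As String
    Private ReadOnly _features As HashSet(Of SearchFeature)

    Public Sub New(name As String, ParamArray features() As SearchFeature)
        _name = name
        _features = New HashSet(Of SearchFeature)(features)
    End Sub

    Public ReadOnly Property Name As String Implements ISearchProvider.Name
        Get
            Return _name
        End Get
    End Property

    Public Function Supports(feature As SearchFeature) As Boolean Implements ISearchProvider.Supports
        Return _features.Contains(feature)
    End Function
End Class

Public Module ProviderFactorySketch
    ' Maps a configuration value (assumed keys) to a provider instance.
    Public Function Create(providerKey As String) As ISearchProvider
        Select Case providerKey
            Case "Google"
                Return New StubProvider("Google", SearchFeature.Hits, SearchFeature.SpellingSuggestion)
            Case "IndexServer"
                Return New StubProvider("IndexServer", SearchFeature.Hits)
            Case Else
                Throw New ArgumentException("Unknown provider: " & providerKey)
        End Select
    End Function
End Module
```

A spelling-suggestion control built this way could simply hide itself when `Supports(SearchFeature.SpellingSuggestion)` returns False, instead of failing at run time.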

The other property is SearchQueryTemplate mode; here you select simple or advanced.

Step 6: Right click the template, choose the template you want to edit, and start dragging and dropping controls.

Image 6

Here I dragged the SearchQuery and TotalHits controls onto the header template and put an ad banner there too; you can rotate the banner based on the search keyword if you want.

There are several other templates you'll need to set, such as NoResults. There's also a template for a search form, where you can specify which search form controls to place; perhaps you want an advanced search form at the top.

Step 7: Make sure you place this code in your Global.asax! When your web app restarts (I usually just add a single space to web.config to force a restart), Nata1 will begin indexing and follow the index plan you have specified in web.config or in the database.

VB.NET
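Sub Application_Start(ByVal sender As Object, ByVal e As EventArgs)
    ' Start Nata1: indexing begins and follows the index plan
    ' configured in web.config or in the database.
    Nata1.Controller.Start()
End Sub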

There are numerous administration controls as well. You can manually index your site, manage noise words, see all the words on the site, and manage normalization: which words to normalize or not, plus special rules (e.g., running, ran, and run are the same word).
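The running/ran/run rule mixes two kinds of normalization: irregular forms that need an explicit table, and regular forms that a suffix rule can handle. Here is a minimal sketch combining an exception table, a "do not normalize" list, and a tiny suffix stripper whose consonant-undoubling step borrows the double-consonant rule from Porter's stemmer. It is a swapped-in illustration, not Nata1's algorithm, and all tables and names are hypothetical; a production engine would use a complete stemmer.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical sketch of admin-style normalization rules, not Nata1's
' algorithm: irregular forms come from a table ("ran" -> "run"), protected
' words are left alone, and regular forms go through a tiny suffix stripper.
Public Module NormalizationRulesSketch

    ' Irregular forms an admin would enter by hand.
    Private ReadOnly Irregular As New Dictionary(Of String, String) From {
        {"ran", "run"}
    }

    ' Words marked "do not normalize" (otherwise "news" would become "new").
    Private ReadOnly KeepAsIs As New HashSet(Of String) From {"news"}

    ' Checked in order; the first matching suffix is stripped.
    Private ReadOnly Suffixes As String() = {"ing", "ed", "s"}

    Public Function Normalize(word As String) As String
        Dim w As String = word.ToLowerInvariant()
        If KeepAsIs.Contains(w) Then Return w
        Dim mapped As String = Nothing
        If Irregular.TryGetValue(w, mapped) Then Return mapped
        For Each suffix As String In Suffixes
            ' Only strip when at least three letters remain.
            If w.EndsWith(suffix, StringComparison.Ordinal) AndAlso w.Length - suffix.Length >= 3 Then
                Dim stem As String = w.Substring(0, w.Length - suffix.Length)
                ' Undo consonant doubling ("runn" -> "run") except for l, s, z ("fall").
                If stem.Length >= 4 AndAlso stem(stem.Length - 1) = stem(stem.Length - 2) AndAlso
                   "aeiouylsz".IndexOf(stem(stem.Length - 1)) < 0 Then
                    stem = stem.Substring(0, stem.Length - 1)
                End If
                Return stem
            End If
        Next
        Return w
    End Function

End Module
```

With these tables, running, ran, and run all normalize to run, surfing becomes surf, and the protected word news is left untouched.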

One important control that is left as an exercise is logging search words: what are people searching for on the site? How about some information about them?
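As a starting point for that exercise, here is a minimal sketch of an in-memory search-term counter that reports the most frequent terms. Because many ASP.NET requests can log at the same time, updates are guarded with SyncLock. The class and method names are hypothetical; a real implementation would probably persist the counts to the Nata1 database rather than keep them in memory.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical sketch for the "log search words" exercise: counts how often
' each lower-cased term is searched. Names are illustrative, not Nata1's API.
Public Class SearchTermLog
    Private ReadOnly _counts As New Dictionary(Of String, Integer)
    Private ReadOnly _gate As New Object()

    ' Records every term of a query; safe to call from concurrent requests.
    Public Sub Record(query As String)
        Dim separators As Char() = {" "c}
        SyncLock _gate
            For Each token As String In query.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                Dim term As String = token.ToLowerInvariant()
                Dim count As Integer = 0
                _counts.TryGetValue(term, count)
                _counts(term) = count + 1
            Next
        End SyncLock
    End Sub

    ' Returns the n most searched terms; ties are ordered alphabetically.
    Public Function Top(n As Integer) As List(Of KeyValuePair(Of String, Integer))
        SyncLock _gate
            Return _counts.OrderByDescending(Function(kv) kv.Value).
                ThenBy(Function(kv) kv.Key, StringComparer.Ordinal).
                Take(n).ToList()
        End SyncLock
    End Function
End Class
```

Calling Record from the results page for each query, and binding Top(10) to a repeater on an admin page, would give a simple "what are people searching for" report.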

Here you have a powerful search engine you can put together in minutes, but the future of Nata1 is up to the community: I would like to see a DNN implementation, a CSK implementation, and I am currently working on a TaskVision implementation.

Conclusion

I hope you enjoyed my article. Let me know if you have any problems downloading the code or have any comments on the article. We are looking for contributors, so if you want to write new data structures, new controls, or new providers, integrate with DNN, CSK, or IBuySpy, or have other ideas, we'd love to hear them! Sedgewick@Nata1.com

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
United States

Comments and Discussions

 
Search engine Voice to text (Ariel Cabuga, 4-Oct-17 17:14)
My vote of 5 (LynxDev, 29-Dec-11 3:32)
search engine (ngovanchan, 22-Jan-05 23:40)
Speed and Agility Trials (Member 1200321, 30-Jun-04 6:41)
Advice for writing a better article (Member 1200321, 29-Jun-04 13:28)
Hmmm (Stoyan Damov, 29-Jun-04 11:35)
Re: Hmmm (Member 1200321, 29-Jun-04 11:44)
Re: Hmmm (Anonymous, 29-Jun-04 12:59)
Re: Hmmm (suss, 29-Jun-04 13:18)
Re: Hmmm (Stoyan Damov, 29-Jun-04 18:22)
Maybe I'm blind, but I looked at almost every source file and couldn't find any full-text query processing. You do index words, but I wouldn't really call the SQL tables a full-text index. Can you tell me what the complexity of searching for an arbitrary word is with your search engine (and your search provider)? How much space do I need to index a 100 MB web site? What if I want to index something that's not a web site? I also don't agree that an associative container implemented in terms of a SQL table is the right way to do word stemming. Now here's what I think: explain the search engine details; we are developers and want to know. We really don't care how to configure the search engine and how to drop controls on a form. Give us the guts, and if you get us interested, we'll ask you how to configure the engine. And please, remove all dead code and put some comments here and there.

Just my 2 1/2 stotinki

Cheers,
Stoyan

Re: Hmmm (Member 1200321, 30-Jun-04 4:53)
Re: Hmmm (LFRDG, 30-Jun-04 4:50)
Re: Hmmm (Member 1200321, 30-Jun-04 5:45)
Re: Hmmm (Stoyan Damov, 30-Jun-04 6:01)
Re: Hmmm (Member 1200321, 30-Jun-04 6:26)
Re: Hmmm (Stoyan Damov, 30-Jun-04 7:39)
Re: Hmmm (Member 1200321, 30-Jun-04 8:02)
Re: Hmmm (LynxDev, 29-Dec-11 3:28)

