
WinSpider - The Windows WebCrawler Application

3 Feb 2005
A web leeching utility developed in C#. WinSpider is a front end that uses wget as its back end for the "crawling" operation. It implements a simple, parallel method of interprocess communication.

 

Sample Image - cp_ws.gif

Introduction

This application can be used to leech the contents of a URL and, optionally, its subdirectories.

It works behind a firewall through a proxy server and can minimize to the system tray. Progress is shown in the (yellow) status window.

URL input uses a combo box that keeps a history of previously entered addresses.

The back end of this utility is wget, an open-source project. You can get the latest version from http://www.wget.org
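Because the back end is simply wget.exe running as a child process, it could also be started directly with ProcessStartInfo instead of through a generated batch file, which avoids cmd.exe's handling of `%` and `&`. The sketch below is not the author's code: WgetLauncher, Quote, BuildArguments and CreateStartInfo are hypothetical names, the switches mirror the ones StartLeach uses, the proxy switches are left out for brevity, and Quote assumes wget parses its command line with the standard Microsoft C runtime rules.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

// Hypothetical helper: starts wget.exe directly instead of through a .bat file.
public static class WgetLauncher
{
    // Quote one argument following the Microsoft C runtime command-line parsing rules.
    public static string Quote(string arg)
    {
        if (arg.Length > 0 && arg.IndexOfAny(new[] { ' ', '\t', '"' }) < 0)
            return arg; // nothing to escape

        var sb = new StringBuilder("\"");
        int backslashes = 0;
        foreach (char c in arg)
        {
            if (c == '\\') { backslashes++; continue; }
            // Backslashes are literal unless they precede a quote.
            sb.Append('\\', c == '"' ? backslashes * 2 + 1 : backslashes);
            sb.Append(c);
            backslashes = 0;
        }
        // Double trailing backslashes so they do not escape the closing quote.
        sb.Append('\\', backslashes * 2);
        sb.Append('"');
        return sb.ToString();
    }

    // Same switches as StartLeach: log to out.cap, timestamping, passive FTP,
    // forced directories and output prefix, then the optional recursion switches.
    public static List<string> BuildArguments(string url, string outDir,
        bool recursive, bool childOnly, bool oneLevel)
    {
        var args = new List<string> { "-o", "out.cap", "-N", "--passive-ftp", "-x", "-P", outDir };
        if (recursive)
        {
            args.Add("-r");
            if (childOnly)
            {
                args.Add("-np");
                args.Add("-l");
                args.Add(oneLevel ? "1" : "0"); // 0 means unlimited depth
            }
        }
        args.Add(url);
        return args;
    }

    public static ProcessStartInfo CreateStartInfo(string wgetPath, IEnumerable<string> args)
    {
        var quoted = new List<string>();
        foreach (string a in args) quoted.Add(Quote(a));
        return new ProcessStartInfo(wgetPath, string.Join(" ", quoted.ToArray()))
        {
            UseShellExecute = false, // run wget.exe itself, no shell or batch file
            CreateNoWindow = true    // honoured only when UseShellExecute is false
        };
    }
}
```

Passing the result to Process.Start would run wget without cmd.exe in between, so `%20` in a URL needs no special treatment and an output folder containing spaces stays a single argument.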

 

Open Issues

Someone commented that the leeched directories are deleted from the current folder. That folder is a temporary directory: the files are first leeched into it (you can watch them appear there while a leech is running) and are then copied to the specified output directory.

If you want to keep both copies, remove the section of code that deletes the temporary directory. Keeping it also makes later leeches of the same URL faster, because wget's timestamping (-N) then only downloads files that have changed.
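The copy-then-delete step is not part of the code listed in this article. A minimal sketch of what it might look like follows, assuming the leech first lands in a temporary folder; LeachFolders, CopyDirectory and PublishLeach are hypothetical names. Passing keepTemp = true corresponds to removing the directory-removing section, as suggested above.

```csharp
using System;
using System.IO;

// Hypothetical helpers for moving a finished leech out of the temporary folder.
public static class LeachFolders
{
    // Recursively copy every file under sourceDir into destDir, overwriting existing files.
    public static void CopyDirectory(string sourceDir, string destDir)
    {
        Directory.CreateDirectory(destDir); // no-op if it already exists
        foreach (string file in Directory.GetFiles(sourceDir))
        {
            string target = Path.Combine(destDir, Path.GetFileName(file));
            File.Copy(file, target, true);
        }
        foreach (string sub in Directory.GetDirectories(sourceDir))
        {
            CopyDirectory(sub, Path.Combine(destDir, Path.GetFileName(sub)));
        }
    }

    // Copy the leeched tree to its destination; delete the temporary copy only when asked.
    public static void PublishLeach(string tempDir, string destDir, bool keepTemp)
    {
        CopyDirectory(tempDir, destDir);
        if (!keepTemp)
            Directory.Delete(tempDir, true);
    }
}
```

Keeping the temporary tree lets the next wget run with -N compare timestamps against the files it already has instead of downloading everything again.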

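The leech code is shown below.

It uses a parallel way of interprocess communication ;-): wget runs as a separate, hidden process and writes its log to out.cap, while the form enables the cTimerUpdate timer to update the status window.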

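  void StartLeach()
  {
   // Reject an empty address or one that is only a scheme prefix
   string strUrl = urlComboAddress.Text.Trim();
   if(strUrl == ""
    || strUrl.ToLower() == "http://"
    || strUrl.ToLower() == "ftp://")
   {
    MessageBox.Show("Please specify an http:// or ftp:// site location.", "Error");
    return;
   }

   // Every proxy field is required when the proxy is enabled
   if(cCheckEnableProxy.Checked)
   {
    if( cTextServer.Text == ""
     || cTextUser.Text == ""
     || cTextPass.Text == ""
     || cTextPort.Text == "" )
    {
     MessageBox.Show("Please specify correct proxy server, port, username and password.", "Error");
     return;
    }
   }

   if(! Directory.Exists(cOutFolder.Text))
   {
    MessageBox.Show("Directory does not exist.", "Error");
    return;
   }

   MenuStart.Enabled = false;
   MenuCancel.Enabled = true;
   strOutPath = cOutFolder.Text;
   cTextOut.Clear();

   String strBatch = "wget.exe";
   if(cCheckEnableProxy.Checked)
   {
    strBatch += " --proxy-user=" + cTextUser.Text
     + " --proxy-pass=" + cTextPass.Text
     + " -e http_proxy=" + cTextServer.Text
     + ":" + cTextPort.Text;
   }

   if(cCheckRecursive.Checked)
   {
    strBatch += " -r";      // recursive retrieval

    if(cCheckChildOnly.Checked)
    {
     strBatch += " -np";    // never ascend to the parent directory
     if(cCheckSiblings.Checked)
     {
      strBatch += " -l 1";  // one level
     }
     else
     {
      strBatch += " -l 0";  // infinite levels
     }
    }
   }

   // -o out.cap : write wget's log to out.cap instead of the console
   // -N         : timestamping, only download files newer than the local copy
   // -x         : always create the directory hierarchy
   // -P         : directory prefix where the files are saved
   // Quotes keep paths and URLs with spaces intact; a trailing backslash
   // is doubled so it does not escape the closing quote.
   String strPrefix = strOutPath.EndsWith("\\") ? strOutPath + "\\" : strOutPath;
   strBatch += " -o out.cap -N --passive-ftp -x -P \"" + strPrefix + "\""
    + " \"" + strUrl + "\"";

   String filename = "wget.bat";
   if(File.Exists(filename))
   {
    File.Delete(filename);
   }

   // cmd.exe expands % in batch files; %% yields a literal % (e.g. in %20).
   // 'using' closes the file even if WriteLine throws.
   using(StreamWriter file = File.CreateText(filename))
   {
    file.WriteLine(strBatch.Replace("%", "%%"));
   }

   myProcess = new Process();
   myProcess.StartInfo.FileName = filename;
   myProcess.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
   myProcess.StartInfo.RedirectStandardOutput = false;
   myProcess.StartInfo.UseShellExecute = true;
   myProcess.StartInfo.CreateNoWindow = true;

   try
   {
    cButtonLeach.Enabled = false;
    cTimerUpdate.Enabled = true;
    myProcess.Start();
   }
   catch(Exception eProc)
   {
    // Restore the UI so the user can try again
    cTimerUpdate.Enabled = false;
    cButtonLeach.Enabled = true;
    MenuStart.Enabled = true;
    MenuCancel.Enabled = false;
    MessageBox.Show(eProc.Message);
   }
  }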
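The timer handler that refreshes the status window is not shown in the article. One way it could pick up wget's progress is to read only what has been appended to out.cap since the previous tick; the sketch below assumes that approach, assumes the log is plain ASCII/UTF-8 text, and LogTail is a hypothetical class name.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: returns the text appended to a log file since the previous call.
public sealed class LogTail
{
    private readonly string path;
    private long offset; // number of bytes already consumed

    public LogTail(string path)
    {
        this.path = path;
    }

    public string ReadNew()
    {
        if (!File.Exists(path))
            return "";
        try
        {
            // FileShare.ReadWrite allows reading while wget still has the log open for writing.
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                if (fs.Length < offset)
                    offset = 0; // log is shorter than before (truncated or recreated): start over
                fs.Seek(offset, SeekOrigin.Begin);
                using (var reader = new StreamReader(fs, Encoding.UTF8))
                {
                    string text = reader.ReadToEnd();
                    offset = fs.Position; // remember where this read stopped
                    return text;
                }
            }
        }
        catch (IOException)
        {
            return ""; // file busy or just deleted; try again on the next tick
        }
    }
}
```

In the timer's Tick handler, cTextOut.AppendText(tail.ReadNew()) would append the new log text, and checking myProcess.HasExited would tell the form when to stop the timer and re-enable the Leach button.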

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Web Developer
India
Now working with NeST technologies, a major software firm with a global presence (CMM Level 5 and working towards Six Sigma).

Comments and Discussions

 
You dumb f-ck (notadotyet, 7-Jan-05 6:48)
Re: You dumb f-ck (noushadkc, 7-Jan-05 20:55)
Re: You dumb f-ck (Stephan Johnson, 18-Oct-05 1:40)
Any Updates (wrussell, 1-Feb-04 13:59)
Re: Any Updates (Anonymous, 1-Feb-04 17:36)
Re: Any Updates (wrussell, 2-Feb-04 3:25)
Re: Any Updates (wrussell, 2-Feb-04 8:40)
And the changes were??? (fifi, 29-Apr-03 12:26)
another non-creative non-sense article (programmer2003++, 29-Apr-03 9:29)
Re: another non-creative non-sense article (OmegaSupreme, 29-Apr-03 10:21)
Re: another non-creative non-sense article (A reader, 29-Apr-03 17:24)
Re: another non-creative non-sense article (OmegaSupreme, 29-Apr-03 17:29)
Re: another non-creative non-sense article (Ed Din ar Qadiyyeh, 20-May-03 21:03)
Such a nifty title... (Marc Clifton, 10-Feb-03 6:55)
www.wget.org (noushadkc, 10-Feb-03 1:16)
www.wget.com (leppie, 8-Feb-03 23:21)
Re: www.wget.com (Jeff J, 9-Feb-03 8:20)
Robots.txt? (Jörgen Sigvardsson, 8-Feb-03 22:52)
Re: Robots.txt? (leppie, 8-Feb-03 23:25)
Re: Robots.txt? (Jörgen Sigvardsson, 9-Feb-03 0:51)
Re: Robots.txt? (noushadkc, 10-Feb-03 15:46)
WGET?!?! (Daniel Turini, 8-Feb-03 21:48)
