Hey Guys,

I am new to Nutch. I am part of an IR research team and need to create a setup in which I crawl Microsoft's dataset with Nutch. After searching Google for a while, I couldn't find any tutorial or help. Can anyone guide me?

I am using Nutch 1.4 on Ubuntu 11.10 with Eclipse 3.7.

So far I am able to crawl the public web from my Nutch setup integrated with Eclipse.

Is there any tutorial or wiki page explaining how to do this, or how to crawl any other dataset stored on a local file system? If not, could you please help?
Thanks in advance.
Cheers!!!

-
Varun

1 solution

There is some documentation and a tutorial right here:

http://nutch.apache.org/wiki.html

Also, please check the "Related Projects" page, which might give you more input on how to use it.
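For the file-system case in the question, one possible starting point (a sketch under stated assumptions, not something taken from the Nutch wiki) is to generate the seed list yourself: write one `file:///` URL per document into a seed file, then crawl those seeds. This assumes Nutch 1.x has been configured to fetch local files; in a default setup that typically means adding `protocol-file` to `plugin.includes` in `conf/nutch-site.xml` and removing or relaxing the `-^(file|ftp|mailto):` rule in `conf/regex-urlfilter.txt`, so check both against your own conf files. The class `SeedListBuilder` and its `fileUrls` method are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of Nutch): writes a seed file that lists
 * every regular file under a dataset directory as a file:/// URL.
 */
public class SeedListBuilder {

    /** Collects file:/// URLs for all regular files under root, sorted for stable output. */
    static List<String> fileUrls(Path root) throws IOException {
        // Files.walk visits the directory tree recursively; the stream must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)                     // skip directories
                    .map(p -> p.toAbsolutePath().toUri().toString()) // e.g. file:///data/doc1.txt
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SeedListBuilder <datasetDir> <seedFile>");
            System.exit(1);
        }
        Path root = Paths.get(args[0]);
        Path seedFile = Paths.get(args[1]);
        List<String> urls = fileUrls(root);
        // One URL per line, which is the plain-text seed format Nutch reads.
        Files.write(seedFile, urls, StandardCharsets.UTF_8);
        System.out.println("Wrote " + urls.size() + " seed URLs to " + seedFile);
    }
}
```

With illustrative paths, this could be run as `java SeedListBuilder /data/msdataset urls/seed.txt`, followed by something like `bin/nutch crawl urls -dir crawl -depth 1`. Because every document is already a seed, one round should be enough to fetch them; seeding only the dataset's root directory would instead rely on Nutch following the directory listings produced by the file protocol, which needs a larger depth. Large files may also be truncated by the `file.content.limit` property, so its value in `conf/nutch-default.xml` is worth checking.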
 
 
Comments
varunpandeyengg 9-Mar-12 2:09am    
I have checked the entire Nutch wiki. I think either I am not searching for the right thing or there is no relevant information there. Could you please point me to something more concrete?

Thanks for the reply :-)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


