
Drupal Nutch Module - Current thinking and road map

30 Nov. 2010

Overview

I thought I would quickly write down some of the work in progress on the Drupal 6 upgrade of the Nutch module, which I have been redeveloping over the past month. Past implementations of the Nutch module used the Open Search Client module to display crawl results. While this was a perfectly reasonable solution, I felt that continuing down this line was not right, for a number of reasons:

* The Open Search Client module lacks a Drupal 6 version
* The Nutch crawler now supports pushing its results into Apache Solr
* The Drupal Apache Solr module is under very active development
* The exciting integration options between Apache Solr and Views 3 were too hard to pass up

Nutch Module Development

This part was quite quick, and I have lots of ideas in this area, including allowing you to set the crawl seed from Drupal, manage the crawl, and see reporting about its success. Unfortunately, I have put this on hold because I hit a blocker early on: I wanted to have both a native Drupal Solr index (nodes etc.) and a set of crawled pages, but Nutch's Solr implementation has a fixed data structure and a few hacks (in my opinion) that cause a clash with the Drupal index. As a result, I have extended Nutch to allow you to specify the field structure, and I have submitted a patch and am waiting for it to be included. I am also working on a Nutch plugin that allows you to pull specific information from a web page, either by regex or via XSLT, which will help the cause.

That's about it for now. If anyone has other ideas or needs, please post a comment or send me a mail or tweet.
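
To make the regex extraction and configurable field structure a little more concrete, here is a rough sketch in Python. It is not the Nutch plugin or the patch itself (those would be Java inside Nutch); the rule names, the field names such as `crawl_title`, and both helper functions are hypothetical and only illustrate the idea of pulling values from a page and renaming them so they do not clash with fields already used by another index.

```python
import re

# Hypothetical extraction rules: logical name -> regex with one capture group.
EXTRACTION_RULES = {
    "title": re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL),
    "author": re.compile(r'<meta\s+name="author"\s+content="(.*?)"', re.IGNORECASE),
}

# Hypothetical field structure: logical name -> field name in the search index.
# Making this configurable is what lets crawled pages share an index with
# documents that already use their own field names.
FIELD_MAP = {
    "title": "crawl_title",
    "author": "crawl_author",
}


def extract_fields(html, rules=EXTRACTION_RULES):
    """Return the first match of each rule, keyed by the rule's logical name."""
    extracted = {}
    for name, pattern in rules.items():
        match = pattern.search(html)
        if match:
            extracted[name] = match.group(1).strip()
    return extracted


def to_index_document(url, extracted, field_map=FIELD_MAP):
    """Rename extracted values to the configured index field names."""
    doc = {"url": url}
    for name, value in extracted.items():
        target = field_map.get(name)
        if target is not None:  # values without a mapping are dropped
            doc[target] = value
    return doc


if __name__ == "__main__":
    page = '<html><head><title> Example page </title></head></html>'
    print(to_index_document("http://example.com/", extract_fields(page)))
```

Keeping the extraction rules and the field mapping as separate pieces of configuration means either can change without touching the other, which is the property the blocker described above calls for.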
