The Agent - Developer Information

Here's how we get started:

The Open Search Agent

Brainstorm:

  • Start with the HTTPd and the configuration interface! Branch off microhttpd, minihttpd, dhttpd, boa or even lighttpd...

  • Proxy modules might become too complex -> off-the-shelf solutions? Maybe http://tinyproxy.sourceforge.net

  • The crawler / indexer / proxy could be a Perl script in the first version. This lets us try out ideas and experiment quickly. Perl is preinstalled on Linux and OS X, and there is Perl for Windows, too, but we need some uncommon Perl modules, so we should mirror/ship them. Even more important: there are ready-made Perl modules for crawling, HTML parsing, indexing and summarizing HTML/PDF, extracting links, etc.

  • Crawler / indexer / proxy in C: libcurl, libhtml, libcgi, ... offer similar functionality. Better support in the long run, fewer bugs, easier to maintain (!?), but more complicated to make (small) changes.

  • Obscuring the crawler and indexer is easier than hiding search/browsing stats. The structure of P2P plus the need for low latency favors reliable communication (transitive lookups) and thus does not allow query-source obfuscation at the application level. The best idea so far is to implement fake query modification and filtering in the local search-key-hash parser.

  • P2P data storage: first use OceanStore, Freenet or Gnutella, and write plugins to import/export the internal data structure(s). Later in the project we will switch to accessing the underlying DHT directly: bamboo-dht, libgnutella, Chord, Pastry/Tapestry, Leopard, k-Ary-DHT, ...

  • The first versions (bootstrap) will "fake" the P2P network to get a usable front-end for testing. Traffic estimates from the "dummy P2P network" can be used to simulate P2P scenarios! (First idea for collecting stats: share the database via NFS and write IP + file-inode logs.)

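The "dummy P2P network" idea from the list above could look roughly like the following. This is only a sketch under assumptions: the class name DummyP2PStore, the put/get interface and the peer addresses are made up for illustration, and an in-memory dict with a log list stands in for the shared NFS database and the IP + file-inode logs.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DummyP2PStore:
    """In-process stand-in for the P2P layer (hypothetical name).

    Values live in a plain dict. Every access is logged as
    (peer_ip, operation, key) so that traffic can be estimated
    before a real DHT is wired in. The key plays the role of the
    file inode in the "IP + file-inode" log idea.
    """
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def put(self, peer_ip: str, key: str, value: bytes) -> None:
        # Record the access first, then store the value.
        self.log.append((peer_ip, "put", key))
        self.data[key] = value

    def get(self, peer_ip: str, key: str):
        # Lookups are logged even when the key is missing.
        self.log.append((peer_ip, "get", key))
        return self.data.get(key)

    def requests_per_peer(self) -> Counter:
        # How many operations each peer issued.
        return Counter(ip for ip, _op, _key in self.log)

    def requests_per_key(self) -> Counter:
        # How often each key was touched (hot-spot estimate).
        return Counter(key for _ip, _op, key in self.log)


if __name__ == "__main__":
    store = DummyP2PStore()
    store.put("10.0.0.1", "term:linux", b"doc-17")
    store.get("10.0.0.2", "term:linux")
    store.get("10.0.0.2", "term:bsd")
    print(store.requests_per_peer())
    print(store.requests_per_key())
```

The per-peer and per-key counters are the kind of numbers that could later feed a P2P simulation; swapping the dict for a real backend would leave the front-end code untouched.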

Libraries and Classes

Front-end interface and templates.

Search results can be updated dynamically using AJAX techniques, or by static reloads! Both non-blocking and blocking I/O will be supported. The HTTP client (JavaScript) specifies a request mode:

[poll,read]? [nonblock|blocktimeout=NN]?

where NN is the timeout in units of 1/10 second.
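A server-side parser for this request mode might be sketched as follows. Several things here are assumptions, not part of the notes: the tokens are taken to be separated by whitespace, "blocktimeout=NN" is read as a single token, and the defaults for an omitted part ("read", non-blocking) are invented for the example. The names RequestMode and parse_request_mode are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestMode:
    action: str                 # "poll" or "read"
    blocking: bool              # True if the client asked to block
    timeout_s: Optional[float]  # timeout in seconds, None if not given


def parse_request_mode(text: str) -> RequestMode:
    """Parse a request mode string such as "poll blocktimeout=25"."""
    action = "read"    # assumed default when the first part is omitted
    blocking = False   # assumed default when the second part is omitted
    timeout_s = None
    for token in text.split():
        if token in ("poll", "read"):
            action = token
        elif token == "nonblock":
            blocking = False
            timeout_s = None
        elif token.startswith("blocktimeout="):
            # NN is given in tenths of a second; convert to seconds.
            tenths = int(token.split("=", 1)[1])
            if tenths < 0:
                raise ValueError("timeout must not be negative")
            blocking = True
            timeout_s = tenths / 10.0
        else:
            raise ValueError(f"unknown request mode token: {token!r}")
    return RequestMode(action, blocking, timeout_s)


if __name__ == "__main__":
    print(parse_request_mode("poll blocktimeout=25"))
    print(parse_request_mode("read nonblock"))
```

Rejecting unknown tokens outright keeps typos in the JavaScript client from silently falling back to a default mode.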