Here's how we get started:
Hijack an HTTP server (microhttpd, dhttpd, ...)
template (XSLT) + CGI
CGI - open-search config interface
CGI - open-search search + result interface
Plugin Loader
Launch Search
Merge (dummy) results
Hash/Key Data-format (wrapper)
P2P Plugins
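The "Plugin Loader", "Launch Search" and "Merge (dummy) results" steps above could look roughly like the following. This is only a minimal sketch in Python. The registry `PLUGINS`, the `Result` fields, and the two dummy plugins are hypothetical names invented for illustration, and keeping the best score per URL is just one possible merge rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    url: str
    title: str
    score: float
    source: str  # name of the plugin that produced the hit


# Hypothetical plugin registry: plugin name -> search function.
PLUGINS: Dict[str, Callable[[str], List[Result]]] = {}


def register(name: str):
    """Decorator that adds a search function to the registry."""
    def wrap(fn):
        PLUGINS[name] = fn
        return fn
    return wrap


@register("dummy_a")
def dummy_a(query: str) -> List[Result]:
    return [
        Result("http://example.org/a", f"A hit for {query}", 0.9, "dummy_a"),
        Result("http://example.org/shared", "Shared hit", 0.4, "dummy_a"),
    ]


@register("dummy_b")
def dummy_b(query: str) -> List[Result]:
    return [
        Result("http://example.org/shared", "Shared hit", 0.7, "dummy_b"),
        Result("http://example.org/b", f"B hit for {query}", 0.5, "dummy_b"),
    ]


def merge(batches: List[List[Result]]) -> List[Result]:
    """Keep the best-scoring result per URL, then sort by score (descending)."""
    best: Dict[str, Result] = {}
    for batch in batches:
        for r in batch:
            if r.url not in best or r.score > best[r.url].score:
                best[r.url] = r
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


def launch_search(query: str) -> List[Result]:
    """Run the query against every registered plugin and merge the results."""
    return merge([fn(query) for fn in PLUGINS.values()])
```

Real P2P plugins would register themselves the same way as the dummy ones, so the front-end does not need to change when they arrive.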
Brainstorm:
Start with the HTTPd and the configuration interface! Branch off microhttpd, minihttpd, dhttpd, boa or even lighttpd.
Proxy modules might become too complex, so an off-the-shelf solution may be the better choice, maybe http://tinyproxy.sourceforge.net
The crawler / indexer / proxy could be a Perl script in the first version, which lets us try out ideas and experiment quickly. Perl is preinstalled on Linux and OS X, and there is Perl for Windows, too, but we need some uncommon Perl modules, so we should mirror or ship them. Even more important: there are ready-made Perl modules for crawling, HTML parsing, indexing and summarizing HTML/PDF, extracting links, etc.
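To give an idea of how small the link-extraction part of such a script can be, here is a rough sketch. It is written in Python rather than Perl, purely as an illustration, and uses only the standard library (`html.parser`, `urllib.parse`). The names `LinkExtractor` and `extract_links` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags as absolute URLs."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A crawler loop would fetch a page, call `extract_links` on it, and queue the returned URLs for the next round.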
Crawler / indexer / proxy in C: libcurl, libhtml, libcgi, ... offer similar functionality. Better support in the long run, fewer bugs, easier to maintain (!?), but more complicated to make (small) changes.
Crawler and indexer obscurity is easier to achieve than hiding search/browsing stats. The structure of the P2P network plus the need for low latency favors reliable communication (transitive lookups) and thus does not allow query-source obfuscation at the application level. The best idea so far is to implement fake query modification and filtering in the local search-key-hash parser.
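One possible reading of "fake query modification and filtering" is sketched below: the local parser hashes the real query terms, mixes in keys for a few decoy terms before anything leaves the machine, and afterwards drops the answers that belong to the decoys. Everything here is an assumption made for illustration: the decoy word list, the use of SHA-1 as key hash, and the function names `obfuscate` and `filter_results`.

```python
import hashlib
import random

# Hypothetical decoy vocabulary; a real list would have to be much larger.
DECOY_TERMS = ["weather", "news", "music", "linux", "recipes", "travel"]


def key_for(term: str) -> str:
    """Map a search term to a lookup key (SHA-1 is only an assumption)."""
    return hashlib.sha1(term.lower().encode("utf-8")).hexdigest()


def obfuscate(query: str, n_decoys: int = 2, rng=None):
    """Return (keys to send, real keys). The sent keys include decoys."""
    rng = rng or random.Random()
    real = {key_for(t) for t in query.split()}
    candidates = [t for t in DECOY_TERMS if key_for(t) not in real]
    picked = rng.sample(candidates, min(n_decoys, len(candidates)))
    outgoing = list(real | {key_for(t) for t in picked})
    rng.shuffle(outgoing)  # do not reveal which keys are real by position
    return outgoing, real


def filter_results(results: dict, real_keys: set) -> dict:
    """Locally drop everything that was only fetched for a decoy key."""
    return {k: v for k, v in results.items() if k in real_keys}
```

The cost is extra lookup traffic for every decoy key, so the number of decoys trades privacy against latency.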
P2P data storage: at first use OceanStore, Freenet or Gnutella, and write plugins to import/export the internal data structure(s). Later in the project we will switch to accessing the underlying DHT directly: bamboo-dht, libgnutella, Chord, Pastry/Tapestry, Leopard, k-Ary-DHT, ...
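An import/export plugin mostly needs a stable mapping between the internal records and the key/value pairs a storage network accepts. The sketch below assumes, purely for illustration, that a record is a search term plus a list of URLs, that the key is the SHA-1 of the term, and that the value is JSON. None of this is a decided format.

```python
import hashlib
import json


def export_record(term: str, postings: list):
    """Turn an internal (term, URLs) record into a (key, value) pair."""
    key = hashlib.sha1(term.encode("utf-8")).hexdigest()
    payload = {"term": term, "postings": sorted(postings)}
    value = json.dumps(payload, sort_keys=True).encode("utf-8")
    return key, value


def import_record(key: str, value: bytes):
    """Inverse of export_record; rejects values stored under a wrong key."""
    data = json.loads(value.decode("utf-8"))
    expected = hashlib.sha1(data["term"].encode("utf-8")).hexdigest()
    if expected != key:
        raise ValueError("key does not match term")
    return data["term"], data["postings"]
```

Keeping the wrapper independent of any one network means the same pair can later be handed to a DHT directly.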
The first versions (bootstrap) will "fake" the P2P network to get a usable front-end for testing. Traffic estimates from the "dummy P2P network" can be used to simulate P2P scenarios! (First idea to collect stats: share the database via NFS and write IP + file-inode logs.)
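A dummy network for the bootstrap phase can be as simple as an in-memory store that counts what passes through it. The following is a minimal sketch under that assumption. The class name `DummyNetwork` and the logged fields (operation, peer address, key, byte count) are hypothetical and stand in for the IP + file-inode logs mentioned above.

```python
from collections import Counter


class DummyNetwork:
    """In-memory stand-in for the P2P layer that records traffic stats."""

    def __init__(self):
        self.store = {}
        self.stats = Counter()
        self.log = []  # one (op, peer, key, nbytes) tuple per request

    def put(self, peer: str, key: str, value: bytes) -> None:
        self.store[key] = value
        self._record("put", peer, key, len(value))

    def get(self, peer: str, key: str):
        value = self.store.get(key)
        nbytes = len(value) if value is not None else 0
        self._record("get", peer, key, nbytes)
        return value

    def _record(self, op: str, peer: str, key: str, nbytes: int) -> None:
        self.stats[op + "_count"] += 1
        self.stats[op + "_bytes"] += nbytes
        self.log.append((op, peer, key, nbytes))
```

The front-end talks to this object through `put` and `get` only, and the collected `stats` and `log` can later feed a traffic simulation.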