Project Structure

Review of Existing Technologies

YaCy

yacy.de - a P2P web search engine with all the virtues and vices of a Java implementation. YaCy was originally designed as a P2P web proxy and virtual cyberspace on top of the Internet (TCP/IP). It offers a fairly good playground for P2P applications.

open-search and YaCy differ in few design decisions but in many implementation decisions; see Table 1, “P2P-Search Software-Application Matrix”. The main difference is that YaCy started off as a P2P web cache, while open-search separates the cache and builds on top of a P2P index, addressing privacy issues.

Opensearch

opensearch.a9.com, opensearch.org

Opensearch is a collection of simple formats for the sharing of search results.

Basically content indexing software; interesting for later stages of the open-search project.

Table 1. P2P-Search Software-Application Matrix

                | open-search | YaCy | GPU-search
What?           | distributed p2p-based web indexing | distributed p2p-based web indexing | centralized p2p-based web indexing
Website         | open-search.net | yacy.de | gpu.sf.net
Implementation: | none yet - modular; POSIX-C | monolithic; JAVA | modular; windows / linux-wine
P2P engine      |   |   | GPU
DHT             | . | . | .
Crawler         | . | . | implements crawler only.
Privacy         |   | - | -
Anonymity       | . | . | .
Notes           | . | bottom up design. search on top of the web-proxy. >2 yrs ahead. about 100 users. drops TLAs. | distributed crawling and indexing on top of GPU.

Goals, Requirements and Distinct Features of open-search

Goals:

  • open and free: open source, cross platform, portable.

  • quick search query execution

  • consistent and up-to-date search results

  • scalable to > 10^10 documents in the index

  • easy to install, setup and use.

Distinct Features:

  • many.

  • many more.

  • stay tuned until we get there.

  • low latency and privacy are good guesses for now.

Framework Structure

Figure 1. The Big Picture

Figure 1 gives an overview of the open-search agent and its major internal modules. Red: user interface (HTTP server). Blue: search core. Yellow: distributed data storage.

In short: the user specifies one or more search-terms and is redirected to the result-page. Meanwhile the user's request is parsed and hashed to search-key(s), which are resolved using the open-search P2P network. Finally the returned values for each key are merged, extended (web-cache) and rendered as result entries.
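The query flow described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the open-search implementation (which does not exist yet): the names parse_query, term_to_key, FakeP2PIndex and search are hypothetical, SHA-1 is an assumed choice of hash, the P2P network is replaced by an in-memory dictionary, and "merging" is assumed to mean intersecting the results for each key (AND semantics). The web-cache extension and the rendering step are left out.

```python
import hashlib
from collections import defaultdict


def parse_query(query):
    """Split the user's query string into lowercase search terms."""
    return [term for term in query.lower().split() if term]


def term_to_key(term):
    """Hash one search term to a fixed-size search key.

    SHA-1 is an assumption made for this sketch; the source does not
    name a hash function.
    """
    return hashlib.sha1(term.encode("utf-8")).hexdigest()


class FakeP2PIndex:
    """In-memory stand-in for the P2P network: maps search key -> set of URLs."""

    def __init__(self):
        self._store = defaultdict(set)

    def publish(self, term, url):
        """Record that `url` contains `term` (what a crawler or proxy would do)."""
        self._store[term_to_key(term)].add(url)

    def resolve(self, key):
        """Return the URLs stored under `key`, or an empty set."""
        return set(self._store.get(key, set()))


def search(index, query):
    """Parse the query, hash each term, resolve the keys and merge the values."""
    keys = [term_to_key(term) for term in parse_query(query)]
    if not keys:
        return []
    result_sets = [index.resolve(key) for key in keys]
    # Assumed merge policy: keep only URLs that match every search term.
    merged = set.intersection(*result_sets)
    return sorted(merged)


if __name__ == "__main__":
    index = FakeP2PIndex()
    index.publish("p2p", "http://a.example/")
    index.publish("search", "http://a.example/")
    index.publish("p2p", "http://b.example/")
    for url in search(index, "P2P search"):
        print(url)
```

In a real agent, resolve() would be a lookup in the distributed data storage rather than a local dictionary, and the merged list would still need to be extended from the web cache before being rendered as result entries.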

Naming Conventions

open-search-agent

the whole software framework

server

any part within the agent that listens on a TCP/IP port

crawler

the part that does standalone crawling

proxy

optional network pass-through, capture, and caching software that harvests and indexes user-browser traffic

user

source of PEBTACs

client

the P2P network software - "network client"

front-engine

HTTP-server that manages user-search requests

backend

general term for the underlying software structure (database, P2P engine, network, content indexing, ...)

web-bar

standalone UI that can be integrated in a browser