Using the privss Toolkit

To build and run the programs in the privss toolkit, two other packages are needed: libpaillier and the Integer Matrix Library (IML), which is used for the special linear algebra techniques of the private searching scheme. IML in turn requires the ATLAS implementation of BLAS.

Make sure you have these installed, then download and unpack the most recent privss tarball. It can be installed with the standard GNU build system commands.

$ ./configure
$ make
$ make install

In the above, "$" denotes your shell’s prompt. If necessary, libraries installed in non-standard locations can be selected with the --with-paillier-lib=<path>, --with-iml-lib=<path>, etc. options to configure (see ./configure --help for details). If you have any trouble getting IML and its prerequisites installed, check this project again soon: I am currently implementing the linear algebra functionality natively, which will remove the need for IML.

Once privss is successfully installed, we set up a private search by selecting algorithm parameters and encrypting our query with privss-qcon. For now, let’s try the default parameters in a search for documents associated with the keyword "illuminati" or "mkultra".

$ privss-qcon illuminati mkultra
generating key pair (move mouse to add to entropy pool) ...
encrypting query table ...    256 / 256   
$ ls
enc_query  prv_key

We can send the file enc_query off to an untrusted server that will perform the private search for us. That file doesn’t directly include the words "illuminati" or "mkultra", and furthermore reveals nothing at all about our query, assuming the security of the Paillier cryptosystem. We keep the secret file prv_key so that we can later reconstruct the results of the search.

The untrusted server can use privss-search to process documents, one per invocation. The first time it is run, it creates a new file of intermediate, encrypted search results. Subsequent invocations for additional documents update that file. This process is illustrated below.

$ ls
enc_query  kennedy.jpg  mc_report.pdf  teeter_interview.mp3
$ privss-search enc_query enc_res kennedy.jpg "robert kennedy" rfk
$ ls
enc_query  enc_res  kennedy.jpg  mc_report.pdf  teeter_interview.mp3
$ privss-search enc_query enc_res mc_report.pdf mkultra "midnight climax" "sodium pentothal"
$ privss-search enc_query enc_res teeter_interview.mp3 "lawrence teeter" illuminati sirhan
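
Since each invocation handles exactly one document, a server holding many files will probably want to drive privss-search from a loop. The following is only a sketch. The manifest format (one document per line, tab-separated, file name first, then its keywords) and the names search_manifest and SEARCH_CMD are assumptions made up for this example; they are not part of privss.

```shell
# SEARCH_CMD is a hypothetical override hook; set it to "echo" for a dry run.
SEARCH_CMD=${SEARCH_CMD:-privss-search}

# search_manifest QUERY RESULTS MANIFEST
# Runs one search invocation per non-empty manifest line.
# Assumed manifest line format: file<TAB>keyword<TAB>keyword...
search_manifest() {
    query=$1; results=$2; manifest=$3
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue       # skip blank lines
        (
            IFS=$(printf '\t')           # split on tabs only, so keywords may contain spaces
            set -f                       # no glob expansion of the fields
            set -- $line                 # $1 = document, remaining = keywords
            "$SEARCH_CMD" "$query" "$results" "$@" < /dev/null
        ) || return 1                    # stop at the first failing invocation
    done < "$manifest"
}
```

With a manifest docs.tsv (a hypothetical file name) describing the three documents above, the three commands collapse into a single call:

$ search_manifest enc_query enc_res docs.tsv

Setting SEARCH_CMD=echo beforehand prints the commands instead of running them.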

Each time we invoke privss-search, we specify the document we wish to process and a list of keywords associated with it. The privss-search tool doesn’t attempt to read keywords out of the document itself. Instead, we let the higher-level invoking application (or user) specify keywords explicitly; this way a variety of document types may be handled in application-specific ways. The privss-search invocations above illustrate this: the keywords for the file mc_report.pdf could have been extracted using pdftotext, while the keywords for teeter_interview.mp3 may have been obtained using id3info.
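
As a rough illustration of the pdftotext route, the sketch below turns the text of a PDF into a keyword list and hands it to privss-search. The helpers extract_keywords and search_pdf are hypothetical names invented here, and the keyword rule (distinct lowercase ASCII words of at least four characters) is an arbitrary assumption; a real application would choose its own.

```shell
# extract_keywords (hypothetical helper): read text on stdin, print one
# distinct lowercase word per line. Assumes ASCII text; words shorter
# than four characters are dropped as an arbitrary noise filter.
extract_keywords() {
    tr -cs '[:alnum:]' '\n' |
        tr '[:upper:]' '[:lower:]' |
        awk 'length($0) >= 4' |
        sort -u
}

# search_pdf QUERY RESULTS DOCUMENT (hypothetical helper)
# Extracts the text of a PDF with pdftotext ("-" sends it to stdout) and
# passes the resulting keywords to privss-search as arguments.
search_pdf() {
    query=$1; results=$2; doc=$3
    # Unquoted on purpose: each keyword is a single alphanumeric token.
    set -- $(pdftotext "$doc" - | extract_keywords)
    privss-search "$query" "$results" "$doc" "$@"
}
```

After loading these definitions, the second search above could be run as

$ search_pdf enc_query enc_res mc_report.pdf

Note that this simple rule only ever produces single words, so a phrase such as "midnight climax" would still have to be supplied by hand, and a long document may yield more keywords than is practical to pass on a command line.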

When the server is done processing documents, it sends the file enc_res back to the client. Using prv_key, the client can then obtain the documents which matched the query.

$ ls
enc_query  enc_res  prv_key
$ privss-recon enc_query enc_res prv_key 
decrypting results ...    8460 / 8460   
solving linear system of 30 variables ...
solving linear system of 9 variables ...
2 documents matched query
saving file mc_report.pdf ...
saving file teeter_interview.mp3 ...
$ ls
enc_query  enc_res  prv_key  mc_report.pdf  teeter_interview.mp3

That’s all there is to doing a very simple private search with the privss toolkit using the default search parameters.

Unfortunately, if you want to run a larger-scale search and maintain high space efficiency, things get a little trickier. The issue is minimizing the possibility of "overflow" while keeping the space (and time) requirements low. At the moment, the only way to set the parameters of the private searching algorithms is with the low-level privss-qcon options --c-buf-len, --l-buf-len, --d-buf-len, etc. Doing so intelligently requires reading "New Techniques for Private Stream Searching" and doing some arithmetic.

I’m hoping to soon implement a higher-level, more intuitive interface for setting the search parameters. Until then you can just try doing searches with the default parameters, or, if you’re adventurous, try to figure out how to set them well for your application.