Using the privss Toolkit
To build and run the programs in the privss toolkit, two other packages are needed: libpaillier and the Integer Matrix Library (IML), which is used for the special linear algebra techniques of the private searching scheme. IML in turn requires the ATLAS implementation of BLAS.
Make sure you have these installed, then download and unpack the most recent privss tarball. It can be installed with the standard GNU build system commands.
    $ ./configure
    $ make
    $ make install
In the above, the "$" denotes your shell’s prompt. If necessary, libraries installed in non-standard locations can be selected with the --with-paillier-lib=<path>, --with-iml-lib=<path>, etc. options to configure (see ./configure --help for details). If you have any trouble getting IML and its prerequisites installed, check this project again soon because I am currently implementing the linear algebra functionality natively, which will remove the need for IML.
Once privss is successfully installed, we set up a private search by selecting algorithm parameters and encrypting our query with privss-qcon. For now let’s try the default parameters in a search for documents with the string "illuminati" or "mkultra".
    $ privss-qcon illuminati mkultra
    generating key pair (move mouse to add to entropy pool) ...
    encrypting query table ... 256 / 256
    $ ls
    enc_query prv_key
We can send the file enc_query off to an untrusted server that will perform the private search for us. That file doesn’t directly include the words "illuminati" or "mkultra", and furthermore doesn't reveal anything about our query at all, assuming the security of the Paillier cryptosystem. We keep the secret file prv_key to later reconstruct the results of the search.
The untrusted server can use privss-search to process documents, one with each invocation. The first time it is run, a new file of intermediate, encrypted search results will be created. That file will be updated with subsequent invocations for additional documents. This process is illustrated below.
    $ ls
    enc_query kennedy.jpg mc_report.pdf teeter_interview.mp3
    $ privss-search enc_query enc_res kennedy.jpg "robert kennedy" rfk
    $ ls
    enc_query enc_res kennedy.jpg mc_report.pdf teeter_interview.mp3
    $ privss-search enc_query enc_res mc_report.pdf mkultra "midnight climax" "sodium pentothal"
    $ privss-search enc_query enc_res teeter_interview.mp3 "lawrence teeter" illuminati sirhan
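Because each invocation handles exactly one document, a server with many documents will probably want to script the calls. The following is a minimal sketch of one way to do that, not something shipped with privss. The function name search_all, the PRIVSS_SEARCH variable, and the tab-separated manifest format are all assumptions made for illustration; only the privss-search argument order (query file, results file, document, keywords) comes from the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of privss): run privss-search once per line
# of a manifest file. Assumed manifest format, one document per line:
#   <document path> TAB <keyword> TAB <keyword> ...
# PRIVSS_SEARCH may be overridden, e.g. for a dry run with a stub command.
PRIVSS_SEARCH="${PRIVSS_SEARCH:-privss-search}"

search_all() {
    query=$1
    results=$2
    manifest=$3
    tab=$(printf '\t')
    while IFS= read -r line; do
        # Skip blank lines in the manifest.
        [ -n "$line" ] || continue
        # Split the line on tabs only, so keywords may contain spaces.
        # Globbing is disabled so characters like "*" are passed through as-is.
        old_ifs=$IFS
        IFS=$tab
        set -f
        set -- $line
        set +f
        IFS=$old_ifs
        # "$@" is now: document path followed by its keywords.
        # stdin is redirected so the command cannot consume the manifest.
        "$PRIVSS_SEARCH" "$query" "$results" "$@" < /dev/null || return 1
    done < "$manifest"
}

# Possible usage:
#   search_all enc_query enc_res manifest.tsv
```

Splitting on tabs rather than spaces is what lets a multi-word keyword such as "midnight climax" reach privss-search as a single argument, as in the quoted arguments of the transcript above.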
Each time we invoke privss-search, we specify the document we wish to process and a list of keywords associated with it. The privss-search tool doesn’t attempt to read keywords out of the document itself. Instead, we let the higher-level invoking application (or user) specify keywords explicitly; this way a variety of document types may be handled in application-specific ways. The above example illustrates this; the keywords for the file mc_report.pdf could have been extracted using pdftotext, while the keywords for teeter_interview.mp3 may have been obtained using id3info.
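As a rough illustration of what such an invoking application might do for text-like documents, here is a small sketch that turns plain text into a keyword list. The helper name text_keywords, the minimum word length, and the choice to lowercase everything are assumptions of this sketch; whether lowercasing is appropriate depends on how the query keywords were written. It only yields single-word keywords, so phrases like "midnight climax" would still have to be supplied by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of privss): read plain text on stdin and
# print one candidate keyword per line, in order of first appearance.
text_keywords() {
    min_len=${1:-4}   # assumed cutoff: drop words shorter than this
    # 1. Replace every run of non-alphanumeric characters with a newline.
    # 2. Lowercase the resulting words.
    # 3. Keep words of at least min_len characters, first occurrence only.
    tr -cs '[:alnum:]' '\n' |
        tr '[:upper:]' '[:lower:]' |
        awk -v n="$min_len" 'length($0) >= n && !seen[$0]++'
}

# Possible usage, assuming pdftotext is installed ("-" sends text to stdout):
#   privss-search enc_query enc_res mc_report.pdf \
#       $(pdftotext mc_report.pdf - | text_keywords)
```

The unquoted command substitution in the usage comment is deliberate: the shell splits the helper's output on whitespace, so each keyword becomes a separate argument.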
When the server is done processing documents, it sends the file enc_res back to the client. Using prv_key, the client can then obtain the documents which matched the query.
    $ ls
    enc_query enc_res prv_key
    $ privss-recon enc_query enc_res prv_key
    decrypting results ... 8460 / 8460
    solving linear system of 30 variables ...
    solving linear system of 9 variables ...
    2 documents matched query
    saving file mc_report.pdf ...
    saving file teeter_interview.mp3 ...
    $ ls
    enc_query enc_res prv_key mc_report.pdf teeter_interview.mp3
That’s all there is to doing a very simple private search with the privss toolkit using the default search parameters.
Unfortunately, if you want to run a larger-scale search and maintain high space efficiency, things get a little trickier. The issue is minimizing the possibility of "overflow" while keeping the space (and time) requirements low. At the moment, the only way to set the parameters of the private searching algorithms is with the low-level privss-qcon options --c-buf-len, --l-buf-len, --d-buf-len, etc. Doing so intelligently would require reading "New Techniques for Private Stream Searching" and doing some arithmetic.
I’m hoping to soon implement a higher-level, more intuitive interface for setting the search parameters. Until then you can just try doing searches with the default parameters, or, if you’re adventurous, try to figure out how to set them well for your application.