today i go over a tool i wrote to learn and run simple experiments on PIR (a fascinating research subject in cryptography).
this particular research is highly based on the work of alexandra henzinger on SimplePIR/DoublePIR and janmajaya mall’s zuzalu demo.
if you are advanced in the subject already, here is an excellent presentation by alexandra (at the simons institute, a foundation from quant-father jim simons, which also sponsored part of my research in my phd in physics):
0000. introduction to pir and this work
0001. magick-py, a tool for simple PIR
0010. setting up magick-py
0011. experiment I: linear key regev encryption with sampled error
0100. experiment II: linear key regev encryption with a scaled message
0101. experiment III: proving the scheme is additive homomorphic
0110. experiment IV: proving the scheme supports plaintext inner product
0111. experiment V: a very simple pir setup without encryption
1000. experiment VI: a full secret key regev PIR experiment
1001. concluding thoughts
private information retrieval (PIR) was first introduced in 1995 by b. chor et al. and refers to the ability to query a database without revealing which item is looked up or whether it exists, by using cryptography primitives.
it’s a pretty cool technology. once PIR becomes less expensive or prohibitive (i.e., cheaper computation with a small cipher, as PIR inherently has a high cost for server-side computation), these are some of the possible applications that could utilize the protocol:
searching IP databases: when filing a new IP, the author must search the IP database to check that no previous entry significantly overlaps with their invention. PIR could allow the search to be performed without leaving search terms on the query log of the IP database.
real-time asset quotes: investors interested in a particular asset often monitor the market to determine when to purchase. PIR could allow their interest to be confidential.
safe browsing and private oracles, checking passwords over breached databases (or any type of credentials), certificate transparency (CT) checks, certificate revocation checks, among many others.
by the way, PIR schemes are generally divided into single-server schemes and multiple-server schemes (which allows you to remove the trust from a subset of the servers).
in this research, we will look at simple single-server PIR protocol setups, where a server holds an embedded database D
represented by a n x n
square matrix (whose elements are under a constant modulo), and a client wants to privately read the ith
database item (Di
, with n
elements) without letting the server learn about i
.
the subject of PIR is also a subset of the broad topic of lattice-based cryptography, which refers to a series of quantum-resistant cryptographic primitives that involve lattices, either in their construction or in the security proof.
💡 Over an n-dimensional vector space, a lattice is an infinite set of points represented by a collection of vectors.
if you need an introduction to quantum cryptography, check out my (uber-old and outdated) def con ‘16 talk:
💡 in group theory, a lattice in the
R^n
is an infinite set of points in this space in which coordinate-wise addition or subtraction of two points produces another point, so every point in the space is within some maximum distance of any lattice point.a lattice can also be described as a free abelian (commutative) group of dimension
n
, spanning the vector spaceR^n
; or the symmetry group of a discrete translation symmetry inn
directions.if you would like to learn more about group theory, especially in a language for physicists, here is a free and open-source graduate book i wrote back when i was a student.
before we start, we need to review the concept of homomorphic encryption.
suppose a server that can XOR
some client’s data. the client would send their cipher c0
, obtained from their plaintext data m0
and their key k0
:
c = m0 ⌖ k0
homomorphism is the property that if a client sends two encrypted messages, c1
and c2
(from messages m0
and m1
, respectively), the server can return c1 ⌖ c2
so the client can retrieve m0 ⌖ m1
.
additive homomorphism occurs when, given two ciphertexts (a0, c0)
and (a1, c1)
, their sum (a0 + a1, c0 + c1)
decrypts to the sum of the plaintexts (provided that the error remains sufficiently small).
partially homomorphic encryption can be easily achieved as it accepts the possibility that not all data is encrypted (or homomorphic) through other operations (such as multiplication).
fully homomorphic encryption (FWE), which is much harder to achieve, would occur if a server operated on encrypted data without seeing ANY of its content.
💡 in a more formal definition, homomorphic encryption is a form of encryption with evaluation capability for computing over encrypted data without access to the secret key, i.e., supporting arbitrary computation on ciphers. fully homomorphic encryption could be defined as the evaluation of arbitrary circuits of multiple types of (unbounded depth) gates (relevant to zero-knowledge proof setups). vitalik’s talks about homomorphic encryption in this post.
a subsequent important progress in the the field was a 2005 seminal paper, where oded regev introduced the first lattice-based public-key encryption scheme, and the learning with errors (LWE) problem.
the LWE problem relies on the hardness of distinguishing between a message with added noise and a random sample. it can be thought of as a search in a (noisy) modular set of equations whose solutions can be very difficult to solve. in other words, given m samples of coefficients (bi, ai) in the linear equation bi = <ai, s> + ei
, with the error ei
sampled from a small range [-bound, bound]
, finding the secret key s is "hard".
note, however, that LWE-based encryption schemes have a significant drawback due to noise growth. as the ciphertexts produced by these schemes are noisy encodings of the plaintext, homomorphic operations between ciphertexts increase the magnitude of the noise. if the noise exceeds a certain threshold, the correctness of the decryption may no longer hold. despite this problem, regev encryption can be very efficient for PIR as it is additively homomorphic.
in the past decades, regev's security proof and the LWE scheme's efficiency have been the subject of intense research among cryptographers, including craig gentry's thesis on the first fully homomorphic encryption scheme (2009).
a PIR protocol aims to design schemes that satisfy privacy and correctness constraints while achieving the minimum possible download cost.
💡 the download cost of a PIR scheme is defined as the total number of bits downloaded by the user from all the databases, normalized by the message size. the PIR rate is defined as the reciprocal of the PIR download cost.
one possible implementation approach is to choose a suitable polynomial and then have a single server preprocess the data. this preprocessing depends only on the database D
and the public parameters of the regev encryption scheme, so that the server can reuse the work across many queries from many independent clients.
after the preprocessing step, to answer a client's query, the server must compute only roughly N 32-bit
integer multiplications and additions on a database of N bytes
. the catch is that the client must download a hint matrix about the database contents after this preprocessing.
therefore, a simple serve PIR scheme would comprise two phases:
the offline phase, with pre-computations and the exchange of hints, and
the online phase, with the query processing on the server and response decoding on the client.
the practicality of PIR-based applications is primarily impacted by the query processing time and the hint exchange phase. the theoretical query size grows as the square root of the number of field elements representing the database. for example, the largest query size for a database of 32 GB
is around 600 KB
.
although modern PIR schemes require surprisingly little communication and the protocol works well enough at smaller scales, the time needed to scan it grows proportionally as the database grows. for bigger databases, the process becomes prohibitively inefficient (fetching a database record grows only polylogarithmically with the number of records, N
).
after preprocessing the database, the server can answer a query in time sublinear in N
. thus, the current hard limit on the throughput of PIR schemes is the ratio between the database size and the server time to answer a query (the speed with which the PIR server can read the database from memory).
finally, it's important to note that PIR protocols do not ensure data integrity or authentication. an authenticated PIR scheme could combine an unauthenticated multi-server PIR scheme with a standard integrity-protection mechanism, such as merkle trees.
in this approach, PIR servers download the data from the blockchain to construct PIR databases. for each database, the PIR server creates a description file (usually called a manifest file). the user collects all available block headers and fetches the manifest files from the PIR servers to query the PIR database later efficiently.
this paper introduces a design for SimplePIR, the fastest single-server PIR scheme known to date.
the security is held under a Learning with Errors scheme that requires no polynomial arithmetic or fast fourier transforms. regev encryption gives a secret-key encryption scheme that is secure under the LWE assumption.
to answer a client’s query, the server performs fewer than one 32-bit multiplication and one 32-bit addition per database byte, achieving 10 GB/s/core server throughput.
the first approach to query a 1 GB database demands the client to first download a 121 MB "hint" about the database contents. then, the client can make any number of queries, each requiring 242 KB of communication.
the second approach shrinks the hint to 16 MB. then, the following queries demand 345 KB of communication.
finally, the scheme is applied, together with a novel data structure for approximate set membership, to private auditing in certificate transparency. the results can be compared to google chrome’s current approach, with 16 MB of downloads per month, and 150 bytes per TLS connection.
in our code, the single-server database is represented by a square matrix (m x m)
, while a query is a vector filled by 0s
except at the asking row and column (m x 1)
. any result should have the same dimension as the query vector (i.e., the space is reduced to the size of the column where the data is located).
the server retrieves the queried item by:
looping over every column and multiplying their values to the value in the same row of the query vector, and
adding the values found in each column in its own matrix.
a secret key regev encryption scheme using sampled errors to reproduce LWE is then built on top of the ideas above. privacy is guaranteed by checking that fully homomorphic encryption is held with respect to addition in this setup (i.e., additive homomorphism).
now that we have some context, we are ready to do some coding and math.
in the following sections, i will be illustrating a simple single-server PIR setup with LWE, built on several progressive experiments:
running a simple linear key regev encryption experiment with sampled error
running a simple linear key regev encryption experiment with a scaled message
proving that the regev scheme is additive homomorphic
proving that the regev scheme supports plaintext inner product
running a very simple pir setup (without encryption)
running the full secret key regev PIR experiment
we will now be building the parts of the tool i wrote to help understand how PIR can work. i call it magick:
before we jump into the experiments, let’s take a look at the main parts of magic-py’s source code:
more specifically, we want to look at the primitive classes.
this is how Message()
in message.py
looks:
class Message:
def __init__(self, mod=None, rows=None, cols=None, message=None):
"""Initialize a message vector"""
self.mod = mod
self.rows = rows
self.cols = cols
self.message = message
############################
# Private methods
############################
def _check_dimensions(self, other_msg) -> None:
"""Check the dimensions of two matrices"""
if self.rows != other_msg.rows or self.cols != other_msg.cols:
log_error(f'Matrices have different dimensions:')
exit_with_error(f'{self.rows}x{self.cols} and {other_msg.rows}x{other_msg.cols}')
def __add__(self, vector):
"""Add two matrices"""
self._check_dimensions(vector)
for i in range(len(self.message)):
self.message[i] = (self.message[i] + vector.message[i]) % self.mod
return self
def __sub__(self, vector):
"""Subtract two matrices"""
self._check_dimensions(vector)
for index in range(len(self.message)):
self.message[index] = (self.message[index] - vector.message[index]) % self.mod
return self
def __mul__(self, vector):
"""Multiply two matrices"""
this_vector = [0] * (self.rows * vector.cols)
for i in range(self.rows):
for j in range(self.cols):
for k in range(vector.cols):
this_vector[i * vector.cols + k] = (this_vector[i * vector.cols + k] + \
(self.message[i * self.cols + j] * \
vector.message[j * vector.cols + k])) % self.mod
return Message(self.mod, self.rows, vector.cols, this_vector)
def __eq__(self, vector):
"""Check if two matrices are equal"""
return (self.rows == vector.rows) and \
(self.cols == vector.cols) and \
(self.message == vector.message)
def __repr__(self):
"""Print the message vector"""
return f'\nRows: {self.rows}\nCols: {self.cols}\nVector: {self.message}\n'
############################
# Public methods
############################
def calculate_scaling(self, numerator, denominator, this_mod):
"""Calculate the scaled message vector"""
this_vector = [0] * (self.rows * self.cols)
for i in range(len(self.message)):
this_vector[i] = round((numerator * self.message[i]) / denominator) % this_mod
return Message(this_mod, self.rows, self.cols, this_vector)
def set_query_element(self, row, col, value) -> None:
"""Set the value at a particular index"""
self.message[row * self.cols + col] = value
def get_query_element(self, row, col) -> int:
"""Get the value at a particular index"""
return self.message[row * self.cols + col]
############################
# Static methods
############################
@staticmethod
def create_random_message(mod, rows, cols):
"""Create a random message vector"""
return Message(mod, rows, cols, [random.randint(0, mod - 1) for _ in range(rows * cols)])
@staticmethod
def create_zero_message(mod, rows, cols):
"""Create a zero message vector"""
return Message(mod, rows, cols, [0 for _ in range(rows * cols)])
@staticmethod
def calculate_sample_error(bound, mod, rows, cols):
"""Create a random error message vector"""
return Message(mod, rows, cols, [sample_error(bound) % mod for _ in range(cols * rows)])
and this is how Regev()
in regev.py
looks:
class Regev():
def __init__(self):
self.mod = None
self.n = None
self.m = None
self.p = None
self.bound = None
self._load_env_parameters()
############################
# Private methods
############################
def _load_env_parameters(self) -> None:
"""Load environment variables"""
env_vars = load_config()
self.mod = int(env_vars['mod'])
self.n = int(env_vars['n'])
self.m = int(env_vars['m'])
self.p = int(env_vars['p'])
self.bound = int(env_vars['bound'])
############################
# Public methods
############################
def print_results(self, m0, m1, m0_string, m1_string) -> None:
"""Print the results of the experiment"""
if m0 == m1:
log_info(f'Original msg was successfully retrieved!\n')
else:
log_error(f'Original msg was not retrieved.')
log_info(f'{m0_string}: {m0}\n')
log_info(f'{m1_string}: {m1}\n')
log_info(f'Parameters: \nmod: {self.mod} \nn: {self.n} \nm: {self.m} \np: {self.p} \nbound: [-{self.bound}, {self.bound}] \n')
def print_noise_growth(self, m0, m1, noise_growth) -> None:
"""Print the noise growth"""
log_info(f'Correct decryption for Delta / 2: {(self.mod / self.p) / 2}? {m0 == m1}')
log_info(f'Noise growth: {noise_growth.message[0]}')
def create_secret_key(self, this_mod=None, msg_n=1):
"""Create a secret key vector"""
if this_mod is None:
this_mod = self.mod
return Message.create_random_message(this_mod, self.n, msg_n)
def create_message_setup(self, this_m=None, this_n=None, this_mod=None, msg_n=None):
"""Create a message vector setup"""
if this_mod is None:
this_mod = self.mod
if this_m is None:
this_m = self.m
if this_n is None:
this_n = self.n
if msg_n is None:
msg_n = 1
# message vector of size `m`, where each element has a modulus `mod`
m0= Message.create_random_message(this_mod, self.m, msg_n)
# public
A = Message.create_random_message(self.mod, self.m, self.n)
# error vector
e = Message.calculate_sample_error(self.bound, self.mod, self.m, msg_n)
return m0, A, e
############################
# Static methods
############################
@staticmethod
def calculate_encryption(A, s, e, m0):
"""
Encrypt this message with a simple `B = A * s + e + m0`,
where `s` is the secret and `e` is the error vector.
Set the cipher as the tuple c = (B, A).
"""
B = (A * s) + e + m0
return (B, A)
@staticmethod
def calculate_decryption(s, c):
"""
Calculate the decryption of a ciphertext, given c
and a secret, such that m1 = m0 + e.
"""
B = c[0]
A = c[1]
return B - (A * s)
in the subsequent sections, we review each experiment's code and results. but first, let me show you how you can set up the tool.
install the requirements in a virtual environment with:
python3 -m venv venv
source venv/bin/activate
make install_deps
create a .env
file and set the environment variables and parameters.
LWE parameters are:
size of msg vector, m
and n
message’s modulo mod
and p
a work around the sampling errors (i.e., the standard variation sigma
of a Gaussian distribution with zero mean sigma
) by setting a bound
range for them
you should also take a look at lattice-estimator to learn how to make security estimates for LWE parameters.
finally, install magick with:
make install
in this simple experiment of learning with error (LWE), we operate our message vector over a ring modulo mod
, so some information is lost. luckily, gaussian elimination can still be used to recover the original message vector as it works over a ring modulo mod
.
the method for this experiment can be found at experiments/simple_encryption.py
:
def linear_secret_key_regev_encryption_with_error() -> None:
"""
This method runs a secret key Regev encryption and decryption
experiment for a msg vector with a sampled error vector.
In this simple example of learning with error (LWE), we operate
our message vector over a ring modulo mod, such that some
information is lost. This is not a problem since gaussian elimination
can be used to recover the original message vector (i.e., it works
over a ring modulo mod).
We represent the message vector m0 of size m where each element is
modulus mod. The cipertext c is B = A * s + e + m0, which can be
decrypted as c = (B, A).
"""
########################################################################
# 1. Key generation
########################################################################
regev = Regev()
m0, A, e = regev.create_message_setup()
s = regev.create_secret_key()
########################################################################
# 2. Encryption by calculating B and ciphertext c
########################################################################
c = regev.calculate_encryption(A, s, e, m0)
########################################################################
# 3. Calculate the decryption of the ciphertext c
########################################################################
m1 = regev.calculate_decryption(s, c)
########################################################################
# 4. The message vector m1 should be equal to m0 plus the error vector e
########################################################################
regev.print_results(m0, m0 + e, 'm0', 'm0 + e')
simply put, these are the steps of this experiment:
represent a message vector m0
of size m
, where each element has modulo mod
.
encrypt this message with a simple B = A * s + e + m0
, where s
is the secret and e
is an error vector.
set the ciphertext as the tuple c = (B, A)
.
decrypt c = (B, A)
for a given s
, such that m1 = m0 + e
.
here is an example of an output:
in this another simple example of learning with error (LWE), we lose information on the least significant bits by adding noise, i.e., by scaling the message vector (before adding it to encryption) with:
delta = mod / p
then, during the decryption, we scale the message vector back by:
1 / delta
the scaling ensures that m
is in the highest bits of the message vector, without losing information by adding the error vector e
.
consequently, the message m0
vector has each element modulo p
(not mod
), where p < q
.
the scaled message is
m0_scaled = m0 * delta = m0 * mod / p
the ciphertext c
is
B = A * s + e + m0_scaled
which can be decrypted as
c = (B, A)
here is the source code:
def linear_secret_key_regev_encryption_scaled() -> None:
"""
This method runs a secret key regev encryption and decryption experiment
for a msg vector with a scaled msg vector.
In this another simple example of learning with error (LWE), we loose
information on least significant bits by adding noise, i.e., by scaling
the message vector by delta = mod / p before adding it to encryption.
Then, during the decryption, we scale the message vector by 1 / delta.
The scaling ensures that m is in the highest bits of the message vector,
without losing information with the addition of the error vector e.
Now, the message m0 vector has each element module p (not mod), where
p < q. The scaled message is now m0_scaled = m0 * delta = m0 * mod / p.
The cipertext c is B = A * s + e + m0_scaled, which can be decrypted as
c = (B, A), i.e., m0 = (B - A * s) / delta = (delta * m0 + e) / delta.
"""
########################################################################
# 1. Key generation
########################################################################
regev = Regev()
m0, A, e = regev.create_message_setup(this_mod = regev.p)
s = regev.create_secret_key()
########################################################################
# 2. Scale message vector by delta = mod / p
########################################################################
scaled_m0 = m0.calculate_scaling(regev.mod, regev.p, regev.mod)
########################################################################
# 3. Encryption by calculating B and ciphertext c
########################################################################
c = regev.calculate_encryption(A, s, e, scaled_m0)
########################################################################
# 4. Calculate the decryption of the ciphertext c
########################################################################
m1 = regev.calculate_decryption(s, c)
########################################################################
# 5. Scale m1 vector by 1/ delta = p / mod
########################################################################
scaled_m1 = m1.calculate_scaling(regev.p, regev.mod, regev.p)
########################################################################
# 6. The message vector m0 should be equal to m1
########################################################################
regev.print_results(m0, scaled_m1, 'm0', 'scaled m1')
here is an example of an output:
we saw that additive homomorphism means that if c0
is the encryption of m1
under secret key s
and c2
is the encryption of m2
under the same secret key s
, then c0 + c1
is the encryption of m0 + m1
under s
.
for a large number of ci
, noise can be introduced from error, so the correctness of the results will depend on the values of m
, n
, mod
, and p
, such that:
|sum ei| < mod / (2 * p)
here is the source code for this experiment:
def additive_homomorphism() -> None:
"""
This method proves that the secret key Regev encryption scheme is
additive homomorphic, i.e., if c0 encrypts m0 and c1 encrypts m1,
both under s, then c0 + c1 decrypts to m0 + m1.
"""
########################################################################
# 1. Key generation for two independent messages m0 and m1
########################################################################
r0 = Regev()
m0, A0, e0 = r0.create_message_setup(this_mod = r0.p)
r1 = Regev()
m1, A1, e1 = r1.create_message_setup(this_mod = r1.p)
s = r0.create_secret_key()
########################################################################
# 3. Scale message vectors by delta = mod / p
########################################################################
scaled_m0 = m0.calculate_scaling(r0.mod, r0.p, r0.mod)
scaled_m1 = m1.calculate_scaling(r1.mod, r1.p, r1.mod)
########################################################################
# 4. Encryption by calculating B and ciphertext c for each message
########################################################################
c0 = r0.calculate_encryption(A0, s, e0, scaled_m0)
c1 = r1.calculate_encryption(A1, s, e1, scaled_m1)
########################################################################
# 5. Add the ciphertexts, with c2 = c0 + c1
########################################################################
c2 = (c0[0] + c1[0], c0[1] + c1[1])
########################################################################
# 6. Decrypt the sum of the ciphertexts
########################################################################
r2 = Regev()
m2 = r2.calculate_decryption(s, c2)
########################################################################
# 5. Scale m1 vector by 1/ delta = p / mod
########################################################################
scaled_m2 = m2.calculate_scaling(r2.p, r2.mod, r2.p)
########################################################################
# 6. The sum of the message vectors m0 and m1 should be equal to m2
########################################################################
r2.print_results(m0 + m1, scaled_m2, 'm0 + m1', 'm2')
here is an example of an output:
this experiment shows that given a cipher c
and a message vector m0
, c -> c1
can be transformed such that it also encrypts the inner product of m0
with a plaintext vector k
of size m
and element modulo p
.
because of noise growth with the vector k
, fine-tuning the initial parameters is crucial for the message to be successfully retrieved. as you will see in the snippet below, to guarantee correct decryption, the following must hold:
k * e0 < mod / (2 * p)
here is the source code:
def plaintext_inner_product() -> None:
"""
This method proves that the secret key Regev encryption scheme is
supports plaintext inner product, i.e., if c0 encrypts m0 and c1
encrypts m1, both under s, then c0 * c1 decrypts to m0 * m1.
"""
########################################################################
# 1. Key generation
########################################################################
r0 = Regev()
m0, A, e = r0.create_message_setup(this_mod = r0.p)
s = r0.create_secret_key(this_mod = r0.p)
########################################################################
# 2. Scale message vector by delta = mod / p
########################################################################
scaled_m0 = m0.calculate_scaling(r0.mod, r0.p, r0.mod)
########################################################################
# 3. Encryption by calculating B and ciphertext c
########################################################################
c = r0.calculate_encryption(A, s, e, scaled_m0)
########################################################################
# 4. Calculate a plaintext vector transposed k and then scale it by
# delta = mod / p
########################################################################
rk = Regev()
k = m0.create_random_message(rk.p, 1, rk.m )
scaled_k = m0.calculate_scaling(1, 1, rk.mod)
########################################################################
# 5. Calculate the noise growth
########################################################################
noise_growth = scaled_k * e
########################################################################
# 6. Define the ciphertext of the inner product of m0 and k
########################################################################
c1 = (scaled_k * c[0], scaled_k * c[1])
########################################################################
# 7. Decrypt the ciphertext of the inner product of m0 and k
########################################################################
m1 = r0.calculate_decryption(s, c1)
########################################################################
# 8. Scale m1 vector by 1/ delta = p / mod
########################################################################
m1_scaled = m1.calculate_scaling(r0.p, r0.mod, r0.p)
########################################################################
# 9. Scale back the plaintext vector k by 1/ delta = p / mod
########################################################################
scaled_scaled_k = scaled_k.calculate_scaling(1, 1, rk.p)
########################################################################
# 10. The message vector m1 scaled should be equal scaled k * m0
########################################################################
r0.print_results(m1_scaled, scaled_scaled_k * m0, 'scaled m1', 'scaled k * m0')
########################################################################
# 11. Print results on noise, decryption fails when noise > delta / 2
########################################################################
rk.print_noise_growth(m1_scaled, scaled_scaled_k * m0, noise_growth)
here is an example of a successful decryption:
now, changing MOD_P
from 10 to 100, we see a failed decryption case, showing how noise growth can interfere with the results:
in this experiment, we get the first taste of how PIR works, but without encryption yet.
to do that, we define our server’s database by a square vector of size m x m
, with each entry modulo p
.
we query a value at a specific row r
and col c
in plaintext, by creating a query vector of size m x 1
that is filled with 0
, except for the desired column index c
.
we then show that computing the dot product of the database vector to the query vector will give a result vector with all rows in the column index c
, where you can retrieve the row r
.
here is the source code, located at experiments/simple_pir.py
:
def no_encryption_example() -> None:
"""
Run a tutorial presenting the logic of a PIR experiment
without encryption.
"""
########################################################################
# 1. Represent a database as a square matrix, where the columns are
# the database entries and the rows are the database attributes
########################################################################
log_debug('In this PIR tutorial, we represent a database as a square matrix, ' +
'where columns are the database entries and rows are the database attributes.')
log_debug('We start the class Message(), creating a random database ' +
'with mod 500, and 20 entries and 20 attributes.\n')
msg = Message()
db = msg.create_random_message(500, 20, 20)
log_debug(f'db: {db}\n')
########################################################################
# 2. Create some random query value for row and column
########################################################################
log_debug('Now, let\'s create a random query value for row and column. ' +
'Say, row 10 and column 10.')
query_row = 10
query_col = 10
log_debug(f'query_row: {query_row}, query_col: {query_col}\n')
########################################################################
# 3. Create a message that is 5 at the query column and 0 elsewhere
########################################################################
log_debug('Let\'s create a query message vector, of size 500, that is 1 at ' +
'the query column and 0 elsewhere.')
query = msg.create_zero_message(500, 20, 1)
query.set_query_element(query_col, 0, 1)
log_debug(f'query vector: {query.message}')
########################################################################
# 4. Compute resulting message vector
########################################################################
log_debug('Let\'s compute the resulting message vector, which is the ' +
'dot product of the database and the query.')
result = db * query
log_debug(f'result = db * query: {result}\n')
########################################################################
# 5. Compute msg retrieved from the database
########################################################################
log_debug('Finally, let\'s compute the message retrieved from the database, ' +
'by getting the element at the query row and column.')
log_debug(f'db.get_query_element({query_row}, {query_col}): {db.get_query_element(query_row, query_col)}\n')
log_debug('This should be the same as the result message vector element at the query row.')
log_debug(f'result.get_query_element({query_row}, 0): {result.get_query_element(query_row, 0)}\n')
correct_retrieval = result.get_query_element(query_row, 0) == \
db.get_query_element(query_row, query_col)
log_info(f'Are they the same? Did we get a correct retrieval? {correct_retrieval}')
here is an example of an output:
we are ready to run our first full PIR experiment, where we build a query vector as in the previous experiment, but encrypt it using the secret key s
from the regev encryption scheme.
here is the source code:
def secret_key_regev_example() -> None:
"""Run a secret key Regev encryption and decryption PIR experiment."""
########################################################################
# 1. Represent a database as a square matrix, where the columns are
# the database entries and the rows are the database attributes
########################################################################
regev = Regev()
msg = Message()
log_debug('1. We start creating a random message vector ' +
'as a square m x m database with mod p')
db = msg.create_random_message(regev.p, regev.m, regev.m)
log_debug(f'db: {db}\n')
########################################################################
# 2. Create some random query value for row and column
########################################################################
log_debug('2. Now, let\'s create a random query value for row and column.')
query_row = 5
query_col = 5
log_debug(f'query_row: {query_row}, query_col: {query_col}\n')
########################################################################
# 3. Create query message vector
########################################################################
log_debug('3. Let\'s create a query message vector, of size m, that is 1 at ' +
'the query column and 0 elsewhere.')
query = msg.create_zero_message(regev.mod, regev.m, 1)
query.set_query_element(query_col, 0, 1)
log_debug(f'query vector: {query.message}\n')
########################################################################
# 4. Encrypt query message vector
########################################################################
log_debug('4. Let\'s encrypt the query message vector, calculating A and e.')
_, A, e = regev.create_message_setup()
# Here we could either use mod or p as the scaling factor.
s = regev.create_secret_key()
log_debug(f'The secret key s: {s}')
########################################################################
# 5. Scale query vector by delta = mod / p and db vector from p to mod
########################################################################
log_debug('5. We scale the query vector by delta=mod/p and db vector to 1/p')
scaled_query = query.calculate_scaling(regev.mod, regev.p, regev.mod)
scaled_db = db.calculate_scaling(1, 1, regev.mod)
log_debug(f'scaled_query: {scaled_query}')
log_debug(f'scaled_db: {scaled_db}\n')
########################################################################
# 6. Encryption by calculating B and ciphertext c
########################################################################
log_debug('6. Let\'s encrypt the query vector by calculating B and ciphertext c.')
c_query = regev.calculate_encryption(A, s, e, scaled_query)
log_debug(f'c_query: {c_query}\n')
########################################################################
# 7. Compute encrypted result
########################################################################
log_debug('7. Let\'s compute the encrypted result by calculating the dot ' +
'product of the encrypted query and the encrypted database.')
c_result = (scaled_db * c_query[0], scaled_db * c_query[1])
log_debug(f'c_result: {c_result}\n')
########################################################################
# 8. Calculate the decryption of the ciphertext c_result to find the
# result of the PIR query at the query_col th column
########################################################################
log_debug('8. Let\'s calculate the decryption of the ciphertext c_result')
m1 = regev.calculate_decryption(s, c_result)
log_debug(f'm1: {m1}\n')
########################################################################
# 9. Scale the result by p / mod
########################################################################
log_debug('9. Let\'s scale the result by p / mod.')
m1_scaled = m1.calculate_scaling(regev.p, regev.mod, regev.p)
log_debug(f'm1_scaled: {m1_scaled}\n')
########################################################################
# 10. The message vector m1_scaled should be equal to the db at the
# query vector query_row, query_col, showing that PIR works.
########################################################################
log_debug('10. The message vector m1_scaled should be equal to the db at ' +
'the query vector query_row, query_col, showing that PIR works.')
log_debug(f'db.get_query_element({query_row}, {query_col}): {db.get_query_element(query_row, query_col)}')
log_debug(f'm1_scaled.get_query_element({query_row}, 0): {m1_scaled.get_query_element(query_row, 0)}\n')
correct_retrieval = m1_scaled.get_query_element(query_row, 0) == \
scaled_db.get_query_element(query_row, query_col)
log_info(f'Are they the same? Did we get a correct retrieval? {correct_retrieval}\n')
here is an example of an output when you run this experiment:
if you would like to learn more about PIR, here are some canonical papers:
private information retrieval and its applications, sajani vithana et al.
practical private information retrieval, femi george olumofin
how practical is single-server private information retrieval?, sophia artioli
applying private information retrieval to lightweight bitcoin clients, kaihua oin et al.
computationally PIR with polylogarithmic communication, c. cachin et al.
single-database PIR with constant communication rate, c. gentry et al.
reducing the servers’ computation in PIR retrieval, a. beimel et al.
piano, extremely simple, single-Server PIR with sublinear server computation, m. zhou et al.