on simple private information retrieval experiments

tl; dr

today i go over a tool i wrote to learn and run simple experiments on PIR (a fascinating research subject in cryptography).

here is the source code.

this particular research is highly based on the work of alexandra henzinger on SimplePIR/DoublePIR and janmajaya mall’s zuzalu demo.

if you are advanced in the subject already, here is an excellent presentation by alexandra (at the simons institute, a foundation from quant-father jim simons, which also sponsored part of my research in my phd in physics):


👾 today’s outline

0000. introduction to pir and this work
0001. magick-py, a tool for simple PIR 
0010. setting up magick-py 
0011. experiment I: linear key regev encryption with sampled error
0100. experiment II: linear key regev encryption with a scaled message
0101. experiment III: proving the scheme is additive homomorphic
0110. experiment IV: proving the scheme supports plaintext inner product
0111. experiment V: a very simple pir setup without encryption
1000. experiment VI: a full secret key regev PIR experiment
1001. concluding thoughts

🎶 today’s mood


0000. introduction to private information retrieval, lattice-based cryptography, and fully homomorphic encryption schemes

what’s PIR

private information retrieval (PIR) was first introduced in 1995 by b. chor et al. and refers to the ability to query a database without revealing which item is looked up or whether it exists, by using cryptography primitives.

it’s a pretty cool technology. once PIR becomes less expensive or prohibitive (i.e., cheaper computation with a small cipher, as PIR inherently has a high cost for server-side computation), these are some of the possible applications that could utilize the protocol:

  • searching IP databases: when filing a new IP, the author must search the IP database to check that no previous entry significantly overlaps with their invention. PIR could allow the search to be performed without leaving search terms on the query log of the IP database.

  • real-time asset quotes: investors interested in a particular asset often monitor the market to determine when to purchase. PIR could allow their interest to be confidential.

  • safe browsing and private oracles, checking passwords over breached databases (or any type of credentials), certificate transparency (CT) checks, certificate revocation checks, among many others.

by the way, PIR schemes are generally divided into single-server schemes and multiple-server schemes (which allows you to remove the trust from a subset of the servers).

in this research, we will look at simple single-server PIR protocol setups, where a server holds an embedded database D represented by a n x n square matrix (whose elements are under a constant modulo), and a client wants to privately read the ith database item (Di, with n elements) without letting the server learn about i.


interlude: lattice-based cryptography and group theory

the subject of PIR is also a subset of the broad topic of lattice-based cryptography, which refers to a series of quantum-resistant cryptographic primitives that involve lattices, either in their construction or in the security proof.

💡 Over an n-dimensional vector space, a lattice is an infinite set of points represented by a collection of vectors.

if you need an introduction to quantum cryptography, check out my (uber-old and outdated) def con ‘16 talk:

💡 in group theory, a lattice in the R^n is an infinite set of points in this space in which coordinate-wise addition or subtraction of two points produces another point, so every point in the space is within some maximum distance of any lattice point.

a lattice can also be described as a free abelian (commutative) group of dimension n, spanning the vector space R^n; or the symmetry group of a discrete translation symmetry in n directions.

if you would like to learn more about group theory, especially in a language for physicists, here is a free and open-source graduate book i wrote back when i was a student.

.an excerpt from my book, addressing homomorphism for covering groups in lie algebra.
.an excerpt from my book, addressing homomorphism for covering groups in lie algebra.

homomorphic encryption schemes

before we start, we need to review the concept of homomorphic encryption.

suppose a server that can XOR some client’s data. the client would send their cipher c0, obtained from their plaintext data m0 and their key k0:

c = m0 ⌖ k0

homomorphism is the property that if a client sends two encrypted messages, c1 and c2 (from messages m0 and m1, respectively), the server can return c1 ⌖ c2 so the client can retrieve m0 ⌖ m1.

additive homomorphism occurs when, given two ciphertexts (a0, c0) and (a1, c1), their sum (a0 + a1, c0 + c1) decrypts to the sum of the plaintexts (provided that the error remains sufficiently small).

partially homomorphic encryption can be easily achieved as it accepts the possibility that not all data is encrypted (or homomorphic) through other operations (such as multiplication).

fully homomorphic encryption (FWE), which is much harder to achieve, would occur if a server operated on encrypted data without seeing ANY of its content.

💡 in a more formal definition, homomorphic encryption is a form of encryption with evaluation capability for computing over encrypted data without access to the secret key, i.e., supporting arbitrary computation on ciphers. fully homomorphic encryption could be defined as the evaluation of arbitrary circuits of multiple types of (unbounded depth) gates (relevant to zero-knowledge proof setups). vitalik’s talks about homomorphic encryption in this post.


learning with errors (LWE)

a subsequent important progress in the the field was a 2005 seminal paper, where oded regev introduced the first lattice-based public-key encryption scheme, and the learning with errors (LWE) problem.

the LWE problem relies on the hardness of distinguishing between a message with added noise and a random sample. it can be thought of as a search in a (noisy) modular set of equations whose solutions can be very difficult to solve. in other words, given m samples of coefficients (bi, ai) in the linear equation bi = <ai, s> + ei, with the error ei sampled from a small range [-bound, bound], finding the secret key s is "hard".

note, however, that LWE-based encryption schemes have a significant drawback due to noise growth. as the ciphertexts produced by these schemes are noisy encodings of the plaintext, homomorphic operations between ciphertexts increase the magnitude of the noise. if the noise exceeds a certain threshold, the correctness of the decryption may no longer hold. despite this problem, regev encryption can be very efficient for PIR as it is additively homomorphic.

in the past decades, regev's security proof and the LWE scheme's efficiency have been the subject of intense research among cryptographers, including craig gentry's thesis on the first fully homomorphic encryption scheme (2009).


a simple implementation of the PIR protocol

a PIR protocol aims to design schemes that satisfy privacy and correctness constraints while achieving the minimum possible download cost.

💡 the download cost of a PIR scheme is defined as the total number of bits downloaded by the user from all the databases, normalized by the message size. the PIR rate is defined as the reciprocal of the PIR download cost.

one possible implementation approach is to choose a suitable polynomial and then have a single server preprocess the data. this preprocessing depends only on the database D and the public parameters of the regev encryption scheme, so that the server can reuse the work across many queries from many independent clients.

after the preprocessing step, to answer a client's query, the server must compute only roughly N 32-bit integer multiplications and additions on a database of N bytes. the catch is that the client must download a hint matrix about the database contents after this preprocessing.

therefore, a simple serve PIR scheme would comprise two phases:

  • the offline phase, with pre-computations and the exchange of hints, and

  • the online phase, with the query processing on the server and response decoding on the client.

the practicality of PIR-based applications is primarily impacted by the query processing time and the hint exchange phase. the theoretical query size grows as the square root of the number of field elements representing the database. for example, the largest query size for a database of 32 GB is around 600 KB.


why PIR is still not feasible

although modern PIR schemes require surprisingly little communication and the protocol works well enough at smaller scales, the time needed to scan it grows proportionally as the database grows. for bigger databases, the process becomes prohibitively inefficient (fetching a database record grows only polylogarithmically with the number of records, N).

after preprocessing the database, the server can answer a query in time sublinear in N. thus, the current hard limit on the throughput of PIR schemes is the ratio between the database size and the server time to answer a query (the speed with which the PIR server can read the database from memory).

finally, it's important to note that PIR protocols do not ensure data integrity or authentication. an authenticated PIR scheme could combine an unauthenticated multi-server PIR scheme with a standard integrity-protection mechanism, such as merkle trees.

in this approach, PIR servers download the data from the blockchain to construct PIR databases. for each database, the PIR server creates a description file (usually called a manifest file). the user collects all available block headers and fetches the manifest files from the PIR servers to query the PIR database later efficiently.


"simple and fast single-server private information retrieval", by alexandra henzinger et. al (2022)

  • this paper introduces a design for SimplePIRthe fastest single-server PIR scheme known to date.

  • the security is held under a Learning with Errors scheme that requires no polynomial arithmetic or fast fourier transforms. regev encryption gives a secret-key encryption scheme that is secure under the LWE assumption.

  • to answer a client’s query, the server performs fewer than one 32-bit multiplication and one 32-bit addition per database byte, achieving 10 GB/s/core server throughput.

  • the first approach to query a 1 GB database demands the client to first download a 121 MB "hint" about the database contents. then, the client can make any number of queries, each requiring 242 KB of communication.

  • the second approach shrinks the hint to 16 MB. then, the following queries demand 345 KB of communication.

  • finally, the scheme is applied, together with a novel data structure for approximate set membership, to private auditing in certificate transparency. the results can be compared to google chrome’s current approach, with 16 MB of downloads per month, and 150 bytes per TLS connection.


this work: my illustration of simplePIR

in our code, the single-server database is represented by a square matrix (m x m), while a query is a vector filled by 0s except at the asking row and column (m x 1). any result should have the same dimension as the query vector (i.e., the space is reduced to the size of the column where the data is located).

the server retrieves the queried item by:

  1. looping over every column and multiplying their values to the value in the same row of the query vector, and

  2. adding the values found in each column in its own matrix.

a secret key regev encryption scheme using sampled errors to reproduce LWE is then built on top of the ideas above. privacy is guaranteed by checking that fully homomorphic encryption is held with respect to addition in this setup (i.e., additive homomorphism).


the rest of this article

now that we have some context, we are ready to do some coding and math.

in the following sections, i will be illustrating a simple single-server PIR setup with LWE, built on several progressive experiments:

  1. running a simple linear key regev encryption experiment with sampled error

  2. running a simple linear key regev encryption experiment with a scaled message

  3. proving that the regev scheme is additive homomorphic

  4. proving that the regev scheme supports plaintext inner product

  5. running a very simple pir setup (without encryption)

  6. running the full secret key regev PIR experiment


0001. magick-py, a CLI tool for simple PIR

we will now be building the parts of the tool i wrote to help understand how PIR can work. i call it magick:


primitive classes for message and regev encryption

before we jump into the experiments, let’s take a look at the main parts of magic-py’s source code:

more specifically, we want to look at the primitive classes.

this is how Message() in message.py looks:

class Message:

    def __init__(self, mod=None, rows=None, cols=None, message=None):
        """Initialize a message vector"""

        self.mod = mod
        self.rows = rows
        self.cols = cols
        self.message = message


    ############################
    #      Private methods 
    ############################

    def _check_dimensions(self, other_msg) -> None:
        """Check the dimensions of two matrices"""

        if self.rows != other_msg.rows or self.cols != other_msg.cols:
                log_error(f'Matrices have different dimensions:')
                exit_with_error(f'{self.rows}x{self.cols} and {other_msg.rows}x{other_msg.cols}')

    def __add__(self, vector):
        """Add two matrices"""

        self._check_dimensions(vector)
        
        for i in range(len(self.message)):
            self.message[i] = (self.message[i] + vector.message[i]) % self.mod

        return self

    def __sub__(self, vector):
        """Subtract two matrices"""

        self._check_dimensions(vector)

        for index in range(len(self.message)):
            self.message[index] = (self.message[index] - vector.message[index]) % self.mod
        
        return self

    def __mul__(self, vector):
        """Multiply two matrices"""

        this_vector = [0] * (self.rows * vector.cols)
        for i in range(self.rows):
            for j in range(self.cols):
                for k in range(vector.cols):
                    this_vector[i * vector.cols + k] = (this_vector[i * vector.cols + k] + \
                                                       (self.message[i * self.cols + j] * \
                                                       vector.message[j * vector.cols + k])) % self.mod
        
        return Message(self.mod, self.rows, vector.cols, this_vector)
    
    def __eq__(self, vector):
        """Check if two matrices are equal"""

        return (self.rows == vector.rows) and \
               (self.cols == vector.cols) and \
               (self.message == vector.message)

    def __repr__(self):
        """Print the message vector"""

        return f'\nRows: {self.rows}\nCols: {self.cols}\nVector: {self.message}\n'


    ############################
    #     Public methods 
    ############################

    def calculate_scaling(self, numerator, denominator, this_mod):
        """Calculate the scaled message vector"""

        this_vector = [0] * (self.rows * self.cols)

        for i in range(len(self.message)):
            this_vector[i] = round((numerator * self.message[i]) / denominator) % this_mod

        return Message(this_mod, self.rows, self.cols, this_vector)

    def set_query_element(self, row, col, value) -> None:
        """Set the value at a particular index"""

        self.message[row * self.cols + col] = value
        
    def get_query_element(self, row, col) -> int:
        """Get the value at a particular index"""

        return self.message[row * self.cols + col]


    ############################
    #     Static methods 
    ############################

    @staticmethod
    def create_random_message(mod, rows, cols): 
        """Create a random message vector"""

        return Message(mod, rows, cols, [random.randint(0, mod - 1) for _ in range(rows * cols)])

    @staticmethod
    def create_zero_message(mod, rows, cols): 
        """Create a zero message vector"""

        return Message(mod, rows, cols, [0 for _ in range(rows * cols)])

    @staticmethod
    def calculate_sample_error(bound, mod, rows, cols): 
        """Create a random error message vector"""

        return Message(mod, rows, cols, [sample_error(bound) % mod for _ in range(cols * rows)])

and this is how Regev() in regev.py looks:

class Regev():

    def __init__(self):

        self.mod = None
        self.n = None
        self.m = None
        self.p = None
        self.bound = None
        self._load_env_parameters()

    ############################
    #      Private methods
    ############################

    def _load_env_parameters(self) -> None:
        """Load environment variables"""

        env_vars = load_config()
        self.mod = int(env_vars['mod'])
        self.n = int(env_vars['n'])
        self.m = int(env_vars['m'])
        self.p = int(env_vars['p'])
        self.bound = int(env_vars['bound'])


    ############################
    #      Public methods
    ############################

    def print_results(self, m0, m1, m0_string, m1_string) -> None:
        """Print the results of the experiment"""

        if m0 == m1:
            log_info(f'Original msg was successfully retrieved!\n')
        else:
            log_error(f'Original msg was not retrieved.')

        log_info(f'{m0_string}: {m0}\n')
        log_info(f'{m1_string}: {m1}\n')
        log_info(f'Parameters: \nmod: {self.mod} \nn: {self.n} \nm: {self.m} \np: {self.p} \nbound: [-{self.bound}, {self.bound}] \n')

    def print_noise_growth(self, m0, m1, noise_growth) -> None:
        """Print the noise growth"""

        log_info(f'Correct decryption for Delta / 2: {(self.mod / self.p) / 2}? {m0 == m1}')
        log_info(f'Noise growth: {noise_growth.message[0]}')

    def create_secret_key(self, this_mod=None, msg_n=1):
        """Create a secret key vector"""

        if this_mod is None:
            this_mod = self.mod

        return  Message.create_random_message(this_mod, self.n, msg_n)

    def create_message_setup(self, this_m=None, this_n=None, this_mod=None, msg_n=None):
        """Create a message vector setup"""
        
        if this_mod is None:
            this_mod = self.mod
        
        if this_m is None:
            this_m = self.m
        
        if this_n is None:
            this_n = self.n
        
        if msg_n is None:
            msg_n = 1

        # message vector of size `m`, where each element has a modulus `mod`
        m0= Message.create_random_message(this_mod, self.m, msg_n)

        # public    
        A = Message.create_random_message(self.mod, self.m, self.n)

        # error vector
        e = Message.calculate_sample_error(self.bound, self.mod, self.m, msg_n)

        return m0, A, e

    ############################
    #      Static methods
    ############################

    @staticmethod
    def calculate_encryption(A, s, e, m0):
        """
            Encrypt this message with a simple `B = A * s + e + m0`, 
            where `s` is the secret and `e` is the error vector.
            Set the cipher as the tuple c = (B, A).
        """

        B = (A * s) + e + m0
        return (B, A)

    @staticmethod
    def calculate_decryption(s, c):
        """ 
            Calculate the decryption of a ciphertext, given c
            and a secret, such that m1 = m0 + e.
        """

        B = c[0]
        A = c[1]
        return B - (A * s)

in the subsequent sections, we review each experiment's code and results. but first, let me show you how you can set up the tool.


0010. setting up magick-py

install the requirements in a virtual environment with:

python3 -m venv venv
source venv/bin/activate
make install_deps

create a .env file and set the environment variables and parameters.

LWE parameters are:

  • size of msg vector, m and n

  • message’s modulo mod and p

  • a work around the sampling errors (i.e., the standard variation sigma of a Gaussian distribution with zero mean sigma) by setting a bound range for them

you should also take a look at lattice-estimator to learn how to make security estimates for LWE parameters.

finally, install magick with:

make install

0011. experiment I: simple linear encryption and decryption of a msg vector with a sampled error vector

in this simple experiment of learning with error (LWE), we operate our message vector over a ring modulo mod, so some information is lost. luckily, gaussian elimination can still be used to recover the original message vector as it works over a ring modulo mod.

the method for this experiment can be found at experiments/simple_encryption.py:

def linear_secret_key_regev_encryption_with_error() -> None:
    """ 
        This method runs a secret key Regev encryption and decryption 
        experiment for a msg vector with a sampled error vector.

        In this simple example of learning with error (LWE), we operate
        our message vector over a ring modulo mod, such that some
        information is lost. This is not a problem since gaussian elimination
        can be used to recover the original message vector (i.e., it works
        over a ring modulo mod).

        We represent the message vector m0 of size m where each element is
        modulus mod. The cipertext c is B = A * s + e + m0, which can be
        decrypted as c = (B, A).
    """

    ########################################################################
    # 1. Key generation
    ########################################################################
    regev = Regev()
    m0, A, e = regev.create_message_setup()
    s = regev.create_secret_key()

    ########################################################################
    # 2. Encryption by calculating B and ciphertext c
    ########################################################################
    c = regev.calculate_encryption(A, s, e, m0)

    ########################################################################
    # 3. Calculate the decryption of the ciphertext c
    ########################################################################
    m1 = regev.calculate_decryption(s, c)

    ########################################################################
    # 4. The message vector m1 should be equal to m0 plus the error vector e
    ########################################################################
    regev.print_results(m0, m0 + e, 'm0', 'm0 + e')

simply put, these are the steps of this experiment:

  1. represent a message vector m0 of size m, where each element has modulo mod.

  2. encrypt this message with a simple B = A * s + e + m0, where s is the secret and e is an error vector.

  3. set the ciphertext as the tuple c = (B, A).

  4. decrypt c = (B, A) for a given s, such that m1 = m0 + e.

here is an example of an output:


0100. experiment II: secret key regev encryption by scaling a message vector

in this another simple example of learning with error (LWE), we lose information on the least significant bits by adding noise, i.e., by scaling the message vector (before adding it to encryption) with:

delta = mod / p

then, during the decryption, we scale the message vector back by:

1 / delta

the scaling ensures that m is in the highest bits of the message vector, without losing information by adding the error vector e.

consequently, the message m0 vector has each element modulo p (not mod), where p < q.

the scaled message is

m0_scaled = m0 * delta = m0 * mod / p

the ciphertext c is

B = A * s + e + m0_scaled

which can be decrypted as

 c = (B, A)

here is the source code:

def linear_secret_key_regev_encryption_scaled() -> None:
    """ 
        This method runs a secret key regev encryption and decryption experiment
        for a msg vector with a scaled msg vector.

        In this another simple example of learning with error (LWE), we loose
        information on least significant bits by adding noise, i.e., by scaling 
        the message vector by delta = mod / p before adding it to encryption. 
        Then, during the decryption, we scale the message vector by 1 / delta.

        The scaling ensures that m is in the highest bits of the message vector,
        without losing information with the addition of the error vector e.

        Now, the message m0 vector has each element module p (not mod), where
        p < q. The scaled message is now m0_scaled = m0 * delta = m0 * mod / p.
        The cipertext c is B = A * s + e + m0_scaled, which can be decrypted as
        c = (B, A), i.e., m0 = (B - A * s) / delta = (delta * m0 + e) / delta.
    """

    ########################################################################
    # 1. Key generation
    ########################################################################
    regev = Regev()
    m0, A, e = regev.create_message_setup(this_mod = regev.p)
    s = regev.create_secret_key()

    ########################################################################
    # 2. Scale message vector by delta = mod / p
    ########################################################################
    scaled_m0 = m0.calculate_scaling(regev.mod, regev.p, regev.mod)

    ########################################################################
    # 3. Encryption by calculating B and ciphertext c
    ########################################################################
    c = regev.calculate_encryption(A, s, e, scaled_m0)

    ########################################################################
    # 4. Calculate the decryption of the ciphertext c
    ########################################################################
    m1 = regev.calculate_decryption(s, c)

    ########################################################################
    # 5. Scale m1 vector by 1/ delta = p / mod
    ########################################################################
    scaled_m1 = m1.calculate_scaling(regev.p, regev.mod, regev.p)

    ########################################################################
    # 6. The message vector m0 should be equal to m1
    ########################################################################
    regev.print_results(m0, scaled_m1, 'm0', 'scaled m1')

here is an example of an output:


0101. experiment III: proving that the secret key regev encryption scheme supports additive homomorphism

we saw that additive homomorphism means that if c0 is the encryption of m1 under secret key s and c2 is the encryption of m2 under the same secret key s, then c0 + c1 is the encryption of m0 + m1 under s.

for a large number of ci, noise can be introduced from error, so the correctness of the results will depend on the values of m, n, mod, and p, such that:

|sum ei| < mod / (2 * p)

here is the source code for this experiment:

def additive_homomorphism() -> None:
    """ 
        This method proves that the secret key Regev encryption scheme is
        additive homomorphic, i.e., if c0 encrypts m0 and c1 encrypts m1,
        both under s, then c0 + c1 decrypts to m0 + m1. 
    """

    ########################################################################
    # 1. Key generation for two independent messages m0 and m1
    ########################################################################
    
    r0 = Regev()
    m0, A0, e0 = r0.create_message_setup(this_mod = r0.p)

    r1 = Regev()
    m1, A1, e1 = r1.create_message_setup(this_mod = r1.p)

    s = r0.create_secret_key()

    ########################################################################
    # 3. Scale message vectors by delta = mod / p
    ########################################################################
    scaled_m0 = m0.calculate_scaling(r0.mod, r0.p, r0.mod)
    scaled_m1 = m1.calculate_scaling(r1.mod, r1.p, r1.mod)

    ########################################################################
    # 4. Encryption by calculating B and ciphertext c for each message
    ########################################################################
    c0 = r0.calculate_encryption(A0, s, e0, scaled_m0)
    c1 = r1.calculate_encryption(A1, s, e1, scaled_m1)

    ########################################################################
    # 5. Add the ciphertexts, with c2 = c0 + c1
    ########################################################################
    c2 = (c0[0] + c1[0], c0[1] + c1[1])

    ########################################################################
    # 6. Decrypt the sum of the ciphertexts
    ########################################################################
    r2 = Regev()
    m2 = r2.calculate_decryption(s, c2)

    ########################################################################
    # 5. Scale m1 vector by 1/ delta = p / mod
    ########################################################################
    scaled_m2 = m2.calculate_scaling(r2.p, r2.mod, r2.p)

    ########################################################################
    # 6. The sum of the message vectors m0 and m1 should be equal to m2
    ########################################################################
    r2.print_results(m0 + m1, scaled_m2, 'm0 + m1', 'm2')

here is an example of an output:


0110. experiment IV: proving that the secret key regev encryption scheme supports plaintext inner product

this experiment shows that given a cipher c and a message vector m0, c -> c1 can be transformed such that it also encrypts the inner product of m0 with a plaintext vector k of size m and element modulo p.

because of noise growth with the vector k, fine-tuning the initial parameters is crucial for the message to be successfully retrieved. as you will see in the snippet below, to guarantee correct decryption, the following must hold:

k * e0 < mod / (2 * p)

here is the source code:

def plaintext_inner_product() -> None:
    """ 
        This method proves that the secret key Regev encryption scheme is
        supports plaintext inner product, i.e., if c0 encrypts m0 and c1
        encrypts m1, both under s, then c0 * c1 decrypts to m0 * m1.
    """

    ########################################################################
    # 1. Key generation
    ########################################################################
    r0 = Regev()
    m0, A, e = r0.create_message_setup(this_mod = r0.p)
    s = r0.create_secret_key(this_mod = r0.p)

    ########################################################################
    # 2. Scale message vector by delta = mod / p
    ########################################################################
    scaled_m0 = m0.calculate_scaling(r0.mod, r0.p, r0.mod)

    ########################################################################
    # 3. Encryption by calculating B and ciphertext c
    ########################################################################
    c = r0.calculate_encryption(A, s, e, scaled_m0)

    ########################################################################
    # 4. Calculate a plaintext vector transposed k and then scale it by
    #    delta = mod / p
    ########################################################################
    rk = Regev()
    k = m0.create_random_message(rk.p, 1, rk.m )
    scaled_k = m0.calculate_scaling(1, 1, rk.mod)

    ########################################################################
    # 5. Calculate the noise growth 
    ########################################################################
    noise_growth = scaled_k * e

    ########################################################################
    # 6. Define the ciphertext of the inner product of m0 and k
    ########################################################################
    c1 = (scaled_k * c[0], scaled_k * c[1])

    ########################################################################
    # 7. Decrypt the ciphertext of the inner product of m0 and k
    ########################################################################
    m1 = r0.calculate_decryption(s, c1)

    ########################################################################
    # 8. Scale m1 vector by 1/ delta = p / mod
    ########################################################################
    m1_scaled = m1.calculate_scaling(r0.p, r0.mod, r0.p)

    ########################################################################
    # 9. Scale back the plaintext vector k by 1/ delta = p / mod
    ########################################################################
    scaled_scaled_k = scaled_k.calculate_scaling(1, 1, rk.p)

    ########################################################################
    # 10. The message vector m1 scaled should be equal scaled k * m0
    ########################################################################
    r0.print_results(m1_scaled, scaled_scaled_k * m0, 'scaled m1', 'scaled k * m0')

    ########################################################################
    # 11. Print results on noise, decryption fails when noise > delta / 2 
    ########################################################################
    rk.print_noise_growth(m1_scaled, scaled_scaled_k * m0, noise_growth)

here is an example of a successful decryption:

now, changing MOD_P from 10 to 100, we see a failed decryption case, showing how noise growth can interfere with the results:


0111. experiment V: run an intro tutorial on how PIR should work (without encryption)

in this experiment, we get the first taste of how PIR works, but without encryption yet.

to do that, we define our server’s database by a square vector of size m x m, with each entry modulo p.

we query a value at a specific row r and col c in plaintext, by creating a query vector of size m x 1 that is filled with 0, except for the desired column index c.

we then show that computing the dot product of the database vector to the query vector will give a result vector with all rows in the column index c, where you can retrieve the row r.

here is the source code, located at experiments/simple_pir.py:

def no_encryption_example() -> None:
    """
        Run a tutorial presenting the logic of a PIR experiment 
        without encryption.
    """

    ########################################################################
    # 1. Represent a database as a square matrix, where the columns are 
    #    the database entries and the rows are the database attributes
    ########################################################################
    log_debug('In this PIR tutorial, we represent a database as a square matrix, ' + 
        'where columns are the database entries and rows are the database attributes.')
    
    log_debug('We start the class Message(), creating a random database ' +
                    'with mod 500, and 20 entries and 20 attributes.\n')

    msg = Message()
    db = msg.create_random_message(500, 20, 20)
    
    log_debug(f'db: {db}\n')

    ########################################################################
    # 2. Create some random query value for row and column
    ########################################################################
    log_debug('Now, let\'s create a random query value for row and column. ' +
                                            'Say, row 10 and column 10.')
    
    query_row = 10
    query_col = 10

    log_debug(f'query_row: {query_row}, query_col: {query_col}\n')

    ########################################################################
    # 3. Create a message that is 5 at the query column and 0 elsewhere
    ########################################################################
    log_debug('Let\'s create a query message vector, of size 500, that is 1 at ' +
                                            'the query column and 0 elsewhere.')
    query = msg.create_zero_message(500, 20, 1)
    query.set_query_element(query_col, 0, 1)

    log_debug(f'query vector: {query.message}')

    ########################################################################
    # 4. Compute resulting message vector
    ########################################################################
    log_debug('Let\'s compute the resulting message vector, which is the ' +
                               'dot product of the database and the query.')
    
    result = db * query
    log_debug(f'result = db * query: {result}\n')

    ########################################################################
    # 5. Compute msg retrieved from the database
    ########################################################################
    log_debug('Finally, let\'s compute the message retrieved from the database, ' + 
                    'by getting the element at the query row and column.')
    log_debug(f'db.get_query_element({query_row}, {query_col}): {db.get_query_element(query_row, query_col)}\n')

    log_debug('This should be the same as the result message vector element at the query row.')
    log_debug(f'result.get_query_element({query_row}, 0): {result.get_query_element(query_row, 0)}\n')

    correct_retrieval = result.get_query_element(query_row, 0) == \
                        db.get_query_element(query_row, query_col)

    log_info(f'Are they the same? Did we get a correct retrieval? {correct_retrieval}')

here is an example of an output:


1000. experiment VI: run a simple PIR experiment with secret key regev encryption

we are ready to run our first full PIR experiment, where we build a query vector as in the previous experiment, but encrypt it using the secret key s from the regev encryption scheme.

here is the source code:

def secret_key_regev_example() -> None:
    """Run a secret key Regev encryption and decryption PIR experiment."""
    ########################################################################
    # 1. Represent a database as a square matrix, where the columns are 
    #    the database entries and the rows are the database attributes
    ########################################################################
    
    regev = Regev()
    msg = Message()

    log_debug('1. We start creating a random message vector ' + 
                                 'as a square m x m database with mod p')
    
    db = msg.create_random_message(regev.p, regev.m, regev.m)
    log_debug(f'db: {db}\n')

    ########################################################################
    # 2. Create some random query value for row and column
    ########################################################################
    log_debug('2. Now, let\'s create a random query value for row and column.')
   
    query_row = 5
    query_col = 5

    log_debug(f'query_row: {query_row}, query_col: {query_col}\n')

    ########################################################################
    # 3. Create query message vector
    ########################################################################
    log_debug('3. Let\'s create a query message vector, of size m, that is 1 at ' +
                                            'the query column and 0 elsewhere.')                

    query = msg.create_zero_message(regev.mod, regev.m, 1)
    query.set_query_element(query_col, 0, 1)

    log_debug(f'query vector: {query.message}\n')

    ########################################################################
    # 4. Encrypt query message vector
    ########################################################################
    log_debug('4. Let\'s encrypt the query message vector, calculating A and e.')
   
    _, A, e = regev.create_message_setup()

    # Here we could either use mod or p as the scaling factor.
    s = regev.create_secret_key()

    log_debug(f'The secret key s: {s}')

    ########################################################################
    # 5. Scale query vector by delta = mod / p and db vector from p to mod
    ########################################################################
    log_debug('5. We scale the query vector by delta=mod/p and db vector to 1/p')

    scaled_query = query.calculate_scaling(regev.mod, regev.p, regev.mod)
    scaled_db = db.calculate_scaling(1, 1, regev.mod)

    log_debug(f'scaled_query: {scaled_query}')
    log_debug(f'scaled_db: {scaled_db}\n')

    ########################################################################
    # 6. Encryption by calculating B and ciphertext c
    ########################################################################
    log_debug('6. Let\'s encrypt the query vector by calculating B and ciphertext c.')
    c_query = regev.calculate_encryption(A, s, e, scaled_query)

    log_debug(f'c_query: {c_query}\n')

    ########################################################################
    # 7. Compute encrypted result
    ########################################################################
    log_debug('7. Let\'s compute the encrypted result by calculating the dot ' +
                 'product of the encrypted query and the encrypted database.') 

    c_result = (scaled_db * c_query[0], scaled_db * c_query[1])

    log_debug(f'c_result: {c_result}\n')

    ########################################################################
    # 8. Calculate the decryption of the ciphertext c_result to find the
    #    result of the PIR query at the query_col th column
    ########################################################################
    log_debug('8. Let\'s calculate the decryption of the ciphertext c_result')                 
    m1 = regev.calculate_decryption(s, c_result)

    log_debug(f'm1: {m1}\n') 

    ########################################################################
    # 9. Scale the result by p / mod
    ########################################################################
    log_debug('9. Let\'s scale the result by p / mod.')
    m1_scaled = m1.calculate_scaling(regev.p, regev.mod, regev.p)

    log_debug(f'm1_scaled: {m1_scaled}\n')

    ########################################################################    
    # 10. The message vector m1_scaled should be equal to the db at the 
    # query vector query_row, query_col, showing that PIR works.
    ########################################################################
    log_debug('10. The message vector m1_scaled should be equal to the db at ' +
               'the query vector query_row, query_col, showing that PIR works.')  

    log_debug(f'db.get_query_element({query_row}, {query_col}): {db.get_query_element(query_row, query_col)}') 
    log_debug(f'm1_scaled.get_query_element({query_row}, 0): {m1_scaled.get_query_element(query_row, 0)}\n')            

    correct_retrieval = m1_scaled.get_query_element(query_row, 0) == \
                        scaled_db.get_query_element(query_row, query_col)

    log_info(f'Are they the same? Did we get a correct retrieval? {correct_retrieval}\n')

here is an example of an output when you run this experiment:

did i convince you that PIR (or, more properly, math) feels just like magick?


1001. concluding thoughts

if you would like to learn more about PIR, here are some canonical papers:


◻️♄

Subscribe to go outside labs
Receive the latest updates directly to your inbox.
Mint this entry as an NFT to add it to your collection.
Verification
This entry has been permanently stored onchain and signed by its creator.