So far, we have studied symmetric-key cryptosystems to allow two
parties to communicate securely with each other when they share a secret
key. We have also studied how two parties can establish a shared secret
key using the Diffie-Hellman key exchange algorithm.
One of the limitations of symmetric-key cryptosystems is that a
shared secret key needs to be established for every pair of people who
want to communicate. If there are $n$
people who each want to communicate securely with each other, there are
$\frac{n(n-1)}{2}$ keys needed:
The first person needs $n - 1$
secret keys to communicate with everyone else.
The second person needs $n - 2$
secret keys to communicate with everyone else besides the first
person.
The third person needs $n - 3$
secret keys to communicate with everyone else besides the first two
people.
This pattern repeats, for a total of $(n - 1) + (n - 2) + \dots + 1 = \frac{n(n-1)}{2}$.
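As a quick sanity check on this count, the sketch below adds up the per-person key counts and compares the result with the closed form. The function names are hypothetical, chosen only for this illustration.

```python
def symmetric_keys_needed(n: int) -> int:
    """Count shared keys by summing (n - 1) + (n - 2) + ... + 1 + 0."""
    # Person i (counting from 1) needs n - i new keys.
    return sum(n - i for i in range(1, n + 1))


def closed_form(n: int) -> int:
    """The closed form n(n - 1)/2 for the same count."""
    return n * (n - 1) // 2


# The two ways of counting agree for every group size we try.
for group_size in range(1, 100):
    assert symmetric_keys_needed(group_size) == closed_form(group_size)

print(symmetric_keys_needed(10))  # 45
```

For example, 10 people need 45 shared secret keys, and the count grows quadratically as more people join.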
In this section, we’ll introduce a new form of cryptosystem called a
public-key cryptosystem, in which each person has two
keys: a private key known only to them, and a public key known to
everyone. We’ll see how to encrypt and decrypt messages in these
cryptosystems, how they reduce the number of keys needed for people to
communicate, and learn about the most widely used public-key
cryptosystem today, the RSA cryptosystem.
Public-key cryptography
A public-key cryptosystem is one where each party in
the communication generates a pair of keys: a private
(or secret) key, known only to them, and a public key
which is known to everyone. Let’s start with some intuition. Suppose Bob
wants to send Alice a message using a public-key cryptosystem. Since he
knows Alice’s public key, he uses that key to encrypt the message, and
sends her the ciphertext. Then, Alice uses her private key to
decrypt the ciphertext. Recall that in a symmetric-key cryptosystem, messages
are encrypted and decrypted with the same key; hence, the symmetry.
Public-key cryptosystems are a form of asymmetric cryptosystem,
since different keys are used for encryption and decryption.
Similarly, if Alice wants to send a message to Bob, she uses Bob’s
public key to encrypt the message, and Bob uses his private key to
decrypt it.
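More formally, we define a secure public-key
cryptosystem as a system with the following parts:
A set $\mathcal{P}$ of
possible original messages, called plaintext messages.
(E.g., a set of strings)
A set $\mathcal{C}$ of
possible encrypted messages, called ciphertext
messages. (E.g., another set of strings)
A set $\mathcal{K}_1$ of
possible public keys and a set $\mathcal{K}_2$ of possible private
keys.
A subset $\mathcal{K} \subseteq \mathcal{K}_1 \times \mathcal{K}_2$ of possible
public-private key pairs. Note that we use $\subseteq$ and not $=$ because not every public key can be
paired with every private key.
Two functions $\mathit{Encrypt} : \mathcal{K}_1 \times \mathcal{P} \to \mathcal{C}$ and $\mathit{Decrypt} : \mathcal{K}_2 \times \mathcal{C} \to \mathcal{P}$ that satisfy the following two properties:
(correctness) For all $(k_1, k_2) \in \mathcal{K}$ and $m \in \mathcal{P}$, $\mathit{Decrypt}(k_2, \mathit{Encrypt}(k_1, m)) = m$. (That is, if you encrypt and then decrypt
the same message with a public-private key pair, you get back the
original message.)
(security) For all $(k_1, k_2) \in \mathcal{K}$ and $m \in \mathcal{P}$, if an eavesdropper only knows the values of the
public key $k_1$ and the ciphertext $c = \mathit{Encrypt}(k_1, m)$
but does not
know $k_2$, it is computationally
infeasible to find the plaintext message $m$.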
The RSA cryptosystem
The Diffie-Hellman key exchange algorithm we studied in the last
section worked by relying on the hardness of the discrete logarithm
problem. This allowed Alice and Bob to exchange their computed values
publicly, without anyone being able to find the “secret” exponents
used to compute them.
The Rivest-Shamir-Adleman (RSA) cryptosystem works
with numbers as well, and relies on the surprising hardness of factoring
large integers. For a number with only five digits, you could write a small
Python program to find its prime factorization quite quickly. But what about
a number with 22 digits? In practice, RSA relies on the hardness of factoring integers
with hundreds of digits!
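To make the "small Python program" concrete, here is a minimal sketch of factoring by trial division. The function name `prime_factors` and the sample inputs are assumptions chosen for illustration; they are not taken from the text.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factorization of n >= 2 using trial division."""
    factors = []
    divisor = 2
    # Any composite n has a prime factor no larger than sqrt(n).
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever remains after removing the small factors is itself prime.
        factors.append(n)
    return factors


print(prime_factors(713))  # [23, 31]
```

The outer loop can run for roughly $\sqrt{n}$ iterations when $n$ is a product of two primes of similar size. That is instant for a five-digit number, already on the order of $10^{11}$ iterations for a 22-digit number, and hopeless for a number with hundreds of digits.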
Let’s see how RSA works.
Phase 1: Key generation
Each person in a public-key cryptosystem must first generate a
public-private key pair before they can communicate with anyone else.
(Think about this as choosing a valid key pair from the set $\mathcal{K}$ of possible public-private key pairs.) For RSA, we’ll put ourselves in Alice’s shoes
and see what she must do to generate a key pair.
1. First, Alice picks two distinct prime numbers $p$ and $q$.
2. Next, Alice computes the product $n = pq$.
3. Then, Alice chooses an integer $e \in \{2, 3, \dots, \varphi(n) - 1\}$ such that $\gcd(e, \varphi(n)) = 1$. Remember from 7.5 Modular
Exponentiation and Order that $\varphi(n)$ is the number of positive
integers up to $n$ that are coprime to
$n$.
4. Finally, Alice chooses an integer $d \in \{2, 3, \dots, \varphi(n) - 1\}$ that is the modular inverse
of $e$ modulo $\varphi(n)$.
That is, $de \equiv 1 \pmod{\varphi(n)}$.
That’s it! Alice’s private key is the tuple $(p, q, d)$, and her public key is the
tuple $(n, e)$. Alice shares her
public key with the world, but she never tells her private key to
anyone.
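The key generation phase can be sketched in Python as follows. This is an illustration under stated assumptions, not a production implementation: the function name `rsa_generate_key` is hypothetical, the caller is trusted to pass two distinct primes, and the `random` module is not suitable for real cryptographic use. It also relies on `pow(e, -1, phi_n)`, which computes a modular inverse in Python 3.8 and later.

```python
import math
import random


def rsa_generate_key(p: int, q: int) -> tuple[tuple[int, int, int], tuple[int, int]]:
    """Return (private_key, public_key) = ((p, q, d), (n, e)).

    Assumption: p and q are distinct primes with (p - 1) * (q - 1) > 2.
    This sketch does not check primality.
    """
    n = p * q
    phi_n = (p - 1) * (q - 1)  # phi(n) when n is a product of two distinct primes

    # Choose e in {2, ..., phi(n) - 1} with gcd(e, phi(n)) = 1.
    e = random.randint(2, phi_n - 1)
    while math.gcd(e, phi_n) != 1:
        e = random.randint(2, phi_n - 1)

    # d is the modular inverse of e modulo phi(n), so d * e % phi_n == 1.
    d = pow(e, -1, phi_n)

    return (p, q, d), (n, e)


private_key, public_key = rsa_generate_key(23, 31)
```

Because $e$ is chosen at random, repeated calls with the same primes will usually produce different, equally valid key pairs.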
Phase 2: Message encryption
Now suppose that Bob wants to send Alice a plaintext message $m$. For now we’ll treat the message as a
number between $1$ and $n - 1$, and will discuss string messages
later on in this section. Bob uses Alice’s public key $(n, e)$:
Bob computes the ciphertext $c = m^e \bmod n$ and sends it to Alice.
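In Python, this step is a single call to the built-in three-argument `pow`, which performs modular exponentiation without ever building the huge intermediate power. The function name `rsa_encrypt` and the sample key are assumptions made for this sketch.

```python
def rsa_encrypt(public_key: tuple[int, int], m: int) -> int:
    """Encrypt the number m with the public key (n, e).

    Assumes 1 <= m <= n - 1.
    """
    n, e = public_key
    return pow(m, e, n)  # m ** e % n, computed efficiently


# Illustrative public key (n, e) = (713, 547), where 713 = 23 * 31.
print(rsa_encrypt((713, 547), 42))  # 106
```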
Phase 3: Message decryption
Finally, Alice receives the ciphertext $c$. She uses her private key $(p, q, d)$ to decrypt the message:
Alice computes $m' = c^d \bmod n$. (Technically, Alice can recompute $n$ from the $p$ and $q$ of her private key. Another version of
RSA is actually just to store $(n, d)$ in
the private key, or use the $n$ from
her public key (which Alice also has access to) and keep only $d$ as the private key.)
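Decryption mirrors encryption, with the private exponent in place of the public one. In the sketch below, the function name `rsa_decrypt` and the sample values are assumptions for illustration; the modulus is recomputed from the two primes stored in the private key.

```python
def rsa_decrypt(private_key: tuple[int, int, int], c: int) -> int:
    """Decrypt the ciphertext c with the private key (p, q, d)."""
    p, q, d = private_key
    n = p * q  # recompute the modulus from the two primes
    return pow(c, d, n)  # c ** d % n, computed efficiently


# Illustrative private key (p, q, d) = (23, 31, 403); note 547 * 403 % 660 == 1.
print(rsa_decrypt((23, 31, 403), 106))  # 42
```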
An example
Before moving on, let’s see an example of a full use of the RSA
cryptosystem in action. Alice first needs to generate a public and
private key.
Alice chooses two distinct prime numbers $p$ and $q$.
The product is $n = pq$.
Next, Alice needs to choose an $e$ where $\gcd(e, \varphi(n)) = 1$. Alice calculates
that $\varphi(n) = (p - 1)(q - 1)$, and
chooses a value of $e$ to satisfy the
constraints on $e$.
Finally, Alice calculates the modular inverse to find the last part
of the private key ($de \equiv 1 \pmod{\varphi(n)}$), so she obtains $d$.
At the end of this phase:
Alice’s private key is $(p, q, d)$.
Alice’s public key is $(n, e)$.
Now suppose Bob wants to send the number $m$ to Alice. He computes the encrypted
number to be $c = m^e \bmod n$ and sends it to Alice.
Alice receives the number $c$
from Bob. She computes the decrypted number to be $m' = c^d \bmod n = m$. Voila!
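To see all three phases with concrete numbers, here is a short script. The specific values ($p = 23$, $q = 31$, $e = 547$, $m = 42$) are assumptions chosen for this sketch and are far too small to be secure.

```python
import math

# Phase 1: key generation (Alice)
p, q = 23, 31
n = p * q                       # 713
phi_n = (p - 1) * (q - 1)       # 660
e = 547
assert math.gcd(e, phi_n) == 1  # e satisfies the constraint
d = pow(e, -1, phi_n)           # 403, since 547 * 403 = 220441 = 334 * 660 + 1

# Phase 2: encryption (Bob), using Alice's public key (n, e)
m = 42
c = pow(m, e, n)                # 106

# Phase 3: decryption (Alice), using her private exponent d
m_prime = pow(c, d, n)          # 42

print(n, phi_n, d, c, m_prime)  # 713 660 403 106 42
```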
Proving the correctness of RSA
In the RSA cryptosystem, the encryption and decryption algorithms are
very straightforward. The “interesting” part is in how the
public-private key pair is generated to make the encryption and
decryption work! In this section, we’ll come to understand why the key
generation involves the steps that it does by proving that the RSA
algorithm works correctly, using all the number theory work we developed
in the previous chapter.
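Theorem. Let $(p, q, d)$ be a private key and $(n, e)$ its corresponding public key as generated
by “RSA Phase 1”. Let $m, c, m' \in \{1, \dots, n - 1\}$ be the original plaintext message, ciphertext,
and decrypted message, respectively, as described in the RSA encryption
and decryption phases.
Then $m' = m$ (i.e., the
decrypted message is the same as the original message).
Proof. Let $p, q, n, d, e, m, c, m'$ be defined as in the above definition of the RSA algorithm.
We need to prove that $m' = m$.
NOTE: For the rest of this proof, we will introduce one
additional assumption: that $\gcd(m, n) = 1$. (It is possible to prove this theorem without this
assumption, but we will not do so here.)
From the definition of $m'$ in
the decryption step, we know $m' \equiv c^d \pmod n$. From the definition of $c$ in the encryption step, we know $c \equiv m^e \pmod n$. Putting these
together, we have:
$$m' \equiv (m^e)^d \equiv m^{ed} \pmod n$$
So we need to prove that $m^{ed} \equiv m \pmod n$. From Steps 3 and 4 of the RSA key generation phase, we
know that $de \equiv 1 \pmod{\varphi(n)}$, i.e., there exists a positive integer $k$ such that $de = k \varphi(n) + 1$.
We also know that since $\gcd(m, n) = 1$, by Euler’s Theorem $m^{\varphi(n)} \equiv 1 \pmod n$.
Putting this all together, we have
$$m' \equiv m^{ed} \equiv m^{k \varphi(n) + 1} \equiv \left(m^{\varphi(n)}\right)^k \cdot m \equiv 1^k \cdot m \equiv m \pmod n$$
So $m' \equiv m \pmod n$.
Since we also know $m$ and $m'$ are between $1$ and $n - 1$, we can conclude that $m' = m$.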
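A proof is the real guarantee, but for a small key we can also check the claim exhaustively. The sketch below encrypts and decrypts every possible message; the key values ($p = 23$, $q = 31$, $e = 547$) are assumptions chosen for illustration. It also looks at the messages that share a factor with $n$, which the extra assumption in the proof leaves out.

```python
import math

p, q = 23, 31
n = p * q
phi_n = (p - 1) * (q - 1)
e = 547
d = pow(e, -1, phi_n)  # modular inverse of e modulo phi(n)


def round_trip(m: int) -> int:
    """Encrypt m with (n, e), then decrypt the result with d."""
    c = pow(m, e, n)
    return pow(c, d, n)


# Every message between 1 and n - 1 comes back unchanged.
all_round_trips_ok = all(round_trip(m) == m for m in range(1, n))

# The messages not covered by the assumption gcd(m, n) = 1 in the proof.
not_coprime = [m for m in range(1, n) if math.gcd(m, n) != 1]
not_coprime_ok = all(round_trip(m) == m for m in not_coprime)

print(all_round_trips_ok, len(not_coprime), not_coprime_ok)  # True 52 True
```

For this key, the 52 messages that are multiples of 23 or 31 also decrypt correctly, which is consistent with the remark that the theorem holds without the coprimality assumption.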
The security of RSA
Now that we’ve established the correctness of the RSA cryptosystem,
let’s now discuss its security. As we did for the Diffie-Hellman key
exchange, we’ll put ourselves in the role of an eavesdropper who is
trying to gain information about a secret message. Suppose we observe
Bob sending an encrypted message $c$
to Alice. In addition to the ciphertext, we also know Alice’s public key
$(n, e)$. Remember that “public” means that everyone
can see it, including possibly malicious users! What information
can we hope to gain about Bob’s original plaintext message?
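Approach 1: Reverse-engineering the message itself
First, we know from the RSA encryption phase that $c \equiv m^e \pmod n$, so if we know all
three of $c$, $e$, and $n$, can we determine the value of $m$?
No! Just as with the discrete logarithm problem we saw in 8.3 Computing Shared Secret
Keys, we don’t have an efficient way of computing “$e$-th roots” in modular arithmetic when the factorization of $n$ is unknown. Note that this is not quite
the discrete logarithm problem: there the unknown is the exponent, while here the unknown is the base $m$.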
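To get a feel for what an eavesdropper is up against, here is a sketch of the most naive attack: try every candidate message until one encrypts to the observed ciphertext. The function name `brute_force_root` and the sample values are assumptions for illustration only.

```python
from typing import Optional


def brute_force_root(c: int, e: int, n: int) -> Optional[int]:
    """Search for m in {1, ..., n - 1} with m ** e % n == c."""
    for m in range(1, n):
        if pow(m, e, n) == c:
            return m
    return None  # no candidate matched


# Works instantly for a toy modulus such as n = 713.
print(brute_force_root(106, 547, 713))  # 42
```

The loop may need close to $n$ iterations, so its running time grows exponentially with the number of digits of $n$. With a modulus of hundreds of digits, this search is not feasible.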
Approach 2: Determine Alice’s private key from her public key
Another approach we could take is to attempt to discover Alice’s
private key. Recall that $de \equiv 1 \pmod{\varphi(n)}$. So $d$ is
the inverse of $e$ modulo $\varphi(n)$, and we learned in the last
chapter that we can compute modular inverses, so this should be easy,
right?
Not so fast! We can compute the modular inverse of $e$ modulo $\varphi(n)$ when we know both $e$ and $\varphi(n)$, but right now we only know
$n$, not $\varphi(n)$.
So how do we compute $\varphi(n)$?
Well, we know that if $n = pq$
where $p$ and $q$ are distinct primes, then $\varphi(n) = (p - 1)(q - 1)$. But here is
the problem: it is not computationally feasible to factor $n$ when it is extremely large. This
is our second “computationally hard” problem in computer science, the
Integer Factorization Problem. Despite the best efforts
of computer scientists and mathematicians for centuries, there is no
known efficient general algorithm for factoring integers on classical computers, and it is this
fact that keeps the RSA private key secure.
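The sketch below shows why small primes offer no protection: once $n$ can be factored, the private exponent follows immediately. The function name `recover_private_exponent` and the sample public key are assumptions for illustration, and the trial-division loop is exactly the step that becomes infeasible for large $n$.

```python
def recover_private_exponent(n: int, e: int) -> int:
    """Recover d from the public key (n, e) by factoring n.

    Assumes n is a product of two distinct primes. Only practical for small n.
    """
    # Find the smaller prime factor p by trial division.
    p = 2
    while p * p <= n and n % p != 0:
        p += 1
    if p * p > n:
        raise ValueError("n has no nontrivial factor")
    q = n // p

    # With p and q known, phi(n) and then d are easy to compute.
    phi_n = (p - 1) * (q - 1)
    return pow(e, -1, phi_n)  # modular inverse, Python 3.8+


d = recover_private_exponent(713, 547)
print(d, pow(106, d, 713))  # 403 42
```

Everything after the factoring loop is fast even for enormous numbers, so the security of the private key rests entirely on that loop, or any cleverer replacement for it, being too slow.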