Why is encrypting with the same one-time-pad not good?

https://cs.stackexchange.com/questions/349

16-10-2019
|

سؤال

To encrypt a message $m_1$ with a one-time-pad key $k$ you do $Enc(m_1,k) = m_1 \oplus k$.

If you use the same $k$ to encrypt a different message $m_2$ you get $Enc(m_2,k) = m_2 \oplus k$, and if you perform Xor of the two ciphertext you get $$( m_1 \oplus k) \oplus ( m_2 \oplus k) = m_1 \oplus m_2$$

so, OK, there is some information leakage becuse you learn $m_1 \oplus m_2$, but why is it not secure? I have no way to learn (say) $m_1$ unless I know $m_2$. So why is it wrong to use $k$ twice??

المحلول

I have no way to learn (say) $m_1$ unless I know $m_2$.

That is exactly the problem - if you re-use the same key, and someone has access to one message you encrypted in both plaintext and encrypted form, they can use that to find your key: $$ (m_2 \oplus k) \oplus m_2 = k $$

As an alternative scenario, if you use the same key over and over, the attackers may be able to guess just pieces of various encrypted message, and each successful guess reveals a piece of the key $k$, so that over time more and more of the key is revealed.

This general strategy for breaking a cryptosystem is known as a known plaintext attack. Many systems, like AES and RSA, are believed to be secure against these attacks. But a one-time pad becomes completely insecure against them unless a new pad is used for every encryption, which is why they are named "one-time pads".

نصائح أخرى

It's insecure precisely due to the reason you mention - there is some information leakage.

Basically, if you have any assumptions about plaintexts (english text, files with known structure, etc), it leads to an easy statistical analysis. Probably using it twice doesn't change the practicality of the attack significantly, but using it many times with a non-random plaintext, eventually reveals enough information to recover the key.

Finally, if you have the ability to use it only twice, you also have the ability to use it only once - the restriction is that these one-time-pads are not to be used potentially unknown and over time, damaging number of times.

At one extreme, if you're aware (known plain-text) that the $m_1$ is just a null-string of the length of the pad, you have handed the attacker the key prior to computing $m_2$.

Known plain-text attacks are fairly common, it's reasonably easy to coerce an encryption mechanism to encrypt something you know a-priori. If not, you can usually make reasonable statistical assumptions.

Suppose you use a one-time pad with English text, and you use it twice. Now, you can get $(m_1 \oplus k) \oplus (m_2 \oplus k) = m_1 \oplus m_2$.

English text has an entropy of something like 1.3 bits per letter. The XOR of two messages has 2.6 bits per letter. There are 26 letters in the alphabet, so this gives $\log_2 26 = 4.7$ bits per letter. This means that, even if you take the XOR of two English texts, there is still theoretically enough information to decode both of these texts. Furthermore, I am pretty sure that I have read that cryptanalysts are able to do it in practice.

If you want to use a one-time pad twice, you need to compress your message first. And even then, if you don't use a nearly-perfect compression algorithm, and you use the one-time pad multiple times, there will be enough entropy left to theoretically recover the messages. I don't know how hard it would be in practice.

The thing is that even if you know nothing about $m_1$ and $m_2$, you might be able to recover them both from $m_1 \oplus m_2$.

Actually, for many cases, it is very simple. Here's a simple visualization.

Here's an intuitive way of representing the approach without recourse to mathematics. Let's say you have two encrypted messages which have been encrypted by the same one time pad.

Make a guess at a word or phrase that may be contained in one of the messages. Let's say the phrase "Weather report"
Starting with message 1, assume that "Weather Report" occurs at the first letter position.
Back-calculate the first 14 characters of the one time pad.
Decrypt the first 14 characters of the message 2 using the back-calculated OTP.
If the plaintext looks like gobble-di-gook, then go back to step 2 and repeat at the 2nd letter position. However, if you get meaningful text (for example "Good Morning I" then congratulations, you've worked out the first 14 characters of the OTP (and the first 14 characters of each letter)
If you get to the end of message 1 without throwing up anything other than random letters, then you can conclude that the phrase "Weather Report" does not occur in message 1. Go back to step 1 with a different phrase such as "Dear Colonel"

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى cs.stackexchange