Clojure for Cryptanalysis

Presenter Notes

About Me

  • Mark Champine, Security Architect @ Akamai
    • twitter: @mchampine
    • email: tiez.mfitdrjw@izitir.mvt
      • Decode it first! Your first guess is probably right :)
  • Software Engineer since the beginning of time
    • Way back: Computer Graphics, User Interface, Distributed Computing
  • Security professional, 15+ years. HP, IBM, Kronos, Akamai
  • Security Protocols (Kerberos, GSSAPI, SSL)
  • Security Standards (W3C Digital Signature, S/MIME, PKIX)
  • Worked on commercializing Elliptic Curve Cryptography at HP

Disclaimer: "IANAC" (I Am Not A Cryptologist)

Presenter Notes

This talk

Is Not

  • Advanced, sophisticated, or hardcore Cryptanalysis
  • Advanced, sophisticated, or hardcore Clojure

Is

  • An exploration of applying Clojure's strengths to a particular problem domain
  • Just scratching the surface of potentials of Clojure for Crypto

Presenter Notes

Cryptanalysis & Clojure

Taxonomy

  • Cryptology = { Cryptography, Cryptanalysis }
  • Cryptography = techniques for secure communication, e.g. encryption
  • Cryptanalysis = techniques used to breach cryptographic systems

Clojure Strengths re: Cryptology

Clojure is particularly good for the data-analysis phase of cryptanalysis (e.g. data transformations and statistical analysis)

See: clojure.math.combinatorics, core.matrix, hiphip (array), Incanter, Clojure Data Analysis Cookbook.

Less good for: Encryption or Brute-force methods (speed is #1 priority)

Presenter Notes

Warm up

"Frequency analysis is the basic tool for breaking most classical ciphers." -- Wikipedia

Dead Simple frequency analysis with Clojure

"frequencies" is your friend

1 (frequencies "this is not a drill, it is a screwdriver")
2 ==> {\space 8, \a 2, \c 1, \d 2, \e 2, \h 1, \i 6, \, 1, \l 2, \n 1, \o 1, \r 4, \s 4, \t 3, \v 1, \w 1}

With just a little massaging, you can get relative frequencies by percentage

1 (defn pct-frequencies [text]
2   (let [pct #(double (/ (* %1 100) %2))
3     histo (clojure.core/reverse (sort-by val (frequencies text)))]
4   (map #(assoc % 1 (pct (% 1) (count text))) histo)))
5 
6 (def fq (pct-frequencies "this is not a drill, it is a screwdriver"))
7 ==> ([\space 20.0] [\i 15.0] [\s 10.0] [\r 10.0] [\t 7.5] [\l 5.0] [\e 5.0] [\d 5.0] [\a 5.0] [\w 2.5] [\v 2.5] [\o 2.5] [\n 2.5] [\, 2.5] [\h 2.5] [\c 2.5])

Use Incanter bar-chart:

1 (view (bar-chart (map first fq) (map second fq)))

Presenter Notes

Incanter bar chart

Frequencies

Presenter Notes

Wikipedia letter frequencies chart

Frequencies

Presenter Notes

What then?

  • Perform frequency analysis on a ciphertext
    • E.g. just letters, case insensitive.
  • Also: frequency of
    • First letters in words
    • Letter pairs - e.g. "Th"
  • Correlate frequencies in ciphertext with frequencies in plaintext
  • This is effective on many "classic" ciphers. RSA? Not so much.

Presenter Notes

Cryptogram Letter Frequencies

Frequencies

Presenter Notes

Plaintext Letter Frequencies

Frequencies

Presenter Notes

Moving On

Presenter Notes

One Time Pad

  • http://en.wikipedia.org/wiki/One-time_pad
  • Unbreakable (not just maybe, it's mathematically proven)
    • http://en.wikipedia.org/wiki/Information-theoretic_security
  • The "Pad" is a perfectly random stream of bytes
  • Must be as long as the message (limits usefulness)
  • Encryption using a one-time pad is simply the exclusive-or of pad & message.
  • A ONE-TIME PAD MUST ONLY BE USED ONCE! (thus the name!)

If the one time pad is used more than once, we can notice correlations in the ciphertext and it's no longer unbreakable. That is the primary topic of this talk.

Presenter Notes

The Challenge

Decrypt a message...

32510ba98bb4abb8f71a1569a10cff655ccaee3fd8109858800802a5e109a86e0cc413fb6dd8eb2b7bf10c19afefae468a0287d02a4520

...having seen other messages encrypted with the same one-time pad.

315c4eeaa8b5f8aaf9174145bf43e1784b8fa00dc71d885a804e5ee9fa40b16349c146fb778cdf2d3aff021dfff5b403b510d0d0455468aeb98622b137dae857553ccd8883a7bc37520e06e515d22c954eba5025b8cc57ee59418ce7dc6bc41556bdb36bbca3e8774301fbcaa3b83b220809560987815f65286764703de0f3d524400a19b159610b11ef3e
234c02ecbbfbafa3ed18510abd11fa724fcda2018a1a8342cf064bbde548b12b07df44ba7191d9606ef4081ffde5ad46a5069d9f7f543bedb9c861bf29c7e205132eda9382b0bc2c5c4b45f919cf3a9f1cb74151f6d551f4480c82b2cb24cc5b028aa76eb7b4ab24171ab3cdadb8356f
32510ba9a7b2bba9b8005d43a304b5714cc0bb0c8a34884dd91304b8ad40b62b07df44ba6e9d8a2368e51d04e0e7b207b70b9b8261112bacb6c866a232dfe257527dc29398f5f3251a0d47e503c66e935de81230b59b7afb5f41afa8d661cb
32510ba9aab2a8a4fd06414fb517b5605cc0aa0dc91a8908c2064ba8ad5ea06a029056f47a8ad3306ef5021eafe1ac01a81197847a5c68a1b78769a37bc8f4575432c198ccb4ef63590256e305cd3a9544ee4160ead45aef520489e7da7d835402bca670bda8eb775200b8dabbba246b130f040d8ec6447e2c767f3d30ed81ea2e4c1404e1315a1010e7229be6636aaa
3f561ba9adb4b6ebec54424ba317b564418fac0dd35f8c08d31a1fe9e24fe56808c213f17c81d9607cee021dafe1e001b21ade877a5e68bea88d61b93ac5ee0d562e8e9582f5ef375f0a4ae20ed86e935de81230b59b73fb4302cd95d770c65b40aaa065f2a5e33a5a0bb5dcaba43722130f042f8ec85b7c2070
32510bfbacfbb9befd54415da243e1695ecabd58c519cd4bd2061bbde24eb76a19d84aba34d8de287be84d07e7e9a30ee714979c7e1123a8bd9822a33ecaf512472e8e8f8db3f9635c1949e640c621854eba0d79eccf52ff111284b4cc61d11902aebc66f2b2e436434eacc0aba938220b084800c2ca4e693522643573b2c4ce35050b0cf774201f0fe52ac9f26d71b6cf61a711cc229f77ace7aa88a2f19983122b11be87a59c355d25f8e4
32510bfbacfbb9befd54415da243e1695ecabd58c519cd4bd90f1fa6ea5ba47b01c909ba7696cf606ef40c04afe1ac0aa8148dd066592ded9f8774b529c7ea125d298e8883f5e9305f4b44f915cb2bd05af51373fd9b4af511039fa2d96f83414aaaf261bda2e97b170fb5cce2a53e675c154c0d9681596934777e2275b381ce2e40582afe67650b13e72287ff2270abcf73bb028932836fbdecfecee0a3b894473c1bbeb6b4913a536ce4f9b13f1efff71ea313c8661dd9a4ce
315c4eeaa8b5f8bffd11155ea506b56041c6a00c8a08854dd21a4bbde54ce56801d943ba708b8a3574f40c00fff9e00fa1439fd0654327a3bfc860b92f89ee04132ecb9298f5fd2d5e4b45e40ecc3b9d59e9417df7c95bba410e9aa2ca24c5474da2f276baa3ac325918b2daada43d6712150441c2e04f6565517f317da9d3
271946f9bbb2aeadec111841a81abc300ecaa01bd8069d5cc91005e9fe4aad6e04d513e96d99de2569bc5e50eeeca709b50a8a987f4264edb6896fb537d0a716132ddc938fb0f836480e06ed0fcd6e9759f40462f9cf57f4564186a2c1778f1543efa270bda5e933421cbe88a4a52222190f471e9bd15f652b653b7071aec59a2705081ffe72651d08f822c9ed6d76e48b63ab15d0208573a7eef027
466d06ece998b7a2fb1d464fed2ced7641ddaa3cc31c9941cf110abbf409ed39598005b3399ccfafb61d0315fca0a314be138a9f32503bedac8067f03adbf3575c3b8edc9ba7f537530541ab0f9f3cd04ff50d66f1d559ba520e89a2cb2a83
  • The one-time pad is a random byte sequence. Must be as long as the message.

Presenter Notes

First Attempt

Hint: “what happens when you xor with the space character?"

  • Tried XOR first ciphertext with the 2nd, 3rd, etc. looking for patterns.
  • Noticed some columns had lots of “normal alpha characters”.
  • Noticed that ASCII space (0x20) is the bit that differs between small and capital letters.
  • Laborious to do all the XORs one at a time.

(bit-xor ciphertext-1 ciphertext-N)
1.  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
2. ^R  ^P   L  ^F  ^S   N   W  \t  ^T  ^O  ^P   O  ^B   R  ^[  \n  ^D   B
3. ^C  \r   E   C  ^O  ^G   C  ^C   A  ^W  ^\  ^F  ^\   G   T  \t  ^G   O
4. ^C  \r   E   C  ^B  ^G   P  ^N  ^D  ^Q  ^@  \n  \n   T   T  ^X  ^W   O
5. ^N  \n   U   C  ^E  ^A   N   A  ^U   C  ^C  ^N  ^\   T   T  ^\  \n   ^@
6. ^C  \r   E  ^Q  ^D   N   A  ^T  ^D   C  ^@  ^X  ^]  ^@  ^@  ^Q  ^U   E 
7. ^C  \r   E  ^Q  ^D   N   A  ^T  ^D   C  ^@  ^X  ^]  ^@  ^@  ^Q  ^U   E 
8. ^@  ^@  ^@  ^@  ^@  ^@  ^@  ^U  ^D  ^F   T  ^[  ^Z   E   T  ^X  \n   I 
9. ^V   E  \b  ^S  ^S  ^G   V  ^G  ^U  ^F   Y  ^D  ^W   Y   ]   H   E   E 
10. w   1   H  ^F   A   -   O  \b  ^B  \n  ^G  \n   R   o  \f  ^N  \n   R

;; So: likely spaces in 1st ct at char: 3, 7, 14, 18, ...

Presenter Notes

Clojure to the Rescue

  • Excellent sequence handling
  • Mapping over sequences is a primary problem solving technique
  • REPL for exploration/experimentation
  • Java Interop for basic type conversions

Examples

Map functions (e.g. XOR) over byte sequences

1 (map bit-xor bs1 bs2)

Slicing – extract "columns" in data

1 (defn vslice [s n] (map #(nth s %) n)

Also

  • Permutation (1xN, NxN)
  • map, loop, for
  • Also good matrix libs available

Presenter Notes

The Basic Approach

  • The encrypted messages are “ciphertext”
  • One-time pad encryption is simply performing an XOR with the message and they key (i.e. the “one-time pad” (OTP))
  • If two messages were encrypted with the same OTP, then XOR of the encrypted messages is the same as XOR of the non-encrypted messages – because the OTP gets XORed twice – cancelling it out.
  • Make use of the fact that Capital letters differ from small letters only in their 6th bit, or 0x20. Ox20 is ASCII for the space character, therefore, XOR 0x20 just flips its case.

Presenter Notes

The Basic Approach (cont.)

  • If we XOR 2 messages and the result is a Character [A-Za-z] then it’s more likely that one of the messages’ non-encrypted characters is a space!
  • If we XOR the encrypted space with Ox20 then we find out what the OTP key value is at that position.
  • We can test a given ciphertext byte against all the other ciphertext bytes at the same position (I'll call this a vertical slice).
  • If XOR indicates a likely space character when tried against several ciphertexts, we’re pretty sure it’s an actual space and can assign a value to the OTP key at that position.
  • Do this for every position within the collected ciphertexts in order to get as many known keys as possible.

Presenter Notes

Coding the Solution

Utility: Convert hex-encoding to/from byte sequences

 1 (defn hex->bytes 
 2   "convert a hex encoded string into a list of integers" 
 3   [s] 
 4   (let [ctpairs (map #(apply str %) (partition 2 s))] 
 5         (map #(Integer/parseInt % 16) ctpairs)))
 6 
 7 crypto1.core> (hex->bytes "68697468657265") 
 8 (104 105 116 104 101 114 101)
 9 
10 (defn bytes->hex 
11   "convert from an array of ints to a hex encoded string" 
12   [b] 
13   (let [newmsghex (map #(Integer/toHexString %) b)] 
14     (apply str (map #(if (= 1 (count %)) (str "0" %) %) newmsghex))))
15 
16 crypto1.core> (bytes->hex '(104 105 116 104 101 114 101)) 
17 "68697468657265"

Presenter Notes

Coding the Solution

Utility: Convert ASCII to/from hex-encoding

 1 (defn hex->ascii 
 2   "convert a hex encoded string to ascii" 
 3   [s] 
 4   (->> (hex->bytes s) 
 5        (map char) 
 6        (apply str)))
 7 
 8 crypto1.core> (hex->ascii "68697468657265") 
 9 "hithere”
10 
11 (defn ascii->hex 
12   "convert an ascii string to hex encoding" 
13   [s] 
14   (->> (seq s) 
15        (map int) 
16        bytes->hex))
17 
18 crypto1.core> (ascii->hex "hithere") 
19 "68697468657265"

Presenter Notes

Coding the Solution

Get input ciphertext into an easily manipulable form:

1 crypto1.core> (hex->bytes "32510ba9 . . .")
2 (50 81 11 169 . . .)

map hex->bytes over an array of input ciphertext strings to create an array of sequences.

1 (map hex->bytes hex-encoded-ciphertext)

((49 92 78 234 168 181 248 170 249 23 65 69 191 67 225 120 75 143 160...
 (35 76 2 236 187 251 175 163 237 24 81 10 189 17 250 114 79 205 162 1...
 (50 81 11 169 167 178 187 169 184 0 93 67 163 4 181 113 76 192 187 12...
 (50 81 11 169 170 178 168 164 253 6 65 79 181 23 181 96 92 192 170 13...
 (63 86 27 169 173 180 182 235 236 84 66 75 163 23 181 100 65 143 172...
 (50 81 11 251 172 251 185 190 253 84 65 93 162 67 225 105 94 202 189...
 (50 81 11 251 172 251 185 190 253 84 65 93 162 67 225 105 94 202 189...
 (49 92 78 234 168 181 248 191 253 17 21 94 165 6 181 96 65 198 160 12...
 (39 25 70 249 187 178 174 173 236 17 24 65 168 26 188 48 14 202 160...
 (70 109 6 236 233 152 183 162 251 29 70 79 237 44 237 118 65 221 170...)

Presenter Notes

Identify likely key values

 1 (defn likely-space? 
 2   "xor nth element of vertical-slice with all the other elements,
 3   and return a bool to indicate that the number xors yielding a
 4   letter [A-Za-z] has crossed a threshold"
 5   [cbyte vertical-slice] 
 6   (let [xors (map bit-xor (repeat cbyte) vertical-slice) 
 7         isalpha? #(or (<= 65 % 90) (<= 97 % 122)) 
 8         alphas (count (filter isalpha? xors))] 
 9     (>= alphas 6)))
10 
11 (defn vertical-slice [s n] (map #(nth % n) s)) 
12 ;;where “s” is a list of converted ciphertext strings.

Given a vertical slice, xor the nth element with every other one, counting the alpha characters that result. If there are a lot of alphas, the original message character is likely to be a space.

Presenter Notes

Find likely keys for one vertical slice

 1 (defn likely-keys-per-slice
 2   "s: a list of converted ciphertexts
 3    n: slice to analyze
 4    Call likely-space? on each element in turn, and return the key
 5    for any likely space character in the original text"
 6   [s n]
 7   (let [vs (map #(nth % n) s) ;; vertical slice
 8         xor2 #(if (nil? %) nil (bit-xor % 0x20))] 
 9     (->> (for [tb vs :when (likely-space? tb vs)] tb)
10          first
11          xor2)))

Given the list of converted ciphertexts, select the nth slice and call likely-space? on each element in turn, collecting any likely space characters, and using XOR to reveal the OTP key value.

Presenter Notes

Get the likely keys for all vertical slices

1 (def ks (map #(likely-keys-per-slice (map hex->bytes ctx) %) (range 95)))`

Turn the hex-encoded strings (ctx) into to a list of byte sequences. Get the likely keys for every vertical slice (for slice indexes [0..95])

At this point, the only remaining task is to XOR the likely keys (ks) with the unknown ciphertext in order to reveal the secret message.

ks: (102 57 110 137 201 219 216 203 152 116 53 42 205 99 nil 16 46 175 nil 120 ...

1 (defn decrypt [likelykeys ct]
2   (->>
3    (map #(if (nil? %) 0x5F (bit-xor %1 %2)) likelykeys (hex->bytes ct))
4    (map char)
5   (apply str)))
6 
7 (decrypt ks secret-message)

"The Bosson Clo_ure_Group fill meet y Akamai on Maw 8th"

Presenter Notes

Recovering the Exact Key

step 1, play "wheel of fortune" on the approx message. (I.e. guess it)

(decrypt ks secret-message))
==>          "The Bosson Clo_ure_Group fill meet  y Akamai on Maw 8th"
(def realmsg "The Boston Clojure Group will meet at Akamai on May 8th")

step 2, turn the guessed message into a byte sequence

(def realmsg-bytes (map int realmsg))
==> (84 104 101 32 66 111 115 116 111 110 32 67 108 111 106 117 114 101 32 71 114 111 117 112 32 119 105 108 108 32 109 101 101 116 32 97 116 32 65 107 97 109 97 105 32 111 110 32 77 97 121 32 56 116 104)

step 3, xor the guessed message with the encrypted real message to yield the new real-key guess

;;crypto2.core> secret-message
==> "32510ba98bb4abb8f71a1569a10cff655ccaee3fd8109858800802a5e109a86e0cc413fb6dd8eb2b7bf10c19afefae468a0287d02a4520"

(def secret-message-bytes (hex->bytes secret-message))
==> (50 81 11 169 139 180 171 184 247 26 21 105 161 12 255 101 92 202 238 63 216 16 152 88 128 8 2 165 225 9 168 110 12 196 19 251 109 216 235 43 123 241 12 25 175 239 174 70 138 2 135 208 42 69 32)

(def real-key (map bit-xor secret-message-bytes realmsg-bytes))
==> (102 57 110 137 201 219 216 204 152 116 53 42 205 99 149 16 46 175 206 120 170 127 237 40 160 127 107 201 141 41 197 11 105 176 51 154 25 248 170 64 26 156 109 112 143 128 192 102 199 99 254 240 18 49 72)

Presenter Notes

Recovering the Exact Key cont.

Step 4. Try out our new "real" key on on the encrypted message.

By definition it will give back the exact guessed message

(decrypt real-key secret-message)
==> "The Boston Clojure Group will meet at Akamai on May 8th"

Try the new key on other messages encrypted with that same key

(pprint (map #(decrypt real-key (nth ctx %)) (range 10)))
==> ("We can factor the number 15 with quantum computers. We "
==>  "Euler would probably enjoy that now his theorem becomes"
==>  "The nice thing about Keeyloq is now we cryptographers c"
==>  "The ciphertext produced by a weak encryption algorithm "
==>  "You don't want to buy a set of car keys from a guy who "
==>  "There are two types of cryptography - that which will k"
==>  "There are two types of cyptography: one that allows the"
==>  "We can see the point where the chip is unhappy if a wro"
==>  "A (private-key)  encryption scheme states 3 algorithms,"
==>  " The Concise OxfordDictionary (2006) defines crypto as")

This process can be repeated. Choose a longer ciphertext to extend the "real key" discovery.

Presenter Notes

End

Presenter Notes