Re: Working on a cryptogram decryption program, need some advice.
- From: "paulmd@xxxxxxx" <paulmd@xxxxxxx>
- Date: Mon, 13 Aug 2007 20:52:12 -0700
On Aug 13, 8:19 pm, Richard Heathfield <r...@xxxxxxxxxxxxxxx> wrote:
pau...@xxxxxxx said:
To amuse myself, and to keep on my C programming skills, I'm writing a
simple substitution cypher decrypter. At the moment, All it does is
is simple-shift cyphers (aka Ceaser, aka rotX) (but it does generally
print out the correct answer, instead of printing out all 26
possibilities, by using a small wordlist). I'm looking for advice on
how to do letter frequency analysis. Just in general.
If your cryptogram retains spaces, it's worth looking at word patterns
(and keeping a pattern dictionary). To illustrate what I mean, here are
some patterns, which will reward a few seconds' study:
WORD PATTERN
big abc
word abcd
repeat abcbde
repeated abcbdebf
mississippi abccbccbddb
If you found the space-delimited pattern abccbccbddb in your ciphertext,
that would just about guarantee that your plaintext contained
Mississippi (but watch the replies to this thread for some
counter-examples!). And that would give you your shift right there.
This is not so much "letter frequency analysis" as "letter pattern
analysis", though.
Letter frequency analysis is normally done something like this:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <ctype.h>
#include <assert.h>
struct frequency_map_
{
unsigned char data;
unsigned long count;
};
typedef struct frequency_map_ frequency_map;
int FrequencyAnalyse(frequency_map *pfm, size_t nelems, FILE *input,
unsigned long *counter)
{
int ch = 0;
assert(nelems == UCHAR_MAX + 1);
assert(counter != NULL);
while((ch = getc(input)) != EOF)
{
++*counter; /* keep track of total number of characters */
++pfm[ch].count; /* store the count */
pfm[ch].data = ch; /* make sure we know which character this count
is for */
}
return ferror(input);
}
int FrequencyCompare(const void *vf1, const void *vf2)
{
const frequency_map *f1 = vf1;
const frequency_map *f2 = vf2;
return (f1->count < f2->count) - (f1->count > f2->count);
}
int FrequencyDisplay(frequency_map *pfm, size_t nelems, unsigned long
counter, int suppress_zero)
{
size_t i = 0;
do
{
if(pfm[i].count > 0 || !suppress_zero)
{
printf("Code %3d (", pfm[i].data);
if(isprint(pfm[i].data))
{
printf(" %c", pfm[i].data);
}
else
{
printf("!!");
}
printf(") %6lu (%6.2f%%)\n", pfm[i].count, pfm[i].count * 100.0 /
counter);
}
} while(++i < nelems);
return ferror(stdout);
}
int main(int argc, char **argv)
{
int rc = 1;
if(argc > 1)
{
FILE *fp = fopen(argv[1], "rb");
if(fp != NULL)
{
frequency_map fm[UCHAR_MAX + 1] = {0};
unsigned long counter = 0;
rc = FrequencyAnalyse(fm, sizeof fm / sizeof fm[0], fp, &counter);
fclose(fp);
if(0 == rc)
{
qsort(fm, sizeof fm / sizeof fm[0], sizeof fm[0],
FrequencyCompare);
rc = FrequencyDisplay(fm, sizeof fm / sizeof fm[0], counter, 1);
}
}
}
else
{
fputs("Specify a file to frequency-analyse\n", stderr);
}
return rc;
}
You can use the above program to analyse the text corpus of your choice,
however large (within reason, that is - but it will easily handle far
more text than you might reasonably want to prepare for it). That will
give you a reference frequency table against which to compare your
cryptogram frequency table. That should crack the cryptogram wide open
straight away, especially if it's just a shift cipher.
I've got the shift cyphers. :) Only 26 of those, as opposed to 26!
for the regular cryptograms.
But remember to compare like with like. If your corpus has (unencrypted)
spaces, that's great, but if it doesn't, leave them out of your
reference table. If your cryptogram is ALL UPPER CASE, either force
your corpus text to upper case too, or simply combine lower and upper
case results in the table. And so on.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
This is great! I'll see what I can do with it.
.
- References:
- Working on a cryptogram decryption program, need some advice.
- From: paulmd@xxxxxxx
- Re: Working on a cryptogram decryption program, need some advice.
- From: Richard Heathfield
- Working on a cryptogram decryption program, need some advice.
- Prev by Date: Re: Enigma 1448 - Birthday magic
- Next by Date: Re: Dividing a Cake
- Previous by thread: Re: Working on a cryptogram decryption program, need some advice.
- Next by thread: Re: Working on a cryptogram decryption program, need some advice.
- Index(es):
Relevant Pages
|