# Re: Working on a cryptogram decryption program, need some advice.

On Aug 13, 8:19 pm, Richard Heathfield <r...@xxxxxxxxxxxxxxx> wrote:
pau...@xxxxxxx said:

To amuse myself, and to keep on my C programming skills, I'm writing a
simple substitution cypher decrypter. At the moment, All it does is
is simple-shift cyphers (aka Ceaser, aka rotX) (but it does generally
print out the correct answer, instead of printing out all 26
possibilities, by using a small wordlist). I'm looking for advice on
how to do letter frequency analysis. Just in general.

If your cryptogram retains spaces, it's worth looking at word patterns
(and keeping a pattern dictionary). To illustrate what I mean, here are
some patterns, which will reward a few seconds' study:

WORD PATTERN
big abc
word abcd
repeat abcbde
repeated abcbdebf
mississippi abccbccbddb

If you found the space-delimited pattern abccbccbddb in your ciphertext,
that would just about guarantee that your plaintext contained
Mississippi (but watch the replies to this thread for some
counter-examples!). And that would give you your shift right there.

This is not so much "letter frequency analysis" as "letter pattern
analysis", though.

Letter frequency analysis is normally done something like this:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <ctype.h>
#include <assert.h>

struct frequency_map_
{
unsigned char data;
unsigned long count;

};

typedef struct frequency_map_ frequency_map;

int FrequencyAnalyse(frequency_map *pfm, size_t nelems, FILE *input,
unsigned long *counter)
{
int ch = 0;
assert(nelems == UCHAR_MAX + 1);
assert(counter != NULL);

while((ch = getc(input)) != EOF)
{
++*counter; /* keep track of total number of characters */
++pfm[ch].count; /* store the count */
pfm[ch].data = ch; /* make sure we know which character this count
is for */
}

return ferror(input);

}

int FrequencyCompare(const void *vf1, const void *vf2)
{
const frequency_map *f1 = vf1;
const frequency_map *f2 = vf2;

return (f1->count < f2->count) - (f1->count > f2->count);

}

int FrequencyDisplay(frequency_map *pfm, size_t nelems, unsigned long
counter, int suppress_zero)
{
size_t i = 0;
do
{
if(pfm[i].count > 0 || !suppress_zero)
{
printf("Code %3d (", pfm[i].data);
if(isprint(pfm[i].data))
{
printf(" %c", pfm[i].data);
}
else
{
printf("!!");
}
printf(") %6lu (%6.2f%%)\n", pfm[i].count, pfm[i].count * 100.0 /
counter);
}
} while(++i < nelems);

return ferror(stdout);

}

int main(int argc, char **argv)
{
int rc = 1;

if(argc > 1)
{
FILE *fp = fopen(argv[1], "rb");
if(fp != NULL)
{
frequency_map fm[UCHAR_MAX + 1] = {0};
unsigned long counter = 0;
rc = FrequencyAnalyse(fm, sizeof fm / sizeof fm[0], fp, &counter);
fclose(fp);
if(0 == rc)
{
qsort(fm, sizeof fm / sizeof fm[0], sizeof fm[0],
FrequencyCompare);
rc = FrequencyDisplay(fm, sizeof fm / sizeof fm[0], counter, 1);
}
}
}
else
{
fputs("Specify a file to frequency-analyse\n", stderr);
}
return rc;

}

You can use the above program to analyse the text corpus of your choice,
however large (within reason, that is - but it will easily handle far
more text than you might reasonably want to prepare for it). That will
give you a reference frequency table against which to compare your
cryptogram frequency table. That should crack the cryptogram wide open
straight away, especially if it's just a shift cipher.

I've got the shift cyphers. :) Only 26 of those, as opposed to 26!
for the regular cryptograms.

But remember to compare like with like. If your corpus has (unencrypted)
spaces, that's great, but if it doesn't, leave them out of your
reference table. If your cryptogram is ALL UPPER CASE, either force
your corpus text to upper case too, or simply combine lower and upper
case results in the table. And so on.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999

This is great! I'll see what I can do with it.

.

## Relevant Pages

• Re: Working on a cryptogram decryption program, need some advice.
... If your cryptogram retains spaces, it's worth looking at word patterns ... If you found the space-delimited pattern abccbccbddb in your ciphertext, ... This is not so much "letter frequency analysis" as "letter pattern ... int FrequencyCompare ...
(rec.puzzles)
• Re: What will be the next MAJOR programming language for commercial use?
... Variants can be considered as such a common OO pattern that it is worth ... Int of int ... As lists are built-in, they can be decomposed in a pattern just as they are ... features found in other state-of-the-art programming languages. ...
(comp.lang.misc)
• Re: Question
... and i assign it to int which let's say is 32 bit long, ... Type conversions in C preserve value, not bit pattern. ... signed integer type, the destination will have value -1. ... Sign-extension preserves the value if negative numbers are ...
(comp.lang.c)
• Re: Is there a better way of getting the following output: ?
... Roedy pointed out the pattern you should be trying to solve. ... That means you only need to solve for the number of stars and 1/2 ... nested loops but they don't look nested since I put the second loop into ... int y=1; ...
(comp.lang.java.programmer)
• Re: inserting all possible letter combinations into a phrase
... int is intended to be the natural integer unit for the ... You indicate you intend to process user input instead of command line ... static int done(const char *pattern) { ... current "expanded" pattern. ...
(comp.lang.c.moderated)