Reply To: Help with hillclimb

5th December 2024 at 9:09 pm #98784

upsidedown

Participant

> Just a thought, or more perhaps a question… For the more advanced challenges, where white space has been removed, the normal n-gram (bigram, trigram, etc.) frequencies, from plain English text, will be skewed by the ‘false’ letter groups generated from adjacent words. Do you produce your own counts, from text with white space removed?

I would expect poorer performance if you use bigrams or trigrams that are not skewed by the “false” letter groups, because the spaceless plaintext you are scoring is full of them. Plus, if you never count letter groups that span adjacent words, you miss out on every word that has fewer characters than your n-gram.
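To make the difference concrete, here is a minimal Python sketch that counts trigrams two ways: only inside individual words, and over the text with everything except letters removed. The function names and the sample sentence are my own illustrations, not anything from the challenge.

```python
import string
from collections import Counter

LETTERS = set(string.ascii_lowercase)

def ngrams(text, n):
    """Return every overlapping n-gram of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def within_word_counts(text, n):
    """Count only the n-grams that lie entirely inside one word."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(c for c in word if c in LETTERS)
        counts.update(ngrams(letters, n))
    return counts

def spaceless_counts(text, n):
    """Count n-grams after deleting everything except letters,
    so groups that span word boundaries are included."""
    letters = "".join(c for c in text.lower() if c in LETTERS)
    return Counter(ngrams(letters, n))

# Illustrative sample text (an assumption, any corpus would do).
sample = "it is a truth universally acknowledged"
inside = within_word_counts(sample, 3)
joined = spaceless_counts(sample, 3)

# Trigrams that only exist because adjacent words were run together.
cross_word = set(joined) - set(inside)
print(sorted(cross_word))
```

The short words "it", "is" and "a" contribute nothing to the within-word counts, but they do show up in the spaceless counts as part of groups like "tis" and "isa".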

I always match my corpus to whatever I expect the plaintext of the cipher I am attacking to be. So if there are spaces, or uppercase and lowercase letters, in the ciphertext, I will transform my corpus to include (only) those when computing log probabilities. For example:

– In 4B, there is just one letter case and no meaningful spaces, so I use 26 letters for scoring.
– In madness’ interrupted Vigenère cipher, there are no spaces in the ciphertext but there are spaces in the plaintext, so I use 26 letters and a space to score plaintexts.
– Sometimes you get uppercase/lowercase letters in the ciphertext which map to the same case in the plaintext, and you can distinguish between the cases when you score the plaintext (good for names & places from a custom corpus). The same applies to any other information that’s left in the ciphertext, like apostrophes, commas, full stops, etc.
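As a rough sketch of what matching the corpus to the plaintext alphabet can look like in Python: the helper names (`normalise`, `build_scorer`), the choice of quadgrams, the floor value for unseen n-grams and the toy corpus are all assumptions for illustration, not a description of my actual code.

```python
import math
import string
from collections import Counter

def normalise(text, alphabet, fold_case=True):
    """Reduce text to the scoring alphabet.
    Whitespace runs become one space if ' ' is in the alphabet,
    otherwise they are removed along with all other characters."""
    if fold_case:
        text = text.lower()
    kept = "".join(c for c in text if c in alphabet or c.isspace())
    joiner = " " if " " in alphabet else ""
    return joiner.join(kept.split())

def build_scorer(corpus, alphabet, n=4, fold_case=True):
    """Return a function giving the summed log10 n-gram probability
    of a candidate plaintext, using counts from the normalised corpus."""
    text = normalise(corpus, alphabet, fold_case)
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus is shorter than n after normalisation")
    log_probs = {g: math.log10(c / total) for g, c in counts.items()}
    floor = math.log10(0.01 / total)  # assumed penalty for unseen n-grams

    def score(candidate):
        return sum(log_probs.get(candidate[i:i + n], floor)
                   for i in range(len(candidate) - n + 1))
    return score

# Toy corpus for illustration only; a real one would be far larger.
corpus = "The quick brown fox jumps over the lazy dog. " * 20

score_26 = build_scorer(corpus, string.ascii_lowercase)        # letters only
score_27 = build_scorer(corpus, string.ascii_lowercase + " ")  # letters + space

print(score_26("thequickbrown"), score_26("qzxjkvwpqzxjk"))
```

For the mixed-case or punctuation cases, the same functions can be called with `fold_case=False` and a larger alphabet, e.g. `string.ascii_letters + " ',."`.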

You can also make the distribution of your corpus closer to the letter distribution of cipher challenge plaintexts by including the past plaintexts in your corpus.