10 April 2013

PTArticlegen.com: Behind the Scenes

by averagesecurityguy

The other day I put up a site called ptarticlegen.com that uses Markov chains to create a random penetration testing article. If you've never heard of a Markov chain, check out the Wikipedia article. Put simply, a Markov chain is generated by making a random choice based on the current state of a system and using that choice to determine the next state. The next state depends only on the current state, not on the sequence of choices that led up to it.
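To make the "depends only on the current state" idea concrete, here is a minimal sketch of a generic Markov chain in Python. The two states and their probabilities are purely hypothetical and are not part of ptarticlegen.com; the point is that `next_state` looks at nothing but the current state.

```python
import random

# Hypothetical transition table: for each state, the possible next states
# and the probability of moving to each one. Purely illustrative values.
TRANSITIONS = {
    "sunny": (["sunny", "rainy"], [0.8, 0.2]),
    "rainy": (["sunny", "rainy"], [0.4, 0.6]),
}


def next_state(current, rng=random):
    """Pick the next state using only the current state (the Markov property)."""
    states, weights = TRANSITIONS[current]
    return rng.choices(states, weights=weights, k=1)[0]


def walk(start, steps, rng=random):
    """Return the list of states visited during a random walk of the chain."""
    path = [start]
    for _ in range(steps):
        # Only path[-1] is consulted; earlier history is ignored.
        path.append(next_state(path[-1], rng))
    return path


if __name__ == "__main__":
    print(walk("sunny", 10, random.Random(42)))
```

Passing a seeded `random.Random` makes a walk reproducible, which is handy when testing.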

Markov chains can be used to generate sentences by taking a word pair and choosing the next word from a list of words that typically follow that word pair. But first, a set of source data has to be analyzed to find the word pairs and build, for each pair, a list of the words that follow it.

As an example, consider these two sentences:
The fox jumped over the spoon.
The cow jumped over the moon.

The word pairs and the list of following words would look like this:

(The, fox) - [jumped]
(fox, jumped) - [over]
(jumped, over) - [the, the]
(over, the) - [spoon, moon]
(The, cow) - [jumped]
(cow, jumped) - [over]
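The analysis step can be sketched in a few lines of Python. This is a minimal illustration, not the actual ptarticlegen.com code; the function name `build_chain` and the choice to strip the trailing period and split on whitespace are my own assumptions. Duplicate followers are kept, so a word that follows a pair more often is more likely to be picked later.

```python
from collections import defaultdict


def build_chain(sentences):
    """Map each consecutive word pair to the list of words that follow it."""
    chain = defaultdict(list)
    for sentence in sentences:
        # Assumption: drop the final period and split on whitespace.
        words = sentence.rstrip(".").split()
        # Slide a three-word window: (w1, w2) is the pair, w3 the follower.
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            chain[(w1, w2)].append(w3)
    return dict(chain)


sentences = [
    "The fox jumped over the spoon.",
    "The cow jumped over the moon.",
]

if __name__ == "__main__":
    for pair, followers in build_chain(sentences).items():
        print(pair, "-", followers)
```

Run on the two example sentences, this prints the same six pairs as the table, in the same order, including `['the', 'the']` for `('jumped', 'over')`.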

If we use (The, fox) as our starting word pair, we can generate the sentence "The fox jumped over the moon" by making the following choices:

(The, fox) -> jumped
(fox, jumped) -> over
(jumped, over) -> the
(over, the) -> moon
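Here is a minimal sketch of the generation walk in Python, using the table above written out by hand. The function name `generate` and the `max_words` safety limit are illustrative assumptions. At each step the last two words form the current pair, a follower is chosen at random, and the walk stops when a pair has no recorded followers.

```python
import random

# The word-pair table from the example, written out by hand.
CHAIN = {
    ("The", "fox"): ["jumped"],
    ("fox", "jumped"): ["over"],
    ("jumped", "over"): ["the", "the"],
    ("over", "the"): ["spoon", "moon"],
    ("The", "cow"): ["jumped"],
    ("cow", "jumped"): ["over"],
}


def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from a starting pair until a pair has no followers."""
    w1, w2 = start
    words = [w1, w2]
    while len(words) < max_words:
        followers = chain.get((w1, w2))
        if not followers:
            break  # dead end: nothing ever followed this pair in the source
        # Shift the pair one word to the right using a random follower.
        w1, w2 = w2, rng.choice(followers)
        words.append(w2)
    return " ".join(words) + "."


if __name__ == "__main__":
    print(generate(CHAIN, ("The", "fox")))
```

Starting from `("The", "fox")`, the only real choice is at `("over", "the")`, so the result is either "The fox jumped over the spoon." or "The fox jumped over the moon."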

To create the articles, I wrote a Python script that analyzes 600 sentences taken from my blog and then generates new sentences based on that analysis. I also used Python and web.py to create the website. The Markov chain code I wrote is a modification of code from these two excellent resources. You can get the source code for ptarticlegen.com from my GitHub account.
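Scaling this up from two sentences to a whole corpus mostly comes down to handling sentence boundaries. The sketch below is my own illustration, not the code in the GitHub repository: it assumes punctuation stays attached to words, uses the first two words of each source sentence as possible starting pairs, and ends a generated sentence when a word ends in `.`, `!` or `?`. All function names are hypothetical.

```python
import random
import re
from collections import defaultdict


def build_chain(text):
    """Build a word-pair chain over a whole corpus, keeping punctuation on words."""
    words = text.split()
    chain = defaultdict(list)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        chain[(w1, w2)].append(w3)
    return chain


def sentence_starts(text):
    """Collect the first two words of every sentence as candidate starting pairs."""
    starts = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.split()
        if len(words) >= 2:
            starts.append((words[0], words[1]))
    return starts


def generate_sentence(chain, start, max_words=30, rng=random):
    """Walk the chain until a word ends a sentence or a dead end is reached."""
    w1, w2 = start
    words = [w1, w2]
    while not words[-1].endswith((".", "!", "?")) and len(words) < max_words:
        followers = chain.get((w1, w2))
        if not followers:
            break
        w1, w2 = w2, rng.choice(followers)
        words.append(w2)
    return " ".join(words)


def generate_article(text, n_sentences=5, rng=random):
    """Generate n_sentences sentences, each from a randomly chosen starting pair."""
    chain = build_chain(text)
    starts = sentence_starts(text)
    return " ".join(
        generate_sentence(chain, rng.choice(starts), rng=rng)
        for _ in range(n_sentences)
    )


if __name__ == "__main__":
    corpus = "The fox jumped over the spoon. The cow jumped over the moon."
    print(generate_article(corpus, 3))
```

With a real corpus of a few hundred sentences, the same functions apply unchanged; the output just gets more varied.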
