Oh he is so quick
On his feet. He is reading
This term, I’m teaching an introductory computer science course for students in Waterloo’s Global Business and Digital Arts program. We’re using Processing, a fun environment both for learning to program and for simple programming tasks related to visual art and design. Early in the term I taught a module on input and output, including functions for reading text files. In that module’s assignment, students had to write a simple haiku generator. The sketch loads text files containing lines of five and seven syllables, and picks random lines to compose haiku.
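The assignment itself was written in Processing, but the core idea fits in a few lines of any language. Here is a minimal sketch in Python, assuming the two text files have already been read into lists. The names `compose_haiku`, `FIVE`, and `SEVEN` and the sample lines are made up for illustration and are not from the course materials.

```python
import random

def compose_haiku(five_lines, seven_lines, rng=random):
    """Pick a 5-7-5 haiku from lists of five- and seven-syllable lines."""
    # Draw two distinct five-syllable lines so the haiku never repeats itself.
    first, last = rng.sample(five_lines, 2)
    middle = rng.choice(seven_lines)
    return [first, middle, last]

# Hypothetical stand-ins for the contents of the two text files.
FIVE = ["an old silent pond", "a frog jumps in it", "splash and then silence"]
SEVEN = ["the water ripples outward", "nobody hears the small sound"]

print("\n".join(compose_haiku(FIVE, SEVEN)))
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy when marking.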
To prepare the text files for the students, I wrote a Python script that reads in a corpus of text and finds runs of consecutive words totalling five or seven syllables. For such purposes, I highly recommend the CMU Pronouncing Dictionary. Each vowel phoneme in a pronunciation carries a stress digit (0, 1, or 2), so the number of digits in a word’s CMU entry is precisely the number of syllables in the word! I then offered students a choice of lines from three sources: Shakespeare, Arthur Conan Doyle, and Donald Trump.
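The digit-counting trick can be sketched roughly as follows. This is not the original script, just one way it might look. `SAMPLE_ENTRIES` is a handful of hand-typed entries in the dictionary's plain-text layout (word, then phonemes), and the function names are my own invention. A real run would read the full dictionary file instead.

```python
import re

# Hand-typed sample entries in the CMU dictionary's plain-text layout.
# Vowel phonemes end in a stress digit; consonants carry no digit.
SAMPLE_ENTRIES = """\
;;; comment lines start with three semicolons
WE  W IY1
WILL  W IH1 L
MAKE  M EY1 K
AMERICA  AH0 M EH1 R AH0 K AH0
GREAT  G R EY1 T
AGAIN  AH0 G EH1 N
AGAIN(1)  AH0 G EY1 N
"""

def load_syllable_counts(text):
    """Map each lowercase word to the number of stress digits in its entry."""
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue
        word, *phonemes = line.split()
        word = re.sub(r"\(\d+\)$", "", word).lower()  # drop "(1)" variant markers
        # Keep the first pronunciation listed for each word.
        counts.setdefault(word, sum(p[-1].isdigit() for p in phonemes))
    return counts

def find_runs(words, counts, target):
    """Return each run of consecutive words totalling exactly `target` syllables."""
    runs = []
    for start in range(len(words)):
        total = 0
        for end in range(start, len(words)):
            n = counts.get(words[end])
            if n is None:  # unknown word: no run may cross it
                break
            total += n
            if total >= target:
                if total == target:
                    runs.append(" ".join(words[start:end + 1]))
                break
    return runs

counts = load_syllable_counts(SAMPLE_ENTRIES)
words = "we will make america great again".split()
print(find_runs(words, counts, 5))
print(find_runs(words, counts, 7))
```

This version emits every overlapping run, which produces a lot of near-duplicate lines. Whether to keep them all or thin them out is a matter of taste.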
The Trump haiku generator was fun to play with, so I wrote a slightly better script to extract fragments from his speeches. For example, it tries to delimit lines where there’s punctuation in the original text, and it understands where quoted text starts and ends. The result was satisfyingly incoherent.
Having broken those eggs, I had no choice but to go ahead and make the omelette. The next logical step was to build a simple Twitter bot that took a collection of automatically generated Trump haiku and broadcast them to the world around once an hour. The result is the Twitter account @trump575, from which a sample appears below (yes, @trumphaiku was already taken). I also converted a few popular names and words into hashtags, in a desperate bid for attention.
The bot has a backlog of a few thousand haiku to get through, so like Trump himself, it isn’t going away particularly soon. When I run out I may build a smarter bot. The obvious next step is to use a Markov chain to generate text, with the twist that there must be word breaks at the appropriate numbers of syllables. Someone also suggested using Sarah Palin speeches as a corpus; interesting idea, though perhaps just printing words at random would suffice for that.
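I haven't built the Markov version, but here is a rough sketch of how the syllable twist might work. It assumes a first-order (word-to-next-word) chain and uses simple backtracking: a word is only accepted if it fits in what remains of the current line, and the walk backs up when it paints itself into a corner. `CORPUS` and `COUNTS` are tiny hand-made stand-ins. In practice the counts would come from the CMU dictionary, and all the function names here are hypothetical.

```python
import random

def build_chain(words):
    """Map each word to the list of words that follow it in the corpus."""
    chain = {}
    for current, following in zip(words, words[1:]):
        chain.setdefault(current, []).append(following)
    return chain

def walk(chain, counts, prev, targets, rng):
    """Continue from `prev`, filling lines with `targets` syllables each.

    Returns a list of word lists (one per line), or None on a dead end.
    """
    if not targets:
        return []
    remaining, later = targets[0], targets[1:]
    if remaining == 0:  # line is exactly full: force a break here
        rest = walk(chain, counts, prev, later, rng)
        return None if rest is None else [[]] + rest
    candidates = [w for w in chain.get(prev, [])
                  if 0 < counts.get(w, 0) <= remaining]
    rng.shuffle(candidates)  # repeats in the list keep frequent successors likely
    for word in candidates:
        rest = walk(chain, counts, word, [remaining - counts[word]] + later, rng)
        if rest is not None:
            rest[0].insert(0, word)
            return rest
    return None  # nothing fits: the caller tries its next candidate

def markov_haiku(chain, counts, rng=random, attempts=100):
    """Try random starting words until a 5-7-5 walk succeeds."""
    starts = [w for w in chain if 0 < counts.get(w, 0) <= 5]
    if not starts:
        return None
    for _ in range(attempts):
        first = rng.choice(starts)
        lines = walk(chain, counts, first, [5 - counts[first], 7, 5], rng)
        if lines is not None:
            lines[0].insert(0, first)
            return [" ".join(line) for line in lines]
    return None

# Hand-made stand-ins; real counts would come from the CMU dictionary.
CORPUS = ("the old pond is still and a frog jumps in the water and the "
          "sound is small and quiet again the pond is quiet")
COUNTS = {"the": 1, "old": 1, "pond": 1, "is": 1, "still": 1, "and": 1,
          "a": 1, "frog": 1, "jumps": 1, "in": 1, "water": 2, "sound": 1,
          "small": 1, "quiet": 2, "again": 2}

chain = build_chain(CORPUS.split())
print("\n".join(markov_haiku(chain, COUNTS)))
```

The backtracking search can be slow in the worst case, so on a large corpus it might be worth capping the search depth or simply restarting more often.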