Said Horror To Hearer
Now that I am starting to grope into my PhD in great fistfuls, and work out exactly how I might start to build my digital godlet, I think it’s time to focus on specifics. At the moment I don’t have many of those; my notebooks are filled with misty calls to ‘empathy’ between the human agent1 and the creature, as well as long, hopeful lists of potential interactions that will, I am sure, prove very difficult to code into being. However, one specific has stood since the beginning of this project, and persists as the mechanic upon which everything else must rely; the ability of the human agent to read the poem, which tells the creature’s story, aloud to the digital creature itself.
Fundamentally, I am talking about speech recognition, the corner of natural language processing (NLP) that gives computers the now-banal ability to take the waveform input of a human voice and understand that input as words and sentences, in a computerish sort of way; this is not of course the same as ‘understanding’ in a human sense, but rather involves transcribing those words into parseable text, or ‘strings’, which the computer can use for something else. Most people’s everyday interaction with this technology is limited at best; despite Google’s best advertising efforts, not many of us walk down the street and boss our phone out loud, as if we had a small staff of lice understairing in our fingercreases. If we do use NLP at all, it is in clipped, unbroken phrases; to dictate a text message when we are using our other hand to balance a ziggurat of biscuits, or to ask our phone something provocative in front of all our friends who have not heard its reply. I haven’t yet found much evidence of computers being read anything beyond functional, spare instructions.
Therefore last night, as I am now a Computer Scientist and I do things with a shining, experimental rigour, I lay in bed with some Medjool dates and a cup of oily tea2 and read my phone some poetry.
I decided on W.H. Auden because of his personal significance to my work, his varied and testing use of meter and rhythm, and because his was the first volume that fell when I flailed my morningstar of a hand across the shelf. I opened the Simplenote app on my phone, turned on the microphone, and started reading. I did not try to choose poems that were simple or regular or full of small words, nor did I read them in a way other than I would read them to a human audience. I just pressed the icon and started speaking.
Below are the results. I think that you can tell which are the originals and which are the transcriptions, but I don’t mean to be dismissive in that. This was a crass experiment, and the phone did far better than I anticipated.
August 1968
The Ogre does what ogres can,
Deeds quite impossible for Man,
But one prize is beyond his reach,
The Ogre cannot master Speech:
About a subjugated plain,
Among its desperate and slain,
The Ogre stalks with hands on hips,
While drivel gushes from his lips.
August 1968 the ogre does what oh geez can deeds quite Impossible 4 man but one price is beyond his reach the ogre cannot master speech about a subjugated playing among its desperate and slain the Argus talks with hands on hips while drivel gushes from his lips
Pseudo-questions
Who could possibly approve of Metternich
and his Thought Police? Yet in a liberal
milieu would Adalbert Stifter have written
his noble idylls?
Vice-versa, what God-fearing Magistrate
would dream of shaking hands with a financial
crook and Anti-Semite? Yet Richard Wagner
wrote masterpieces.
Wild horses could not drag me to debates on
Art and Society; critics with credos,
Christian or Marxist, should keep their trap shut,
lest they spout nonsense.
Pseudo questions you could possibly approve of matinee and his thought Police yes in a liberal milia with apple box sister have written his Noble levels vice versa what god fearing magistrate would dream of shaking hands with a financial crook and anti-semite Richard Wagner wrote masterpieces wild horses couldn't drag me to debates on Art and Society critics with Kratos Christian or Marxist should keep their trap shut less they spelt nonsense
No, Plato, No
I can’t imagine anything
that I would less like to be
than a disincarnate Spirit,
unable to chew or sip
or make contact with surfaces
or breathe the scents of summer
or comprehend speech and music
or gaze at what lies beyond.
No, God has placed me exactly
where I’d have chosen to be:
the sub-lunar world is such fun,
where Man is male or female
and gives Proper Names to all things.
I can, however, conceive
that the organs Nature gave Me,
my ductless glands, for instance,
slaving twenty-four hours a day
with no show of resentment
to gratify Me, their Master,
and keep Me in decent shape
(not that I give them their orders,
I wouldn’t know what to yell),
dream of another existence
than that they have known so far:
yes, it well could be that my Flesh
is praying for ‘Him’ to die,
so setting Her free to become
irresponsible Matter.
No Plato no I can't imagine anything but I would less like to be in a disc incarnates spirit unable to chew or sip or make contact with surfaces Aubrey the sense of summer or comprehensive Picchu musical gaze at what lies beyond no God has place me exactly where I have chosen to be the sub Luna world is such fun where man is male or female and gives proper names to all things I can however can see that the organs nature gave me my ductless glands for instance sleeping 24 hours a day with no show of resentment to gratify me the master and keep me in decent shape not that I give them their orders I wouldn't know what do you dream of another existence and that they have known so far yes it well could be that My Flesh is praying for him to die so sitting her free to become a responsible master
‘Here war is simple like a monument’
Here war is simple like a monument:
A telephone is speaking to a man;
Flags on a map assert that troops were sent;
A boy brings milk in bowls. There is a plan
For living men in terror of their lives,
Who thirst at nine who were to thirst at noon,
And can be lost and are, and miss their wives,
And, unlike an idea, can die too soon.
But ideas can be true although men die,
And we can watch a thousand faces
Made active by one lie:
And maps can really point to places
Where life is evil now:
Nanking. Dachau.
Here war is simple like a monument here war is simple like a monument a telephone is speaking to a man flags on a map of a cert that treats for sent a boy brings milk in bowls there is a plan for living men in terror of their lives who first at 9 who were the first at noon and can be lost an arm and miss their wives and I'm like an idea and I too soon but ideas can be true although men died and we can watch a Thousand Faces made active by 1LY and maps can really point to places where life is evil now Nanking Dachau.
In my view these are not Auden’s best poems, but the phone is not concerned with that and in this instance neither am I; what we are concerned with is comprehension in terms of computation. In short:
- Does the computer transcribe the poem accurately from the diction3?
- Does the computer produce something, in that transcription, that can then be used in further computation as defined by me, the designer?
The answer to both of these questions is broadly ‘yes’, which is important for my ongoing work with knole. I am not a native programmer, and my PhD is not concerned with the technical possibilities of NLP. Instead, I wish to find the easiest path to using it in my work, in order to do interesting things subsequently, artistically. For my purposes, this means technology that works with little back-end input from myself; it does not interest me, other than in the way that all things are broadly interesting, how the words change from sound input to process-ready strings. What matters to me is what those strings look like when they do reach the computer. If there is too much discrepancy between the intended input by the speaker and the received input by the computer (in my case, the creature), authorial intent and narrative are made gobbledygook. I would still have a program that hears, that is interactive with one’s voice, but there would be no defined understanding of the voice, even if that understanding were defined by myself the designer.
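One way to put a number on that discrepancy is word error rate: the count of word substitutions, insertions and deletions needed to turn the transcription into the original, divided by the length of the original. What follows is only a rough sketch, not a description of how the phone or any particular library scores itself; the function names `words` and `word_error_rate` are invented for illustration, and punctuation, case and line breaks are deliberately thrown away before comparing.

```python
import re


def words(text: str) -> list[str]:
    """Lower-case the text and keep only runs of letters, digits and apostrophes."""
    text = text.lower().replace("’", "'")  # treat curly and straight apostrophes alike
    return re.findall(r"[a-z0-9']+", text)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference text contains no words")
    # previous[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# The last couplet of 'August 1968', as printed and as the phone heard it.
original = "The Ogre stalks with hands on hips, While drivel gushes from his lips."
heard = "the Argus talks with hands on hips while drivel gushes from his lips"
rate = word_error_rate(original, heard)  # 2 of the 13 words differ
```

A low rate on a couplet says nothing about whether the words that matter survived, which is why the single-word errors discussed further on deserve separate attention.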
These quick, initial tests are promising, however; it doesn’t seem that I will need to modify existing technologies very much to get what I want. I do have some ongoing thoughts:
- This technology, as we can see, is not designed for long stretches of input; I was lucky that the microphone did not automatically turn off after a few seconds, as I have known it to do before. Ideally my creature would ‘listen’ more or less constantly, processing language or other aural input as it occurred; however, I have not done any research into how this might affect the performance or battery life of whichever device my creature ends up occupying. As it is not designed for lengthy input, so we see it struggling with punctuation, which tends to be absent from the shorter command-language of the digital assistant, in which one must manually say ‘comma’ to have the corresponding glyph appear in the text. Whether this will be a problem I am not entirely sure, as often in oral poetry punctuation is a signal to the reader, rather than the listener; a comma tells the reader to pause, and so the listener hears a pause. The same applies to line breaks; if a poem’s audience ‘hears’ line breaks, it is in the vocal interpretation of the reader. In fact, it might be interesting to see if an audience could accurately delineate these breaks on the page just by listening. In any case, my creature does not need to display an accurate textual output of the poem, only respond to elements within its reading.
- However, this does then bring up the question of prosody; that is, the performance of a poem encompassing tone, rhythm and voice, or more accurately the meaning inherent in a poem outside the bald vocabulary and syntax. As we can see above, this is not something that could easily be encoded into NLP. Perhaps something could be jury-rigged involving differing responses based on the time between units of speech, or the volume of that speech serving as a crude barometer of emotion. I suppose I will find out when I start trying to build this little bleeder.
- There are some interesting errors on single words; for example, in August 1968 there are three separate interpretations of the spoken word ‘ogre’ in one very short poem. This is an ongoing problem of NLP, and how much it affects my project will depend on which units of speech I rely on for my creature to compute. Will the creature be listening for ‘keywords’ to react to, or rather ‘keyphrases’? Will it react to several words in juxtaposition, proximity or comparison? If I rely on single nouns or verbs, the errors could become a problem, unless I build in a simplistic ‘spellcheck’ in which common errors are automatically normalised to the intended word. This will be simpler for me than for most other users of NLP, as I will be curating a fairly small possibility space of speech for the creature to react to.
- This leads onto the question of homophones, an important stylistic device in the poetry of many of my favourites, including Dylan Thomas, Auden and Ted Hughes. How can the creature distinguish between ‘hair’ and ‘hare’ if it has no way to ‘see’ the written version of the word, only the reader’s reading of it? I suppose context will be important to understanding what is meant.
- Obviously, the speech recognition struggles with some of Auden’s proper references, the pretentious old river delta.
- The speed at which the poem processed on-screen whilst I read it was fairly impressive; perhaps only lagging one second or two behind my voice. However, this is a fairly unacceptable delay when one is trying to represent a living creature listening, organically, in real time. I am not sure how this delay could be improved; again, perhaps I will be saved by my teensy possibility space.
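The ‘spellcheck’ idea from the list above, paired with keyword listening, might look something like the sketch below. Everything named in it is an assumption made for illustration: the `CORRECTIONS` table, the `KEYWORDS` set and both function names are hypothetical, and the two corrections are simply the mishearings from the August 1968 transcription mapped back by hand. A small, curated possibility space is what makes a lookup table this crude plausible at all.

```python
import re

# Hypothetical correction table: phrases the phone produced, mapped back to what was read.
CORRECTIONS = {
    "oh geez": "ogres",
    "argus talks": "ogre stalks",
}

# Hypothetical set of words the creature is listening for.
KEYWORDS = {"ogre", "speech", "drivel"}


def normalise(transcript: str) -> str:
    """Lower-case the transcript and swap known mishearings for the intended words."""
    text = transcript.lower()
    for misheard, intended in CORRECTIONS.items():
        text = re.sub(rf"\b{re.escape(misheard)}\b", intended, text)
    return text


def heard_keywords(transcript: str) -> list[str]:
    """Return each keyword found in the normalised transcript, in order of first appearance."""
    found = []
    for word in re.findall(r"[a-z']+", normalise(transcript)):
        if word in KEYWORDS and word not in found:
            found.append(word)
    return found


line = "the Argus talks with hands on hips while drivel gushes from his lips"
reaction_cues = heard_keywords(line)  # 'argus talks' is repaired to 'ogre stalks' first
```

A table like this only catches errors that have already been seen, so it would need to grow as more readings are collected; it also does nothing for homophones, where the transcribed word is a real word that happens to be the wrong one.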
Of course, I must not worry about errors too much, or treat them merely as ridiculous failures of my authorial intent; as I have found in my own work, the mishearing of words can open entirely new worlds within a text, and allow a degree of rebellious, fruitful interaction with the intended meaning. Such is the power of interpretation, in analogue works, and perhaps there could be an interesting correlation with interaction in digital works, as well.
If the creature mishears a reader, or the reader deliberately misreads a word in that delightful, bloody-minded way that people like to interact with computers, how can a designer create an environment in which that mistake has value, both in the creature’s response and the reader’s?