a rising mist
The Goodly Mist
A Workingblog for Rob Sherman

knole Prototypes #2 & #3: Functioning Speech Recognition & Text Input

April 4, 2016

I completed two prototypes this week. Compared to previous efforts they were trivial to put together, but they represent a developmental milestone: for the first time my godlets can receive, interpret and process human language, albeit to a primitive degree. As the plural suggests, however, my attempts have bifurcated my project into two separate branches, in part because of my shortcomings as a programmer, in part because of the restrictions of the engines that I am using, and in part because of the narrative metaphor at the very centre of this project.

Each of the two prototypes receives language input from the user in a different way. The first uses voice recognition, provided as a boxed feature by the IDE Construct 2. I have had misgivings about Construct before, primarily around the paradigms that its ‘no-coding-required’ ontology imposes upon my work, but my reasons for using it have always centred on the ease of eventually implementing voice recognition. As I’ve now found, it can be deployed in minutes with a brief clicking-through of sub-menus, and because Construct is an HTML5 wrapper it leeches Google’s API for fast, accurate results. Since popping it into my project I’ve received a lot of rumbling, cooed admiration for the results that I do not really deserve; as well as having little understanding of the processing involved, I used Aaron Clifford’s excellent demo project Speech Commander as a blueprint, or rather as a source of plagiarism. The gap between my ignorance of this technology and my desire to include it does worry me; I feel that I should understand how it works before making it so central to the godlet’s functioning.

This first prototype displays the speech that has been recognised by the microphone in a string beneath the godlet’s chin, so that I can make sure that the feature is working properly. On my first pass, the results were very slow to appear, sometimes taking as much as five seconds to materialise. This is obviously not ideal for a simulation which is meant to represent the pricked ears and instant reactions of a skittish animal. Switching the vocal monitoring to Interim Results rather than Final traded accuracy for speed, and all at once my godlet seemed to become more attentive; even though there is not actually any bodily reaction yet, just the ability to have my meaning transferred into the program fills me with a sort of sympathy for the beast. All the time, not as a designer but as a user or an observer, I am straining to provide this agent with intelligence. I want it to understand me, to be alive, and I will perform severe mental contortions to make this happen. I realise that this impulse lies beneath the mien of even the most cynical technologist, game player or academic; whatever their surface desires (to undermine a simulation, to point out flaws, to see the join), they cannot help but naively, basically seek intelligence, life, agency and logic within a system like this, using any hint provided to aid their imagination in the task. They may refute it as imperfect, but they cannot ignore it entirely.
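The interim/final trade-off can be made concrete with a small simulation. Browser speech recognition of the kind Construct 2 presumably wraps (the Web Speech API in Chrome) reports each result with an `isFinal` flag; interim hypotheses arrive sooner but may later be revised. The Python below is only a sketch, not Construct’s plugin: the `RecognitionEvent` type, the `simulate` and `first_reaction` helpers, and the ‘cream’/‘queen’ event sequence are all hypothetical. It shows a display that accepts interim results reacting on the very first hypothesis, while a final-only display stays silent until the engine commits.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RecognitionEvent:
    """Hypothetical stand-in for one result delivered by a speech engine."""
    transcript: str
    is_final: bool


def simulate(events: List[RecognitionEvent], use_interim: bool) -> Tuple[List[str], List[str]]:
    """Return (strings shown under the godlet's chin, transcripts committed as final)."""
    shown: List[str] = []
    committed: List[str] = []
    for event in events:
        if event.is_final:
            committed.append(event.transcript)
            shown.append(event.transcript)
        elif use_interim:
            # Interim hypotheses are fast but provisional: 'cream' may become 'queen'.
            shown.append(event.transcript)
    return shown, committed


def first_reaction(events: List[RecognitionEvent], use_interim: bool) -> Optional[int]:
    """Index of the first event that changes the display, i.e. the first 'pricked ear'."""
    for i in range(len(events)):
        shown, _ = simulate(events[: i + 1], use_interim)
        if shown:
            return i
    return None


if __name__ == "__main__":
    heard = [
        RecognitionEvent("cre", False),
        RecognitionEvent("cream", False),
        RecognitionEvent("queen", True),
    ]
    print(first_reaction(heard, use_interim=True))   # 0
    print(first_reaction(heard, use_interim=False))  # 2
```

With interim results the godlet ‘hears’ something two events earlier, at the cost of briefly displaying a word (‘cream’) that the engine later corrects.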

I am not even sure if the loss of accuracy in this approach is a problem, either; how good a listener does my godlet have to be? You only need to read a few books of mythology to know that gods are notoriously bad listeners anyway. And most organisms, even without a human grasp of language, can still react to outside stimuli in elemental and unequivocal ways; loud sounds are alarming, soft sounds comforting. Perhaps such reactions would provide meaning enough, without knowing whether I just said ‘queen’ or ‘cream’.
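The idea that loudness alone might carry meaning enough can be sketched without any speech recognition at all. The Python below is a minimal illustration, assuming raw audio samples normalised to the range -1.0 to 1.0. Root-mean-square (RMS) amplitude is a standard rough measure of loudness, but the three thresholds and the reaction labels are hypothetical choices, not anything present in either prototype.

```python
import math
from typing import Sequence

# Hypothetical thresholds on RMS amplitude, for samples normalised to [-1.0, 1.0].
SILENCE_THRESHOLD = 0.01
COMFORT_THRESHOLD = 0.1
ALARM_THRESHOLD = 0.5


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def react(samples: Sequence[float]) -> str:
    """Map loudness alone to an elemental reaction, ignoring which word was spoken."""
    level = rms(samples)
    if level < SILENCE_THRESHOLD:
        return "idle"        # nothing heard
    if level <= COMFORT_THRESHOLD:
        return "comforted"   # soft sounds soothe
    if level >= ALARM_THRESHOLD:
        return "alarmed"     # loud sounds startle
    return "attentive"       # somewhere in between
```

Under this scheme ‘queen’ and ‘cream’, spoken at the same volume, produce exactly the same reaction.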

The second prototype is an attempt to find a way to include language input using Gamemaker: Studio (for which I finally have a professional licence, courtesy of my university). Gamemaker has been a far more robust engine for my prototypes, aside from voice recognition, which it does not natively support. According to the developers it might be possible to implement as an extension, and I am certainly interested in the interfacing possibilities with Windows 10’s Cortana software. Indeed, this is where my formal education in voice recognition might begin, once I find the time to experiment with it. The Gamemaker community seems oddly hostile to the idea of developing this feature for the platform; the various topics posted on the official forums asking about it are answered with exhortations to ‘leave Gamemaker to what it is good for’, by which is meant, I think, 2D shmups and platformers. I disagree, of course. I am using Gamemaker to create a chatbot, a tabernacle, an AI, a simulator, a research discourse, far divorced from the commercial videogames which made the software well-known. I have submitted to the metaphors that it engenders (‘rooms’, ‘objects’, ‘steps’, its peculiar way of handling rotation) in order to use it for my own purposes. For now, this means that my prototype Gamemaker godlet, with its BOD architecture and growing complexity, receives typed rather than spoken input from the user. In the pursuit of textual aesthetics I decided to limit these inputs to eight characters or fewer. Round the back, the godlet has a new piece of anatomy, mListener, which receives the user’s typed phrase as a string variable before alerting the POSH plan that there is a new input to consider; the variable is then passed to a currently zygotic bListening behaviour module, where the actual processing of each phrase will take place.
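The flow from typed phrase to behaviour can be sketched in Python. This is not the GML in the Gamemaker prototype: only the names mListener and bListening, the POSH plan and the eight-character limit come from the description above. The class layout, the `has_new_input` flag standing in for the plan’s sense, and the choice to truncate long input (as an input box that stops accepting keystrokes would) rather than reject it are all assumptions for illustration. A real POSH plan arbitrates between many prioritised drives; here a single `tick` checks one trigger.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_INPUT_CHARS = 8  # the eight-character limit on typed phrases


@dataclass
class BListening:
    """Hypothetical behaviour module where each phrase will eventually be processed."""
    heard: List[str] = field(default_factory=list)

    def process(self, phrase: str) -> None:
        # Placeholder: the real processing is still 'zygotic', so just record it.
        self.heard.append(phrase)


@dataclass
class MListener:
    """Hypothetical sensor: stores the typed phrase and raises a flag for the plan."""
    phrase: Optional[str] = None
    has_new_input: bool = False

    def receive(self, typed: str) -> bool:
        phrase = typed.strip()[:MAX_INPUT_CHARS]  # enforce the limit by truncation
        if not phrase:
            return False  # ignore empty offerings
        self.phrase = phrase
        self.has_new_input = True
        return True


@dataclass
class PoshPlan:
    """Toy stand-in for a POSH drive: senses new input and fires bListening."""
    listener: MListener
    listening: BListening

    def tick(self) -> bool:
        """One step of the plan; returns True if the listening behaviour fired."""
        if self.listener.has_new_input and self.listener.phrase is not None:
            self.listening.process(self.listener.phrase)
            self.listener.has_new_input = False  # consume the trigger
            return True
        return False
```

Keeping the sensor (mListener) separate from the behaviour (bListening), with the plan deciding when one feeds the other, mirrors the BOD split between perception and action described above.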

In both prototypes, I decided that this new functionality should assume a diegetic, biological form. For now a red circle hangs pendulous between the godlet’s eyes, like a Hindu bindi or the ‘muddy pellet’ of Taoism. It swells and shrinks in time with the godlet’s breath like a wen or an artery. It is jumpy at the touch of a mouse or a finger, but looks so sore that it cannot help but invite a prod. It also serves practical, non-diegetic functions: in Construct, it indicates (more clearly than Google Chrome’s in-built notification) whether the microphone is receiving or not. In Gamemaker, it engorges to accept the player’s portentous eight characters, before subsuming and digesting the word once the Enter key is pressed. The boundaries between internal and external representations of the godlet’s state as a formal system are of continuing interest to me: how much of the system is made apparent to the user? What organic indicators are there of internal state? How much of a ‘beast’, with a beast’s attendant tone of coat, flush of skin and meaty appurtenances, will the godlet be?
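The bindi’s behaviour, breathing at rest, swelling while a phrase is typed and shrinking as it is digested, amounts to a small function of time and state. The Python below is a sketch assuming a sinusoidal breath and a linear shrink after Enter is pressed; `bindi_radius` and every constant in it (radius, amplitude, period, scale, digestion length) are hypothetical, not values taken from either prototype.

```python
import math

BASE_RADIUS = 10.0      # hypothetical resting radius, in pixels
BREATH_AMPLITUDE = 2.0  # hypothetical swell of each breath, in pixels
BREATH_PERIOD = 60      # hypothetical frames per breath (one second at 60 fps)
ENGORGED_SCALE = 1.5    # hypothetical growth while a phrase is being typed
DIGEST_FRAMES = 30      # hypothetical frames taken to digest a submitted word


def bindi_radius(frame: int, typing: bool = False, digest_frames_left: int = 0) -> float:
    """Radius of the red circle on a given frame.

    typing: the player is entering a phrase, so the circle stays engorged.
    digest_frames_left: frames remaining since Enter was pressed; the circle
    shrinks linearly from engorged back to resting size as this counts down.
    """
    # Breathing: a sine wave around the resting radius.
    breath = BREATH_AMPLITUDE * math.sin(2 * math.pi * frame / BREATH_PERIOD)
    if typing:
        scale = ENGORGED_SCALE
    elif digest_frames_left > 0:
        progress = min(digest_frames_left, DIGEST_FRAMES) / DIGEST_FRAMES
        scale = 1.0 + (ENGORGED_SCALE - 1.0) * progress
    else:
        scale = 1.0
    return (BASE_RADIUS + breath) * scale
```

Because breath and engorgement multiply, the circle keeps pulsing even while swollen, which keeps it looking like living tissue rather than a static indicator.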

It is obvious that the split between the two prototypes will become more and more annoying as time goes on. Some features will rely on voice recognition, and others will not. I will need to maintain two different, partial versions of the godlet, and even then will eventually have to reconcile the limitations of my approaches in some way. Despite Gamemaker’s lack of voice recognition support, I am so keen on its advantages that I am considering serious alterations to the project in order to accommodate it. Suppose that I step away from voice recognition entirely so that I can keep using Gamemaker and only Gamemaker; what does this do to my narrative, to the nature of my godlet? It certainly becomes more removed from the organic, from the zoomorphic; can it be a creature if it has no ability to hear? The fewer sensors it has, the less alive it might seem. Then again, is this being a creature at all, or just a representation of elemental forces and social functions, as all gods truly are? Perhaps a god does not need to listen like a thing with cochleae and nerves; perhaps that is too prosaic, too everyday, for a spiritual avatar.

Instead, perhaps the compromise I have reached in Gamemaker, a deliberate, almost-ritualistic input of portentous typed phrases, comes closer to humanity’s interaction with the divine. Throughout our religious traditions we can see that human relationships with otherworldly beings cannot be conducted naturally or in real time, as they might be with our family and friends. Instead, they are mediated by narrative processes that require delay, brooding, stricture, ceremony and delimitation. I have mentioned the Japanese animist tradition of Shinto elsewhere; although its practitioners are supposedly surrounded by the deities of the natural world, they cannot merely call out for luck or riches or happiness, as if they were asking for another cup of tea or a bag at the supermarket. An interaction with a kami, a Shinto spirit, must be deliberately non-trivial. Rules must be followed, so as not to make the spirits feel insulted, as well as to support and legitimise the vast furniture of Japan’s state religion, to provide bonding between worshippers, and to enforce a way of living. In Shinto, one of the ways in which this is achieved is through ema: small wooden plaques on which wishes are written. They are governed by their own devotional grammar and aesthetics, and may only be left at specific shrines and holy places. The dialogue between divine and mundane is heavily mediated; it is not merely a chat over a fence.

My approach in Gamemaker is similarly mediated, initially as a result of limitations in the technology; perhaps, however, there is a narrative expediency to it. This ties in with my changing thoughts about the godlet’s accompanying analogue text, the other half of the knole project. I had at first conceived it as a lyrical poem, though very little thought went into this decision. It was an arbitrary choice, one merely designed to provide me with a linear, literary counterpoint to the computed, non-linear godlet, and to talk critically about the differences between ‘physical’ and ‘digital’. To be honest, I have been frustrated and uninspired by it over the last few months, and have hated being wedded to its approach. Nothing I write has any bite or body, and it seems so tangential to the main representation of the godlet as simulation that I cannot make it stick. Instead, I have been thinking more about better aligning this text with the narrative metaphors of the godlet’s fiction, and using it in a way that honours the unique qualities of linear texts as opposed to multifarious ones. In my diversions into religion, ritual and tradition, I have come to think that perhaps the task to which a linear text is most suited is that of sacrament, of religious ceremony: a set of unimpeachable instructions, of chronological imperative, of holy writ, a code by which to act; a program to execute in the name of the numinous.

So now I am thinking that a better approach for this text might be to create some sort of holy scripture for the godlet: a mystic manual for whatever religion this being engenders. Rather than one long narrative, it would instead provide discrete, poetic instructions for interacting with the godlet in different situations, prescribing the ceremonial architecture between ‘worshipper’ and ‘worshipped’. Such a text, with its abracadabras, thou-shalt-nots and in-the-beginnings (though all longer than eight characters), would provide a more narratively contingent manner of exploring the godlet, of experimentation within its defined set of rules: the essences of both interactivity and religion. It would also tie together more traditional modes of narrative interactivity (the instructional religious text) and modern modes (the videogame or simulation). Deliberate, playful, investigatory liturgy with the godlet; conscious, weighty prayers instead of organic conversation; the godlet’s hierophantic apparatus swelling and withering in response: that might be the trinity of mechanics, metaphor and relationship that I have been looking for.

Or perhaps I’m just huffy and lazy at the prospect of doubling my workload.