One big issue with my research was transcribing interviews. I have about 70 hours’ worth of interviews so far, with a few more still to go. (I have to contact Joan Carruth, Carolyn Clark, and the members of Real Magic someday.)
There have been speech-to-text systems around for quite a few years, but they weren’t set up to handle multiple speakers. I knew I could never afford the cost of having the material professionally transcribed. I didn’t really need the transcriptions right away, since there was so much other organizing, scanning, and research to do, so I let the problem slide. I hoped that the technology for transcribing interviews would emerge.
It looks like my hopes have been realized. Otter Voice Notes is a program for both Android and iOS devices, and can be run in a web browser on desktop systems. It can transcribe an interview “live”, which I have not tried, or work with previously recorded audio files.
As a transcription service, the results from Otter Voice Notes are far from perfect; on the other hand, my audio recordings are further from perfect. The software does a reasonable job given what I provide as input. Here’s a small excerpt from an interview I did with Oberon Zell-Ravenheart back in 2011:
(The person we’re discussing is Ron Wright, whom I interviewed a few months later.)
Some issues are unavoidable. For example, there is no way Otter Voice Notes (OVN) could have correctly interpreted the name Carl Weschcke or the name of the publication Gnostica. I have yet to see OVN correctly interpret “Gardnerian”, “Druidry”, or “Phaedra”. There’s no mechanism, at least in the web interface, for training OVN to learn new words. And there are the occasional wrong guesses (Oberon said “Let me see now”, not “Let me see how you know”).
What OVN does fairly well is identify the different speakers. In the above excerpt, I did not explicitly tag the sections spoken by me or Oberon. I tagged a few sections earlier in the conversation with our names, and Otter Voice Notes scanned the rest of the conversation and assigned the tags. Again, it’s not perfect, especially when I’m slurring words or speakers interrupt one another, but it’s far better than tools like Dragon Dictate, which offer no form of speaker recognition.
Another feature of the web interface (I assume the phone apps are similar) is that you can play back the audio file and Otter Voice Notes will highlight each word in the text as it’s being said. You can click on any word in the transcription and start playing from that point. That’s particularly handy: you can search text for a particular topic, then listen to that one paragraph to get the words that were too faint or too obscure for OVN to transcribe.
You can edit text via the web interface and merge paragraphs. That seems simple enough, until you see a long stretch of one- or two-word paragraphs. That happens because OVN will insert a new paragraph every time there’s a pause. For those of us who say “Um” a lot (modesty forbids me from mentioning any names) it can make for a lot of paragraphs.
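For anyone who would rather clean up those fragments after exporting than click through them in the browser, here is a minimal sketch of the idea in Python. It assumes the exported transcript has already been parsed into (speaker, text) pairs; that layout and the function name `merge_same_speaker` are hypothetical, chosen for illustration, and are not part of Otter Voice Notes or its export format.

```python
from typing import List, Tuple

# Hypothetical in-memory form of a transcript: one (speaker, text) pair
# per paragraph. This is an assumed layout, not OVN's export format.
Paragraph = Tuple[str, str]


def merge_same_speaker(paragraphs: List[Paragraph]) -> List[Paragraph]:
    """Join consecutive paragraphs spoken by the same person."""
    merged: List[Paragraph] = []
    for speaker, text in paragraphs:
        text = text.strip()
        if not text:
            # Skip empty paragraphs entirely.
            continue
        if merged and merged[-1][0] == speaker:
            # Same speaker as the previous paragraph: append to it.
            prev_speaker, prev_text = merged[-1]
            merged[-1] = (prev_speaker, prev_text + " " + text)
        else:
            # A different speaker starts a new paragraph.
            merged.append((speaker, text))
    return merged


if __name__ == "__main__":
    sample = [
        ("Interviewer", "Um."),
        ("Interviewer", "So,"),
        ("Interviewer", "when did you meet him?"),
        ("Oberon", "Let me see now."),
    ]
    for speaker, text in merge_same_speaker(sample):
        print(f"{speaker}: {text}")
```

This only merges on speaker changes, so it would also glue together paragraphs that were separated by a genuine change of topic; whether that is acceptable depends on how the transcript will be used.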
Otter Voice Notes offers 600 minutes of transcription for free so you can see if you like the service. Once I tested it, I didn’t need the rest of those minutes. I immediately subscribed and uploaded 70 hours’ worth of interviews in MP3 and M4A format. Within half an hour the transcriptions were ready for me to tag the speakers and export the transcribed text.
I’m glad I waited until technology caught up with my needs. Now all I need is a program that will code and tag 10GB worth of scanned files for me…