# Audio Transcription on Linux

A large part of my work involves recorded interviews: transcribing them, verifying transcriptions done by others, analysing passages, and looking for quotes to illustrate specific points. I do this on Linux and have tried a few different combinations of tools in the past. For a while I settled on emacs or OpenOffice in combination with the curses version of mplayer, switching between the two windows with Alt-Tab. This worked, but not very well. So, having recently upgraded to Ubuntu 7.10 (Gutsy), I decided to look at what new options are available. It turned out that Gutsy has pretty much all the pieces for a good transcription setup. The rest of this document explains how to do the following on Gutsy:

1. Control audio (play/pause and skip back or forward by 5 or 30 seconds) with the keyboard from any application, i.e., without having to switch between windows. (I mapped F1 through F4 to these functions.)
2. Insert timestamps into any document. (In my setup I press F6 to put the timestamp into the clipboard and then Ctrl-V to paste it.)
3. Play audio at a timestamp. (In my setup, I just select a piece of text like "[00:43:10]" in any document and press F6. The audio then jumps to 43 minutes 10 seconds into the current audio file.)

## Controlling Audio

First of all we need a way to play audio. We'll use XMMS2, which has a client-server architecture: a daemon (`xmms2d`) actually plays the audio and maintains a playlist, and there are several front-ends to choose from. The seemingly standard one is GXMMS2, but I didn't like it much. Its playlist interface is a bit too complicated, and its minimized version is too tall. Instead, I settled on *Esperanza*. What I liked most about Esperanza is that it minimizes to a very slim bar, just like a toolbar.
I adjusted the size and position of my OpenOffice window so that it fits right under the Esperanza bar. This way I can see the current audio position (and which file is loaded) while typing into OpenOffice:

<center><img src="/etc/lintrans/esperanza-minimized.png"/></center>

("Rafael" is a pseudonym.)

Esperanza and XMMS2 can be installed with apt-get:

```
sudo apt-get install xmms2 esperanza
```

Even though the Esperanza bar is visible while OpenOffice is active, it doesn't have *focus* and thus will not catch keyboard events. (And even if it did, that wouldn't help, since it has no keyboard shortcuts for going back or skipping ahead by 5 seconds, a key feature when doing transcription.) However, we don't need Esperanza for controlling the audio, only for selecting a file and showing the current position. Instead, we can control the XMMS2 daemon with `xmms2`, a command-line client. For instance, typing `xmms2 pause` at the command line pauses the audio, and `xmms2 seek -5` takes us 5 seconds back.

But you won't actually have to type out those commands every time. *GNOME 2 lets us bind those commands to keys, and those bindings are active in any application.* Open `gconf-editor` (press Alt-F2 and type "gconf-editor") and navigate to `apps/metacity/global_keybindings`.

<center><img src="/etc/lintrans/global_keybindings.png"/></center>

We'll pick keys for `run_command_1` through `run_command_7`, typing `F1`, `<Shift>F1`, etc. After that, navigate to `apps/metacity/keybinding_commands` and set the matching commands, `command_1` through `command_7`. I.e., if we set `run_command_1` to `F1` in `global_keybindings` and set `command_1` to `xmms2 seek -5` in `keybinding_commands`, pressing F1 has the same effect as calling `xmms2 seek -5` at the command line: it moves the audio back 5 seconds.
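If you would rather not click through `gconf-editor`, the same keys can presumably be set from a script. The following is only a sketch and is not part of the setup described here. It assumes the GNOME 2 `gconftool-2` tool and Metacity's `run_command_N` / `command_N` GConf keys, and it uses the bindings listed in the table that follows. By default it only prints the `gconftool-2` calls it would make. Pass `--apply` to actually write the keys.

```python
#!/usr/bin/env python
# Sketch: register xmms2 key bindings with Metacity through gconftool-2.
# Assumes GNOME 2 / Metacity GConf keys run_command_N and command_N.
import subprocess
import sys

# (key, command) pairs taken from the table in this section;
# the n-th pair becomes run_command_n / command_n.
BINDINGS = [
    ("F1", "xmms2 seek -5"),
    ("F2", "xmms2 pause"),
    ("F3", "xmms2 play"),
    ("F4", "xmms2 seek +5"),
    ("<Shift>F1", "xmms2 seek -30"),
    ("<Shift>F4", "xmms2 seek +30"),
]

KEYS_DIR = "/apps/metacity/global_keybindings"
COMMANDS_DIR = "/apps/metacity/keybinding_commands"


def gconf_calls(bindings):
    """Build the gconftool-2 argument lists for the given (key, command) pairs."""
    calls = []
    for n, (key, command) in enumerate(bindings, 1):
        calls.append(["gconftool-2", "--type", "string", "--set",
                      "%s/run_command_%d" % (KEYS_DIR, n), key])
        calls.append(["gconftool-2", "--type", "string", "--set",
                      "%s/command_%d" % (COMMANDS_DIR, n), command])
    return calls


if __name__ == "__main__":
    apply_changes = "--apply" in sys.argv[1:]
    for args in gconf_calls(BINDINGS):
        if apply_changes:
            subprocess.check_call(args)  # actually write the GConf key
        else:
            print(" ".join(args))  # dry run: just show what would be set
```

Each binding needs two keys: one for the key combination and one for the command. This is why the helper emits the calls in pairs. An F6 binding for the timestamp script below could be added as a seventh pair in the same way.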
In my case I configure them as follows:

<table style="border: 1px solid gray" width="500px">
<tr style="border-bottom: 1px solid gray; background-color: #eeeeee"> <td> GConf key </td> <td> key </td> <td> action </td> <td> command </td> </tr>
<tr> <td> run_command_1 </td> <td> F1 </td> <td> go back 5 seconds </td> <td> xmms2 seek -5 </td> </tr>
<tr> <td> run_command_2 </td> <td> F2 </td> <td> pause </td> <td> xmms2 pause </td> </tr>
<tr> <td> run_command_3 </td> <td> F3 </td> <td> play </td> <td> xmms2 play </td> </tr>
<tr> <td> run_command_4 </td> <td> F4 </td> <td> go forward 5 seconds </td> <td> xmms2 seek +5 </td> </tr>
<tr> <td> run_command_5 </td> <td> &lt;Shift&gt;F1 </td> <td> go back 30 seconds </td> <td> xmms2 seek -30 </td> </tr>
<tr> <td> run_command_6 </td> <td> &lt;Shift&gt;F4 </td> <td> go forward 30 seconds </td> <td> xmms2 seek +30 </td> </tr>
</table>

I skipped F5, since I use it often in OpenOffice to bring up the Navigator. F1 to F4 and F6, on the other hand, didn't serve any function that I could remember. Of course, if you use those keys for something else, you should pick different ones. With this configuration in place, I can control audio playback from any application, including OpenOffice.

## Getting the timestamp and playing audio at a timestamp

Two other things come in handy in transcription: inserting the current audio position into the document, and playing the audio from a recorded position. Luckily, XMMS2 can be controlled from Python using python-xmmsclient:

```
sudo apt-get install python-xmmsclient
```

To move timestamps in and out of documents (OpenOffice or others), we can use the Python bindings for GTK (`sudo apt-get install python-gtk2`) to read and write the clipboard.

The following script checks whether the current selection contains something that looks like a timestamp. If it does, the script advances the audio to that position. (It can handle timestamps that look like "hh:mm:ss" or "mm:ss", with or without brackets around them.)
Otherwise, it captures the current position and saves it in the clipboard, so that it can then be pasted into any document.

```python
#!/usr/bin/env python
# clipboard2xmms.py: seek XMMS2 to the selected timestamp, or copy the
# current playback position to the clipboard.

# get the clipboard
import pygtk
pygtk.require('2.0')
import gtk

# get XMMS
import xmmsclient


class Controller:
    def __init__(self):
        # connect to the running xmms2d daemon
        self.xmms = xmmsclient.XMMS()
        self.xmms.connect()
        # CLIPBOARD is what Ctrl-V pastes; PRIMARY holds the selected text
        self.clipboard = gtk.clipboard_get()
        self.selection_clipboard = gtk.clipboard_get(selection="PRIMARY")
        self.selection = self.selection_clipboard.wait_for_text()
        if self.selection:
            self.selection = self.selection.strip().replace("[", "").replace("]", "")

    def get_time_from_xmms(self):
        # playtime is reported in milliseconds
        w = self.xmms.playback_playtime()
        w.wait()
        t = w.value() // 1000
        return "[%02d:%02d:%02d]" % (t // 3600, (t % 3600) // 60, t % 60)

    def seek_to_timestamp(self):
        # accept "mm:ss" or "hh:mm:ss"; return False for anything else
        parts = self.selection.split(":")
        h, m, s = 0, 0, 0
        try:
            if len(parts) == 2:
                m, s = int(parts[0]), int(parts[1])
            elif len(parts) == 3:
                h, m, s = [int(p) for p in parts]
            else:
                return False
        except ValueError:
            return False
        ms = (h * 3600 + m * 60 + s) * 1000
        # make sure audio is playing, then jump to the position
        w = self.xmms.playback_start()
        w.wait()
        w = self.xmms.playback_seek_ms(ms)
        w.wait()
        return True

    def push_time_to_clipboard(self):
        stamp = self.get_time_from_xmms()
        self.clipboard.set_text(stamp)
        # keep the text available after this script exits
        self.clipboard.store()

    def dispatch(self):
        # seek if the selection is a timestamp, otherwise copy the position
        if not (self.selection and self.seek_to_timestamp()):
            self.push_time_to_clipboard()


Controller().dispatch()
```

Save this script somewhere (e.g., as `~/clipboard2xmms.py`). Now we just need to bind the script to a key. In my case I use F6:

<center><img src="/etc/lintrans/global_keybindings-F6.png"/></center>

<center><img src="/etc/lintrans/keybinding_commands-F6.png"/></center>

After that, pressing F6 does one of two things. If the current selection is a timestamp, it advances the audio to that position. This works in *any* text: you should be able to select *this timestamp*, [00:10:15], to start playing the audio at 10 minutes 15 seconds. Otherwise, F6 puts the current audio position into the clipboard, from where you can paste it into any document.
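The timestamp handling in the script comes down to two conversions. The first turns a selected string into a position in milliseconds, which is what `playback_seek_ms` expects. The second turns a position back into a bracketed string. Below is a standalone sketch of both that can be tried without XMMS2 or GTK. It uses a regular expression instead of the script's split-on-colons approach. `parse_timestamp` and `format_timestamp` are illustrative names, not part of python-xmmsclient.

```python
import re

# "hh:mm:ss" or "mm:ss", optionally wrapped in square brackets.
TIMESTAMP_RE = re.compile(r"^\[?\s*(?:(\d+):)?(\d{1,2}):(\d{1,2})\s*\]?$")


def parse_timestamp(text):
    """Return the position in milliseconds, or None if text is not a timestamp."""
    match = TIMESTAMP_RE.match(text.strip())
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total * 1000


def format_timestamp(ms):
    """Format a position in milliseconds as "[hh:mm:ss]", dropping fractions of a second."""
    total = ms // 1000
    return "[%02d:%02d:%02d]" % (total // 3600, (total % 3600) // 60, total % 60)
```

For example, `parse_timestamp("[00:43:10]")` and `parse_timestamp("43:10")` both give 2590000. `format_timestamp(2590000)` gives `"[00:43:10]"`. The regular expression rejects selections such as "2007" or "see 10:15" outright, so a check like this could decide whether F6 should seek or copy.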
## Remaining issues

Ideally, I would also like to associate a different audio file with each OpenOffice document and have the right one load automatically. So far I haven't figured out how to do that, partly because I can't get the Python XMMS2 client to change audio files. For the time being, Esperanza provides a reasonable interface for selecting audio files. I just load all of my interviews into a playlist, which makes it relatively easy to switch between them. (Note that the playlist is preserved even across reboots, and that I don't use XMMS2 for playing music.)
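One possible, untested direction for the per-document audio file is to drive the `xmms2` command-line client from Python instead of python-xmmsclient. The sketch below makes two assumptions. First, it uses a hypothetical naming convention: the audio for `rafael.odt` is `rafael.mp3`, `rafael.ogg`, etc. in the same directory. Second, it relies on the client's `clear` and `add` subcommands, whose exact behaviour may differ between XMMS2 versions. `find_audio_for` and `load_commands` are illustrative names.

```python
import os
import subprocess
import sys

# Hypothetical convention: the audio for "rafael.odt" is e.g. "rafael.mp3"
# in the same directory. Extensions are tried in this order.
AUDIO_EXTENSIONS = (".mp3", ".ogg", ".flac", ".wav")


def find_audio_for(document):
    """Return the path of an audio file sharing the document's base name, or None."""
    stem = os.path.splitext(document)[0]
    for ext in AUDIO_EXTENSIONS:
        candidate = stem + ext
        if os.path.isfile(candidate):
            return candidate
    return None


def load_commands(audio):
    """xmms2 CLI calls (assumed) that replace the playlist with one file and play it."""
    return [
        ["xmms2", "clear"],
        ["xmms2", "add", os.path.abspath(audio)],
        ["xmms2", "play"],
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    audio = find_audio_for(sys.argv[1])
    if audio is None:
        sys.exit("no audio file found for %s" % sys.argv[1])
    for args in load_commands(audio):
        subprocess.check_call(args)
```

Note that `xmms2 clear` empties the playlist. An approach like this would therefore replace the one-playlist-of-all-interviews setup rather than complement it.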