VX-Mode

Last updated: September 2005

VX Mode allows you to use Dragon NaturallySpeaking with XEmacs. It is a rewrite of Barry Jaspan's VR-mode. I started with small modifications to VR-mode, the goal being to make it work with XEmacs, but I ended up rewriting most of it.

The main goal of this project was to provide continuous dictation support for natural text in XEmacs. Command and programming support were considered secondary. I wanted to have a tool that would enable me to dictate my e-mail in Gnus, as well as write long documents.

Another important goal was to be able to dictate without using Windows. Obviously, Dragon NaturallySpeaking only runs under Windows, so one has to use it in some way -- but I wanted to be able to run my Windows installation inside VMware on a Linux host or on a separate machine, with my Emacs communicating with that machine over the network. I'm trying to achieve as much as possible without actually interacting with the Windows desktop, which might not even be visible in normal circumstances.

Please don't report bugs in this code to Barry. The code has really been extensively changed and any problems with this package are my fault only. Please report problems directly to me.

Screenshots

No web page about a piece of software would be complete without screenshots!

Prerequisites

  • Dragon NaturallySpeaking (DNS); any recent version should work. I have tested with DNS 7.3 Preferred, DNS 8.0 Preferred and DNS 8.1 Preferred.
  • XEmacs, either the latest stable version, or the development version. I have tested with 21.4.15 and 21.5.17. Please note that this software is XEmacs-specific and at present does not work with GNU Emacs. This might change in the future, if someone ports it to GNU Emacs and submits clean changes.
  • Temporary: if you only use Windows and run XEmacs under Windows too, then XEmacs must not run on the same machine as VX.exe. More precisely, the VX.exe window must have focus while you dictate. This is a result of DNS providing its own dictation "support", which we can't currently turn off. I intend to fix this, so the restriction will go away in the future.

An example setup that I personally use is Microsoft Windows XP running in a VMware window on a Linux host. My XEmacs runs under Linux and connects to the VX.exe process via a network connection (it all happens on the same physical machine, of course). This setup does require a fair bit of physical memory, but seems to work very well in practice. I'm using it on my Pentium III laptop (866MHz CPU, 368MB of RAM) with 164MB of RAM dedicated to the VMware virtual machine running Dragon NaturallySpeaking.

Download

The most recent version is 0.08: vx-mode-0.08.tar.bz2

For those who would prefer not to build a VX.exe executable, here is a prebuilt version: vx.exe

Main features

Works over the network -- your XEmacs does not have to run under Windows, nor does it have to run on the same machine that Dragon NaturallySpeaking runs on.

Supports Select-and-Say and an Emacs-based corrections menu.

New features

Partial resynchronization: after the buffer has been synchronized for the first time, we keep track of what has been changed. We do not inform DNS of every change as it happens. Instead, when the next recognition starts, we synchronize only the affected region. That seems to be much more effective than resynchronizing the entire buffer. It also seems to make more sense than reporting each change character by character.
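The bookkeeping behind this idea can be sketched in a few lines. The following is only an illustration in Python, not the code in vx.el: the class name DirtyRegion and its methods are made up for this example, and the (beg, end, old_len) arguments are modelled on the way Emacs-style change hooks report an edit. It keeps a single span that covers every edit made since the last synchronization, so that only that span needs to be sent when the next recognition starts.

```python
class DirtyRegion:
    """Hypothetical helper: smallest span covering all edits since last sync."""

    def __init__(self):
        self.start = None  # inclusive offset; None means "nothing changed"
        self.end = None    # exclusive offset, in current buffer coordinates

    def record_change(self, beg, end, old_len):
        """An edit replaced old_len characters at beg with the text [beg, end)."""
        if self.start is None:
            self.start, self.end = beg, end
            return
        delta = (end - beg) - old_len  # how much the buffer grew or shrank
        # Union of the old dirty span and the replaced text, measured before
        # the edit, then shifted by delta into post-edit coordinates.
        self.end = max(self.end, beg + old_len) + delta
        self.start = min(self.start, beg)

    def take(self):
        """Return the span to resynchronize (or None) and mark the buffer clean."""
        if self.start is None:
            return None
        span = (self.start, self.end)
        self.start = self.end = None
        return span


# Example: "hello world" -> insert "big " at offset 6 -> delete "he" at offset 0
dirty = DirtyRegion()
dirty.record_change(6, 10, 0)   # insertion of 4 characters
dirty.record_change(0, 0, 2)    # deletion of 2 characters
text = "llo big world"
start, end = dirty.take()
print(repr(text[start:end]))    # the only part that has to be re-sent
```

Tracking one merged span is the simplest choice; a list of separate spans would send less text when edits are far apart, at the cost of more bookkeeping.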

Corrections menu: after something has been dictated into a buffer, you can now use the third (right) mouse button to get a pop-up corrections menu. You can either mark a region first (to get a corrections list for that region), or simply click on any word (to get a list of choices for that particular word). 80 lines of code to support all this -- why do people still want to program in C/C++?!
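The "region or word under the mouse" rule can be illustrated as follows. This is a Python sketch of the selection logic only, not the actual menu code; the function name correction_target and its arguments are invented for the example, and a "word" is assumed here to be any run of non-whitespace characters.

```python
import re


def correction_target(text, click_pos, region=None):
    """Pick the span to request corrections for (hypothetical helper).

    A non-empty marked region wins; otherwise use the word under the click.
    Returns (start, end) offsets, or None if the click hit whitespace.
    """
    if region is not None:
        start, end = sorted(region)
        if start != end:
            return start, end
    for match in re.finditer(r"\S+", text):
        if match.start() <= click_pos < match.end():
            return match.start(), match.end()
    return None


text = "please wreck a nice beach"
start, end = correction_target(text, 8)
print(text[start:end])   # the word under the click
```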

Known problems and limitations

  • no command support (coming soon!)
  • isearch doesn't work
  • we probably don't deal with multiple frames very gracefully
  • we probably get confused when the same buffer is visible in two windows
  • If you turn on overwrite mode, VX Mode gets confused.
  • only supports XEmacs for now. Support for GNU Emacs is definitely possible and shouldn't actually be too difficult, but it wasn't very high on my list of priorities. If somebody volunteers to do it, I will be very glad to incorporate it.
  • (FIXED) abbrev-mode doesn't work
  • (FIXED) read-only buffers confuse vx-mode

Since there is no locking, strange things might happen if one tries to do things simultaneously on the Emacs side and on the DNS side. An example of this would be typing while dictating, or picking one of the corrections while dictating. A simple locking scheme should probably be implemented to prevent these kinds of problems.
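One possible shape for such a locking scheme is a single non-blocking lock: whoever starts first (a recognition, a correction, a burst of typing) holds it, and anything else that arrives in the meantime is refused rather than interleaved. The Python sketch below only illustrates the idea; the SessionLock class and the activity names are hypothetical, and nothing like it exists in VX Mode yet.

```python
import threading


class SessionLock:
    """Hypothetical guard allowing one activity to touch the buffer at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def try_begin(self, who):
        """Claim the buffer for `who`; return False if someone else holds it."""
        if self._lock.acquire(blocking=False):
            self.owner = who
            return True
        return False

    def end(self, who):
        """Release the buffer; only the current owner may do so."""
        if self.owner != who:
            raise RuntimeError(f"{who!r} does not hold the lock")
        self.owner = None
        self._lock.release()


lock = SessionLock()
print(lock.try_begin("dictation"))    # True: recognition starts
print(lock.try_begin("correction"))   # False: refused while dictating
lock.end("dictation")
print(lock.try_begin("correction"))   # True: now the correction may proceed
```

Refusing the second activity is the simplest policy; queueing it until the first one finishes would be friendlier but needs more care.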

How to use

Brief notes on how to use this software:

  • Drop the Windows executable into your Windows installation, and run it. I run it with the "-port 7000" parameter, so that it listens on a known port.
  • Load vx.el in your XEmacs. You might want to byte-compile it, but it isn't required.
  • Set an appropriate host/port address for the machine that runs VX.exe:
      (setq vx-host "192.168.1.17") ;; or 127.0.0.1 for localhost
      (setq vx-port 7000)
    
  • Start VX Mode by doing M-x vx-mode.
  • Make sure your microphone is activated and start dictating. Dictated text should start appearing in XEmacs.
  • If you want to make corrections, mark the text that you want to be corrected and click the right mouse button. You can also click the right mouse button on any word, without marking it -- this will display the corrections menu for this particular word. And remember to turn off the option "Select brings up corrections menu" in DNS configuration.
  • Please make sure that you turn off the "Show VX Mode protocol trace" checkbox, unless you are actively debugging VX Mode. This option can slow VX Mode down a lot -- Windows edit controls are extremely slow and CPU hungry.
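If dictated text does not appear, it can help to check first that the machine running XEmacs can reach VX.exe at all. The small Python script below is just a generic TCP reachability probe, not part of VX Mode; the function name vx_reachable is made up, the host and port are the example values from the steps above, and it assumes that VX.exe tolerates a short-lived connection that sends nothing.

```python
import socket


def vx_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Example values only; use the same address as vx-host and vx-port.
    host, port = "192.168.1.17", 7000
    state = "reachable" if vx_reachable(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

A failed probe usually points at a firewall, a wrong address, or VX.exe listening on a different port than the one given with "-port".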

Have fun!

Developers

If you want to build the VX.exe executable, you will need two additional files: speech.h from Microsoft's Speech SDK and dnssdk.h from the Dragon SDK. Both SDKs are available as free downloads.