Looking for typing accommodations while my wrist is giving me trouble. I love how Google voice recognition works on Android; you’d think it would be even easier and better on a desktop PC. But the UI doesn’t quite seem to be there.
Update: I originally wrote this post on Windows 10. It turns out Windows 11 has an entirely new feature called Voice Access, and it’s quite good, just about what I need. I’ve edited the post to reflect all this, but it’s probably a little confusing.
I’m fortunate that I can still use both hands and mousing gives me no trouble. I’m just trying to take some load off my left hand.
Windows 10
Windows 10 seems to have two different systems for voice recognition. There’s an older speech control system that is mostly focused on UI control, like opening the Start menu or closing windows. It has some speech-to-text dictation, but the language model is terrible and not useful.
There’s also the newer Windows dictation, accessible via Win+H. It’s dictation only, not really system control, but reasonably good. However, the UI is awkward, and the language model is not as good as what I get on my Pixel 8. But it is usable.
The biggest problem is activating dictation mode. I have to click in the window I want to type in, then click a weird microphone icon hovering in a fixed-width window on the desktop. Even then it does not work reliably. I would dearly love to assign the “take dictation” function to something like a hotkey or mouse button 4. I haven’t found a way to script that, and quick searches make it look like it’s not possible.
I had thought Microsoft put more work into usability. The full speech control system does work, but it’s pretty awkward and something you would only use if you had no alternative. And the fact that its basic dictation speech model is no good makes it useless to me. Maybe it’s better in Windows 11? I think the full-control Voice Access system there is much better.
Windows 11
Microsoft did put more work into usability! Windows 11 is significantly better. The old speech control system that I didn’t like is basically gone. Windows Dictation has been extended into Windows Voice Access, which gives you fairly reasonable control over the whole UI.
For my purposes, the most important thing is that the speech-to-text in the new Windows Voice Access is much better. The UI is better too. You can leave it running all the time; there’s no need to press a button to activate the microphone every time you need it. It doesn’t do anything until you start talking, and then it inserts what you say as if you had typed it. So I can leave it on, and when I have something more extensive to say, I can talk instead of type. (It does seem to pick up audio from the speakers, which is a little annoying if you’re playing a video with someone talking.) The speech model is pretty good, other than the automatic punctuation, which I’ve had to turn off. It’s totally usable, though, and closer to Gboard’s speech recognition than not.
There’s a good little voice control demo that I ran through, and I’m impressed: if I’m patient enough, I could actually control a PC with my voice. You drive it by speaking commands. It has some generic controls for clicking named buttons or narrowing in on a region of the screen to click. It also has some app-specific integration. For instance, it seems to understand “close tab” explicitly as closing a tab in a web browser. That doesn’t work in Chrome, but it works great in Edge; maybe Edge has extra code to interface with Voice Access? One drawback is some ambiguity: if I start a sentence with the word “start” or “close,” it may interpret it as a command rather than text I’m trying to type. There is a dictation-only mode, but I haven’t figured out how to turn it on (saying “switch to dictation mode” should do it, but doesn’t for me).
Voice In extension
Now I’m wondering why Chrome on desktop doesn’t have voice typing. It seems like it would be easy to build into the browser. I’m finding some third-party hacks like Voice In.
I gave Voice In a try, and it seems to mostly work and is much simpler to trigger than the Windows built-in feature. You can bind it to a keyboard shortcut, but in Chrome that means Ctrl or Alt plus a letter. I really want just a mouse button; I can probably use AutoHotkey to make that happen. Being a Chrome extension makes it pretty limited, but probably also easier to integrate.
I gave it more of a try, and the results are mixed. The language model these tools recognize against matters so much. I was writing an email to a friend about gay stuff, and it kept missing very basic words that would be common in a gay context but not in others. It also has a habit of capitalizing random words like “extreme” or “City.” Mostly it just made me wish I could use Google’s voice typing model, because I’m very good at getting what I want out of it.
Having the extension be Chrome-only is a little irritating, particularly when I switch to Slack. I guess I could run Slack in the browser. The free version works reasonably well, but realistically you’re going to pay $60 a year if you use it all the time. Not having a keystroke to activate it isn’t a big problem; mostly you just leave it active all the time.
Google could absolutely build a good Windows desktop or Chrome product for voice typing. I suspect the market just isn’t big enough for them.