I’ve been going down a rabbit hole trying to find a good way to do voice-to-text on Windows using local AI processing.
Local processing is cool since it doesn’t destroy our environment ….. as much XD.
After a bit of searching I found this program called Vibe, and I like it. It’s not perfect, but it’s pretty damn good. The developer provides great documentation and direct installers for a bunch of models. Vibe provides a medium model on startup, which is great, but you can use whatever model you want.
I’m using the Distil Whisper model, which is highly accurate yet runs very well and is highly optimized for lower-end hardware. Just download the model, then go to Settings, open the model folder, and copy the file there.
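If you'd rather script the copy step than drag the file around by hand, here is a minimal sketch. It assumes you already know where your downloaded model file is and which folder the settings screen opens; the function name and both paths are hypothetical placeholders, not anything taken from Vibe itself.

```python
import shutil
from pathlib import Path


def install_model(model_file: Path, models_dir: Path) -> Path:
    """Copy a downloaded model file into the models folder and return the new path."""
    if not model_file.is_file():
        raise FileNotFoundError(f"Model file not found: {model_file}")
    # Create the folder if it does not exist yet (harmless if it already does).
    models_dir.mkdir(parents=True, exist_ok=True)
    destination = models_dir / model_file.name
    # copy2 keeps the file's timestamps along with its contents.
    shutil.copy2(model_file, destination)
    return destination


# Example call (both paths are placeholders, substitute your own):
# install_model(Path.home() / "Downloads" / "example-model.bin",
#               Path(r"C:\path\to\the\model\folder"))
```

The app may need a restart before a newly copied model shows up in its list; that part is a guess and worth checking on your own setup.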
It’s currently running off the GPU on my laptop and seems to do a fantastic job without completely freezing the computer or causing too much overhead, albeit the GPU is working pretty hard.
The one thing I was hoping for, which I have already put in a request to the developers for, is that it could automatically copy the transcript to the clipboard.
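Until something like that exists in the app, a small script can stand in. This is only a rough sketch of a workaround, not how Vibe does anything: it assumes the transcript has been saved to a text file, and it pipes that text into the `clip` command that ships with Windows. The function names are made up for illustration, and the UTF-16 encoding choice is an assumption that may need adjusting on your machine.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def copy_to_clipboard(text: str, command: Sequence[str] = ("clip",)) -> None:
    """Pipe text into a clipboard command's standard input."""
    # Assumption: clip.exe accepts UTF-16 input with a byte-order mark,
    # which keeps non-ASCII characters intact. Adjust if yours differs.
    subprocess.run(list(command), input=text.encode("utf-16"), check=True)


def copy_transcript_file(
    transcript_path: Path, command: Sequence[str] = ("clip",)
) -> None:
    """Read a saved transcript (assumed UTF-8) and put it on the clipboard."""
    text = transcript_path.read_text(encoding="utf-8")
    copy_to_clipboard(text, command)


# Example call (placeholder path):
# copy_transcript_file(Path(r"C:\path\to\transcript.txt"))
```

The `command` parameter is there so the same function can be pointed at a different clipboard tool on another system.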
I’m also looking at ways to integrate it into PowerToys’ new launcher, Command Palette.
I am harassing Gemini online to see if it can make it for me. Though it seems like a really cool project if anyone wants to pick it up.
This whole transcription took less than a minute on my Intel 1235U with Iris Xe graphics, and it was a minute and a half of actual audio. That is pretty damn impressive on an integrated GPU and my laptop’s mics.
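To put a number on that, here is a quick back-of-the-envelope calculation. It treats "less than a minute" as an upper bound of 60 seconds, so the result is a floor on the speed, not a measured benchmark; the function name is just for illustration.

```python
def speedup(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are transcribed per second of processing."""
    if audio_seconds <= 0 or processing_seconds <= 0:
        raise ValueError("Both durations must be positive")
    return audio_seconds / processing_seconds


audio = 90.0                 # a minute and a half of audio
processing_at_most = 60.0    # "less than a minute", used as an upper bound

print(f"At least {speedup(audio, processing_at_most):.1f}x real-time speed")
# prints: At least 1.5x real-time speed
```

Anything above 1.0x means the transcription finishes faster than the recording takes to play back.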
Actual Direct transcription and audio – 0 Editing
“I’ve been going down a rabbit hole trying to find a good way to do transcription on Windows so I could do voice detects on Windows using local AI.
So that it’s processed on my PC as opposed to burning tons of gallons of water at a data center or and cost or and costing me money.
I finally ended up after trying a couple of different programs. I don’t even remember what I was trying, using this Vibe program.
I’m using the Distilled Whisper model which is highly accurate yet runs very well and is highly optimized for lower end hardware.
It’s currently running off my GPU on my laptop and seems to do a fantastic job without freezing the computer completely or causing too much overhead.
Albeit the GPU is working pretty hard. The one thing I was hoping, which I already made a pull request from the developers, is that it could automatically copy into the clipboard.
I’m also looking at ways to integrate it into PowerToys’ launcher which their new one is called Command Palette.
I am harassing Gemini online to see if it can make it for me. Though it seems like a really cool project if anyone wants to pick it up.”
Overall I am quite pleased.
PS: If you are looking to do this on Android, then use FUTO Voice Input. Yes, it’s better than Gboard!
This is day 13 of #100DaysToOffload