Generative AI: Speech-to-text with ‘Whisperfile’

I have discussed the ‘llamafile’ project before: a single-file, run-it-almost-anywhere way to launch a llama.cpp instance with LLM weights, so that one can deploy and use an LLM pretty much anywhere. Well, Justine Tunney responded to someone asking whether whisper.cpp, the speech-to-text software built on the Whisper speech recognition models, could be packaged the ‘llamafile’ way. She dug into it and found that supporting it required some changes to her Cosmopolitan software, which are to show up in the next release. She also linked to her own version of the request, which includes a link to download ‘whisperfile.gz’.

On Linux, ‘gunzip whisperfile.gz’ extracts the executable; follow that with ‘chmod +x whisperfile’ and it is ready to run. On Windows, once the file is extracted, one has to rename it:

ren whisperfile whisperfile.exe

and then it is ready to run.
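
For anyone who would rather script those steps, here is a minimal Python sketch of the same unpack-and-make-runnable routine. It uses only the standard library. The function name ‘unpack_whisperfile’ is my own invention for illustration and is not part of any whisperfile tooling. It also assumes the download is an ordinary gzip file, as the ‘.gz’ extension suggests.

```python
import gzip
import os
import shutil
import stat
import sys


def unpack_whisperfile(gz_path):
    """Decompress a .gz download, make it runnable, and return the new path.

    Hypothetical helper: mirrors 'gunzip' + 'chmod +x' on Unix-like systems
    and the rename to '.exe' on Windows.
    """
    # Drop the '.gz' suffix to get the output name, as gunzip would.
    out_path = gz_path[:-3] if gz_path.endswith(".gz") else gz_path + ".out"
    if sys.platform == "win32":
        # Windows wants an .exe extension before it will run the file.
        out_path += ".exe"

    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

    if sys.platform != "win32":
        # Equivalent of 'chmod +x': add execute bits to the existing mode.
        mode = os.stat(out_path).st_mode
        os.chmod(out_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    return out_path
```

Unlike ‘gunzip’, this leaves the original ‘.gz’ file in place, which is handy if the download needs to be unpacked again.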

whisperfile --help

provides the usage and options information.

whisperfile delivers good-quality, general-purpose speech-to-text for everyone; well, everyone interested in English-language speech-to-text. Stream in audio or feed it an audio file (mono WAV sampled at 16 kHz, please), and it will emit a text version of whatever is in there, so long as it is even modestly well recorded. I was using whisper.cpp to transcribe all sorts of audio I have on hand well before whisperfile came along; whisperfile just makes it that much simpler to get going.
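
Since the input format matters, it can save some head-scratching to check a WAV file before handing it over. What follows is a small sketch using Python's standard ‘wave’ module to confirm that a file is mono at 16,000 samples per second. The function name ‘wav_problems’ and its defaults are my own choices for illustration, not anything whisperfile provides.

```python
import wave


def wav_problems(path, want_rate=16000, want_channels=1):
    """Return a list of reasons the WAV file does not match the wanted format.

    An empty list means the file is mono at the wanted sample rate.
    Hypothetical helper for illustration only.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if channels != want_channels:
            problems.append(
                f"expected {want_channels} channel(s), found {channels}"
            )
        if rate != want_rate:
            problems.append(
                f"expected {want_rate} samples per second, found {rate}"
            )
    return problems
```

A file that comes back with a non-empty list needs to be converted with an audio tool of your choice before transcription. Note that ‘wave’ only reads uncompressed PCM WAV files, so it raises ‘wave.Error’ on anything else.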

I look forward to the next release of Tunney’s Cosmopolitan and an official ‘whisperfile’ release. Things keep progressing on this front.

Wesley R. Elsberry

Falconer. Interdisciplinary researcher: biology and computer science. Data scientist in real estate and econometrics. Blogger. Speaker. Photographer. Husband. Christian. Activist.
