Generative AI: Speech-to-text with ‘Whisperfile’
I discussed the ‘llamafile’ project before: the single-file, run-it-almost-anywhere way to launch a llama.cpp instance with LLM weights, which lets one deploy and use an LLM pretty much anywhere. Well, Justine Tunney responded to someone asking whether whisper.cpp, the speech-to-text software built on the Whisper speech recognition models, could be put in ‘llamafile’. She dug into it, found that supporting it needed some changes to her Cosmopolitan software that would show up in the next release, and linked to her own version of the request, which includes a link to download ‘whisperfile.gz’.
On Linux, ‘gunzip whisperfile.gz’ extracts the executable; follow that with ‘chmod +x whisperfile’ and it is ready to run. On Windows, after extracting the file, one has to rename it:
ren whisperfile whisperfile.exe
and then it is ready to run.
whisperfile --help
provides the usage and options information.
whisperfile delivers good-quality, general-purpose speech-to-text for everyone, or at least everyone interested in English-language speech-to-text. Stream in audio or feed it an audio file (mono, 16 kHz sample rate, WAV format, please), and it will emit a text version of whatever is in there, provided the audio is even modestly well recorded. I had been using whisper.cpp to transcribe all sorts of audio I have on hand well before whisperfile came along. whisperfile just makes it that much simpler to get going.
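Since the input format matters, it can save a failed run to check a file before handing it over. Here is a minimal Python sketch of such a check, using only the standard library's wave module. The function name wav_problems is my own invention for illustration and is not part of whisperfile or whisper.cpp, and the check covers only the two properties mentioned above (mono, 16 kHz).

```python
import wave

# Assumed targets, taken from the format described above: mono, 16 kHz WAV.
TARGET_CHANNELS = 1
TARGET_RATE_HZ = 16000


def wav_problems(path):
    """Return a list of format problems for the WAV file at `path`.

    An empty list means the file is mono and sampled at 16 kHz.
    wave.open raises wave.Error if the file is not a PCM WAV file at all.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()

    problems = []
    if channels != TARGET_CHANNELS:
        problems.append(f"expected mono, found {channels} channels")
    if rate != TARGET_RATE_HZ:
        problems.append(f"expected {TARGET_RATE_HZ} Hz, found {rate} Hz")
    return problems
```

Calling wav_problems on a recording returns an empty list when the file is already in shape, and otherwise a short description of what needs converting before transcription.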
I look forward to the next release of Tunney’s Cosmopolitan and an official ‘whisperfile’ release. Things keep progressing on this front.