Homelab Adventure: Figuring Out Capabilities

Given the NVIDIA Tesla P40's CUDA compute capability of 6.1, I had assumed, when I got an error message trying to run Karpathy's nanoGPT code, that I was out of luck for using it or other language model code with the P40. It turns out the error message wasn't fatal, and the P40 did handle the Shakespeare corpus training example described in the nanoGPT project. It just did it considerably slower than Karpathy described for an NVIDIA A100: closer to an hour rather than the 3 minutes Karpathy said the A100 took. I think the P40 emulates FP16 operations, and that may account for the difference.

I hadn't read Karpathy's text closely enough, because I tried starting the GPT-2-like training run on the OpenWebText corpus. I commented out a call to 'wandb', which appears to do logging but wants an API key, and let it run. I got a report of progress for the first epoch of training, and then it errored out on a call to the now-undefined 'wandb'. Let's assume I can either supply wandb an API key or finish disabling it. If the P40 is 20x slower than an A100, and Karpathy's report of 4 days of training on a node with 8 A100s is accurate, then even if 24 GB of VRAM were enough to handle the job, I'd be looking at 4 × 8 × 20 = 640 days of training to reach the same validation loss Karpathy reports at the end of his training.
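The back-of-envelope estimate above can be written out explicitly; the 20x slowdown factor is my assumption, not a measured benchmark:

```python
# Rough single-P40 training-time estimate for the OpenWebText run,
# assuming (hypothetically) the P40 is 20x slower than an A100.
A100_NODE_DAYS = 4    # Karpathy: ~4 days on one node
GPUS_PER_NODE = 8     # that node had 8 A100s
P40_SLOWDOWN = 20     # assumed P40-vs-A100 slowdown factor

p40_days = A100_NODE_DAYS * GPUS_PER_NODE * P40_SLOWDOWN
print(p40_days)  # 640 days
```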

Still, I am encouraged that my options aren’t quite as limited as I thought.

There's the Coqui.AI 'TTS' (text-to-speech) project, for example. It provides several AI voice models for converting text to speech, and even has voice-cloning capabilities. It also seems to be compatible with the P40.
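As a sketch of how the Coqui TTS Python API gets used (the exact model identifier here is an assumption; the library can list the models actually available in a given install):

```python
# Sketch of Coqui TTS usage; assumes `pip install TTS` has been done.
# The model name below is an assumption -- check the identifiers your
# install actually provides before relying on it.
def synthesize(text, out_path,
               model="tts_models/en/ljspeech/tacotron2-DDC"):
    """Render `text` to a WAV file with a Coqui TTS model."""
    # Imported inside the function so this sketch loads without the package.
    from TTS.api import TTS
    tts = TTS(model_name=model)
    tts.tts_to_file(text=text, file_path=out_path)

if __name__ == "__main__":
    synthesize("Testing the P40 with Coqui TTS.", "out.wav")
```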

Then there are some blog posts about doing things with nanoGPT at small scale. One describes training nanoGPT on the author's own blog posts. The results don't look materially different from what 'travesty' generators could do with larger n-gram analysis values (n >= 7 or so, I think) back in the 1980s. But another post from the same author describes adding API-calling capability to his nanoGPT model, and that seems utterly cool and not something that was approached decades ago.
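For comparison, a travesty generator of the 1980s sort is only a few lines: sample each next character from the distribution of characters that followed the same (n-1)-character context in the source text. A minimal sketch:

```python
# Minimal character-level "travesty" generator: each next character is
# drawn from the characters observed after the same (n-1)-char context.
import random
from collections import defaultdict

def travesty(text, n=7, length=200, seed=None):
    rng = random.Random(seed)
    follows = defaultdict(list)
    # Record which character follows each (n-1)-character context.
    for i in range(len(text) - n + 1):
        follows[text[i:i + n - 1]].append(text[i + n - 1])
    out = text[:n - 1]  # start from the opening context
    while len(out) < length:
        choices = follows.get(out[-(n - 1):])
        if not choices:  # dead end: restart from the opening context
            out += text[:n - 1]
            continue
        out += rng.choice(choices)
    return out
```

With n around 7, output like this reads as plausible-but-empty pastiche of the source, which is roughly what small-corpus nanoGPT output resembles.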

There is another project worth a mention, though it doesn't rely on the NVIDIA P40. The Whisper.cpp project is a speech-to-text project based on an OpenAI model. It does generic English transcription, and the underlying model does well on most spoken English. The Whisper.cpp code makes the model usable via CPU power on most common operating systems, including iOS and Android. I cloned the repo, compiled as in the example for the "base.en" model, and have been trying it out for a few hours. I have a bunch of 'home video' files, so I set up a Python script that gets the list of files, uses FFmpeg to extract audio to the 16 kHz WAV format Whisper.cpp requires, and then applies the Whisper.cpp executable to obtain a transcription. I specified 6 cores and 2 threads per core so my 8-core box is not entirely busy, and that seems to be getting through the set at a pretty good clip. Looking at the console for an example, it looks like with those settings Whisper.cpp can transcribe over 90 minutes of video in under 13 minutes of processing time.
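The batch pipeline can be sketched like this; the binary and model paths, the `.mp4` glob, and the thread count are assumptions about my local layout rather than anything Whisper.cpp mandates:

```python
# Sketch of the batch-transcription pipeline: extract 16 kHz mono WAV
# with ffmpeg, then run the whisper.cpp executable on each file.
import subprocess
from pathlib import Path

WHISPER = "./main"                     # whisper.cpp executable (assumed path)
MODEL = "models/ggml-base.en.bin"      # base.en weights (assumed path)

def ffmpeg_cmd(video: Path, wav: Path) -> list:
    # whisper.cpp wants 16 kHz, mono, 16-bit PCM WAV input.
    return ["ffmpeg", "-y", "-i", str(video),
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav)]

def whisper_cmd(wav: Path, threads: int = 12) -> list:
    return [WHISPER, "-m", MODEL, "-t", str(threads), "-f", str(wav)]

def transcribe_all(video_dir: str) -> None:
    for video in sorted(Path(video_dir).glob("*.mp4")):
        wav = video.with_suffix(".wav")
        subprocess.run(ffmpeg_cmd(video, wav), check=True)
        subprocess.run(whisper_cmd(wav), check=True)
```

Building the commands in helper functions keeps the ffmpeg and whisper.cpp invocations easy to adjust (or inspect) without touching the loop.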

I have a lot of cassette tapes to get digitized and run through Whisper.cpp, too.

Wesley R. Elsberry

Falconer. Interdisciplinary researcher: biology and computer science. Data scientist in real estate and econometrics. Blogger. Speaker. Photographer. Husband. Christian. Activist.
