A lot of what you said is true.
Since the TPU is a matrix processor instead of a general purpose processor, it removes the memory access problem that slows down GPUs and CPUs and requires them to use more processing power.
Just no. Flat out no. Just so much wrong. How does the TPU process data? How does the data get there? It needs to be shuttled back and forth over the bus. Doing this with a 1080p image's worth of data several times a second is fine. An uncompressed 1080p image is about 8MB. Entirely manageable.
Edit: it’s not even 1080p, because the image would get resized to the model's input size. So again, 300x300x3 for the last model I could find.
/Edit
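To put rough numbers on it (back-of-envelope only; I'm assuming 4 bytes per pixel for the raw frame and a uint8 input tensor):

```python
# Back-of-envelope sizes for what gets shuttled over the bus per frame.
# Assumes 4 bytes/pixel (RGBA) for the raw frame and 1 byte/channel for
# the resized uint8 model input -- adjust if your pipeline differs.
raw_1080p = 1920 * 1080 * 4      # ~8.3 MB uncompressed frame
model_input = 300 * 300 * 3      # ~270 KB after resizing to 300x300x3

print(f"raw 1080p frame: {raw_1080p / 1e6:.1f} MB")
print(f"300x300x3 input: {model_input / 1e3:.0f} KB")
```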
Look at this repo. You need to convert the models using the TFLite framework (TensorFlow Lite), which is designed for resource-constrained edge devices. The max input resolution is 224x224x3. I would imagine it can’t handle anything larger.
https://github.com/jveitchmichaelis/edgetpu-yolo/tree/main/data
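For what it's worth, the conversion step looks roughly like this. It's a sketch with a placeholder SavedModel path and dummy calibration data, and the resulting .tflite file still has to go through the edgetpu_compiler afterwards:

```python
# Rough sketch: convert a SavedModel to a fully int8-quantized .tflite file,
# which is what the Edge TPU compiler expects as input.
# "my_saved_model" and representative_images() are placeholders for your own model/data.
import tensorflow as tf

def representative_images():
    # Yield sample inputs so the converter can calibrate the quantization ranges.
    for _ in range(100):
        yield [tf.random.uniform([1, 224, 224, 3], 0, 1, dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_images
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
# Then compile for the Edge TPU separately: edgetpu_compiler model_int8.tflite
```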
Now look at the official model zoo on the Google Coral website.
Not a single model is larger than 40MB. Whereas LLMs start at well over a gig for even the smaller (and less accurate) models. The good ones start at about 4GB, and I frequently run models at about 20GB. The size in parameters really makes a huge difference.
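The rough math behind those sizes (typical quantization levels, not exact figures for any specific model):

```python
# Model file size is roughly parameters x bytes-per-parameter.
# The parameter counts and bit widths below are illustrative, not exact specs.
def model_size_gb(params_billions, bits_per_param):
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

print(model_size_gb(0.005, 8))  # ~5M-param detection model at int8  -> ~0.005 GB (~5 MB)
print(model_size_gb(7, 4))      # 7B LLM quantized to 4-bit          -> ~3.5 GB
print(model_size_gb(40, 4))     # 40B-class LLM at 4-bit             -> ~20 GB
```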
You likely/technically could run an LLM on a Coral, but you’re going to wait on the order of double-digit minutes for a basic response, if not way longer.
It’s just not going to happen.
when comparing apples to apples.
But this isn’t really easy to do, and it's impossible in some cases.
Historically, Nvidia has done better than AMD in gaming performance because there are just so many game-specific optimizations in the Nvidia drivers, whereas AMD's didn't have them.
On the other hand, AMD historically had better raw performance in scientific calculation tasks (pre-deeplearning trend).
Nvidia has had a stranglehold on the AI market entirely because of their CUDA dominance. But hopefully AMD has finally bucked that trend with their new ROCm release that is a drop-in replacement for CUDA (meaning you can just run CUDA-compiled applications on AMD with no changes).
Also, AMD’s new MI300X AI processor is (supposedly) wiping the floor with Nvidia’s H100 cards. I say “supposedly” because I don’t have $50k USD to buy both cards and compare myself.
And you can add as many TPUs as you want to push it to whatever level you want
No you can’t. You’re going to be limited by the number of PCIe lanes. But putting that aside, those Coral TPUs don’t have any memory. Which means for each operation you need to shuffle the relevant data over the bus to the device for processing, and then back and forth again. You’re going to be doing this thousands of times per second (likely much more), and I can tell you from personal experience that running AI like that is painfully slow (if you can get it to even work that way in the first place).
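A quick sanity check on the bus math (every number here is an assumption for illustration, not a measured Coral spec):

```python
# Illustrative only: what per-operation shuttling does to the bus.
# Transfer count, transfer size, and bandwidth are all assumed placeholders.
transfers_per_second = 5_000          # assumed: tensor/weight transfers per second
bytes_per_transfer = 2 * 1024 ** 2    # assumed: ~2 MB moved each time
bus_bandwidth = 400 * 1024 ** 2       # assumed: ~400 MB/s effective over USB 3.0

needed = transfers_per_second * bytes_per_transfer
print(f"needed: {needed / 1024 ** 3:.1f} GB/s vs available: {bus_bandwidth / 1024 ** 2:.0f} MB/s")
# The moment "needed" exceeds the bus, the accelerator just sits idle waiting on data.
```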
You’re talking about the equivalent of buying hundreds of dollars of groceries, and then getting everything home 10km away by walking with whatever you can put in your pockets, and then doing multiple trips.
What you’re suggesting can’t work.
ATI cards (while pretty good) are always a step behind Nvidia.
Ok, you mean AMD. They bought ATI like 20 years ago now and that branding is long dead.
And AMD cards are hardly “a step behind” Nvidia. This is only true if you buy the 24GB top card of the series. Otherwise you’ll get comparable performance from AMD at a better value.
Plus, most distros have them working out of the box.
Unless you’re running a kernel older than 6.x, every distro will support AMD cards out of the box. And even then, you could always install the proprietary blobs from AMD and get full support on any distro. The kernel version only matters if you want to use the FOSS kernel drivers for the cards.
getting a few CUDA TPUs
Those aren’t “CUDA” anything. CUDA is a parallel processing framework by Nvidia and for Nvidia’s cards.
Also, those devices are only good for inferencing smaller models for things like object detection. They aren’t good for developing AI models (in the sense of training). And they can’t run LLMs. Maybe you can run a smaller model under 4B, but those aren’t exactly great for accuracy.
The best you could hope for is to run a very small instruct model trained on very specific data (like robotic actions) that doesn’t need accuracy in the sense of “knowledge accuracy”.
And completely forget about any kind of generative image stuff.
Are CUDAs something that I can select within pcpartpicker?
I’m not sure what they were trying to say, but there’s no such thing as “getting a couple of CUDA’s”.
CUDA is a framework that runs on Nvidia hardware. It’s the hardware that has “CUDA cores”, which are large numbers of small, low-power processing units. AMD calls them “stream processors”.
I know how debugging works. I’ve been a developer for a couple decades.
I know for a fact that the lines I removed are normal verbose messages and entirely unrelated to my issue. I know not only because I’m a developer and understand the messages, but also because those lines show up every second of every minute of every day. They are some of the most verbose lines in the logs. The scheduled task for the subtitles only runs once a day and finishes within a few minutes.
Also, they weren’t indicative of any code path because of how frequent they were. At such a high frequency it becomes impossible to determine which line came first in multi-threaded or asynchronous tasks.
I literally have a pinned tab for a Whisper implementation on GitHub! It’s definitely on my radar to check out. My only concern is how well it does things like multiple speakers, and does it generate SDH subtitles? That's the type that has those extra bits like “Suspenseful music” and “[groans]”, “[screams]”, etc. All the stuff someone hard of hearing would benefit from.
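For anyone curious, a minimal transcribe-to-SRT sketch with the openai-whisper package looks something like this; the model size and file names are placeholders, and note that plain Whisper gives you neither speaker diarization nor those SDH sound tags out of the box:

```python
# Minimal sketch: transcribe an audio track with openai-whisper and write plain SRT.
# Model size and file names are placeholders; no diarization or SDH tags included.
import whisper

def srt_time(seconds):
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

model = whisper.load_model("small")        # placeholder model size
result = model.transcribe("episode.wav")   # placeholder audio file

with open("episode.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n")
        f.write(f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n")
        f.write(f"{seg['text'].strip()}\n\n")
```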
even though they explicitly told you why and to not edit them
LMAO. Literally nowhere in a single screenshot did anyone say “don’t edit the logs”.
I think you’re smoking up waaay too much, my dude. Either that or you definitely are the person on the other end of my email convo. I’m getting more and more convinced of it. No one else would be so driven to make me out to be the bad guy here. Each of your comments is getting downvoted because the stuff you’re saying is bonkers.
So again, either you’re growing and smoking way too much weed, or you really are the kindly worded support person who deleted my account so unceremoniously. It’s one of the two.
Then when they asked for logs you just shot right to refund.
No, I provided logs, twice. Then they ghosted me for almost a month. I’m not complaining, all I did was reply again asking if they could do the refund.
You seem to be missing a hugely important point here. I didn’t want tech support, just a refund. The core tech issue did not matter. They were pushing for logs, and I went along with it. Regardless of whether the logs I provided were complete or not, I got told off for asking for (not demanding) a refund, NOT tech support.
Edit: why are you assuming that I deleted the “vast majority” of the log? Where did I mention the total size of the log?
I did, because I know they weren’t relevant. They were part of Jellyfin itself and not the plugin. It’s just a warning saying that a database query was slow (12ms). Since I wasn’t doing much on the server for the past few days, half the log was the warning (not an error).
So no, they weren’t part of the problem. I know they aren’t.
Edit: grammar
You stated that you are a Dev yourself, but then I was expecting that you should have tried to check their API and make the calls with curl, Postman, Insomnia or whatever, but apparently you never tried.
You’re absolutely right. I didn’t. Because I wasn’t invested in troubleshooting it. I have a full-time job, a family, etc.
The issue here is not about what wasn’t working. The issue here is being told off when simply asking for a refund.
The support person has even acknowledged that my profile was showing no downloads.
I am pretty sure they have monitoring on their API backend and can spot a problem
They do, as evidenced by the screenshot the support person shared showing the number of API calls. And they actually did have a problem with the API, which required an update to the plugin, which is all laid out at the start of my post.
FreeNAS is deprecated now. The successor (which is basically the same thing) is TrueNAS. They also have a version based purely on Linux called TrueNAS Scale. Both Community and Enterprise versions are available. The Community version is entirely free. It supports VMs through KVM and containerization, as well as all the network sharing options out there.
Another option is Proxmox. It’s Debian based and is more focused on virtualization than storage, but it has whatever you would really need for storage (including full ZFS support). You might find yourself in the command line for some things with Proxmox over TrueNAS, but if you were willing to go full Ubuntu I imagine that wouldn’t be an issue.
That being said, if you want to just go the manual route, then I suggest Debian. It’s leaner and considered more stable than Ubuntu, and doesn’t have some of the cruft that Ubuntu has (like Snaps), which may be a positive or a negative depending on what your needs are.
Edit: just to add, since you’re going to run Jellyfin and Nextcloud on these systems, my recommendation is Proxmox, as it has great tooling for managing VMs, like automatic backups. I personally run both Nextcloud and Jellyfin in their own VMs. I like the workflow of backing up the entire VM and being able to restore it to the exact state it was in when it was saved. Containers require a bit more knowledge to run truly stateless, and then you have to worry about backing up your stateful data (like configuration files, etc.) separately.
The traffic goes through a wireguard connection. Tailscale is just a facilitator to initiate the connections. There’s more to it than that, but that’s the basic gist of it.
The core technology is WireGuard, and you could set everything up yourself, but plain WireGuard can be a chore and a pain to get all set up. Tailscale honestly takes about 5 minutes to get a basic connection going.
The best/easiest way to get started with a self-hosted LLM is to check out this repo:
https://github.com/oobabooga/text-generation-webui
Its goal is to be the Automatic1111 of text generators, and it does a fair job at it.
A good model that’s said to rival GPT-3.5 is the new Falcon model. The full-sized version is too big to run on a single GPU, but the 7B version “only” needs about 16GB.
https://huggingface.co/tiiuae/falcon-7b
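If you'd rather skip the web UI, loading it straight through the transformers library is roughly this; float16 on a single GPU is my assumption for hitting that ~16GB footprint:

```python
# Rough sketch: run falcon-7b directly with Hugging Face transformers.
# float16 + device_map="auto" is an assumption for fitting it in ~16GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # Falcon shipped with custom modeling code at release
)

prompt = "Write a haiku about self-hosting:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```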
There’s also the Wizard-uncensored model that is popular.
https://huggingface.co/ehartford/Wizard-Vicuna-13B-Uncensored
There are a ton of models out there with new ones popping up every day. You just need to search around. The oobabooga repo has a few models linked in the readme also.
Edit: there’s also h2oGPT, which seems really promising. I’m going to try it out in the next couple of days.
That’s incredibly rude. At no point was I angry or enraged. What you’re trying to do is minimize my criticism of your last comment by intentionally making it seem like I was unreasonably angry.
I was going to continue with you in a friendly manner, but screw you. You’re an ass (and also entirely wrong).