• 5 Posts
  • 5 Comments
Joined 1Y ago
Cake day: Jun 09, 2023

Is there a Ben Eater breadboard-computer/6502 type of content creator for home networks?
I've been watching some One Marc Fifty stuff on YouTube. I can follow him well, and I'm decent at much of the hardware stuff. At least I can compile OpenWRT or do a basic Gentoo install with a custom kernel. I dread staring at NFTables, but can hack around some. I don't fully understand networking from the abstract fundamentals. Are there any good sources that break down the subject like Ben Eater did with the 8-bit breadboard computer, showing all the basic logic, buses, and registers surrounding the arithmetic logic unit? I'm largely looking for a more fundamental perspective on what the core components of the stack are and which elements are limited to niche applications. I just realized I want to use self-signed client certificates between devices. It was one of those moments where I feel dumb about the limited scope of my knowledge of the scale of various problems and solutions.
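For the self-signed client certificate idea, a minimal sketch with OpenSSL looks like this (the filenames and the `device1.lan` common name are hypothetical; real deployments usually want a small private CA rather than one self-signed cert per device):

```shell
# Generate a self-signed certificate and private key for one device.
# -nodes leaves the key unencrypted so a service can load it unattended.
openssl req -x509 -newkey rsa:2048 -sha256 -days 365 -nodes \
  -keyout device1.key -out device1.crt \
  -subj "/CN=device1.lan"

# Inspect what was produced.
openssl x509 -in device1.crt -noout -subject -dates
```

The server side then needs to be told to require and trust that certificate (how varies per service), which is where the private-CA approach starts to pay off.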

Open source project device support pages are always my first stop. If you have access to a git repo for the project, use gource to visualize who is doing what and where within the project recently. This will make it obvious what hardware the main devs are invested in the most, and therefore what will have the best support and user experience.

https://gource.io/
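A typical invocation against a local clone looks something like this (flag values are just a starting point; `gource` needs a display to render):

```shell
# Animate the commit history of a cloned repo; each day of history
# plays back in half a second, with a date range to focus on recent work.
gource --seconds-per-day 0.5 --start-date "2023-01-01" /path/to/project
```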


Is there a way to run old bare metal hardware on LAN for a dedicated computing task like AI?
This is an abstract curiosity. Let's say I want to use an old laptop to run an LLM. I assume I would still need PyTorch, Transformers, etc. What is the absolute minimum system configuration required to avoid overhead such as schedulers, kernel threads, virtual memory, etc.? Are there options to expose the bare metal and use a networked machine to manage overhead? Maybe a way to connect the extra machine as if it were an extra CPU socket or NUMA module? Basically, I want to turn an entire system into a dedicated AI compute module.

There may be other out-of-the-box solutions. This setup really isn't bad. You can find step-by-step guides for Windows in places like YouTube.

If you are at all interested in learning about software and how to get started using a command line, this would be a good place to start.

Oobabooga is well configured to make installation easy. It just involves a few commands that are unlikely to produce catastrophic errors. All of the required steps are detailed in the README.md file. You don't actually need to know or understand everything I described in the last message; I explained why the model is named like x/y/z in case you care to understand. That just covered details I learned by making lots of mistakes. The key here is that I linked to the specific model you need and tried to explain how to choose the right file from the linked model. If you still don't understand, feel free to ask. Most people here remember what it was like to learn.
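The "few commands" from the README amount to roughly this (the one-click start script is what the project shipped at the time; check the current README, since the exact steps change between releases):

```shell
# Clone the project and run its bundled installer/launcher.
# The script sets up its own conda environment and dependencies.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh   # there are matching scripts for Windows and macOS
```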


Originally posted this to beehaw on another account:

Oobabooga is the main GUI used to interact with models.

https://github.com/oobabooga/text-generation-webui

FYI, you need to find checkpoint models. In the chat-model space, naming can be ambiguous for a few reasons I'm not going to ramble about here. The main source of models is Hugging Face. Start with this model (or get the censored version):

https://huggingface.co/TheBloke/llama2_7b_chat_uncensored-GGML
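If you'd rather fetch a file from the command line than through the web page, a sketch (the exact filename is an assumption based on TheBloke's usual naming; check the repo's "Files" tab for the real list):

```shell
# Download one quantization of the model directly from Hugging Face.
wget https://huggingface.co/TheBloke/llama2_7b_chat_uncensored-GGML/resolve/main/llama2_7b_chat_uncensored.ggmlv3.q4_0.bin
```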

First, let’s break down the title.

  • This model is based on Meta's Llama 2.
  • It is not "FOSS" in the GPL/MIT sense. The model has a license that is quite broad in scope, with the key stipulation that it cannot be used commercially in apps that have more than 700 million users.
  • Next, it was quantized by a popular user going by "TheBloke." I have no idea who this is IRL, but I imagine it is a pseudonym or corporate alias, given how much content this account uploads to Hugging Face.
  • This is a 7-billion-parameter model, fine-tuned for chat applications.
  • It is uncensored, meaning it will respond to most inputs as best it can. It can get NSFW, or talk about almost anything. In practice there are still some minor biases, likely just the overarching morality inherent to the datasets used, or possibly coded somewhere obscure.
  • The last part of the title says this is a GGML model, which means it can run on CPU, on GPU, or split between the two.

As for the options on the landing page, or "model card":

  • You need to get one of the older-style models that have "q(number)" as the quantization type. Do not get the ones marked "qK", as these won't work with the llama.cpp build you get with Oobabooga.
  • Look at the guide at the bottom of the model card that tells you how much RAM you need for each quantization type. If you have an Nvidia GPU with the CUDA API, enabling GPU layers makes the model run faster, and with quite a bit less system memory than what is stated on the model card.
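Putting those two choices together, a launch might look like this (the model filename and layer count are placeholders; `--n-gpu-layers` is the text-generation-webui flag for CUDA offload):

```shell
# Start the web UI with a q4_0 GGML file placed in the models/ folder,
# offloading part of the model to the GPU. Omit the flag for CPU-only.
python server.py \
  --model llama2_7b_chat_uncensored.ggmlv3.q4_0.bin \
  --n-gpu-layers 32
```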

The 7B models are roughly like having a conversation with your average teenager. Asking technical questions yielded around 50% accuracy in my experience. A 13B model got around 80% accuracy. The 30B WizardLM is around 90-95%. I'm still working on getting a 70B running on my computer. A lot of the larger models require compiling tools from source; they won't work directly with Oobabooga.


This is a general list that was shared recently (it has Google Analytics, though):

PrivateGPT is on my list to try after someone posted about it weeks ago, with this how-to article (which has a view limit before a paywall) and the GitHub project repo:


What software do you want to run?

I’ve been doing a lot of research on this over the last 2 weeks. I have my machine in the mail, but have not tried anything myself on my own hardware.

For Stable Diffusion, 8 GB of VRAM is usually considered the absolute minimum, and only for very basic work. 16 GB of VRAM or more is the baseline for a decent workflow.

For AMD, I have seen multiple sources saying to avoid it, but there are a few people with working examples in the wild. Apparently, AMD only officially supports the 7000-series GPUs with ROCm/HIP for AI workloads.

Officially, Stable Diffusion only supports Nvidia.


Is it practical to use containers on an OS like Silverblue only for Nvidia GPU stuff while using the APU for a Wayland only desktop?
(Asking here because this is the containers expert community and understanding containers has been challenging for me.) Is it practical to use containers on an OS like Silverblue only for Nvidia GPU stuff like Stable Diffusion/Steam + external screen, while using the APU for a Wayland-only desktop? ...or should I just stick to X11 for everything because Nvidia sucks? I have a new laptop in transit (i7 12th gen with a 3080 Ti), mostly for messing with Stable Diffusion and privateGPT. I will probably try reconnecting with an old friend over a Steam game at some point, too. I've been on Silverblue for a couple of years (Wayland only). I don't know if I should just go back to Fedora Workstation to use X11 for everything, or if it is really practical to take a deeper dive into doing all of the Nvidia stuff with toolbx/podman. Is there another distro that has full binary-blob Nvidia support and does containers/immutable well?
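For reference, the usual shape of Nvidia-in-podman on a host with the proprietary driver is the Container Device Interface route (a sketch, assuming the nvidia-container-toolkit package is installed on the host; image tag is just an example):

```shell
# Generate the CDI spec describing the host's Nvidia devices,
# then pass the GPU into a container and check it is visible.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
podman run --rm --device nvidia.com/gpu=all \
  docker.io/nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```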

What's the best cheap, open-hardware DIY NAS to hang off a router running OpenWRT?
I was thinking it would be nice to have for backups, but maybe even going as far as mounting ~/home so I can work from multiple machines. I'm really asking where to start looking without crashing into marketing-department nonsense and search-engine steering bias.
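The mounting-~/home idea is classically done with NFS, whatever box ends up serving it. A sketch with hypothetical hostnames and paths:

```shell
# On the NAS: export a directory by adding a line to /etc/exports, e.g.
#   /srv/home  192.168.1.0/24(rw,sync,no_subtree_check)
# then reload the export table:
sudo exportfs -ra

# On each client: mount it over the home directory.
sudo mount -t nfs nas.lan:/srv/home /home/user
# Or persistently, as an /etc/fstab entry:
#   nas.lan:/srv/home  /home/user  nfs  defaults,_netdev  0  0
```

Fair warning: running a whole home directory over NFS works, but latency-sensitive things (browser profiles, lock files) can get unhappy, so many people share only subdirectories.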

Is there a go-to conference talk about ActivityPub, the Fediverse, and Lemmy yet?
I'm just looking for an overview, like, what goes where to do what. I'm helping with some basic testing on the beta, looking at the client page source/GitHub, and would like to ground my understanding. I know this is borderline for a "self hosting" post. I hope the mods will let it fly because this is such a large community. I'm sure people looking into hosting Lemmy instances are also taking a look under the hood and can maybe share good references. I searched on YT and watched a few, but nothing stood out as good. There are only half a dozen results before the YT salting algorithm goes ADHD on irrelevant nonsense. If you happen to have a ref on YT, can you please share the title and content creator? I'd rather avoid their tracker links to my account here. Edited the title to be clearer that I am looking for informative content. Sorry for any misunderstanding.