For me just the convenience of having everything in one box. Simplifies networking too. I run home assistant, openwrt, OMV, an ubuntu dtop VM and a wordpress LXC on a little m93 I jacked up with 32Gb RAM. Backups are dead simple and it’s all on one little UPS.
Some might prefer metal for other reasons but simplicity and convenience are priorities for me, at least in my homelab.
I’m in the early stages of this myself and haven’t actually run an LLM locally but the term that steered me in the right direction for what I was trying to do was ‘RAG’ Retrieval-Augmented Generation.
ragflow.io (terrible name but good product) seems to be a good starting point but is mainly set up for APIs at the moment though I found this link for local LLM integration and I’m going to play with it later today. https://github.com/infiniflow/ragflow/blob/main/docs/guides/deploy_local_llm.md