Hello selfhosters.
We all have bare-metal servres, VPS:es, containers and other things running. Some of them may be exposed openly to the internet, which is populated by autonomous malicious actors, and some may reside on a closed-off network since they contain sensitive data.
And there is a lot of solutions to monitor your servers, since none of us want our resources to be part of a botnet, or mine bitcoins for APTs, or simply have confidential data fall into the wrong hands.
Some of the tools I’ve looked at for this task are check_mk, netmonitor, monit: all of there monitor metrics such as CPU, RAM and network activity. Other tools such as Snort or Falco are designed to particularly detect suspicious activity. And there also are solutions that are hobbled together, like fail2ban actions together with pushover to get notified of intrusion attempts.
So my question to you is - how do you monitor your servers and with what tools? I need some inspiration to know what tooling to settle on to be able that detect unwanted external activity on my resources.
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don’t control.
Rules:
Be civil: we’re here to support and learn from one another. Insults won’t be tolerated. Flame wars are frowned upon.
No spam posting.
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it’s not obvious why your post topic revolves around selfhosting, please include details to make it clear.
Don’t duplicate the full text of your blog or github here. Just post the link for folks to click.
Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
No trolling.
Resources:
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
Prometheus, Loki and Grafana.
I used zabbix at some point, but I never looked at the data so I stopped. Zabbix shows all kind of stuff.
I have cockpit on my bare-metal that has some stats, and netdata on my firewall, I do not track any of my VM’s (except vnstat that runs on everything device).
Reduce your threat profile. Run sslh, 443 handles both SSL and ssh. Adjust your host based firewall to just 443 Attack yourself on that port, identify the logs Add the new profiles to fail2ban Enable fail2ban email If you don’t like email, use a service that translates email to notification. Ntfy.sh is free notifications Or… Use something like tailscale and don’t offer a remote login to the general Internet.
I submitted your post to got here’s what it thought
https://shareg.pt/Tz0El4k
Netdata (agent only/not the cloud-based features), and a bunch of scanners running from cron/systemd timers, rsyslog for logs (and graylog for larger setups)
My base ansible role for monitoring.
Since your question is also related to securing your setup, inspect and harden the configuration of all running services and the OS itself. Here is my common ansible role for basic stuff. Find (prefereably official) hardening guides for your distribution and implement hardening guidelines such as DISA STIG, CIS benchmarks, ANSSI guides, etc.
I’ve dabbled with some monitoring tools in the past, but never really stuck with anything proper for very long. I usually notice issues myself. I self-host my own custom new-tab page that I use across all my devices and between that, Nextcloud clients, and my home-assistant reverse proxy on the same vps, when I do have unexpected downtime, I usually notice within a few minutes.
Other than that I run fail2ban, and have my vps configured to send me a text message/notification whenever someone successfully logs in to a shell via ssh, just in case.
Based on the logs over the years, most bots that try to login try with usernames like admin or root, I have root login disabled for ssh, and the one account that can be used over ssh has a non-obvious username that would also have to be guessed before an attacker could even try passwords, and fail2ban does a good job of blocking ips that fail after a few tries.
If I used containers, I would probably want a way to monitor them, but I personally dislike containers (for myself, I’m not here to “yuck” anyone’s “yum”) and deliberately avoid them.