Stefano Marinelli
@stefano@mastodon.bsd.cafe  ·  activity timestamp 3 weeks ago

Spent my morning figuring out why Nginx was dead on a server with many days of uptime. No reboot, no kernel panic. Just... down. Ubuntu 24.04.

The cause? An automatic unattended-upgrade of libc6. This prompted systemd to work its magic, wisely deciding to restart every running service to apply the patch. Fine.

The problem is, in the exact same minute, the systemd timer for certbot decided it was time to renew certificates.

The result:

- systemd stops Nginx.
- Port 80 becomes free.
- certbot, in standalone mode, immediately grabs it for validation.
- systemd tries to restart Nginx, which fails with "Address already in use".

The web server was knocked offline by its own certificate renewal script.

I swear, this is the kind of cascading failure that has never happened to me in years of running *BSD. With a classic cron job, certbot would have failed, logged an error, and tried again the next day. The web server would have remained untouched.

systemd was doing its job, but something failed because of the interactions.

Sometimes, too much automation and too many interconnected parts just create more spectacular ways for things to break.

#SysAdmin #Linux #SystemD #Rant #KISS

stratacast
@stratacast@mastodon.bsd.cafe replied  ·  activity timestamp 2 weeks ago

@stefano Stuff like this is why I disable unattended upgrades on my servers 😊 thank you for the reminder

Dave Polaschek (he/him)
@davepolaschek@writing.exchange replied  ·  activity timestamp 2 weeks ago

@stefano I knew running systemd was a bad idea for me. Sticking with BSDs for the foreseeable future.

chebra
@chebra@mstdn.io replied  ·  activity timestamp 2 weeks ago

@stefano How did you even trace this down??

Ricardo Martín :bsdhead:
@ricardo@mastodon.bsd.cafe replied  ·  activity timestamp 2 weeks ago

@stefano Call me ol'fashioned ☺️
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";

Tom
@pertho@mastodon.bsd.cafe replied  ·  activity timestamp 2 weeks ago

@stefano Wow.. talk about the worst timing!

lbky
@lbky@chaos.social replied  ·  activity timestamp 2 weeks ago

@stefano I'm sorry, but systemd has nothing to do with this, as it's only an intermediary when things get restarted, i.e. the postinst script of the deb package told systemd to restart the service. On a Debian-based system this would have happened with any other service manager (known to the debhelper magic generating the script) as well.

lbky
@lbky@chaos.social replied  ·  activity timestamp 2 weeks ago

@stefano Also, whether certbot is run by a timer or a cron job will not make a difference, unless, surprisingly, they run different commands, which I doubt. You might need to check your certbot config so that it uses a specific auth method, or does not use one.

Exa :calim:
@Exagone313@share.elouworld.org replied  ·  activity timestamp 3 weeks ago

@stefano I use dehydrated instead of certbot, which never binds any port (acme.sh is also recommended). You need to configure your HTTP server yourself and use a directory for ACME challenges.
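A rough sketch of the directory-based approach, for illustration only: the ACME client drops a file into a directory, and the already-running web server answers the validation request from it, so nothing new ever binds port 80. The function names, token, and key authorization value below are hypothetical; this is not how dehydrated or acme.sh are implemented.

```python
import os
import tempfile

# URL prefix used by the ACME HTTP-01 challenge.
CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def publish_challenge(webroot, token, key_authorization):
    """ACME client side: write the response where the web server can serve it."""
    challenge_dir = os.path.join(webroot, ".well-known", "acme-challenge")
    os.makedirs(challenge_dir, exist_ok=True)
    path = os.path.join(challenge_dir, token)
    with open(path, "w", encoding="ascii") as handle:
        handle.write(key_authorization)
    return path


def serve_challenge(webroot, url_path):
    """Web server side: map a validation URL to the file, or None if absent."""
    if not url_path.startswith(CHALLENGE_PREFIX):
        return None
    token = url_path[len(CHALLENGE_PREFIX):]
    # Reject anything that could escape the challenge directory.
    if not token or "/" in token or token.startswith("."):
        return None
    path = os.path.join(webroot, ".well-known", "acme-challenge", token)
    if not os.path.isfile(path):
        return None
    with open(path, encoding="ascii") as handle:
        return handle.read()


def demo():
    # Hypothetical token and key authorization, made up for this example.
    with tempfile.TemporaryDirectory() as webroot:
        publish_challenge(webroot, "example-token", "example-token.example-thumbprint")
        return serve_challenge(webroot, CHALLENGE_PREFIX + "example-token")
```

The web server keeps its listening socket the whole time; the renewal job only touches the filesystem.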

Also, edit your nginx service so that it restarts automatically, which is usually not the case with the default service file:

systemctl edit nginx

[Service]
Restart=always
RestartSec=3

(This creates an override file so it shouldn't conflict with your package manager)
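As an analogy for what Restart=always with RestartSec=3 buys you, here is a small Python sketch of "retry the bind after a pause". It is not systemd's implementation; start_with_retry, its parameters, and the blocker socket standing in for certbot are all hypothetical.

```python
import errno
import socket
import threading
import time


def start_with_retry(port, restart_sec=3.0, max_attempts=5, host="127.0.0.1"):
    """Bind host:port, sleeping restart_sec and retrying while the port is in use.

    Returns (listening socket, number of attempts used).
    """
    for attempt in range(1, max_attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen()
            return sock, attempt
        except OSError as exc:
            sock.close()
            # Give up on unrelated errors or once the attempts are exhausted.
            if exc.errno != errno.EADDRINUSE or attempt == max_attempts:
                raise
            time.sleep(restart_sec)


def demo():
    # Stand-in for the renewal job: it holds the port for a short while.
    blocker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    blocker.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    blocker.listen()
    port = blocker.getsockname()[1]
    threading.Timer(0.3, blocker.close).start()

    # The first attempt collides; a later one succeeds after the port is freed.
    sock, attempts = start_with_retry(port, restart_sec=0.2, max_attempts=10)
    sock.close()
    return attempts
```

Without the retry, the first failed bind is final, which matches the outage described in the original post.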

Haelwenn /элвэн/ :triskell:
@lanodan@queer.hacktivis.me replied  ·  activity timestamp 3 weeks ago
@stefano Says more about certbot than systemd though.
Like, the web server can just stay up by using the other ACME challenges (DNS, or reverse-proxying to the ACME client), so the web server never has to go down.
Monospace Mentor
@monospace@floss.social replied  ·  activity timestamp 3 weeks ago

@stefano I read this as a simple race condition for port 80, and can't see how this is an "only on Linux" thing.
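The race for the port can be reproduced with two plain sockets, independent of any init system. This is a minimal sketch, assuming a random local port instead of port 80 and stand-in sockets instead of Nginx and certbot; try_bind and simulate_race are names made up for the example.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to listen on host:port; return the socket, or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise


def simulate_race():
    """Replay the stop / grab / restart sequence and record what happens."""
    events = []

    web = try_bind(0)  # stand-in web server; port 0 lets the OS pick a port
    port = web.getsockname()[1]
    events.append("web server listening")

    web.close()  # the web server is stopped for the upgrade
    events.append("port free")

    acme = try_bind(port)  # the standalone ACME client grabs the port
    events.append("acme client listening" if acme else "acme client failed")

    web = try_bind(port)  # the restart now collides with the ACME client
    events.append("web server restarted" if web else "web server: address already in use")

    for sock in (acme, web):
        if sock is not None:
            sock.close()
    return events
```

Whichever process binds first after the port is released wins; the loser gets "Address already in use" regardless of who scheduled it.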

Joel Carnat ♑ 🤪
@joel@gts.tumfatig.net replied  ·  activity timestamp 3 weeks ago

@stefano this will not happen anymore once the Linux kernel ships with AI.

Sheogorath
@sheogorath@microblog.shivering-isles.com replied  ·  activity timestamp 3 weeks ago

@stefano systemd doesn't restart services on its own after updates, AFAIK that's an apt thing. Maybe even a thing specific to deb package macros for debian systems.

You could even have systemd handle the entire socket for nginx making sure that even during a restart of nginx that port is bound:

https://systemd.io/DAEMON_SOCKET_ACTIVATION/

TL;DR: pinning this one on systemd is wrong.
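To illustrate the idea behind the socket suggestion: if something other than the web server owns the listening socket, restarting the web server never frees the port. The sketch below is a single-process simulation, not real socket activation (where the service manager hands the socket to the service); the Worker class and helper names are hypothetical.

```python
import errno
import socket


def port_is_taken(port, host="127.0.0.1"):
    """Return True if a newcomer cannot bind host:port."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return False
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        probe.close()


class Worker:
    """Stand-in for the web server: it borrows a listener it does not own."""

    def __init__(self, listener):
        self.listener = listener

    def stop(self):
        # Drop the reference only; the holder keeps the socket open.
        self.listener = None


def demo():
    # The holder plays the service manager: it binds once and keeps the socket.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]

    worker = Worker(listener)
    worker.stop()  # "restart" begins: the worker is gone

    # A competing process still cannot grab the port during the gap.
    taken_during_restart = port_is_taken(port)

    worker = Worker(listener)  # the new worker gets the very same socket
    same_port = worker.listener.getsockname()[1] == port

    listener.close()
    return taken_during_restart, same_port
```

In this arrangement a standalone ACME client would simply fail to bind, and the web server would come back on the socket it had before.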

Dr.Kidpixo 🔢☄🔨💻 ⌨️ 🐍🐧
@kidpixo@mastodon.uno replied  ·  activity timestamp 3 weeks ago

@stefano "automatic unattended-upgrade of libc6"?? Some downstream security patch? Don't those updates need sudo rights?
I have run Arch for years and caused my fair share of disasters, but at least I knew it was me (like not reading the Arch news EVERY TIME) and I was able to recover.

Farooq | فاروق
@farooqkz@cr8r.gg replied  ·  activity timestamp 3 weeks ago

@stefano

hmm I think the problem here is using certbot in standalone mode. Don't you think so?
