November 08, 2023

Raspberry Pi watchdog made simple

Updated 2023-11-08 for Bookworm!

The purpose of a watchdog timer (WD) is very well described here, but a more practical answer is:

If your Raspberry Pi (RPi) "freezes", or hangs, it will automagically reboot within a very short time.

This is done with a combination of hardware and software that is already available on the RPi and just has to be enabled.

After some hours of Googling and testing, these are my findings!

Intro
There are different ways of activating a WD function, including installing packages, but this "native" way seems to be the simplest.

You can check the devices involved with

$ ls -al /dev/watchdog*
crw------- 1 root root 10, 130 Oct 1 14:17 /dev/watchdog
crw------- 1 root root 250, 0 Oct 1 14:17 /dev/watchdog0

and the hardware involved is the processor on the RPi.

The WD is handled by systemd, so you can use the "systemctl" command to show the default setup.

$ systemctl show | grep -i watchdog
WatchdogLastPingTimestampMonotonic=18446744073709551615
RuntimeWatchdogUSec=0
RuntimeWatchdogPreUSec=0
RebootWatchdogUSec=10min
KExecWatchdogUSec=0
ServiceWatchdogs=yes

The line "RuntimeWatchdogUSec=0" tells us that the WD isn't active. The value must be greater than 0 for the WD to run.
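If you want to script this check, here is a minimal sketch. It assumes the property line looks like the "RuntimeWatchdogUSec=..." output shown above. The helper name wd_status is made up for this example and is not part of systemd.

```shell
#!/bin/sh
# wd_status: read "systemctl show" output on stdin and report whether
# the runtime watchdog is enabled.
# (Hypothetical helper name, for illustration only.)
wd_status() {
    # Keep only the value after "RuntimeWatchdogUSec="
    value=$(sed -n 's/^RuntimeWatchdogUSec=//p' | head -n 1)
    case "$value" in
        ""|0) echo "inactive" ;;
        *)    echo "active ($value)" ;;
    esac
}

# Demo with the two values seen in this post
printf 'RuntimeWatchdogUSec=0\n'   | wd_status
printf 'RuntimeWatchdogUSec=10s\n' | wd_status
```

On the RPi itself you would feed it the real output with "systemctl show | wd_status".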

Enable the Watchdog
The WD is enabled by editing

$ sudo nano /etc/systemd/system.conf

where the different parameters are described here, and in my case I chose

RuntimeWatchdogSec=10
RebootWatchdogSec=2min

which means that if the WD hardware is not contacted within 10 seconds, the RPi will be rebooted. The second parameter ensures that the reboot takes place even if a clean reboot attempt times out.

Please note that the value of "RuntimeWatchdogSec" must be <= 15, since the RPi processors can't handle a greater value.
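To avoid typing a value the hardware can't handle, the limit can be checked before touching the config file. This is only a sketch: the helper name wd_conf_lines is hypothetical, and the 15 second ceiling is simply the limit mentioned above.

```shell
#!/bin/sh
# wd_conf_lines: print the two system.conf settings for a given runtime
# timeout in seconds, refusing values above the 15 s RPi limit.
# (Hypothetical helper, for illustration only.)
wd_conf_lines() {
    timeout=$1
    reboot=${2:-2min}   # same reboot timeout as used in this post
    case "$timeout" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1 ;;
    esac
    if [ "$timeout" -lt 1 ] || [ "$timeout" -gt 15 ]; then
        echo "timeout must be between 1 and 15 seconds" >&2
        return 1
    fi
    printf 'RuntimeWatchdogSec=%s\nRebootWatchdogSec=%s\n' "$timeout" "$reboot"
}

# Demo: prints the two lines used in this post
wd_conf_lines 10
```

The printed lines can then be copied into /etc/systemd/system.conf by hand.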

Start the Watchdog
After a reboot you can check the new values with the above command and also confirm that the WD has started. Use

$ dmesg | grep watchdog
[ 3.125165] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer
[ 5.200029] systemd[1]: Hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0

or using the above command again

$ systemctl show | grep -i watchdog
WatchdogDevice=/dev/watchdog0
WatchdogLastPingTimestamp=Wed 2023-11-08 23:09:24 CET
WatchdogLastPingTimestampMonotonic=336868708
RuntimeWatchdogUSec=10s
RuntimeWatchdogPreUSec=0
RebootWatchdogUSec=2min
KExecWatchdogUSec=0
ServiceWatchdogs=yes

or on Bookworm 
 
$ journalctl -b --no-pager | grep Watchdog
Nov 09 10:14:41 test systemd[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
Nov 09 10:14:41 test systemd[1]: Watchdog running with a timeout of 10s.
Nov 09 10:15:05 test rtkit-daemon[802]: Watchdog thread running.

Testing?
If you dare 😉 you can test the WD function, creating a kernel panic, with the commands

$ sudo su
/home/pi# echo 1 > /proc/sys/kernel/sysrq
/home/pi# echo "c" > /proc/sysrq-trigger
client_loop: send disconnect: Broken pipe

It's said that a fork bomb can also be used to test the WD, but it seems that later versions of RPi OS set process limits that cap the number of forks. Some processes are terminated and restarted, but there is no reboot.

Remark
As mentioned above, there is also a watchdog package available with extended features, configured in watchdog.conf, such as:
  • Temperature Sensors
  • Load Averages
  • Network Monitoring
  • Network "ping"
  • File Monitoring
  • Memory Test
where exceeding the thresholds set for these will restart the RPi.

Install with 

$ sudo apt install watchdog

and check out a very nice site explaining it all.

A second thought
I was first very interested in this watchdog package, to handle an RPi Zero whose WiFi goes down, but thought that a reboot was a little "too much" if just the WiFi is disconnected.

This post handles how to check the WiFi and, if not connected, turn the interface down/up.
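To give an idea of what such a check could look like, here is a rough sketch. It is not the script from the linked post. The interface name wlan0, the helper name wifi_check and the ping target are all assumptions, and whether a simple down/up is enough to reconnect depends on how the network is managed on your RPi.

```shell
#!/bin/sh
# wifi_check: ping a target through the WiFi interface and, if there is
# no answer, take the interface down and up again.
# Assumptions: interface wlan0 (override with IFACE=...), run as root,
# target address supplied by the caller (e.g. your router).
IFACE=${IFACE:-wlan0}

wifi_check() {
    target=$1
    if ping -c 2 -W 5 -I "$IFACE" "$target" >/dev/null 2>&1; then
        echo "up"
    else
        echo "down, restarting $IFACE"
        ip link set "$IFACE" down
        sleep 5
        ip link set "$IFACE" up
    fi
}

# Example call, e.g. from a root cron job (the address is a placeholder):
#   wifi_check 192.168.1.1
```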


Jessie info 
Default setup

$ systemctl show | grep -i watchdog
RuntimeWatchdogUSec=0
ShutdownWatchdogUSec=10min
ServiceWatchdogs=yes

Parameters in /etc/systemd/system.conf

RuntimeWatchdogSec=10
ShutdownWatchdogSec=2min
