Updated 2023-11-08 for Bookworm!
The purpose of a watchdog timer (WD) is very
well described here
but a more practical answer is:
If your Raspberry Pi (RPi) "freezes" or hangs, it will automatically
reboot within a very short time.
This is done with a combination of hardware and software that is already
available on the RPi and just has to be enabled.
After some hours of Googling and testing, these are my findings!
There are different ways of activating a WD function, including installing
packages, but
this "native" way
seems to be the simplest.
You can check the devices involved with
$ ls -al /dev/watchdog*
crw------- 1 root root 10, 130 Oct 1 14:17 /dev/watchdog
crw------- 1 root root 250, 0 Oct 1 14:17 /dev/watchdog0
and the hardware involved is
the processor on the RPi.
The current watchdog settings are shown with
$ systemctl show | grep -i watchdog
WatchdogLastPingTimestampMonotonic=18446744073709551615
RuntimeWatchdogUSec=0
RuntimeWatchdogPreUSec=0
RebootWatchdogUSec=10min
KExecWatchdogUSec=0
ServiceWatchdogs=yes
The line "RuntimeWatchdogUSec=0" tells us that the WD isn't active. The value must be greater than 0 for the WD to run.
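The check above can also be scripted. This is only a minimal sketch: the function name wd_runtime_active is made up for this example, and it assumes the "systemctl show" output format shown above, read from standard input.

```shell
#!/bin/sh
# wd_runtime_active is a hypothetical helper, not part of systemd.
# It reads `systemctl show` output on stdin and succeeds (exit status 0)
# only if RuntimeWatchdogUSec is present and not 0.
wd_runtime_active() {
    value=$(grep '^RuntimeWatchdogUSec=' | head -n 1 | cut -d= -f2)
    case "$value" in
        ""|0) return 1 ;;   # property missing or watchdog disabled
        *)    return 0 ;;   # any other value, e.g. 10s
    esac
}

# Usage on a real system:
#   systemctl show | wd_runtime_active && echo "WD active" || echo "WD not active"
```

Reading from stdin keeps the function independent of systemctl, so it can be tried with saved output as well.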
Enable the Watchdog
The WD is enabled by editing
$ sudo nano /etc/systemd/system.conf
The different parameters are described here; in my case I chose
RuntimeWatchdogSec=10
RebootWatchdogSec=2min
which means that if the WD hardware is not contacted within 10
seconds, the RPi is rebooted. The second parameter ensures
that the reboot takes place even if a clean reboot attempt times out.
Please note that the value of "RuntimeWatchdogSec" must be <= 15, since the RPi processors can't handle a greater
value.
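To avoid rebooting with a value the hardware can't handle, the setting can be checked before the reboot. The following is a rough sketch only: check_runtime_wd is a made-up helper name, and it only understands plain seconds ("10") or an "s" suffix ("10s"), not other time units such as "1min".

```shell
#!/bin/sh
# check_runtime_wd is a hypothetical helper, not part of systemd.
# It reads RuntimeWatchdogSec from a config file and checks the 15 s limit.
# Assumption: the value is given in plain seconds, e.g. 10 or 10s.
check_runtime_wd() {
    conf=${1:-/etc/systemd/system.conf}
    # Commented-out lines (starting with #) do not match. If the key appears
    # more than once, the last occurrence is assumed to be the effective one.
    value=$(sed -n 's/^[[:space:]]*RuntimeWatchdogSec=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$conf" | tail -n 1)
    value=${value%s}
    case "$value" in
        ''|*[!0-9]*)
            echo "RuntimeWatchdogSec is not set or not in plain seconds"
            return 2 ;;
    esac
    if [ "$value" -ge 1 ] && [ "$value" -le 15 ]; then
        echo "RuntimeWatchdogSec=$value is within the 15 s limit"
    else
        echo "RuntimeWatchdogSec=$value is outside 1..15"
        return 1
    fi
}

# Usage: check_runtime_wd /etc/systemd/system.conf
```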
Start the Watchdog
After a reboot you can check the new values with the above command
and also confirm that the WD process has started. Use
$ dmesg | grep watchdog
[ 3.125165] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer
[ 5.200029] systemd[1]: Hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0
or using the above command again
$ systemctl show | grep -i watchdog
WatchdogDevice=/dev/watchdog0
WatchdogLastPingTimestamp=Wed 2023-11-08 23:09:24 CET
WatchdogLastPingTimestampMonotonic=336868708
RuntimeWatchdogUSec=10s
RuntimeWatchdogPreUSec=0
RebootWatchdogUSec=2min
KExecWatchdogUSec=0
ServiceWatchdogs=yes
or on Bookworm
$ journalctl -b --no-pager | grep Watchdog
Nov 09 10:14:41 test systemd[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
Nov 09 10:14:41 test systemd[1]: Watchdog running with a timeout of 10s.
Nov 09 10:15:05 test rtkit-daemon[802]: Watchdog thread running.
Testing?
If you dare, you can test the WD function by creating a kernel
panic with the commands
$ sudo su
/home/pi# echo 1 > /proc/sys/kernel/sysrq
/home/pi# echo "c" > /proc/sysrq-trigger
client_loop: send disconnect: Broken pipe
It's said that a
fork bomb
can also test the WD, but it seems that process limits are set
in later RPi OS versions, which restrict the number of forks. Some processes are
terminated and restarted, but there is no reboot.
Remark
As mentioned above, there is also a
watchdog package
with extended features, configured in watchdog.conf, such as:
- Temperature Sensors
- Load Averages
- Network Monitoring
- Network "ping"
- File Monitoring
- Memory Test
where crossing the thresholds set for these will restart the RPi.
Install with
$ sudo apt install watchdog
and check out
a very nice site
explaining it all.
A second thought
At first I was very interested in this watchdog package for handling an RPi Zero
whose WiFi goes down, but thought that a reboot was a little "too much" if just
the WiFi is disconnected.
This post
shows how to check the WiFi and, if it is not connected, take the
interface down and up again.
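The idea can be sketched roughly as below. This is not the script from the linked post, only an illustration: restart_wifi_if_down is a made-up helper name, and the interface wlan0 and the ping target 192.168.1.1 (for example your router) are assumptions to adapt. It must run as root, for example from root's crontab, and it assumes the network service reconnects on its own once the interface is up again.

```shell
#!/bin/sh
# restart_wifi_if_down is a hypothetical helper for illustration only.
# Assumptions: interface wlan0, ping target 192.168.1.1, run as root.
restart_wifi_if_down() {
    iface=${1:-wlan0}
    target=${2:-192.168.1.1}
    # Send one echo request over the interface, wait at most 5 s for a reply.
    if ping -c 1 -W 5 -I "$iface" "$target" >/dev/null 2>&1; then
        return 0    # WiFi is fine, nothing to do
    fi
    echo "$iface: no reply from $target, restarting interface"
    ip link set "$iface" down
    sleep 5
    ip link set "$iface" up
}

# Usage: restart_wifi_if_down wlan0 192.168.1.1
```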
Jessie info
Default setup
$ systemctl show | grep -i watchdog
RuntimeWatchdogUSec=0
ShutdownWatchdogUSec=10min
ServiceWatchdogs=yes
Parameters
in /etc/systemd/system.conf
RuntimeWatchdogSec=10
ShutdownWatchdogSec=2min