Recover a Solaris Service Stuck in Maintenance Mode

karma

Something updated, a configuration file changed, there was a power outage, aliens descended from the heavens... and now a service that was running yesterday is nowhere to be found. Even the last bastion of an unraveling mind - turning it off and on again - has failed you. Welcome to SMF's maintenance mode.
# svcs | grep gdm
maintenance     9:33:00 svc:/application/graphical-login/gdm:default
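If you don't yet know which service fell over, a small filter can list everything that is currently in maintenance. This is only a sketch: maintenance_fmris is a made-up helper name, and it assumes two-column input (state, then FMRI) such as svcs -H -o state,fmri produces.

```shell
# Hypothetical helper: print the FMRI of every instance in "maintenance".
# Expects lines of the form "<state> <fmri>" on stdin.
maintenance_fmris() {
    awk '$1 == "maintenance" { print $2 }'
}

# On a Solaris host this would be fed like so (not executed here):
#   svcs -a -H -o state,fmri | maintenance_fmris
```

Running svcs -x with no arguments gives a similar overview, with an explanation attached to each problem service.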

According to the official documentation (Managing Services (Overview) > Introduction to SMF > SMF Concepts > Service States), a service enters the maintenance state when:

The service instance has encountered an error that must be resolved by the administrator.

SMF does this so that a service that repeatedly fails to start neither spirals out of control nor needlessly stalls system or service startup. The documentation (How to Restore a Service That Is in the Maintenance State) further recommends:

Determine if any process that are dependent to the service have not stopped.

Normally, when a service instance is in a maintenance state, all processes associated with that instance have stopped. However, you should make sure before you proceed. The following command lists all of the processes that are associated with a service instance as well as the PIDs for those processes.
# svcs -p service-name

(Optional) Kill any remaining processes.

Repeat this step for all processes that are displayed by the svcs command:

# pkill -9 process-name
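Killing by name can catch unrelated processes that happen to share it, so it may be safer to work from the PIDs that svcs -p reports. The sketch below pulls them out of that output. service_pids is a hypothetical helper, and it assumes process rows are indented and laid out as start time, PID, command.

```shell
# Hypothetical helper: extract PIDs from `svcs -p` output on stdin.
# Assumption: process rows start with whitespace and read "STIME PID CMD",
# while the header and the service row start in column one.
service_pids() {
    awk '/^[ \t]/ && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Possible use on a Solaris host (not executed here):
#   for pid in $(svcs -p service-name | service_pids); do
#       kill "$pid"      # try a plain TERM before reaching for -9
#   done
```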

Before we can clear our problem service we need to figure out what went wrong. We can take a detailed look with svcs -xv service-name:
# svcs -xv gdm
svc:/application/graphical-login/gdm:default (GNOME Display Manager)
 State: maintenance since Sat Nov 20 09:33:00 2021
Reason: Method failed repeatedly.
   See: http://support.oracle.com/msg/SMF-8000-8Q
   See: man -M /usr/share/man -s 8 gdm
   See: /var/svc/log/application-graphical-login-gdm:default.log
Impact: This service is not running.
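To jump straight to that log, the path can be lifted out of the "See:" lines. A minimal sketch, assuming the output layout shown above. smf_log_paths is an illustrative name, not an SMF tool.

```shell
# Hypothetical helper: print log file paths mentioned in `svcs -xv` output.
# Keeps only "See:" lines whose target is an absolute path ending in .log,
# which skips the URL and man page references.
smf_log_paths() {
    awk '$1 == "See:" && $2 ~ /^\/.*\.log$/ { print $2 }'
}

# Possible use on a Solaris host (not executed here):
#   svcs -xv gdm | smf_log_paths | while read -r log; do
#       tail -n 20 "$log"
#   done
```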

It is so delightful to have a pertinent log file suggested in the middle of a potential crisis instead of having to sort through reams of journalctl vomit.

Once the issue has been resolved, bring the service out of maintenance mode:
# svcadm clear service-name

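Clearing an enabled service tells its restarter to try starting it again. If it does not come back on its own, enable it:
# svcadm enable service-name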

And verify all is well:
# svcs -x gdm
svc:/application/graphical-login/gdm:default (GNOME Display Manager)
 State: online since Sat Nov 20 10:06:16 2021
   See: gdm(8)
   See: /var/svc/log/application-graphical-login-gdm:default.log
Impact: None.
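For services that take a few seconds to settle, clearing and verifying can be rolled into one step. The following is a rough sketch, not a hardened script: clear_and_wait is a hypothetical function, the retry count and one-second pause are arbitrary, and it relies only on svcadm clear and svcs -H -o state.

```shell
# Hypothetical helper: clear a service, then poll until it reports "online".
# Usage: clear_and_wait FMRI [TRIES]
# Returns 0 once online, 1 if the clear fails or the tries run out.
clear_and_wait() {
    fmri=$1
    tries=${2:-10}          # arbitrary default: about ten seconds

    svcadm clear "$fmri" || return 1

    while [ "$tries" -gt 0 ]; do
        # -H drops the header, -o state prints only the state column
        state=$(svcs -H -o state "$fmri")
        [ "$state" = "online" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Possible use on a Solaris host (not executed here):
#   clear_and_wait svc:/application/graphical-login/gdm:default || svcs -xv gdm
```

If the service drops back into maintenance, the function simply times out, and svcs -xv will point at the log again.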
