Hard disk self testing on Linux

Your hard disk is where your data lives, so it is prudent to monitor it with periodic self testing.

For this task we’ll be using smartmontools. I use Linux and these instructions reflect that, but smartmontools can also be used on other operating systems.

GETTING STARTED
Set up Postfix or another local mail transport agent. smartd, which runs the hard disk self-tests configured below, reports any problems via local mail.

Use your distribution’s package manager to install smartmontools. I also like to install gsmartcontrol, which lets you view self-test results in an easy-to-use GUI.

SET UP HARD DISK SELF-TESTING
Follow these instructions to set up daily automatic testing of the hard disk and have smartd mail you if a test detects a problem. In doing so, take note of the following:

  • When you first edit /etc/smartd.conf, confirm that error messages will be sent and received correctly. Do so by adding -M test to the end of the smartd configuration line, then restarting smartd (as root, service smartd restart). You should immediately receive a test error message. Remove -M test once all is working.
  • The instructions above add -M exec /usr/share/smartmontools/smartd-runner to smartd.conf; I find it unnecessary even on Debian. Your mileage may vary.
  • Once smartd.conf is working to your liking, add it to your backup routine.

And you’re done. For reference, here’s my smartd.conf:

# Schedule short tests daily at 8 am and long tests Monday at 1 pm
/dev/sda -a -o on -S on -s (S/../.././08|L/../../1/13) -m warren@verdi

RESPONDING TO ERROR MESSAGES
Once set up, no news is good news. But one sad day you may receive a message like this:

From: root <root@verdi.home.invalid>
To: root@verdi.home.invalid
Subject: SMART error (CurrentPendingSector) detected on host: verdi.home.invalid
Date: Tue, 18 May 2010 17:56:45 -0600

This email was generated by the smartd daemon running on:

host name: verdi.home.invalid
DNS domain: home.invalid
NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/sda, 1 Currently unreadable (pending) sectors

What to do? First, view the test results. You can do this from a terminal (as root, smartctl -l selftest /dev/sda, changing the path to the device as appropriate) or (again, as root) with the GUI tool gsmartcontrol. In the particular case cited above, I saw:

# smartctl -l selftest /dev/sda
smartctl version 5.38 [i586-mandriva-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     17664         -
# 2  Short offline       Completed without error       00%     17663         -
# 3  Extended offline    Completed without error       00%     17615         -
# 4  Extended offline    Completed without error       00%     17435         -
# 5  Extended offline    Completed without error       00%     17287         -
# 6  Extended offline    Completed without error       00%     17163         -
# 7  Extended offline    Completed without error       00%     17016         -
# 8  Extended offline    Completed: read failure       80%     16867         43131442
# 9  Extended offline    Completed: read failure       80%     16837         43131442
#10  Extended offline    Completed without error       00%     16686
[snipped rest of tests]

This is a list of the 21 most recent self-tests, most recent first. This report shows that two tests ended in read failures, but seven more recent tests have since completed without error, so it does not worry me. Some errors, however, are serious: a read failure that keeps recurring in the latest tests, for instance, deserves prompt attention. Make sure your backups are up to date and consult the smartmontools FAQ for information on specific errors.
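If you would rather check the log from a script, the reasoning above (count the failed tests, then see how many clean tests have run since the newest failure) can be sketched in Python. This minimal sketch assumes the column layout of the smartctl 5.38 output shown earlier, which other smartctl versions may not match exactly. The names parse_selftest_log and summarize are hypothetical, and treating any status containing "failure" as a failed test is likewise an assumption.

```python
import re
from typing import NamedTuple, Optional

# Entries copied from the smartctl -l selftest output above (newest first).
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     17664         -
# 2  Short offline       Completed without error       00%     17663         -
# 3  Extended offline    Completed without error       00%     17615         -
# 4  Extended offline    Completed without error       00%     17435         -
# 5  Extended offline    Completed without error       00%     17287         -
# 6  Extended offline    Completed without error       00%     17163         -
# 7  Extended offline    Completed without error       00%     17016         -
# 8  Extended offline    Completed: read failure       80%     16867         43131442
# 9  Extended offline    Completed: read failure       80%     16837         43131442
#10  Extended offline    Completed without error       00%     16686
"""

# "#", entry number, description, status, remaining %, lifetime hours, optional LBA.
# Description and status are assumed to be separated by two or more spaces.
ENTRY_RE = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<desc>.+?)\s{2,}(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)(?:\s+(?P<lba>\S+))?\s*$"
)


class Entry(NamedTuple):
    num: int
    desc: str
    status: str
    hours: int
    lba: Optional[str]  # "-" when no error; None if the column is absent


def parse_selftest_log(text: str) -> list[Entry]:
    """Extract the numbered log entries, skipping headers and other lines."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entries.append(Entry(int(m["num"]), m["desc"], m["status"],
                                 int(m["hours"]), m["lba"]))
    return entries


def summarize(entries: list[Entry]) -> tuple[int, int]:
    """Return (failed tests, passing tests newer than the newest failure)."""
    failed = [e for e in entries if "failure" in e.status]
    # Lower entry numbers are more recent; None means no failure at all.
    newest_failure = min((e.num for e in failed), default=None)
    newer_passes = sum(
        1 for e in entries
        if e.status == "Completed without error"
        and (newest_failure is None or e.num < newest_failure)
    )
    return len(failed), newer_passes


if __name__ == "__main__":
    failures, newer_passes = summarize(parse_selftest_log(SAMPLE_LOG))
    print(f"{failures} failed test(s); "
          f"{newer_passes} passing test(s) since the newest failure")
    # prints: 2 failed test(s); 7 passing test(s) since the newest failure
```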

Each time a self-test is run, the most recent test becomes #1 in the list and the oldest test is discarded. So in the example above the two failed tests will be flushed from the list over time. Be forewarned that when the last failed test is removed, a bug in smartmontools generates the warning message “new Self-Test Log error at hour timestamp 0”. This message can be safely ignored.

NEXT STEPS
I don’t use it, but there is also smart-notifier, which, when added to a user’s session, displays hard disk error messages on that user’s screen. Such on-screen messages are easily missed if the computer is running unattended.

Self testing your hard disk is just one aspect of hardware monitoring. Don’t overlook the bigger picture.


These notes were last updated 26 August 2014 with reference to smartmontools 6.2.


About Warren Post

So far: Customer support guy, jungle guide, IT consultant, beach bum, entrepreneur, teacher, diplomat, over-enthusiastic cyclist. Tomorrow: who knows?