Linux HDD Diagnostics & Health Check

This guide outlines the steps to diagnose hard drive issues, ranging from system freezes (I/O Wait) to physical hardware failure.

Step 1: Is the Disk Slowing Down the System?

Before inspecting the drive itself, check whether disk I/O is the bottleneck behind system lag or “freezes”.

Check I/O Wait (vmstat)

Run this command to print system activity once per second:

vmstat 1

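What to look for: the wa column (under cpu), which is the percentage of CPU time spent waiting on I/O. A value that stays high while the system feels frozen points to the disk.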
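As a rough sketch, the helper below averages vmstat's wa (I/O wait) column over a short sample. The name avg_iowait is hypothetical and not part of vmstat. It looks the column up by name rather than by position, and it skips the first data row because vmstat's first report is an average since boot.

```shell
# avg_iowait: hypothetical helper. Reads vmstat output on stdin and
# prints the mean of the "wa" (I/O wait %) column.
avg_iowait() {
  awk '
    # Column-name header row: find "wa" by name, since its position
    # can differ between vmstat versions.
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
    # Data rows start with a number; the first one is the since-boot
    # average, so it is skipped.
    col && $1 ~ /^[0-9]+$/ { if (seen++) { sum += $col; n++ } }
    END { if (n) printf "%.1f\n", sum / n; else print "n/a" }
  '
}

# Usage: sample once per second for 10 seconds
# vmstat 1 10 | avg_iowait
```

The result is a percentage; what counts as "too high" depends on the workload, so compare it against a reading taken while the system is behaving normally.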

Check Kernel Logs (dmesg)

If the disk is disconnecting or timing out, the kernel will log it.

dmesg | grep -i "error\|fail\|ata\|scsi"
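Because a bare ata pattern also matches words such as “data”, the output of the command above can be noisy. A minimal sketch of a tighter filter, assuming GNU grep, matches the same keywords as whole words only. The name disk_log_filter is hypothetical.

```shell
# disk_log_filter: hypothetical helper. Reads kernel log text on stdin.
# -w keeps only whole-word matches, so "ata1.00" matches but "data" does not.
disk_log_filter() {
  grep -iwE 'errors?|fail[a-z]*|ata[0-9.]*|scsi'
}

# Usage (reading the kernel log may require root):
# dmesg | disk_log_filter
```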

Red Flags:

Step 2: S.M.A.R.T. Health Analysis

Use smartctl (from the smartmontools package) to query the drive's internal health logs. It usually needs root privileges.

The "Quick Filter" Command

This command filters out the noise and shows only the critical health indicators:

smartctl -a /dev/sdX | grep -E "([Hh]ealth|Error|Reallocated|Pending|Uncorrectable|CRC|Load_Cycle|Power_On)"

Replace /dev/sdX with your drive (e.g., /dev/sda)
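For scripting, it can be easier to keep only the attribute name and raw value. The sketch below assumes the usual ATA attribute table printed by smartctl -A (name in column 2, raw value in column 10) and the attribute names smartctl commonly reports; drives that use different names will need the pattern adjusted. The name smart_key_attrs is hypothetical.

```shell
# smart_key_attrs: hypothetical helper. Reads `smartctl -A` output on
# stdin and prints "NAME RAW_VALUE" for the attributes of interest.
smart_key_attrs() {
  awk '
    # Attribute rows start with a numeric ID.
    $1 ~ /^[0-9]+$/ &&
    $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count|Load_Cycle_Count|Power_On_Hours)$/ {
      printf "%-24s %s\n", $2, $10
    }
  '
}

# Usage:
# sudo smartctl -A /dev/sdX | smart_key_attrs
```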

Step 3: Interpreting the Results

Here is how to read the attributes reported in Step 2:

The "Certificate of Death" (Critical Failures)

If the raw value of a reallocated-sector or pending-sector attribute is greater than 0, the drive is dying and must be replaced immediately.

The "Silent Killers" (Performance & Wear)

These indicate why a drive might be slow or unreliable even when the overall health check reports it as healthy.

A Note on Seagate Drives

Ignore high raw values for:

On Seagate drives, these are internal counters, not error counts. Only worry if the normalized “VALUE” column drops to or below “THRESH”.
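The VALUE versus THRESH comparison can be sketched as a small filter over the smartctl -A attribute table, assuming VALUE is column 4 and THRESH is column 6 as in the usual ATA layout. The name smart_below_thresh is hypothetical. smartctl itself also marks such attributes in its WHEN_FAILED column, so this is only a convenience.

```shell
# smart_below_thresh: hypothetical helper. Reads `smartctl -A` output on
# stdin and lists attributes whose normalized VALUE is at or below THRESH.
smart_below_thresh() {
  awk '
    # Attribute rows start with a numeric ID. A THRESH of 0 can never
    # be crossed, so those rows are ignored.
    $1 ~ /^[0-9]+$/ && $6 + 0 > 0 && $4 + 0 <= $6 + 0 {
      printf "%s VALUE=%d THRESH=%d\n", $2, $4, $6
    }
  '
}

# Usage:
# sudo smartctl -A /dev/sdX | smart_below_thresh
```

No output means no attribute has crossed its threshold, however large the raw values look.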

Step 4: Identifying SMR Drives (The RAID Killer)

If a drive is healthy but causes RAID arrays to freeze during sync (speed drops to ~700KB/s), it is likely SMR (Shingled Magnetic Recording).
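To watch for that slowdown during a resync, the sketch below reads /proc/mdstat and reports each array's current speed. It assumes the progress line carries a field of the form speed=NNNK/sec, and it uses a 1024 K/sec cutoff as a stand-in for roughly 1 MB/s. Both the cutoff and the name md_sync_speed are assumptions for illustration, and a slow resync alone does not prove the drive is SMR.

```shell
# md_sync_speed: hypothetical helper. Reads /proc/mdstat text on stdin
# and prints "<array> <speed> KB/s <SLOW|ok>" for arrays that are syncing.
md_sync_speed() {
  awk -v min_kb=1024 '
    # Array header lines look like "md0 : active raid1 ..."
    $2 == ":" { dev = $1 }
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^speed=[0-9]+K\/sec$/) {
          kb = $i
          gsub(/[^0-9]/, "", kb)   # keep only the digits
          print dev, kb " KB/s", (kb + 0 < min_kb ? "SLOW" : "ok")
        }
    }
  '
}

# Usage:
# md_sync_speed < /proc/mdstat
```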

Symptoms:

Solution:

Summary Checklist

Attribute               Value      Verdict
Reallocated / Pending   > 0        REPLACE IMMEDIATELY (Dead)
Load Cycle Count        > 600k     WARNING (Mechanical Wear)
CRC Errors              > 0        CHECK CABLE
Resync Speed            < 1 MB/s   SMR DRIVE (Unsuitable for RAID)
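The S.M.A.R.T. rows of the checklist can be applied mechanically. The following is a minimal sketch, assuming the smartctl -A attribute table layout (raw value in column 10) and the attribute names smartctl commonly reports. The name smart_verdict is hypothetical, and the thresholds are taken directly from the checklist.

```shell
# smart_verdict: hypothetical helper. Reads `smartctl -A` output on stdin
# and prints one line per checklist rule that is triggered.
smart_verdict() {
  awk '
    $1 ~ /^[0-9]+$/ {
      raw = $10 + 0   # numeric part of the raw value
      if ($2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector)$/ && raw > 0)
        print "REPLACE IMMEDIATELY: " $2 "=" raw
      else if ($2 == "Load_Cycle_Count" && raw > 600000)
        print "WARNING (mechanical wear): " $2 "=" raw
      else if ($2 == "UDMA_CRC_Error_Count" && raw > 0)
        print "CHECK CABLE: " $2 "=" raw
    }
  '
}

# Usage:
# sudo smartctl -A /dev/sdX | smart_verdict
```

An empty result only means none of these three rules fired; the resync-speed row still has to be checked separately on the RAID array.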