smart checkin #2 for 3ware / twe

Well, just because I needed it today: here’s my SMART check script, extended to also check discs behind a 3ware controller, which shows up as /dev/twe devices. One day I’ll add Adaptec.

Here’s the script again, and don’t forget to install smartmontools and bc:

#!/bin/bash
 
while read -r disk; do
  ret=0
  echo "~~ checking ${disk} ~~"
  health=$(smartctl -H ${disk} | awk '/result: /{print $6}');
  if [ "$health" != "PASSED" ]; then
    echo "Check the disc, it failed the overall smart health check..."
    ret=1
  fi
 
  while read -r line; do
    id=$(echo $line | awk '{print $1}' | bc);
    title=$(echo $line | awk '{print $2}');
    thresh=$(echo $line | awk '{print $6}' | bc);
    worst=$(echo $line | awk '{print $5}' | bc);
    value=$(echo $line | awk '{print $4}' | bc);
    raw=$(echo $line | awk '{print $10}' | bc);
 
    if [ $value -lt $thresh ]; then
      echo "$title: value($value) is less than thresh($thresh)";
      ret=1
    fi
 
    if [ $id -eq 5 ] || [ $id -eq 183 ] || [ $id -eq 187 ] || [ $id -eq 197 ] || [ $id -eq 198 ]; then
      if [ $raw -gt 0 ]; then
        echo "$title: raw value($raw) is greater than zero";
        ret=1
      fi
    fi
 
    if [ $id -eq 9 ]; then
      years=$(echo "scale=0; $raw / 24 / 365" | bc);
      if [ $years -ge 4 ]; then
        echo "$title: disk is older($years) than 4 years";
        ret=1
      fi
    fi
 
    if [ $id -eq 194 ]; then
      if [ $raw -ge 45 ]; then
        echo "$title: disk is hotter($raw°C) than 45°C"
        ret=1
      elif [ $raw -le 25 ]; then
        echo "$title: disk is colder($raw°C) than 25°C"
        ret=1
      fi
    fi
  done< <(smartctl -A ${disk} | tail -n +8 | head -n -1);
  if [ $ret -eq 1 ]; then
    echo -e "  - \e[91mcheck ${disk} manually and monitor it closely.\e[39m";
  else
    echo -e "  + \e[92meverything is fine with ${disk}\e[39m";
  fi
done< <(ls /dev/sd[a-z])
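
As a side note, the six awk/bc calls per attribute line could probably be collapsed into a single read. This is only a minimal sketch and not a drop-in replacement: the sample row is made up, the function names parse_attr and hours_to_years are mine, and it assumes the usual ten-column layout that smartctl -A prints.

```shell
#!/bin/bash

# Hypothetical sample row in the layout printed by "smartctl -A":
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
sample="  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       70512"

# Split one row into named fields; "rest" swallows anything after the raw value.
parse_attr() {
  local id title flag value worst thresh type updated failed raw rest
  read -r id title flag value worst thresh type updated failed raw rest <<< "$1"
  # 10# forces base 10 so zero-padded numbers like 092 are not read as octal.
  echo "$((10#$id)) $title $((10#$value)) $((10#$thresh)) $raw"
}

# Whole years of power-on time from the raw hour count (integer division).
hours_to_years() {
  echo $(( $1 / 24 / 365 ))
}

parse_attr "$sample"
hours_to_years 70512
```

For the made-up row this prints "9 Power_On_Hours 92 0 70512" followed by "8". The 10# prefix does the job that piping through bc does in the script, namely getting rid of the leading zeros.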

The 3ware RAID controller’s kernel driver creates device nodes in /dev named twe0 to twe15. I just need a quick and dirty solution, so I could simply run smartctl -H -d 3ware,N (with N from 0 to 9) against twe0 to twe15 and check whether $? is 0. Something that might work just as well is:

tw_cli /c0 show | awk '/p[0-9]/{if($2 != "NOT-PRESENT") print $1}' | sed 's_p__g'

This tells me the port numbers that have a disc attached. I’m not sure how well this works with multiple controllers, since it only asks /c0. For the usual case it should suffice, though. So, let’s go with the following:

DEVICES=(/dev/sd[a-z])
if [ -x "/usr/sbin/tw_cli" ]; then
  while read -r PORT; do
    # append, so entries added earlier are kept
    DEVICES+=("/dev/twe0 -d 3ware,$PORT")
  done < <(/usr/sbin/tw_cli /c0 show | awk '/p[0-9]/{if($2 != "NOT-PRESENT") print $1}' | sed 's_p__g')
fi

This results in:

root@psv1:~# for i in "${DEVICES[@]}"; do echo $i; done
/dev/sda
/dev/twe0 -d 3ware,1
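
The quoting in that loop is what keeps each entry in one piece. Here is a small sketch that counts what the loop would see with and without the double quotes; the helper count_items is made up for this illustration, and the array is filled by hand with the two entries from the output above.

```shell
#!/bin/bash

# Same shape as the list above: one plain device, one device plus smartctl options.
DEVICES=("/dev/sda" "/dev/twe0 -d 3ware,1")

# Report how many separate words were passed in.
count_items() { echo "$#"; }

quoted=$(count_items "${DEVICES[@]}")
# Unquoted on purpose, to show the word splitting.
unquoted=$(count_items ${DEVICES[@]})

echo "quoted:   $quoted entries"
echo "unquoted: $unquoted entries"
```

Quoted, the loop gets 2 entries. Unquoted, it gets 4, because /dev/twe0, -d and 3ware,1 become separate words.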

Note the double quotes I’ve put around ${DEVICES[@]}. Without them, the shell splits each entry at the spaces, so -d and 3ware,1 end up on their own lines. Now, let’s adjust our script from above:

#!/bin/bash
 
DEVICES=(/dev/sd[a-z])
if [ -x "/usr/sbin/tw_cli" ]; then
  while read -r PORT; do
    # append, so entries added earlier are kept
    DEVICES+=("/dev/twe0 -d 3ware,$PORT")
  done < <(/usr/sbin/tw_cli /c0 show | awk '/p[0-9]/{if($2 != "NOT-PRESENT") print $1}' | sed 's_p__g')
fi
 
for disk in "${DEVICES[@]}"; do
  ret=0
  echo "~~ checking ${disk} ~~"
  health=$(smartctl -H ${disk} | awk '/result: /{print $6}');
  if [ "$health" != "PASSED" ]; then
    echo "Check the disc, it failed the overall smart health check..."
    ret=1
  fi
 
  while read -r line; do
    id=$(echo $line | awk '{print $1}' | bc);
    title=$(echo $line | awk '{print $2}');
    thresh=$(echo $line | awk '{print $6}' | bc);
    worst=$(echo $line | awk '{print $5}' | bc);
    value=$(echo $line | awk '{print $4}' | bc);
    raw=$(echo $line | awk '{print $10}' | bc);
 
    if [ $value -lt $thresh ]; then
      echo "$title: value($value) is less than thresh($thresh)";
      ret=1
    fi
 
    if [ $id -eq 5 ] || [ $id -eq 183 ] || [ $id -eq 187 ] || [ $id -eq 197 ] || [ $id -eq 198 ]; then
      if [ $raw -gt 0 ]; then
        echo "$title: raw value($raw) is greater than zero";
        ret=1
      fi
    fi
 
    if [ $id -eq 9 ]; then
      years=$(echo "scale=0; $raw / 24 / 365" | bc);
      if [ $years -ge 4 ]; then
        echo "$title: disk is older($years) than 4 years";
        ret=1
      fi
    fi
 
    if [ $id -eq 194 ]; then
      if [ $raw -ge 45 ]; then
        echo "$title: disk is hotter($raw°C) than 45°C"
        ret=1
      elif [ $raw -le 25 ]; then
        echo "$title: disk is colder($raw°C) than 25°C"
        ret=1
      fi
    fi
  done< <(smartctl -A ${disk} | tail -n +8 | head -n -1);
  if [ $ret -eq 1 ]; then
    echo -e "  - \e[91mcheck ${disk} manually and monitor it closely.\e[39m";
  else
    echo -e "  + \e[92meverything is fine with ${disk}\e[39m";
  fi
done

And a test run…

root@psv1:~# ./test.sh
~~ checking /dev/sda ~~
Check the disc, it failed the overall smart health check...
  - check /dev/sda manually and monitor it closely.
~~ checking /dev/twe0 -d 3ware,1 ~~
Power_On_Hours: disk is older(8) than 4 years
Reported_Uncorrect: raw value(100) is greater than zero
Temperature_Celsius: disk is colder(23°C) than 25°C
Current_Pending_Sector: raw value(95) is greater than zero
  - check /dev/twe0 -d 3ware,1 manually and monitor it closely.

sda fails because that’s the drive exported by the RAID controller. That is fine and can be ignored.
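
On the open question of multiple controllers, the port detection can at least be tried out without any hardware by feeding it canned text. The following is a rough sketch under several assumptions: the tw_cli output is invented and the real column layout may differ, the function names sample_show and entries_for_controller are made up, and it assumes that controller cN is reachable through /dev/tweN, which I have not verified. It only prints the smartctl calls instead of running them.

```shell
#!/bin/bash

# Hypothetical excerpt of "tw_cli /c0 show" output; real output may look different.
sample_show() {
  cat <<'EOF'
Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     NOT-PRESENT      -      -           -             -
p1     OK               u0     232.88 GB   488397168     HYPOTHETICAL1
p2     OK               u0     232.88 GB   488397168     HYPOTHETICAL2
EOF
}

# Read tw_cli output on stdin and print one "device + options" entry per
# populated port of controller number $1.
# Assumption: controller cN maps to /dev/tweN.
entries_for_controller() {
  awk -v ctrl="$1" '$1 ~ /^p[0-9]+$/ && $2 != "NOT-PRESENT" {
    sub(/^p/, "", $1)                      # "p1" -> "1"
    print "/dev/twe" ctrl " -d 3ware," $1
  }'
}

DEVICES=()
while read -r entry; do
  DEVICES+=("$entry")
done < <(sample_show | entries_for_controller 0)

for disk in "${DEVICES[@]}"; do
  # Split the entry into separate arguments explicitly instead of relying on
  # an unquoted ${disk}.
  read -r -a args <<< "$disk"
  echo "smartctl -H ${args[*]}   (${#args[@]} arguments)"
done
```

With a real controller, sample_show would be replaced by /usr/sbin/tw_cli /cN show, called once per controller number. The anchored pattern ^p[0-9]+$ is a bit stricter than /p[0-9]/ and also makes the extra sed step unnecessary.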
