HDSentinel hard drive monitoring tool


queeg


Has anyone tried HDSentinel for hard drive health tests? I just came across it today, and they have a free command-line version for Linux that I downloaded and ran on unRAID. It's telling me one of my drives has a health rating of 40%. I'm looking for reviews of this tool to see how reliable it is.

Also, I downloaded the Windows version and it's really interesting so far.

 

http://www.hdsentinel.com/download_hard_disk_sentinel.php


Below is the output of HDSentinel.  Attached is the smartctl output for /dev/sdb.

 

 

root@Queeg:~# /mnt/user/Queeg/downloads/hdSentinel/HDSentinel
Hard Disk Sentinel for LINUX console 0.03 (c) 2008-2009 [email protected]
Start with -r [reportfile] to save data to report, -h for help

Examining hard disk configuration ...

HDD Device  0: /dev/sda
HDD Model ID : ST3500320AS
HDD Serial No: 9QM0EAHW
HDD Revision : SD15
HDD Size     : 476940 MB
Interface    : S-ATA
Temperature  : 29 °C
Health       : 100 %
Performance  : 100 %
Power on time: 279 days, 12 hours
Est. lifetime: more than 1000 days

HDD Device  1: /dev/sdb
HDD Model ID : ST3500320AS
HDD Serial No: 9QM26B35
HDD Revision : SD15
HDD Size     : 476940 MB
Interface    : S-ATA
Temperature  : 30 °C
Health       : 40 %
Performance  : 100 %
Power on time: 443 days, 17 hours
Est. lifetime: 221 days

HDD Device  2: /dev/sdc
HDD Model ID : ST3500320AS
HDD Serial No: 5QM01S58
HDD Revision : SD04
HDD Size     : 476940 MB
Interface    : S-ATA
Temperature  : 30 °C
Health       : 97 %
Performance  : 100 %
Power on time: 490 days, 5 hours
Est. lifetime: more than 1000 days

HDD Device  3: /dev/sdd
HDD Model ID : Lexar   JD FireFly
HDD Serial No: ?
HDD Revision : 1100
HDD Size     : 1911 MB
Interface    : SCSI
Temperature  : Unknown °C
Health       : Unknown %
Performance  : Unknown %
Power on time:
Est. lifetime:

root@Queeg:~# cp /mnt/user/Queeg/downloads/hdSentinel/HDSentinel /boot
root@Queeg:~#

smartctl.txt
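
For anyone who wants to pick the low-health drives out of a report like the one above without reading every block, here is a minimal Python sketch. It assumes the console layout shown in this thread ("HDD Device N: /dev/..." followed by a "Health : NN %" line); the function names and the 50% cutoff are my own choices for illustration, not anything HDSentinel defines.

```python
import re
from typing import Dict, List, Optional


def parse_health(report: str) -> Dict[str, Optional[int]]:
    """Map each device path to its Health percentage (None when 'Unknown')."""
    health: Dict[str, Optional[int]] = {}
    device = None
    for line in report.splitlines():
        m = re.match(r"\s*HDD Device\s+\d+:\s*(\S+)", line)
        if m:
            device = m.group(1)
            continue
        m = re.match(r"\s*Health\s*:\s*(\d+|Unknown)\s*%", line)
        if m and device is not None:
            value = m.group(1)
            health[device] = int(value) if value.isdigit() else None
    return health


def below_threshold(health: Dict[str, Optional[int]], threshold: int = 50) -> List[str]:
    """Return devices whose health is known and under the (arbitrary) threshold."""
    return sorted(dev for dev, pct in health.items()
                  if pct is not None and pct < threshold)


# Trimmed sample taken from the report in this post.
SAMPLE = """\
HDD Device  0: /dev/sda
Health       : 100 %
HDD Device  1: /dev/sdb
Health       : 40 %
HDD Device  3: /dev/sdd
Health       : Unknown %
"""

print(below_threshold(parse_health(SAMPLE)))
```

Feeding it a report saved with -r should work the same way, as long as the saved file uses the same field labels as the console output.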


I used the Windows version to monitor the head parking totals on my netbook domain server. It was happening 6 times a minute, about 60,000 times a week, so I ended up disabling that with smartctl. The power usage went up by 1 W to ... 6 W total, with gigabit and 2 GB of RAM!  :D
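
The weekly figure follows directly from the per-minute rate. A quick sketch of the arithmetic is below; the 600,000-cycle rating used in the second function call is purely a hypothetical number for illustration, so check the datasheet for a real drive.

```python
PARKS_PER_MINUTE = 6  # rate quoted in the post


def parks_per_week(per_minute: int) -> int:
    """Head parks accumulated in one week at a constant per-minute rate."""
    return per_minute * 60 * 24 * 7


def weeks_to_reach(rated_cycles: int, per_minute: int) -> float:
    """Weeks until a given load-cycle rating would be used up at that rate."""
    return rated_cycles / parks_per_week(per_minute)


print(parks_per_week(PARKS_PER_MINUTE))                      # 60480
print(round(weeks_to_reach(600_000, PARKS_PER_MINUTE), 1))   # hypothetical rating
```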


For grins and giggles I went ahead and ran this tool on my server just a little while ago.

 

Here is the output:

root@Tower:~# /boot/hdsentinel 
Hard Disk Sentinel for LINUX console 0.03 (c) 2008-2009 [email protected]
Start with -r [reportfile] to save data to report, -h for help

Examining hard disk configuration ...

HDD Device  0: /dev/sda
HDD Model ID : LEXAR   JD FIREFLY
HDD Serial No: ?
HDD Revision : 1100
HDD Size     : 967 MB
Interface    : SCSI
Temperature  : Unknown °C
Health       : Unknown %
Performance  : Unknown %
Power on time: 
Est. lifetime: 

HDD Device  1: /dev/sdb
HDD Model ID : ST31500341AS
HDD Serial No: 6VS04GWN
HDD Revision : CC3G
HDD Size     : 1430799 MB
Interface    : S-ATA II
Temperature  : 24 °C
Health       : 79 %
Performance  : 100 %
Power on time: 373 days, 7 hours
Est. lifetime: 906 days

HDD Device  2: /dev/sdc
HDD Model ID : ST31000333AS
HDD Serial No: 6TE0G93C
HDD Revision : CC1F
HDD Size     : 953870 MB
Interface    : S-ATA II
Temperature  : 31 °C
Health       : 99 %
Performance  : 100 %
Power on time: 36 days, 6 hours
Est. lifetime: more than 1000 days

HDD Device  3: /dev/sdd
HDD Model ID : Hitachi HDS722020ALA330
HDD Serial No: JK1121YAGABMMS
HDD Revision : JKAOA20N
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : 30 °C
Health       : 100 %
Performance  : 100 %
Power on time: 97 days, 5 hours
Est. lifetime: more than 1000 days

HDD Device  4: /dev/sde
HDD Model ID : ST3750640AS
HDD Serial No: 5QD5ELLF
HDD Revision : 3.AAE
HDD Size     : 715405 MB
Interface    : S-ATA II
Temperature  : 35 °C
Health       : 100 %
Performance  : 100 %
Power on time: 428 days, 12 hours
Est. lifetime: more than 1000 days

HDD Device  5: /dev/sdf
HDD Model ID : WDC WD5000AAKS-00TMA0
HDD Serial No: WD-WCAPW2595673
HDD Revision : 12.01C01
HDD Size     : 476940 MB
Interface    : S-ATA II
Temperature  : 26 °C
Health       : 100 %
Performance  : 100 %
Power on time: 852 days, 10 hours
Est. lifetime: more than 972 days

HDD Device  6: /dev/sdg
HDD Model ID : WDC WD5000AAKS-00TMA0
HDD Serial No: WD-WCAPW2132942
HDD Revision : 12.01C01
HDD Size     : 476940 MB
Interface    : S-ATA II
Temperature  : 23 °C
Health       : 100 %
Performance  : 100 %
Power on time: 763 days, 23 hours
Est. lifetime: more than 1000 days

HDD Device  7: /dev/sdh
HDD Model ID : SAMSUNG HD753LJ
HDD Serial No: S13UJ1MQ330294
HDD Revision : 1AA01110
HDD Size     : 774090 MB
Interface    : S-ATA II
Temperature  : 17 °C
Health       : 97 %
Performance  : 100 %
Power on time: 600 days, 1 hours
Est. lifetime: more than 1000 days

HDD Device  8: /dev/sdi
HDD Model ID : WDC WD1600JD-40HBC0
HDD Serial No: WD-WCAL94036151
HDD Revision : 21.02J21
HDD Size     : 178861 MB
Interface    : S-ATA
Temperature  : 31 °C
Health       : 100 %
Performance  : 100 %
Power on time: 1363 days, 20 hours
Est. lifetime: more than 461 days

HDD Device  9: /dev/sdj
HDD Model ID : Hitachi HDS722020ALA330
HDD Serial No: JK1131YAG93KBV
HDD Revision : JKAOA20N
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : 33 °C
Health       : 100 %
Performance  : 100 %
Power on time: 97 days, 5 hours
Est. lifetime: more than 1000 days

 

It looks like it considers all my drives good.


For another comparison I have another tower (TestTower) that I have been messing with and have been moving very very old drives in and out of.  I ran HDSentinel on that tower and here is the output I got back:

 

root@TestTower:~# /boot/hdsentinel 
Hard Disk Sentinel for LINUX console 0.03 (c) 2008-2009 [email protected]
Start with -r [reportfile] to save data to report, -h for help

Examining hard disk configuration ...

HDD Device  0: /dev/hda
HDD Model ID : WDC WD300EB-75CPF0
HDD Serial No: WD-WMAATA036162
HDD Revision : 06.04G06
HDD Size     : 33550 MB
Interface    : IDE/ATA
Temperature  : Unknown °C
Health       : 93 %
Performance  : 100 %
Power on time: 419 days, 3 hours
Est. lifetime: more than 1000 days

HDD Device  1: /dev/hdb
HDD Model ID : ST360015A
HDD Serial No: 3KC21QHW
HDD Revision : 3.33
HDD Size     : 57242 MB
Interface    : IDE/ATA
Temperature  : 35 °C
Health       : 100 %
Performance  : 100 %
Power on time: 919 days, 8 hours
Est. lifetime: more than 905 days

HDD Device  2: /dev/sda
HDD Model ID : WDC WD1200JD-75GBB0
HDD Serial No: WD-WMAET1372951
HDD Revision : 02.05D02
HDD Size     : 134110 MB
Interface    : S-ATA
Temperature  : 35 °C
Health       : 4 %
Performance  : 100 %
Power on time: 1215 days, 13 hours
Est. lifetime: 4 days

HDD Device  3: /dev/sdb
HDD Model ID : Verbatim STORE N GO
HDD Serial No: PMAP1234
HDD Revision : 5.00
HDD Size     : 7631 MB
Interface    : SCSI
Temperature  : Unknown °C
Health       : Unknown %
Performance  : Unknown %
Power on time: 
Est. lifetime: 

 

As you can see, the one drive is basically dying. I knew this to begin with, though I was hoping to save it so that I could at least use it in this test machine and have as close to a "complete" test system as possible. I am really hoping that a 5.0 beta is coming soon, but I realize there are more important things to catch up on: all the orders, and now adding support for the Supermicro card, to name a few.

 

It's funny how it estimates its remaining lifetime to be 4 days. :)

 

Call us back in four days, will you?

 

It was close, and it may have lasted had I not tried to run preclear on it. I had it running, and it was taking its good sweet time. The drive was so SOL that it was coming up on the 48-hour mark and was only 50% done with the first step of preclear. I gave up on it and cancelled the preclear. I am going to take it apart tonight to see what is inside the drive.

Can you post the SMART logs for this drive? I would be interested in seeing how it calculates 4% health.

Your wish is my command:

 

root@TestTower:~# smartctl -a -d ata /dev/sda
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE Serial ATA family
Device Model:     WDC WD1200JD-75GBB0
Serial Number:    WD-WMAET1372951
Firmware Version: 02.05D02
User Capacity:    120,000,000,000 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Mar  8 10:40:48 2010 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
				was suspended by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121)	The previous self-test completed having
				the read element of the test failed.
Total time to complete Offline 
data collection: 		 (3796) seconds.
Offline data collection
capabilities: 			 (0x79) SMART execute Offline immediate.
				No Auto Offline data collection support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				No General Purpose Logging support.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  53) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   001   001   051    Pre-fail  Always   FAILING_NOW 31473
  3 Spin_Up_Time            0x0007   146   144   021    Pre-fail  Always       -       3241
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       388
  5 Reallocated_Sector_Ct   0x0033   187   187   140    Pre-fail  Always       -       195
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       29218
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       387
194 Temperature_Celsius     0x0022   107   253   000    Old_age   Always       -       43
196 Reallocated_Event_Count 0x0032   178   178   000    Old_age   Always       -       22
197 Current_Pending_Sector  0x0012   165   165   000    Old_age   Always       -       695
198 Offline_Uncorrectable   0x0012   176   176   000    Old_age   Always       -       485
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   134   155   051    Pre-fail  Offline      -       2136

SMART Error Log Version: 1
ATA Error Count: 27583 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 27583 occurred at disk power-on lifetime: 807 hours (33 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 01 00 00 e0  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 c8 00 00 08 00 00   1d+09:48:57.450  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00   1d+09:48:57.450  NOP [Abort queued commands]
  00 00 ef 00 00 45 00 00   1d+09:48:57.450  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00   1d+09:48:57.450  NOP [Abort queued commands]

Error 27582 occurred at disk power-on lifetime: 803 hours (33 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 ed 1d 4a e5  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 c8 00 00 08 00 00   1d+05:40:44.100  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00   1d+05:40:44.100  NOP [Abort queued commands]
  00 00 ef 00 00 44 00 00   1d+05:40:44.100  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00   1d+05:40:44.100  NOP [Abort queued commands]
  00 00 c8 00 00 08 00 00   1d+05:40:44.100  NOP [Abort queued commands]

Error 27581 occurred at disk power-on lifetime: 803 hours (33 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 ed 1d 4a e5  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 c8 00 00 08 00 00   1d+05:40:42.000  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00   1d+05:40:42.000  NOP [Abort queued commands]
  00 00 ef 00 00 44 00 00   1d+05:40:42.000  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00   1d+05:40:42.000  NOP [Abort queued commands]
  00 00 c8 00 00 08 00 00   1d+05:40:42.000  NOP [Abort queued commands]

Error 27580 occurred at disk power-on lifetime: 803 hours (33 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 ed 1d 4a e5  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 c8 00 00 08 00 00   1d+05:40:39.850  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00   1d+05:40:39.850  NOP [Abort queued commands]
  00 00 ef 00 00 44 00 00   1d+05:40:39.850  NOP [Abort queued commands]
  00 00 27 00 00 00 00 00   1d+05:40:39.850  NOP [Abort queued commands]
  05 00 4a 00 00 e8 1d 00   1d+05:40:39.850  [RESERVED]

Error 27579 occurred at disk power-on lifetime: 803 hours (33 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 ed 1d 4a e5  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 c8 00 00 08 00 00   1d+05:40:37.750  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00   1d+05:40:37.750  NOP [Abort queued commands]
  00 00 ef 00 00 44 00 00   1d+05:40:37.750  NOP [Abort queued commands]
  00 00 00 00 00 00 00 00   1d+05:40:37.750  NOP [Abort queued commands]
  00 00 c8 00 00 08 00 00   1d+05:40:37.750  NOP [Abort queued commands]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       774         122951383
# 2  Short offline       Completed without error       00%       430         -
# 3  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
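
To see which attribute tripped the FAILED verdict, one can compare each normalized VALUE against its THRESH: smartctl treats an attribute as failed when VALUE is at or below a non-zero threshold. Below is a rough Python sketch that parses rows in the layout printed above; the `Attr` type and function names are made up for this example, and the regex assumes the ten-column table of this smartctl version.

```python
import re
from typing import List, NamedTuple


class Attr(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    kind: str
    when_failed: str
    raw: str


# ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(.+)$"
)


def parse_attributes(text: str) -> List[Attr]:
    """Parse attribute rows; header and other lines simply fail to match."""
    attrs = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs.append(Attr(int(m[1]), m[2], int(m[3]), int(m[4]),
                              int(m[5]), m[6], m[8], m[9].strip()))
    return attrs


def failing(attrs: List[Attr]) -> List[Attr]:
    """Attributes whose normalized value is at or below a non-zero threshold."""
    return [a for a in attrs if a.thresh > 0 and a.value <= a.thresh]


# Three rows copied from the output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   001   001   051    Pre-fail  Always   FAILING_NOW 31473
  5 Reallocated_Sector_Ct   0x0033   187   187   140    Pre-fail  Always       -       195
197 Current_Pending_Sector  0x0012   165   165   000    Old_age   Always       -       695
"""

for attr in failing(parse_attributes(SAMPLE)):
    print(attr.name, attr.value, attr.thresh)
```

On this drive only Raw_Read_Error_Rate is below its threshold, but the raw counts for reallocated and pending sectors (195 and 695) are worth reading alongside it. How HDSentinel weighs these into its 4% figure is not documented in this thread.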

  • 5 years later...

Yup, it's 5 years since the last post  :)

 

I'm running HDSentinel Enterprise to monitor SMART info for all my Windows machines and would like to monitor unRAID drives too.

I'm struggling to install HDSentinel under unRAID 5.0-rc12a and would be grateful for guidance.

 

Instructions at http://www.hdsentinel.com/hard_disk_sentinel_linux.php are terminal-based but suggest

chmod 755 ./HDS.zip

sudo ./HDS.zip

etc.

HDS.zip is in /boot; chmod works, but running it gives 'cannot execute binary file'.

So what am I missing please?

And how would I make it auto-start at boot time please?

 

Many thanks, Judy


1) Please update to a supported version of unRAID. You are running beta software, and the released version is readily available.

2) The file you downloaded needs to be decompressed. The .zip indicates a compressed file. This step is listed on the website you linked.


Thanks for the rapid reply.

The linked website says to 'double click to open and decompress it to any folder '

 

I tried that first under Windows, then copied the unzipped file to /boot

chmod 755 ./HDS

sudo ./HDS

but get the same 'cannot execute binary file'.

 

And how does one arrange for it to auto-start at boot please?


Thanks for your help. My error was thinking unRAID was 64-bit rather than 32-bit. It turns out that HDSentinel has execute bits set when copied over from Windows, and it runs fine from the USB key - a relief!
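
In case it helps anyone hitting the same 'cannot execute binary file' message: a Linux executable is an ELF file, and the fifth byte of its header says whether it is a 32-bit or 64-bit build. A still-zipped download will not be ELF at all. Here is a small Python sketch of that check; the function name is just for illustration, and it can be run on any machine that has Python and a copy of the file.

```python
def elf_class(path: str) -> str:
    """Return '32-bit', '64-bit', 'unknown', or 'not ELF' for the file at path."""
    with open(path, "rb") as f:
        header = f.read(5)
    # ELF files start with 0x7f 'E' 'L' 'F'; byte 4 (EI_CLASS) is 1 or 2.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
```

For example, `elf_class("/boot/HDSentinel")` returning "64-bit" on a 32-bit system would explain the error, and "not ELF" would suggest the file is still a compressed archive.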

 

If someone could advise how to make HDSentinel start automatically, that would be cool.  :D

 

To see its output in HDSentinel server on a Windows machine, I need to find out how to make it talk on port 61230. Does unRAID ship with a firewall active?
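
One way to narrow down a connectivity question like this is to test whether a TCP connection to port 61230 on the Windows machine succeeds at all. The sketch below only checks reachability; it says nothing about how the HDSentinel server itself is configured, and the host name in the usage note is a placeholder.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

Usage would look like `port_open("windows-server.example", 61230)`. A False result points to a firewall, a wrong address, or nothing listening on that port.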

 

Sample output from an out-of-date unRAID test server attached.

 

Cheers Judy

HDS.txt

A lot has happened in those 5 years. Many of the people who were on the thread back then are still around, and they are getting some of the same functionality by just using the tools built into the latest v6 beta and its webGUI. It's also 64-bit.

Yes, I'll cut over to v6 eventually too :)

 

With 20+ machines / 70+ HDDs, I need an HDSentinel-based solution as it's tough to monitor all that, and the enterprise variant shows all drives on one screen.

 

HDS rocks by the way - surface testing shows soft sectors and their locations so I can watch the drives age and replace them before they die. Two drives have soft cylinders, so I have repartitioned them to only use the good areas - not on unRAID though!


HDSentinel is a great program. I use it on all 10 of my computers. There is a Pro Windows version, an Enterprise version (server), a Linux version, a Linux daemon version, a home version, and a free trial version that never expires (it just lacks some of the higher-order functions). The guy who runs the company and develops the program, Janos, is a gem of a human being. He is as valuable as the product itself. Finding nice, helpful technical experts who will actually talk with you is just refreshing. Even back when I was only using the trial version, Janos was more than happy to help me problem solve. He is why I went so strongly with HDS, that and the program itself and its capabilities. And there are others here who feel the same way, as JudyZ's comments will attest.

I liked Janos' HDS program so much that I installed it on my Linux computers too. But installation and use was strictly manual, command-line stuff in Linux. And it overwrites the report each time you run it, unless you manually rename the old report before you run HDS again. So I wrote a couple of simple scripts to help automate the installation, put launchers in a couple of logical places, and give you an automated way to run the program, generate a new report, and save it automatically with the date/time in the filename, thereby building a library of searchable reports.

HDSentinel is a gem; it does a great many basic tasks really well. Plus it helps you avoid catastrophic failures and premature retirements of drives, and gives you options for reclaiming or salvaging a disk with issues.

 

http://www.hdsentinel.com/  -- general info and links

http://www.hdsentinel.com/add-on-linux-installers.php -- the program bundled with my installers and scripts
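
For readers who just want the timestamped-report idea without the installers, here is a rough Python sketch of it. This is not the script bundled at the link above. It relies only on the -r [reportfile] option shown in the program's banner earlier in this thread; the binary location, report directory, and filename pattern are assumptions you would adjust.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import List


def report_path(directory: Path, now: datetime) -> Path:
    """Build a report filename that embeds the date and time."""
    return directory / f"hdsentinel_{now:%Y-%m-%d_%H%M%S}.txt"


def build_command(binary: str, report: Path) -> List[str]:
    """Command line that asks HDSentinel to save its report to `report`."""
    return [binary, "-r", str(report)]


def run_report(binary: str = "/boot/HDSentinel",            # assumed location
               directory: Path = Path("/boot/hds_reports")  # assumed location
               ) -> Path:
    """Run HDSentinel once and return the path of the new report."""
    directory.mkdir(parents=True, exist_ok=True)
    report = report_path(directory, datetime.now())
    subprocess.run(build_command(binary, report), check=True)
    return report
```

Calling `run_report()` from a scheduled job would add one new file per run instead of overwriting the previous report.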

 

Now Janos, Judy and I are working on a way to include the HDS Daemon on an unRAID USB drive, have it run automatically at startup, and have its output monitored by the Enterprise (server) version of HDS, either in Docker or remotely.


What I am about to describe might be purely a coincidence, but it does not feel like it.

 

I ran HDSentinel on my 5.0.6 box and afterwards a disk showed as "disabled" in the devices list. Stopping and restarting the array did not fix it, nor did rebooting.

Here is the complete output from running HDSentinel. You can see that it could not read temperature data from /dev/sdh the first time, and the second time it failed to read any data at all:

 

root@BIGBOX:/mnt/cache/sentintel# ./HDSentinel
Hard Disk Sentinel for LINUX console 0.08 (c) 2008-2011 [email protected]
Start with -r [reportfile] to save data to report, -h for help

Examining hard disk configuration ...

HDD Device  0: /dev/sda
HDD Model ID : SanDisk Cruzer Edge
HDD Serial No: 20051740710F5C01ADC9
HDD Revision : 1.26
HDD Size     : 7633 MB
Interface    : SCSI
Temperature  : Unknown °C
Highest Temp.: Unknown °C
Health       : Unknown %
Performance  : Unknown %
Power on time:
Est. lifetime:

HDD Device  1: /dev/sdb
HDD Model ID : ST3000DM001-9YN166
HDD Serial No: W1F101TJ
HDD Revision : CC9F
HDD Size     : 2861588 MB
Interface    : S-ATA II
Temperature  : 26 °C
Highest Temp.: 48 °C
Health       : 100 %
Performance  : 100 %
Power on time: 565 days, 15 hours
Est. lifetime: more than 1000 days

HDD Device  2: /dev/sdc
HDD Model ID : WDC WD20EARX-00MMMB0
HDD Serial No: WD-WCAWZ2783588
HDD Revision : 80.00A80
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : 25 °C
Highest Temp.: 48 °C
Health       : 100 %
Performance  : 100 %
Power on time: 637 days, 9 hours
Est. lifetime: more than 1000 days

HDD Device  3: /dev/sdd
HDD Model ID : SAMSUNG HD204UI
HDD Serial No: S2H7J1BZB29336
HDD Revision : 1AQ10001
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : 24 °C
Highest Temp.: 45 °C
Health       : 100 %
Performance  : 100 %
Power on time: 1154 days, 14 hours
Est. lifetime: more than 670 days

HDD Device  4: /dev/sde
HDD Model ID : ST4000DM000-1F2168
HDD Serial No: S300PVJ0
HDD Revision : CC54
HDD Size     : 3815448 MB
Interface    : S-ATA II
Temperature  : 27 °C
Highest Temp.: 49 °C
Health       : 100 %
Performance  : 100 %
Power on time: 121 days, 2 hours
Est. lifetime: more than 1000 days

HDD Device  5: /dev/sdf
HDD Model ID : Samsung SSD 840 EVO 250GB
HDD Serial No: S1DBNSADA51208J
HDD Revision : EXT0BB0Q
HDD Size     : 238475 MB
Interface    : S-ATA II
Temperature  : 24 °C
Highest Temp.: 40 °C
Health       : 100 %
Performance  : 100 %
Power on time: 2 days, 12 hours, 40 minutes
Est. lifetime: more than 1000 days

HDD Device  6: /dev/sdg
HDD Model ID : Hitachi HDS5C3020ALA632
HDD Serial No: ML0220F30R1MXD
HDD Revision : ML6OA580
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : 24 °C
Highest Temp.: 42 °C
Health       : 100 %
Performance  : 100 %
Power on time: 1108 days, 16 hours
Est. lifetime: more than 716 days

HDD Device  7: /dev/sdh
HDD Model ID : WDC WD20EADS-00R6B0
HDD Serial No: WD-WCAVY1187400
HDD Revision : 01.00A01
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : Unknown °C
Highest Temp.: Unknown °C
Health       : Unknown %
Performance  : Unknown %
Power on time:
Est. lifetime:

HDD Device  8: /dev/sdi
HDD Model ID : ST4000DM000-1F2168
HDD Serial No: S300N8RB
HDD Revision : CC54
HDD Size     : 3815448 MB
Interface    : S-ATA II
Temperature  : 22 °C
Highest Temp.: 49 °C
Health       : 100 %
Performance  : 100 %
Power on time: 121 days, 8 hours
Est. lifetime: more than 1000 days

HDD Device  9: /dev/sdj
HDD Model ID : ST4000DM000-1F2168
HDD Serial No: S3008WD3
HDD Revision : CC54
HDD Size     : 3815448 MB
Interface    : S-ATA II
Temperature  : 23 °C
Highest Temp.: 40 °C
Health       : 100 %
Performance  : 100 %
Power on time: 36 days, 10 hours
Est. lifetime: more than 1000 days

HDD Device 10: /dev/sdk
HDD Model ID : ST2000DL003-9VT166
HDD Serial No: 5YD6G1TX
HDD Revision : CC3C
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : 24 °C
Highest Temp.: 43 °C
Health       : 100 %
Performance  : 100 %
Power on time: 1963 days, 0 hours
Est. lifetime: more than 100 days

HDD Device 11: /dev/sdl
HDD Model ID : WDC WD30EZRX-00D8PB0
HDD Serial No: WD-WCC4N0115460
HDD Revision : 80.00A80
HDD Size     : 2861588 MB
Interface    : S-ATA II
Temperature  : 24 °C
Highest Temp.: 42 °C
Health       : 100 %
Performance  : 100 %
Power on time: 315 days, 9 hours
Est. lifetime: more than 1000 days

HDD Device 12: /dev/sdm
HDD Model ID : ST3000DM001-9YN166
HDD Serial No: W1F1A2J9
HDD Revision : CC9F
HDD Size     : 2861588 MB
Interface    : S-ATA II
Temperature  : 24 °C
Highest Temp.: 48 °C
Health       : 100 %
Performance  : 100 %
Power on time: 513 days, 17 hours
Est. lifetime: more than 1000 days

HDD Device 13: /dev/sdn
HDD Model ID : TOSHIBA DT01ACA300
HDD Serial No: 531SBGZGS
HDD Revision : MX6OABB0
HDD Size     : 2861588 MB
Interface    : S-ATA II
Temperature  : 27 °C
Highest Temp.: 47 °C
Health       : 100 %
Performance  : 100 %
Power on time: 394 days, 1 hours
Est. lifetime: more than 1000 days


root@BIGBOX:/mnt/cache/sentintel# ./HDSentinel
Hard Disk Sentinel for LINUX console 0.08 (c) 2008-2011 [email protected]
Start with -r [reportfile] to save data to report, -h for help

Examining hard disk configuration ...

HDD Device  0: /dev/sda
HDD Model ID : SanDisk Cruzer Edge
HDD Serial No: 20051740710F5C01ADC9
HDD Revision : 1.26
HDD Size     : 7633 MB
Interface    : SCSI
Temperature  : Unknown °C
Highest Temp.: Unknown °C
Health       : Unknown %
Performance  : Unknown %
Power on time:
Est. lifetime:

HDD Device  1: /dev/sdb
HDD Model ID : ST3000DM001-9YN166
HDD Serial No: W1F101TJ
HDD Revision : CC9F
HDD Size     : 2861588 MB
Interface    : S-ATA II
Temperature  : 28 °C
Highest Temp.: 48 °C
Health       : 100 %
Performance  : 100 %
Power on time: 565 days, 15 hours
Est. lifetime: more than 1000 days

HDD Device  2: /dev/sdc
HDD Model ID : WDC WD20EARX-00MMMB0
HDD Serial No: WD-WCAWZ2783588
HDD Revision : 80.00A80
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : 27 °C
Highest Temp.: 48 °C
Health       : 100 %
Performance  : 100 %
Power on time: 637 days, 9 hours
Est. lifetime: more than 1000 days

HDD Device  3: /dev/sdd
HDD Model ID : SAMSUNG HD204UI
HDD Serial No: S2H7J1BZB29336
HDD Revision : 1AQ10001
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : 25 °C
Highest Temp.: 45 °C
Health       : 100 %
Performance  : 100 %
Power on time: 1154 days, 14 hours
Est. lifetime: more than 670 days

HDD Device  4: /dev/sde
HDD Model ID : ST4000DM000-1F2168
HDD Serial No: S300PVJ0
HDD Revision : CC54
HDD Size     : 3815448 MB
Interface    : S-ATA II
Temperature  : 27 °C
Highest Temp.: 49 °C
Health       : 100 %
Performance  : 100 %
Power on time: 121 days, 2 hours
Est. lifetime: more than 1000 days

HDD Device  5: /dev/sdf
HDD Model ID : Samsung SSD 840 EVO 250GB
HDD Serial No: S1DBNSADA51208J
HDD Revision : EXT0BB0Q
HDD Size     : 238475 MB
Interface    : S-ATA II
Temperature  : 28 °C
Highest Temp.: 40 °C
Health       : 100 %
Performance  : 100 %
Power on time: 2 days, 12 hours, 40 minutes
Est. lifetime: more than 1000 days

HDD Device  6: /dev/sdg
HDD Model ID : Hitachi HDS5C3020ALA632
HDD Serial No: ML0220F30R1MXD
HDD Revision : ML6OA580
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : 25 °C
Highest Temp.: 42 °C
Health       : 100 %
Performance  : 100 %
Power on time: 1108 days, 16 hours
Est. lifetime: more than 716 days

HDD Device  7: /dev/sdh
HDD Model ID : ?
HDD Serial No: ?
HDD Revision : ?
HDD Size     : 0 MB
Interface    : SCSI
Temperature  : Unknown °C
Highest Temp.: Unknown °C
Health       : Unknown %
Performance  : Unknown %
Power on time:
Est. lifetime:

HDD Device  8: /dev/sdi
HDD Model ID : ST4000DM000-1F2168
HDD Serial No: S300N8RB
HDD Revision : CC54
HDD Size     : 3815448 MB
Interface    : S-ATA II
Temperature  : 23 °C
Highest Temp.: 49 °C
Health       : 100 %
Performance  : 100 %
Power on time: 121 days, 8 hours
Est. lifetime: more than 1000 days

HDD Device  9: /dev/sdj
HDD Model ID : ST4000DM000-1F2168
HDD Serial No: S3008WD3
HDD Revision : CC54
HDD Size     : 3815448 MB
Interface    : S-ATA II
Temperature  : 24 °C
Highest Temp.: 40 °C
Health       : 100 %
Performance  : 100 %
Power on time: 36 days, 10 hours
Est. lifetime: more than 1000 days

HDD Device 10: /dev/sdk
HDD Model ID : ST2000DL003-9VT166
HDD Serial No: 5YD6G1TX
HDD Revision : CC3C
HDD Size     : 1907729 MB
Interface    : S-ATA II
Temperature  : 25 °C
Highest Temp.: 43 °C
Health       : 100 %
Performance  : 100 %
Power on time: 1963 days, 0 hours
Est. lifetime: more than 100 days

HDD Device 11: /dev/sdl
HDD Model ID : WDC WD30EZRX-00D8PB0
HDD Serial No: WD-WCC4N0115460
HDD Revision : 80.00A80
HDD Size     : 2861588 MB
Interface    : S-ATA II
Temperature  : 24 °C
Highest Temp.: 42 °C
Health       : 100 %
Performance  : 100 %
Power on time: 315 days, 9 hours
Est. lifetime: more than 1000 days

HDD Device 12: /dev/sdm
HDD Model ID : ST3000DM001-9YN166
HDD Serial No: W1F1A2J9
HDD Revision : CC9F
HDD Size     : 2861588 MB
Interface    : S-ATA II
Temperature  : 25 °C
Highest Temp.: 48 °C
Health       : 100 %
Performance  : 100 %
Power on time: 513 days, 17 hours
Est. lifetime: more than 1000 days

HDD Device 13: /dev/sdn
HDD Model ID : TOSHIBA DT01ACA300
HDD Serial No: 531SBGZGS
HDD Revision : MX6OABB0
HDD Size     : 2861588 MB
Interface    : S-ATA II
Temperature  : 28 °C
Highest Temp.: 47 °C
Health       : 100 %
Performance  : 100 %
Power on time: 394 days, 1 hours
Est. lifetime: more than 1000 days


root@BIGBOX:/mnt/cache/sentintel# ./HDSentinel -solid
/dev/sda  ?   ?     ? SanDisk_Cruzer_Edge       20051740710F5C01ADC9    7633
/dev/sdb 29 100 13575 ST3000DM001-9YN166        W1F101TJ             2861588
/dev/sdc 27 100 15297 WDC_WD20EARX-00MMMB0      WD-WCAWZ2783588      1907729
/dev/sdd 25 100 27710 SAMSUNG_HD204UI           S2H7J1BZB29336       1907729
/dev/sde 27 100  2906 ST4000DM000-1F2168        S300PVJ0             3815448
/dev/sdf 29 100    60 Samsung_SSD_840_EVO_250GB S1DBNSADA51208J       238475
/dev/sdg 25 100 26608 Hitachi_HDS5C3020ALA632   ML0220F30R1MXD       1907729
/dev/sdh  ?   ?     ? ?                         ?                          0
/dev/sdi 24 100  2912 ST4000DM000-1F2168        S300N8RB             3815448
/dev/sdj 24 100   874 ST4000DM000-1F2168        S3008WD3             3815448
/dev/sdk 26 100 47112 ST2000DL003-9VT166        5YD6G1TX             1907729
/dev/sdl 25 100  7569 WDC_WD30EZRX-00D8PB0      WD-WCC4N0115460      2861588
/dev/sdm 26 100 12329 ST3000DM001-9YN166        W1F1A2J9             2861588
/dev/sdn 29 100  9457 TOSHIBA_DT01ACA300        531SBGZGS            2861588
root@BIGBOX:/mnt/cache/sentintel# cd ..
root@BIGBOX:/mnt/cache# cd ..
root@BIGBOX:/mnt# cd ..
root@BIGBOX:/#
Broadcast message from root (Fri Mar 13 21:46:41 2015):

The system is going down for reboot NOW!
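
For anyone who wants to script against the `-solid` listing above, here is a minimal Python sketch that splits each line into fields. The column meanings (device, temperature in °C, health %, power-on hours, model, serial, size in MB) are an assumption inferred by comparing the listing with the full report, e.g. 27710 hours for /dev/sdd works out to the 1154 days, 14 hours shown there. The names `Drive` and `parse_solid_line` are made up for this example and are not part of HDSentinel.

```python
from typing import NamedTuple, Optional


class Drive(NamedTuple):
    device: str
    temp_c: Optional[int]          # None when the tool prints "?"
    health_pct: Optional[int]
    power_on_hours: Optional[int]
    model: Optional[str]
    serial: Optional[str]
    size_mb: int


def _num(field: str) -> Optional[int]:
    # "?" marks a value the tool could not read
    return None if field == "?" else int(field)


def _text(field: str) -> Optional[str]:
    return None if field == "?" else field


def parse_solid_line(line: str) -> Drive:
    # Assumed layout: seven whitespace-separated columns; model names
    # use underscores instead of spaces, so a plain split() is enough.
    dev, temp, health, hours, model, serial, size = line.split()
    return Drive(dev, _num(temp), _num(health), _num(hours),
                 _text(model), _text(serial), int(size))


def days_hours(hours: int) -> tuple:
    # Convert power-on hours to (days, hours) for comparison with the report
    return divmod(hours, 24)


if __name__ == "__main__":
    sample = "/dev/sdd 25 100 27710 SAMSUNG_HD204UI           S2H7J1BZB29336       1907729"
    d = parse_solid_line(sample)
    print(d.device, d.health_pct, days_hours(d.power_on_hours))
```

Drives that report `?` (like /dev/sda and /dev/sdh above) come back with `None` in those fields, so a script can skip them rather than crash on the conversion.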

Link to comment

I have removed the drive and connected it to a Windows PC. It powers up, the partitions are still there, and I can access the data using RFSTool, so basically the drive seems fine.

However, it refuses to work in unRAID. It powers up and is detected, but the array config page says it is disabled. The temperature is also not displayed.

There is no option to rebuild data to the drive, so essentially it now seems to be useless in unRAID.

Edit:

Followed the procedure at http://lime-technology.com/wiki/index.php/Troubleshooting#Re-enable_the_drive

Now the data is being rebuilt. But this seems silly, since the data was still OK!

Link to comment
