HDSentinel hard drive monitoring tool


queeg


I've sent a link to this thread to Janos and asked him to join the discussion. Hopefully he'll be able to make some time to get on this forum. But I also know he works like a madman in his secret laboratory in Hungary :), so it may take him some time to get here.

 

I'll leave the problem solving to him and the unRAID gurus.

 

But if I were having this issue, I'd immediately suspect a cable problem. I know SATA cable problems can cause all sorts of weird results and transient or progressive issues. I also know that cables that look fine, have worked fine for ages, and seem to be well seated can start having issues.

 

Given that it sounds like unRAID took the drive out of service ("unRAID does not take a disk out of service casually, but if a disk experiences a write failure, it will do exactly that: it will take the disk out of service"), I'd think you had a write problem as well as the HDS detection/data collection problems. Add to that the fact that after you removed the drive and connected it to another computer (thereby exercising/changing the cables), the problem seems to have gone away (other than the drive being out of service in unRAID, which you were able to correct). All of this points me to a cable issue. But I may well be completely off the mark here, which is why I asked Janos to chime in.

 

Normally my HDS reports include a lot more data than I'm seeing in yours, flaggart. Did you edit the reports, or are you not getting any additional data?


The console log is verbatim.

 

With regard to removing the drive, connecting it to another PC, etc., I think that step was totally unnecessary. I was unaware of the procedure for re-enabling a disabled device.

 

The drive red-balled at the exact same moment I ran HDSentinel, so it would be an amazing coincidence if a cable had just come loose!

 

Data has finished rebuilding and all seems OK.

 

Here is the syslog in case anyone is interested:

 

Mar 13 21:34:00 BIGBOX in.telnetd[24059]: connect from 192.168.0.10 (192.168.0.10)
Mar 13 21:34:03 BIGBOX login[24060]: ROOT LOGIN  on '/dev/pts/0' from 'FRACTAL.home.net'
Mar 13 21:34:53 BIGBOX s3_sleep: Disk activity detected. Reset all counters. /dev/sde drive state is: active/idle
Mar 13 21:35:54 BIGBOX s3_sleep: Disk activity detected. Reset all counters. /dev/sde drive state is: active/idle
Mar 13 21:36:58 BIGBOX s3_sleep: Disk activity detected. Reset all counters. /dev/sdb drive state is: active/idle /dev/sdc drive state is: active/idle /dev/sdd drive state is: active/idle /dev/sde drive state is: active/idle
Mar 13 21:37:17 BIGBOX kernel: sd 2:0:1:0: command f7250600 timed out
Mar 13 21:37:17 BIGBOX kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Mar 13 21:37:17 BIGBOX kernel: sas: trying to find task 0xe2b1d900
Mar 13 21:37:17 BIGBOX kernel: sas: sas_scsi_find_task: aborting task 0xe2b1d900
Mar 13 21:37:17 BIGBOX kernel: sas: sas_scsi_find_task: task 0xe2b1d900 is aborted
Mar 13 21:37:17 BIGBOX kernel: sas: sas_eh_handle_sas_errors: task 0xe2b1d900 is aborted
Mar 13 21:37:17 BIGBOX kernel: sas: ata8: end_device-2:1: cmd error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata7: end_device-2:0: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata8: end_device-2:1: dev error handler
Mar 13 21:37:17 BIGBOX kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 13 21:37:17 BIGBOX kernel: ata8.00: failed command: SMART
Mar 13 21:37:17 BIGBOX kernel: ata8.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 0
Mar 13 21:37:17 BIGBOX kernel:          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 13 21:37:17 BIGBOX kernel: ata8.00: status: { DRDY }
Mar 13 21:37:17 BIGBOX kernel: ata8: hard resetting link
Mar 13 21:37:17 BIGBOX kernel: sas: ata9: end_device-2:2: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata10: end_device-2:3: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata11: end_device-2:4: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata12: end_device-2:5: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata13: end_device-2:6: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata14: end_device-2:7: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: sas_form_port: phy1 belongs to port1 already(1)!
Mar 13 21:37:19 BIGBOX kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[1]:rc= 0
Mar 13 21:37:25 BIGBOX kernel: ata8.00: qc timeout (cmd 0x27)
Mar 13 21:37:25 BIGBOX kernel: ata8.00: failed to read native max address (err_mask=0x4)
Mar 13 21:37:25 BIGBOX kernel: ata8.00: HPA support seems broken, skipping HPA handling
Mar 13 21:37:25 BIGBOX kernel: ata8.00: revalidation failed (errno=-5)
Mar 13 21:37:25 BIGBOX kernel: ata8: hard resetting link
Mar 13 21:37:25 BIGBOX kernel: sas: sas_form_port: phy1 belongs to port1 already(1)!
Mar 13 21:37:27 BIGBOX kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[1]:rc= 0
Mar 13 21:37:32 BIGBOX kernel: ata8.00: qc timeout (cmd 0xef)
Mar 13 21:37:32 BIGBOX kernel: ata8.00: failed to set xfermode (err_mask=0x4)
Mar 13 21:37:32 BIGBOX kernel: ata8.00: limiting speed to UDMA/133:PIO3
Mar 13 21:37:32 BIGBOX kernel: ata8: hard resetting link
Mar 13 21:37:32 BIGBOX kernel: sas: sas_form_port: phy1 belongs to port1 already(1)!
Mar 13 21:37:34 BIGBOX kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[1]:rc= 0
Mar 13 21:37:44 BIGBOX kernel: ata8.00: qc timeout (cmd 0xef)
Mar 13 21:37:44 BIGBOX kernel: ata8.00: failed to set xfermode (err_mask=0x4)
Mar 13 21:37:44 BIGBOX kernel: ata8.00: disabled
Mar 13 21:37:44 BIGBOX kernel: ata8: hard resetting link
Mar 13 21:37:45 BIGBOX kernel: sas: sas_form_port: phy1 belongs to port1 already(1)!
Mar 13 21:37:47 BIGBOX kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[1]:rc= 0
Mar 13 21:37:47 BIGBOX kernel: ata8: EH complete
Mar 13 21:37:47 BIGBOX kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] READ CAPACITY(16) failed
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh]  
Mar 13 21:37:47 BIGBOX kernel: Result: hostbyte=0x04 driverbyte=0x00
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] Sense not available.
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] READ CAPACITY failed
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh]  
Mar 13 21:37:47 BIGBOX kernel: Result: hostbyte=0x04 driverbyte=0x00
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] Sense not available.
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] Truncating mode parameter data from 3330 to 512 bytes
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] Got wrong page
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] Assuming drive cache: write through
Mar 13 21:37:47 BIGBOX kernel: sdh: detected capacity change from 2000398934016 to 0
Mar 13 21:37:48 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:38:03 BIGBOX last message repeated 14 times
Mar 13 21:38:10 BIGBOX s3_sleep: Disk activity detected. Reset all counters. /dev/sdb drive state is: active/idle /dev/sdc drive state is: active/idle /dev/sdd drive state is: active/idle /dev/sde drive state is: active/idle /dev/sdg drive state is: active/idle /dev/sdi drive state is: active/idle /dev/sdj drive state is: active/idle /dev/sdk drive state is: active/idle
Mar 13 21:38:11 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:38:56 BIGBOX last message repeated 32 times
Mar 13 21:38:59 BIGBOX last message repeated 9 times
Mar 13 21:39:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters. /dev/sdb drive state is: active/idle /dev/sdc drive state is: active/idle /dev/sdd drive state is: active/idle /dev/sde drive state is: active/idle /dev/sdg drive state is: active/idle /dev/sdi drive state is: active/idle /dev/sdj drive state is: active/idle /dev/sdk drive state is: active/idle /dev/sdl drive state is: active/idle /dev/sdm drive state is: active/idle /dev/sdn drive state is: active/idle
Mar 13 21:40:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters. /dev/sdb drive state is: active/idle /dev/sdc drive state is: active/idle /dev/sdd drive state is: active/idle /dev/sde drive state is: active/idle /dev/sdg drive state is: active/idle /dev/sdi drive state is: active/idle /dev/sdj drive state is: active/idle /dev/sdk drive state is: active/idle /dev/sdl drive state is: active/idle /dev/sdm drive state is: active/idle /dev/sdn drive state is: active/idle
Mar 13 21:41:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters. /dev/sdb drive state is: active/idle /dev/sdc drive state is: active/idle /dev/sdd drive state is: active/idle /dev/sde drive state is: active/idle /dev/sdg drive state is: active/idle /dev/sdi drive state is: active/idle /dev/sdj drive state is: active/idle /dev/sdk drive state is: active/idle /dev/sdl drive state is: active/idle /dev/sdm drive state is: active/idle /dev/sdn drive state is: active/idle
Mar 13 21:42:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters. /dev/sdb drive state is: active/idle /dev/sdc drive state is: active/idle /dev/sdd drive state is: active/idle /dev/sde drive state is: active/idle /dev/sdg drive state is: active/idle /dev/sdi drive state is: active/idle /dev/sdj drive state is: active/idle /dev/sdk drive state is: active/idle /dev/sdl drive state is: active/idle /dev/sdm drive state is: active/idle /dev/sdn drive state is: active/idle
Mar 13 21:43:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters. /dev/sdb drive state is: active/idle /dev/sdc drive state is: active/idle /dev/sdd drive state is: active/idle /dev/sde drive state is: active/idle /dev/sdg drive state is: active/idle /dev/sdi drive state is: active/idle /dev/sdj drive state is: active/idle /dev/sdk drive state is: active/idle /dev/sdl drive state is: active/idle /dev/sdm drive state is: active/idle /dev/sdn drive state is: active/idle
Mar 13 21:43:18 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:43:23 BIGBOX last message repeated 46 times
Mar 13 21:43:38 BIGBOX kernel: md: disk10 read error, sector=1871085816
Mar 13 21:43:39 BIGBOX kernel: md: disk10 write error, sector=1871085816
Mar 13 21:43:39 BIGBOX kernel: md: recovery thread woken up ...
Mar 13 21:43:39 BIGBOX kernel: md: recovery thread has nothing to resync
Mar 13 21:44:06 BIGBOX emhttp: shcmd (69): /usr/local/sbin/emhttp_event stopping_svcs
Mar 13 21:44:06 BIGBOX kernel: mdcmd (88): nocheck 
Mar 13 21:44:06 BIGBOX kernel: md: nocheck_array: check not active
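For anyone reading the log above: libata's error handler prints the failing command's taskfile in the "cmd" field. The sketch below decodes that field, assuming the layout libata uses (command/feature:nsect:lbal:lbam:lbah, then five high-order bytes, then device). The lookup tables are deliberately partial and cover only the opcodes in this log. Decoding "b0/d8:00:01:4f:c2" gives SMART ENABLE OPERATIONS, and the later timeouts on 0x27 and 0xEF correspond to the "failed to read native max address" and "failed to set xfermode" lines. The log itself does not say which program issued the SMART command.

```python
from dataclasses import dataclass

# Partial lookup tables: only the opcodes that appear in this syslog.
ATA_COMMANDS = {
    0x27: "READ NATIVE MAX ADDRESS EXT",
    0xB0: "SMART",
    0xEF: "SET FEATURES",
}
# For command 0xB0 the feature register selects the SMART subcommand.
SMART_SUBCOMMANDS = {
    0xD0: "READ DATA",
    0xD5: "READ LOG",
    0xD8: "ENABLE OPERATIONS",
    0xDA: "RETURN STATUS",
}


@dataclass
class TaskFile:
    command: int
    feature: int
    nsect: int
    lbal: int
    lbam: int
    lbah: int
    device: int

    def describe(self) -> str:
        name = ATA_COMMANDS.get(self.command, f"unknown command 0x{self.command:02x}")
        if self.command == 0xB0:
            sub = SMART_SUBCOMMANDS.get(self.feature, f"subcommand 0x{self.feature:02x}")
            name = f"{name} {sub}"
        return name


def parse_cmd(field: str) -> TaskFile:
    """Parse a libata 'cmd' field such as 'b0/d8:00:01:4f:c2/00:00:00:00:00/00'.

    Assumed layout: command/feature:nsect:lbal:lbam:lbah/<5 hob bytes>/device.
    The hob (high-order byte) group is ignored here.
    """
    command, low, _hob, device = field.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    return TaskFile(int(command, 16), feature, nsect, lbal, lbam, lbah, int(device, 16))


tf = parse_cmd("b0/d8:00:01:4f:c2/00:00:00:00:00/00")
print(tf.describe())               # SMART ENABLE OPERATIONS
print(hex(tf.lbam), hex(tf.lbah))  # 0x4f 0xc2, the signature SMART commands carry
```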


Okay, I believe the HDS report you posted is what one sees in the terminal window after HDS has finished. It is not the report it would generate and save if you started HDS with the -r switch. The report you're posting is a basic report; the saved report includes a lot more data. Here is a saved report from my old Linux machine so you can see what I mean (I've only included one drive because it was so long).

 

  -- General Information --

   Application Information
   -----------------------
    Installed Version . . . . . . . . . . . . . . . .  Hard Disk Sentinel 0.08
    Current Date And Time . . . . . . . . . . . . . .  20-1-14 20:05:15

   Computer Information
   --------------------
    Computer Name . . . . . . . . . . . . . . . . . .  zorin7-tower
    MAC Address . . . . . . . . . . . . . . . . . . .  90:E6:BA:CD:F9:CD

   System Information
   ------------------
    OS Version. . . . . . . . . . . . . . . . . . . .  Linux : 3.8.0-35-generic (#50-Ubuntu SMP Tue Dec 3 01:24:59 UTC 2013)
    Process ID. . . . . . . . . . . . . . . . . . . .  6726
    Uptime. . . . . . . . . . . . . . . . . . . . . .  162183 sec (1 days, 21 hours, 3 min, 3 sec)



  -- Physical Disk Information - Disk: #0: Hitachi HUA721010KLA330 --

   Hard Disk Summary
   -----------------
    Hard Disk Number. . . . . . . . . . . . . . . . .  0
    Hard Disk Device. . . . . . . . . . . . . . . . .  /dev/sda
    Interface . . . . . . . . . . . . . . . . . . . .  S-ATA
    Hard Disk Model ID. . . . . . . . . . . . . . . .  Hitachi HUA721010KLA330
    Hard Disk Revision. . . . . . . . . . . . . . . .  GKAOAB0A
    Hard Disk Serial Number . . . . . . . . . . . . .  GTF002PBJDDAPF
    Hard Disk Total Size. . . . . . . . . . . . . . .  953870 MB
    Current Temperature . . . . . . . . . . . . . . .  29 °C (84 °F)
    Maximum Temperature (during Entire Lifespan). . .  47 °C (117 °F)
    Power On Time . . . . . . . . . . . . . . . . . .  311 days, 8 hours
    Estimated Remaining Lifetime. . . . . . . . . . .  more than 1000 days
    Health. . . . . . . . . . . . . . . . . . . . . .  #################### 100 % (Excellent)
    Performance . . . . . . . . . . . . . . . . . . .  #################### 100 % (Excellent)

    The hard disk status is PERFECT. Problematic or weak sectors not found and there are no spin up or data transfer errors. 
      No actions needed.

   ATA Information
   ---------------
    Hard Disk Cylinders . . . . . . . . . . . . . . .  1938021
    Hard Disk Heads . . . . . . . . . . . . . . . . .  16
    Hard Disk Sectors . . . . . . . . . . . . . . . .  63
    Total Sectors . . . . . . . . . . . . . . . . . .  1953525168
    ATA Revision. . . . . . . . . . . . . . . . . . .  7
    Bytes Per Sector. . . . . . . . . . . . . . . . .  512
    Buffer Size . . . . . . . . . . . . . . . . . . .  31157 KB
    Multiple Sectors. . . . . . . . . . . . . . . . .  16
    Error Correction Bytes. . . . . . . . . . . . . .  52
    Unformatted Capacity. . . . . . . . . . . . . . .  953870 MB
    Maximum PIO Mode. . . . . . . . . . . . . . . . .  4
    Maximum Multiword DMA Mode. . . . . . . . . . . .  2
    Maximum UDMA Mode . . . . . . . . . . . . . . . .  150 MB/s (6)
    Active UDMA Mode. . . . . . . . . . . . . . . . .  150 MB/s (6)
    Minimum Multiword DMA Transfer Time . . . . . . .  120 ns
    Recommended Multiword DMA Transfer Time . . . . .  120 ns
    Minimum PIO Transfer Time Without IORDY . . . . .  120 ns
    Minimum PIO Transfer Time With IORDY. . . . . . .  120 ns
    ATA Control Byte. . . . . . . . . . . . . . . . .  Valid
    ATA Checksum Value. . . . . . . . . . . . . . . .  Valid

   Acoustic Management Configuration
   ---------------------------------
    Acoustic Management . . . . . . . . . . . . . . .  Supported
    Acoustic Management . . . . . . . . . . . . . . .  Enabled
    Current Acoustic Level. . . . . . . . . . . . . .  Max performance and volume (FEh)
    Recommended Acoustic Level. . . . . . . . . . . .  Min performance and volume (80h)

   EIDE Properties
   ---------------
    Read Ahead Buffer . . . . . . . . . . . . . . . .  Supported
    DMA . . . . . . . . . . . . . . . . . . . . . . .  Supported
    Ultra DMA . . . . . . . . . . . . . . . . . . . .  Supported
    S.M.A.R.T.. . . . . . . . . . . . . . . . . . . .  Supported
    Power Management. . . . . . . . . . . . . . . . .  Supported
    Write Cache . . . . . . . . . . . . . . . . . . .  Supported
    Host Protected Area . . . . . . . . . . . . . . .  Supported
    Advanced Power Management . . . . . . . . . . . .  Supported
    Power Up In Standby . . . . . . . . . . . . . . .  Supported
    48-bit LBA Addressing . . . . . . . . . . . . . .  Supported
    Device Configuration Overlay. . . . . . . . . . .  Supported
    IORDY Support . . . . . . . . . . . . . . . . . .  Supported
    Read/Write DMA Queue. . . . . . . . . . . . . . .  Not supported
    NOP Command . . . . . . . . . . . . . . . . . . .  Not supported
    Trusted Computing . . . . . . . . . . . . . . . .  Not supported
    64-bit World Wide ID. . . . . . . . . . . . . . .  0050A2CCE11ECDD1
    Streaming . . . . . . . . . . . . . . . . . . . .  Supported
    Media Card Pass Through . . . . . . . . . . . . .  Not supported
    General Purpose Logging . . . . . . . . . . . . .  Supported
    Error Logging . . . . . . . . . . . . . . . . . .  Supported
    CFA Feature Set . . . . . . . . . . . . . . . . .  Not supported
    Long Physical Sectors (1) . . . . . . . . . . . .  Not supported
    Long Logical Sectors. . . . . . . . . . . . . . .  Not supported
    Write-Read-Verify . . . . . . . . . . . . . . . .  Not supported
    NV Cache Feature. . . . . . . . . . . . . . . . .  Not supported
    NV Cache Power Mode . . . . . . . . . . . . . . .  Not supported
    NV Cache Size . . . . . . . . . . . . . . . . . .  Not supported
    Free-fall Control . . . . . . . . . . . . . . . .  Not supported
    Free-fall Control Sensitivity . . . . . . . . . .  Not supported

   SSD Features
   ------------
    Data Set Management . . . . . . . . . . . . . . .  Not supported
    TRIM Command. . . . . . . . . . . . . . . . . . .  Not supported
    Deterministic Read After TRIM . . . . . . . . . .  Not supported

   S.M.A.R.T. Details
   ------------------
    Off-line Data Collection Status . . . . . . . . .  Successfully Completed
    Self Test Execution Status. . . . . . . . . . . .  Successfully Completed
    Total Time To Complete Off-line Data Collection .  15354 seconds
    Execute Off-line Immediate. . . . . . . . . . . .  Supported
    Abort/restart Off-line By Host. . . . . . . . . .  Not supported
    Off-line Read Scanning. . . . . . . . . . . . . .  Supported
    Short Self-test . . . . . . . . . . . . . . . . .  Supported
    Extended Self-test. . . . . . . . . . . . . . . .  Supported
    Conveyance Self-test. . . . . . . . . . . . . . .  Not supported
    Selective Self-Test . . . . . . . . . . . . . . .  Supported
    Save Data Before/After Power Saving Mode. . . . .  Supported
    Enable/Disable Attribute Autosave . . . . . . . .  Supported
    Error Logging Capability. . . . . . . . . . . . .  Supported
    Short Self-test Estimated Time. . . . . . . . . .  1 minutes

   Security Mode
   -------------
    Security Mode . . . . . . . . . . . . . . . . . .  Supported
    Security Erase. . . . . . . . . . . . . . . . . .  Supported
    Security Erase Time . . . . . . . . . . . . . . .  170 minutes
    Security Enhanced Erase Feature . . . . . . . . .  Not supported
    Security Enhanced Erase Time. . . . . . . . . . .  Not supported
    Security Enabled. . . . . . . . . . . . . . . . .  No
    Security Locked . . . . . . . . . . . . . . . . .  No
    Security Frozen . . . . . . . . . . . . . . . . .  Yes
    Security Counter Expired. . . . . . . . . . . . .  No
    Security Level. . . . . . . . . . . . . . . . . .  High

   Serial ATA Features
   -------------------
    S-ATA Compliance. . . . . . . . . . . . . . . . .  Yes
    S-ATA I Signaling Speed (1.5 Gps) . . . . . . . .  Supported
    S-ATA II Signaling Speed (3 Gps). . . . . . . . .  Not supported
    Receipt Of Power Management Requests From Host. .  Supported
    PHY Event Counters. . . . . . . . . . . . . . . .  Supported
    Non-Zero Buffer Offsets In DMA Setup FIS. . . . .  Supported, Disabled
    DMA Setup Auto-Activate Optimization. . . . . . .  Supported, Enabled
    Device Initiating Interface Power Management. . .  Supported, Disabled
    In-Order Data Delivery. . . . . . . . . . . . . .  Supported, Disabled
    Asynchronous Notification . . . . . . . . . . . .  Not supported
    Software Settings Preservation. . . . . . . . . .  Supported, Enabled
    Native Command Queuing (NCQ). . . . . . . . . . .  Supported
    Queue Length. . . . . . . . . . . . . . . . . . .  32

   S.M.A.R.T.
   ----------
No.  Attribute                Thre.. Value  Worst  Data                Status                   Flags                                                  
1    Raw Read Error Rate      16     100    100    000000000000        OK                       Error-Rate, Statistical, Critical
2    Throughput Performance   54     130    130    000000000096        OK                       Performance, Critical
3    Spin Up Time             24     148    148    000701E801C6        OK                       Performance, Statistical, Critical
4    Start/Stop Count         0      100    100    0000000003EC        OK (Always passing)      Event Count, Statistical
5    Reallocated Sectors Co.. 5      100    100    000000000000        OK                       Self Preserving, Event Count, Statistical, Critical
7    Seek Error Rate          67     100    100    000000000000        OK                       Error-Rate, Statistical, Critical
8    Seek Time Performance    20     132    132    000000000021        OK                       Performance, Critical
9    Power On Time Count      0      99     99     000000001D30        OK (Always passing)      Event Count, Statistical
10   Spin Retry Count         60     100    100    000000000000        OK                       Event Count, Statistical, Critical
12   Drive Power Cycle Count  0      100    100    0000000000CF        OK (Always passing)      Self Preserving, Event Count, Statistical
192  Power off Retract Cycl.. 0      100    100    0000000004A3        OK (Always passing)      Self Preserving, Event Count, Statistical
193  Load/Unload Cycle Count  0      100    100    0000000004A3        OK (Always passing)      Event Count, Statistical
194  Disk Temperature         0      206    206    002F000E001D        OK (Always passing)      Statistical
196  Reallocation Event Count 0      100    100    000000000000        OK (Always passing)      Self Preserving, Event Count, Statistical
197  Current Pending Sector.. 0      100    100    000000000000        OK (Always passing)      Self Preserving, Statistical
198  Off-Line Uncorrectable.. 0      100    100    000000000000        OK (Always passing)      Error-Rate
199  Ultra ATA CRC Error Co.. 0      200    200    000000000000        OK (Always passing)      Error-Rate, Statistical

  -- Partition Information --

Logical Drive                           Total Space         Free Space          Free Space               Used Space
    <Partition Drive="/" Total_Space="237,886 MB" Free_Space="180,904 MB" Free_Space_Percent=" 76 %" Disk="/" BlockSize="4096" Files="15482880" FileSystem="61267" />
    <Partition Drive="/media/BA728B3C728AFBFF (Disk #2)" Total_Space="703,766 MB" Free_Space="501,427 MB" Free_Space_Percent=" 71 %" Disk="/dev/sdc2" BlockSize="4096" Files="513756292" FileSystem="1702057286" />

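The "Data" column in the saved report holds each attribute's raw value as 12 hex digits. Assuming that is the usual 48-bit SMART raw value, and that this Hitachi drive packs its temperature as three 16-bit words (maximum, minimum, current, from most to least significant; vendors differ), a few lines of Python reproduce the summary figures: 0x002F000E001D gives 29 °C now and 47 °C maximum, and Power On Time Count 0x1D30 is 7472 hours, or 311 days, 8 hours.

```python
def raw_value(hex_field: str) -> int:
    """Interpret the report's 12-hex-digit 'Data' column as one 48-bit integer."""
    return int(hex_field, 16)


def split_temperature(hex_field: str) -> tuple:
    """Return (current, minimum, maximum) in degrees C.

    Assumes a max | min | current layout of 16-bit words, which is
    vendor-specific and not guaranteed for every drive.
    """
    raw = raw_value(hex_field)
    current = raw & 0xFFFF
    minimum = (raw >> 16) & 0xFFFF
    maximum = (raw >> 32) & 0xFFFF
    return current, minimum, maximum


def hours_to_days(hours: int) -> tuple:
    """Split a power-on hour count into (days, hours)."""
    return divmod(hours, 24)


print(split_temperature("002F000E001D"))         # (29, 14, 47)
print(hours_to_days(raw_value("000000001D30")))  # (311, 8)
```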

 

And yes, it would seem unlikely that a cable issue would suddenly develop just as you ran HDS. As I said, that was just my first guess, since cable/communication problems are fairly common and do suddenly pop up sometimes. Plus it fit the way your issues presented. Is drive 10 in your last syslog the same drive, port, or cabling as the troubled drive/port/cable from your first post? If so, that drive/port/cable is definitely having problems.

 

If Janos does not manage to get on this forum, I suggest you contact him about this. One of the great things about HDS is Janos' support. 


I ran an extended SMART test from the unRAID GUI on drive 10. I'm not sure whether there is a log for that, but the GUI reports no errors:

 

Num  Test Description  Status                   Remaining  LifeTime(hours)  LBA of first error
1    Extended offline  Completed without error  00%        16812            None
2    Short offline     Completed without error  00%        16800            None
3    Short captive     Completed without error  00%        24               None


Post a full SMART report so we can see whether there are pending or reallocated sectors.

There could have been some kind of controller issue, or an invalid ioctl that caused grief for a short period.

root@BIGBOX:~# smartctl -a -A /dev/sdh
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD20EADS-00R6B0
Serial Number:    WD-WCAVY1187400
LU WWN Device Id: 5 0014ee 2ae3ff9f8
Firmware Version: 01.00A01
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sun Mar 15 12:51:26 2015 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (41580) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 473) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   149   147   021    Pre-fail  Always       -       9508
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1637
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       16822
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       481
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       75
193 Load_Cycle_Count        0x0032   194   194   000    Old_age   Always       -       18501
194 Temperature_Celsius     0x0022   121   102   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     16812         -
# 2  Short offline       Completed without error       00%     16800         -
# 3  Short captive       Completed without error       00%        24         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
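A minimal sketch of the check asked for above: parse the attribute table from "smartctl -A" output and flag any nonzero raw value among the reallocation and pending-sector attributes. UDMA_CRC_Error_Count (199) is included because a rising count usually points at cabling or the interface rather than the media; the watch list itself is an assumption, not a smartmontools rule. Only a few rows from the report above are embedded as sample input.

```python
# Attribute IDs worth watching (an assumed selection). 199 tends to track
# cable/interface CRC errors rather than surface defects.
WATCH = {5, 196, 197, 198, 199}


def parse_attributes(text: str) -> dict:
    """Map attribute ID -> (name, raw value) from 'smartctl -A' output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def problems(attrs: dict) -> dict:
    """Return {name: raw} for watched attributes whose raw value is nonzero."""
    return {name: raw for aid, (name, raw) in attrs.items() if aid in WATCH and raw > 0}


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

print(problems(parse_attributes(SAMPLE)))  # {}
```

An empty result matches the "drive looks good" reading of this report; a nonzero 197 or 5 would be the early warning worth acting on.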


Drive looks good.

This is one of those older WD EADS drives. I had a number of them start to show weak sectors as they aged.

I would suggest a periodic routine of SMART long tests so you are informed early.
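One way to act on that suggestion is to compare the newest successful extended test in the self-test log with the current Power_On_Hours. The helper below is hypothetical, and the 720-hour (about 30-day) threshold is an arbitrary example. With the figures from the report above (last extended test at 16812 hours, Power_On_Hours 16822), the last long test ran 10 hours earlier.

```python
import re
from typing import Optional

# Self-test rows look like:
# "# 1  Extended offline    Completed without error       00%     16812         -"
# Columns are separated by two or more spaces.
SELFTEST_ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def last_test_hour(log_text: str, kind: str = "Extended offline") -> Optional[int]:
    """Lifetime hour of the newest successful self-test of the given kind."""
    hours = []
    for line in log_text.splitlines():
        m = SELFTEST_ROW.match(line.strip())
        if m and m.group(2) == kind and m.group(3) == "Completed without error":
            hours.append(int(m.group(5)))
    return max(hours) if hours else None


def needs_long_test(log_text: str, power_on_hours: int, max_age_hours: int = 720) -> bool:
    """True if no successful extended test ran within max_age_hours.

    The log's lifetime field is only 16 bits wide, so this simple
    subtraction assumes the counter has not wrapped past 65535 hours.
    """
    last = last_test_hour(log_text)
    return last is None or power_on_hours - last > max_age_hours


LOG = """\
# 1  Extended offline    Completed without error       00%     16812         -
# 2  Short offline       Completed without error       00%     16800         -
# 3  Short captive       Completed without error       00%        24         -
"""

print(last_test_hour(LOG))          # 16812
print(needs_long_test(LOG, 16822))  # False
```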

 

I would suggest double-checking the cables and making sure everything is tight.

I've had issues in years gone by where cables crept out over time due to vibration.

 

Something happened and the controller and/or drive reset.

 

Given that there are two programs accessing the SMART data, HDSentinel and smartctl, one could have conflicted with the other at a crucial time.

 

There are an awful lot of 'program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO' messages:

 

Mar 13 21:37:48 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:38:03 BIGBOX last message repeated 14 times
...
Mar 13 21:38:11 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:38:56 BIGBOX last message repeated 32 times
Mar 13 21:38:59 BIGBOX last message repeated 9 times
...
Mar 13 21:43:18 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:43:23 BIGBOX last message repeated 46 times
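To put a number on "an awful lot": syslogd collapses identical lines into "last message repeated N times", so a plain grep undercounts. A small sketch that expands those repeats (crediting each repeat line to the most recent real message, which is how syslogd writes them) counts 104 of these smartctl warnings in the original syslog between 21:37:48 and 21:43:23. The s3_sleep lines in the sample are shortened.

```python
import re

REPEAT = re.compile(r"last message repeated (\d+) times")


def count_message(lines, needle):
    """Count a syslog message, expanding syslogd's 'last message repeated N times'.

    A repeat line is credited to the most recent real message, so it only
    counts when that message contained the needle.
    """
    total = 0
    last_matched = False
    for line in lines:
        repeat = REPEAT.search(line)
        if repeat:
            if last_matched:
                total += int(repeat.group(1))
            continue  # a repeat line does not change what the "last message" was
        last_matched = needle in line
        if last_matched:
            total += 1
    return total


# Lines from the syslog above; the s3_sleep entries are shortened.
EXCERPT = """\
Mar 13 21:37:48 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:38:03 BIGBOX last message repeated 14 times
Mar 13 21:38:10 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
Mar 13 21:38:11 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:38:56 BIGBOX last message repeated 32 times
Mar 13 21:38:59 BIGBOX last message repeated 9 times
Mar 13 21:39:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
Mar 13 21:43:18 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:43:23 BIGBOX last message repeated 46 times
""".splitlines()

print(count_message(EXCERPT, "deprecated SCSI ioctl"))  # 104
```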

  • 4 months later...

HDSentinel is a great program. I use it on all 10 of my computers. There is a Pro Windows version, an Enterprise (server) version, a Linux version, a Linux daemon version, a home version, and a free trial version that never expires (it just lacks some of the higher-order functions). The guy who runs the company and develops the program, Janos, is a gem of a human being. He is as valuable as the product itself. Finding nice, helpful technical experts who will actually talk with you is just refreshing. Even back when I was only using the trial version, Janos was more than happy to help me solve problems. He is why I went so strongly with HDS, along with the program itself and its capabilities. And there are others here who feel the same way, as JudyZ's comments will attest.

I liked Janos' HDS program so much that I installed it on my Linux computers too. But installation and use were strictly manual, command-line stuff in Linux. And it overwrites the report each time you run it, unless you manually rename the old report before you run HDS again. So I wrote a couple of simple scripts to help automate the installation, put launchers in a couple of logical places, and give you an automated way to run the program, generate a new report, and save it automatically with the date/time in the filename, thereby building a library of searchable reports.

HDSentinel is a gem; it does a great many basic tasks really well. Plus it helps you avoid catastrophic failures and premature retirement of drives, and it gives you options for reclaiming or salvaging a disk with issues.
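The installer scripts themselves aren't reproduced in this thread. As a rough idea of the "timestamped report" part only, here is a hypothetical Python equivalent. It assumes the HDSentinel Linux binary is on the PATH under the name you pass in, that it accepts -r followed by a report filename, and that it is run as root; check your version's help output before relying on any of that.

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path


def report_path(report_dir: Path, now: datetime) -> Path:
    """Build a filename that embeds the date and time, e.g. hds-2015-03-15_125126.txt."""
    return report_dir / f"hds-{now:%Y-%m-%d_%H%M%S}.txt"


def run_report(binary: str, report_dir: Path) -> Path:
    """Run HDSentinel once and save its report to a new, timestamped file.

    Assumption: the binary accepts '-r <file>' to save the report.
    """
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    report_dir.mkdir(parents=True, exist_ok=True)
    target = report_path(report_dir, datetime.now())
    subprocess.run([binary, "-r", str(target)], check=True)
    return target
```

For example, run_report("HDSentinel", Path("/root/hds-reports")) would write a file such as hds-2015-03-15_125126.txt, so each run adds to the library instead of overwriting the previous report. The binary name and directory here are placeholders.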

 

http://www.hdsentinel.com/  -- general info and links

http://www.hdsentinel.com/add-on-linux-installers.php -- the program bundled with my installers and scripts

 

Now Janos, Judy and I are working on a way to include the HDS daemon on an unRAID USB drive, have it run automatically at startup, and have its output monitored by the Enterprise (server) version of HDS, either in Docker or remotely.

 

So this is old, but I'm wondering whether there has been any traction on this. I'm actually looking for just such a solution. While I love unRAID's SMART reporting, if I could add those drives to a dashboard like HDSentinel, that would be awesome.

  • 2 years later...

Just came across HDSentinel, and realised it's by far the best HD test and monitoring software ever made. It seems criminal that I have to use my Windows box/VM to get all the features! Would love to have it all integrated into unRAID, in a Docker container outputting to a web front end.

 

Would happily pay for that :)

  • 5 years later...
On 3/19/2018 at 8:34 PM, alexdodd said:

Just came across HDSentinel, and realised it's by far the best HD test and monitoring software ever made. It seems criminal that I have to use my Windows box/VM to get all the features! Would love to have it all integrated into unRAID, in a Docker container outputting to a web front end.

 

Would happily pay for that :)

Did you or anyone manage to find out how we can install this via the terminal or a docker container? I would really like to install this.

 

Any and all help/advice is much appreciated.

Thanks

  • 5 months later...
On 10/14/2023 at 2:26 PM, ArxKnight said:

Did you or anyone manage to find out how we can install this via the terminal or a docker container? I would really like to install this.

 

Sure, I've used it for some years... but in 6.12.9 I get a kernel trap for hdsentinel on boot.
