WeeboTech Posted March 13, 2015

Try to access the SMART information with smartctl directly. Perhaps HDSentinel turned off or clobbered the SMART functionality.
marcsayer Posted March 14, 2015

I've sent a link to this thread to Janos and asked him to join the discussion. Hopefully he'll be able to make some time to get on this forum. But I also know he works like a madman in his secret laboratory in Hungary, so it may take him some time to get here.

I'll leave the problem solving to him and the unRAID gurus. But if I were having this issue, I'd immediately suspect a cable problem. I know SATA cable problems will cause all sorts of weird results and transient or progressive issues. I also know cables that look fine, have worked fine for ages, and seem to be well seated can start having issues.

Given that it sounds like unRAID took the drive out of service ("unRAID does not take a disk out of service casually, but if a disk experiences a write failure, it will do exactly that, it will take the disk out of service"), I'd think you had a write problem as well as the HDS detection/data collection problems. Add to that the fact that after removing the drive and connecting it to another computer (and thereby exercising/changing the cables), the problem seems to have gone away (well, other than the fact that the drive was OOS in unRAID, which you were able to correct). All of this points me to a cable issue.

But I may well be completely off the mark here, which is why I asked Janos to chime in.

Normally my HDS reports include a lot more data than I'm seeing with yours, flaggart. Did you edit the reports, or are you not getting any additional data?
flaggart Posted March 14, 2015

The console log is verbatim. With regard to having removed the drive and connected it to another PC, I think this step was totally unnecessary; I was unaware of the procedure to re-enable a disabled device. The drive red-balled at the exact same moment as I ran HDSentinel, so it would be an amazing coincidence if a cable became loose! Data has finished rebuilding and all seems OK. Here is the syslog in case anyone is interested:

Mar 13 21:34:00 BIGBOX in.telnetd[24059]: connect from 192.168.0.10 (192.168.0.10)
Mar 13 21:34:03 BIGBOX login[24060]: ROOT LOGIN on '/dev/pts/0' from 'FRACTAL.home.net'
Mar 13 21:34:53 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
/dev/sde drive state is: active/idle
Mar 13 21:35:54 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
/dev/sde drive state is: active/idle
Mar 13 21:36:58 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
/dev/sdb drive state is: active/idle
/dev/sdc drive state is: active/idle
/dev/sdd drive state is: active/idle
/dev/sde drive state is: active/idle
Mar 13 21:37:17 BIGBOX kernel: sd 2:0:1:0: command f7250600 timed out
Mar 13 21:37:17 BIGBOX kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Mar 13 21:37:17 BIGBOX kernel: sas: trying to find task 0xe2b1d900
Mar 13 21:37:17 BIGBOX kernel: sas: sas_scsi_find_task: aborting task 0xe2b1d900
Mar 13 21:37:17 BIGBOX kernel: sas: sas_scsi_find_task: task 0xe2b1d900 is aborted
Mar 13 21:37:17 BIGBOX kernel: sas: sas_eh_handle_sas_errors: task 0xe2b1d900 is aborted
Mar 13 21:37:17 BIGBOX kernel: sas: ata8: end_device-2:1: cmd error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata7: end_device-2:0: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata8: end_device-2:1: dev error handler
Mar 13 21:37:17 BIGBOX kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 13 21:37:17 BIGBOX kernel: ata8.00: failed command: SMART
Mar 13 21:37:17 BIGBOX kernel: ata8.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 0
Mar 13 21:37:17 BIGBOX kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 13 21:37:17 BIGBOX kernel: ata8.00: status: { DRDY }
Mar 13 21:37:17 BIGBOX kernel: ata8: hard resetting link
Mar 13 21:37:17 BIGBOX kernel: sas: ata9: end_device-2:2: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata10: end_device-2:3: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata11: end_device-2:4: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata12: end_device-2:5: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata13: end_device-2:6: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: ata14: end_device-2:7: dev error handler
Mar 13 21:37:17 BIGBOX kernel: sas: sas_form_port: phy1 belongs to port1 already(1)!
Mar 13 21:37:19 BIGBOX kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[1]:rc= 0
Mar 13 21:37:25 BIGBOX kernel: ata8.00: qc timeout (cmd 0x27)
Mar 13 21:37:25 BIGBOX kernel: ata8.00: failed to read native max address (err_mask=0x4)
Mar 13 21:37:25 BIGBOX kernel: ata8.00: HPA support seems broken, skipping HPA handling
Mar 13 21:37:25 BIGBOX kernel: ata8.00: revalidation failed (errno=-5)
Mar 13 21:37:25 BIGBOX kernel: ata8: hard resetting link
Mar 13 21:37:25 BIGBOX kernel: sas: sas_form_port: phy1 belongs to port1 already(1)!
Mar 13 21:37:27 BIGBOX kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[1]:rc= 0
Mar 13 21:37:32 BIGBOX kernel: ata8.00: qc timeout (cmd 0xef)
Mar 13 21:37:32 BIGBOX kernel: ata8.00: failed to set xfermode (err_mask=0x4)
Mar 13 21:37:32 BIGBOX kernel: ata8.00: limiting speed to UDMA/133:PIO3
Mar 13 21:37:32 BIGBOX kernel: ata8: hard resetting link
Mar 13 21:37:32 BIGBOX kernel: sas: sas_form_port: phy1 belongs to port1 already(1)!
Mar 13 21:37:34 BIGBOX kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[1]:rc= 0
Mar 13 21:37:44 BIGBOX kernel: ata8.00: qc timeout (cmd 0xef)
Mar 13 21:37:44 BIGBOX kernel: ata8.00: failed to set xfermode (err_mask=0x4)
Mar 13 21:37:44 BIGBOX kernel: ata8.00: disabled
Mar 13 21:37:44 BIGBOX kernel: ata8: hard resetting link
Mar 13 21:37:45 BIGBOX kernel: sas: sas_form_port: phy1 belongs to port1 already(1)!
Mar 13 21:37:47 BIGBOX kernel: drivers/scsi/mvsas/mv_sas.c 1527:mvs_I_T_nexus_reset for device[1]:rc= 0
Mar 13 21:37:47 BIGBOX kernel: ata8: EH complete
Mar 13 21:37:47 BIGBOX kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] READ CAPACITY(16) failed
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh]
Mar 13 21:37:47 BIGBOX kernel: Result: hostbyte=0x04 driverbyte=0x00
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] Sense not available.
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] READ CAPACITY failed
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh]
Mar 13 21:37:47 BIGBOX kernel: Result: hostbyte=0x04 driverbyte=0x00
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] Sense not available.
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] Truncating mode parameter data from 3330 to 512 bytes
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] Got wrong page
Mar 13 21:37:47 BIGBOX kernel: sd 2:0:1:0: [sdh] Assuming drive cache: write through
Mar 13 21:37:47 BIGBOX kernel: sdh: detected capacity change from 2000398934016 to 0
Mar 13 21:37:48 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:38:03 BIGBOX last message repeated 14 times
Mar 13 21:38:10 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
/dev/sdb drive state is: active/idle
/dev/sdc drive state is: active/idle
/dev/sdd drive state is: active/idle
/dev/sde drive state is: active/idle
/dev/sdg drive state is: active/idle
/dev/sdi drive state is: active/idle
/dev/sdj drive state is: active/idle
/dev/sdk drive state is: active/idle
Mar 13 21:38:11 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:38:56 BIGBOX last message repeated 32 times
Mar 13 21:38:59 BIGBOX last message repeated 9 times
Mar 13 21:39:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
/dev/sdb drive state is: active/idle
/dev/sdc drive state is: active/idle
/dev/sdd drive state is: active/idle
/dev/sde drive state is: active/idle
/dev/sdg drive state is: active/idle
/dev/sdi drive state is: active/idle
/dev/sdj drive state is: active/idle
/dev/sdk drive state is: active/idle
/dev/sdl drive state is: active/idle
/dev/sdm drive state is: active/idle
/dev/sdn drive state is: active/idle
Mar 13 21:40:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
/dev/sdb drive state is: active/idle
/dev/sdc drive state is: active/idle
/dev/sdd drive state is: active/idle
/dev/sde drive state is: active/idle
/dev/sdg drive state is: active/idle
/dev/sdi drive state is: active/idle
/dev/sdj drive state is: active/idle
/dev/sdk drive state is: active/idle
/dev/sdl drive state is: active/idle
/dev/sdm drive state is: active/idle
/dev/sdn drive state is: active/idle
Mar 13 21:41:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
/dev/sdb drive state is: active/idle
/dev/sdc drive state is: active/idle
/dev/sdd drive state is: active/idle
/dev/sde drive state is: active/idle
/dev/sdg drive state is: active/idle
/dev/sdi drive state is: active/idle
/dev/sdj drive state is: active/idle
/dev/sdk drive state is: active/idle
/dev/sdl drive state is: active/idle
/dev/sdm drive state is: active/idle
/dev/sdn drive state is: active/idle
Mar 13 21:42:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
/dev/sdb drive state is: active/idle
/dev/sdc drive state is: active/idle
/dev/sdd drive state is: active/idle
/dev/sde drive state is: active/idle
/dev/sdg drive state is: active/idle
/dev/sdi drive state is: active/idle
/dev/sdj drive state is: active/idle
/dev/sdk drive state is: active/idle
/dev/sdl drive state is: active/idle
/dev/sdm drive state is: active/idle
/dev/sdn drive state is: active/idle
Mar 13 21:43:11 BIGBOX s3_sleep: Disk activity detected. Reset all counters.
/dev/sdb drive state is: active/idle
/dev/sdc drive state is: active/idle
/dev/sdd drive state is: active/idle
/dev/sde drive state is: active/idle
/dev/sdg drive state is: active/idle
/dev/sdi drive state is: active/idle
/dev/sdj drive state is: active/idle
/dev/sdk drive state is: active/idle
/dev/sdl drive state is: active/idle
/dev/sdm drive state is: active/idle
/dev/sdn drive state is: active/idle
Mar 13 21:43:18 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:43:23 BIGBOX last message repeated 46 times
Mar 13 21:43:38 BIGBOX kernel: md: disk10 read error, sector=1871085816
Mar 13 21:43:39 BIGBOX kernel: md: disk10 write error, sector=1871085816
Mar 13 21:43:39 BIGBOX kernel: md: recovery thread woken up ...
Mar 13 21:43:39 BIGBOX kernel: md: recovery thread has nothing to resync
Mar 13 21:44:06 BIGBOX emhttp: shcmd (69): /usr/local/sbin/emhttp_event stopping_svcs
Mar 13 21:44:06 BIGBOX kernel: mdcmd (88): nocheck
Mar 13 21:44:06 BIGBOX kernel: md: nocheck_array: check not active
WeeboTech Posted March 14, 2015

Mar 13 21:43:38 BIGBOX kernel: md: disk10 read error, sector=1871085816
Mar 13 21:43:39 BIGBOX kernel: md: disk10 write error, sector=1871085816

Do a SMART long test on this drive and post a full SMART log (elsewhere on the forum). This drive is having issues.
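The md read and write errors quoted above are easy to miss in a long syslog. As a rough sketch only, assuming the log lines look exactly like the ones pasted in this thread, a few lines of Python can pull them out. The function name `find_md_errors` is made up for illustration and is not part of unRAID or any tool mentioned here.

```python
import re

# Matches md driver error lines in the format seen in this thread, e.g.
#   "Mar 13 21:43:38 BIGBOX kernel: md: disk10 read error, sector=1871085816"
MD_ERROR = re.compile(
    r"kernel: md: disk(?P<disk>\d+) (?P<kind>read|write) error, sector=(?P<sector>\d+)"
)


def find_md_errors(lines):
    """Return (disk number, 'read' or 'write', sector) for every md error line."""
    errors = []
    for line in lines:
        match = MD_ERROR.search(line)
        if match:
            errors.append((int(match["disk"]), match["kind"], int(match["sector"])))
    return errors


# Sample lines copied from the syslog posted earlier in the thread.
sample = [
    "Mar 13 21:43:23 BIGBOX last message repeated 46 times",
    "Mar 13 21:43:38 BIGBOX kernel: md: disk10 read error, sector=1871085816",
    "Mar 13 21:43:39 BIGBOX kernel: md: disk10 write error, sector=1871085816",
    "Mar 13 21:43:39 BIGBOX kernel: md: recovery thread woken up ...",
]

for disk, kind, sector in find_md_errors(sample):
    print(f"disk{disk}: {kind} error at sector {sector}")
```

To scan a whole log file, pass an open file handle instead of the sample list, since the function only needs an iterable of lines.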
marcsayer Posted March 15, 2015

Okay, I believe the HDS report you posted is what one sees in the terminal window after HDS has finished. It is not the report it would generate and save if you started HDS with the -r switch. The report you're posting is a basic report. The saved report includes a lot more data. Here is a saved report from my old Linux machine so you'll see what I mean (I've only included 1 drive because it was so long).

-- General Information --

Application Information
-----------------------
Installed Version . . . . . . . . . . . . . . . . Hard Disk Sentinel 0.08
Current Date And Time . . . . . . . . . . . . . . 20-1-14 20:05:15

Computer Information
--------------------
Computer Name . . . . . . . . . . . . . . . . . . zorin7-tower
MAC Address . . . . . . . . . . . . . . . . . . . 90:E6:BA:CD:F9:CD

System Information
------------------
OS Version. . . . . . . . . . . . . . . . . . . . Linux : 3.8.0-35-generic (#50-Ubuntu SMP Tue Dec 3 01:24:59 UTC 2013)
Process ID. . . . . . . . . . . . . . . . . . . . 6726
Uptime. . . . . . . . . . . . . . . . . . . . . . 162183 sec (1 days, 21 hours, 3 min, 3 sec)

-- Physical Disk Information - Disk: #0: Hitachi HUA721010KLA330 --

Hard Disk Summary
-----------------
Hard Disk Number. . . . . . . . . . . . . . . . . 0
Hard Disk Device. . . . . . . . . . . . . . . . . /dev/sda
Interface . . . . . . . . . . . . . . . . . . . . S-ATA
Hard Disk Model ID. . . . . . . . . . . . . . . . Hitachi HUA721010KLA330
Hard Disk Revision. . . . . . . . . . . . . . . . GKAOAB0A
Hard Disk Serial Number . . . . . . . . . . . . . GTF002PBJDDAPF
Hard Disk Total Size. . . . . . . . . . . . . . . 953870 MB
Current Temperature . . . . . . . . . . . . . . . 29 °C (84 °F)
Maximum Temperature (during Entire Lifespan). . . 47 °C (117 °F)
Power On Time . . . . . . . . . . . . . . . . . . 311 days, 8 hours
Estimated Remaining Lifetime. . . . . . . . . . . more than 1000 days
Health. . . . . . . . . . . . . . . . . . . . . . #################### 100 % (Excellent)
Performance . . . . . . . . . . . . . . . . . . . #################### 100 % (Excellent)

The hard disk status is PERFECT. Problematic or weak sectors not found and there are no spin up or data transfer errors.
No actions needed.

ATA Information
---------------
Hard Disk Cylinders . . . . . . . . . . . . . . . 1938021
Hard Disk Heads . . . . . . . . . . . . . . . . . 16
Hard Disk Sectors . . . . . . . . . . . . . . . . 63
Total Sectors . . . . . . . . . . . . . . . . . . 1953525168
ATA Revision. . . . . . . . . . . . . . . . . . . 7
Bytes Per Sector. . . . . . . . . . . . . . . . . 512
Buffer Size . . . . . . . . . . . . . . . . . . . 31157 KB
Multiple Sectors. . . . . . . . . . . . . . . . . 16
Error Correction Bytes. . . . . . . . . . . . . . 52
Unformatted Capacity. . . . . . . . . . . . . . . 953870 MB
Maximum PIO Mode. . . . . . . . . . . . . . . . . 4
Maximum Multiword DMA Mode. . . . . . . . . . . . 2
Maximum UDMA Mode . . . . . . . . . . . . . . . . 150 MB/s (6)
Active UDMA Mode. . . . . . . . . . . . . . . . . 150 MB/s (6)
Minimum Multiword DMA Transfer Time . . . . . . . 120 ns
Recommended Multiword DMA Transfer Time . . . . . 120 ns
Minimum PIO Transfer Time Without IORDY . . . . . 120 ns
Minimum PIO Transfer Time With IORDY. . . . . . . 120 ns
ATA Control Byte. . . . . . . . . . . . . . . . . Valid
ATA Checksum Value. . . . . . . . . . . . . . . . Valid

Acoustic Management Configuration
---------------------------------
Acoustic Management . . . . . . . . . . . . . . . Supported
Acoustic Management . . . . . . . . . . . . . . . Enabled
Current Acoustic Level. . . . . . . . . . . . . . Max performance and volume (FEh)
Recommended Acoustic Level. . . . . . . . . . . . Min performance and volume (80h)

EIDE Properties
---------------
Read Ahead Buffer . . . . . . . . . . . . . . . . Supported
DMA . . . . . . . . . . . . . . . . . . . . . . . Supported
Ultra DMA . . . . . . . . . . . . . . . . . . . . Supported
S.M.A.R.T.. . . . . . . . . . . . . . . . . . . . Supported
Power Management. . . . . . . . . . . . . . . . . Supported
Write Cache . . . . . . . . . . . . . . . . . . . Supported
Host Protected Area . . . . . . . . . . . . . . . Supported
Advanced Power Management . . . . . . . . . . . . Supported
Power Up In Standby . . . . . . . . . . . . . . . Supported
48-bit LBA Addressing . . . . . . . . . . . . . . Supported
Device Configuration Overlay. . . . . . . . . . . Supported
IORDY Support . . . . . . . . . . . . . . . . . . Supported
Read/Write DMA Queue. . . . . . . . . . . . . . . Not supported
NOP Command . . . . . . . . . . . . . . . . . . . Not supported
Trusted Computing . . . . . . . . . . . . . . . . Not supported
64-bit World Wide ID. . . . . . . . . . . . . . . 0050A2CCE11ECDD1
Streaming . . . . . . . . . . . . . . . . . . . . Supported
Media Card Pass Through . . . . . . . . . . . . . Not supported
General Purpose Logging . . . . . . . . . . . . . Supported
Error Logging . . . . . . . . . . . . . . . . . . Supported
CFA Feature Set . . . . . . . . . . . . . . . . . Not supported
Long Physical Sectors (1) . . . . . . . . . . . . Not supported
Long Logical Sectors. . . . . . . . . . . . . . . Not supported
Write-Read-Verify . . . . . . . . . . . . . . . . Not supported
NV Cache Feature. . . . . . . . . . . . . . . . . Not supported
NV Cache Power Mode . . . . . . . . . . . . . . . Not supported
NV Cache Size . . . . . . . . . . . . . . . . . . Not supported
Free-fall Control . . . . . . . . . . . . . . . . Not supported
Free-fall Control Sensitivity . . . . . . . . . . Not supported

SSD Features
------------
Data Set Management . . . . . . . . . . . . . . . Not supported
TRIM Command. . . . . . . . . . . . . . . . . . . Not supported
Deterministic Read After TRIM . . . . . . . . . . Not supported

S.M.A.R.T. Details
------------------
Off-line Data Collection Status . . . . . . . . . Successfully Completed
Self Test Execution Status. . . . . . . . . . . . Successfully Completed
Total Time To Complete Off-line Data Collection . 15354 seconds
Execute Off-line Immediate. . . . . . . . . . . . Supported
Abort/restart Off-line By Host. . . . . . . . . . Not supported
Off-line Read Scanning. . . . . . . . . . . . . . Supported
Short Self-test . . . . . . . . . . . . . . . . . Supported
Extended Self-test. . . . . . . . . . . . . . . . Supported
Conveyance Self-test. . . . . . . . . . . . . . . Not supported
Selective Self-Test . . . . . . . . . . . . . . . Supported
Save Data Before/After Power Saving Mode. . . . . Supported
Enable/Disable Attribute Autosave . . . . . . . . Supported
Error Logging Capability. . . . . . . . . . . . . Supported
Short Self-test Estimated Time. . . . . . . . . . 1 minutes

Security Mode
-------------
Security Mode . . . . . . . . . . . . . . . . . . Supported
Security Erase. . . . . . . . . . . . . . . . . . Supported
Security Erase Time . . . . . . . . . . . . . . . 170 minutes
Security Enhanced Erase Feature . . . . . . . . . Not supported
Security Enhanced Erase Time. . . . . . . . . . . Not supported
Security Enabled. . . . . . . . . . . . . . . . . No
Security Locked . . . . . . . . . . . . . . . . . No
Security Frozen . . . . . . . . . . . . . . . . . Yes
Security Counter Expired. . . . . . . . . . . . . No
Security Level. . . . . . . . . . . . . . . . . . High

Serial ATA Features
-------------------
S-ATA Compliance. . . . . . . . . . . . . . . . . Yes
S-ATA I Signaling Speed (1.5 Gps) . . . . . . . . Supported
S-ATA II Signaling Speed (3 Gps). . . . . . . . . Not supported
Receipt Of Power Management Requests From Host. . Supported
PHY Event Counters. . . . . . . . . . . . . . . . Supported
Non-Zero Buffer Offsets In DMA Setup FIS. . . . . Supported, Disabled
DMA Setup Auto-Activate Optimization. . . . . . . Supported, Enabled
Device Initiating Interface Power Management. . . Supported, Disabled
In-Order Data Delivery. . . . . . . . . . . . . . Supported, Disabled
Asynchronous Notification . . . . . . . . . . . . Not supported
Software Settings Preservation. . . . . . . . . . Supported, Enabled
Native Command Queuing (NCQ). . . . . . . . . . . Supported
Queue Length. . . . . . . . . . . . . . . . . . . 32

S.M.A.R.T.
----------
No.  Attribute                 Thre..  Value  Worst  Data          Status               Flags
1    Raw Read Error Rate       16      100    100    000000000000  OK                   Error-Rate, Statistical, Critical
2    Throughput Performance    54      130    130    000000000096  OK                   Performance, Critical
3    Spin Up Time              24      148    148    000701E801C6  OK                   Performance, Statistical, Critical
4    Start/Stop Count          0       100    100    0000000003EC  OK (Always passing)  Event Count, Statistical
5    Reallocated Sectors Co..  5       100    100    000000000000  OK                   Self Preserving, Event Count, Statistical, Critical
7    Seek Error Rate           67      100    100    000000000000  OK                   Error-Rate, Statistical, Critical
8    Seek Time Performance     20      132    132    000000000021  OK                   Performance, Critical
9    Power On Time Count       0       99     99     000000001D30  OK (Always passing)  Event Count, Statistical
10   Spin Retry Count          60      100    100    000000000000  OK                   Event Count, Statistical, Critical
12   Drive Power Cycle Count   0       100    100    0000000000CF  OK (Always passing)  Self Preserving, Event Count, Statistical
192  Power off Retract Cycl..  0       100    100    0000000004A3  OK (Always passing)  Self Preserving, Event Count, Statistical
193  Load/Unload Cycle Count   0       100    100    0000000004A3  OK (Always passing)  Event Count, Statistical
194  Disk Temperature          0       206    206    002F000E001D  OK (Always passing)  Statistical
196  Reallocation Event Count  0       100    100    000000000000  OK (Always passing)  Self Preserving, Event Count, Statistical
197  Current Pending Sector..  0       100    100    000000000000  OK (Always passing)  Self Preserving, Statistical
198  Off-Line Uncorrectable..  0       100    100    000000000000  OK (Always passing)  Error-Rate
199  Ultra ATA CRC Error Co..  0       200    200    000000000000  OK (Always passing)  Error-Rate, Statistical

-- Partition Information --
Logical Drive Total Space Free Space Free Space Used Space
<Partition Drive="/" Total_Space="237,886 MB" Free_Space="180,904 MB" Free_Space_Percent=" 76 %" Disk="/" BlockSize="4096" Files="15482880" FileSystem="61267" />
<Partition Drive="/media/BA728B3C728AFBFF (Disk #2)" Total_Space="703,766 MB" Free_Space="501,427 MB" Free_Space_Percent=" 71 %" Disk="/dev/sdc2" BlockSize="4096" Files="513756292" FileSystem="1702057286" />

And yes, it would seem unlikely that a cable issue would suddenly develop just as you ran HDS. As I said, that would just be my first guess, as cable/communication problems are fairly common and do suddenly pop up sometimes. Plus it did fit the parameters of how your issues presented.

Is drive 10 in your last syslog the same drive, port, or cabling as the troubled drive/port/cable from your first post? If so, that drive/port/cable is definitely having problems.

If Janos does not manage to get on this forum, I suggest you contact him about this. One of the great things about HDS is Janos' support.
flaggart Posted March 15, 2015

I ran an extended SMART test from the unRAID GUI on drive 10. Not sure if there is a log for that, but the GUI reports no errors:

Num  Test Description  Status                   Remaining  LifeTime(hours)  LBA of first error
1    Extended offline  Completed without error  00%        16812            None
2    Short offline     Completed without error  00%        16800            None
3    Short captive     Completed without error  00%        24               None
WeeboTech Posted March 15, 2015

flaggart said:
I ran an extended SMART test from unRAID gui on drive 10. Not sure if there is a log for that, but the GUI reports no errors:

Num  Test Description  Status                   Remaining  LifeTime(hours)  LBA of first error
1    Extended offline  Completed without error  00%        16812            None
2    Short offline     Completed without error  00%        16800            None
3    Short captive     Completed without error  00%        24               None

Post a full SMART report; let's see if there are pending or reallocated sectors. There could have been some kind of controller issue, such as an invalid ioctl, that caused grief for a short period.
flaggart Posted March 15, 2015

root@BIGBOX:~# smartctl -a -A /dev/sdh
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD20EADS-00R6B0
Serial Number:    WD-WCAVY1187400
LU WWN Device Id: 5 0014ee 2ae3ff9f8
Firmware Version: 01.00A01
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sun Mar 15 12:51:26 2015 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (41580) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 473) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   149   147   021    Pre-fail  Always       -       9508
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1637
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       16822
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       481
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       75
193 Load_Cycle_Count        0x0032   194   194   000    Old_age   Always       -       18501
194 Temperature_Celsius     0x0022   121   102   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     16812         -
# 2  Short offline       Completed without error       00%     16800         -
# 3  Short captive       Completed without error       00%        24         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
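The earlier request was to check this report for pending or reallocated sectors. As a hedged sketch, assuming the ten-column attribute table layout shown in the report above, the raw values of the usual suspects (attributes 5, 197, 198 and 199) can be checked with a small parser. The names `parse_attributes` and `nonzero_watched` are invented for this example, and the choice of watched IDs is a common rule of thumb rather than anything smartctl itself defines.

```python
# Attribute IDs that usually point at failing media (5, 197, 198) or at
# cabling/link trouble (199). This watch list is an assumption for the sketch.
WATCHED = (5, 197, 198, 199)


def parse_attributes(text):
    """Map attribute ID -> (name, raw value string) from smartctl attribute rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], fields[9])
    return attrs


def nonzero_watched(attrs):
    """Return {id: raw value} for watched attributes whose raw value is above zero."""
    flagged = {}
    for attr_id in WATCHED:
        if attr_id in attrs:
            raw = attrs[attr_id][1]
            if raw.isdigit() and int(raw) > 0:
                flagged[attr_id] = int(raw)
    return flagged


# Rows copied from the report posted above.
report = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
9 Power_On_Hours 0x0032 077 077 000 Old_age Always - 16822
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
"""

flagged = nonzero_watched(parse_attributes(report))
print("watched attributes above zero:", flagged)
```

For the drive in this thread all four watched raw values are zero, so the flagged set comes back empty, which matches a clean report.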
c3 Posted March 15, 2015

Some controllers (like 1064, 1068) are notorious for this kind of behavior: a collision between disk load and SMART queries results in the drive disappearing and returning.

This gives an example to reproduce the problem:
http://lists.us.dell.com/pipermail/linux-poweredge/2009-November/040453.html

What it looks like to recover with mdadm:
http://sgros.blogspot.com/2011/11/readding-sata-disk-to-software-raid.html
WeeboTech Posted March 15, 2015

Drive looks good. This is one of those older WD EADS drives. I had a number of them start to show weak sectors as they aged. I would suggest a periodic routine of SMART long tests so you are informed early.

I would also suggest double-checking the cables to make sure everything is tight. I've had issues in years gone by where they would creep out over time due to vibrations.

Something happened and the controller and/or drive reset. Given that there are two programs accessing the SMART data, HDSentinel and smartctl, one could have conflicted with the other at a crucial time.

There's an awful lot of 'program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO':

Mar 13 21:37:48 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:38:03 BIGBOX last message repeated 14 times
...
Mar 13 21:38:11 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:38:56 BIGBOX last message repeated 32 times
Mar 13 21:38:59 BIGBOX last message repeated 9 times
...
Mar 13 21:43:18 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar 13 21:43:23 BIGBOX last message repeated 46 times
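To put a number on "an awful lot", the "last message repeated N times" lines have to be added to the messages they follow. A minimal sketch, assuming the syslog format quoted above and using a made-up helper name `count_occurrences`:

```python
import re

REPEAT = re.compile(r"last message repeated (\d+) times")


def count_occurrences(lines, needle):
    """Count lines containing `needle`, including syslog 'repeated N times' follow-ups."""
    total = 0
    previous_matched = False
    for line in lines:
        repeat = REPEAT.search(line)
        if repeat:
            # A repeat line refers to whatever message was logged just before it.
            if previous_matched:
                total += int(repeat.group(1))
            continue
        previous_matched = needle in line
        if previous_matched:
            total += 1
    return total


# The smartctl-related lines quoted in the post above.
log = [
    "Mar 13 21:37:48 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO",
    "Mar 13 21:38:03 BIGBOX last message repeated 14 times",
    "Mar 13 21:38:11 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO",
    "Mar 13 21:38:56 BIGBOX last message repeated 32 times",
    "Mar 13 21:38:59 BIGBOX last message repeated 9 times",
    "Mar 13 21:43:18 BIGBOX kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO",
    "Mar 13 21:43:23 BIGBOX last message repeated 46 times",
]

print(count_occurrences(log, "deprecated SCSI ioctl"))  # 104
```

That is 3 logged messages plus 101 repeats, or 104 deprecated-ioctl calls in under six minutes.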
vanstinator Posted August 10, 2015

HDSentinel is a great program. I use it in all 10 of my computers. There is a Pro Windows version, an Enterprise version (server), a Linux version, a Linux daemon version, a home version, and a free trial version that never expires (it just lacks some of the higher order functions).

The guy who runs the company and develops the program, Janos, is a gem of a human being. He is as valuable as the product itself. Finding nice, helpful technical experts who will actually talk with you is just refreshing. Even back when I was only using the trial version, Janos was more than happy to help me problem solve. He is why I went so strongly with HDS, that and the program itself and its capabilities. And there are others here who feel the same way, as JudyZ's comments will attest.

I liked Janos' HDS program so much that I installed it on my Linux computers too. But installation and use was strictly manual, command-line stuff in Linux. And it overwrites the report each time you run it, unless you manually rename the old report before you run HDS again. So I wrote a couple of simple scripts to help automate the installation, put launchers in a couple of logical places, and give you an automated way to run the program, generate a new report, and save it automatically with the date/time in the filename, thereby building a library of searchable reports.

HDSentinel is a gem; it does a great many basic tasks really well. Plus it helps you avoid catastrophic failures and premature retirements of drives, and gives you options for reclaiming or salvaging a disk with issues.

http://www.hdsentinel.com/ -- general info and links
http://www.hdsentinel.com/add-on-linux-installers.php -- the program bundled with my installers and scripts

Now Janos, Judy and I are working on a way to include the HDS Daemon on an unRAID USB drive, have it run automatically at startup, and have its output be monitored by the Enterprise (server) version of HDS either in Docker or remotely.

So this is old, but I'm wondering if there was any traction on this? I'm actually just looking for a solution. While I love the unRAID SMART support, if I could add those drives to a dashboard like HDSentinel that would be awesome.
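The post above describes scripts that keep each report by putting the date/time in the filename instead of letting the next run overwrite it. Those scripts are not shown in this thread, so the following is only a sketch of the renaming step, not the actual installer code. The report name `report.txt` and the function name `archive_report` are assumptions for illustration.

```python
from datetime import datetime
from pathlib import Path
import tempfile


def archive_report(report_path, now=None):
    """Rename an existing report to <stem>-YYYYmmdd-HHMMSS<suffix>; return the new path."""
    report = Path(report_path)
    if not report.exists():
        return None  # nothing to archive yet
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = report.with_name(f"{report.stem}-{stamp}{report.suffix}")
    report.rename(target)
    return target


# Demonstration in a throwaway directory with a placeholder report file.
with tempfile.TemporaryDirectory() as workdir:
    demo = Path(workdir) / "report.txt"  # hypothetical report filename
    demo.write_text("placeholder report\n")
    archived = archive_report(demo, now=datetime(2015, 3, 13, 21, 37, 17))
    print(archived.name)  # report-20150313-213717.txt
```

Calling this before each new run would leave a growing set of dated reports that can be searched later, which is the behavior the post describes.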
alexdodd Posted March 19, 2018

Just fell across HDSentinel, and realised it's by far the best HD test and monitoring software ever made. It seems criminal that I have to use my Windows box/VM to get all the features! Would love to have it all integrated into unRAID, in a Docker container outputting to a web front end. Would happily pay for that.
ArxKnight Posted October 14, 2023

On 3/19/2018 at 8:34 PM, alexdodd said:
Just fell across HDSentinel, and realised it's by far the best HD test and monitoring software ever made. It seems criminal that I have to use my Windows box/VM to get all the features! Would love to have it all integrated into unRAID, in a Docker container outputting to a web front end. Would happily pay for that.

Did you or anyone manage to find out how we can install this via the terminal or a Docker container? I would really like to install this. Any and all help/advice is much appreciated. Thanks
Masterwishx Posted March 28

On 10/14/2023 at 2:26 PM, ArxKnight said:
Did you or anyone manage to find out how we can install this via the terminal or a Docker container? I would really like to install this.

Sure, I've used it for some years... but in 6.12.9 I get a kernel trap for hdsentinel on boot.