December 23, 2013
Hi all,

I've been running a 16-drive array (15 data + parity) for almost a year without any issues. I've just hit my first problem and am hoping for some guidance so I don't lose any data (or at least minimize the loss).

I'm running unRAID 5.0.4 on a Supermicro X9SCM-F-O under ESXi, with two M1015s, both flashed to P15. I was watching a movie when the stream froze, so I checked dmesg on unRAID and found:

sd 2:0:2:0: [sdl] CDB: cdb[0]=0x88: 88 00 00 00 00 01 3c 46 29 c0 00 00 04 00 00 00
scsi target2:0:2: handle(0x000b), sas_address(0x4433221102000000), phy(2)
scsi target2:0:2: enclosure_logical_id(0x500605b0022a1530), slot(1)
sd 2:0:2:0: task abort: SUCCESS scmd(db84c0c0)
sd 2:0:2:0: attempting task abort! scmd(db84c0c0)
sd 2:0:2:0: [sdl] CDB: cdb[0]=0x88: 88 00 00 00 00 01 3c 46 29 c0 00 00 04 00 00 00
scsi target2:0:2: handle(0x000b), sas_address(0x4433221102000000), phy(2)
scsi target2:0:2: enclosure_logical_id(0x500605b0022a1530), slot(1)
sd 2:0:2:0: task abort: SUCCESS scmd(db84c0c0)
sd 2:0:2:0: attempting task abort! scmd(db84c0c0)
sd 2:0:2:0: [sdl] CDB: cdb[0]=0x88: 88 00 00 00 00 01 3c 46 29 c0 00 00 04 00 00 00
scsi target2:0:2: handle(0x000b), sas_address(0x4433221102000000), phy(2)
scsi target2:0:2: enclosure_logical_id(0x500605b0022a1530), slot(1)
sd 2:0:2:0: task abort: SUCCESS scmd(db84c0c0)
sd 2:0:2:0: attempting task abort! scmd(db84c0c0)
sd 2:0:2:0: [sdl] CDB: cdb[0]=0x88: 88 00 00 00 00 01 3c 46 29 c0 00 00 04 00 00 00
scsi target2:0:2: handle(0x000b), sas_address(0x4433221102000000), phy(2)
scsi target2:0:2: enclosure_logical_id(0x500605b0022a1530), slot(1)
sd 2:0:2:0: task abort: SUCCESS scmd(db84c0c0)

I cleanly shut down and rebooted, and a parity check kicked off. It ran along happily at ~90 MB/sec until it hit ~77%, then the same errors started appearing on the same drive. The check then slowed to tens of KB/sec, and both the SimpleFeatures UI on port 80 and the stock UI on port 8080 became _very_ unresponsive.
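One way to read the repeated errors above: the CDB line is the SCSI command that timed out. Opcode 0x88 is READ(16), whose bytes 2-9 hold the starting LBA and bytes 10-13 the number of blocks to read. The Python sketch below decodes that CDB and counts abort attempts per device. It is only a sketch: the regexes are assumptions fitted to these exact log lines, the log excerpt is abbreviated, and the byte offsets assume 512-byte logical sectors.

```python
import re
from collections import Counter

# Abbreviated excerpt of the dmesg output quoted above (the "scsi target"
# handle/enclosure lines are dropped because this sketch does not use them).
DMESG = """\
sd 2:0:2:0: [sdl] CDB: cdb[0]=0x88: 88 00 00 00 00 01 3c 46 29 c0 00 00 04 00 00 00
sd 2:0:2:0: task abort: SUCCESS scmd(db84c0c0)
sd 2:0:2:0: attempting task abort! scmd(db84c0c0)
sd 2:0:2:0: [sdl] CDB: cdb[0]=0x88: 88 00 00 00 00 01 3c 46 29 c0 00 00 04 00 00 00
sd 2:0:2:0: task abort: SUCCESS scmd(db84c0c0)
"""

# Regexes fitted to the exact line shapes above; other kernels may differ.
ABORT_RE = re.compile(r"^sd (\d+:\d+:\d+:\d+): attempting task abort!")
CDB_RE = re.compile(r"\[(sd[a-z]+)\] CDB: cdb\[0\]=0x[0-9a-f]+: ((?:[0-9a-f]{2} ?)+)$")


def count_aborts(text: str) -> Counter:
    """Count 'attempting task abort!' lines per SCSI address (host:channel:id:lun)."""
    counts: Counter = Counter()
    for line in text.splitlines():
        m = ABORT_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def decode_read16(cdb_hex: str) -> tuple:
    """Return (starting LBA, transfer length in blocks) for a READ(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


if __name__ == "__main__":
    print(dict(count_aborts(DMESG)))
    for line in DMESG.splitlines():
        m = CDB_RE.search(line)
        if m:
            lba, blocks = decode_read16(m.group(2))
            # Byte figures assume 512-byte logical sectors.
            print(m.group(1), lba, blocks, lba * 512, blocks * 512)
```

For the quoted CDB this gives a read of 1024 blocks starting at LBA 5306198464, i.e. roughly 2.7 TB into sdl if the sectors are 512 bytes. Every abort in the log carries the same CDB and the same scmd pointer, so they look like repeated attempts at one read rather than errors scattered across the disk.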
I had run a successful parity check 100 days earlier with no issues, and I have checked all the cabling thoroughly. The drives normally run at ~30C; they are now at 38-40C because it is 35C outside here and they are grinding through parity. I stopped the check, rebooted, and started another parity check, with the same result. I will let this check keep plugging along at 50KB/sec in the hope that it eventually speeds up, but the errors are continuing.

I have attached a syslog. I tried to run smartctl to include SMART info, but it hung and can't be killed with CTRL+C. Running ps suggests the system had already tried to run smartctl, unsuccessfully:

root@Tower:~# ps ax | grep smart
 2193 ?        S      0:00 sh -c smartctl -d ata -A /dev/sdl| grep -i temperature
 2194 ?        D      0:00 smartctl -d ata -A /dev/sdl
 8328 pts/1    D+     0:00 smartctl -a -A /dev/sdl
 8848 ?        D      0:00 /usr/sbin/smartctl -n standby -A /dev/sdl
 8849 ?        Z      0:00 [smartctl] <defunct>
 8850 ?        Z      0:00 [smartctl] <defunct>
 8851 ?        Z      0:00 [smartctl] <defunct>
 9390 pts/2    S+     0:00 grep smart

I have attached an older SMART output for the sdl drive.

At this point I assume the drive is done, and I would just like guidance on the best way to proceed. I have an identical, brand-new (still in shrink wrap) 3TB WD Green ready to go. A couple of drives have thrown ~100-200 errors during these failed parity runs, so I'm worried I might have multiple drives dying at the same time. Any suggestions on the best way to proceed to minimize data loss would be greatly appreciated.

Thanks so much in advance, and happy holidays to all.

syslog_and_smart.zip
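The STAT column in that ps output explains some of the behaviour. A D means uninterruptible sleep, which is normally a process blocked inside the kernel waiting on I/O; such a process generally will not act on signals like the SIGINT sent by CTRL+C until that I/O returns, which fits smartctl hanging on sdl. A Z is a zombie: a process that has exited but has not yet been reaped by its parent. The Python sketch below groups the quoted snapshot by state; the field split assumes the `ps ax` column order shown (PID, TTY, STAT, TIME, COMMAND).

```python
# Snapshot quoted in the post (`ps ax | grep smart`), copied as shown.
PS_AX = """\
 2193 ?        S      0:00 sh -c smartctl -d ata -A /dev/sdl| grep -i temperature
 2194 ?        D      0:00 smartctl -d ata -A /dev/sdl
 8328 pts/1    D+     0:00 smartctl -a -A /dev/sdl
 8848 ?        D      0:00 /usr/sbin/smartctl -n standby -A /dev/sdl
 8849 ?        Z      0:00 [smartctl] <defunct>
 8850 ?        Z      0:00 [smartctl] <defunct>
 8851 ?        Z      0:00 [smartctl] <defunct>
 9390 pts/2    S+     0:00 grep smart
"""

# Meanings of the first STAT letter, per the ps(1) process state codes.
STATE_NAMES = {
    "D": "uninterruptible sleep (usually blocked on I/O)",
    "Z": "zombie (exited, not yet reaped by its parent)",
    "S": "interruptible sleep",
    "R": "running or runnable",
}


def smartctl_by_state(ps_text: str) -> dict:
    """Group PIDs of smartctl-related lines by the first letter of STAT."""
    groups: dict = {}
    for line in ps_text.splitlines():
        # Assumed column order for `ps ax`: PID TTY STAT TIME COMMAND
        fields = line.split(None, 4)
        if len(fields) < 5 or "smartctl" not in fields[4]:
            continue  # e.g. the `grep smart` line itself
        pid, stat = int(fields[0]), fields[2]
        groups.setdefault(stat[0], []).append(pid)
    return groups


if __name__ == "__main__":
    for state, pids in smartctl_by_state(PS_AX).items():
        print(state, STATE_NAMES.get(state, "other"), pids)
```

On this snapshot, three smartctl processes are stuck in D against /dev/sdl and three more have exited without being reaped, which is consistent with the drive not answering commands at all.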
December 23, 2013 (Author)
Sorry to reply to my own post, but here is a new data point while the parity check crawls along _very_ slowly: the system appears to be repeatedly running smartctl against the sdl drive.

root@Tower:/var/log# ps auwx | grep smart
root      2606  1.0  0.0   3552  1224 ?        D    22:43   0:00 /usr/sbin/smartctl -n standby -A /dev/sdl
root      2607  1.0  0.0      0     0 ?        Z    22:43   0:00 [smartctl] <defunct>
root      2608  1.0  0.0      0     0 ?        Z    22:43   0:00 [smartctl] <defunct>
root      2609  1.0  0.0      0     0 ?        Z    22:43   0:00 [smartctl] <defunct>
root      2611  0.0  0.0   2448   588 pts/2    R+   22:43   0:00 grep smart
root@Tower:/var/log# ps auwx | grep smart
root      2644  0.0  0.0   3552  1220 ?        D    22:43   0:00 /usr/sbin/smartctl -n standby -A /dev/sdl
root      2645  0.0  0.0      0     0 ?        Z    22:43   0:00 [smartctl] <defunct>
root      2646  0.0  0.0      0     0 ?        Z    22:43   0:00 [smartctl] <defunct>
root      2647  0.0  0.0      0     0 ?        Z    22:43   0:00 [smartctl] <defunct>
root      2649  0.0  0.0    448     4 pts/2    R+   22:43   0:00 grep smart
root@Tower:/var/log# ps auwx | grep smart
root      2709  0.0  0.0   2448   588 pts/2    S+   22:44   0:00 grep smart

Any help would be greatly appreciated. I've been down for days now and really just want to know whether this drive needs to be replaced and, if so, how to replace it. At the moment the system shows all drives as green.
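The "repeatedly running" observation can be checked mechanically by comparing snapshots: if the identical command line shows up under a completely new PID, a fresh process was started between the two ps runs. The Python sketch below does that for the first two snapshots above, trimmed to a few lines. The column split assumes the `ps auwx` layout shown, and the output alone does not say which component is launching smartctl.

```python
# First two `ps auwx | grep smart` snapshots from the post, trimmed to the
# lines this sketch needs.
SNAP_1 = """\
root      2606  1.0  0.0   3552  1224 ?        D    22:43   0:00 /usr/sbin/smartctl -n standby -A /dev/sdl
root      2607  1.0  0.0      0     0 ?        Z    22:43   0:00 [smartctl] <defunct>
root      2611  0.0  0.0   2448   588 pts/2    R+   22:43   0:00 grep smart
"""
SNAP_2 = """\
root      2644  0.0  0.0   3552  1220 ?        D    22:43   0:00 /usr/sbin/smartctl -n standby -A /dev/sdl
root      2645  0.0  0.0      0     0 ?        Z    22:43   0:00 [smartctl] <defunct>
root      2649  0.0  0.0    448     4 pts/2    R+   22:43   0:00 grep smart
"""


def live_smartctl(ps_text: str) -> dict:
    """Map each live (non-defunct) smartctl command line to its set of PIDs."""
    out: dict = {}
    for line in ps_text.splitlines():
        # Assumed `ps auwx` columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        f = line.split(None, 10)
        if len(f) < 11:
            continue
        cmd = f[10]
        if "smartctl" in cmd and "<defunct>" not in cmd:
            out.setdefault(cmd, set()).add(int(f[1]))
    return out


def respawned(before: str, after: str) -> dict:
    """Commands present in both snapshots but running under entirely new PIDs."""
    a, b = live_smartctl(before), live_smartctl(after)
    return {
        cmd: (sorted(a[cmd]), sorted(b[cmd]))
        for cmd in a.keys() & b.keys()
        if a[cmd].isdisjoint(b[cmd])
    }


if __name__ == "__main__":
    for cmd, (old, new) in respawned(SNAP_1, SNAP_2).items():
        print(f"{cmd}: PID {old} -> {new}")
```

Here the `-n standby -A /dev/sdl` query moves from PID 2606 to 2644 within the same minute, so something is relaunching it against sdl rather than one call simply hanging.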
December 23, 2013
I recall reading that if you have SimpleFeatures installed, keeping the web GUI open will cause this.
December 24, 2013
_DaleWilliams shakes his fist_

How do we get SimpleFeatures removed from this page: "WIKI: UNRAID5 Plugins"? I, too, installed SF early on in 5.0.0 Final and suffered through the debug and uninstall process. Since SF doesn't work consistently and is unsupported, shouldn't it be removed from the wiki's list of plugins 'officially' compatible with 5.0? I'm happy to help edit, but the WIKI makes it clear that the plugin author should do the editing.
December 24, 2013
Quote: "I'm happy to help edit, but the WIKI makes it clear that the plugin author should do the editing."

So don't remove any info; just add a short paragraph near the top of the SimpleFeatures entry stating that some people have had issues using SimpleFeatures in the 5.x release series, and include forum links to some examples. Don't use hyperbole, don't say how you really feel; just state the facts and give examples. Inform, and let people decide for themselves.
December 25, 2013
A lot of people are having issues with SF and version 5. SF is not compatible with version 5. Its intended replacement is the GitHub GUI, which has its own problems.