
Re: Drive stops responding during clear (EMKO)



I'm trying to clear a drive. The first time, I came back after many hours and the server was unresponsive over both the web interface and SSH. The second time, it was at 20% when I last checked, so I left it overnight. When I checked again it looked like the clear had stopped: the page showed the list of drives, and the drive was no longer in the disk 6 drop-down, so I had to reboot. The logs were massive. I'm running another clear right now, and so far the log shows this:

 

 

tail -n 40 -f /var/log/syslog
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] 
Jul 12 07:26:59 Tower kernel: Result: hostbyte=0x04 driverbyte=0x00
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] CDB: 
Jul 12 07:26:59 Tower kernel: cdb[0]=0x8a: 8a 00 00 00 00 00 03 f9 84 00 00 00 00 40 00 00
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] READ CAPACITY failed
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] 
Jul 12 07:26:59 Tower kernel: Result: hostbyte=0x04 driverbyte=0x00
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Sense not available.
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Write Protect is on
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Mode Sense: 80 bf 1d c5
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Truncating mode parameter data from 32961 to 512 bytes
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Got wrong page
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Assuming drive cache: write through
Jul 12 07:26:59 Tower kernel: sdf: detected capacity change from 3000592982016 to 0
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] READ CAPACITY(16) failed
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] 
Jul 12 07:26:59 Tower kernel: Result: hostbyte=0x04 driverbyte=0x00
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Sense not available.
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] READ CAPACITY failed
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] 
Jul 12 07:26:59 Tower kernel: Result: hostbyte=0x04 driverbyte=0x00
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Sense not available.
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Write Protect is off
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Mode Sense: 00 00 00 00
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Asking for cache data failed
Jul 12 07:26:59 Tower kernel: sd 0:0:0:0: [sdf] Assuming drive cache: write through
Jul 12 07:26:59 Tower ata_id[3467]: HDIO_GET_IDENTITY failed for '/dev/sdf' 
Jul 12 07:27:00 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 12 07:27:02 Tower last message repeated 17 times
Jul 12 07:27:02 Tower kernel: sas: sas_form_port: phy0 belongs to port0 already(1)!
Jul 12 07:27:03 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 12 07:27:34 Tower last message repeated 132 times
Jul 12 07:28:20 Tower last message repeated 103 times
Jul 12 07:28:21 Tower emhttp: clear: 3% complete
Jul 12 07:28:22 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 12 07:28:53 Tower last message repeated 68 times
Jul 12 07:29:50 Tower last message repeated 134 times
Jul 12 07:29:51 Tower emhttp: clear: 4% complete
Jul 12 07:29:53 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 12 07:30:24 Tower last message repeated 84 times
Jul 12 07:31:21 Tower last message repeated 149 times
Jul 12 07:31:23 Tower emhttp: clear: 5% complete
Jul 12 07:31:23 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jul 12 07:31:54 Tower last message repeated 84 times


I ran another clear. It got to 100%, then the log started growing again, the web page changed back to the drop-down list, and under disk 6 I can now choose "(sdf) 0". The log just keeps repeating until it crashes the server.

 

tail -n 40 -f /var/log/syslog
Jul 12 09:56:04 Tower emhttp: WDC_WD20EADS-00R6B0_WD-WCAVY2267178 (hdc) 1953514584
Jul 12 09:56:04 Tower emhttp: ST4000DM000-1F2168_Z300D00R (sdb) 3907018584
Jul 12 09:56:04 Tower emhttp: WDC_WD30EZRX-00DC0B0_WD-WMC1T0564002 (sdc) 2930266584
Jul 12 09:56:04 Tower emhttp: WDC_WD20EARS-00MVWB0_WD-WMAZA1789360 (sdd) 1953514584
Jul 12 09:56:04 Tower emhttp: WDC_WD15EADS-00P8B0_WD-WMAVU0108337 (sde) 1465138584
Jul 12 09:56:04 Tower emhttp: (sdf) 0
Jul 12 09:56:04 Tower emhttp: WDC_WD20EARX-00PASB0_WD-WCAZA8276467 (sdg) 1953514584
Jul 12 09:56:04 Tower emhttp: ST3500418AS_9VM5HX2G (sdh) 488386584
Jul 12 09:56:04 Tower kernel: mdcmd (1): import 0 8,16 3907018532 ST4000DM000-1F2168_Z300D00R
Jul 12 09:56:04 Tower kernel: md: import disk0: [8,16] (sdb) ST4000DM000-1F2168_Z300D00R size: 3907018532
Jul 12 09:56:04 Tower kernel: mdcmd (2): import 1 22,0 1953514552 WDC_WD20EADS-00R6B0_WD-WCAVY2267178
Jul 12 09:56:04 Tower kernel: md: import disk1: [22,0] (hdc) WDC_WD20EADS-00R6B0_WD-WCAVY2267178 size: 1953514552
Jul 12 09:56:04 Tower emhttp: shcmd (109): /usr/local/sbin/emhttp_event driver_loaded
Jul 12 09:56:04 Tower kernel: mdcmd (3): import 2 8,64 1465138552 WDC_WD15EADS-00P8B0_WD-WMAVU0108337
Jul 12 09:56:04 Tower kernel: md: import disk2: [8,64] (sde) WDC_WD15EADS-00P8B0_WD-WMAVU0108337 size: 1465138552
Jul 12 09:56:04 Tower kernel: mdcmd (4): import 3 8,48 1953514552 WDC_WD20EARS-00MVWB0_WD-WMAZA1789360
Jul 12 09:56:04 Tower kernel: md: import disk3: [8,48] (sdd) WDC_WD20EARS-00MVWB0_WD-WMAZA1789360 size: 1953514552
Jul 12 09:56:04 Tower kernel: mdcmd (5): import 4 8,96 1953514552 WDC_WD20EARX-00PASB0_WD-WCAZA8276467
Jul 12 09:56:04 Tower kernel: md: import disk4: [8,96] (sdg) WDC_WD20EARX-00PASB0_WD-WCAZA8276467 size: 1953514552
Jul 12 09:56:04 Tower kernel: mdcmd (6): import 5 8,32 2930266532 WDC_WD30EZRX-00DC0B0_WD-WMC1T0564002
Jul 12 09:56:04 Tower kernel: md: import disk5: [8,32] (sdc) WDC_WD30EZRX-00DC0B0_WD-WMC1T0564002 size: 2930266532
Jul 12 09:56:04 Tower kernel: mdcmd (7): import 6 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (8): import 7 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (9): import 8 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (10): import 9 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (11): import 10 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (12): import 11 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (13): import 12 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (14): import 13 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (15): import 14 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (16): import 15 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (17): import 16 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (18): import 17 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (19): import 18 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (20): import 19 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (21): import 20 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (22): import 21 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (23): import 22 0,0
Jul 12 09:56:04 Tower kernel: mdcmd (24): import 23 0,0
Jul 12 09:56:04 Tower emhttp_event: driver_loaded
Jul 12 09:56:06 Tower emhttp: shcmd (110): rmmod md-mod |& logger
Jul 12 09:56:06 Tower emhttp: shcmd (111): modprobe md-mod super=/boot/config/super.dat slots=24 |& logger
Jul 12 09:56:06 Tower kernel: md: unRAID driver removed
Jul 12 09:56:06 Tower emhttp: shcmd (112): udevadm settle
Jul 12 09:56:06 Tower kernel: md: unRAID driver 2.1.6 installed
Jul 12 09:56:06 Tower emhttp: Device inventory:
Jul 12 09:56:06 Tower emhttp: WDC_WD20EADS-00R6B0_WD-WCAVY2267178 (hdc) 1953514584
Jul 12 09:56:06 Tower emhttp: ST4000DM000-1F2168_Z300D00R (sdb) 3907018584
Jul 12 09:56:06 Tower emhttp: WDC_WD30EZRX-00DC0B0_WD-WMC1T0564002 (sdc) 2930266584
Jul 12 09:56:06 Tower emhttp: WDC_WD20EARS-00MVWB0_WD-WMAZA1789360 (sdd) 1953514584
Jul 12 09:56:06 Tower emhttp: WDC_WD15EADS-00P8B0_WD-WMAVU0108337 (sde) 1465138584
Jul 12 09:56:06 Tower emhttp: (sdf) 0
Jul 12 09:56:06 Tower emhttp: WDC_WD20EARX-00PASB0_WD-WCAZA8276467 (sdg) 1953514584
Jul 12 09:56:06 Tower emhttp: ST3500418AS_9VM5HX2G (sdh) 488386584
Jul 12 09:56:06 Tower kernel: mdcmd (1): import 0 8,16 3907018532 ST4000DM000-1F2168_Z300D00R
Jul 12 09:56:06 Tower kernel: md: import disk0: [8,16] (sdb) ST4000DM000-1F2168_Z300D00R size: 3907018532
Jul 12 09:56:06 Tower kernel: mdcmd (2): import 1 22,0 1953514552 WDC_WD20EADS-00R6B0_WD-WCAVY2267178
Jul 12 09:56:06 Tower kernel: md: import disk1: [22,0] (hdc) WDC_WD20EADS-00R6B0_WD-WCAVY2267178 size: 1953514552
Jul 12 09:56:06 Tower kernel: mdcmd (3): import 2 8,64 1465138552 WDC_WD15EADS-00P8B0_WD-WMAVU0108337
Jul 12 09:56:06 Tower kernel: md: import disk2: [8,64] (sde) WDC_WD15EADS-00P8B0_WD-WMAVU0108337 size: 1465138552
Jul 12 09:56:06 Tower kernel: mdcmd (4): import 3 8,48 1953514552 WDC_WD20EARS-00MVWB0_WD-WMAZA1789360
Jul 12 09:56:06 Tower kernel: md: import disk3: [8,48] (sdd) WDC_WD20EARS-00MVWB0_WD-WMAZA1789360 size: 1953514552
Jul 12 09:56:06 Tower kernel: mdcmd (5): import 4 8,96 1953514552 WDC_WD20EARX-00PASB0_WD-WCAZA8276467
Jul 12 09:56:06 Tower kernel: md: import disk4: [8,96] (sdg) WDC_WD20EARX-00PASB0_WD-WCAZA8276467 size: 1953514552
Jul 12 09:56:06 Tower emhttp: shcmd (113): /usr/local/sbin/emhttp_event driver_loaded
Jul 12 09:56:06 Tower kernel: mdcmd (6): import 5 8,32 2930266532 WDC_WD30EZRX-00DC0B0_WD-WMC1T0564002
Jul 12 09:56:06 Tower kernel: md: import disk5: [8,32] (sdc) WDC_WD30EZRX-00DC0B0_WD-WMC1T0564002 size: 2930266532
Jul 12 09:56:06 Tower kernel: mdcmd (7): import 6 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (8): import 7 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (9): import 8 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (10): import 9 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (11): import 10 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (12): import 11 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (13): import 12 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (14): import 13 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (15): import 14 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (16): import 15 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (17): import 16 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (18): import 17 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (19): import 18 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (20): import 19 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (21): import 20 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (22): import 21 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (23): import 22 0,0
Jul 12 09:56:06 Tower kernel: mdcmd (24): import 23 0,0
Jul 12 09:56:06 Tower emhttp_event: driver_loaded
Jul 12 09:56:06 Tower emhttp: shcmd (114): rmmod md-mod |& logger
Jul 12 09:56:07 Tower emhttp: shcmd (115): modprobe md-mod super=/boot/config/super.dat slots=24 |& logger
Jul 12 09:56:07 Tower kernel: md: unRAID driver removed
Jul 12 09:56:07 Tower kernel: md: unRAID driver 2.1.6 installed
Jul 12 09:56:07 Tower emhttp: shcmd (116): udevadm settle


@madburg here is my log. What I tried was preclear: it was working during the pre-read, but once it started to write, disk 4 showed up as red and the cache drive just stopped working. Preclear was writing at 350MB/s :)

These drives might be hooked up to a Supermicro AOC-SASLP-MVL8. Is there any way to tell if they are?
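
One way to check which controller a drive sits on is to look at its sysfs path, which includes the PCI address of the controller it is attached to. The sketch below assumes a reasonably standard Linux sysfs layout; `pci_addr_of_path` is a made-up helper name, not an unRAID command.

```shell
# Hypothetical helper: print the last PCI address (domain:bus:device.function)
# that appears in a sysfs path. For a disk, that is normally its controller.
pci_addr_of_path() {
    printf '%s\n' "$1" \
        | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' \
        | tail -n 1
}
```

With the drive responding (for example right after a reboot), `pci_addr_of_path "$(readlink -f /sys/block/sdf)"` should print an address, and `lspci -s` with that address should name the card, assuming `lspci` is installed. Drives that report the same address share a controller.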

 

I'm going to install those packages now.

syslog.txt


I just took the server apart. All the drives that failed once I added this drive are on the same controller, the Supermicro AOC-SASLP-MVL8.

Disk 4 got a red ball and the cache drive failed while I was trying to clear disk 6, which is also on the same controller.

 

How do I tell unRAID that this hard drive is fine so I don't have to rebuild it? And what's causing this problem? The controller?


I saw a problem with my card and 4 TB drives; it seems the spin-up wait time was taking too long. See if there is a setting in the card BIOS to wait longer. It also seemed to happen to me more when I restarted rather than after a full power-down. See if this helps. :-\

 

I moved my biggest drives off the card and onto my motherboard and have not had any more problems.

 


 


I initially labeled this thread for v4.6 because the syslog you attached involved UnRAID v4.6.  Later I saw that it was from January 23, so it probably was the wrong syslog attached.  Plus this syslog is for a system with only 4 SATA ports and no SAS card.

 

Without a syslog, it is hard to diagnose much, but there are a few things I can say from the tails you posted.  In the first one, drive sdf has completely stopped responding and is not even answering queries about its identity.  You probably should have aborted right there and captured the syslog.  What we needed to see were the very first error messages from when it began to have issues.  Those would possibly have told us what was wrong.  Since it is sdf, we know it's the 6th drive to be assigned, and it was on sd 0:0:0:0, so it was probably attached to a large SAS card.  A clear was being performed, so you were adding a new drive, not PreCleared, and hopefully not the one known as sdf.  The only way to recover a non-responding drive is to reboot, after which you can determine whether the drive is responding or not.  If not, then you check connections.  If according to the syslog it is responding normally, then you should run SMART tests on it to determine whether it is still a good drive.  If the drive is good, then you turn your attention to its cables or backplane connection, and its disk controller.
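
As a rough sketch of how to dig the very first error messages out of a large syslog, something like the following could work. The function name `first_trouble` and the keyword list are assumptions made for illustration; the real first sign of trouble may use different wording, so treat the output as a starting point only.

```shell
# Hypothetical helper: show the syslog around the first line that names a
# device (e.g. sdf) together with a failure-looking keyword.
first_trouble() {
    dev="$1"
    log="${2:-/var/log/syslog}"
    words='fail|error|timeout|reset'   # assumed keywords, adjust as needed
    # Line number of the first matching line (empty if nothing matches).
    first=$(grep -n -m 1 -i -E "$dev.*($words)|($words).*$dev" "$log" | cut -d: -f1) || true
    if [ -z "$first" ]; then
        echo "no failure lines mentioning $dev in $log" >&2
        return 1
    fi
    # Print from 10 lines before the first hit to 20 lines after it.
    start=$(( first - 10 ))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    sed -n "${start},$(( first + 20 ))p" "$log"
}
```

For example, `first_trouble sdf` prints the area of /var/log/syslog where sdf first started failing. Once the drive is responding again after a reboot, `smartctl -a /dev/sdf` shows its SMART report and `smartctl -t short /dev/sdf` starts a short self-test.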

 

If I could emphasize one point, whenever you have an issue, capture the syslog right then!
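
A minimal sketch of capturing the syslog on the spot, assuming the flash drive is mounted at /boot (as it normally is on unRAID) so that the copy survives a reboot. The name `save_syslog` is made up for this example.

```shell
# Hypothetical helper: copy the current syslog to a timestamped file and
# print the name of the copy. Defaults assume an unRAID-style layout.
save_syslog() {
    src="${1:-/var/log/syslog}"
    dest_dir="${2:-/boot}"
    dest="$dest_dir/syslog-$(date +%Y%m%d-%H%M%S).txt"
    cp "$src" "$dest" && echo "$dest"
}
```

Running `save_syslog` as soon as the errors begin, before rebooting, gives a file that can be attached to a post.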
