unRAID needs to keep a log file thru a hard boot


JustinChase

I just had my entire server lock up, but I could still access SABnzbd, and whenever I stopped it from within the program, it was running again a few seconds later.  I assume Docker is trying to help me by restarting programs, but I'm not really sure.

 

Regardless, I lost access to the GUI for about an hour, and even after I sent several shutdown commands from PuTTY, the server would neither shut down nor restart.

 

VERY frustrating.

 

I ended up hard booting the machine, but of course there is no syslog from that, so I can't help LimeTech figure out what went wrong.  Not keeping some kind of log when things like this happen seems like a terrible oversight.

 

Hopefully the reasons for the loss of GUI can be determined and resolved, so this never happens again, but that seems unlikely, so some kind of persistent logging seems like a really good use of developer resources.

Link to comment

So, I started a movie, and it got maybe an hour in, then it started stuttering and generally not playing.  I looked at the server, and it was responding rather slowly.  Here is what was in the log at the time.  The last bit is after I spun up all drives, and now the movie seems to be playing fine again.

 

 /usr/bin/tail -f /var/log/syslog 2>&1
Oct 3 19:28:18 media emhttp: shcmd (254): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service
Oct 3 19:28:18 media avahi-daemon[3364]: Files changed, reloading.
Oct 3 19:28:18 media avahi-daemon[3364]: Service group file /services/smb.service changed, reloading.
Oct 3 19:28:18 media emhttp: shcmd (255): pidof rpc.mountd &> /dev/null
Oct 3 19:28:18 media emhttp: Restart NFS...
Oct 3 19:28:18 media emhttp: shcmd (256): exportfs -ra |& logger
Oct 3 19:28:18 media emhttp: shcmd (257): /etc/rc.d/rc.atalk status
Oct 3 19:28:18 media emhttp: shcmd (258): /usr/local/sbin/emhttp_event svcs_restarted
Oct 3 19:28:18 media emhttp_event: svcs_restarted
Oct 3 19:28:19 media avahi-daemon[3364]: Service "media" (/services/smb.service) successfully established.
Oct 3 19:33:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 19:43:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 19:53:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:03:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:05:44 media kernel: ata9.00: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Oct 3 20:05:44 media kernel: ata9.00: irq_stat 0x00400040, connection status changed
Oct 3 20:05:44 media kernel: ata9: SError: { HostInt PHYRdyChg 10B8B DevExch }
Oct 3 20:05:44 media kernel: ata9.00: failed command: FLUSH CACHE EXT
Oct 3 20:05:44 media kernel: ata9.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 19
Oct 3 20:05:44 media kernel: res 40/00:90:c9:51:21/00:00:1d:00:00/40 Emask 0x50 (ATA bus error)
Oct 3 20:05:44 media kernel: ata9.00: status: { DRDY }
Oct 3 20:05:44 media kernel: ata9: hard resetting link
Oct 3 20:05:46 media kernel: ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Oct 3 20:05:46 media kernel: ata9.00: configured for UDMA/33
Oct 3 20:05:46 media kernel: ata9.00: retrying FLUSH 0xea Emask 0x50
Oct 3 20:05:46 media kernel: ata9: EH complete
Oct 3 20:13:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:17:21 media kernel: ata9: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Oct 3 20:17:21 media kernel: ata9: irq_stat 0x00400040, connection status changed
Oct 3 20:17:21 media kernel: ata9: SError: { HostInt PHYRdyChg 10B8B DevExch }
Oct 3 20:17:21 media kernel: ata9: hard resetting link
Oct 3 20:17:24 media kernel: ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Oct 3 20:17:24 media kernel: ata9.00: configured for UDMA/33
Oct 3 20:17:24 media kernel: ata9: EH complete
Oct 3 20:23:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:33:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:43:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:53:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 21:01:37 media kernel: mdcmd (38): spindown 6
Oct 3 21:01:38 media kernel: mdcmd (39): spindown 7
Oct 3 21:01:49 media kernel: mdcmd (40): spindown 2
Oct 3 21:01:50 media kernel: mdcmd (41): spindown 3
Oct 3 21:01:50 media kernel: mdcmd (42): spindown 8
Oct 3 21:01:51 media kernel: mdcmd (43): spindown 10
Oct 3 21:03:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 21:13:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 21:14:10 media emhttp: Spinning up all drives...
Oct 3 21:14:10 media emhttp: shcmd (259): /usr/sbin/hdparm -S0 /dev/sdh &> /dev/null
Oct 3 21:14:10 media kernel: mdcmd (44): spinup 0
Oct 3 21:14:10 media kernel: mdcmd (45): spinup 1
Oct 3 21:14:10 media kernel: mdcmd (46): spinup 2
Oct 3 21:14:10 media kernel: mdcmd (47): spinup 3
Oct 3 21:14:10 media kernel: mdcmd (48): spinup 4
Oct 3 21:14:10 media kernel: mdcmd (49): spinup 6
Oct 3 21:14:10 media kernel: mdcmd (50): spinup 7
Oct 3 21:14:10 media kernel: mdcmd (51): spinup 8
Oct 3 21:14:10 media kernel: mdcmd (52): spinup 9
Oct 3 21:14:10 media kernel: mdcmd (53): spinup 10
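
A quick way to see which port keeps dropping in a log like the one above is to count the "hard resetting link" lines per ATA port. This is only a sketch: `count_ata_resets` is a made-up helper name, and it assumes the log has been saved to a file in the same syslog format as the excerpt.

```bash
#!/bin/bash
# Count "hard resetting link" events per ATA port in a syslog file.
# count_ata_resets is a hypothetical helper name, not an unRAID tool.
count_ata_resets() {
    # $1: path to a syslog file (for example /var/log/syslog)
    grep 'kernel: ata[0-9]*: hard resetting link' "$1" |
        # Keep only the port name (ata9, ata1, ...) from each matching line.
        sed -E 's/.*kernel: (ata[0-9]+): hard resetting link.*/\1/' |
        # Tally per port, busiest port first, printed as "port: count".
        sort | uniq -c | sort -rn | awk '{print $2 ": " $1}'
}

# Example call (uncomment on the server):
# count_ata_resets /var/log/syslog
```

On the excerpt above, the only port with resets is ata9 (at 20:05:44 and 20:17:21), so a port that shows up repeatedly in this count is the one whose cable, slot, or drive is worth checking first.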

Link to comment

Although it might be nice for log files to survive a reboot, the only place such a file could be logged to is the USB flash drive, and this could drastically shorten its lifetime.

It is easy enough to add a line to the go file to get the log persisted to the flash drive when it is really needed, if you are prepared to take the potential hit on its lifetime.
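
For anyone who wants to try this, here is a minimal sketch of what such a go-file addition might look like. The helper name `persist_syslog`, the output file name, and the default paths are all assumptions for illustration, not anything unRAID ships; the flash is normally mounted at /boot, so /boot/logs is used as the default destination.

```bash
#!/bin/bash
# Sketch: keep a running copy of the live syslog somewhere that survives a reboot.
# persist_syslog is a hypothetical helper; the default paths are assumptions.
persist_syslog() {
    local src="${1:-/var/log/syslog}"   # live log, held in RAM
    local dest_dir="${2:-/boot/logs}"   # persistent folder (writes wear the flash)
    mkdir -p "$dest_dir" || return 1
    # Follow the log by name and append every line to the persistent copy.
    # Runs in the background so the rest of the go file can carry on.
    tail -n +1 -F "$src" >> "$dest_dir/syslog-persistent.txt" 2>/dev/null &
    echo $!   # print the PID of the background tail so it can be stopped later
}

# To enable, call it from the go file (uncomment):
# persist_syslog /var/log/syslog /boot/logs
```

Every log line becomes a write to the destination, which is exactly the wear concern mentioned above, so this is something to switch on while chasing a problem and remove afterwards. Stopping it is a matter of killing the PID the function prints.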

Link to comment

So that's not a great solution then.  Is there any reason it can't be logged to the cache drive or an array drive, as an option?  Perhaps only when 'persistent logging' is selected by the user, because of a problematic system?

Link to comment

Thanks. I have installed it now.

 

Is there a special command or method I need to use with this to shut down or restart the server, instead of the normal "shutdown -r now" that I try when I do have PuTTY access?

If I can't connect via PuTTY, and can't use the console, hitting the on/off button is my only other option.  Will this help me in that situation also?

Link to comment

Use "powerdown" to shut down.

Use "powerdown -r" to reboot.

Pressing the power button will also run the powerdown script.  Ctrl-Alt-Del at the console will also shut down and run the powerdown script.

All of these should prevent the parity check on restart unless something refuses to shut down.  If you get an unclean shutdown, powerdown will log enough information to help track down the culprit.  All of these shutdowns will save a syslog to the flash.

Link to comment

You were able to capture the log with powerdown?

 

I haven't had to shut down yet, but will do so soon, as I need to swap out a failing drive.

 

I would check the connections to this drive, ST3500830AS_6QG123RD (sdh), and its SMART reports.

 

That's my cache drive.  I'll check the connections when I swap out the other drive.  Thanks for taking a look and giving me advice about a solution.

Link to comment

I used the powerdown command from putty, but I don't see any syslog on the flash drive, other than the one I uploaded earlier.

 

Also, my cache drive seems to have died.  I replaced the cable: not recognized.  I moved it to a different slot on my SATA card: not recognized.  I replaced the cable with another (a third): not recognized.  I moved that to a different slot: not recognized.  I moved it to the last available slot: not recognized.

 

Hard to believe it completely died, and unRAID can't recognize it at all any more, but it seems to be the case.

 

I decided to mount the new/replacement drive for my red-balled disk5, so that's going to take some time to rebuild before I can try anything else.

Link to comment

Unless LOGDIR is set, Powerdown saves syslogs to the logs folder of the flash drive (i.e. /boot/logs).

 

You have certainly had more than your share of hardware issues!

Link to comment

Ah, yes, there they are, in the /logs/ folder.  Thanks for that.

Should I be concerned about ata1?

 

Oct  4 15:09:24 media emhttp: shcmd (158): /sbin/poweroff
Oct  4 15:09:24 media shutdown[3279]: shutting down for system halt
Oct  4 15:09:24 media init: Switching to runlevel: 0
Oct  4 15:09:24 media kernel: ata1: SATA link down (SStatus 0 SControl 300)
Oct  4 15:09:24 media kernel: ata1.00: link offline, clearing class 1 to NONE
Oct  4 15:09:24 media kernel: ata1: EH complete
Oct  4 15:09:24 media kernel: sd 0:0:0:0: [sdm] START_STOP FAILED
Oct  4 15:09:24 media kernel: sd 0:0:0:0: [sdm]  
Oct  4 15:09:24 media kernel: Result: hostbyte=0x04 driverbyte=0x00
Oct  4 15:09:24 media kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Oct  4 15:09:24 media kernel: ata1: edma_err_cause=00000010 pp_flags=00000000, dev connect
Oct  4 15:09:24 media kernel: ata1: SError: { PHYRdyChg DevExch }
Oct  4 15:09:24 media kernel: ata1: limiting SATA link speed to 1.5 Gbps
Oct  4 15:09:24 media kernel: ata1: hard resetting link
Oct  4 15:09:25 media kernel: ata1: SATA link down (SStatus 0 SControl 310)
Oct  4 15:09:25 media kernel: ata1.00: link offline, clearing class 1 to NONE
Oct  4 15:09:25 media kernel: ata1: EH complete
Oct  4 15:09:25 media kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Oct  4 15:09:25 media kernel: ata1: edma_err_cause=00000010 pp_flags=00000000, dev connect
Oct  4 15:09:25 media kernel: ata1: SError: { PHYRdyChg DevExch }
Oct  4 15:09:25 media kernel: ata1: limiting SATA link speed to 1.5 Gbps
Oct  4 15:09:25 media kernel: ata1: hard resetting link
Oct  4 15:09:27 media kernel: ata1: SATA link down (SStatus 0 SControl 310)
Oct  4 15:09:27 media kernel: ata1.00: link offline, clearing class 1 to NONE
Oct  4 15:09:27 media kernel: ata1: EH complete
Oct  4 15:09:27 media kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct  4 15:09:27 media kernel: ata1: edma_err_cause=00000020 pp_flags=00000000, SError=00000000
Oct  4 15:09:27 media kernel: ata1: hard resetting link
Oct  4 15:09:29 media kernel: ata1: SATA link down (SStatus 0 SControl 310)
Oct  4 15:09:29 media kernel: ata1.00: link offline, clearing class 1 to NONE
Oct  4 15:09:29 media kernel: ata1: EH complete
Oct  4 15:09:29 media kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct  4 15:09:29 media kernel: ata1: edma_err_cause=00000020 pp_flags=00000000, SError=00000000
Oct  4 15:09:29 media kernel: ata1: hard resetting link
Oct  4 15:09:30 media kernel: ata1: SATA link down (SStatus 0 SControl 310)
Oct  4 15:09:30 media kernel: ata1.00: link offline, clearing class 1 to NONE
Oct  4 15:09:30 media kernel: ata1: EH complete
Oct  4 15:09:30 media kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Oct  4 15:09:30 media kernel: ata1: edma_err_cause=00000010 pp_flags=00000000, dev connect
Oct  4 15:09:30 media kernel: ata1: SError: { PHYRdyChg DevExch }
Oct  4 15:09:30 media kernel: ata1: limiting SATA link speed to 1.5 Gbps
Oct  4 15:09:30 media kernel: ata1: hard resetting link
Oct  4 15:09:31 media rc.unRAID[3288][3289]: Powerdown V2.12
Oct  4 15:09:31 media rc.unRAID[3288][3294]: Array is Stopped

Link to comment

 

 

I would check or swap the cables. Check for BIOS updates. If you text-search your log, you should be able to figure out which drive ata1 is.
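
The text search can be done with grep. This is a sketch only: `find_ata_port` is a made-up helper name, and it assumes the syslog uses the same "kernel: ataN:" prefix as the lines quoted in this thread.

```bash
#!/bin/bash
# List every syslog line that mentions a given ATA port, for example "ata1".
# find_ata_port is a hypothetical helper name, not an unRAID tool.
find_ata_port() {
    # $1: port name such as ata1; $2: path to a syslog file
    # Matches "ata1:" and "ata1.00:" but not "ata10:" or "ata11:".
    grep -E "kernel: $1(\.[0-9]+)?:" "$2"
}

# Example call (uncomment on the server):
# find_ata_port ata1 /var/log/syslog | head -n 20
```

The earliest matches, from boot time, are the ones to read: that is normally where the kernel reports what it found on the port, which can then be matched against the drive list in the GUI.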

Link to comment

Hmmm...

 

That's my cache drive, which is no longer recognized by unRAID.  I've tried 3 cables, and 4 different SATA slots and none can recognize that drive any longer.

 

It went 'unformatted' when it was formatted as btrfs, but I reformatted it to xfs and then it was fine for a few days.  When I replaced my disk5 and restarted, the cache drive never showed up again.  I've tried swapping and rebooting about 8-10 times, but unRAID has never seen it again.

 

I just finished pre-clearing the disk5 that was giving me problems, and it looks fine, so I'm going to use it to replace a 1TB drive, then was thinking of using the 1TB drive as my cache, but I'd really rather get the cache drive to be recognized again.

 

I will probably put it in a spare machine and try pre-clearing it (if it gets recognized) and see if it tests okay.  If so, maybe I will need to purchase a new SATA cable to put in the server.

 

I'm so tired of drive problems.  I hope to have them all behind me soon!

 

thanks for looking at my log.

 

Oh, and I hope the powerdown functionality gets baked into unRAID soon :)

Link to comment

There must be a lot of other history in the previous sections of the syslog concerning ata1.  It appears to be in a strange state at the moment, as it knows there was a drive here, set up as sdm, but cannot find it any more.  Normally, there would not be more error messages if the drive is truly gone, but it seems to think something is still there, but it cannot even raise the SATA link, without which it cannot talk to the drive.  In addition, the fact that this is on the very first SCSI channel (sd 0:0:0:0) and was assigned the very first ATA channel (ata1), yet did not get a drive ID assignment until sdm seems indicative of a long delay in setting up the drive, and that implies there was trouble.  Normally it would get sda or something close to that, not sdm.

 

I see you just posted, and this is an unrecognized drive, which doesn't surprise me.  The drive HAS to establish a SATA link first, then it HAS to respond to IDENTIFY requests, and then the normal SATA communications begin.  This drive is probably 'broken'.

Link to comment

Interesting.  Is there any way to 'fix' it while still in unRAID, or should I try to put it as the only drive in a raw unRAID install in a different machine?

 

Full log attached.  As is a previous log, in case it's useful.

 

About the only difference between the 2 syslogs is which port the bad drive was connected to, first port on one and 4th port on the other.  For most of its history in either syslog, the drive had no SATA link at all, yet the system knew something was there.  On the first port, it did raise the SATA link very briefly (after quite a delay), assigned it to sdm, then the SATA link went back down and the drive was disabled.  After the array has fully started and all is quiet, then in both syslogs the drive's SATA link comes up at its slowest speed, with the kernel complaining about very slow response even at that speed.  The drive was identified and assigned (or re-assigned) sdm.  It stayed up for a little, then other drive activity began and in both syslogs the SATA link was lost.  That makes me wonder if you are badly underpowered?

 

In both syslogs, another drive (sdc, Disk 2, Hitachi_HDS5C3030ALA630_MJ1323YNG1U3PC) also had trouble, at the same point, just as it was being mounted.  They were strictly interface errors, not drive issues, so the drive is probably fine.  Again, I have to wonder if limited power was to blame.

 

At one point, when it was struggling to communicate with the bad drive and had just gotten the drive to identify itself, it reported the following:

Oct  4 14:55:16 media kernel: ata4.00: Drive reports diagnostics failure. This may indicate a drive

Oct  4 14:55:16 media kernel: ata4.00: fault or invalid emulation. Contact drive vendor for information.

I seriously doubt you can get a SMART report for the drive, let alone one that looks good.  If replacing the power supply with a better, stronger one does not help, then I would not waste another minute on this drive.

Link to comment

hmmm...

 

Is 650 watts not enough power for a dozen drives?  This is my power supply...

 

Rosewill - Capstone 650W Continuous, Single Rail, 80 PLUS GOLD Active PFC

 

With all drives spun up, according to the UPS plugin, I'm using 130 watts.  It seems that should be okay.
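
One caveat with the 130 watt figure is that it is a steady-state reading; drives draw noticeably more for a few seconds while spinning up. A rough back-of-the-envelope check is sketched below. The 25 W per-drive spin-up peak and the 60 W base load are assumed placeholder numbers, not measurements from this server, and `estimate_spinup_watts` is a made-up helper name; real figures should come from the drive datasheets.

```bash
#!/bin/bash
# Rough estimate of peak draw if every drive spins up at the same moment.
# All wattage figures passed in are assumptions, not measured values.
# estimate_spinup_watts is a hypothetical helper name.
estimate_spinup_watts() {
    local drives=$1           # number of hard drives spinning up together
    local per_drive_peak=$2   # assumed peak watts per drive during spin-up
    local base_watts=$3       # assumed draw of board, CPU, fans, and cards
    echo $(( drives * per_drive_peak + base_watts ))
}

# A dozen drives at an assumed 25 W peak each, plus an assumed 60 W base load.
peak=$(estimate_spinup_watts 12 25 60)
echo "Estimated simultaneous spin-up peak: ${peak} W of 650 W rated"
```

With those assumed numbers the estimate comes to 360 W, comfortably under the 650 W rating, which would point away from total wattage as the limit. It says nothing about a single overloaded power cable or splitter feeding several drives, which is a separate thing to check.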

 

It could be a cable issue, I suppose.  Maybe I'll buy some new SATA cables and replace the ones I have in there now.  The ones I have are all the free ones that come with hard drives, but they all seem to be of okay quality, as far as I can tell.

 

thanks for taking such a thorough look at my syslogs for me; I really do appreciate it!!

Link to comment

Archived

This topic is now archived and is closed to further replies.
