unRAID needs to keep a log file thru a hard boot

October 3, 2014

I just had my entire server lock up, but I could still access SABnzbd, and whenever I stopped it from within the program, it was running again a few seconds later. I assume Docker is trying to help me by restarting programs, but I'm not really sure.

Regardless, I lost access to the GUI for about an hour, and even after I sent several shutdown commands from PuTTY, the server would neither shut down nor restart.

VERY frustrating.

I ended up hard booting the machine, but of course there is no syslog from that, so I can't help LimeTech figure out what went wrong. This seems like a terrible oversight to not keep some kind of log when things like this happen.

Hopefully the reasons for the loss of GUI can be determined and resolved, so this never happens again, but that seems unlikely, so some kind of persistent logging seems like a really good use of developer resources.

Quote

October 3, 2014

Leave the log window open until it crashes again.

Quote

October 3, 2014

Author

Thanks. I suppose that might work, but I don't know when it will crash, so leaving a window open all the time in case it dies seems like a less-than-ideal 'solution'.

Quote

October 4, 2014

Author

So, I started a movie, and it got maybe an hour in, then it started stuttering and generally not playing. I looked at the server, and it was responding rather slowly. Here is what was in the log at the time. The last bit is after I spun up all drives, and now the movie seems to be playing fine again.

 /usr/bin/tail -f /var/log/syslog 2>&1
Oct 3 19:28:18 media emhttp: shcmd (254): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service
Oct 3 19:28:18 media avahi-daemon[3364]: Files changed, reloading.
Oct 3 19:28:18 media avahi-daemon[3364]: Service group file /services/smb.service changed, reloading.
Oct 3 19:28:18 media emhttp: shcmd (255): pidof rpc.mountd &> /dev/null
Oct 3 19:28:18 media emhttp: Restart NFS...
Oct 3 19:28:18 media emhttp: shcmd (256): exportfs -ra |& logger
Oct 3 19:28:18 media emhttp: shcmd (257): /etc/rc.d/rc.atalk status
Oct 3 19:28:18 media emhttp: shcmd (258): /usr/local/sbin/emhttp_event svcs_restarted
Oct 3 19:28:18 media emhttp_event: svcs_restarted
Oct 3 19:28:19 media avahi-daemon[3364]: Service "media" (/services/smb.service) successfully established.
Oct 3 19:33:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 19:43:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 19:53:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:03:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:05:44 media kernel: ata9.00: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Oct 3 20:05:44 media kernel: ata9.00: irq_stat 0x00400040, connection status changed
Oct 3 20:05:44 media kernel: ata9: SError: { HostInt PHYRdyChg 10B8B DevExch }
Oct 3 20:05:44 media kernel: ata9.00: failed command: FLUSH CACHE EXT
Oct 3 20:05:44 media kernel: ata9.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 19
Oct 3 20:05:44 media kernel: res 40/00:90:c9:51:21/00:00:1d:00:00/40 Emask 0x50 (ATA bus error)
Oct 3 20:05:44 media kernel: ata9.00: status: { DRDY }
Oct 3 20:05:44 media kernel: ata9: hard resetting link
Oct 3 20:05:46 media kernel: ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Oct 3 20:05:46 media kernel: ata9.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Oct 3 20:05:46 media kernel: ata9.00: configured for UDMA/33
Oct 3 20:05:46 media kernel: ata9.00: retrying FLUSH 0xea Emask 0x50
Oct 3 20:05:46 media kernel: ata9: EH complete
Oct 3 20:13:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:17:21 media kernel: ata9: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Oct 3 20:17:21 media kernel: ata9: irq_stat 0x00400040, connection status changed
Oct 3 20:17:21 media kernel: ata9: SError: { HostInt PHYRdyChg 10B8B DevExch }
Oct 3 20:17:21 media kernel: ata9: hard resetting link
Oct 3 20:17:24 media kernel: ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Oct 3 20:17:24 media kernel: ata9.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Oct 3 20:17:24 media kernel: ata9.00: configured for UDMA/33
Oct 3 20:17:24 media kernel: ata9: EH complete
Oct 3 20:23:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:33:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:43:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 20:53:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 21:01:37 media kernel: mdcmd (38): spindown 6
Oct 3 21:01:38 media kernel: mdcmd (39): spindown 7
Oct 3 21:01:49 media kernel: mdcmd (40): spindown 2
Oct 3 21:01:50 media kernel: mdcmd (41): spindown 3
Oct 3 21:01:50 media kernel: mdcmd (42): spindown 8
Oct 3 21:01:51 media kernel: mdcmd (43): spindown 10
Oct 3 21:03:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 21:13:38 media apcupsd[1855]: Communications with UPS lost.
Oct 3 21:14:10 media emhttp: Spinning up all drives...
Oct 3 21:14:10 media emhttp: shcmd (259): /usr/sbin/hdparm -S0 /dev/sdh &> /dev/null
Oct 3 21:14:10 media kernel: mdcmd (44): spinup 0
Oct 3 21:14:10 media kernel: mdcmd (45): spinup 1
Oct 3 21:14:10 media kernel: mdcmd (46): spinup 2
Oct 3 21:14:10 media kernel: mdcmd (47): spinup 3
Oct 3 21:14:10 media kernel: mdcmd (48): spinup 4
Oct 3 21:14:10 media kernel: mdcmd (49): spinup 6
Oct 3 21:14:10 media kernel: mdcmd (50): spinup 7
Oct 3 21:14:10 media kernel: mdcmd (51): spinup 8
Oct 3 21:14:10 media kernel: mdcmd (52): spinup 9
Oct 3 21:14:10 media kernel: mdcmd (53): spinup 10
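
For a long capture like this one, a small shell helper can tally the kernel's ATA exception events per port, which makes a misbehaving link stand out. This is only a sketch: the function name `count_ata_exceptions` is made up for illustration, and it assumes the log uses the `ataN: exception` and `ataN.00: exception` wording seen in the excerpt above.

```shell
#!/bin/sh
# Tally libata "exception" events per ATA port in a saved syslog.
# count_ata_exceptions is a hypothetical helper, not an unRAID command.
count_ata_exceptions() {
    local logfile="$1"
    # Pull out only the "ataN: exception" / "ataN.00: exception" tokens,
    # reduce each one to its port name, then count occurrences per port.
    grep -oE 'ata[0-9]+(\.[0-9]+)?: exception' "$logfile" \
        | sed -E 's/^(ata[0-9]+).*/\1/' \
        | sort | uniq -c | awk '{print $2, $1}'
}
```

Called as `count_ata_exceptions /var/log/syslog`, it prints one line per port in the form `port count`. On the excerpt above it would count two exception events for ata9 (at 20:05:44 and 20:17:21).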

Quote

October 4, 2014

Community Expert

Although it might be nice for log files to survive a reboot the only place such a file could be logged to is the USB flash drive, and this could drastically shorten its lifetime.

It is easy enough to add a line to the go file to get the log persisted to the flash drive when it is really needed, if you are prepared to take the potential hit on its lifetime.
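
As a rough sketch of what such a go-file addition could look like (an assumption, not a tested unRAID recipe): the function names, the five-minute interval, and the `syslog-latest.txt` file name are all illustrative, and `/boot/logs` is used only as an example folder on the flash drive. Copying the log at an interval, rather than streaming every line, limits writes to the flash at the cost of losing the last few minutes before a crash.

```shell
#!/bin/sh
# Periodically copy the in-memory syslog to persistent storage.
# Both functions and all defaults below are hypothetical examples.

# Copy the log once, overwriting a single destination file so the
# flash drive sees one file rewrite per interval.
snapshot_syslog() {
    local src="$1" dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/syslog-latest.txt"
}

# Repeat the snapshot forever; intended to be run in the background.
persist_syslog_loop() {
    local src="${1:-/var/log/syslog}"
    local dest_dir="${2:-/boot/logs}"
    local interval="${3:-300}"   # seconds between copies (assumed value)
    while true; do
        snapshot_syslog "$src" "$dest_dir"
        sleep "$interval"
    done
}
```

With these definitions placed in (or sourced from) the go file, the loop would be started with `persist_syslog_loop &`. A shorter interval captures more of the run-up to a crash but writes to the flash more often.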

Quote

October 4, 2014

Author

So that's not a great solution, then. Is there any reason it can't be logged to the cache drive or an array drive, as an option? Perhaps only when 'persistent logging' is selected by the user because of a problematic system?

Quote

October 4, 2014

Install the powerdown plugin and use it to shut down your system from the console or a telnet session when your server locks up. It saves the syslog to the flash drive when it shuts down and keeps a history of logs.

The powerdown plugin is available here: http://lime-technology.com/forum/index.php?topic=31735.0

Quote

October 4, 2014

Author

Thanks. I have installed it now.

Is there a special command or method I need to use to shut down or restart the server with this, instead of the normal "shutdown -r now" that I try when I do have PuTTY access?

If I can't connect via PuTTY, and can't use the console, hitting the on/off button is my only other option. Will this help me in that situation also?

Quote

October 4, 2014

Use "powerdown" to shutdown.

Use "powerdown -r" to reboot.

Pressing the power button will also run the powerdown script. Ctrl-Alt-Del at the console will also shut down the server and run the powerdown script.

All of these situations should prevent the parity check on restart unless something would not shut down. If you get an unclean shutdown, powerdown will log enough information to help track down the culprit. All of these shutdowns will save a syslog to the flash.
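
If you want one habit that works whether or not the plugin is installed, a tiny helper can pick the command for you. This is a hedged sketch: `choose_shutdown_cmd` is a made-up name, it only prints the command rather than running it, and the fallback to `shutdown -h now` and `shutdown -r now` is an assumption about what you would otherwise type.

```shell
#!/bin/sh
# Print the shutdown command to use, preferring the powerdown plugin.
# choose_shutdown_cmd is a hypothetical helper; it does not shut anything down.
choose_shutdown_cmd() {
    local mode="${1:-halt}"   # "halt" (default) or "reboot"
    if command -v powerdown >/dev/null 2>&1; then
        # Plugin found on PATH: use its script
        if [ "$mode" = "reboot" ]; then echo "powerdown -r"; else echo "powerdown"; fi
    else
        # Plugin missing: fall back to the stock command
        if [ "$mode" = "reboot" ]; then echo "shutdown -r now"; else echo "shutdown -h now"; fi
    fi
}
```

For example, `choose_shutdown_cmd reboot` prints `powerdown -r` when the plugin is on the PATH.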

Quote

October 4, 2014

Author

awesome. thank you again.

Quote

October 4, 2014

If you have a full log, I could see which drive ata9 is. Do you have your motherboard SATA in AHCI or RAID mode?

Quote

October 4, 2014

Author

It's huge. I split it in two.

syslog1.zip

Quote

October 4, 2014

Author

part 2

I believe I have AHCI set in the BIOS for the onboard SATA controller

syslog2.zip

Quote

October 4, 2014

You were able to capture the log with powerdown?

Quote

October 4, 2014

I would check the connections to this drive ST3500830AS_6QG123RD (sdh) and smart reports.

Quote

October 4, 2014

Author

You were able to capture the log with powerdown?

I haven't had to shut down yet, but will do so soon, as I need to swap out a failing drive.

I would check the connections to this drive ST3500830AS_6QG123RD (sdh) and smart reports.

That's my cache drive. I'll check the connections when I swap out the other drive. Thanks for taking a look and giving me advice about a solution.

Quote

October 4, 2014

Author

I used the powerdown command from putty, but I don't see any syslog on the flash drive, other than the one I uploaded earlier.

Also, my cache drive seems to have died. I replaced the cable: not recognized. I moved it to a different slot on my SATA card: not recognized. I replaced the cable with another (a 3rd): not recognized. I moved that to a different slot: not recognized. I moved it to the last available slot: not recognized.

Hard to believe it completely died, and unRAID can't recognize it at all any more, but it seems to be the case.

I decided to mount the new/replacement drive for my red-balled disk5, so that's going to take some time to rebuild before I can try anything else.

Quote

October 6, 2014

I used the powerdown command from putty, but I don't see any syslog on the flash drive, other than the one I uploaded earlier.

Unless LOGDIR is set, Powerdown saves syslogs to the logs folder of the flash drive (i.e. /boot/logs).
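
To find the most recent saved log quickly, something like the following could be used. It is a minimal sketch under stated assumptions: `latest_saved_syslog` is a made-up helper, `LOGDIR` is treated here as an ordinary shell variable (how powerdown itself reads it is not covered), and the saved files are assumed to be named `syslog-YYYYMMDD-HHMMSS.*`, matching the attachment names posted in this topic.

```shell
#!/bin/sh
# Print the newest saved syslog in the powerdown log folder.
# latest_saved_syslog is a hypothetical helper; the naming pattern is assumed.
latest_saved_syslog() {
    local dir="${1:-${LOGDIR:-/boot/logs}}"
    # The timestamp in each name sorts chronologically as plain text,
    # so the last entry after sorting is the most recent log.
    for f in "$dir"/syslog-*; do
        if [ -e "$f" ]; then printf '%s\n' "$f"; fi
    done | sort | tail -n 1
}
```

Running `latest_saved_syslog` with no argument looks in `/boot/logs` unless `LOGDIR` is set in the shell; passing a folder as the first argument overrides both.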

You have certainly had more than your share of hardware issues!

Quote

October 6, 2014

Author

Ah, yes, there they are, in the /logs/ folder. Thanks for that.

Should I be concerned about ata1?

Oct  4 15:09:24 media emhttp: shcmd (158): /sbin/poweroff
Oct  4 15:09:24 media shutdown[3279]: shutting down for system halt
Oct  4 15:09:24 media init: Switching to runlevel: 0
Oct  4 15:09:24 media kernel: ata1: SATA link down (SStatus 0 SControl 300)
Oct  4 15:09:24 media kernel: ata1.00: link offline, clearing class 1 to NONE
Oct  4 15:09:24 media kernel: ata1: EH complete
Oct  4 15:09:24 media kernel: sd 0:0:0:0: [sdm] START_STOP FAILED
Oct  4 15:09:24 media kernel: sd 0:0:0:0: [sdm]  
Oct  4 15:09:24 media kernel: Result: hostbyte=0x04 driverbyte=0x00
Oct  4 15:09:24 media kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Oct  4 15:09:24 media kernel: ata1: edma_err_cause=00000010 pp_flags=00000000, dev connect
Oct  4 15:09:24 media kernel: ata1: SError: { PHYRdyChg DevExch }
Oct  4 15:09:24 media kernel: ata1: limiting SATA link speed to 1.5 Gbps
Oct  4 15:09:24 media kernel: ata1: hard resetting link
Oct  4 15:09:25 media kernel: ata1: SATA link down (SStatus 0 SControl 310)
Oct  4 15:09:25 media kernel: ata1.00: link offline, clearing class 1 to NONE
Oct  4 15:09:25 media kernel: ata1: EH complete
Oct  4 15:09:25 media kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Oct  4 15:09:25 media kernel: ata1: edma_err_cause=00000010 pp_flags=00000000, dev connect
Oct  4 15:09:25 media kernel: ata1: SError: { PHYRdyChg DevExch }
Oct  4 15:09:25 media kernel: ata1: limiting SATA link speed to 1.5 Gbps
Oct  4 15:09:25 media kernel: ata1: hard resetting link
Oct  4 15:09:27 media kernel: ata1: SATA link down (SStatus 0 SControl 310)
Oct  4 15:09:27 media kernel: ata1.00: link offline, clearing class 1 to NONE
Oct  4 15:09:27 media kernel: ata1: EH complete
Oct  4 15:09:27 media kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct  4 15:09:27 media kernel: ata1: edma_err_cause=00000020 pp_flags=00000000, SError=00000000
Oct  4 15:09:27 media kernel: ata1: hard resetting link
Oct  4 15:09:29 media kernel: ata1: SATA link down (SStatus 0 SControl 310)
Oct  4 15:09:29 media kernel: ata1.00: link offline, clearing class 1 to NONE
Oct  4 15:09:29 media kernel: ata1: EH complete
Oct  4 15:09:29 media kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct  4 15:09:29 media kernel: ata1: edma_err_cause=00000020 pp_flags=00000000, SError=00000000
Oct  4 15:09:29 media kernel: ata1: hard resetting link
Oct  4 15:09:30 media kernel: ata1: SATA link down (SStatus 0 SControl 310)
Oct  4 15:09:30 media kernel: ata1.00: link offline, clearing class 1 to NONE
Oct  4 15:09:30 media kernel: ata1: EH complete
Oct  4 15:09:30 media kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Oct  4 15:09:30 media kernel: ata1: edma_err_cause=00000010 pp_flags=00000000, dev connect
Oct  4 15:09:30 media kernel: ata1: SError: { PHYRdyChg DevExch }
Oct  4 15:09:30 media kernel: ata1: limiting SATA link speed to 1.5 Gbps
Oct  4 15:09:30 media kernel: ata1: hard resetting link
Oct  4 15:09:31 media rc.unRAID[3288][3289]: Powerdown V2.12
Oct  4 15:09:31 media rc.unRAID[3288][3294]: Array is Stopped

Quote

October 6, 2014

I would check and swap the cables. Check for BIOS updates. If you text search your log, you should be able to figure out which drive ata1 is.
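
One way to do that text search from the shell is sketched below. The helper names are made up, and the model lookup assumes the kernel logged its usual detection line at boot in the form `ataN.00: ATA-<version>: <model>, <firmware>, ...`; if your syslog does not contain such a line for the port, the lookup prints nothing.

```shell
#!/bin/sh
# Search a syslog for everything the kernel said about one ATA port.
# Both helpers are hypothetical; pass the port number and the log file.

# All kernel lines for the port ("ata1: ..." and "ata1.00: ..."),
# without also matching ata10, ata11, and so on.
ata_port_lines() {
    local port="$1" logfile="$2"
    grep -E "kernel: ata${port}(\.[0-9]+)?: " "$logfile"
}

# Drive model reported on the port's detection line, if present.
ata_port_model() {
    local port="$1" logfile="$2"
    grep -E "ata${port}\.[0-9]+: ATA-[0-9]+: " "$logfile" \
        | sed -E 's/.*ATA-[0-9]+: ([^,]*),.*/\1/' \
        | sort -u
}
```

For example, `ata_port_model 1 /boot/logs/your-saved-syslog` (file name is a placeholder) prints the model string for ata1, and `ata_port_lines 1 ...` shows the full history for that port.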

Quote

October 6, 2014

Author

Hmmm...

That's my cache drive, which is no longer recognized by unRAID. I've tried 3 cables, and 4 different SATA slots and none can recognize that drive any longer.

It went 'unformatted' when it was formatted as btrfs, but I reformatted it to xfs, and then it was fine for a few days. When I replaced my disk5 and restarted, the cache drive never showed up again. I've tried swapping and rebooting about 8-10 times, but unRAID has never seen it again.

I just finished pre-clearing the disk5 that was giving me problems, and it looks fine, so I'm going to use it to replace a 1TB drive, then was thinking of using the 1TB drive as my cache, but I'd really rather get the cache drive to be recognized again.

I will probably put it in a spare machine and try pre-clearing it (if it gets recognized) and see if it tests okay. If so, maybe I will need to purchase a new SATA cable to put in the server.

I'm so tired of drive problems. I hope to have them all behind me soon!

thanks for looking at my log.

Oh, and I hope the powerdown functionality gets baked into unRAID soon

Quote

October 6, 2014

There must be a lot of other history in the previous sections of the syslog concerning ata1. It appears to be in a strange state at the moment, as it knows there was a drive here, set up as sdm, but cannot find it any more. Normally, there would not be more error messages if the drive is truly gone, but it seems to think something is still there, but it cannot even raise the SATA link, without which it cannot talk to the drive. In addition, the fact that this is on the very first SCSI channel (sd 0:0:0:0) and was assigned the very first ATA channel (ata1), yet did not get a drive ID assignment until sdm seems indicative of a long delay in setting up the drive, and that implies there was trouble. Normally it would get sda or something close to that, not sdm.

I see you just posted, and this is an unrecognized drive, which doesn't surprise me. The drive HAS to establish a SATA link first, then it HAS to respond to IDENTIFY requests, and then the normal SATA communications begin. This drive is probably 'broken'.

Quote

October 6, 2014

Author

interesting. Any way to 'fix' while still in unRAID, or should I try to put it as the only drive in a raw unRAID install in a different machine?

Full log attached. As is a previous log, in case it's useful.

syslog-20141004-150931.zip

syslog-20141004-145544.zip

Quote

October 7, 2014

About the only difference between the 2 syslogs is which port the bad drive was connected to, first port on one and 4th port on the other. For most of its history in either syslog, the drive had no SATA link at all, yet the system knew something was there. On the first port, it did raise the SATA link very briefly (after quite a delay), assigned it to sdm, then the SATA link went back down and the drive was disabled. After the array has fully started and all is quiet, then in both syslogs the drive's SATA link comes up at its slowest speed, with the kernel complaining about very slow response even at that speed. The drive was identified and assigned (or re-assigned) sdm. It stayed up for a little, then other drive activity began and in both syslogs the SATA link was lost. That makes me wonder if you are badly underpowered?

In both syslogs, another drive (sdc, Disk 2, Hitachi_HDS5C3030ALA630_MJ1323YNG1U3PC) also had trouble, at the same point, just as it was being mounted. They were strictly interface errors, not drive issues, so the drive is probably fine. Again, I have to wonder if limited power was to blame.

At one point, when it was struggling to communicate with the bad drive and had just gotten the drive to identify itself, it reported the following:

Oct 4 14:55:16 media kernel: ata4.00: Drive reports diagnostics failure. This may indicate a drive
Oct 4 14:55:16 media kernel: ata4.00: fault or invalid emulation. Contact drive vendor for information.

I seriously doubt you can get a SMART report for the drive, let alone one that looks good. If replacing the power supply with a better, stronger one does not help, then I would not waste another minute on this drive.

Quote

October 7, 2014

Author

hmmm...

Is 650 watts not enough power for a dozen drives? This is my power supply...

Rosewill - Capstone 650W Continuous, Single Rail, 80 PLUS GOLD Active PFC

With all drives spun up, according to the UPS plugin, I'm using 130 watts. It seems that should be okay.

It could be a cable issue, I suppose. Maybe I'll buy some new SATA cables and replace the ones I have in there now. The ones I have are all the free ones they send with hard drives, but they all seem to be of okay quality, as far as I can tell.

thanks for taking such a thorough look at my syslogs for me; I really do appreciate it!!

Quote
