July 18, 2017 Came home tonight to find that my new build server had rebooted and started a parity check. Actually, I stopped the parity check before I realized the server had rebooted, so I have since restarted it. I'm not sure why it rebooted, but I am using a SAS2LP controller, which gave me problems in my other server until I stopped using it and went to a Dell H310. I may have to do the same with this new build, but in the meantime I've attached my diagnostics. If someone could be so kind as to take a peek and share their thoughts, I'd appreciate it. movies-diagnostics-20170717-2203.zip
July 18, 2017 Community Expert Define 'new'. All new components, a recycle of used parts from inventory, or a mix of the two?
July 18, 2017 Author All new components except for some of the drives, which were recycled out of an existing Synology NAS, and the HBA, which had been used in previous servers.
July 18, 2017 Community Expert How long had the server been running before the reboot? How many hours are currently on the new components? Do you have a UPS on it? Is this by any chance one of the server-type motherboards? (If so, did you check what the restart options are on power restore?) As soon as the parity check is done, I would have a good look at the connectors and make sure that they are all firmly seated on the new motherboard.
July 18, 2017 Author The server had been running fine for seven or eight days before the reboot. I don't currently have a UPS connected to it, and I'm not sure what the BIOS power-restore settings are. The drives that came out of the Synology have a little over two years on them; the rest are new, and apart from the HBA, everything else is brand new as well. Have you had a peek at the logs?
July 18, 2017 Community Expert I got a chance to look at your syslog and I see several instances of this error near the end of the file:

Jul 17 01:43:23 Movies kernel: ata4.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen
Jul 17 01:43:23 Movies kernel: ata4.00: irq_stat 0x08000000, interface fatal error
Jul 17 01:43:23 Movies kernel: ata4: SError: { UnrecovData HostInt 10B8B BadCRC }
Jul 17 01:43:23 Movies kernel: ata4.00: failed command: READ DMA EXT
Jul 17 01:43:23 Movies kernel: ata4.00: cmd 25/00:40:18:93:67/00:05:00:00:00/e0 tag 14 dma 688128 in
Jul 17 01:43:23 Movies kernel: res 50/00:00:18:93:67/00:00:00:00:00/e0 Emask 0x50 (ATA bus error)
Jul 17 01:43:23 Movies kernel: ata4.00: status: { DRDY }
Jul 17 01:43:23 Movies kernel: ata4: hard resetting link
Jul 17 01:43:23 Movies kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 17 01:43:23 Movies kernel: ata4.00: configured for UDMA/133

Exactly what it means or what action is required, I don't know. It might be that SAS2LP controller. I am hoping that one of the real gurus will jump in at this point and give you some solid information about what is happening. You might also provide a bit more detail about when and why this diagnostics file was captured. Was it taken just before the reboot, or is it the syslog to date after the reboot? My gut feeling is that you have some sort of hardware-related error, as most software errors tend to just lock up the server.
July 18, 2017 Community Expert CRC errors are usually caused by a bad SATA cable. ATA4 is disk3, connected to the onboard controller.
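Editor's sketch: to see at a glance which ports are reporting link-level errors such as BadCRC, the SError lines in a syslog can be tallied per ATA port. The snippet below is a minimal illustration in Python, not a tool from the thread. The function name `tally_serror_flags` and the regular expression are assumptions written against the log format quoted above, and the mapping from an `ataN` port to a particular disk or controller still has to be taken from the diagnostics.

```python
import re
from collections import Counter, defaultdict

# Assumed pattern, based on the quoted syslog, e.g.
# "... kernel: ata4: SError: { UnrecovData HostInt 10B8B BadCRC }"
SERROR_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: SError: \{ ([^}]*)\}")


def tally_serror_flags(lines):
    """Return {port: Counter(flag -> occurrences)} over all SError lines."""
    tally = defaultdict(Counter)
    for line in lines:
        match = SERROR_RE.search(line)
        if match:
            port, flags = match.groups()
            tally[port].update(flags.split())
    return tally


# Three lines copied from the syslog excerpt in this thread.
sample = [
    "Jul 17 01:43:23 Movies kernel: ata4.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen",
    "Jul 17 01:43:23 Movies kernel: ata4: SError: { UnrecovData HostInt 10B8B BadCRC }",
    "Jul 17 01:43:23 Movies kernel: ata4: hard resetting link",
]

result = tally_serror_flags(sample)
for port, flags in sorted(result.items()):
    print(port, dict(flags))
```

Run against the full syslog (for example, by passing an open file object instead of `sample`), a port that keeps accumulating BadCRC counts is the first candidate for a cable or connector check.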
July 18, 2017 Author Which is interesting, because I don't have any drives connected to onboard controllers. Everything is connected to the SAS2LP controller.
July 18, 2017 Author The log was from after the reboot. The server has been running the parity check for the past sixteen and a half hours and has found and corrected 15 sync errors so far, with two and a quarter hours left before it's done.
July 18, 2017 Community Expert 13 minutes ago, ashman70 said: Which is interesting because I don't have any drives connected to onboard controllers. Everything is connected to the SAS2LP controller.

Disk 3 and disk 4 are connected to the onboard controller, or they were when the diagnostics were saved.
July 18, 2017 Author Nope, I've never had disks connected to onboard controllers in this server, ever.
July 18, 2017 Community Expert Then you posted the wrong diagnostics: the server in them has 10 disks total, 8 on the SAS2LP and 2 onboard.
July 18, 2017 Author You are right, I'm sorry. I'm having a bad day or something. Yes, there are 10 disks in the server: 8 connected to the SAS2LP and 2 connected to onboard SATA ports.
July 20, 2017 Author So I am running Fix Common Problems in troubleshooting mode. I just looked at the logs and saw these entries from this morning. To me it looks like HBA trouble?

Jul 20 09:57:56 Movies kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Jul 20 09:57:56 Movies kernel: sas: ata9: end_device-1:2: cmd error handler
Jul 20 09:57:56 Movies kernel: sas: ata7: end_device-1:0: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata8: end_device-1:1: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata9: end_device-1:2: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata10: end_device-1:3: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata11: end_device-1:4: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata12: end_device-1:5: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata13: end_device-1:6: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata14: end_device-1:7: dev error handler
Jul 20 09:57:56 Movies kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
Jul 20 09:57:56 Movies kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Jul 20 09:57:56 Movies kernel: sas: ata9: end_device-1:2: cmd error handler
Jul 20 09:57:56 Movies kernel: sas: ata7: end_device-1:0: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata8: end_device-1:1: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata9: end_device-1:2: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata10: end_device-1:3: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata11: end_device-1:4: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata12: end_device-1:5: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata13: end_device-1:6: dev error handler
Jul 20 09:57:56 Movies kernel: sas: ata14: end_device-1:7: dev error handler
Jul 20 09:57:56 Movies kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
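Editor's sketch: when these recovery passes repeat many times, it can help to count how often each port appears on a `cmd error handler` line versus a `dev error handler` line. The Python below is a rough illustration only. The name `summarize_sas_handlers` and the regular expression are assumptions based on the lines quoted above, and reading the `cmd error handler` port as the one whose command failed is an interpretation that the thread does not confirm.

```python
import re
from collections import Counter

# Assumed pattern, based on the quoted syslog, e.g.
# "... kernel: sas: ata9: end_device-1:2: cmd error handler"
HANDLER_RE = re.compile(
    r"kernel: sas: (ata\d+): (end_device-\d+:\d+): (cmd|dev) error handler"
)


def summarize_sas_handlers(lines):
    """Count (port, device, kind) triples, where kind is 'cmd' or 'dev'."""
    counts = Counter()
    for line in lines:
        match = HANDLER_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# A few lines copied from the first recovery pass quoted in this thread.
sample = [
    "Jul 20 09:57:56 Movies kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1",
    "Jul 20 09:57:56 Movies kernel: sas: ata9: end_device-1:2: cmd error handler",
    "Jul 20 09:57:56 Movies kernel: sas: ata7: end_device-1:0: dev error handler",
    "Jul 20 09:57:56 Movies kernel: sas: ata9: end_device-1:2: dev error handler",
]

counts = summarize_sas_handlers(sample)
# Ports that appear on a "cmd error handler" line at least once.
cmd_ports = sorted({port for (port, _device, kind) in counts if kind == "cmd"})
print(cmd_ports)
```

If the same port keeps turning up in `cmd_ports` across many passes, that points toward one drive or cable; if the port varies, the controller itself becomes the more likely suspect. Either reading is a starting point for checks, not a diagnosis.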