January 13, 2013
Hi everyone, I was running my monthly parity check (without auto-correction) on my unRAID 4.7 server. At some point it seemed to get stuck (I could not reach Main or unMENU). Since I had cache_dirs running in the background, I thought it might be a memory problem, so I did a hard restart after removing cache_dirs from the go script. The server restarts OK and I can access it locally and through SSH, but Main and unMENU do not start (the first time I get a 'wait..' message in my browser, and when I refresh, the page seems to load indefinitely). I also cannot see the server, the shares, or the flash drive from my PC. I can hear the hard drives working, which probably means a parity check is running after the improper restart, as is to be expected. Can you please give me some advice? Thanks a lot...
Attachment: syslog_00-32_14-1-2013.txt
January 13, 2013, Author
"What do you see on the system console?"
I see what I think I am supposed to see: the login prompt, and I can log in normally.
PS: I attached the syslog. I think I see something irregular (and a bit scary). These two messages repeat a lot:
Jan 14 00:14:50 Tower kernel: handle_stripe read error: 1000/6, count: 1
Jan 14 00:14:50 Tower kernel: md: disk6 read error
Jan 14 00:15:21 Tower kernel: sas: --- Exit sas_scsi_recover_host
Jan 14 00:15:52 Tower kernel: sas: command 0xf2069240, task 0xf29cd640, timed out: BLK_EH_NOT_HANDLED
Jan 14 00:15:52 Tower kernel: sas: Enter sas_scsi_recover_host
Jan 14 00:15:52 Tower kernel: sas: trying to find task 0xf29cd640
Jan 14 00:15:52 Tower kernel: sas: sas_scsi_find_task: aborting task 0xf29cd640
Jan 14 00:15:52 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1701:mvs_abort_task:rc= 5
Jan 14 00:15:52 Tower kernel: sas: sas_scsi_find_task: querying task 0xf29cd640
Jan 14 00:15:52 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1645:mvs_query_task:rc= 5
Jan 14 00:15:52 Tower kernel: sas: sas_scsi_find_task: task 0xf29cd640 failed to abort
Jan 14 00:15:52 Tower kernel: sas: task 0xf29cd640 is not at LU: I_T recover
Jan 14 00:15:52 Tower kernel: sas: I_T nexus reset for dev 0400000000000000
Jan 14 00:15:52 Tower kernel: sas: I_T 0400000000000000 recovered
Jan 14 00:15:52 Tower kernel: sas: --- Exit sas_scsi_recover_host
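For anyone wanting to gauge how often these messages show up in a syslog like the attached one, here is a minimal Python sketch that tallies md read errors per disk and counts SAS command timeouts. The function name `summarize_syslog` and both regular expressions are my own assumptions, modeled only on the lines quoted in this post; other kernel or driver versions may word these messages differently.

```python
import re
from collections import Counter

# Both patterns are assumptions based only on the syslog lines quoted in this thread.
MD_READ_ERROR = re.compile(r"kernel: md: (disk\d+) read error")
SAS_TIMEOUT = re.compile(r"kernel: sas: command 0x[0-9a-f]+, task 0x[0-9a-f]+, timed out")


def summarize_syslog(lines):
    """Return (per-disk md read error counts, number of SAS command timeouts)."""
    read_errors = Counter()
    sas_timeouts = 0
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            # Key the tally by the disk name reported by the md driver, e.g. "disk6".
            read_errors[match.group(1)] += 1
        elif SAS_TIMEOUT.search(line):
            sas_timeouts += 1
    return read_errors, sas_timeouts


# Three lines copied from the syslog excerpt in this post.
sample = [
    "Jan 14 00:14:50 Tower kernel: handle_stripe read error: 1000/6, count: 1",
    "Jan 14 00:14:50 Tower kernel: md: disk6 read error",
    "Jan 14 00:15:52 Tower kernel: sas: command 0xf2069240, task 0xf29cd640, "
    "timed out: BLK_EH_NOT_HANDLED",
]
errors, timeouts = summarize_syslog(sample)
print(dict(errors), timeouts)  # prints: {'disk6': 1} 1
```

To run it on a full log, pass an open file object instead of the sample list; a file iterates line by line, so `summarize_syslog(open_file)` works unchanged. The counts only show how frequent the messages are; they do not by themselves say whether the disk, the cable, or the controller is at fault.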
January 13, 2013, Author
On my fourth or fifth restart attempt, everything started fine. I really have no clue why; I changed nothing. Now I am wondering whether I should run the parity check, in case the same thing happens again...
January 16, 2013, Author
"Does disk6 have a non-zero error count on unRAID main?"
No, disk 6 shows no errors. Unfortunately, the parity check reported 866 errors, but I think they might be due to the repeated hard restarts. (Sorry for taking so long to respond while you are trying to help; too much work this week.)
This topic is now archived and is closed to further replies.