big trouble with my array


caplam


I continue to see SAS errors in the log:

Oct 24 19:37:26 godzilla kernel: sas: Enter sas_scsi_recover_host busy: 9 failed: 9
Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x00000000b6e063ac
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x00000000b6e063ac
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 00000000b6e063ac, old_request == 00000000faf8b36b
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 00000000b6e063ac , old_request == 00000000faf8b36b
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x00000000b6e063ac is done
Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x00000000b6e063ac is done
Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x000000001fb3ea36
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x000000001fb3ea36
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 000000001fb3ea36, old_request == 00000000f191f1ed
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 000000001fb3ea36 , old_request == 00000000f191f1ed
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x000000001fb3ea36 is done
Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x000000001fb3ea36 is done
Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x0000000029adb82b
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x0000000029adb82b
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 0000000029adb82b, old_request == 0000000057d81343
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 0000000029adb82b , old_request == 0000000057d81343
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x0000000029adb82b is done
Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x0000000029adb82b is done
Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x000000001f95191f
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x000000001f95191f
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 000000001f95191f, old_request == 00000000a10e1ea9
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 000000001f95191f , old_request == 00000000a10e1ea9
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x000000001f95191f is done
Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x000000001f95191f is done
Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x000000005c2436ca
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x000000005c2436ca
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 000000005c2436ca, old_request == 000000009da361b3
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 000000005c2436ca , old_request == 000000009da361b3
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x000000005c2436ca is done
Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x000000005c2436ca is done
Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x0000000023c5b3a7
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x0000000023c5b3a7
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 0000000023c5b3a7, old_request == 0000000047e60ca1
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 0000000023c5b3a7 , old_request == 0000000047e60ca1
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x0000000023c5b3a7 is done
Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x0000000023c5b3a7 is done
Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x000000006facca72
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x000000006facca72
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 000000006facca72, old_request == 0000000024adc4d9
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 000000006facca72 , old_request == 0000000024adc4d9
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x000000006facca72 is done
Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x000000006facca72 is done
Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x00000000520d3197
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x00000000520d3197
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 00000000520d3197, old_request == 00000000d6f7823f
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 00000000520d3197 , old_request == 00000000d6f7823f
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x00000000520d3197 is done
Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x00000000520d3197 is done
Oct 24 19:37:26 godzilla kernel: sas: trying to find task 0x0000000039adefa1
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: aborting task 0x0000000039adefa1
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: dev = 00000000ae73c6a0 (STP/SATA), task = 0000000039adefa1, old_request == 000000001462f4c3
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: SATA/STP request or complete_in_target (1), or IDEV_GONE (0), thus no TMF
Oct 24 19:37:26 godzilla kernel: isci 0000:02:00.0: isci_task_abort_task: Done; dev = 00000000ae73c6a0, task = 0000000039adefa1 , old_request == 000000001462f4c3
Oct 24 19:37:26 godzilla kernel: sas: sas_scsi_find_task: task 0x0000000039adefa1 is done
Oct 24 19:37:26 godzilla kernel: sas: sas_eh_handle_sas_errors: task 0x0000000039adefa1 is done
Oct 24 19:37:26 godzilla kernel: sas: ata10: end_device-9:0: cmd error handler
Oct 24 19:37:26 godzilla kernel: sas: ata10: end_device-9:0: dev error handler
Oct 24 19:37:26 godzilla kernel: ata10.00: exception Emask 0x0 SAct 0x3fe00 SErr 0x0 action 0x6 frozen
Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/98:00:38:22:0b/02:00:00:00:00/40 tag 9 ncq dma 339968 in
Oct 24 19:37:26 godzilla kernel:         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/98:00:d0:24:0b/01:00:00:00:00/40 tag 10 ncq dma 208896 in
Oct 24 19:37:26 godzilla kernel:         res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
Oct 24 19:37:26 godzilla kernel: sas: ata14: end_device-9:1: dev error handler
Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/38:00:68:26:0b/02:00:00:00:00/40 tag 11 ncq dma 290816 in
Oct 24 19:37:26 godzilla kernel:         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 24 19:37:26 godzilla kernel: sas: ata15: end_device-9:2: dev error handler
Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/08:00:f8:87:c1/00:00:2a:00:00/40 tag 12 ncq dma 4096 in
Oct 24 19:37:26 godzilla kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/28:00:a0:28:0b/01:00:00:00:00/40 tag 13 ncq dma 151552 in
Oct 24 19:37:26 godzilla kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/70:00:c8:29:0b/00:00:00:00:00/40 tag 14 ncq dma 57344 in
Oct 24 19:37:26 godzilla kernel:         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/20:00:e8:04:62/00:00:5d:01:00/40 tag 15 ncq dma 16384 in
Oct 24 19:37:26 godzilla kernel:         res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/58:00:38:2a:0b/02:00:00:00:00/40 tag 16 ncq dma 307200 in
Oct 24 19:37:26 godzilla kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/50:00:90:2c:0b/00:00:00:00:00/40 tag 17 ncq dma 40960 in
Oct 24 19:37:26 godzilla kernel:         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
Oct 24 19:37:26 godzilla kernel: ata10: hard resetting link
Oct 24 19:37:26 godzilla kernel: ata10.00: configured for UDMA/133
Oct 24 19:37:26 godzilla kernel: ata10: EH complete
Oct 24 19:37:26 godzilla kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 9 tries: 1
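A log like the one above is easier to read once it is tallied. Below is a minimal sketch, assuming the syslog line format shown in this post; the function name `summarize` and the regular expressions are made up for illustration and are not part of any Unraid or kernel tool. It counts failed commands per ATA port, the error reason from the `res` lines, and link resets.

```python
import re
from collections import Counter

# Patterns written against the lines pasted in this thread (assumed format).
# e.g. "kernel: ata10.00: failed command: READ FPDMA QUEUED"
FAILED_CMD = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+?)\s*$")
# e.g. "kernel:         res 40/00:... Emask 0x4 (timeout)"
RESULT = re.compile(r"kernel:\s+res \S+ Emask (0x[0-9a-f]+) \((.+?)\)")
# e.g. "kernel: ata10: hard resetting link"
RESET = re.compile(r"kernel: (ata\d+): hard resetting link")


def summarize(lines):
    """Return three Counters: failed commands, error reasons, link resets."""
    failed, reasons, resets = Counter(), Counter(), Counter()
    for line in lines:
        m = FAILED_CMD.search(line)
        if m:
            failed[(m.group(1), m.group(2))] += 1
            continue
        m = RESULT.search(line)
        if m:
            reasons[m.group(2)] += 1
            continue
        m = RESET.search(line)
        if m:
            resets[m.group(1)] += 1
    return failed, reasons, resets


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
Oct 24 19:37:26 godzilla kernel: ata10.00: failed command: READ FPDMA QUEUED
Oct 24 19:37:26 godzilla kernel: ata10.00: cmd 60/98:00:38:22:0b/02:00:00:00:00/40 tag 9 ncq dma 339968 in
Oct 24 19:37:26 godzilla kernel:         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 24 19:37:26 godzilla kernel: ata10.00: status: { DRDY }
Oct 24 19:37:26 godzilla kernel: ata10: hard resetting link
""".splitlines()

if __name__ == "__main__":
    failed, reasons, resets = summarize(SAMPLE)
    print(failed, reasons, resets)
```

To run it against a whole syslog, pass an open file to `summarize` instead of `SAMPLE`. If every failure is a timeout on a single port, as in this excerpt, that suggests checking that one drive and its cable or backplane slot first, though the counts alone cannot prove the cause.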

 


I don't have access to my server right now.
The rebuild started, but I still had errors on the SAS controllers.
So I rebooted, and a new rebuild started.
Now it has finished and the array is OK. Disk2 is enabled, and I have no more SAS errors.
As I seem to be very unlucky this weekend, the replacement disk is throwing some read error rate warnings.
I get messages that point to the raw read error rate, and a few minutes or hours later it's back to normal.
I am also running a two-pass preclear on the first replacement disk for disk2.
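For anyone who wants to track that attribute over time, here is a rough sketch that parses the attribute table printed by `smartctl -A`. The helper name `parse_smart_attributes` is hypothetical, and the sample row is invented purely to show the column layout; it is not output from the disk in this thread. How the raw value is encoded is vendor-specific, so the normalized VALUE compared against THRESH is usually the more meaningful number.

```python
def parse_smart_attributes(text):
    """Parse `smartctl -A` attribute rows into a dict keyed by attribute name."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = {
                "id": int(fields[0]),
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": " ".join(fields[9:]),
            }
    return attrs


# Invented sample (NOT from the poster's disk), only to illustrate the layout.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       201234567
"""

if __name__ == "__main__":
    rrer = parse_smart_attributes(SAMPLE_OUTPUT)["Raw_Read_Error_Rate"]
    # The attribute is in a failing state only when VALUE drops to THRESH or below.
    print("failing" if rrer["value"] <= rrer["thresh"] else "above threshold")
```

Logging the parsed values every few minutes would show whether the warnings line up with a real drop in the normalized value or only with changes in the raw counter.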


I thought I was fine, but I'm definitely a "black cat" (a French expression for someone who is very unlucky).

Last night the smbd process crashed during a VM backup. This morning some Docker containers and some VMs were unresponsive. I had to kill smbd to be able to stop the array and reboot. It took me an hour. I hope this is a one-time bug (I have never had this before).

