Tons of write errors suddenly on multiple drives


I've been running these drives (and the rest of the hardware) for well over a couple of years now. I suddenly started getting tons of write errors, about equally on 2 drives.

This probably isn't the hard drives then, right? Should I be looking at the controller or the cables first?

write_errors.png


It appears to be a problem with the RocketRAID controller: there were timeouts on several, if not all, of the disks. For example:

Mar 31 04:28:12 Storinator kernel: r750:[01:00 00] Start Soft Reset for 0/0
Mar 31 04:28:12 Storinator kernel: r750:[01:00 03] Start Soft Reset for 0/3
Mar 31 04:28:12 Storinator kernel: r750:[01:00 16] Start Soft Reset for 0/4
Mar 31 04:28:13 Storinator kernel: r750:[01:00 P3] Asyn Notification Received
Mar 31 04:28:21 Storinator kernel: r750:[01:00 13] Device request(1c) timeout.
Mar 31 04:28:21 Storinator kernel: r750:HIM_EVENT_DEVICE_TIMEOUT vd 00000000f0b5df3b
Mar 31 04:28:21 Storinator kernel: r750:[        ] Cdb [88, 0, 0, 0,  0, 0, 3,67, 87,10, 0, 0,  0,80, 0, 0].
Mar 31 04:28:21 Storinator kernel: r750:[        ] H2D FIS(Slot:1c): 00258127 40678710 00000003 00000080
Mar 31 04:28:23 Storinator kernel: r750:[01:00 P3] Asyn Notification Received
Mar 31 04:28:23 Storinator kernel: r750:[01:00 P3] GSCR changed
Mar 31 04:28:33 Storinator kernel: r750:[01:00 12] Start Soft Reset for 3/0
Mar 31 04:28:33 Storinator kernel: r750:[01:00 13] Start Soft Reset for 3/1
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Mar 31 04:28:34 Storinator kernel: r750:[01:00 P3] Asyn Notification Received
Mar 31 04:28:42 Storinator kernel: r750:[01:00 13] Device request(39) timeout.
Mar 31 04:28:42 Storinator kernel: r750:HIM_EVENT_DEVICE_TIMEOUT vd 00000000f0b5df3b
Mar 31 04:28:42 Storinator kernel: r750:[        ] Cdb [88, 0, 0, 0,  0, 0, 3,67, c7,38, 0, 0,  0,80, 0, 0].
Mar 31 04:28:42 Storinator kernel: r750:[        ] H2D FIS(Slot:39): 00258127 4067c738 00000003 00000080
Mar 31 04:28:44 Storinator kernel: r750:[01:00 P3] Asyn Notification Received
Mar 31 04:28:44 Storinator kernel: r750:[01:00 P3] GSCR changed
Mar 31 04:28:53 Storinator kernel: r750:[01:00 12] Start Soft Reset for 3/0
Mar 31 04:28:53 Storinator kernel: r750:[01:00 13] Start Soft Reset for 3/1

And those two ended up being dropped:

Mar 31 04:35:22 Storinator kernel: r750:[01:00 09] Reset Phase 2 failed for 1/1
Mar 31 04:35:22 Storinator kernel: r750:[01:00 09] disk removed (0).
Mar 31 04:35:22 Storinator kernel: r750:[01:00 10] Start Soft Reset for 2/2
Mar 31 04:35:22 Storinator kernel: r750:[01:00 10] Request failed. Error information 0x90800000
Mar 31 04:35:22 Storinator kernel: r750:[        ] H2D FIS: 00000227 00000000 00000000 04000000
Mar 31 04:35:22 Storinator kernel: r750:[01:00 10] Reset Phase 2 failed for 2/1
Mar 31 04:35:22 Storinator kernel: r750:[01:00 10] disk removed (0).

A reboot should fix it for now, though you'll need to rebuild the disks. If it happens again, you might want to consider replacing the controller with one of the recommended LSI HBAs.
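If you want to check whether the timeouts are spread across the controller or concentrated on a few ports, a rough tally of the r750 lines in the syslog can help. The sketch below is only an illustration: the function name `summarize` is made up here, the patterns are taken from the excerpts quoted above, and the meaning of the third field inside the brackets (used here as a device tag) is an assumption. Other driver versions may log differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... kernel: r750:[01:00 13] Device request(1c) timeout.
# Group 1 is the third bracket field, treated here as a device tag (assumption);
# group 2 is the rest of the message.
R750_RE = re.compile(r"r750:\[\d{2}:\d{2} ([0-9A-Za-z]{2})\] (.*)$")


def summarize(lines):
    """Count timeouts and soft resets per device tag and list removed disks."""
    timeouts = Counter()
    resets = Counter()
    removed = []
    for line in lines:
        match = R750_RE.search(line)
        if match is None:
            continue  # not an r750 line with a device tag (e.g. Cdb / FIS dumps)
        tag, message = match.group(1), match.group(2)
        if "timeout" in message:
            timeouts[tag] += 1
        elif message.startswith("Start Soft Reset"):
            resets[tag] += 1
        elif message.startswith("disk removed"):
            removed.append(tag)
    return timeouts, resets, removed


# A few lines copied from the log excerpts above.
SAMPLE = [
    "Mar 31 04:28:21 Storinator kernel: r750:[01:00 13] Device request(1c) timeout.",
    "Mar 31 04:28:33 Storinator kernel: r750:[01:00 12] Start Soft Reset for 3/0",
    "Mar 31 04:28:33 Storinator kernel: r750:[01:00 13] Start Soft Reset for 3/1",
    "Mar 31 04:28:42 Storinator kernel: r750:[01:00 13] Device request(39) timeout.",
    "Mar 31 04:35:22 Storinator kernel: r750:[01:00 09] disk removed (0).",
    "Mar 31 04:35:22 Storinator kernel: r750:[01:00 10] disk removed (0).",
]

if __name__ == "__main__":
    timeouts, resets, removed = summarize(SAMPLE)
    print("timeouts per tag:", dict(timeouts))
    print("soft resets per tag:", dict(resets))
    print("removed:", removed)
```

To run it against a full log, pass an open file instead of `SAMPLE`, for example `summarize(open("syslog.txt"))`, where `syslog.txt` is a placeholder for wherever your saved syslog lives. Resets and timeouts across many different tags would point toward the controller, while errors confined to one or two tags would make a cable or port more likely.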


I figured it wasn't the drives, but replacing an $800 controller is a much more costly venture; I'd rather it be bad drives. It's been working fine since 2014, and it was the only one I could find that supported all the ports I needed. Possibly one of the ports failed, then. I'll reboot and test. Thank you.

Edited by djvj