deltaexray Posted August 20, 2021 So, last night I played a couple of movies through Plex until this error message popped up:
Aug 19 21:47:21 Server kernel: blk_update_request: I/O error, dev sdc, sector 173824 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
Is this more or less the start of a failing disk, or what could be happening here?
trurl Posted August 20, 2021 Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
deltaexray Posted August 20, 2021 I forgot that this is the best way to get answers, oops. Here you go: server-diagnostics-20210820-2052.zip EDIT: The drive that spat out these errors has no self-reported SMART errors, even after the extended self-test that I had running through the day. Doesn't surprise me, though. Edited August 20, 2021 by deltaexray
trurl Posted August 20, 2021 According to diagnostics the self-test was still running. Did it complete?
Aug 19 21:47:17 Server kernel: sd 7:0:1:0: attempting task abort!scmd(0x000000003ae4e576), outstanding for 15123 ms & timeout 15000 ms
Aug 19 21:47:17 Server kernel: sd 7:0:1:0: [sdc] tag#6804 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Aug 19 21:47:17 Server kernel: scsi target7:0:1: handle(0x0009), sas_address(0x4433221105000000), phy(5)
Aug 19 21:47:17 Server kernel: scsi target7:0:1: enclosure logical id(0x500605b006da4650), slot(6)
Aug 19 21:47:21 Server kernel: sd 7:0:1:0: task abort: SUCCESS scmd(0x000000003ae4e576)
Aug 19 21:47:21 Server kernel: sd 7:0:1:0: [sdc] tag#3034 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=19s
Aug 19 21:47:21 Server kernel: sd 7:0:1:0: [sdc] tag#3034 Sense Key : 0x2 [current]
Aug 19 21:47:21 Server kernel: sd 7:0:1:0: [sdc] tag#3034 ASC=0x4 ASCQ=0x0
Aug 19 21:47:21 Server kernel: sd 7:0:1:0: [sdc] tag#3034 CDB: opcode=0x88 88 00 00 00 00 00 00 02 a7 00 00 00 00 20 00 00
Aug 19 21:47:21 Server kernel: blk_update_request: I/O error, dev sdc, sector 173824 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
Aug 19 21:47:21 Server kernel: md: disk3 read error, sector=173760
Aug 19 21:47:21 Server kernel: md: disk3 read error, sector=173768
Aug 19 21:47:21 Server kernel: md: disk3 read error, sector=173776
Aug 19 21:47:21 Server kernel: md: disk3 read error, sector=173784
Maybe a connection problem.
Aug 19 17:35:09 Server root: Fix Common Problems: Other Warning: Background notifications not enabled
You should set up Notifications to alert you immediately by email or other agent as soon as a problem is detected.
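A note on reading the log above: opcode 0x88 is the SCSI READ(16) command, whose 16-byte CDB carries the starting block address in bytes 2-9 and the block count in bytes 10-13. The minimal Python sketch below decodes the CDB copied from the log to show where "sector 173824" comes from. The function name `decode_read16` and the small `SENSE_KEYS` table are illustrative, not part of any Unraid or kernel tool.

```python
# Sketch: decode the READ(16) CDB bytes copied from the syslog excerpt.
# decode_read16 and SENSE_KEYS are hypothetical helper names for illustration.

SENSE_KEYS = {0x2: "NOT READY", 0x3: "MEDIUM ERROR"}  # two standard SCSI sense keys


def decode_read16(cdb_hex):
    """Return (start_lba, block_count) for a READ(16) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    count = int.from_bytes(cdb[10:14], "big")   # bytes 10-13: transfer length
    return lba, count


CDB = "88 00 00 00 00 00 00 02 a7 00 00 00 00 20 00 00"
lba, blocks = decode_read16(CDB)
print(lba, blocks)            # 173824 32
print(SENSE_KEYS[0x2])        # NOT READY (the sense key reported in the log)
print(lba - 173760)           # 64: offset between the sdc sector and the first md sector
```

So the failed command was a read of 32 blocks starting at block 173824, which matches the four md read errors at 8-sector steps. The 64-sector difference between the sdc and md numbers is consistent with md counting from the start of a partition, though that is an inference from the numbers, not something the log states. The sense key is 0x2 (NOT READY) rather than 0x3 (MEDIUM ERROR), which fits a drive that did not respond in time better than a bad spot on the platter.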
deltaexray Posted August 20, 2021 It did finish, without errors. Everything was good. Well, I'd like to hope that this was only a connection issue. But on the other hand, the HBA in that server works like a champ. And if it really is an issue like that, why is it only one drive when all of them are on the same channel on the HBA? I'm just curious because of the word "sector" in the error message, which could point to the drive, if I'm not wrong? Yeah, I still need to set up notifications, but I'm always confused about how to set them up. I do need to do a ton of other stuff, like security and so on.
deltaexray Posted August 20, 2021 That's the SMART report of the drive that has these issues. Maybe take a look at this, maybe it will say something: server-smart-20210820-2225.zip
trurl Posted August 20, 2021 The SMART report for all attached disks is already included in Diagnostics. That one looks like the earlier one except the test completed and passed. The disk isn't even disabled according to those Diagnostics. You can reset the error count on Main at Array Operation - Clear Stats, or by rebooting. I guess you could do a non-correcting parity check as a further test.
deltaexray Posted August 21, 2021 Yeah, after I posted it, I also realized it was already included. I want to find out what caused it, so I'm going to do a parity check without writing corrections to the array. We'll see what happens afterwards, I guess.
deltaexray Posted August 21, 2021 Quick question: Is it normal that the speed of the parity check drops over time? Like by a good amount?
itimpi Posted August 21, 2021 Yes - the inner tracks on the drives always run significantly slower than the outer ones.
deltaexray Posted August 21, 2021 I was just curious because my check dropped from 270 mbits to 130 mbits during the runtime of the check. But yeah, that makes way more sense. EDIT: No errors were found during the check, and the average speed was the same as before too. So, I guess it was a random I/O error then? Edited August 21, 2021 by deltaexray
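A rough sanity check of the inner-track explanation, using only the two speeds quoted above (units as reported in the post). The sketch assumes a constant rotation rate and roughly constant linear bit density, so that sequential throughput scales approximately with track radius. That proportionality is a simplifying assumption, not a measured property of these drives, and `speed_ratio` is just an illustrative helper name.

```python
# Sketch: compare end-of-check speed to start-of-check speed.
# Assumption: throughput is roughly proportional to track radius, so this
# ratio approximates inner radius / outer radius of the recorded area.

def speed_ratio(start_speed, end_speed):
    """Return end/start as a fraction; the units cancel out."""
    return end_speed / start_speed


ratio = speed_ratio(270.0, 130.0)
print(f"end/start speed ratio: {ratio:.2f}")  # end/start speed ratio: 0.48
```

A drop to roughly half the starting speed by the end of the check is therefore in line with the geometry of a spinning disk and is not, on its own, a sign of trouble.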
deltaexray Posted August 22, 2021 Well. I just had the same 4 errors on my other array disk. I'm pretty sure it can only be something with the HBA or the system itself. Otherwise it wouldn't make any sense.
deltaexray Posted August 23, 2021 So, now all of the disks have errors. Does anyone have an idea what could cause this, or do I need to order different hardware? I can't really imagine that all the drives are giving up the ghost now. server-diagnostics-20210823-1150.zip
itimpi Posted August 23, 2021 The syslog shows CRC errors followed by a device reset. This suggests either a power/SATA cabling issue, or perhaps a general power issue?
JorgeB Posted August 23, 2021
Aug 23 10:35:11 Server kernel: sd 7:0:0:0: Power-on or device reset occurred
Aug 23 10:35:11 Server kernel: sd 7:0:2:0: Power-on or device reset occurred
This is happening to multiple devices; it's usually a power/connection problem.
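To see at a glance whether resets hit one device or several, the syslog can be tallied per SCSI address. The following is a minimal Python sketch run against the two lines quoted above. The function name `count_resets` is illustrative, and the regular expression assumes the exact message wording shown in this thread.

```python
# Sketch: count "Power-on or device reset occurred" events per SCSI address.
# count_resets is a hypothetical helper; the pattern assumes the log wording above.
import re
from collections import Counter

RESET = re.compile(
    r"kernel: sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)


def count_resets(lines):
    """Return a Counter mapping SCSI address (host:channel:id:lun) to reset count."""
    hits = Counter()
    for line in lines:
        match = RESET.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


sample = [
    "Aug 23 10:35:11 Server kernel: sd 7:0:0:0: Power-on or device reset occurred",
    "Aug 23 10:35:11 Server kernel: sd 7:0:2:0: Power-on or device reset occurred",
]
resets = count_resets(sample)
for address, count in sorted(resets.items()):
    print(address, count)
```

For a full log, the same function can be fed an open file object, for example the syslog text file extracted from the diagnostics ZIP (the exact path inside the archive is not shown in this thread). Resets spread across several addresses on the same host number point toward something the drives share, such as power, cabling, or the controller.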
deltaexray Posted August 23, 2021 To answer both of you: The only thing that changed over the last few days is that I updated from 6.8.3 to 6.9.2. Everything else is and was the same. The drives are all powered the same way as before and all cables are fixed in place. Maybe the HBA is giving up? I don't really think the power supply is the issue here. Given that all 3 drives are on one cable and the unit is 650 watts, I don't believe they have too little power available.
JorgeB Posted August 23, 2021 9 minutes ago, deltaexray said: "I updated from 6.8.3 to 6.9.2." 8TB Ironwolf with LSI, see here:
deltaexray Posted August 23, 2021 Yeah, I thought that could be the reason for those issues. A few months ago I also did the update to 6.9.x and had to rebuild my parity because of these issues. Well, looks like I've got work to do.