emuhack Posted October 27, 2019
I posted in the RC threads and was asked to come here and post. After I installed RC 1 through 4, I've been getting drive errors. I went back to 6.7 and it was fine for about 36 hours, then drive errors started showing up again. Attached is my diagnostics zip: emu-unraid-diagnostics-20191027-1723.zip
emuhack Posted October 28, 2019
Here is what is in the log, and I'm now up to 277 errors:

Oct 27 18:16:38 eMu-Unraid kernel: md: disk3 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk5 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk6 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk7 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk2 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk3 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk5 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk6 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk7 read error, sector=2340611744
Oct 27 20:16:28 eMu-Unraid kernel: mdcmd (87): spindown 0
Oct 27 20:16:29 eMu-Unraid kernel: mdcmd (88): spindown 1
Oct 27 20:16:30 eMu-Unraid kernel: mdcmd (89): spindown 2
Oct 27 20:16:30 eMu-Unraid kernel: mdcmd (90): spindown 3
Oct 27 20:16:30 eMu-Unraid kernel: mdcmd (91): spindown 5
Oct 27 20:16:31 eMu-Unraid kernel: mdcmd (92): spindown 6
Oct 27 20:16:31 eMu-Unraid kernel: mdcmd (93): spindown 7
Oct 27 20:26:23 eMu-Unraid kernel: mdcmd (94): spindown 4
Oct 27 21:30:20 eMu-Unraid kernel: sd 1:1:6:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:20 eMu-Unraid kernel: sd 1:1:6:0: [sdd] tag#0 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:20 eMu-Unraid kernel: print_req_error: I/O error, dev sdd, sector 3337004368
Oct 27 21:30:20 eMu-Unraid kernel: md: disk4 read error, sector=3337004304
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:4:0: [sdb] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:4:0: [sdb] tag#6 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:30 eMu-Unraid kernel: print_req_error: I/O error, dev sdb, sector 3337004368
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:5:0: [sdc] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:5:0: [sdc] tag#5 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:30 eMu-Unraid kernel: print_req_error: I/O error, dev sdc, sector 3337004368
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:15:0: [sdi] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:15:0: [sdi] tag#1 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:30 eMu-Unraid kernel: print_req_error: I/O error, dev sdi, sector 3337004368
Oct 27 21:30:31 eMu-Unraid kernel: sd 1:1:7:0: [sde] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:31 eMu-Unraid kernel: sd 1:1:7:0: [sde] tag#7 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:31 eMu-Unraid kernel: print_req_error: I/O error, dev sde, sector 3337004368
Oct 27 21:30:31 eMu-Unraid kernel: md: disk2 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: md: disk3 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: md: disk5 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: md: disk7 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: XFS (md4): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0xc6e6a510 len 8 error 5
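To see at a glance which disks are accumulating errors like those above, the `md: diskN read error` kernel messages can be tallied per disk. This is a hedged sketch; the syslog path on Unraid is an assumption (typically /var/log/syslog) and may need adjusting.

```shell
# Hypothetical helper: tally "md: diskN read error" kernel messages per
# disk, worst offender first. Pass the path to the syslog to scan.
count_read_errors() {
  grep -o 'md: disk[0-9]* read error' "$1" \
    | awk '{ print $2 }' \
    | sort | uniq -c | sort -rn
}

# Example: count_read_errors /var/log/syslog
```

If several disks show near-identical counts on the same sectors, that points away from individual failing disks and toward something shared (cable, power, or controller).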
Squid Posted October 28, 2019
Have you tried reseating the cabling and power to the drives?
emuhack Posted October 28, 2019
Just now, Squid said: Have you tried reseating the cabling and power to the drives?

I did that once already; I'm ordering longer cables as these are tight. It's just weird that it just started to happen, and there are no SMART errors! I guess how long can I go until this starts affecting data? I can get 2tb drives on Amazon for $45 new, so I may just be replacing them. I also have a 1tb laying around that I may plug into the HBA to make sure it's not that as well. I just don't know where to start, and I'm hoping it's not 6 drives failing.
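Since the poster mentions seeing no SMART errors, the attributes that usually separate cabling problems from failing media are worth pulling directly. A hedged sketch using smartmontools; the helper name is hypothetical and device paths are examples.

```shell
# Hypothetical helper: show the SMART attributes most relevant here.
# A rising UDMA_CRC_Error_Count usually points at cables/backplane;
# growing Reallocated or Pending sector counts point at the disk itself.
smart_summary() {
  smartctl -A "$1" | grep -Ei 'Reallocated_Sector|Current_Pending|UDMA_CRC'
}

# Example (run as root): smart_summary /dev/sdd
```

Running this across all the erroring disks and comparing the CRC counts would help decide between replacing cables and replacing drives.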
Squid Posted October 28, 2019
Personally, I'd shut the thing off until new cabling arrives, and then go from there. Disk 4 will probably need to have the file system check run on it.
emuhack Posted October 28, 2019
9 hours ago, Squid said: Personally, I'd shut the thing off until new cabling arrives, and then go from there. Disk 4 will probably need to have the file system check run on it.

Most if not 95% of my Plex data is on the 8tb, so the other drives stay spun down 90% of the time... so I ordered these: https://www.amazon.com/gp/product/B009APIZFI/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1 and we will see what happens after I swap the cables. I will also run the file system check as you mentioned.
emuhack Posted October 28, 2019

root@eMu-Unraid:~# xfs_repair -v /dev/md7
Phase 1 - find and verify superblock...
        - block cache size set to 752616 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 396 tail block 396
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Mon Oct 28 10:17:12 2019

Phase           Start           End             Duration
Phase 1:        10/28 10:17:12  10/28 10:17:12
Phase 2:        10/28 10:17:12  10/28 10:17:12
Phase 3:        10/28 10:17:12  10/28 10:17:12
Phase 4:        10/28 10:17:12  10/28 10:17:12
Phase 5:        10/28 10:17:12  10/28 10:17:12
Phase 6:        10/28 10:17:12  10/28 10:17:12
Phase 7:        10/28 10:17:12  10/28 10:17:12

Total run time:
done

I ran the repair command on disk 7 and this was the output. Unless I'm missing something, it doesn't look like it did anything?

Edited October 28, 2019 by emuhack
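A clean run like the one above is what you want to see. One common pattern (a sketch, not Unraid's official procedure; the helper name is hypothetical) is to dry-run the check first with `-n`, which reports problems without modifying anything, and only repair if it finds something. xfs_repair exits non-zero when the no-modify check detects corruption.

```shell
# Hypothetical helper: dry-run xfs_repair first, and only repair if the
# check reports problems. On Unraid, start the array in maintenance mode
# first so the /dev/mdN device is not mounted while being repaired.
check_then_repair() {
  dev="$1"
  if xfs_repair -n "$dev"; then
    echo "no repairs needed on $dev"
  else
    xfs_repair -v "$dev"
  fi
}

# Example (as root, array in maintenance mode): check_then_repair /dev/md7
```

Repairing via /dev/mdN rather than /dev/sdX keeps parity in sync with the changes.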
itimpi Posted October 28, 2019
25 minutes ago, emuhack said: I ran the repair command on disk 7 and this was the output. Unless I'm missing something, it doesn't look like it did anything?

Since no error was reported, the disk should now be mountable - is this the case?
emuhack Posted October 28, 2019
14 minutes ago, itimpi said: Since no error was reported, the disk should now be mountable - is this the case?

I ran the command on all the drives that were reporting errors and it seems to be stable. The disk has been mounted, and I enabled my mover to move data over just to test writes to the array; it has been error free for 40 minutes now... so we will see. If this stays error free, what would have caused the errors in the first place if the disks were not bad?
itimpi Posted October 28, 2019
1 minute ago, emuhack said: I ran the command on all the drives that were reporting errors and it seems to be stable. The disk has been mounted, and I enabled my mover to move data over just to test writes to the array; it has been error free for 40 minutes now... so we will see. If this stays error free, what would have caused the errors in the first place if the disks were not bad?

The most common cause is some sort of connection glitch that causes a write to fail. In such a case you end up with the disk disabled, and frequently unmountable until you run the file system repair utility. The causes can vary widely, although cabling issues are among the most common.
emuhack Posted October 28, 2019
2 hours in and I now have 1 drive with errors. New cables come tomorrow and I will swap them out.

Oct 28 13:10:58 eMu-Unraid kernel: print_req_error: I/O error, dev sdf, sector 1953515488
Oct 28 13:10:58 eMu-Unraid kernel: md: disk6 read error, sector=1953515424
Oct 28 13:10:58 eMu-Unraid kernel: md: disk6 read error, sector=1953515432
Oct 28 13:10:58 eMu-Unraid kernel: md: disk6 read error, sector=1953515440
emuhack Posted October 30, 2019
On 10/28/2019 at 11:04 AM, itimpi said: The most common cause is some sort of connection glitch that causes a write to fail. In such a case you end up with the disk disabled, and frequently unmountable until you run the file system repair utility. The causes can vary widely, although cabling issues are among the most common.

I had almost 300 read errors as of 5 minutes ago... I just got the actual Adaptec cables. Not cheap, but whatever!!! lol. I just switched out the cables and will keep you posted!!
emuhack Posted October 30, 2019
On 10/27/2019 at 10:54 PM, Squid said: Personally, I'd shut the thing off until new cabling arrives, and then go from there. Disk 4 will probably need to have the file system check run on it.

I swapped out the cables and woke up this morning to drive errors. I don't know what is up. I'm beginning to think that the drives are actually going bad; they all have 45-50k hours on them. I just ordered an 8tb off Amazon with delivery today to copy data from the drives. Unless you guys have any other thoughts, I hope this fixes it.
emuhack Posted October 30, 2019
Attached are the logs from this morning... emu-unraid-diagnostics-20191030-1349.zip
JorgeB Posted October 30, 2019
It's almost impossible for multiple drives to error on the same sector at the same time; if it's not the SATA cables, it could be the PSU (or power cables) or the HBA.
emuhack Posted October 30, 2019
20 minutes ago, johnnie.black said: It's almost impossible for multiple drives to error on the same sector at the same time; if it's not the SATA cables, it could be the PSU (or power cables) or the HBA.

The 8tb drive and parity on my system are on the same HBA, and they have no errors?
JorgeB Posted October 30, 2019
Since the read errors happened after spin-ups, try disabling spin down for a couple of days; I seem to remember someone else having similar errors after spin up.
emuhack Posted October 31, 2019
On 10/30/2019 at 10:05 AM, johnnie.black said: Since the read errors happened after spin-ups, try disabling spin down for a couple of days; I seem to remember someone else having similar errors after spin up.

I took out the drives with the most read errors and put in my new 8tb drive. I have been running since 10:30pm CST with no errors... SO my next question is: could the 2 drives that had the most errors cause the other drives to error as well?

Another user brought up power cabling as a possible cause. I have three Molex runs from the PSU, each to a 5-out splitter: the original 8tb, the parity drive, and one 2tb drive are on one splitter; the other five 2tb drives are on their own; and the cache pool of SSDs is on another splitter. I now have 2 fewer drives in the system and no errors as of yet. I have a 760 watt supply, and this calculator here: https://outervision.com/power-supply-calculator recommended 485 watts. So?