Disk Errors - After RC install then back to 6.7.2


emuhack


Here is what is in the log, and now I'm up to 277 errors:

 

Oct 27 18:16:38 eMu-Unraid kernel: md: disk3 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk5 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk6 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk7 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk2 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk3 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk5 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk6 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk7 read error, sector=2340611744
Oct 27 20:16:28 eMu-Unraid kernel: mdcmd (87): spindown 0
Oct 27 20:16:29 eMu-Unraid kernel: mdcmd (88): spindown 1
Oct 27 20:16:30 eMu-Unraid kernel: mdcmd (89): spindown 2
Oct 27 20:16:30 eMu-Unraid kernel: mdcmd (90): spindown 3
Oct 27 20:16:30 eMu-Unraid kernel: mdcmd (91): spindown 5
Oct 27 20:16:31 eMu-Unraid kernel: mdcmd (92): spindown 6
Oct 27 20:16:31 eMu-Unraid kernel: mdcmd (93): spindown 7
Oct 27 20:26:23 eMu-Unraid kernel: mdcmd (94): spindown 4
Oct 27 21:30:20 eMu-Unraid kernel: sd 1:1:6:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:20 eMu-Unraid kernel: sd 1:1:6:0: [sdd] tag#0 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:20 eMu-Unraid kernel: print_req_error: I/O error, dev sdd, sector 3337004368
Oct 27 21:30:20 eMu-Unraid kernel: md: disk4 read error, sector=3337004304
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:4:0: [sdb] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:4:0: [sdb] tag#6 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:30 eMu-Unraid kernel: print_req_error: I/O error, dev sdb, sector 3337004368
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:5:0: [sdc] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:5:0: [sdc] tag#5 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:30 eMu-Unraid kernel: print_req_error: I/O error, dev sdc, sector 3337004368
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:15:0: [sdi] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:15:0: [sdi] tag#1 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:30 eMu-Unraid kernel: print_req_error: I/O error, dev sdi, sector 3337004368
Oct 27 21:30:31 eMu-Unraid kernel: sd 1:1:7:0: [sde] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:31 eMu-Unraid kernel: sd 1:1:7:0: [sde] tag#7 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:31 eMu-Unraid kernel: print_req_error: I/O error, dev sde, sector 3337004368
Oct 27 21:30:31 eMu-Unraid kernel: md: disk2 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: md: disk3 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: md: disk5 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: md: disk7 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: XFS (md4): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0xc6e6a510 len 8 error 5

 

Just now, Squid said:

Have you tried reseating the cabling and power to the drives?

 

I did that once already; I'm ordering longer cables as these are tight. It's just weird that it only just started happening, and there are no SMART errors! How long can I go until this starts affecting data? I can get 2TB drives on Amazon for $45 new, so I may just end up replacing them. I also have a 1TB laying around that I may plug into the HBA to make sure it's not that as well. I just don't know where to start, and I'm hoping it's not six drives failing.
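
For what it's worth, I can keep double-checking SMART from the console too (the drive letter below is just an example):

smartctl -H /dev/sdd    # overall health self-assessment
smartctl -A /dev/sdd    # attribute table - watch Reallocated_Sector_Ct and Current_Pending_Sector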

 

9 hours ago, Squid said:

Personally, I'd shut the thing off until new cabling arrives, and then go from there.  Disk 4 will probably need to have the file system check run on it.

Most if not 95% of my Plex data is on the 8TB, so the other drives stay spun down 90% of the time. So I ordered these: https://www.amazon.com/gp/product/B009APIZFI/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1 - we will see what happens after I swap the cables, and I will run the file system check as you mentioned.
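
Roughly what I plan to run once the array is started in Maintenance mode (disk 4 is just the one Squid mentioned; the -n pass is a read-only dry run first):

xfs_repair -n /dev/md4    # check only - report problems without modifying anything
xfs_repair -v /dev/md4    # actual repair with verbose output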

root@eMu-Unraid:~# xfs_repair -v /dev/md7
Phase 1 - find and verify superblock...
        - block cache size set to 752616 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 396 tail block 396
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Mon Oct 28 10:17:12 2019

Phase           Start           End             Duration
Phase 1:        10/28 10:17:12  10/28 10:17:12
Phase 2:        10/28 10:17:12  10/28 10:17:12
Phase 3:        10/28 10:17:12  10/28 10:17:12
Phase 4:        10/28 10:17:12  10/28 10:17:12
Phase 5:        10/28 10:17:12  10/28 10:17:12
Phase 6:        10/28 10:17:12  10/28 10:17:12
Phase 7:        10/28 10:17:12  10/28 10:17:12

Total run time: 
done

I ran the repair command on disk 7 and this was the output. Unless I'm missing something, it does not look like it did anything?


 

14 minutes ago, itimpi said:

Since no error was reported, the disk should now be mountable - is this the case?

I ran the command on all the drives that were reporting errors and it seems to be stable. The disk has been mounted, and I enabled my mover to move data over just to test writes to the array; it has been error-free for 40 minutes now... so we will see. If this stays error-free, what would cause the errors in the first place if the disks were not bad?

1 minute ago, emuhack said:

 

I ran the command on all the drives that were reporting errors and it seems to be stable. The disk has been mounted, and I enabled my mover to move data over just to test writes to the array; it has been error-free for 40 minutes now... so we will see. If this stays error-free, what would cause the errors in the first place if the disks were not bad?

The most common cause is some sort of connection glitch that causes a write to fail. In such a case you end up with the disk disabled, and frequently unmountable until you run the file system repair utility. The causes of such things can vary widely, although cabling issues are among the most common.


Two hours in, and I now have one drive with errors.

 

New cables come tomorrow and I will swap them out.

 

Oct 28 13:10:58 eMu-Unraid kernel: print_req_error: I/O error, dev sdf, sector 1953515488
Oct 28 13:10:58 eMu-Unraid kernel: md: disk6 read error, sector=1953515424
Oct 28 13:10:58 eMu-Unraid kernel: md: disk6 read error, sector=1953515432
Oct 28 13:10:58 eMu-Unraid kernel: md: disk6 read error, sector=1953515440

 

 

On 10/28/2019 at 11:04 AM, itimpi said:

The most common cause is some sort of connection glitch that causes a write to fail. In such a case you end up with the disk disabled, and frequently unmountable until you run the file system repair utility. The causes of such things can vary widely, although cabling issues are among the most common.

I had almost 300 read errors as of 5 minutes ago... I just got the actual Adaptec cables... not cheap, but WHATEVER!!! lol

 

I just switched out the cables and will keep you posted!!

On 10/27/2019 at 10:54 PM, Squid said:

Personally, I'd shut the thing off until new cabling arrives, and then go from there.  Disk 4 will probably need to have the file system check run on it.

I swapped out the cables and woke up this morning to drive errors. I don't know what is up. I'm beginning to think that the drives are actually going bad; they all have 45-50k hours on them. I just ordered an 8TB off Amazon with delivery today to copy data from the drives. Unless you guys have any other thoughts, I hope this fixes it.

On 10/30/2019 at 10:05 AM, johnnie.black said:

Since the read errors happened after spin-ups, try disabling spin-down for a couple of days; I seem to remember someone else having similar errors after spin-up.

I took out the drives with the most read errors and put in my new 8TB drive. It has been running since 10:30 PM CST with no errors... So my next question: could the two drives that had the most errors cause the other drives to error as well?
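
(To check whether the remaining errors really do line up with spin-downs/spin-ups, as suggested, I can pull the relevant lines out of the syslog, e.g.:)

grep -E "spindown|spinup|read error" /var/log/syslog | tail -n 40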

 

Another user also brought up power cabling as a possible cause.

I have three Molex runs from the PSU, each going to a 5-way splitter. The original 8TB, the parity drive, and one 2TB drive were on one splitter; the other five 2TB drives were on their own splitter; and the cache pool of SSDs is on another. I now have two fewer drives in the system and no errors as of yet. I have a 760-watt supply, and this calculator (https://outervision.com/power-supply-calculator) recommended 485 W. So?

 
