emuhack Posted October 27, 2019
I posted in the RC threads and was asked to come here and post. After I installed RC 1 through 4, I've been getting drive errors. I went back to 6.7 and it was fine for about 36 hours, then drive errors started showing up again. Attached is my diagnostics zip: emu-unraid-diagnostics-20191027-1723.zip
emuhack Posted October 28, 2019
Here is what is in the log, and I'm now up to 277 errors:

Oct 27 18:16:38 eMu-Unraid kernel: md: disk3 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk5 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk6 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk7 read error, sector=2340611736
Oct 27 18:16:38 eMu-Unraid kernel: md: disk2 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk3 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk5 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk6 read error, sector=2340611744
Oct 27 18:16:38 eMu-Unraid kernel: md: disk7 read error, sector=2340611744
Oct 27 20:16:28 eMu-Unraid kernel: mdcmd (87): spindown 0
Oct 27 20:16:29 eMu-Unraid kernel: mdcmd (88): spindown 1
Oct 27 20:16:30 eMu-Unraid kernel: mdcmd (89): spindown 2
Oct 27 20:16:30 eMu-Unraid kernel: mdcmd (90): spindown 3
Oct 27 20:16:30 eMu-Unraid kernel: mdcmd (91): spindown 5
Oct 27 20:16:31 eMu-Unraid kernel: mdcmd (92): spindown 6
Oct 27 20:16:31 eMu-Unraid kernel: mdcmd (93): spindown 7
Oct 27 20:26:23 eMu-Unraid kernel: mdcmd (94): spindown 4
Oct 27 21:30:20 eMu-Unraid kernel: sd 1:1:6:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:20 eMu-Unraid kernel: sd 1:1:6:0: [sdd] tag#0 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:20 eMu-Unraid kernel: print_req_error: I/O error, dev sdd, sector 3337004368
Oct 27 21:30:20 eMu-Unraid kernel: md: disk4 read error, sector=3337004304
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:4:0: [sdb] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:4:0: [sdb] tag#6 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:30 eMu-Unraid kernel: print_req_error: I/O error, dev sdb, sector 3337004368
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:5:0: [sdc] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:5:0: [sdc] tag#5 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:30 eMu-Unraid kernel: print_req_error: I/O error, dev sdc, sector 3337004368
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:15:0: [sdi] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:30 eMu-Unraid kernel: sd 1:1:15:0: [sdi] tag#1 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:30 eMu-Unraid kernel: print_req_error: I/O error, dev sdi, sector 3337004368
Oct 27 21:30:31 eMu-Unraid kernel: sd 1:1:7:0: [sde] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
Oct 27 21:30:31 eMu-Unraid kernel: sd 1:1:7:0: [sde] tag#7 CDB: opcode=0x28 28 00 c6 e6 a5 50 00 00 08 00
Oct 27 21:30:31 eMu-Unraid kernel: print_req_error: I/O error, dev sde, sector 3337004368
Oct 27 21:30:31 eMu-Unraid kernel: md: disk2 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: md: disk3 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: md: disk5 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: md: disk7 read error, sector=3337004304
Oct 27 21:30:31 eMu-Unraid kernel: XFS (md4): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0xc6e6a510 len 8 error 5
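To see at a glance which disks are accumulating errors like those above, the `md: diskN read error` kernel messages can be tallied per disk. This is a hedged sketch; the syslog path on Unraid is an assumption (typically /var/log/syslog) and may need adjusting.

```shell
# Hypothetical helper: tally "md: diskN read error" kernel messages per
# disk, worst offender first. Pass the path to the syslog to scan.
count_read_errors() {
  grep -o 'md: disk[0-9]* read error' "$1" \
    | awk '{ print $2 }' \
    | sort | uniq -c | sort -rn
}

# Example: count_read_errors /var/log/syslog
```

If several disks show near-identical counts on the same sectors, that points away from individual failing disks and toward something shared (cable, power, or controller).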
Squid Posted October 28, 2019
Have you tried reseating the cabling and power to the drives?
emuhack Posted October 28, 2019
Just now, Squid said: Have you tried reseating the cabling and power to the drives?

I did that once already; I'm ordering longer cables as these are tight. It's just weird that it just started to happen, and there are no SMART errors! I guess how long can I go until this starts affecting data? I can get 2tb drives on Amazon for $45 new, so I may just be replacing them. I also have a 1tb laying around that I may plug into the HBA to make sure it's not that as well. I just don't know where to start, and I'm hoping it's not 6 drives failing.
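Since the poster mentions seeing no SMART errors, the attributes that usually separate cabling problems from failing media are worth pulling directly. A hedged sketch using smartmontools; the helper name is hypothetical and device paths are examples.

```shell
# Hypothetical helper: show the SMART attributes most relevant here.
# A rising UDMA_CRC_Error_Count usually points at cables/backplane;
# growing Reallocated or Pending sector counts point at the disk itself.
smart_summary() {
  smartctl -A "$1" | grep -Ei 'Reallocated_Sector|Current_Pending|UDMA_CRC'
}

# Example (run as root): smart_summary /dev/sdd
```

Running this across all the erroring disks and comparing the CRC counts would help decide between replacing cables and replacing drives.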
Squid Posted October 28, 2019
Personally, I'd shut the thing off until new cabling arrives, and then go from there. Disk 4 will probably need to have the file system check run on it.
emuhack Posted October 28, 2019
9 hours ago, Squid said: Personally, I'd shut the thing off until new cabling arrives, and then go from there. Disk 4 will probably need to have the file system check run on it.

Most if not 95% of my Plex data is on the 8tb, so the other drives stay spun down 90% of the time... so I ordered these: https://www.amazon.com/gp/product/B009APIZFI/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1 and we will see what happens after I swap the cables. I will also run the file system check as you mentioned.
emuhack Posted October 28, 2019

root@eMu-Unraid:~# xfs_repair -v /dev/md7
Phase 1 - find and verify superblock...
        - block cache size set to 752616 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 396 tail block 396
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Mon Oct 28 10:17:12 2019

Phase           Start           End             Duration
Phase 1:        10/28 10:17:12  10/28 10:17:12
Phase 2:        10/28 10:17:12  10/28 10:17:12
Phase 3:        10/28 10:17:12  10/28 10:17:12
Phase 4:        10/28 10:17:12  10/28 10:17:12
Phase 5:        10/28 10:17:12  10/28 10:17:12
Phase 6:        10/28 10:17:12  10/28 10:17:12
Phase 7:        10/28 10:17:12  10/28 10:17:12

Total run time:
done

I ran the repair command on disk 7 and this was the output. Unless I'm missing something, it doesn't look like it did anything?

Edited October 28, 2019 by emuhack
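A clean run like the one above is what you want to see. One common pattern (a sketch, not Unraid's official procedure; the helper name is hypothetical) is to dry-run the check first with `-n`, which reports problems without modifying anything, and only repair if it finds something. xfs_repair exits non-zero when the no-modify check detects corruption.

```shell
# Hypothetical helper: dry-run xfs_repair first, and only repair if the
# check reports problems. On Unraid, start the array in maintenance mode
# first so the /dev/mdN device is not mounted while being repaired.
check_then_repair() {
  dev="$1"
  if xfs_repair -n "$dev"; then
    echo "no repairs needed on $dev"
  else
    xfs_repair -v "$dev"
  fi
}

# Example (as root, array in maintenance mode): check_then_repair /dev/md7
```

Repairing via /dev/mdN rather than /dev/sdX keeps parity in sync with the changes.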
itimpi Posted October 28, 2019
25 minutes ago, emuhack said: I ran the repair command on disk 7 and this was the output. Unless I'm missing something, it doesn't look like it did anything?

Since no error was reported, the disk should now be mountable - is this the case?
emuhack Posted October 28, 2019
14 minutes ago, itimpi said: Since no error was reported, the disk should now be mountable - is this the case?

I ran the command on all the drives that were reporting errors and it seems to be stable. The disk has been mounted, and I enabled my mover to move data over just to test writes to the array; it has been error free for 40 minutes now... so we will see. If this stays error free, what would have caused the errors in the first place if the disks were not bad?
itimpi Posted October 28, 2019
1 minute ago, emuhack said: I ran the command on all the drives that were reporting errors and it seems to be stable. The disk has been mounted, and I enabled my mover to move data over just to test writes to the array; it has been error free for 40 minutes now... so we will see. If this stays error free, what would have caused the errors in the first place if the disks were not bad?

The most common cause is some sort of connection glitch that causes a write to fail. In such a case you end up with the disk disabled, and frequently unmountable until you run the file system repair utility. The causes can vary widely, although cabling issues are among the most common.
emuhack Posted October 28, 2019
2 hours in and I now have 1 drive with errors. New cables come tomorrow and I will swap them out.

Oct 28 13:10:58 eMu-Unraid kernel: print_req_error: I/O error, dev sdf, sector 1953515488
Oct 28 13:10:58 eMu-Unraid kernel: md: disk6 read error, sector=1953515424
Oct 28 13:10:58 eMu-Unraid kernel: md: disk6 read error, sector=1953515432
Oct 28 13:10:58 eMu-Unraid kernel: md: disk6 read error, sector=1953515440
emuhack Posted October 30, 2019
On 10/28/2019 at 11:04 AM, itimpi said: The most common cause is some sort of connection glitch that causes a write to fail. In such a case you end up with the disk disabled, and frequently unmountable until you run the file system repair utility. The causes can vary widely, although cabling issues are among the most common.

I had almost 300 read errors as of 5 minutes ago... I just got the actual Adaptec cables. Not cheap, but whatever!!! lol. I just switched out the cables and will keep you posted!!
emuhack Posted October 30, 2019
On 10/27/2019 at 10:54 PM, Squid said: Personally, I'd shut the thing off until new cabling arrives, and then go from there. Disk 4 will probably need to have the file system check run on it.

I swapped out the cables and woke up this morning to drive errors. I don't know what is up. I'm beginning to think that the drives are actually going bad; they all have 45-50k hours on them. I just ordered an 8tb off Amazon with delivery today to copy data from the drives. Unless you guys have any other thoughts, I hope this fixes it.
emuhack Posted October 30, 2019
Attached are the logs from this morning... emu-unraid-diagnostics-20191030-1349.zip
JorgeB Posted October 30, 2019
It's almost impossible for multiple drives to error on the same sector at the same time; if it's not the SATA cables, it could be the PSU (or power cables) or the HBA.
emuhack Posted October 30, 2019
20 minutes ago, johnnie.black said: It's almost impossible for multiple drives to error on the same sector at the same time; if it's not the SATA cables, it could be the PSU (or power cables) or the HBA.

The 8tb drive and parity on my system are on the same HBA, and they have no errors?
JorgeB Posted October 30, 2019
Since the read errors happened after spin-ups, try disabling spin down for a couple of days; I seem to remember someone else having similar errors after spin up.
emuhack Posted October 31, 2019
On 10/30/2019 at 10:05 AM, johnnie.black said: Since the read errors happened after spin-ups, try disabling spin down for a couple of days; I seem to remember someone else having similar errors after spin up.

I took out the drives with the most read errors and put in my new 8tb drive. I have been running since 10:30pm CST with no errors... SO my next question is: could the 2 drives that had the most errors cause the other drives to error as well?

Another user brought up power cabling as a possible cause. I have three Molex runs from the PSU, each to a 5-out splitter: the original 8tb, the parity drive, and one 2tb drive are on one splitter; the other five 2tb drives are on their own; and the cache pool of SSDs is on another splitter. I now have 2 fewer drives in the system and no errors as of yet. I have a 760 watt supply, and this calculator here: https://outervision.com/power-supply-calculator recommended 485 watts. So?