Failed Drive/Very Slow New Drive/GUI crashes need help



Hey Guys,


So here's my dilemma. My WD 4TB Red got a few errors, so I decided to replace it with a newly RMA'd WD 4TB Red. After a few hours, Unraid reported that the new drive was having more errors than the old one, which resulted in Unraid emulating that disk (Disk8).

 

Since I knew the drive was newly RMA'd, I can't really believe that it's dead. Anyway, I bought a Toshiba N300 6TB drive.

 

So I replaced the drive and Unraid did its thing. 24 hours later I checked, and the rebuild was stuck at 19% because the transfer rate was at 35 KB/s; it needed 2+ years to finish, sometimes even more than that. So I rebooted and it did the same thing.
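For a sense of scale, the estimate in this post can be sanity-checked with a quick calculation. This is only a rough sketch: it assumes the rebuild has to cover the full 6 TB of the new disk, that the reported rate means 35 kilobytes per second, and that the rate stays constant. The function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def rebuild_years_remaining(disk_bytes, percent_done, rate_bytes_per_s):
    """Estimate the years left in a rebuild running at a constant rate."""
    remaining_bytes = disk_bytes * (1 - percent_done / 100)
    return remaining_bytes / rate_bytes_per_s / SECONDS_PER_YEAR


# Assumed figures: 6 TB disk, 19% complete, 35 KB/s (decimal units).
years = rebuild_years_remaining(6e12, 19, 35e3)
print(f"about {years:.1f} years remaining")
```

With these assumed figures the result is a little over four years, which is in line with the "2+ years, sometimes even more" shown during the rebuild.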

 

Now I can hardly use my server anymore because the GUI crashes (blank pages). I'm currently using Krusader to manually transfer the files from my dead/emulated drive to the other drives in my server. Krusader is still running while the GUI is down, but all my other Docker apps seem to be dead.

 

At this point all I want to do is transfer the files, make a new config so I can remove the dead drive, and then re-add the new drives that were previously giving me errors.

 

Is what I'm doing the right thing to do? Or are there other fixes for my problem?

 

The diagnostics below were taken without the 6TB drive plugged in.

*Also, I have the Nvidia driver plugin and read somewhere that it crashes the GUI. Do I need to uninstall it?

sobnology-diagnostics-20210823-2040.zip

Edited by karlpox
36 minutes ago, karlpox said:

newly RMA WD 4TB red

The drives you get back when you RMA a drive are almost never new, they are other people's returns that have passed diagnostics. That means you could be inheriting someone else's hard to troubleshoot problem drive. Whenever possible, return for refund and purchase new, even if you have to pay a penalty.

 

Now, all that said, it's quite possible there is nothing wrong with the drive, it could be the power or SATA cable, or controller causing issues.

6 minutes ago, JonathanM said:

The drives you get back when you RMA a drive are almost never new, they are other people's returns that have passed diagnostics. That means you could be inheriting someone else's hard to troubleshoot problem drive. Whenever possible, return for refund and purchase new, even if you have to pay a penalty.

 

Now, all that said, it's quite possible there is nothing wrong with the drive, it could be the power or SATA cable, or controller causing issues.

I actually mentioned that I did buy a new drive. But the transfer speeds were 35 KB/s, or otherwise under 100 KB/s. I forgot to mention that I also changed the SATA controller and cables.


In your syslog you are getting a lot of

Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32768) failed
Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32769) failed
Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32770) failed
Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32771) failed

Which suggests problems with the flash drive.  If possible plug it into a USB2 port.

 

I would suggest backing it up and then rewriting it to see if that fixes the issue.   If not then the drive may be failing.
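One way to gauge how widespread these flash errors are is to count them per device in the syslog. A minimal sketch, assuming the kernel lines look exactly like the excerpt above; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   kernel: FAT-fs (sda1): Directory bread(block 32768) failed
FAT_ERROR = re.compile(
    r"FAT-fs \((?P<dev>\w+)\): Directory bread\(block (?P<block>\d+)\) failed"
)


def count_fat_read_failures(lines):
    """Return a Counter of failed FAT directory reads per device."""
    failures = Counter()
    for line in lines:
        match = FAT_ERROR.search(line)
        if match:
            failures[match.group("dev")] += 1
    return failures


sample = [
    "Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32768) failed",
    "Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32769) failed",
    "Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32770) failed",
    "Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32771) failed",
]
print(count_fat_read_failures(sample))
```

To run it against a real log, pass an open file object instead of `sample`; the location of the syslog inside the diagnostics zip is not shown in this thread, so the path is left to the reader.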

2 hours ago, itimpi said:

In your syslog you are getting a lot of

Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32768) failed
Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32769) failed
Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32770) failed
Aug 23 20:40:24 sobnology kernel: FAT-fs (sda1): Directory bread(block 32771) failed

Which suggests problems with the flash drive.  If possible plug it into a USB2 port.

 

I would suggest backing it up and then rewriting it to see if that fixes the issue.   If not then the drive may be failing.

Sadly I don't have USB2 ports, just a bunch of 3.0 and 3.1. I did try using another USB port.

 

I'm also getting this error in the logs:

 

Aug 23 23:58:43 sobnology kernel: pcieport 0000:00:02.0: AER: Corrected error received: 0000:00:02.0
Aug 23 23:58:43 sobnology kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 23 23:58:43 sobnology kernel: pcieport 0000:00:02.0: device [8086:6f04] error status/mask=00000040/00002000
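The `error status/mask` pair in the last line is a bit field, so it can be decoded to see which correctable error is being reported. This is a hedged sketch: the bit names follow the PCIe AER Correctable Error Status register layout as I understand it, and the function names are invented here. Treat it as a reading aid, not a diagnosis.

```python
import re

# Bit positions in the AER Correctable Error Status/Mask registers.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

STATUS_MASK = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_bits(value):
    """List the names of the known bits set in a register value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if value & (1 << bit)]


def decode_aer_line(line):
    """Decode the status and mask fields of an AER syslog line, or return None."""
    match = STATUS_MASK.search(line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return {"status": decode_bits(status), "masked": decode_bits(mask)}


line = "pcieport 0000:00:02.0: device [8086:6f04] error status/mask=00000040/00002000"
print(decode_aer_line(line))
```

For the line above, status 0x40 has only bit 6 set (Bad TLP), which fits the "Data Link Layer" type in the log, and the mask 0x2000 only masks the Advisory Non-Fatal Error bit.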


Aug 24 00:03:45 sobnology kernel: kernel BUG at drivers/md/unraid.c:356!
Aug 24 00:03:45 sobnology kernel: invalid opcode: 0000 [#1] SMP PTI
Aug 24 00:03:45 sobnology kernel: CPU: 6 PID: 2300 Comm: kworker/u24:7 Tainted: P O 5.10.28-Unraid #1

 

Last error before everything crashed.

9 hours ago, trurl said:

Most motherboards have USB2 on a header even if they don't expose it in the I/O panel.

 

Did you?

Oh yeah, I forgot about that. But I have a Frankenstein type of case, haha. I don't actually have front I/O ports because of what I did. I will try to find a way to get USB2 ports.

 

I didn't back it up. I just moved it to a different USB3 port. So far I've uninstalled the Nvidia driver plugin, and Unraid has been running for 10 hours now without the GUI crashing.


Any idea how I can get rid of this error?

 

Aug 23 23:58:43 sobnology kernel: pcieport 0000:00:02.0: AER: Corrected error received: 0000:00:02.0
Aug 23 23:58:43 sobnology kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 23 23:58:43 sobnology kernel: pcieport 0000:00:02.0: device [8086:6f04] error status/mask=00000040/00002000

 

It keeps popping up every once in a while.

17 hours ago, JorgeB said:

Oh wow, that worked. It removed a lot of the errors.

 

Also, I changed the USB drive but the problem still occurs.

 

I'm getting these errors:

 

Aug 25 00:52:24 sobnology kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: G W 5.10.28-Unraid #1
Aug 25 00:52:24 sobnology kernel: Hardware name: Gigabyte Technology Co., Ltd. Default string/X99-Designare EX-CF, BIOS F5c 06/15/2018

Aug 25 00:53:25 sobnology kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: G W 5.10.28-Unraid #1
Aug 25 00:53:25 sobnology kernel: Hardware name: Gigabyte Technology Co., Ltd. Default string/X99-Designare EX-CF, BIOS F5c 06/15/2018

Aug 25 00:56:25 sobnology kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: G W 5.10.28-Unraid #1
Aug 25 00:56:25 sobnology kernel: Hardware name: Gigabyte Technology Co., Ltd. Default string/X99-Designare EX-CF, BIOS F5c 06/15/2018

Aug 25 00:56:25 sobnology nginx: 2021/08/25 00:56:25 [error] 7487#7487: *181694 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.14, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.10", referrer: "http://192.168.0.10/Main"
Aug 25 00:56:25 sobnology nginx: 2021/08/25 00:56:25 [error] 7487#7487: *180685 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.14, server: , request: "POST /plugins/dynamix.system.temp/include/SystemTemp.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.10", referrer: "http://192.168.0.10/Main"
Aug 25 00:56:25 sobnology nginx: 2021/08/25 00:56:25 [error] 7487#7487: *181083 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.14, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.10", referrer: "http://192.168.0.10/Main"
Aug 25 00:56:25 sobnology nginx: 2021/08/25 00:56:25 [error] 7487#7487: *181865 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.14, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.10", referrer: "http://192.168.0.10/Main"
Aug 25 00:56:25 sobnology nginx: 2021/08/25 00:56:25 [error] 7487#7487: *181887 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.14, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.10", referrer: "http://192.168.0.10/Main"
Aug 25 00:58:45 sobnology nginx: 2021/08/25 00:58:45 [error] 7487#7487: *181887 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.14, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.10", referrer: "http://192.168.0.10/Main"

 

Does this mean my motherboard has problems?
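To see which pages are behind the nginx timeouts in the log above, the requests can be tallied by endpoint. A small sketch, assuming the nginx error lines keep the format shown in this post; the function name is invented for illustration.

```python
import re
from collections import Counter

# Pulls the request path out of an nginx "upstream timed out" error line.
UPSTREAM_TIMEOUT = re.compile(
    r'upstream timed out .*? request: "(?P<method>[A-Z]+) (?P<path>\S+) HTTP'
)


def timeouts_by_endpoint(lines):
    """Count upstream timeouts per requested path."""
    counts = Counter()
    for line in lines:
        match = UPSTREAM_TIMEOUT.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


# Two of the lines from the log above, split only for readability.
sample = [
    "Aug 25 00:56:25 sobnology nginx: 2021/08/25 00:56:25 [error] 7487#7487: *181694 "
    "upstream timed out (110: Connection timed out) while reading response header "
    'from upstream, client: 192.168.0.14, server: , request: "POST '
    '/plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: '
    '"fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.10"',
    "Aug 25 00:56:25 sobnology nginx: 2021/08/25 00:56:25 [error] 7487#7487: *180685 "
    "upstream timed out (110: Connection timed out) while reading response header "
    'from upstream, client: 192.168.0.14, server: , request: "POST '
    '/plugins/dynamix.system.temp/include/SystemTemp.php HTTP/1.1", upstream: '
    '"fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.10"',
]
for path, count in timeouts_by_endpoint(sample).most_common():
    print(count, path)
```

In the full excerpt above, the Unassigned Devices page accounts for most of the timeouts and the system temperature plugin for one.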

2 hours ago, JorgeB said:

The nginx errors are software-related. I'm not sure about the cause of the other ones; look for a BIOS update, it might help.

I already updated the BIOS earlier this year, actually, and there has been no update since. It's an old motherboard too.

 

Any suggestions on what else I should try?

14 hours ago, JorgeB said:

Try a different board if available, or ignore the errors and see if the server is stable.

 

Aug 26 03:02:21 sobnology root: /mnt/cache: 1.3 TiB (1453279731712 bytes) trimmed on /dev/sdg1
Aug 26 04:57:47 sobnology kernel: ata6.00: READ LOG DMA EXT failed, trying PIO
Aug 26 04:57:47 sobnology kernel: ata6.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
Aug 26 04:57:47 sobnology kernel: ata6.00: irq_stat 0x40000008
Aug 26 04:57:47 sobnology kernel: ata6.00: failed command: READ FPDMA QUEUED
Aug 26 04:57:47 sobnology kernel: ata6.00: cmd 60/78:88:20:b7:3f/03:00:04:00:00/40 tag 17 ncq dma 454656 in
Aug 26 04:57:47 sobnology kernel: res 41/40:00:9c:b8:3f/00:00:04:00:00/00 Emask 0x409 (media error) <F>
Aug 26 04:57:47 sobnology kernel: ata6.00: status: { DRDY ERR }
Aug 26 04:57:47 sobnology kernel: ata6.00: error: { UNC }
Aug 26 04:57:47 sobnology kernel: ata6.00: configured for UDMA/133
Aug 26 04:57:47 sobnology kernel: sd 6:0:0:0: [sdi] tag#17 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Aug 26 04:57:47 sobnology kernel: sd 6:0:0:0: [sdi] tag#17 Sense Key : 0x3 [current]
Aug 26 04:57:47 sobnology kernel: sd 6:0:0:0: [sdi] tag#17 ASC=0x11 ASCQ=0x4
Aug 26 04:57:47 sobnology kernel: sd 6:0:0:0: [sdi] tag#17 CDB: opcode=0x28 28 00 04 3f b7 20 00 03 78 00
Aug 26 04:57:47 sobnology kernel: blk_update_request: I/O error, dev sdi, sector 71284892 op 0x0:(READ) flags 0x80000 phys_seg 64 prio class 0
Aug 26 04:57:47 sobnology kernel: ata6: EH complete
Aug 26 05:40:03 sobnology kernel: ata6.00: exception Emask 0x0 SAct 0x100000 SErr 0x0 action 0x0
Aug 26 05:40:03 sobnology kernel: ata6.00: irq_stat 0x40000008
Aug 26 05:40:03 sobnology kernel: ata6.00: failed command: READ FPDMA QUEUED
Aug 26 05:40:03 sobnology kernel: ata6.00: cmd 60/00:a0:50:b2:3f/02:00:04:00:00/40 tag 20 ncq dma 262144 in
Aug 26 05:40:03 sobnology kernel: res 41/40:00:84:b2:3f/00:00:04:00:00/00 Emask 0x409 (media error) <F>
Aug 26 05:40:03 sobnology kernel: ata6.00: status: { DRDY ERR }
Aug 26 05:40:03 sobnology kernel: ata6.00: error: { UNC }
Aug 26 05:40:03 sobnology kernel: ata6.00: configured for UDMA/133
Aug 26 05:40:03 sobnology kernel: sd 6:0:0:0: [sdi] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Aug 26 05:40:03 sobnology kernel: sd 6:0:0:0: [sdi] tag#20 Sense Key : 0x3 [current]
Aug 26 05:40:03 sobnology kernel: sd 6:0:0:0: [sdi] tag#20 ASC=0x11 ASCQ=0x4
Aug 26 05:40:03 sobnology kernel: sd 6:0:0:0: [sdi] tag#20 CDB: opcode=0x28 28 00 04 3f b2 50 00 02 00 00
Aug 26 05:40:03 sobnology kernel: blk_update_request: I/O error, dev sdi, sector 71283332 op 0x0:(READ) flags 0x80700 phys_seg 58 prio class 0
Aug 26 05:40:03 sobnology kernel: ata6: EH complete
Aug 26 05:40:03 sobnology kernel: ata6.00: exception Emask 0x0 SAct 0x1805128 SErr 0x0 action 0x0
Aug 26 05:40:03 sobnology kernel: ata6.00: irq_stat 0x40000008
Aug 26 05:40:03 sobnology kernel: ata6.00: failed command: READ FPDMA QUEUED
Aug 26 05:40:03 sobnology kernel: ata6.00: cmd 60/08:b8:88:b2:3f/00:00:04:00:00/40 tag 23 ncq dma 4096 in
Aug 26 05:40:03 sobnology kernel: res 41/40:00:88:b2:3f/00:00:04:00:00/00 Emask 0x409 (media error) <F>
Aug 26 05:40:03 sobnology kernel: ata6.00: status: { DRDY ERR }
Aug 26 05:40:03 sobnology kernel: ata6.00: error: { UNC }
Aug 26 05:40:03 sobnology kernel: ata6.00: configured for UDMA/133
Aug 26 05:40:03 sobnology kernel: sd 6:0:0:0: [sdi] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Aug 26 05:40:03 sobnology kernel: sd 6:0:0:0: [sdi] tag#23 Sense Key : 0x3 [current]
Aug 26 05:40:03 sobnology kernel: sd 6:0:0:0: [sdi] tag#23 ASC=0x11 ASCQ=0x4
Aug 26 05:40:03 sobnology kernel: sd 6:0:0:0: [sdi] tag#23 CDB: opcode=0x28 28 00 04 3f b2 88 00 00 08 00
Aug 26 05:40:03 sobnology kernel: blk_update_request: I/O error, dev sdi, sector 71283336 op 0x0:(READ) flags 0x100 phys_seg 1 prio class 0
Aug 26 05:40:03 sobnology kernel: BTRFS error (device sdg1): bdev /dev/sdi1 errs: wr 0, rd 79, flush 0, corrupt 0, gen 0
Aug 26 05:40:03 sobnology kernel: ata6: EH complete
Aug 26 05:40:03 sobnology kernel: BTRFS info (device sdg1): read error corrected: ino 2254065 off 94208 (dev /dev/sdi1 sector 71283272)
Aug 26 05:58:22 sobnology kernel: ata6.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x0
Aug 26 05:58:22 sobnology kernel: ata6.00: irq_stat 0x40000008
Aug 26 05:58:22 sobnology kernel: ata6.00: failed command: READ FPDMA QUEUED
Aug 26 05:58:22 sobnology kernel: ata6.00: cmd 60/60:d8:d0:04:72/02:00:16:00:00/40 tag 27 ncq dma 311296 in
Aug 26 05:58:22 sobnology kernel: res 41/40:00:80:06:72/00:00:16:00:00/00 Emask 0x409 (media error) <F>
Aug 26 05:58:22 sobnology kernel: ata6.00: status: { DRDY ERR }
Aug 26 05:58:22 sobnology kernel: ata6.00: error: { UNC }
Aug 26 05:58:22 sobnology kernel: ata6.00: configured for UDMA/133
Aug 26 05:58:22 sobnology kernel: sd 6:0:0:0: [sdi] tag#27 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Aug 26 05:58:22 sobnology kernel: sd 6:0:0:0: [sdi] tag#27 Sense Key : 0x3 [current]
Aug 26 05:58:22 sobnology kernel: sd 6:0:0:0: [sdi] tag#27 ASC=0x11 ASCQ=0x4
Aug 26 05:58:22 sobnology kernel: sd 6:0:0:0: [sdi] tag#27 CDB: opcode=0x28 28 00 16 72 04 d0 00 02 60 00
Aug 26 05:58:22 sobnology kernel: blk_update_request: I/O error, dev sdi, sector 376571520 op 0x0:(READ) flags 0x80000 phys_seg 22 prio class 0
Aug 26 05:58:22 sobnology kernel: ata6: EH complete
Aug 26 05:58:22 sobnology kernel: ata6.00: exception Emask 0x0 SAct 0x8d000342 SErr 0x0 action 0x0

 

Does this error mean both my cache drives have problems? sdg and sdi are my two 1TB SSD cache drives.
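One way to approach this question is to tally the block-layer read errors by device. A minimal sketch, assuming the `blk_update_request` lines keep the format shown in the log above; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   kernel: blk_update_request: I/O error, dev sdi, sector 71284892 op 0x0:(READ) ...
IO_ERROR = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def io_errors_by_device(lines):
    """Count block I/O errors per device name."""
    errors = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            errors[match.group("dev")] += 1
    return errors


sample = [
    "Aug 26 04:57:47 sobnology kernel: blk_update_request: I/O error, dev sdi, "
    "sector 71284892 op 0x0:(READ) flags 0x80000 phys_seg 64 prio class 0",
    "Aug 26 05:40:03 sobnology kernel: blk_update_request: I/O error, dev sdi, "
    "sector 71283332 op 0x0:(READ) flags 0x80700 phys_seg 58 prio class 0",
    "Aug 26 05:40:03 sobnology kernel: blk_update_request: I/O error, dev sdi, "
    "sector 71283336 op 0x0:(READ) flags 0x100 phys_seg 1 prio class 0",
    "Aug 26 05:40:03 sobnology kernel: BTRFS error (device sdg1): bdev /dev/sdi1 "
    "errs: wr 0, rd 79, flush 0, corrupt 0, gen 0",
]
print(io_errors_by_device(sample))
```

In the excerpt above, every read failure is reported against sdi (ata6). sdg1 appears only as the name of the btrfs filesystem in the BTRFS lines, and the `bdev` field there again points at /dev/sdi1.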


Updated the LSI cards. Unraid is still crashing. Guess it's time to change the board.

 

I also keep getting this error before everything freezes:

 

Aug 26 13:44:23 sobnology kernel: kernel BUG at drivers/md/unraid.c:356!
Aug 26 13:44:23 sobnology kernel: invalid opcode: 0000 [#1] SMP PTI
Aug 26 13:44:23 sobnology kernel: CPU: 5 PID: 26228 Comm: mdrecoveryd Not tainted 5.10.28-Unraid #1

 

Edited by karlpox
4 hours ago, karlpox said:
Aug 26 05:40:03 sobnology kernel: res 41/40:00:88:b2:3f/00:00:04:00:00/00 Emask 0x409 (media error) <F>
Aug 26 05:40:03 sobnology kernel: ata6.00: status: { DRDY ERR }
Aug 26 05:40:03 sobnology kernel: ata6.00: error: { UNC }

This is usually a device problem; run an extended SMART test.
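From a terminal, an extended test can be started with smartmontools. The sketch below only builds and prints the command by default; the device path `/dev/sdi` is an assumption taken from the log above and should be checked before running anything, and the helper name is invented for illustration.

```python
import shlex
import subprocess


def start_extended_test(device, dry_run=True):
    """Build (and optionally run) the smartctl command for a long self-test."""
    command = ["smartctl", "-t", "long", device]
    if dry_run:
        # Only show what would be run; nothing touches the drive.
        print(shlex.join(command))
        return command
    # Needs smartmontools installed and root privileges.
    subprocess.run(command, check=True)
    return command


start_extended_test("/dev/sdi")  # assumed device name
```

Once the test has finished, `smartctl -a` on the same device prints the self-test log along with the SMART attributes.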

5 minutes ago, JorgeB said:

This is usually a device problem; run an extended SMART test.

 

Darn, that's gonna take a long time. I did a last-resort kind of thing.

 

I updated my Unraid to a beta version. Then I found out that one of my LSI cards didn't update. I need to update it if my server crashes again.


Things I did that contributed to solving whatever problems I was encountering:

 

-changed the USB drive

-upgraded to a beta version of Unraid (because I had a feeling that my existing Unraid install was corrupt)

-upgraded the firmware on both my LSI cards (9211 & 9207)

-switched one of my drives for a new one

-checked for errors on all drives

-scrubbed my cache
