Disk Issues - Since 6.9



Mar 18 17:14:12 Servo root: ERROR: system chunk array too small 34 < 97
Mar 18 17:14:12 Servo root: ERROR: superblock checksum matches but it has invalid members
Mar 18 17:14:12 Servo root: ERROR: cannot scan /dev/sdc1: Input/output error
Mar 18 17:14:42 Servo emhttpd: mount_pool: ERROR: system chunk array too small 34 < 97
Mar 18 17:14:42 Servo emhttpd: mount_pool: ERROR: superblock checksum matches but it has invalid members
Mar 18 17:14:42 Servo emhttpd: mount_pool: ERROR: cannot scan /dev/sdc1: Input/output error

 

Next Issue

 

Mar 19 21:58:09 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:10 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:10 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:11 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:12 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:13 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:13 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:14 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:15 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred

 

This is also causing UDMA_CRC_Error_Count to go up in SMART.

This happens when writing.
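To put numbers on this, the reset messages can be tallied per SCSI address and compared against the drive's CRC counter. A minimal sketch, assuming the log lives at /var/log/syslog and that /dev/sdX stands in for the affected drive; `count_resets` is a hypothetical helper name, not an Unraid tool:

```shell
# Count "Power-on or device reset occurred" messages per SCSI address.
# Reads syslog text on stdin and prints "<count> <address>" lines.
count_resets() {
    grep 'Power-on or device reset occurred' |
        sed -E 's/.*kernel: sd ([0-9:]+): Power-on.*/\1/' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# On the server (log path and device name are assumptions):
#   count_resets < /var/log/syslog
#   smartctl -A /dev/sdX | grep -i UDMA_CRC_Error_Count
```

A CRC counter that climbs together with the resets usually points at the link (cable, backplane, or controller) rather than the disk media.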

 

Next Issue

 

SMART no longer shows in the GUI for any drive on the SAS card.

 

System components (only the ones that matter here):

 

Unraid 6.9.1, upgraded from 6.9, where I had the spin up/down issue with this card

Broadcom / LSI MegaRAID SAS 2008 [Falcon] (rev 03)
SFF-8087 to SATA Forward Breakout

Rosewill 3 x 5.25-Inch to 4 x 3.5-Inch Hot-swap SATAIII/SAS Hard Disk Drive Cage

ST8000DM004

 

Disks 4 through 8 are connected to this hardware.

I was having a similar issue on Disk 7 ("WRITE FPDMA QUEUED" errors).

I have reseated everything: SFF-8087, power, SATA, etc.

 

I have also pulled the disks in question (4 and 7) and checked SMART and overall health: 100%, PASS.
Nothing unusual.

 

What are my next steps? I haven't written to any disks other than the ones in question.

That being said, the drives running off the onboard controller

SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)

are fine, and SMART shows in the GUI for them too.

 

Unraid Issue? 
Cable Issue?
Raid Card Issue?
Power Issue?
Drive Issue?
Backplane Issue?

 

Thanks for any help. If you need more info or want me to test something, let me know :)

Edited by G Speed
Mar 18 17:14:12 Servo root: ERROR: system chunk array too small 34 < 97
Mar 18 17:14:12 Servo root: ERROR: superblock checksum matches but it has invalid members
Mar 18 17:14:12 Servo root: ERROR: cannot scan /dev/sdc1: Input/output error

This is about the parity disk. The error itself can be ignored since parity doesn't have a filesystem, and I don't think it would cause any other issues.

 

Mar 18 17:14:17 Servo kernel: BTRFS info (device md2): bdev /dev/md2 errs: wr 0, rd 0, flush 0, corrupt 2666, gen 0

Disk2 is showing data corruption, you should run a scrub.
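For reference, the scrub and the per-device error counters can also be driven from the command line. A sketch, assuming disk2 is mounted at /mnt/disk2 and the log is at /var/log/syslog; `btrfs_corrupt_report` is a hypothetical helper that pulls non-zero corruption counts out of syslog lines like the one above:

```shell
# Print "<device> <count>" for each BTRFS stats line with a non-zero
# "corrupt" counter. Reads syslog text on stdin.
btrfs_corrupt_report() {
    sed -nE 's/.*BTRFS info \(device ([^)]+)\):.* corrupt ([0-9]+),.*/\1 \2/p' |
        awk '$2 > 0'
}

# On the server (paths are assumptions):
#   btrfs_corrupt_report < /var/log/syslog
#   btrfs scrub start /mnt/disk2     # starts the scrub in the background
#   btrfs scrub status /mnt/disk2    # progress and error summary
#   btrfs dev stats /mnt/disk2       # wr/rd/flush/corrupt/gen counters
```

The counters shown by `btrfs dev stats` are cumulative, so a non-zero value can reflect errors from an earlier session.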

 

As for the LSI, it's using the MegaRAID driver. I would recommend flashing it to IT mode (note that this will require a new config since the disk IDs will change), then see whether the related errors go away.

 

4 hours ago, JorgeB said:

Mar 18 17:14:12 Servo root: ERROR: system chunk array too small 34 < 97
Mar 18 17:14:12 Servo root: ERROR: superblock checksum matches but it has invalid members
Mar 18 17:14:12 Servo root: ERROR: cannot scan /dev/sdc1: Input/output error

This is about the parity disk. The error itself can be ignored since parity doesn't have a filesystem, and I don't think it would cause any other issues.

 


Mar 18 17:14:17 Servo kernel: BTRFS info (device md2): bdev /dev/md2 errs: wr 0, rd 0, flush 0, corrupt 2666, gen 0

Disk2 is showing data corruption, you should run a scrub.

 

As for the LSI, it's using the MegaRAID driver. I would recommend flashing it to IT mode (note that this will require a new config since the disk IDs will change), then see whether the related errors go away.

 

 

It is flashed to IT mode, despite what Unraid is saying lol

 


Hmm... It's been working fine like this since day one. I looked back at my logs from 2 years ago.

It was always showing this:


01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2008 [Falcon] [1000:0073] (rev 03)
    Subsystem: Dell PERC H310 [1028:1f78]
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas

 

I wonder if in 6.9 something changed to piss it off lol

 

Thanks for pointing this out :)

 

Can I use this to do it?


Your post?
How to upgrade an LSI HBA firmware using Unraid - Storage Devices and Controllers - Unraid

 

Edited by G Speed

Quick update: I bought two used H310s at the time.

I had an issue flashing one of them; the other seemed okay.

I just swapped the cards.

Let me know if this is correct and I will plug in the drives,

or if it's not, or if I should update the firmware, yadda yadda.

01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
    Subsystem: Dell 6Gbps SAS HBA Adapter [1028:1f1c]
    Kernel driver in use: mpt3sas
    Kernel modules: mpt3sas
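The driver check can be scripted as well. A sketch, assuming the card stays at PCI address 01:00.0 as in the output above; `driver_in_use` is a hypothetical helper:

```shell
# Print the kernel driver bound to a device, given `lspci -k` output on stdin.
driver_in_use() {
    awk -F': ' '/Kernel driver in use/ { print $2 }'
}

# On the server (PCI address taken from the output above):
#   lspci -nnk -s 01:00.0 | driver_in_use
```

Going by the two outputs in this thread, this would print megaraid_sas for the first card and mpt3sas for the second.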

 

Plugged the drives in; haven't started the array.

I can now see SMART in the GUI in 6.9 on all the drives.

Did a New Config, preserving all assignments.

 

Edited by G Speed
10 minutes ago, JorgeB said:

That is the correct driver. You still need to check that it's in IT mode and using the latest firmware; you can see that in the syslog, or post it here.

 

You'll need to re-assign all the disks to their original positions.

 

 

Mar 21 11:07:46 Servo kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
Mar 21 11:07:46 Servo kernel: mpt2sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
Mar 21 11:07:46 Servo kernel: scsi host7: Fusion MPT SAS Host
Mar 21 11:07:46 Servo kernel: mpt2sas_cm0: sending port enable !!
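The firmware version in the lines above can be pulled out of the syslog rather than read by eye. A sketch; `lsi_fw_version` is a hypothetical helper and the log path is an assumption:

```shell
# Print the FWVersion value from mpt2sas/mpt3sas init lines on stdin.
lsi_fw_version() {
    sed -nE 's/.*mpt[23]sas_cm[0-9]+: .*FWVersion\(([0-9.]+)\).*/\1/p'
}

# On the server (log path is an assumption):
#   lsi_fw_version < /var/log/syslog
```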

 

All good :) - I think

 

But my cache pool is messed up?
The cache is on the motherboard controller.

 

Unmountable: No pool uuid

 

 

servo-diagnostics-20210321-1131.zip

Edited by G Speed
18 hours ago, G Speed said:

All good :) - I think

Yes

 

 

18 hours ago, G Speed said:

but my cache pool is messed?

I'm not seeing the reason why. Please reboot and post new diags after array start (there shouldn't be one, but if there is an "all data on this device will be deleted" warning in front of the cache pool device(s), don't start the array).

3 hours ago, JorgeB said:

Yes

 

 

I'm not seeing the reason why. Please reboot and post new diags after array start (there shouldn't be one, but if there is an "all data on this device will be deleted" warning in front of the cache pool device(s), don't start the array).

 

Unmountable disk present:
Cache • Samsung_SSD_860_EVO_500GB_S3Z (sdc)
Cache 2 • Samsung_SSD_860_EVO_500GB_S3Z1 (sdb)
Format will create a file system in all Unmountable disks.
 Yes, I want to do this

 

I see this after I start the array. The drives aren't showing blue for "new" either.
I have a green ball, but it is still showing "Unmountable: No pool uuid".

I had these as btrfs-1.

Changing from auto to btrfs on Cache 1 should be safe, correct?

 

 

servo-diagnostics-20210322-0826.zip

Edited by G Speed

There's a valid btrfs filesystem on the SSDs, I think the problem now might be related to this error:

 

Mar 22 08:24:28 Servo emhttpd: shcmd (91): /sbin/btrfs device scan
Mar 22 08:24:29 Servo root: ERROR: system chunk array too small 34 < 97
Mar 22 08:24:29 Servo root: ERROR: superblock checksum matches but it has invalid members
Mar 22 08:24:29 Servo root: ERROR: cannot scan /dev/sdm1: Input/output error

 

It wasn't interfering before, but now apparently it is. These errors result from parity having a semi-valid btrfs filesystem because there's an odd number of array devices. If I'm correct, this won't happen if you add (or remove) an array device so you get an even number, or alternatively re-sync parity as parity2. Parity2 is calculated in a different way, so there shouldn't be an issue even with an odd number of array drives.
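A rough illustration of why the device count matters, under two assumptions that are not stated in the thread: parity1 is a byte-wise XOR of the data disks at each offset, and every btrfs-formatted data disk carries the same byte at some superblock offset (0x5F, the first character of the btrfs magic string, is used as a stand-in). `xor_parity` is a hypothetical helper, and the disk counts are examples only:

```shell
# XOR one byte value across <count> disks, as single parity would for a
# byte that is identical on every data disk. Prints the parity byte in hex.
xor_parity() (
    byte=$1 count=$2 parity=0 i=0
    while [ "$i" -lt "$count" ]; do
        parity=$(( $parity ^ $byte ))
        i=$(( $i + 1 ))
    done
    printf '0x%02X\n' "$parity"
)

xor_parity 0x5F 3   # odd number of disks: prints 0x5F, same as the data byte
xor_parity 0x5F 4   # even number of disks: prints 0x00, the bytes cancel
```

With an odd count the identical bytes survive on the parity disk, so parts of it can look like a btrfs superblock to `btrfs device scan`; with an even count they cancel out.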

 

23 minutes ago, JorgeB said:

There's a valid btrfs filesystem on the SSDs, I think the problem now might be related to this error:

 



Mar 22 08:24:28 Servo emhttpd: shcmd (91): /sbin/btrfs device scan
Mar 22 08:24:29 Servo root: ERROR: system chunk array too small 34 < 97
Mar 22 08:24:29 Servo root: ERROR: superblock checksum matches but it has invalid members
Mar 22 08:24:29 Servo root: ERROR: cannot scan /dev/sdm1: Input/output error

 

It wasn't interfering before, but now apparently it is. These errors result from parity having a semi-valid btrfs filesystem because there's an odd number of array devices. If I'm correct, this won't happen if you add (or remove) an array device so you get an even number, or alternatively re-sync parity as parity2. Parity2 is calculated in a different way, so there shouldn't be an issue even with an odd number of array drives.

 

Not really sure what that means.

Should I remove one cache drive to see what happens?

Or are you saying I should do a parity scan for the array?

 

 

Edited by G Speed
5 minutes ago, JorgeB said:

No, definitely not that. You could add an array device if you have one available, or change parity to parity2 (this will need a re-sync).
 

How do I change to parity2? I only have one parity drive.

 

No spare drives here

Edited by G Speed

Stop the array, unassign parity, and start the array. Before assigning it as parity2, you can confirm that this is indeed the problem by wiping the partition with:

 

wipefs -a /dev/sdX1

 

Replace X with the correct letter (it was m as of the last diags), and note the 1 at the end.

 

Then reboot, start the array with parity unassigned, and see if the cache pool mounts. If it does, stop the array, assign the disk as parity2, and start the array to begin the parity sync.
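Since `wipefs -a` is destructive and the parity disk's letter has already changed once in this thread (sdc in the first log, sdm in the latest), it may be worth confirming the target before wiping. A sketch with two hypothetical helpers, `partition_path` and `preview_wipe`; it assumes a util-linux `wipefs` that supports `-n` (`--no-act`):

```shell
# Hypothetical helper: turn a drive letter into the first-partition path,
# rejecting anything that is not one or two lowercase letters.
partition_path() {
    case "$1" in
        [a-z] | [a-z][a-z]) printf '/dev/sd%s1\n' "$1" ;;
        *) echo "usage: partition_path <drive letter, e.g. m>" >&2; return 1 ;;
    esac
}

# Show the drive's model and serial, then let wipefs report what it would
# erase without writing anything (-n is the --no-act dry run).
preview_wipe() {
    part=$(partition_path "$1") || return 1
    lsblk -o NAME,SIZE,MODEL,SERIAL "${part%1}"
    wipefs -n -a "$part"
}

# On the server, before the real wipe:
#   preview_wipe m
```

If the model and serial match the parity disk, the real `wipefs -a` command from the post can then be run against the same path.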

6 minutes ago, JorgeB said:

Did you wipe it and reboot? If yes, post new diags.

 

Can I pull it out of the system instead and reboot?

Edit: I did, and it worked :)
Log attached without the sdm drive (parity).


Can I reassign it to parity 1? I think having it at parity 2 will drive me nuts LMAO

Can I just do a quick format outside of Unraid in Windows or Linux, then throw it back in as parity 1 as a "new drive"?
Or does Unraid remember that drive?

 

servo-diagnostics-20210322-1003.zip

Edited by G Speed

Same thing, because:

1 hour ago, JorgeB said:

These errors result from parity having a semi-valid btrfs filesystem because there's an odd number of array devices

 

Because of how parity1 works, having an odd number of devices can create a valid (or invalid) filesystem on the parity disk, and with btrfs this can cause more issues because it will be detected in the scan:

 

1 hour ago, JorgeB said:

Mar 22 08:24:28 Servo emhttpd: shcmd (91): /sbin/btrfs device scan

 
