G Speed Posted March 20, 2021

Mar 18 17:14:12 Servo root: ERROR: system chunk array too small 34 < 97
Mar 18 17:14:12 Servo root: ERROR: superblock checksum matches but it has invalid members
Mar 18 17:14:12 Servo root: ERROR: cannot scan /dev/sdc1: Input/output error
Mar 18 17:14:42 Servo emhttpd: mount_pool: ERROR: system chunk array too small 34 < 97
Mar 18 17:14:42 Servo emhttpd: mount_pool: ERROR: superblock checksum matches but it has invalid members
Mar 18 17:14:42 Servo emhttpd: mount_pool: ERROR: cannot scan /dev/sdc1: Input/output error

Next issue:

Mar 19 21:58:09 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:10 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:10 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:11 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:12 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:13 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:13 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:14 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred
Mar 19 21:58:15 Servo kernel: sd 2:0:4:0: Power-on or device reset occurred

This is also causing the SMART UDMA_CRC_Error_Count to go up. It happens when writing.

Next issue: SMART no longer shows in the GUI for any drive attached to the SAS card.

System components (only the ones that matter in this regard):
Unraid 6.9.1 <- 6.9 (I had the spin up/down issue with this card)
Broadcom / LSI MegaRAID SAS 2008 [Falcon] (rev 03)
SFF-8087 to SATA Forward Breakout
Rosewill 3 x 5.25-Inch to 4 x 3.5-Inch Hot-swap SATAIII/SAS Hard Disk Drive Cage
ST8000DM004

Disk 4 through Disk 8 are located on this card. I was having a similar issue on Disk 7: "WRITE FPDMA QUEUED".
I have reseated everything: SFF-8087, power, SATA, etc. I have also pulled out the disks in question, 4 and 7, and checked the SMART and overall health - 100% - PASS. Nothing funny.

What are my next steps? I haven't written to any disks other than the ones in question.
That being said, the drives running off the onboard SATA controller (Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)) are fine, and SMART shows in the GUI for them too.

Unraid issue? Cable issue? RAID card issue? Power issue? Drive issue? Backplane issue?

Thanks for any help. If you need any more info or need me to test something, let me know.
Edited March 20, 2021 by G Speed

JorgeB Posted March 20, 2021
Please post the diagnostics: Tools -> Diagnostics

G Speed Posted March 20, 2021
Done: servo-diagnostics-20210320-1405.zip

JorgeB Posted March 21, 2021

Mar 18 17:14:12 Servo root: ERROR: system chunk array too small 34 < 97
Mar 18 17:14:12 Servo root: ERROR: superblock checksum matches but it has invalid members
Mar 18 17:14:12 Servo root: ERROR: cannot scan /dev/sdc1: Input/output error

This is about parity. The error itself can be ignored since parity doesn't have a filesystem, and I don't think it would cause any other issues.

Mar 18 17:14:17 Servo kernel: BTRFS info (device md2): bdev /dev/md2 errs: wr 0, rd 0, flush 0, corrupt 2666, gen 0

Disk2 is showing data corruption, you should run a scrub.

As for the LSI, it's using the MegaRAID driver. I would recommend flashing it to IT mode (note that it will require a new config since the disk IDs will change), then see if the related errors go away or not.

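For anyone following along, the "corrupt" counter in that BTRFS line can be pulled out of the syslog with a small helper. This is only a sketch: the function name `btrfs_corrupt_count` is made up for illustration, and the `/mnt/disk2` mount point in the commented commands is an assumption about where Unraid mounts Disk2.

```shell
# Hypothetical helper: print the "corrupt" counter from one BTRFS
# error-summary line of the syslog (prints nothing if the line has none).
btrfs_corrupt_count() {
    printf '%s\n' "$1" | sed -n 's/.*errs:.* corrupt \([0-9][0-9]*\),.*/\1/p'
}

# Sample line copied from the diagnostics quoted in this thread.
line='Mar 18 17:14:17 Servo kernel: BTRFS info (device md2): bdev /dev/md2 errs: wr 0, rd 0, flush 0, corrupt 2666, gen 0'
btrfs_corrupt_count "$line"   # prints 2666

# The scrub itself would be run on the mounted disk, for example
# (mount point assumed, not taken from the thread):
#   btrfs scrub start /mnt/disk2
#   btrfs scrub status /mnt/disk2
#   btrfs dev stats /mnt/disk2
```

A non-zero counter only says that checksum errors were seen at some point; the scrub is what re-reads the data and reports what is currently affected.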
G Speed Posted March 21, 2021
4 hours ago, JorgeB said:
    As for the LSI, it's using the MegaRAID driver, I would recommend flashing it to IT mode, note that it will require a new config since the disks IDs will change, then see if the related errors go way or not.

It is flashed to IT mode, despite what Unraid is saying lol

JorgeB Posted March 21, 2021
Then it's not correctly flashed. It's not even using the mpt3sas driver in RAID mode, it's using the MegaRAID driver, which is for RAID controllers only.

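A quick way to see which driver the controller is bound to is the "Kernel driver in use" line of `lspci -nnk`. The sketch below parses that line from text on stdin; the function names `hba_driver` and `classify_driver` and the sample text are illustrative assumptions, not output taken from this server.

```shell
# Hypothetical helper: read `lspci -nnk` output for one device on stdin
# and print the name of the kernel driver in use.
hba_driver() {
    sed -n 's/.*Kernel driver in use: *\([^ ]*\).*/\1/p'
}

# Hypothetical helper: map a driver name to what it implies here.
classify_driver() {
    case "$1" in
        megaraid_sas)    echo "RAID driver" ;;
        mpt3sas|mpt2sas) echo "HBA driver" ;;
        *)               echo "unknown" ;;
    esac
}

# Illustrative sample text (not copied from the diagnostics).
sample='01:00.0 RAID bus controller
	Kernel driver in use: megaraid_sas
	Kernel modules: megaraid_sas'

drv=$(printf '%s\n' "$sample" | hba_driver)
echo "$drv: $(classify_driver "$drv")"   # prints megaraid_sas: RAID driver

# On the real system the input would come from something like:
#   lspci -nnk | grep -i -A3 'sas'
```

The driver name alone does not prove the firmware is IT mode or current; it only separates the MegaRAID case from the HBA case discussed above.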
G Speed Posted March 21, 2021
Hmm.. It's been working fine like this since day one. I looked back at my logs from 2 years ago and it was always showing this:

01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2008 [Falcon] [1000:0073] (rev 03)
Subsystem: Dell PERC H310 [1028:1f78]
Kernel driver in use: megaraid_sas
Kernel modules: megaraid_sas

I wonder if something changed in 6.9 to piss it off lol
Thanks for pointing this out.
Can I use this to do it? Your post?
How to upgrade an LSI HBA firmware using Unraid - Storage Devices and Controllers - Unraid
Edited March 21, 2021 by G Speed

G Speed Posted March 21, 2021
Quick update. I bought 2 used H310s at the time. One I had an issue flashing, the other seemed okay. I just swapped it in.
Let me know if this is correct and I will plug in the drives, or if it's not, or if I should update the firmware, yadda yadda.

01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
Subsystem: Dell 6Gbps SAS HBA Adapter [1028:1f1c]
Kernel driver in use: mpt3sas
Kernel modules: mpt3sas

Plugged the drives in, haven't started the array.
I can now see SMART in the GUI in 6.9 on all the drives.
Did a New Config, preserving all.
Edited March 21, 2021 by G Speed

JorgeB Posted March 21, 2021
27 minutes ago, G Speed said:
    Kernel driver in use: mpt3sas

That is the correct driver. You still need to check that it's in IT mode and using the latest firmware; you can see that in the syslog, or post it here. You'll need to re-assign all the disks to their original positions.

G Speed Posted March 21, 2021
10 minutes ago, JorgeB said:
    That is the correct driver, still need to check it's in IT mode and using latest firmware, you can see that on the syslog, or post it here. You'll need to re-assign all the disks to their original positions.

Mar 21 11:07:46 Servo kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
Mar 21 11:07:46 Servo kernel: mpt2sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
Mar 21 11:07:46 Servo kernel: scsi host7: Fusion MPT SAS Host
Mar 21 11:07:46 Servo kernel: mpt2sas_cm0: sending port enable !!

All good - I think, but my cache pool is messed up?
The cache is on the mobo controller.
Unmountable: No pool uuid
servo-diagnostics-20210321-1131.zip
Edited March 21, 2021 by G Speed

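The firmware version asked about above sits in the `FWVersion(...)` field of the mpt2sas line. A minimal sketch for extracting it follows; the function name `fw_version` is made up, and the `/var/log/syslog` path in the comment is an assumption about where the live log is kept.

```shell
# Hypothetical helper: read syslog lines on stdin and print the value
# inside FWVersion(...) for every line that has one.
fw_version() {
    sed -n 's/.*FWVersion(\([0-9.]*\)).*/\1/p'
}

# Sample line copied from the post above.
fw_line='Mar 21 11:07:46 Servo kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)'
printf '%s\n' "$fw_line" | fw_version   # prints 20.00.07.00

# Against the live log this would be something like (path assumed):
#   grep 'FWVersion' /var/log/syslog | fw_version
```

This only reports the version string; whether that version is the latest one still has to be checked against the firmware releases for the card.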
G Speed Posted March 21, 2021
Do I just need to click on cache drive 1 and change the filesystem to btrfs?
Edited March 21, 2021 by G Speed

G Speed Posted March 22, 2021
I shut down the server for now. Let me know if you have any suggestions. Thanks again for the help!

JorgeB Posted March 22, 2021
18 hours ago, G Speed said:
    All good - I think

Yes.

18 hours ago, G Speed said:
    but my cache pool is messed?

Not seeing the reason why. Please reboot and post new diags after array start. There shouldn't be one, but if there is an "all data on this device will be deleted" warning in front of the cache pool device(s), don't start the array.

G Speed Posted March 22, 2021
3 hours ago, JorgeB said:
    Yes
    Not seeing the reason why, please reboot and post new diags after array start (there shouldn't be one but if there is a "all data on this device will be delete" in front to the cache pool device(s) don't start the array).

Unmountable disk present:
Cache • Samsung_SSD_860_EVO_500GB_S3Z (sdc)
Cache 2 • Samsung_SSD_860_EVO_500GB_S3Z1 (sdb)
Format will create a file system in all Unmountable disks.
Yes, I want to do this

I see this after I start the array. The drives aren't showing blue for "new" either; I have a green ball, but it's still showing "Unmountable: No pool uuid".
I had these as btrfs-1.
Changing from auto to btrfs on Cache 1 should be safe, correct?
servo-diagnostics-20210322-0826.zip
Edited March 22, 2021 by G Speed

JorgeB Posted March 22, 2021
There's a valid btrfs filesystem on the SSDs. I think the problem now might be related to this error:

Mar 22 08:24:28 Servo emhttpd: shcmd (91): /sbin/btrfs device scan
Mar 22 08:24:29 Servo root: ERROR: system chunk array too small 34 < 97
Mar 22 08:24:29 Servo root: ERROR: superblock checksum matches but it has invalid members
Mar 22 08:24:29 Servo root: ERROR: cannot scan /dev/sdm1: Input/output error

It wasn't interfering before, but now apparently it is. These errors are the result of parity having a semi-valid btrfs filesystem because there's an odd number of array devices. If I'm correct, this won't happen if you add (or remove) another array device so you get an even number, or alternatively re-sync parity as parity2. Parity2 is calculated in a different way, so there shouldn't be an issue even with an odd number of array drives.

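To see which device the btrfs scan is choking on, the "cannot scan" lines can be filtered out of the syslog. The sketch below uses a made-up function name, `unscannable_devices`, and works on text passed on stdin, so it can be tried on the lines quoted above without touching any disk.

```shell
# Hypothetical helper: read syslog lines on stdin and print the device path
# from every "ERROR: cannot scan <device>: ..." line.
unscannable_devices() {
    sed -n 's|.*ERROR: cannot scan \(/dev/[^:]*\): .*|\1|p'
}

# Sample lines copied from the diagnostics quoted in this thread.
scan_log='Mar 22 08:24:28 Servo emhttpd: shcmd (91): /sbin/btrfs device scan
Mar 22 08:24:29 Servo root: ERROR: system chunk array too small 34 < 97
Mar 22 08:24:29 Servo root: ERROR: superblock checksum matches but it has invalid members
Mar 22 08:24:29 Servo root: ERROR: cannot scan /dev/sdm1: Input/output error'

printf '%s\n' "$scan_log" | unscannable_devices   # prints /dev/sdm1
```

Device letters can change between boots (the same parity disk was sdc in the first logs and sdm here), so the letter should be re-checked against the current diagnostics before acting on it.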
G Speed Posted March 22, 2021
23 minutes ago, JorgeB said:
    These errors result of parity having a semi-valid btrfs filesystem because there's an odd number of array devices, if I'm correct this won't happen if you add (or remove) another array device so you get an even number, or alternativly re-sync parity as parity2, parity2 is calculated in a different way, so there shouldn't be an issue even with an odd number of array drives.

Not really sure what that means.
Should I remove 1 cache drive to see what happens?
Or are you saying I should do a parity scan for the array?
Edited March 22, 2021 by G Speed

JorgeB Posted March 22, 2021
G Speed said:
    Should I remove 1 cache drive? To see what happens

No, definitely not that. You could add an array device if you have one available, or change parity to parity2 (this will need a re-sync).

G Speed Posted March 22, 2021
5 minutes ago, JorgeB said:
    No, definatly not that, you could add an array device if you have one available or change parity to parity2 (will need a re-sync).

How do I change to parity2? I only have 1 parity drive. No spare drives here.
Edited March 22, 2021 by G Speed

JorgeB Posted March 22, 2021
Stop the array, unassign parity, start the array. Before assigning it as parity2 you can confirm that this is indeed the problem. Wipe the partition with:

wipefs -a /dev/sdX1

Replace X with the correct letter (it was m as of the last diags), and note the 1 at the end. Then reboot, start the array with parity unassigned and see if the cache pool mounts. If yes, stop the array, assign the disk as parity2, and start the array to begin the parity sync.

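Since `wipefs -a` is destructive and the drive letter can change between boots, it may help to build the command first and read it back before running it. The sketch below only prints the command; `wipe_cmd` is a made-up name and nothing in it touches a disk.

```shell
# Hypothetical helper: given the drive letter(s) that replace X, print the
# wipefs command from the post above. It does NOT run anything.
wipe_cmd() {
    case "$1" in
        ''|*[!a-z]*)
            echo "refusing: expected lowercase drive letters, got '$1'" >&2
            return 1
            ;;
    esac
    printf 'wipefs -a /dev/sd%s1\n' "$1"
}

wipe_cmd m   # prints: wipefs -a /dev/sdm1

# wipefs also has a -n / --no-act option that performs everything except the
# actual write, which can serve as a rehearsal before the real wipe:
#   wipefs --no-act -a /dev/sdm1
```

The letter passed in should come from the current diagnostics, and the partition must be the unassigned parity disk, not a data or cache device.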
G Speed Posted March 22, 2021
Removed parity.
Cache still showing unmountable.

JorgeB Posted March 22, 2021
Did you wipe it and reboot? If yes, post new diags.

G Speed Posted March 22, 2021
6 minutes ago, JorgeB said:
    Did you wipe it and reboot? IF yes post new diags.

Can I pull it out of the system instead, and reboot?
Edit: I did, and it worked. Log attached, without the sdm drive (parity).
Can I reassign it to parity1? I think having it at parity2 will drive me nuts LMAO
Can I just do a quick format outside of Unraid, in Windows or Linux, then throw it back in as parity1 as a "new drive", or does Unraid remember that drive?
servo-diagnostics-20210322-1003.zip
Edited March 22, 2021 by G Speed

JorgeB Posted March 22, 2021
17 minutes ago, G Speed said:
    Can I "reassign to parity 1" I think having it at parity 2 will drive me nuts LMAO

If you assign it as parity1 and let it re-sync it will cause the same problem.

G Speed Posted March 22, 2021
34 minutes ago, JorgeB said:
    If you assign it as parity1 and let it re-sync it will cause the same problem.

Even if, say, I do a quick format as NTFS or whatever, then throw it back in as a "new drive", so it won't do a re-sync but an actual sync?

JorgeB Posted March 22, 2021
Same thing, because:

1 hour ago, JorgeB said:
    These errors result of parity having a semi-valid btrfs filesystem because there's an odd number of array devices

Because of how parity1 works, having an odd number of devices can create a valid (or invalid) filesystem on the parity disk, and with btrfs this can cause more issues because it will be detected in the scan:

1 hour ago, JorgeB said:
    Mar 22 08:24:28 Servo emhttpd: shcmd (91): /sbin/btrfs device scan

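The odd/even point can be illustrated with a toy calculation. It rests on two assumptions that are not spelled out in the thread: that parity1 is a plain XOR of the same byte position across all data disks, and that every data disk holds the same byte at that position (as identical filesystem signature bytes would). The function name `xor_fold` and the byte value 0x5f are illustrative only.

```shell
# Toy model: XOR all arguments together and print the result as a hex byte.
# Each argument is one data disk's byte at the same offset.
xor_fold() {
    local acc=0 v
    for v in "$@"; do
        acc=$(( acc ^ $v ))
    done
    printf '0x%02x\n' "$acc"
}

xor_fold 0x5f 0x5f 0x5f   # three disks, same byte -> 0x5f (byte survives in parity)
xor_fold 0x5f 0x5f        # two disks, same byte   -> 0x00 (byte cancels out)
```

Under those assumptions, an odd number of disks leaves the shared bytes visible on the parity disk while an even number cancels them, which would match the suggestion to add or remove an array device. Fields that differ between disks would not come through cleanly, which fits the "semi-valid" description.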