Olymoly Posted July 25, 2022 Good evening. I filled my drives/share and stopped the array to add a hard drive that was excluded from the share (which I realized afterwards I didn't need to do). I restarted the array and now a cache drive is missing. I also have a parity drive missing, but I am aware of that, as my hard drives have an issue with my LSI controller. My Docker no longer starts and I can't add any data, because when I attempt to I receive a catastrophic error. Also, I am looking to do a case/MB swap to resolve my parity issue; in the meantime I would like to know how to continue without data loss given my current error. Thank you! nastradamus-diagnostics-20220721-0428.zip
JorgeB Posted July 25, 2022 Parity was already disabled at array start, so before the 16th. The btrfs balance is failing due to not enough space; reboot and post new diags before array start.
Olymoly Posted July 25, 2022 Latest reboot: nastradamus-diagnostics-20220725-0657.zip
JorgeB Posted July 25, 2022 There appear to be two cables connected from the same HBA to two different expanders. Unraid doesn't support SAS multipath, so connect just one cable, or both to the same expander if dual link is supported, then post new diags and the output of btrfs fi show
Olymoly Posted July 25, 2022 OK, I wasn't expecting anything like that; I assumed I would be deleting data or changing values since it was full. I did my research to build this almost three years ago, and it seems I have forgotten a lot of the basic things. However, if I am understanding you correctly, I need to remove one cable from my BPN-SAS2-836EL1 and/or add another HBA/LSI controller to the second cable.
JorgeB Posted July 25, 2022 2 minutes ago, Olymoly said: "i need to remove one cable from my BPN-SAS2-836EL1 and/or add another HBA/LSI controller to the second cable." EL1 is a single-expander backplane. I took another look and you have two HBAs, so it looks like you have both connected to the same expander. Use just one: you can connect both cables from one HBA to the expander for increased bandwidth, but don't connect two HBAs to the same expander. That can be done for redundancy when supported by the OS; Unraid does not support that.
Olymoly Posted July 25, 2022 I've shut down, removed the extra third SAS cable and rebooted. The second cache is showing. Posting diagnostics without array start. btrfs fi show nastradamus-diagnostics-20220725-1742.zip
Solution JorgeB Posted July 26, 2022
OK, the logs are clean now. First let's cancel the btrfs balance. To do that:
mkdir /temp
mount -o skip_balance /dev/sdj1 /temp
If the mount is successful, type:
btrfs balance cancel /temp
then
umount /temp
Now unassign cache1, start array, stop array, re-assign both cache devices, start array and post new diags.
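For anyone following along, the cancel steps from the post above could be wrapped in a small function, roughly as sketched here. The function name cancel_balance and the DRY_RUN switch are made up for illustration; the device /dev/sdj1 and the mount point /temp are the ones from this thread and will differ on another system. The sketch uses mkdir -p instead of plain mkdir so it can be repeated without failing.

```shell
# Hypothetical helper (not part of Unraid): cancel an interrupted btrfs balance.
# With DRY_RUN=1 the commands are only printed, not executed.
cancel_balance() {
    local dev="$1" mnt="${2:-/temp}" run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi

    $run mkdir -p "$mnt" || return 1
    # skip_balance mounts the filesystem without resuming the interrupted balance
    $run mount -o skip_balance "$dev" "$mnt" || return 1
    # only reached when the mount succeeded
    $run btrfs balance cancel "$mnt"
    $run umount "$mnt"
}
```

Setting DRY_RUN=1 and calling cancel_balance /dev/sdj1 /temp prints the four commands so they can be checked before running them for real as root.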
Olymoly Posted July 26, 2022 Here you go. Checked Docker and it loaded! nastradamus-diagnostics-20220726-0331.zip
JorgeB Posted July 26, 2022 Run a scrub on the pool to see if it can sync both devices, check the output at the end and check that all errors were corrected. There are possibly some device issues with sdi. The pool is also very full, which you should avoid: try keeping it under 80% used, 90% tops.
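The 80%/90% guidance above can be turned into a quick check, sketched below. The function name classify_usage and its messages are invented for this example, and /mnt/cache in the comments is only an assumed pool mount point. Note that df can be imprecise on multi-device btrfs pools, so treat the number as a rough guide.

```shell
# Hypothetical helper: map a used-space percentage to the thresholds from the post.
classify_usage() {
    local pct="${1%\%}"    # accept "87" or "87%"
    if [ "$pct" -gt 90 ]; then
        echo "critical: above 90% used"
    elif [ "$pct" -gt 80 ]; then
        echo "warn: above 80% used"
    else
        echo "ok"
    fi
}

# Possible use against a live pool (path is an assumption):
#   classify_usage "$(df --output=pcent /mnt/cache | tail -n 1 | tr -d ' ')"
# A scrub can also be started from the command line instead of the GUI:
#   btrfs scrub start -B /mnt/cache    # -B waits and prints the statistics at the end
#   btrfs scrub status /mnt/cache
```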
Olymoly Posted July 26, 2022 Thank you. I'm going to start to unbalance the drives back to 80%, as I was in the middle of including a new drive. I tried to look up scrubbing etc., as the terms you mentioned were over my head. Given the current state, would it be safe to do a MB swap before I do this, or after? I would also like to fix my parity situation while doing any other data integrity check.
JorgeB Posted July 26, 2022 3 minutes ago, Olymoly said: "i tried to look up scrubbing etc as the terms you mentioned were over my head." Click on cache and there will be a scrub option; you need to scroll down. 4 minutes ago, Olymoly said: "given the current state would it be safe to do a mb swap before i do this, or after?" Should be safe before, but I would recommend after, just in case there are some unexpected issues.
Olymoly Posted July 29, 2022 Scrub and read check all passed with no errors. I added an 8TB drive and I've been moving data to allow for 10% free space across all disks. Question: my VMs tab says the "libvirt" service failed to start. Do you know or see anything in the diagnostics that would cause this?
JorgeB Posted July 29, 2022 I do:
Jul 26 03:30:12 Nastradamus emhttpd: shcmd (36814): /usr/local/sbin/mount_image '/mnt/user/system/libvirt/libvirt.img' /etc/libvirt 1
Jul 26 03:30:13 Nastradamus kernel: sd 7:0:7:0: [sdi] tag#520 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jul 26 03:30:13 Nastradamus kernel: sd 7:0:7:0: [sdi] tag#520 Sense Key : 0x3 [current]
Jul 26 03:30:13 Nastradamus kernel: sd 7:0:7:0: [sdi] tag#520 ASC=0x11 ASCQ=0x0
Jul 26 03:30:13 Nastradamus kernel: sd 7:0:7:0: [sdi] tag#520 CDB: opcode=0x28 28 00 00 27 fe a0 00 00 20 00
Jul 26 03:30:13 Nastradamus kernel: print_req_error: critical medium error, dev sdi, sector 2621088
Jul 26 03:30:13 Nastradamus kernel: BTRFS: device fsid c5bc54e2-9593-41d2-b010-1527e11dbd61 devid 1 transid 162 /dev/loop3
Jul 26 03:30:13 Nastradamus kernel: BTRFS info (device loop3): disk space caching is enabled
Jul 26 03:30:13 Nastradamus kernel: BTRFS info (device loop3): has skinny extents
Jul 26 03:30:13 Nastradamus kernel: BTRFS error (device loop3): bad tree block start, want 30883840 have 0
It suggests a device problem, please post new diags after rebooting and starting the array to confirm it's the same issue.
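As a reading aid for the log above: sense key 0x3 is the SCSI "medium error" key, ASC 0x11 with ASCQ 0x0 is "unrecovered read error", and opcode 0x28 is a READ(10) command. The address in the CDB can be decoded by hand, as in the sketch below. The function name cdb_read10 is made up for this example; it only handles the 10-byte READ(10) layout shown in the log.

```shell
# Hypothetical helper: decode the LBA and block count of a READ(10) CDB
# given as ten space-separated hex bytes, as printed by the kernel.
cdb_read10() {
    set -- $1    # split the hex bytes into positional parameters
    [ "$1" = "28" ] || { echo "not a READ(10) CDB" >&2; return 1; }
    # bytes 2-5 hold the big-endian LBA, bytes 7-8 the transfer length in blocks
    printf 'lba=%d blocks=%d\n' "0x$3$4$5$6" "0x$8$9"
}

cdb_read10 "28 00 00 27 fe a0 00 00 20 00"   # prints: lba=2621088 blocks=32
```

The decoded LBA matches the sector 2621088 reported on the print_req_error line, so the failed read of 32 blocks on sdi and the reported medium error refer to the same spot on the device.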
Olymoly Posted July 29, 2022 Reboot seems to have resolved it.
Olymoly Posted July 30, 2022 Rebooted and added some drives to preclear; however, Docker is failing to start again. It seems to be the same drive, and it has now failed SMART (old age). I happened to have already ordered larger SSDs as replacement cache drives, which should arrive today to help either way. I just have to figure out the procedure to swap them. nastradamus-diagnostics-20220730-0156.zip
JorgeB Posted July 30, 2022 Docker image is corrupt, you need to delete and re-create it. Before doing that, run a scrub on the pool and make sure all errors are corrected. No signs of a device problem that I can see, at least in these diags.
Olymoly Posted July 30, 2022 Oh no! Does that mean I have lost all my Docker apps' data? How do I recreate/delete the Docker image? I've scrubbed the pool and didn't get any errors. Hmm, I will do it again and check error correction. Thanks again for your help.
Olymoly Posted July 30, 2022 I wanted to increase my cache drive size anyway; would it be better to go that route, regardless of errors on the pool? I'm guessing this pool needs to be fixed either way before I move on to the new drives.
itimpi Posted July 30, 2022 13 minutes ago, Olymoly said: "oh no! does that mean i have lost all my docker app's data? how do i recreate/delete the docker image?" No. This section of the online documentation, accessible via the 'Manual' link at the bottom of the GUI, covers what you need to do.
Olymoly Posted July 30, 2022 10 minutes ago, itimpi said: "No. This section of the online documentations accessible via the 'Manual' link at the bottom of the GUI covers what you need to do." OK, cool, thanks.
Olymoly Posted July 30, 2022 12 hours ago, JorgeB said: "Docker image is corrupt, you need to delete and re-create, before doing that run a scrub on the pool and make sure all errors are corrected, no signs of a device problem that I can see, at least in these diags." Ran the scrub with correction. Before I continue, is there anything else I need to do to make sure all errors are corrected?
JorgeB Posted July 31, 2022 Lots of uncorrectable errors, run another one and post the diags after it's done.
Olymoly Posted July 31, 2022 nastradamus-diagnostics-20220731-0841.zip
JorgeB Posted July 31, 2022 Cache1 appears to be failing, run an extended SMART test.
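An extended SMART test can be started from the device's page in the GUI or, as a rough command-line sketch, with smartctl. The function name smart_long_test and the DRY_RUN switch are invented for this example, and /dev/sdX is a placeholder: the thread does not say which device letter cache1 has after the reboot.

```shell
# Hypothetical helper: start an extended SMART self-test and show the self-test log.
# With DRY_RUN=1 the commands are only printed, not executed.
smart_long_test() {
    local dev="$1" run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
    $run smartctl -t long "$dev"        # start the extended (long) self-test
    $run smartctl -l selftest "$dev"    # list self-test results, including one in progress
}
```

The test runs in the background on the drive and can take a long time; running the second command again later shows whether it completed without error.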