comet424 Posted February 4, 2020
I was watching Plex and then something went flaky. I exited and then noticed my VMs had stopped working. I rebooted my server and then all Dockers and VMs were gone, and then I noticed my cache drive was gone. It's detectable but unmountable. I stuck it in a Windows machine and ran WD Dashboard: it shows up as 0 for file size etc. and 90 percent left on the SSD.
I do have a backup of appdata/cache to the hard drive but I'm not sure how often it backs up. Is the drive bricked? What causes this? Is it fixable or is the SSD done? And if I get 2 SSDs for cache and this happens again, does it wreck the 2nd one at the same time? I'm fairly new to SSDs; I've been using regular hard drives for 30 years, so I don't know the lifespan of SSDs etc.
The disk log says:
Feb 3 23:12:03 Tower kernel: sd 8:0:2:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Feb 3 23:12:03 Tower kernel: sd 8:0:2:0: [sde] Write Protect is off
Feb 3 23:12:03 Tower kernel: sd 8:0:2:0: [sde] Mode Sense: 7f 00 10 08
Feb 3 23:12:03 Tower kernel: sd 8:0:2:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
Feb 3 23:12:03 Tower kernel: sde: sde1
Feb 3 23:12:03 Tower kernel: sd 8:0:2:0: [sde] Attached SCSI disk
Feb 3 23:12:03 Tower kernel: BTRFS: device fsid eab034ad-96d1-4e0f-bc9f-aa82cab0d3f2 devid 1 transid 2813515 /dev/sde1
Feb 3 23:12:59 Tower emhttpd: WDC_WDS100T2B0A-00SM50_1821C6803409 (sde) 512 1953525168
Feb 3 23:16:08 Tower unassigned.devices: Adding disk '/dev/sde1'...
Feb 3 23:16:08 Tower unassigned.devices: Mount drive command: /sbin/mount -t btrfs -o rw,auto,async,noatime,nodiratime,discard '/dev/sde1' '/mnt/disks/WDC_WDS100T2B0A-00SM50_1821C6803409'
Feb 3 23:16:08 Tower kernel: BTRFS info (device sde1): turning on discard
Feb 3 23:16:08 Tower kernel: BTRFS info (device sde1): disk space caching is enabled
Feb 3 23:16:08 Tower kernel: BTRFS info (device sde1): has skinny extents
Feb 3 23:16:08 Tower kernel: BTRFS error (device sde1): parent transid verify failed on 745510895616 wanted 2813504 found 2812744
Feb 3 23:16:08 Tower kernel: BTRFS error (device sde1): failed to read block groups: -5
Feb 3 23:16:08 Tower kernel: BTRFS error (device sde1): open_ctree failed
Feb 3 23:16:08 Tower unassigned.devices: Mount of '/dev/sde1' failed. Error message: mount: /mnt/disks/WDC_WDS100T2B0A-00SM50_1821C6803409: wrong fs type, bad option, bad superblock on /dev/sde1, missing codepage or helper program, or other error.
Feb 3 23:16:12 Tower unassigned.devices: Adding disk '/dev/sde1'...
Feb 3 23:16:12 Tower unassigned.devices: Mount drive command: /sbin/mount -t btrfs -o rw,auto,async,noatime,nodiratime,discard '/dev/sde1' '/mnt/disks/WDC_WDS100T2B0A-00SM50_1821C6803409'
Feb 3 23:16:12 Tower kernel: BTRFS info (device sde1): turning on discard
Feb 3 23:16:12 Tower kernel: BTRFS info (device sde1): disk space caching is enabled
Feb 3 23:16:12 Tower kernel: BTRFS info (device sde1): has skinny extents
Feb 3 23:16:12 Tower kernel: BTRFS error (device sde1): parent transid verify failed on 745510895616 wanted 2813504 found 2812744
Feb 3 23:16:12 Tower kernel: BTRFS error (device sde1): failed to read block groups: -5
Feb 3 23:16:12 Tower kernel: BTRFS error (device sde1): open_ctree failed
Feb 3 23:16:12 Tower unassigned.devices: Mount of '/dev/sde1' failed. Error message: mount: /mnt/disks/WDC_WDS100T2B0A-00SM50_1821C6803409: wrong fs type, bad option, bad superblock on /dev/sde1, missing codepage or helper program, or other error.
Feb 3 23:20:07 Tower emhttpd: WDC_WDS100T2B0A-00SM50_1821C6803409 (sde) 512 1953525168
Feb 3 23:20:07 Tower emhttpd: import 30 cache device: (sde) WDC_WDS100T2B0A-00SM50_1821C6803409
Feb 3 23:21:50 Tower emhttpd: shcmd (348): mount -t btrfs -o noatime,nodiratime /dev/sde1 /mnt/cache
Feb 3 23:21:50 Tower kernel: BTRFS info (device sde1): disk space caching is enabled
Feb 3 23:21:50 Tower kernel: BTRFS info (device sde1): has skinny extents
Feb 3 23:21:50 Tower kernel: BTRFS error (device sde1): parent transid verify failed on 745510895616 wanted 2813504 found 2812744
Feb 3 23:21:50 Tower kernel: BTRFS error (device sde1): failed to read block groups: -5
Feb 3 23:21:50 Tower root: mount: /mnt/cache: wrong fs type, bad option, bad superblock on /dev/sde1, missing codepage or helper program, or other error.
Feb 3 23:21:50 Tower kernel: BTRFS error (device sde1): open_ctree failed
So I'm not sure if it's fixable or not.
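For anyone decoding the log above: in a "parent transid verify failed ... wanted X found Y" line, btrfs expected a metadata block written in transaction X but found one from the older transaction Y on disk, which usually points to recent writes that never reached the drive. A minimal shell sketch, using one line from this log, that pulls out both numbers and the gap between them (the helper name transid_gap is made up for illustration):

```shell
#!/bin/sh
# One of the error lines from the log in this post.
line='Feb 3 23:16:08 Tower kernel: BTRFS error (device sde1): parent transid verify failed on 745510895616 wanted 2813504 found 2812744'

# transid_gap is a hypothetical helper: prints "wanted found gap" for such a line.
transid_gap() {
    pair="$(printf '%s\n' "$1" | sed -nE 's/.*wanted ([0-9]+) found ([0-9]+).*/\1 \2/p')"
    [ -n "$pair" ] || return 1      # not a transid error line
    wanted="${pair% *}"             # text before the space
    found="${pair#* }"              # text after the space
    echo "$wanted $found $((wanted - found))"
}

transid_gap "$line"   # prints: 2813504 2812744 760
```

Here the block found on disk is 760 transactions older than the one the filesystem expected.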
comet424 Posted February 4, 2020
I was able to re-add the cache drive but it won't mount, and I guess my appdata/cache backup hasn't run since June of last year, and it didn't back up the VMs, so all that is lost. Ugh. So what can I do now? And how do I make backups of VMs next time? I feel like such an idiot.
JorgeB Posted February 4, 2020
Please post the diagnostics: Tools -> Diagnostics
comet424 Posted February 4, 2020
tower-diagnostics-20200204-0823.zip
comet424 Posted February 4, 2020
Is there data recovery software for Unraid? I'm able to scan the drive in Windows. I'm trying MiniTool Power Data Recovery and it's detecting files, but it's not finding the file format the drive was in, just separating BMPs, SWFs and JPGs into their own folders. I'm going to try EaseUS. I googled for help with SSDs but these are all Windows programs. Does Unraid have anything, or is there help for recovering btrfs, to get the files back?
I remember hearing in the past that the diagnostics file is only good if I didn't reboot the computer, but I rebooted a couple of times before I posted diagnostics, so I hope it helps. Very frustrating. I thought I had a good setup with parity and drives, but I didn't cover the cache drive, so all domains, Dockers and apps are gone 😞 and I don't even know how you back up the VMs, or if there is an app to back them up too.
comet424 Posted February 4, 2020
Question: I didn't do 2 SSDs as I didn't have a second one, and I thought SSDs don't fail. Boy, am I an idiot, as I made sure the HDs had parity. If I add a 2nd SSD, is that the parity, or is there a section for a parity disk for SSDs? And what's to stop what happened to this SSD from happening to my backup at the same time?
I'm able to scan in Windows and it finds lots of files all over the place, so I guess the SSD isn't broken, it just lost the partition.
Before it all went down and went nuts I was watching Plex, I was copying files in Krusader from my array to an unassigned drive, and my Transmission downloader was constantly showing more and more file location errors. That's the best I can explain it. Then I rebooted, and then everything was gone except the array. It's OK; it's just what was on the SSD, the 4 folders.
Hope that helps if the diagnostics don't, as I did reboot the computer. Sorry if that made things worse.
JorgeB Posted February 4, 2020
There are some btrfs recovery options here. Try them in order, but for this case btrfs restore is likely the one that will work best.
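The linked list of recovery options is not reproduced in this thread, so the order below is an assumption: first a read-only mount that falls back to an older tree root, then btrfs restore, which reads from the unmounted device and writes nothing to it. A hedged shell sketch; DEV and MNT are placeholders, DEST is the folder used later in this thread, and the run helper is made up so that by default the script only prints what it would do:

```shell
#!/bin/sh
# Sketch only. DEV and MNT are placeholders, not values taken from the diagnostics.
DEV="${DEV:-/dev/sdX1}"             # the unmountable btrfs partition
MNT="${MNT:-/mnt/recovery}"         # temporary read-only mount point
DEST="${DEST:-/mnt/disk2/restore}"  # folder on a healthy disk with enough free space
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands, 0 = really run them

# run is a hypothetical helper: print the command, execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Option 1: read-only mount using a backup tree root
# (older kernels spell this option "recovery").
try_mount()   { run mount -o usebackuproot,ro "$DEV" "$MNT"; }

# Option 2: copy files off the unmounted device (-v lists files as they are restored).
try_restore() { run btrfs restore -v "$DEV" "$DEST"; }

run mkdir -p "$MNT" "$DEST"
if [ "$DRY_RUN" != "0" ]; then
    try_mount; try_restore          # dry run: just show both commands
elif try_mount; then
    echo "Mounted read-only at $MNT: copy the files off from there."
else
    try_restore
fi
```

Run it as is to see the commands, then with DRY_RUN=0 and the real device once they look right. If restore stops on errors, -i tells it to ignore them and keep going.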
comet424 Posted February 4, 2020
OK, thank you. Did the diagnostics show anything? And will running 2 SSDs prevent this issue in future? The only thing I did differently that day was upgrade to the latest Unraid, and about 12 hours later I had the issue. And is there a backup tool for the VMs to copy them to the array? Or any other tricks I should be using to be better protected from this kind of SSD data loss? I'll try those procedures now. Thanks.
JorgeB Posted February 4, 2020
2 minutes ago, comet424 said: "ok thank you and did the diagnostic show anything? and running 2 ssds will this prevent this issue in future?"
The problem was a corrupt filesystem due to an unclean unmount (in this case caused by the device dropping offline), and no, a mirror wouldn't help with this; an SSD with power loss protection might have.
5 minutes ago, comet424 said: "and is there a backup for the vms to copy to the array?"
There's a VM backup plugin, and you can also use snapshots together with send/receive. The point is, anything important needs to be regularly backed up.
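The snapshot plus send/receive approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the paths are hypothetical, the VM share must be a btrfs subvolume (a plain folder cannot be snapshotted), and the destination must also be on a btrfs filesystem. The run helper is made up so that by default nothing is executed:

```shell
#!/bin/sh
# Sketch only: all paths below are hypothetical examples.
SRC="${SRC:-/mnt/cache/domains}"            # assumed to be a btrfs subvolume
SNAPDIR="${SNAPDIR:-/mnt/cache/snapshots}"  # must be on the same filesystem as SRC
DEST="${DEST:-/mnt/disk1/vm_backups}"       # assumed to be on a btrfs-formatted disk
STAMP="${STAMP:-$(date +%Y%m%d-%H%M%S)}"
SNAP="$SNAPDIR/domains-$STAMP"
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print the commands, 0 = run them

# run is a hypothetical helper: print the command, execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

backup_vms() {
    run mkdir -p "$SNAPDIR" "$DEST" || return 1
    # btrfs send only accepts read-only snapshots, hence -r.
    run btrfs subvolume snapshot -r "$SRC" "$SNAP" || return 1
    # Stream the snapshot to the other filesystem.
    echo "+ btrfs send $SNAP | btrfs receive $DEST"
    if [ "$DRY_RUN" = "0" ]; then
        btrfs send "$SNAP" | btrfs receive "$DEST"
    fi
}

backup_vms
```

A snapshot taken while a VM is running captures its disk image as if the power had been cut at that moment, so shutting the VMs down first gives a cleaner copy.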
comet424 Posted February 4, 2020
Oh OK. So is power loss protection something I can add to an SSD, or is it built into the SSD? And are there better SSDs than WD? I stuck with WD as it's easy to get replacements and they've been pretty good all these years. I also swapped the SSD to a different SATA cable, figuring maybe that was the issue. I do know I have issues powering this computer up after a reboot: it will hang. I'm working on replacing the board. The issue is that the current board is an Intel socket and my new board is a Ryzen AMD, and I haven't tested whether I can just switch boards, plug everything back in and have it work fine, or whether it will go haywire as it's a transition from Intel to AMD. So I haven't tried; I've been testing the AMD board for stability.
Also, if the mirror doesn't help, doesn't cache offer parity too? And can the regular hard drives with parity also get corrupted like this SSD? I currently do nightly backups of my important files to my backup server in my house, but I wasn't doing cache/appdata or the VMs. I'll get back to you once I do these tests. I hope I get the files back, and I'll do better backups. I'm also on a UPS, so there was no power loss, go figure. Thank you for the help so far, I appreciate it. Let's hope these fixes fix it 🙂
JorgeB Posted February 4, 2020
Power loss protection is present mostly on enterprise devices. Array devices can also get filesystem corruption after an unclean shutdown/unmount, and parity can't help with that.
comet424 Posted February 4, 2020
Oh I see, so always keep backups. I've come across an issue and I don't know whether to choose yes or no. I had to make a restore directory on disk2, but I get a "looping" message. In the one image you don't see mkdir restore, as I did it in another SSH window. I'm not sure whether to say yes or no, and there is an option "a"? So I'm not sure what to press. I didn't want to touch anything so as not to screw it up.
JorgeB Posted February 4, 2020
Always yes, but the file might or might not be OK. btrfs restore doesn't verify checksums; it's a recovery option, so the priority is to recover anything.
comet424 Posted February 4, 2020
Is there a way to verify checksums? And what does the "a" mean? I keep having to hit Y (Enter) a lot; there's no option for "yes for all". It did 5 yeses with a bunch of numbers before I got what you see in the picture.
comet424 Posted February 4, 2020
It's going again. What does the looping issue mean? Is that an error it finds?
JorgeB Posted February 4, 2020
3 minutes ago, comet424 said: "I keep having to hit Y(enter)"
"a" is for always.
JorgeB Posted February 4, 2020
2 minutes ago, comet424 said: "what does the looping issue mean is that an error it finds?"
I read the explanation some time ago but don't remember the details. I do remember it's not necessarily an error.
comet424 Posted February 4, 2020
Oh, thank you, my bad. I had hit Y 52 times so far, lol. And OK, thank you. Let's hope all this works. I appreciate your help and knowledge, as I was stuck. I did read another article where they use scrub? I've seen that somewhere in Unraid but can't remember where. As long as Unraid is up and running I don't play with it much, lol.
JorgeB Posted February 4, 2020
Scrub verifies checksums for all blocks on a btrfs filesystem. The option is available after clicking on the device on the main GUI page, for both cache and array devices.
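The post describes the GUI button; a command-line counterpart, as an assumption rather than something shown in this thread, might look like the sketch below. Scrub only works on a mounted filesystem, and on a single device without redundancy it can report bad data blocks but has no second copy to repair them from. MNT and the run helper are illustrative:

```shell
#!/bin/sh
# Sketch only: MNT is an example mount point.
MNT="${MNT:-/mnt/cache}"   # mount point of the btrfs filesystem to check
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = really run them

# run is a hypothetical helper: print the command, execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

scrub_pool() {
    # -B keeps the scrub in the foreground until it finishes.
    run btrfs scrub start -B "$MNT" || return 1
    # Summary of the last scrub, including any checksum errors found.
    run btrfs scrub status "$MNT"
}

scrub_pool
```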
comet424 Posted February 4, 2020
OK, I think it stopped. It didn't do everything. What do I do now, rerun it with -i to skip errors?
btrfs restore -vi /dev/sdf1 /mnt/disk2/restore
Update: I'm trying -vi, so it's continuing. I just ask a lot of questions as I don't know and I don't want to ruin anything. I'm more of a visual learner than a reader.
Edited February 4, 2020 by comet424
comet424 Posted February 4, 2020
So the drive is finally done. I have an older regular 750 GB laptop drive that I'm going to use as a cache, and copy the restored stuff to it to see if it's working. I didn't want to overwrite the SSD in case it didn't work. I'm trying to unmount my drives but it keeps retrying. Could this have been the reason it failed last night? Is there a way to find out what's going on? It's been sitting like this for 10 minutes.
comet424 Posted February 4, 2020
After 20 minutes of sitting there I'm trying a power down. I can't even save the setting to not mount the drives on boot. So I'm doing a power down, then I'll power up and turn off the array. But something is causing this, isn't it? Does the diagnostics file tell you why I'm still having issues?
comet424 Posted February 4, 2020
So the computer was staying on even though I powered off. I powered up, SSHed in, edited disk.cfg to change start array from yes to no, and then rebooted. I installed a 750 GB regular laptop drive to test. But with the issues you see, I guess I'll be swapping the motherboard sooner, and hopefully that fixes the issues. I just ran "cp -r /mnt/disk2/restore /mnt/cache", so hopefully when I reboot I'll find that it works perfectly fine. It would just be nice to know exactly what issues it's having, as Fix Common Problems never finds any errors, at least nothing serious.
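One thing to watch with the cp command in this post: when the target directory already exists, cp -r copies the source folder itself into it, so the files may have landed under /mnt/cache/restore rather than directly in /mnt/cache. A small self-contained demo in a temporary directory (all names here are throwaway examples) showing the difference, and how "/." together with -a copies only the contents while keeping ownership, permissions and timestamps:

```shell
#!/bin/sh
# Demo in a scratch directory; nothing here touches /mnt.
work="$(mktemp -d)"
mkdir -p "$work/restore/appdata" "$work/cache_a" "$work/cache_b"
echo "demo" > "$work/restore/appdata/config.txt"

# Same form as in the post: the "restore" folder itself ends up inside the target.
cp -r "$work/restore" "$work/cache_a"

# "/." copies the contents only; -a also preserves ownership, modes and timestamps.
cp -a "$work/restore/." "$work/cache_b"

( cd "$work" && find cache_a cache_b -type f | sort )
# prints:
#   cache_a/restore/appdata/config.txt
#   cache_b/appdata/config.txt
```

Applied to the paths in the post, the second form would be cp -a /mnt/disk2/restore/. /mnt/cache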
comet424 Posted February 5, 2020
So I got it to work with the spare laptop drive as a test. The VMs seem to be working, so that part's fixed, and the Dockers showed up. I was not able to get the scrub tools to work on the SSD, and I couldn't get the scrub button to show up for the SSD when it's an unassigned drive. I'm switching to the SSD as cache and going to recopy from the restore location. But I'm still finding it won't unmount the disks properly; it still shows "retrying to unmount shares". Is there anything I can fix? Or do I need to set up my AMD computer with a new hard drive, move the files from one server to the new one and reconfigure things, so I'd have newly formatted drives? Will the diagnostics file say if anything else I have is corrupt?