Cache Drive suddenly gone down


I was watching Plex when something went flaky. I exited and then noticed my VMs had stopped working.

I rebooted my server, and then all my Dockers and VMs were gone.

Then I noticed my cache drive was gone.

It's detected but unmountable. I put it in a Windows machine and ran WD Dashboard: it shows 0 for file size etc. and 90 percent left on the SSD.

I do have a backup of appdata/cache on a hard drive, but I'm not sure how often it backs up.

Is the drive bricked? What causes this? Is it fixable, or is the SSD done? And if I get 2 SSDs for cache and this happens again, does it wreck the second one at the same time?

I'm fairly new to SSDs. I've been using regular hard drives for 30 years, so I don't know the lifespan of SSDs etc.

The disk log says:


Feb 3 23:12:03 Tower kernel: sd 8:0:2:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Feb 3 23:12:03 Tower kernel: sd 8:0:2:0: [sde] Write Protect is off
Feb 3 23:12:03 Tower kernel: sd 8:0:2:0: [sde] Mode Sense: 7f 00 10 08
Feb 3 23:12:03 Tower kernel: sd 8:0:2:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
Feb 3 23:12:03 Tower kernel: sde: sde1
Feb 3 23:12:03 Tower kernel: sd 8:0:2:0: [sde] Attached SCSI disk
Feb 3 23:12:03 Tower kernel: BTRFS: device fsid eab034ad-96d1-4e0f-bc9f-aa82cab0d3f2 devid 1 transid 2813515 /dev/sde1
Feb 3 23:12:59 Tower emhttpd: WDC_WDS100T2B0A-00SM50_1821C6803409 (sde) 512 1953525168
Feb 3 23:16:08 Tower unassigned.devices: Adding disk '/dev/sde1'...
Feb 3 23:16:08 Tower unassigned.devices: Mount drive command: /sbin/mount -t btrfs -o rw,auto,async,noatime,nodiratime,discard '/dev/sde1' '/mnt/disks/WDC_WDS100T2B0A-00SM50_1821C6803409'
Feb 3 23:16:08 Tower kernel: BTRFS info (device sde1): turning on discard
Feb 3 23:16:08 Tower kernel: BTRFS info (device sde1): disk space caching is enabled
Feb 3 23:16:08 Tower kernel: BTRFS info (device sde1): has skinny extents
Feb 3 23:16:08 Tower kernel: BTRFS error (device sde1): parent transid verify failed on 745510895616 wanted 2813504 found 2812744
Feb 3 23:16:08 Tower kernel: BTRFS error (device sde1): failed to read block groups: -5
Feb 3 23:16:08 Tower kernel: BTRFS error (device sde1): open_ctree failed
Feb 3 23:16:08 Tower unassigned.devices: Mount of '/dev/sde1' failed. Error message: mount: /mnt/disks/WDC_WDS100T2B0A-00SM50_1821C6803409: wrong fs type, bad option, bad superblock on /dev/sde1, missing codepage or helper program, or other error.
Feb 3 23:16:12 Tower unassigned.devices: Adding disk '/dev/sde1'...
Feb 3 23:16:12 Tower unassigned.devices: Mount drive command: /sbin/mount -t btrfs -o rw,auto,async,noatime,nodiratime,discard '/dev/sde1' '/mnt/disks/WDC_WDS100T2B0A-00SM50_1821C6803409'
Feb 3 23:16:12 Tower kernel: BTRFS info (device sde1): turning on discard
Feb 3 23:16:12 Tower kernel: BTRFS info (device sde1): disk space caching is enabled
Feb 3 23:16:12 Tower kernel: BTRFS info (device sde1): has skinny extents
Feb 3 23:16:12 Tower kernel: BTRFS error (device sde1): parent transid verify failed on 745510895616 wanted 2813504 found 2812744
Feb 3 23:16:12 Tower kernel: BTRFS error (device sde1): failed to read block groups: -5
Feb 3 23:16:12 Tower kernel: BTRFS error (device sde1): open_ctree failed
Feb 3 23:16:12 Tower unassigned.devices: Mount of '/dev/sde1' failed. Error message: mount: /mnt/disks/WDC_WDS100T2B0A-00SM50_1821C6803409: wrong fs type, bad option, bad superblock on /dev/sde1, missing codepage or helper program, or other error.
Feb 3 23:20:07 Tower emhttpd: WDC_WDS100T2B0A-00SM50_1821C6803409 (sde) 512 1953525168
Feb 3 23:20:07 Tower emhttpd: import 30 cache device: (sde) WDC_WDS100T2B0A-00SM50_1821C6803409
Feb 3 23:21:50 Tower emhttpd: shcmd (348): mount -t btrfs -o noatime,nodiratime /dev/sde1 /mnt/cache
Feb 3 23:21:50 Tower kernel: BTRFS info (device sde1): disk space caching is enabled
Feb 3 23:21:50 Tower kernel: BTRFS info (device sde1): has skinny extents
Feb 3 23:21:50 Tower kernel: BTRFS error (device sde1): parent transid verify failed on 745510895616 wanted 2813504 found 2812744
Feb 3 23:21:50 Tower kernel: BTRFS error (device sde1): failed to read block groups: -5
Feb 3 23:21:50 Tower root: mount: /mnt/cache: wrong fs type, bad option, bad superblock on /dev/sde1, missing codepage or helper program, or other error.
Feb 3 23:21:50 Tower kernel: BTRFS error (device sde1): open_ctree failed

 

So I'm not sure if it's fixable or not.
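For reference, the repeated `parent transid verify failed` lines in a log like this can be summarized with a few lines of shell. This is only a minimal sketch, not an Unraid tool: the function name `transid_gaps` is made up for illustration, and the syslog path in the usage comment is an assumption. For each failure it prints the generation the filesystem wanted, the generation it found on disk, and the difference between them.

```shell
# Hypothetical helper: summarize btrfs "parent transid verify failed" lines.
# Prints: <wanted> <found> <wanted - found> for every matching line in the file.
transid_gaps() {
    awk '/parent transid verify failed/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "wanted") w = $(i + 1)   # generation the parent block expects
            if ($i == "found")  f = $(i + 1)   # generation actually read from disk
        }
        print w, f, w - f
    }' "$1"
}

# Usage (path is an assumption, adjust to where your syslog lives):
#   transid_gaps /var/log/syslog
```

For the lines in the log above this gives `2813504 2812744 760`, meaning the block on disk is 760 generations older than the filesystem expected. That pattern is consistent with writes that never reached the device.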

Link to comment

I was able to re-add the cache drive, but it won't mount. And it turns out my cache/appdata backup hasn't run since June of last year, and it didn't back up the VMs,

so all that is lost. Ugh.

So what can I do now? And how do I make backups of the VMs next time? I feel like such an idiot.

 

cache.PNG

Link to comment

Is there data recovery software for Unraid? I was able to scan the drive in Windows with MiniTool Power Data Recovery, and it detects files, but it doesn't recognize the format they were stored in; it just sorts BMP, SWF, and JPG files into their own folders.

I'm going to try EaseUS next. I googled for help with SSDs, but these are all Windows programs. Does Unraid have, or is there, a tool to recover btrfs and get the files back? I also remember hearing that the diagnostics file is only useful if I haven't rebooted, but I rebooted a couple of times before I posted diagnostics, so I hope it still helps. Very frustrating. I thought I had a good setup with parity and drives, but I didn't protect the cache drive.

So all domains, Dockers, and apps are gone 😞 and I don't even know how you back up the VMs, or if there's an app for that too.

Link to comment

Question: I didn't run 2 SSDs because I didn't have a second one, and I thought SSDs don't fail. Boy, am I an idiot, since I made sure the HDs had parity.

If I add a second SSD, is that the parity, or is there a separate parity slot for SSDs? And what's to stop whatever happened to this SSD from hitting my backup SSD at the same time?

I'm able to scan it in Windows, and it finds lots of files all over the place, so I guess the SSD isn't broken; it just lost the partition.

Here's what was going on before it went down and everything went nuts:

I was watching Plex, I was copying files in Krusader from my array to an unassigned drive, and my Transmission downloader kept showing more and more file location errors.

That's the best I can explain it. Then I rebooted, and everything was gone,

except the array, which is OK. It's just what was on the SSD, the 4 folders.

Hope that helps if the diagnostics don't, since I did reboot the computer. Sorry if that made things worse.

 

Link to comment

OK, thank you. Did the diagnostics show anything? And will running 2 SSDs prevent this issue in the future? The only thing I did differently that day was upgrade to the latest Unraid, and about 12 hours later I had the issue.

Is there a way to back up the VMs to the array? Or any other tricks I should be using to protect against this kind of SSD data loss?

I'll try those procedures now. Thanks.

 

Link to comment
2 minutes ago, comet424 said:

OK, thank you. Did the diagnostics show anything? And will running 2 SSDs prevent this issue in the future?

The problem was a corrupt filesystem due to an unclean unmount (in this case caused by the device dropping offline). And no, a mirror wouldn't help with this; an SSD with power loss protection might have.

5 minutes ago, comet424 said:

Is there a way to back up the VMs to the array?

There's a VM backup plugin, and you can also use snapshots together with send/receive. The point is, anything important needs to be regularly backed up.
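As a rough illustration of the snapshot plus send/receive approach, here is a minimal sketch. It makes several assumptions: the function name `snapshot_and_send` and all paths are hypothetical, the source (for example the domains share) must be a btrfs subvolume, the destination must also be a btrfs filesystem, and VMs should be shut down first if the vdisk images need to be consistent. By default it only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
# Hypothetical helper: take a read-only snapshot of a btrfs subvolume and
# send it to another btrfs filesystem. Prints the commands unless DRY_RUN=0.
snapshot_and_send() {
    local src="$1"       # subvolume to back up, e.g. /mnt/cache/domains (assumed)
    local snap_dir="$2"  # where snapshots are kept, same filesystem as src
    local dest="$3"      # mount point of the target btrfs filesystem
    local stamp="${4:-$(date +%Y%m%d-%H%M%S)}"
    local snap="$snap_dir/$(basename "$src")-$stamp"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "btrfs subvolume snapshot -r $src $snap"
        echo "btrfs send $snap | btrfs receive $dest"
        return 0
    fi

    # send requires a read-only snapshot, hence -r
    btrfs subvolume snapshot -r "$src" "$snap" || return 1
    btrfs send "$snap" | btrfs receive "$dest"
}

# Usage (all paths are examples only):
#   snapshot_and_send /mnt/cache/domains /mnt/cache/.snapshots /mnt/disks/backup
```

A full send like this copies everything each time. Incremental sends with `btrfs send -p <previous snapshot>` are possible too, but they are left out to keep the sketch short.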

 

Link to comment

Oh OK. So is power loss protection something I can add to an SSD, or is it built into the SSD? And are there better SSDs than WD? I've stuck with WD because it's easy to get replacements and they've been pretty good all these years.

I also moved the SSD to a different SATA cable, figuring maybe that was the issue. I do know this computer has trouble powering up after a reboot; it will hang, and I'm working on replacing the board. The catch is that the current board is an Intel socket and my new board is a Ryzen AMD, and I haven't tested whether I can just swap boards, plug everything back in, and have it work fine, or whether it will go haywire because it's a transition from Intel to AMD. So I haven't tried it yet; I've been testing the AMD board for stability.

Also, if the mirror doesn't help, doesn't cache offer parity too? And can the regular hard drives with parity also get corrupted like this drive? I currently do nightly backups of my important files to my backup server in my house, but I wasn't backing up cache/appdata or the VMs.

I'll get back to you once I've run these tests. I hope I get the files back, and I'll do better backups. I'm also on a UPS, so I didn't think I'd get data loss. Go figure.

Thank you for the help so far, I appreciate it. Let's hope these fixes fix it 🙂

 

 

Link to comment

Oh, I see, so always keep backups. I've come across an issue and I don't know: do I choose yes or no?

I had to make a restore directory on disk2, but I get a looping prompt. In the one image you don't see mkdir restore because I did it in another SSH window.

But I'm not sure whether I say yes or no, and there's also an option "a". I wasn't sure what to press and didn't want to touch anything and screw it up.

 

disk2.PNG

disk3.PNG

Link to comment

Oh, thank you, my bad. I've hit Y 52 times so far, lol. OK, thank you. Let's hope all this works. I appreciate your help and knowledge, as I was stuck. I did read another article where they use scrub. I've seen that somewhere in Unraid but can't remember where; as long as Unraid is up and running, I don't play with it much, lol.

 

Link to comment

OK, I think it stopped, and it didn't restore everything. What do I do now, re-run it with -i to skip errors?

btrfs restore -vi /dev/sdf1 /mnt/disk2/restore

 

 

Update: I'm trying -vi, so it's continuing. I just ask a lot of questions because I don't know and I don't want to ruin anything. I'm more of a visual learner than a reader.
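One cautious way to approach a restore like this is to do a dry run before copying anything. The sketch below only builds the command lines and does not run them; `restore_cmd` is a hypothetical helper, and the device and target path are the ones used in this thread. In `btrfs restore`, `-D` is the dry-run flag (list what would be restored without writing), `-v` is verbose, and `-i` ignores errors and keeps going.

```shell
# Hypothetical helper: print (not run) a btrfs restore command line.
#   list: dry run, nothing is written to the destination
#   copy: the real restore, continuing past errors
restore_cmd() {
    local mode="$1" dev="$2" dest="$3"
    case "$mode" in
        list) echo "btrfs restore -D -v -i $dev $dest" ;;
        copy) echo "btrfs restore -v -i $dev $dest" ;;
        *)    echo "usage: restore_cmd list|copy DEVICE DEST" >&2; return 2 ;;
    esac
}

# Usage:
#   restore_cmd list /dev/sdf1 /mnt/disk2/restore
#   restore_cmd copy /dev/sdf1 /mnt/disk2/restore
```

Printing the command instead of running it keeps the sketch safe to try; the printed line can be reviewed and then pasted into the terminal once the device name has been double-checked.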

disk5.PNG

Edited by comet424
Link to comment

So the drive is finally done. I have an older regular 750 GB laptop drive that I'm going to use as a cache and copy the restored stuff to, to see if it works. I didn't want to overwrite the SSD in case it didn't work.

I'm trying to unmount my drives, but it keeps retrying. Could that have been the reason it failed last night? Is there a way to find out what's going on? It's been sitting like this for 10 minutes.

 

disk6.PNG

Link to comment

So the computer stayed on even though I powered off. I powered it back up, SSH'd in, edited disk.cfg to change start array from yes to no, and then rebooted.

I installed a regular 750 GB laptop drive to test. But with the issues you see, I guess I'll be swapping the motherboard sooner.

Hopefully that fixes the issues. For now I ran:

 

"cp -r /mnt/disk2/restore /mnt/cache"    

So hopefully when it reboots I'll find that it works perfectly fine. It would just be nice to know exactly what issues it's having, as Fix Common Problems never finds any errors, at least nothing serious.
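A note on the copy step: when the destination directory already exists, `cp -r /mnt/disk2/restore /mnt/cache` places the files under `/mnt/cache/restore` rather than directly in `/mnt/cache`, and plain `-r` does not keep ownership, permissions, or timestamps. The following is a minimal sketch of an alternative, assuming the restored tree sits in a single source directory; `copy_and_verify` is a made-up name, not an Unraid command.

```shell
# Hypothetical helper: copy the contents of src into dest, keeping ownership,
# permissions and timestamps, then do a rough comparison of the two trees.
copy_and_verify() {
    local src="$1" dest="$2"
    mkdir -p "$dest" || return 1
    # "src/." copies what is inside src (hidden files included) into dest,
    # instead of creating dest/<name of src>
    cp -a "$src/." "$dest/" || return 1
    # diff -r exits non-zero if a file is missing or differs; it also reports
    # files that already existed only in dest
    diff -r "$src" "$dest" >/dev/null
}

# Usage (paths from this thread):
#   copy_and_verify /mnt/disk2/restore /mnt/cache && echo "copy verified"
```

The comparison reads every file twice, so on a large restore it can take a while. It is meant as a sanity check, not a replacement for a proper backup.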

 

Link to comment

So I got it to work with the spare laptop drive as a test. The VMs seem to be working, so that part's fixed, and the Dockers showed up.

I wasn't able to get the scrub tools to work on the SSD, and I couldn't get the scrub button to show up for the SSD when it's an unassigned drive.

But I'm switching to the SSD as cache and going to re-copy from the restore location.

I'm still finding that it won't unmount the disks properly, though; it still shows "retrying to unmount shares". Is there anything I can fix? Or do I need to set up my AMD computer with a new hard drive, move the files from one server to the new one, and reconfigure things, so I'd have freshly formatted drives?

Will the diagnostics file say if anything else I have is corrupt?

 

Link to comment
