Posts posted by gmight

  1. Yeah, with the corruption of my Docker image last week and the log tree issue this weekend, I think you're right about this one circling back.  If it does, I'll rebuild and maybe do a ZFS cache.  I've been looking at what benefits I might gain; probably not many, right?  ZFS is best to protect against the human factor, not the file system or other unknowns.  Regardless, I appreciate the advice and the fix.  Marked solved, you rock, dude.

  2. Nice, that brought the array up on the first go.

     

    Commands:

    root@semperfi:/mnt/user# btrfs rescue zero-log /dev/nvme0n1p1
    parent transid verify failed on 3821051052032 wanted 2451649 found 2451645
    parent transid verify failed on 3821051052032 wanted 2451649 found 2451645
    Couldn't setup log root tree
    Clearing log on /dev/nvme0n1p1, previous log_root 3821051052032, level 0
    root@semperfi:/mnt/user# btrfs rescue zero-log /dev/nvme1n1p1
    Clearing log on /dev/nvme1n1p1, previous log_root 0, level 0

     

    Ran a scrub immediately after: "no errors found".

    Ran a Fix Common Problems scan: "nothing to report".

     

    grep of syslog for nvme:

    root@semperfi:~# cat /var/log/syslog | grep -E "nvme"
    Dec 10 06:19:50 semperfi emhttpd: read SMART /dev/nvme0n1
    Dec 10 06:19:50 semperfi emhttpd: read SMART /dev/nvme1n1
    Dec 10 06:19:56 semperfi emhttpd: INTEL_SSDPE2MD800G4_CVFT72450037800QGN (nvme0n1) 512 1562824368
    Dec 10 06:19:56 semperfi emhttpd: INTEL_SSDPE2MD800G4_CVFT5200000X800HGN (nvme1n1) 512 1562824368
    Dec 10 06:19:56 semperfi emhttpd: import 30 cache device: (nvme1n1) INTEL_SSDPE2MD800G4_CVFT5200000X800HGN
    Dec 10 06:19:56 semperfi emhttpd: import 31 cache device: (nvme0n1) INTEL_SSDPE2MD800G4_CVFT72450037800QGN
    Dec 10 06:19:57 semperfi emhttpd: read SMART /dev/nvme0n1
    Dec 10 06:19:57 semperfi emhttpd: read SMART /dev/nvme1n1
    Dec 10 06:20:31 semperfi emhttpd:  devid    1 size 745.21GiB used 374.06GiB path /dev/nvme0n1p1
    Dec 10 06:20:31 semperfi emhttpd:  devid    2 size 745.21GiB used 374.06GiB path /dev/nvme1n1p1
    Dec 10 06:20:31 semperfi kernel: BTRFS info (device nvme0n1p1): using crc32c (crc32c-intel) checksum algorithm
    Dec 10 06:20:31 semperfi kernel: BTRFS info (device nvme0n1p1): using free space tree
    Dec 10 06:20:31 semperfi kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
    Dec 10 06:20:31 semperfi kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
    Dec 10 06:20:31 semperfi kernel: BTRFS info (device nvme0n1p1): enabling ssd optimizations
    Dec 10 06:20:31 semperfi kernel: BTRFS info (device nvme0n1p1): cleaning free space cache v1
    Dec 10 06:20:31 semperfi kernel: BTRFS info (device nvme0n1p1): checking UUID tree
    Dec 10 06:20:31 semperfi kernel: BTRFS info (device nvme0n1p1: state M): turning on async discard
    Dec 10 06:23:56 semperfi emhttpd: spinning down /dev/nvme1n1
    Dec 10 06:23:56 semperfi emhttpd: sdspin /dev/nvme1n1 down: 1

    Dec 10 06:29:46 semperfi kernel: BTRFS info (device nvme0n1p1): scrub: started on devid 1
    Dec 10 06:29:46 semperfi kernel: BTRFS info (device nvme0n1p1): scrub: started on devid 2
    Dec 10 06:33:49 semperfi kernel: BTRFS info (device nvme0n1p1): scrub: finished on devid 2 with status: 0
    Dec 10 06:33:51 semperfi kernel: BTRFS info (device nvme0n1p1): scrub: finished on devid 1 with status: 0

     

    root@semperfi:~# dmesg | grep error
    [  473.318420] BTRFS error (device nvme0n1p1): parent transid verify failed on logical 3821051052032 mirror 2 wanted 2451649 found 2451645
    [  473.318587] BTRFS error (device nvme0n1p1): parent transid verify failed on logical 3821051052032 mirror 1 wanted 2451649 found 2451645
    [  473.321780] BTRFS error (device nvme0n1p1): open_ctree failed
    [41584.904323] BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
    [41584.906278] BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
    [41584.909332] BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
    [41584.909725] BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0

     

    At this point, I wonder if there's anything else I need to look at.  Do you recommend any other diagnostics I should run on the cache to make sure it's healthy?  And if it's not too much to ask, how does the log tree get messed up like that?  This time I was away from my system; no user intervention.  In other words, how can I prevent this from happening again?  I'm trying to learn a bit more.  I threw a fresh diag on the thread.
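    One hedged suggestion on further diagnostics (an assumption on my part, not something confirmed in this thread): btrfs keeps cumulative per-device error counters, which is what the "errs: wr 0, rd 0, flush 0, corrupt 2, gen 0" lines in the syslog report; on a live pool, `btrfs device stats /mnt/cache` shows them and `btrfs device stats -z /mnt/cache` resets them after a repair.  The snippet below only parses the counter line quoted above, so it runs anywhere:

```shell
# Parse the "corrupt" counter out of a BTRFS error-counter syslog line.
# The sample line is copied from this thread; on a real server you would
# read such lines from /var/log/syslog or `btrfs device stats` instead.
line='BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0'
corrupt=$(printf '%s\n' "$line" | sed -n 's/.*corrupt \([0-9]*\),.*/\1/p')
echo "corrupt errors recorded: $corrupt"
```

    A nonzero counter here is historical, not necessarily an active problem; it persists across reboots until explicitly reset.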

     

    I appreciate the assist, JorgeB; hell of a save, bro!  Thank you!

     

    semperfi-diagnostics-20231210-0701.zip

  3. I've been working a problem with my Unraid server for about three weeks: the infamous tainted-CPU problem (macvlan issue); kernel panic attached for giggles.  Note that the server had been up and stable since 6.9.2.  Well, I had some power issues in my area and found out that my UPS has a bad battery, so when the power issue happened the server shut down, as expected with a bad battery.  Powered back on, everything was good; then within 12 hours, bam! macvlan kernel errors and a locked UI.  Upgraded to 6.12.4 since it promised to address the macvlan stuff; nope, that didn't work.  At that point I did a bunch of reading, switched from macvlan to ipvlan, made the necessary adjustments on my switch, and the kernel panics ended.

     

    One thing worth mentioning: with all the reboots and lock-ups, my Docker image got corrupted (or so I think).  So last weekend I spent some time restoring my Docker image.  Fairly easy, no issues restoring, and my VMs were there and worked just fine.  Fast forward to today: everything was good and normal, I ran an errand, and when I got back my cache was erroring out.

     

    Syslog reports the following:
    Dec  9 13:01:16 semperfi kernel: BTRFS critical (device nvme0n1p1): corrupt leaf: block=3821052706816 slot=11 extent bytenr=3749963485184 len=8192 unknown inline ref type: 129
    Dec  9 13:01:16 semperfi kernel: BTRFS info (device nvme0n1p1): leaf 3821052706816 gen 2451645 total ptrs 166 free space 3335 owner 2

    Dec  9 13:01:16 semperfi kernel: BTRFS error (device nvme0n1p1): block=3821052706816 write time tree block corruption detected
    Dec  9 13:01:16 semperfi kernel: BTRFS: error (device nvme0n1p1) in btrfs_commit_transaction:2494: errno=-5 IO failure (Error while writing out transaction)
    Dec  9 13:01:16 semperfi kernel: BTRFS info (device nvme0n1p1: state E): forced readonly

     

    grep syslog for nvme after reboot:
    Dec  9 18:48:19 semperfi kernel: nvme nvme0: pci function 0000:01:00.0
    Dec  9 18:48:19 semperfi kernel: nvme nvme1: pci function 0000:04:00.0
    Dec  9 18:48:19 semperfi kernel: nvme nvme1: 24/0/0 default/read/poll queues
    Dec  9 18:48:19 semperfi kernel: nvme1n1: p1
    Dec  9 18:48:19 semperfi kernel: BTRFS: device fsid 1d7002b3-d702-4b6d-9706-05ab85e49ff2 devid 2 transid 2451644 /dev/nvme1n1p1 scanned by udevd (1166)
    Dec  9 18:48:19 semperfi kernel: nvme nvme0: 24/0/0 default/read/poll queues
    Dec  9 18:48:19 semperfi kernel: nvme0n1: p1
    Dec  9 18:48:19 semperfi kernel: BTRFS: device fsid 1d7002b3-d702-4b6d-9706-05ab85e49ff2 devid 1 transid 2451644 /dev/nvme0n1p1 scanned by udevd (1166)
    Dec  9 18:48:54 semperfi emhttpd: INTEL_SSDPE2MD800G4_CVFT72450037800QGN (nvme0n1) 512 1562824368
    Dec  9 18:48:54 semperfi emhttpd: INTEL_SSDPE2MD800G4_CVFT5200000X800HGN (nvme1n1) 512 1562824368
    Dec  9 18:48:54 semperfi emhttpd: import 30 cache device: (nvme1n1) INTEL_SSDPE2MD800G4_CVFT5200000X800HGN
    Dec  9 18:48:54 semperfi emhttpd: import 31 cache device: (nvme0n1) INTEL_SSDPE2MD800G4_CVFT72450037800QGN
    Dec  9 18:48:54 semperfi emhttpd: read SMART /dev/nvme0n1
    Dec  9 18:48:54 semperfi emhttpd: read SMART /dev/nvme1n1
    Dec  9 18:52:12 semperfi kernel: nvme1n1: p1
    Dec  9 18:52:17 semperfi kernel: nvme0n1: p1
    Dec  9 18:55:31 semperfi emhttpd:  devid    1 size 745.21GiB used 374.06GiB path /dev/nvme0n1p1
    Dec  9 18:55:31 semperfi emhttpd:  devid    2 size 745.21GiB used 374.06GiB path /dev/nvme1n1p1
    Dec  9 18:55:31 semperfi kernel: BTRFS info (device nvme0n1p1): using crc32c (crc32c-intel) checksum algorithm
    Dec  9 18:55:31 semperfi kernel: BTRFS info (device nvme0n1p1): using free space tree
    Dec  9 18:55:31 semperfi kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
    Dec  9 18:55:31 semperfi kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
    Dec  9 18:55:31 semperfi kernel: BTRFS info (device nvme0n1p1): enabling ssd optimizations
    Dec  9 18:55:31 semperfi kernel: BTRFS info (device nvme0n1p1): start tree-log replay
    Dec  9 18:55:31 semperfi kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on logical 3821051052032 mirror 2 wanted 2451649 found 2451645
    Dec  9 18:55:31 semperfi kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on logical 3821051052032 mirror 1 wanted 2451649 found 2451645
    Dec  9 18:55:31 semperfi kernel: BTRFS warning (device nvme0n1p1): failed to read log tree
    Dec  9 18:55:31 semperfi root: mount: /mnt/cache: can't read superblock on /dev/nvme0n1p1.
    Dec  9 18:55:31 semperfi kernel: BTRFS error (device nvme0n1p1): open_ctree failed
    Dec  9 18:57:43 semperfi kernel: nvme1n1: p1
    Dec  9 18:59:29 semperfi kernel: nvme0n1: p1

     

    Not cool.  So naturally I reboot, and when I manually bring up the array, it comes up without the cache.  So no Docker or VMs, but the data in the array is untouched, which is a huge plus.

     

    When I go to Fix Common Problems, I see this message:

    [attached screenshot]

     

    My dual NVMe cache reports this:
    [attached screenshot]

     

    All my Docker containers and VMs should still be there, but of course I don't know what to do at this point.  The above error links me to this particular forum for the "unmountable disk..." error, so here I am.  I hope I can get some help with this one; I'd rather not rebuild both Docker and the VMs after a format.  I've attached my diag to the thread.

     

    System:

    MB=Supermicro X11SPi-TF

    CPU=Xeon 4214 

    MEM=256GB (4x64GB) Micron Tech @2400MT/s

    Cache=2x Intel P3700 U.2 NVMe

     

    dmesg | grep error
    [  473.318420] BTRFS error (device nvme0n1p1): parent transid verify failed on logical 3821051052032 mirror 2 wanted 2451649 found 2451645
    [  473.318587] BTRFS error (device nvme0n1p1): parent transid verify failed on logical 3821051052032 mirror 1 wanted 2451649 found 2451645
    [  473.321780] BTRFS error (device nvme0n1p1): open_ctree failed

     

    [attached screenshot]

     

    I read a few threads and ended up here.  If this is where I need to go, then I'll just wipe the disk and get my containers back.  I'll get my VMs up as needed.  Also, people point to memory?  I can easily swap that out if that's a thing.  I'll start a memtest tonight when I go to sleep to see if anything pops; I doubt it, but I'll do it anyway.
     

    Again, I was fully recovered from the macvlan/kernel panics, but all of a sudden this happens within 24 hours of adding VLANs.  I've never had this kind of issue before, so I'm hoping I can get some guidance from my fellow Unraid users.  Sorry if I'm all over the place; naturally I'm in a panic but trying to keep my cool the best I can.  I'm still going through the logs, trying to find a clue.

     

    Thanks for your feedback.

    semperfi-diagnostics-20231209-1939.zip kernel panic dec5.txt

  4. 15 minutes ago, bonienl said:

    There is a top folder on your flash device called "EFI" or "EFI-".

    Rename the folder to EFI if you want to use UEFI, or rename it to EFI- to use legacy boot.

     

    The "Permit UEFI boot mode" setting under Main -> Flash device -> Syslinux Configuration will automatically rename this folder accordingly.
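    The quoted fix boils down to a single folder rename at the top of the flash drive.  A minimal sketch of that rename, using a temp directory as a stand-in for the flash mount (on a live Unraid server the flash is typically mounted at /boot; that path is an assumption here, so check before touching a real device):

```shell
# Demonstrate the EFI-/EFI rename on a throwaway directory that stands in
# for the Unraid flash device (assumption: the real flash mounts at /boot).
FLASH=$(mktemp -d)
mkdir "$FLASH/EFI-"    # legacy-boot state: the trailing dash disables UEFI
# Enable UEFI boot by dropping the trailing dash:
[ -d "$FLASH/EFI-" ] && mv "$FLASH/EFI-" "$FLASH/EFI"
ls "$FLASH"            # now lists: EFI
```

    Renaming in the other direction (EFI back to EFI-) reverts to legacy boot, which is why the GUI toggle works by renaming the same folder.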

     

    Dude, that worked!  The system is completely up and running!!  The simple things, right?

     

    Thank you very much bonienl, you're a rock-star! You just saved me quite a bit of work.

  5. Hi there. 

    I think I have a relatively easy problem to solve; it's my inexperience that's keeping me from solving it.  I think the larger community might have a quick and easy answer to get me back on track.

     

    I've got a pretty good setup, and since it doesn't involve any hardware or configuration options, I don't think I need to go into that.  I will if someone thinks it's needed.  Here's my situation: I had a full configuration deployed.  The array was solid, containers were up, and I had a Win10 VM with an RX 580 pass-through plus a Mojave VM with the same GPU, done and done.  Everything was working soundly.  Then I edited my syslinux while in Unraid, removed the "permit UEFI" tick, and rebooted.  Hands down, in one shot my system won't boot anymore.

     

    I've tried changing the setting on the MB to allow BIOS booting, but it's just not working.  I think that's because when I set things up in the boot creator I made it use UEFI to begin with.  I know I can just run the installer again, rebuild the USB drive, and make it UEFI again, but I was hoping there was a simple way of doing that without reformatting and rebuilding the OS on the USB.

     

    I pulled the SanDisk USB and dropped it into my personal rig; I can browse and see all the files, edit the syslinux, and view the others.  I just haven't been able to find the UEFI option there.  Does someone know where I can go to enable UEFI on the USB stick itself?  If not, I'll burn through the hours it took me to build and get everything dialed in again.  Just thought I'd ask before I lose half a day rebuilding my server.

     

    Thanks.

     

    As a postscript, I'm super impressed with Unraid; the meshing of the array, VMs, and containers all in one... I shut down two other systems in my home because of this all-in-one option.  Love it!
