
Warrentheo


Posts posted by Warrentheo

  1. From his lsscsi.txt file:

    [0:0:0:0]    disk    JetFlash Transcend 16GB   1100  /dev/sda   /dev/sg0 
      state=running queue_depth=1 scsi_level=7 type=0 device_blocked=0 timeout=30
      dir: /sys/bus/scsi/devices/0:0:0:0  [/sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:08.0/0000:05:00.3/usb4/4-4/4-4:1.0/host0/target0:0:0/0:0:0:0]
    [6:0:0:0]    disk    ATA      WDC WDS500G2B0A  90WD  /dev/sdb   /dev/sg1 
      state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
      dir: /sys/bus/scsi/devices/6:0:0:0  [/sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:0a.0/0000:07:00.0/ata6/host6/target6:0:0/6:0:0:0]
    [13:0:0:0]   disk    ATA      SanDisk SSD G5 B 00WD  /dev/sdc   /dev/sg2 
      state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
      dir: /sys/bus/scsi/devices/13:0:0:0  [/sys/devices/pci0000:00/0000:00:08.3/0000:0c:00.0/ata13/host13/target13:0:0/13:0:0:0]
    NVMe module may not be loaded

and his df.txt:

    Filesystem      Size  Used Avail Use% Mounted on
    rootfs           16G  610M   15G   4% /
    tmpfs            32M  652K   32M   2% /run
    devtmpfs         16G     0   16G   0% /dev
    tmpfs            16G     0   16G   0% /dev/shm
    cgroup_root     8.0M     0  8.0M   0% /sys/fs/cgroup
    tmpfs           128M  244K  128M   1% /var/log
    /dev/sda1        15G  215M   15G   2% /boot
    /dev/loop0      9.2M  9.2M     0 100% /lib/modules
    /dev/loop1      7.3M  7.3M     0 100% /lib/firmware

    so it doesn't look like they are detected...  just the SSDs...
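
    A quick way to double-check from the console whether anything NVMe is visible at all, since lsscsi only covers the SCSI side (just a sketch; lsblk and the /dev/nvme* device nodes are standard):

    # list all block devices with their transport; NVMe drives should
    # show up with TRAN=nvme if the kernel actually sees them
    lsblk -d -o NAME,SIZE,MODEL,TRAN
    # NVMe devices also get their own nodes outside the SCSI layer,
    # so this should list something if the module loaded
    ls /dev/nvme*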

  2. Ah sorry, I knew it defaulted to 2 parity drives, but I thought you could select more from a drop-down underneath the 2, like you can with the data drives...  I haven't been able to shut my array down to double-check...  Maybe you would want to replace and rebuild one drive at a time then, that way you are not without parity protection...  Just in case...

  3. I think you can just shut down the pool and add as many parity drives as you wish, if I remember correctly...  Once the new parity drives are up and running, you can use the "New Config" tool in the Tools pane to take drives out of the pool, and just check the "Parity drive is already up to date" box to turn the pool back on...  Afterward, you can use the old drives as you see fit...

  4. Update, 10 days later... It appears that the data for every drive in the pool somehow got very corrupted... I am very miffed about this (around 11 TB possibly lost), but I want to be clear ahead of time: I am NOT expressing anger at UnRaid or Lime Technology, since I am sure about 95% of this is my fault somehow... My main goals here are to figure out how this happened in the first place and, if possible, find a way to keep it from happening to someone else... I realize there are several steps where I got punch-happy and did things I knew even at the time I should not be doing... That said, total array corruption is a pretty major failure even if it is probably mostly my fault, and I want to prevent it from happening to someone else...

     

    I was trying to reset the cache pool back to empty when I got stuck trying to reboot... The data corruption occurred after I forced a hard reboot, once the soft reboot had appeared to hang for multiple minutes... Is some part of the array/pool file table stored on the cache drive? Did wiping the cache pool somehow erase the file table for the pool? I upgraded the SSDs in my cache pool around August of 2019, and that wipe and replacement of the pool went without a hitch...

     

    After the corruption, my array went from around 28 shares and around 11 TB of data to only 2 shares and about 40 GB of data... This weird partial corruption is very difficult for me to figure out: every drive appears to have the same 2 shares left, the remaining data is not spread evenly across the 4 drives, and yet the corruption was not total, since somehow 5% of the data remained...?

     

    The XFS recovery software I am using says that nearly the entire drive ended up in the lost+found folder, with some of it apparently recoverable...  I think I will be able to recover the large media files that way, mostly the Blu-ray backup dumps...  Figuring out the small files is going to be very tedious, since a decent part of the array was a Steam games folder, so a huge portion of the small files are going to be meaningless game files...

     

    Is this somehow caused by a hardware failure of the CPU/RAM/mobo? I suppose I need to shut down and do a full long-form hardware scan just to make sure...

     

    I realize now that forcing the hard shutdown when the kernel clearly didn't want to was a mistake, but I would expect minor bit rot from that sort of thing, not near-total corruption...

     

    Currently I am still trying to do a post-mortem of the issue... I have been running data recovery tools, but they take about 2 days per 6 TB drive, so that has been slow... So far the best solution I have found is manual data recovery, where I "UnDelete" the entire drive, then manually rename a few hundred thousand files from memory and pray that the files are whole...
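
    The triage I have in mind looks something like this (just a sketch; the lost+found location and the sorted output folder are hypothetical paths on my setup):

    # group recovered lost+found files by detected MIME type, so the
    # large media files can be separated from the meaningless game
    # files; symlinks avoid copying terabytes around
    mkdir -p /mnt/recovery/sorted
    find /mnt/disk1/lost+found -type f | while read -r f; do
        t=$(file -b --mime-type "$f" | tr '/' '_')   # e.g. video_x-matroska
        mkdir -p "/mnt/recovery/sorted/$t"
        ln -s "$f" "/mnt/recovery/sorted/$t/$(basename "$f")"
    done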

     

    Any feedback or assistance on this process, or even a pat on the back and commiseration would be appreciated...

  5. 4 hours ago, itimpi said:

    Have you tried browsing the disks from the Unraid GUI (via the folder icon on the right of every drive) to see if the files/folders you expect are actually there?

     

    Both the GUI, SSH, and even direct local access all show the same info: the data isn't in the root of the drives anymore...

     

    4 hours ago, trurl said:

    All the files are under /mnt/disk1, /mnt/disk2, ... , /mnt/cache, and also under /mnt/user because

    If the files still exist, then they are in the user shares somewhere, maybe you moved them

    No files were moved in the main pool (I just emptied the last few files off the cache on it)...

    Files were there before the hard reboot, and were all gone afterward...

     

    Thank you for the replies; I am out of school, and will get home, try some of these suggestions, and see where we are at...

     

  6. The cache was used primarily as a cache and for VM virtual disks, with some other minor knick-knack files thrown in...  98% of the data on the server was in the main pool stored on magnetic media; only 1 share was set to "Cache Only"...  I disabled VMs, disabled Docker, deleted their associated image files, and moved all other files off of the cache...  It was reporting empty when I stopped the pool, removed the cache drives with the "New Config" option (only the cache drives were dropped), and performed a blkdiscard on the cache drives... I then attempted the reboot, which locked up the computer...
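
    For reference, the wipe step was nothing more exotic than this (a sketch; sdX/sdY stand in for the actual cache device letters):

    # discard every block on the old cache SSDs; this is a full-device
    # TRIM, so it is effectively a wipe and is irreversible
    blkdiscard /dev/sdX
    blkdiscard /dev/sdY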

     

    Those shares were there before this reboot, and after the hard reboot it appears every drive was somehow affected and the files removed...  The drives still report the correct amount of data stored on them, so I don't think it is gone, just no longer accessible through the normal shares...

     

    Is there a place where the files are stored that I can use to try and rebuild the pool, even if I have to do it manually...?

  7. 3 hours ago, johnnie.black said:

    I don't see anything out of the ordinary on the logs, are the shares not available locally and on the LAN or just on the LAN?

    I have checked the individual /mnt/disk?/ locations; they also show only the same 2 shares that appear under /mnt/user/...
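
    For reference, this is roughly how I checked from the console (a sketch):

    # compare what each array disk actually contains at its top level
    # with the space it still claims to be using
    for d in /mnt/disk*; do
        echo "== $d =="
        ls -la "$d"    # should show one folder per share on that disk
        df -h "$d"     # used space still matches the pre-reboot numbers
    done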

    I am trying to find a way to restore the original files if possible...  or, failing that, to find where the files got lost to, re-set up the pool, and copy things back manually...

  8. 1 hour ago, trurl said:

    Looks like your cache pool has been formatted.

    That occurred just before the issue and was intentional, but I don't think it is related to the actual issue...  I had corrupted the cache pool about a week ago and repaired it, however I ended up deciding the repair was less than ideal, and decided to format and restore from backup the data I had on it...

     

    The format worked as intended, and I was getting ready to perform the restore... I issued a reboot command, and the system locked up for almost 15 minutes (the longest previous reboot was around 5 minutes)...  I was unable to shut down by any method, so I was forced to hard shutdown the system, and the pool appeared to be corrupted after this reboot...  (The empty cache pool appears to be working correctly.)

  9. Hello, I need some help... I was forced to hard reboot my server (I run a Windows gaming system on top of it as well), and when the system returned, only 2 of the 25 or so file shares were still present...

     

    I have booted the array into maintenance mode and run an xfs_repair on each drive in the array, but this didn't seem to help...  I know a bit about Linux, but I also know that I should ask for help before proceeding...  The drives still appear to have the correct amount of space consumed, so I think the data is still there; I just can't seem to find a way to safely access it...
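
    Roughly what I ran, for reference (a sketch, assuming the usual /dev/mdX devices Unraid exposes in maintenance mode; adjust the disk numbers to your array):

    # dry run first: -n only reports problems without touching the disk
    for i in 1 2 3 4; do
        xfs_repair -n /dev/md$i
    done
    # after reviewing the output, repeat without -n to actually repair,
    # e.g. xfs_repair /dev/md1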

     

    Your assistance is greatly appreciated here...

    qw-diagnostics-20200712-1912.zip

  10. I just got notified that the plugin is not known to the Community Apps plugin or the Fix Common Problems plugin, and when I search for it now, it doesn't show up in the list of plugins that can be downloaded...  When I search for "NVIDIA" in the Community Apps plugin, it no longer shows up...  Rebooting the server and uninstalling the GPU Statistics plugin changed nothing...

     

    Did this plugin just go unsupported, or is there a glitch in the Community Apps plugin?  If it is now unsupported, I will be very sad to see it go... 😢

     

    Still, thank you to the author, no matter which is the case...

  11. 4 minutes ago, johnnie.black said:

    When adding an extra device to a single device pool it will always convert to raid1, you can then convert to raid0, after that it will stay raid0 even if you add more devices.

    It is a 2 TB pool, 80% full, and was originally in raid0; the only issue it had was being booted once with one drive missing...  I am trying to see if it is possible to start it again in raid0, or is the array already corrupted?

  12. Just now, johnnie.black said:

    Do you mean raid1? Pool was converted to single profile because it was started with a single device, you can now add back the original device and it will be converted to raid1, but best to wipe old device before adding.

    That is correct; the question is whether it is possible to skip the conversion to the normal raid1 mode and leave it in raid0 mode...
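
    If that is possible, my understanding of the convert-back step johnnie.black described above would be something like this (a sketch; /mnt/cache is the usual pool mount point, adjust if yours differs):

    # check which profiles the pool is currently using
    btrfs filesystem df /mnt/cache
    # rebalance the data chunks back into the raid0 profile
    btrfs balance start -dconvert=raid0 /mnt/cache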

  13. Hello, I recently moved locations, and during my move one of the cache drives that I have in a raid0 btrfs pool came disconnected... The array got started with only one of the 2 drives... After troubleshooting, I discovered the issue and stopped the array; on the main screen the drive had become "unassigned"... When I re-add the drive, UnRaid now warns me that the pool will be formatted when the array is started (I think it is trying to come back up in the normal mirror raid mode, and will then make me switch it back to a raid0 array)...

     

    Is there a way to just re-add the drive in the original raid0, or am I forced to format the drives?

    qw-diagnostics-20200310-0910.zip

  14. On 11/26/2019 at 11:59 AM, mbezzo said:

    Hey, I've got that board (no wifi version) with a placeholder (3200G) CPU until I can track down a 3950x - only have Windows on it right now, but if you can tell me how to list the groups from Windows - happy to post em later!

    I don't think Windows even thinks in terms of IOMMU groups; most likely you will need to temporarily boot off of a Linux live USB/DVD image...  The commands are:

    # Print each IOMMU group number followed by the lspci description
    # of every device in that group.
    for d in /sys/kernel/iommu_groups/*/devices/*; do
        n=${d#*/iommu_groups/*}; n=${n%%/*}   # extract the group number from the path
        printf 'IOMMU Group %s ' "$n"
        lspci -nns "${d##*/}"                 # "${d##*/}" is the device's PCI address
    done

    You have the setup I am looking into getting as well, so that info would be useful 😄

  15. I am about to swap out my M.2 Raid-0 cache and upgrade the drives, in prep for swapping to X570/PCIe 4.0...  To that end, I need to completely remove all use of the cache from all shares and VMs, migrate everything to the main spinning array so I can nuke the current cache pool so the drives can be sold, and then set up the new pool like a fresh setup...

     

    I want to make the move without any data mishaps, and was hoping to get some feedback on the steps needed to make sure it goes smoothly...  Any tips would be appreciated...  I am sorry if this is answered somewhere else; I have been unable to find anything related to it out there...

  16. Just installed 6.7.0-rc7...  There is a blank button right next to the power button in the webUI...  When pressed, it opens a "flyout" effect with nothing in it...  Removing the System Buttons plugin removes the button...

     

    [Screenshot: blank button and empty flyout next to the power button in the webUI]

  17. 10 hours ago, bastl said:

    @InfInIty Just a hint from my side. For me the TechPowerUp BIOSes never worked for 3 different Nvidia cards (780ti, 1050ti, 1080ti). I always had to dump the BIOS myself. Can't find the thread but a couple days ago a user had the same issue and he dumped his own BIOS and guess what happened? Issue solved 😄

    To my knowledge, only the GTX 10 series and above need a BIOS file passed through...  It was not until the 10 series that they started encrypting the BIOS... That is why you can get a program like:

    https://www.techpowerup.com/download/maxwell-ii-bios-tweaker/

    or

    https://www.techpowerup.com/download/kepler-bios-tweaker/

    for the GTX 9 series and below, but you will never see one for the GTX 10 series and above...

     

    This encryption is what causes the issues with passthrough, and why BIOS files need to be passed through...

  18. 17 minutes ago, InfInIty said:

    So I watched his second video on this.

    I downloaded the bios from tech power up, removed the "header" with a hex editor

    I can try it with the full file and see if that improves things.

     

    Should I have Hyper-v off or on, i am getting conflicting information on that one.

    In the past, Hyper-V was one of the things that triggered the NVidia driver to notice it was in a VM, and so it had to be turned off...  That said, I have had mine on for about a year with no issues on my GTX 1070...  The RTX cards may be different, but I doubt it... For troubleshooting, I would start with it off for now, and try turning it on once everything else is working...

  19. 14 hours ago, InfInIty said:

    So I have added all 4 Devices into the syslinux file

    Downloaded the bios and removed the nvidia header per the SpaceInvader video

    Still same issue.  Code 43

    If you have all the devices correctly passed through, then this most likely means that the BIOS file is not correct...

    Starting with the GTX 10 series and continuing with the RTX series, NVidia began encrypting their BIOS and the connection between their drivers and the BIOS...

    I have a GTX 1070 and was able to follow the SpaceInvader video to just dump my own card's BIOS, with no other modification, to get it to work...  The RTX cards may be slightly different, but from other reports I have seen, this looks like the same sort of thing the GTX 10 series people have been dealing with, so I doubt it...
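
    The dump itself is simple enough to do from the host console (a sketch; 01:00.0 is a placeholder for your card's PCI address, and the card must not be in use by a VM while you do this):

    # read the card's option ROM straight out of sysfs
    cd /sys/bus/pci/devices/0000:01:00.0
    echo 1 > rom                     # enable reading the ROM
    cat rom > /tmp/gpu-vbios.rom     # dump it to a file
    echo 0 > rom                     # disable again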

     

    For me, when the BIOS file was not correct but I rebooted the host and auto-launched the Windows VM, it would still work during that boot; however, the PCIe slot could not be reset afterward, so if I "rebooted" the VM, it would no longer talk to the card and threw error 43...  I suspect you are at this stage of the process...

     
