Warrentheo

Posts posted by Warrentheo

  1. From his lsscsi.txt file:

    [0:0:0:0]    disk    JetFlash Transcend 16GB   1100  /dev/sda   /dev/sg0 
      state=running queue_depth=1 scsi_level=7 type=0 device_blocked=0 timeout=30
      dir: /sys/bus/scsi/devices/0:0:0:0  [/sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:08.0/0000:05:00.3/usb4/4-4/4-4:1.0/host0/target0:0:0/0:0:0:0]
    [6:0:0:0]    disk    ATA      WDC WDS500G2B0A  90WD  /dev/sdb   /dev/sg1 
      state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
      dir: /sys/bus/scsi/devices/6:0:0:0  [/sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:0a.0/0000:07:00.0/ata6/host6/target6:0:0/6:0:0:0]
    [13:0:0:0]   disk    ATA      SanDisk SSD G5 B 00WD  /dev/sdc   /dev/sg2 
      state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
      dir: /sys/bus/scsi/devices/13:0:0:0  [/sys/devices/pci0000:00/0000:00:08.3/0000:0c:00.0/ata13/host13/target13:0:0/13:0:0:0]
    NVMe module may not be loaded

    and his df.txt

    Filesystem      Size  Used Avail Use% Mounted on
    rootfs           16G  610M   15G   4% /
    tmpfs            32M  652K   32M   2% /run
    devtmpfs         16G     0   16G   0% /dev
    tmpfs            16G     0   16G   0% /dev/shm
    cgroup_root     8.0M     0  8.0M   0% /sys/fs/cgroup
    tmpfs           128M  244K  128M   1% /var/log
    /dev/sda1        15G  215M   15G   2% /boot
    /dev/loop0      9.2M  9.2M     0 100% /lib/modules
    /dev/loop1      7.3M  7.3M     0 100% /lib/firmware

    so it doesn't look like they are detected...  Just the SATA SSDs...
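    Since lsscsi only lists SCSI/SATA devices, a quick cross-check of everything the kernel actually sees would help here...  This is just a generic sketch, not taken from his diagnostics:

    # List every block device the kernel has registered, with size, type, and model
    lsblk -o NAME,SIZE,TYPE,MODEL

    # Check whether any NVMe device nodes were created at all
    ls -l /dev/nvme* 2>/dev/null || echo "no NVMe device nodes present"

    # Confirm the NVMe driver is actually loaded
    lsmod | grep -i nvme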

  2. Ah sorry, I knew it defaulted to 2 parity drives, but I thought you could select more from a drop-down underneath the 2 like you can with the data drives...  I haven't been able to shut my array down to double check...  Maybe you would want to replace and rebuild one drive at a time then, so that you are not left without parity protection...  Just in case...

  3. I think you can just shut down the pool and add as many parity drives as you wish, if I remember correctly...  Once the new parity drives are up and running, you can use the "New Config" tool in the Tools pane to take drives out of the pool, and just check the "Parity is already valid" option to turn the pool back on...  Afterward, you can use the old drives as you see fit...

  4. Update, 10 days later... It appears that the data for every drive in the pool somehow got very corrupted... I am very miffed about this (around 11 TB possibly lost), but I want to be clear ahead of time, I am NOT expressing anger at UnRaid or Lime Technology, since I am sure about 95% of this is my fault somehow... My main goals here are to figure out how this happened in the first place and, if possible, find a way to keep it from happening to someone else... I realize there are several steps where I got punch-happy and did things I knew even at the time I should not be doing... That said, total array corruption is a pretty major error even with it probably being mostly my fault, and I want to prevent that from happening to someone else...

     

    I was trying to reset the cache pool back to empty when I got stuck trying to reboot... The data corruption occurred after I forced a hard reboot when the soft reboot appeared to hang for multiple minutes... Is some part of the array/pool file table stored on the cache drive? Did wiping the cache pool somehow erase the file table for the pool? I upgraded the SSDs in my cache pool around August of 2019, and that wipe and replacement of the pool went without a hitch...

     

    After the corruption, my array went from around 28 shares and around 11 TB of data down to only 2 shares and about 40 GB of data... This weird partial corruption is very difficult for me to figure out: every drive appears to have these same 2 shares left, the remaining data is not spread evenly across the 4 drives, and yet the corruption was not total, since somehow 5% of the data remained...?

     

    The XFS recovery software I am using says that nearly the entire drive ended up in the "Lost+Found" folder, with some of it apparently recoverable...  I think I will be able to recover the large media files that way, mostly the BluRay Backup Dumps...  Figuring out the small files is going to be very tedious since a decent part of the array was a Steam Games Folder, so a huge portion of the small files are going to be meaningless game files...
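    For sorting through a lost+found full of anonymous recovered files, something like this rough sketch can help separate the large media files from the game-file noise...  The /mnt/disk1/lost+found path below is just an assumption for illustration, not my actual recovery target:

    # List anything over 1 GiB (likely the BluRay dumps), largest first
    find /mnt/disk1/lost+found -type f -size +1G -exec ls -lhS {} +

    # Guess file types for the smaller leftovers so obvious game data can be skipped
    find /mnt/disk1/lost+found -type f -size -100M -exec file {} + | less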

     

    Is this somehow caused by hardware failure of the CPU/RAM/Mobo? I suppose I need to shut down and do a full long-form hardware scan just to make sure...

     

    I realize now that forcing the hard shutdown when the kernel clearly didn't want to was a mistake, but I would expect minor bit rot from that sort of thing, not near-total corruption...

     

    Currently I am still trying to do a post mortem of the issue. I have been running data recovery tools, but they take something like 2 days to run on a 6 TB drive, so that has been slow... So far the best solution I have found is manual data recovery, where I "UnDelete" the entire drive, then manually rename a few hundred thousand files from memory and pray that the files are whole...

     

    Any feedback or assistance on this process, or even a pat on the back and commiseration would be appreciated...

  5. 4 hours ago, itimpi said:

    Have you tried browsing the disks from the Unraid GUI (via the folder icon on the right of every drive) to see if the files/folders you expect are actually there?

     

    The GUI, SSH, and even direct local access all show the same info; the data isn't in the root of the drives anymore...

     

    4 hours ago, trurl said:

    All the files are under /mnt/disk1, /mnt/disk2, ... , /mnt/cache, and also under /mnt/user because

    If the files still exist, then they are in the user shares somewhere, maybe you moved them

    No files were moved in the main pool (I just emptied the last few files off of the cache)...

    Files were there before the hard reboot, and were all gone afterward...

     

    Thank you for the replies, I am out of school, and will get home and try some of these suggestions and see where we are at...

     

  6. The cache was used primarily as a cache and for VM virtual disks, with some other minor knick-knack files thrown in...  98% of the data on the server was in the main pool stored on magnetic media; only 1 share was set to "Cache Only"...  I disabled the VMs, disabled Docker, deleted their associated image files, and had moved all other files off of the cache...  It was reporting empty when I stopped the pool, removed the cache drives with the "New Config" option (and only the cache drives were dropped), and performed a blkdiscard on the cache drives...  I then attempted the reboot, which locked up the computer...
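    For reference, the wipe step was nothing exotic, just the standard blkdiscard run against each cache SSD after it had been dropped from the pool...  The device names below are placeholders, not the exact ones I used:

    # Discard all blocks on each former cache SSD (placeholder device names)
    blkdiscard /dev/sdX
    blkdiscard /dev/sdY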

     

    Those shares were there before this reboot, and after the hard reboot it appears every drive was somehow affected and the files were removed...  The drives still report the correct amount of data stored on them, so I don't think it is gone, just no longer accessible through the normal shares...

     

    Is there a place where the files are stored that I can use to try and rebuild the pool, even if I have to do it manually...?

  7. 3 hours ago, johnnie.black said:

    I don't see anything out of the ordinary on the logs, are the shares not available locally and on the LAN or just on the LAN?

    I have checked the individual /mnt/disk?/ locations; they also show the same 2 shares that appear under /mnt/user/...

    I am trying to find a way to restore the original files if possible...  or possibly find where the files ended up, re-set up the pool, and copy things back manually...
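    For anyone checking the same thing, this is the sort of quick per-disk listing I mean...  A generic sketch, not output from my diagnostics:

    # Show the top-level share folders on each array disk and in the user share view
    for d in /mnt/disk[0-9]* /mnt/user; do
        echo "== $d =="
        ls "$d"
    done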

  8. 1 hour ago, trurl said:

    Looks like your cache pool has been formatted.

    That occurred just before the issue and was intentional, but I don't think it is related to the actual issue...  I had corrupted the cache pool about a week ago and repaired the issue; however, I ended up deciding the repair was less than ideal and decided to format and restore from backup the data I had on it...  

     

    The format worked as intended and I was getting ready to perform the restore...  I issued a reboot command, and the system locked up for almost 15 minutes (the longest previous reboot was around 5 minutes)...  I was unable to shut down with any method, so I was forced to hard shutdown the system, and the pool appeared to be corrupted after this reboot...  (The empty cache pool appears to be working correctly)

  9. Hello, I need some help. I was forced to hard reboot my server (I run a Windows gaming system on top of it as well), and when the system returned, only 2 of the 25 or so file shares were still present...

     

    I have booted the array into maintenance mode and done an xfs_repair on each drive in the array, but this didn't seem to help (the sort of commands I mean are sketched below)...  I know a bit about Linux, but I also know that I should ask for help before proceeding...  The drives appear to still have the correct amount of space consumed on them, so I think the data is still there; I just can't seem to find a way to safely access it...
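    To be clear about the repair step, this is the usual pattern of running xfs_repair against the parity-protected md devices while the array is in maintenance mode...  A generic sketch only, the disk numbers are placeholders:

    # Dry-run check first (-n makes no modifications) against the md device for disk 1
    xfs_repair -n /dev/md1

    # If the dry run looks sane, run the actual repair
    xfs_repair /dev/md1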

     

    Your assistance is greatly appreciated here...

    qw-diagnostics-20200712-1912.zip

  10. I just got notified that the plugin is not known to the community apps plugin or the fix common problems plugin, and when I search for it now, it doesn't show up in the list of plugins that can be downloaded...  When I search for "NVIDIA" in the community apps plugin, it no longer shows up...  Rebooting the server and uninstalling the GPU statistics plugin changed nothing...

     

    Did this plugin just go unsupported?  Or is there a glitch in the community apps plugin?  If it is now unsupported, I will be very sad to see it go... 😢

     

    Still, thank you to the author no matter which is the case...

  11. 4 minutes ago, johnnie.black said:

    When adding an extra device to a single-device pool it will always convert to raid1; you can then convert to raid0, and after that it will stay raid0 even if you add more devices.

    It is a 2 TB pool, 80% full, and was originally raid0; the only issue it had was being booted once with one drive missing...  I am trying to see if it is possible to start it again in raid0, or is the array already corrupted?

  12. Just now, johnnie.black said:

    Do you mean raid1? The pool was converted to the single profile because it was started with a single device; you can now add back the original device and it will be converted to raid1, but it is best to wipe the old device before adding it.

    That is correct; the question is whether it is possible to skip the conversion to the normal raid1 mode and leave it in raid0 mode...

  13. Hello, I recently moved locations, and during my move one of the cache drives that I have in a raid0 btrfs pool came disconnected... The array got started with only one of the 2 drives... After troubleshooting, I discovered the issue and stopped the array; on the main screen the drive had become "unassigned"... When I re-add the drive, UnRaid now warns me that the pool will be formatted when the array is started (I think it is trying to come back up in normal mirror raid mode, and then will make me switch it back to a raid0 pool)

     

    Is there a way to just re-add the drive in the original raid0, or am I forced to format the drives?

    qw-diagnostics-20200310-0910.zip

  14. On 11/26/2019 at 11:59 AM, mbezzo said:

    Hey, I've got that board (no wifi version) with a placeholder (3200G) CPU until I can track down a 3950x - only have Windows on it right now, but if you can tell me how to list the groups from Windows - happy to post em later!

    I don't think Windows even thinks in terms of IOMMU groups; most likely you will need to temporarily boot off of a Linux live USB/DVD image...  The commands are:

    # Walk every device the kernel has placed in an IOMMU group
    for d in /sys/kernel/iommu_groups/*/devices/*; do
        # Pull the group number out of the sysfs path
        n=${d#*/iommu_groups/*}; n=${n%%/*}
        printf '    IOMMU Group %s ' "$n"
        # Print the PCI address, IDs, and description of the device
        lspci -nns "${d##*/}"
    done

    You have the setup I am looking into getting as well, so that info would be useful 😄

  15. I am about to swap out my M.2 raid0 cache pool and upgrade the drives in prep for swapping to X570/PCIe 4.0...  To that end, I need to completely remove all use of the cache from all shares and VMs, migrate everything to the main spinning array so I can nuke the current pool so it can be sold, then set up the new pool like a new setup...

     

    I want to make the move without any data mishaps, and was hoping to get some feedback on the steps needed to make sure that the move goes smoothly...  Any tips would be appreciated...  I am sorry if this is repeated somewhere else; I have been unable to find anything else related to this out there...
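    In case it helps frame the question, the manual part of the copy I am picturing is basically along these lines...  The share name and disk below are only an example, not my actual layout:

    # Example only: copy one cache share to an array disk, preserving attributes,
    # and verify it before deleting the cache copy
    rsync -avh --progress /mnt/cache/domains/ /mnt/disk1/domains/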

  16. Just installed 6.7.0-rc7...  It has a blank button right next to the power button on the webui...  When pressed, this adds a "flyout" effect with nothing in it...  Removing the System Buttons plugin removes the button...

     

    [Screenshot attached: image.thumb.png.981e9a3b3c4788ef07023d71660bced1.png]

  17. You have both systems set up to consume nearly 100% of the available physical RAM; unfortunately, lock-ups are the expected behavior when you do that...  It is not actually locking up, it is trying to use your swap file like RAM, because the fact that it is a swap file is not told to the VMs...  This can cause all kinds of processes to have a timeout cascade...  If you really can't upgrade the RAM on your system, then set the VMs to use much less physical RAM and set up swap files inside the individual VMs (see the sketch below)...  However, there is no real substitute for just having the free RAM to begin with...
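    A minimal sketch of what that looks like inside a Linux guest (the 4G size and /swapfile path are just examples; a Windows guest would use its own page file settings instead):

    # Create a 4 GiB swap file inside the guest, lock down permissions, and enable it
    fallocate -l 4G /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile

    # Make it persistent across guest reboots
    echo '/swapfile none swap sw 0 0' >> /etc/fstab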

  18. Everything on Linux is a file; it's very different from Windows that way...  If you don't want to point it at:

    /mnt/cache/domains/Windows/vdisk1.img

    then point it at something like this instead:

    /dev/disk/by-id/ata-Hitachi_HDS5C3030ALA630_MJ1321YNG06KNA

    This points it to the devices folder and selects a block device, and, importantly, it selects it "by-id" so that if you ever change your partitions or drives it doesn't break where the VM is pointing...  There are several other selectors as well; check out what you have in your /dev/disk/ folder on the UnRaid host...
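    For example, to see the stable names that are available, just list that folder and pick the entry matching the disk you want to pass through:

    # Persistent, hardware-based device names (survive /dev/sdX reshuffling)
    ls -l /dev/disk/by-id/

    # The other selector folders that exist on most systems (by-label, by-uuid, by-path, ...)
    ls /dev/disk/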

     

    Be careful: Linux will happily let you specify the wrong one and overwrite an important data drive, so double-check that you have the correct one...

  19. Here is a review I found on the Amazon page for your motherboard:

    https://www.amazon.com/Supermicro-X9DAI-LGA2011-USB3-0-Motherboard/dp/B007HVZBWM#customerReviews

    Bottom line, this appears to be a motherboard issue...

     

    Quote

    Good:
    Outstanding features, PCIe 3.0, a good number of slots, Very stable if you don't mess with BIOS to try and get Windows 8/Server 2012 running.

    Facts:
    EATX form factor = large size, while I have dealt with large size before, this just seems bulky, and the need to shoe-horn the MB into a case. Also only 1/2 of the EATX mounting holes lined up, so I had to use the good old, white motherboard standoffs from before the industry started to standardize.

    BAD BAD BAD:
    The BIOS is the downfall of this motherboard: cannot install Windows 8/Server 2012, even with all the options turned off in BIOS, and no BIOS update is out yet; however, Windows 2008 R2 and Windows 7 install just fine, with all options enabled in BIOS.

    PCIe 3.0 doesn't work; if you have a PCIe 3.0 video card (Nvidia) it will not boot, and if it does, like my Windows 7 install did, it somehow uninstalled the video driver, and I was unable to get video back in Windows 7 (had to reinstall). Forcing BIOS to PCIe 2.0, everything works just fine; maybe it will work when LGA2011 Ivy Bridge comes out, but Sandy Bridge-E, forget about it.

    Only discovered the BIOS problems because I was trying to get Windows 2012 to load:
    BIOS seems to be a bit buggy; if something is not set just right it can hang in BIOS, reboot after reboot until it boots. Also I mistakenly changed a memory setting, and now the darn thing just hangs at memory detection, regardless of what type of memory is installed; with no memory it beeps, just like it should, but with memory installed it gets stuck. Been working with tech support, swap this swap that, it should work, well it doesn't, well it's your power supply then, no it's not (next topic), (yes I tried CMOS Clear), sending it back for RMA.

    Yes, tech changes every day, but BIOS is 20+ years of experience; wow, I mean I can't remember having a real BIOS issue in the last 10 years that clear CMOS didn't fix.

    Power Supply:
    This motherboard is very, very picky about power; it would not boot with most of the power supplies that I had on hand: PC Power and Cooling 1000w, Antec 800w, Seasonic 750w.

    Strangely enough, a Vantec 650w worked, what the heck? It's so old it still has the old-style flat connectors for pre-ATX motherboards. But this one worked just fine, well, until I was stupid and spun up my Nvidia 660 GPU playing a game; after about 3 hours it said no more. Replaced the blown fuse, nope, still dead.

    Bought a brand new Corsair 850w, thinking the others were only EPS 2.3 and not 2.31 as the motherboard manual said it needed to be, and also wanted 2 native 8-pin EPS connectors. Also, as the 650w seemed to be almost enough, surely 850w would be enough; it worked for about a week and died. Next bought a Corsair AX1200w, which seems to be big enough to handle it.

    There is a 3.3 version of your BIOS (you currently have 3.2); this may or may not help...

    https://www.supermicro.com/products/motherboard/Xeon/C600/X9DAi.cfm (Then click update BIOS)