highgear

Members
  • Posts

    42
Posts posted by highgear

  1. 15 minutes ago, JorgeB said:

    If you used /dev/sdX instead of /dev/mdX parity would not be updated, so a few sync errors would be normal, suggest running a correcting check.

    Ok running a correcting check now. I've run a number of parity checks since running XFS repair on disk 6 FYI.
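To make the /dev/sdX versus /dev/mdX distinction above concrete, here is a minimal sketch of a guard that only accepts array-style device paths before a repair command is built. The function name `is_md_device` is hypothetical and the pattern is an assumption about how the array devices are named on a given system.

```shell
# Hypothetical helper: succeed only for array devices such as /dev/md6.
# Per the quoted advice, repairs against /dev/sdX do not update parity,
# so anything that is not /dev/md<number>... is rejected.
is_md_device() {
    case "$1" in
        /dev/md[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A wrapper could call it as `is_md_device "$dev" && xfs_repair -v "$dev"`, with the array started in maintenance mode.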

  2. Hello all,

     

    I just had a scheduled non-correcting parity check finish up that returned a fairly large number of errors in the GUI, around 800. I'm not aware of any unclean shutdowns recently. All my disks look ok to me as far as SMART data. I did have some filesystem corruption on a disk a while back that I thought was fixed with XFS repair (disk 6).

     

    What's the best course of action at this point? I'm attaching my diagnostics. I haven't shut down the server since the parity check, FYI.

     

    Thanks in advance for the help. 

    tower-diagnostics-20231205-0924.zip

  3. 13 hours ago, itimpi said:

    That just means that parity agrees at the bit level with what is on the drives, not that there is no corruption at the file system level.   I would recommend running a file system check (and repair if needed) on all drives.


    Ok thanks for explaining. So what command do you recommend using first? Just -n?

     

    Run this on all drives in maintenance mode, even parity? Thanks again.
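For reference, a read-only first pass could look roughly like the sketch below. It only prints the commands rather than running them. `-n` is the xfs_repair no-modify flag; the function name and the disk numbers passed in are placeholders. Parity holds no filesystem, so only data disks are listed.

```shell
# Print (not run) a read-only check command for each given data disk number.
# -n makes xfs_repair report problems without modifying the filesystem.
print_check_cmds() {
    for n in "$@"; do
        echo "xfs_repair -n /dev/md${n}"
    done
}
```

Calling `print_check_cmds 3 6` lists the commands for disks 3 and 6, which can then be run one at a time with the array in maintenance mode.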

  4. 12 hours ago, JorgeB said:

    If you're using ECC RAM unlikely that the problem is there, any unclean shutdowns? Could also be something in the storage subsystem.


     

    Ok I am using ECC so good to know. No unclean shutdowns in at least 3-4 months. I thought as long as parity checks came back ok everything was fine?
     

    What would I look for in the storage subsystem? Thanks

  5. 21 hours ago, JorgeB said:

    Parity cannot help with filesystem corruption, assuming it's in sync it would rebuild the same thing.

    Gotcha that's what I thought. How can I prevent this in the future? This is the second time I've lost a significant amount of data due to filesystem corruption. I recently swapped to all new SATA cables but I suppose the damage could have already occurred.

     

    I use a Supermicro motherboard and a solid Seasonic PSU with server-grade RAM. Should I run memtest?

  6. 23 minutes ago, JorgeB said:

    Try on the CLI, array must still be started in maintenance mode:

     

    xfs_repair -v /dev/md6

     

     

    Thanks! Ok that ran and completed without any errors. Now that disk has 1.2+ TB free, whereas before I'd guess it had under 200 GB free. There's also a bunch of files in a new lost+found folder.

     

    Should I continue using this disk? Is there anything I can do to recover the lost files assuming I don't have a backup of this disk? Anything I should do moving forward to prevent this? 

  7. Hello, I noticed the other day that a number of my media files were inaccessible within Plex. I investigated and found that these files no longer show up at all within the media folder. After looking into it I found that disk6 shows in Midnight Commander in red as "?disk6", and when I click on it I get a message saying "Cannot read directory contents". All the other disks appear to be functioning normally.

     

    Does this mean that disk6 is corrupted? What could have caused this/what can be done to prevent it in the future? If so, am I able to remove the drive and rebuild using parity data on a new or even the same disk to recover my files?

     

    If the disk is corrupted, why would the disk still show as normal/green from within Unraid?

     

    Thanks for the help any and all assistance is appreciated! I've attached my diagnostics I just pulled.

     

    Edit:

     

    I reviewed the logs and I keep seeing these repeated, among other errors:

     

    emhttpd: error: get_fs_sizes, 6081: Input/output error (5): statfs: /mnt/user/downloads

    sys_disk_free: VFS disk_free failed. Error was : Input/output error

    smbd[25122]:   chdir_current_service: vfs_ChDir(/mnt/user/Hyperspin) failed: Input/output error.
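A quick way to see how often these errors show up is to count the matching lines in a log file. This is a minimal sketch; the function name `count_io_errors` is hypothetical and the log path is whatever file you point it at (for example a syslog extracted from the diagnostics zip).

```shell
# Count lines in a log file that mention an I/O error (case-insensitive).
# $1 is the path to the log file.
count_io_errors() {
    grep -ci 'input/output error' "$1"
}
```

For example, `count_io_errors /var/log/syslog`, assuming that is where the live syslog is kept.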

     

    I'm thinking I need to run xfs_repair on disk6 but I want to confirm there's nothing I should do first. If I need to run xfs_repair, I should be using this command correct?

     

    xfs_repair -v /dev/md6

     

    tower-diagnostics-20230425-1300.zip

  8. 5 hours ago, itimpi said:

    The general rule of thumb is something like twice the size of the largest file you normally expect to write to the share.   Another thing to think about is that if for any reason you get file system corruption on the drive it is a good idea to have something like 10-20 GB free for the check/repair process to use. 

    Ok thanks that makes sense.
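The rule of thumb above can be written as a small calculation. This is only a sketch: combining the "twice the largest file" rule with the repair reserve into a single number is my own reading, and the 20 GB default is taken from the upper end of the 10-20 GB range quoted above.

```shell
# Suggest a minimum-free-space value in GB.
# $1 = size in GB of the largest file you expect to write to the share
# $2 = optional reserve in GB for filesystem check/repair (default 20)
min_free_gb() {
    largest_gb=$1
    reserve_gb=${2:-20}
    want=$((2 * largest_gb))
    # Never go below the repair reserve.
    if [ "$want" -lt "$reserve_gb" ]; then
        want=$reserve_gb
    fi
    echo "$want"
}
```

With a largest expected file of 50 GB this suggests 100 GB; with 5 GB files the 20 GB reserve wins.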

     

    4 hours ago, trurl said:

    Maybe this?

    dynamix.cache.dirs.plg - 2023.02.04  (Up to date)
    

     

    Ok cool, I’ll stop that in the future. Chugging along at 160 MB/s now, so looking good. Thanks for the help everyone.

  9. 39 minutes ago, JorgeB said:

    SMART looks good, assuming the emulated disk contents look correct you can rebuild on top.

     

    https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself

     

    Ok sounds good, thanks! Rebuilding onto the drive now. I'm currently only getting speeds of around 50 MB/sec on the rebuild though. Docker is disabled and there shouldn't be any other reads/writes happening that I'm aware of. For parity checks I normally get 100MB+ speeds. Any thoughts?
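To put the speed difference in perspective, a rough estimate of rebuild time can be worked out from disk size and average speed. This is a sketch only: the function name is hypothetical, the 8 TB size in the example is made up (my disk size isn't stated here), and real rebuilds slow down toward the inner tracks.

```shell
# Rough rebuild time in whole hours (rounded down).
# $1 = disk size in TB (decimal, 10^12 bytes)
# $2 = average speed in MB/s (10^6 bytes per second)
rebuild_eta_hours() {
    size_tb=$1
    speed_mb=$2
    # 1 TB = 1,000,000 MB, so seconds = size_tb * 1000000 / speed_mb
    echo $(( size_tb * 1000000 / speed_mb / 3600 ))
}
```

For a hypothetical 8 TB disk, `rebuild_eta_hours 8 50` gives 44 hours, versus 22 hours at 100 MB/s.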

     

     

    10 minutes ago, trurl said:

    Can't tell from these latest diagnostics since the array wasn't started. Previous diagnostics had emulated disk3 mostly full. Some other disks mostly full too.

     

    So after you rebuild, you need to plan for expansion.

     

    Thanks yeah I have plenty of extra space currently but my allocation settings are a bit wonky and always fill my drives up more than I'd like. What's a reasonable amount of minimum free space on each drive you like to see?

  10. 1 minute ago, trurl said:

    Just to make sure we aren't missing anything, why do you not have anything assigned as disks 1, 2?

     

    Sure. I recently shrank my array and removed a few drives. I used the newer process: I ran the user script to clear the drives and then removed them from the array. A parity check completed with 0 errors afterwards. This was a few weeks ago, and those drives are now under Unassigned Devices.

  11. Hello all,

     

    I just discovered that one of my drives is disabled and there are 2 errors on the drive. I suspect it's a cable issue but I'm not 100% sure. Everything is properly seated. I do have some new cables that I can swap in, as well as cold precleared disks if needed. I was considering replacing the cable and rebuilding the drive onto itself. Any thoughts here? I've attached my diagnostics.

     

    Thanks in advance I appreciate it!

     

    tower-diagnostics-20230213-0843.zip

  12. 19 hours ago, xthursdayx said:

    I'll contact Steef about it to make sure he's planning to update the core image. If not, I'll roll my own and update here. I'll report back here once I know though. 

     

    Edit: the necessary libraries are already installed in this image, so no changes are necessary. This has already been tested by someone running the beta update. Please see this GitHub issue for more info. Cheers!


    Thanks for the reply, great news!

  13. @xthursdayx Thanks for the info. I did a little more tinkering just now and got it working using my old files! The steps that worked for me: I deleted the roonserver folder from appdata, then reinstalled from Community Apps using the appropriate directory mapping. This gave me a fresh install, and I verified it was working. Then I stopped the docker, deleted the fresh install's files/folders, and copied over all the files/folders from my backup, as described by you on page 5, into the appropriate locations in the roonserver folder. I then restarted the docker and it worked! Everything appears to be intact and I'm initiating a new backup through Roon.

     

    I think the issue was me running the newperms command last time. Thanks for your work on this docker, looking forward to no update issues in the future!

  14. Hello, I'm trying to move from ronch/roon-server version of this docker to xthursdayx's version discussed here to fix the updating issues. I moved over all of the directories like discussed on page 5. I also ran the newperms command on the base directory of the roonserver app data directory.

     

    After installing the new xthursdayx docker and setting all the directories, I can't get it to run. Here's what the log for the docker says:

     

    /run.sh: line 8: /app/RoonServer/start.sh: Permission denied

     

    Any ideas on how to fix this? I assume I need to change the file/folder permissions somehow, but I'm not sure what to change. Before running the newperms command on the old base directory, I noticed that the top directory originally had owner:group set to root:root. The application directory was set to nobody:users; Appliance/RoonMono/Server/VERSION/check.sh/start.sh were set to guest:Unknowngroup; library was set to nobody:users; and RAATServer, RoonGoer, and RoonServer were set to root:root. I'm not sure what to make of all the different file/directory permissions in my original setup. Thanks in advance for the help.
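One possible reading of that "Permission denied" line is that start.sh lost its execute bit. That is an assumption, not something the log confirms. A minimal sketch to test for it, with a hypothetical function name:

```shell
# Succeed if $1 is a regular file that is NOT executable by the current user,
# i.e. a script that would fail with "Permission denied" when run directly.
needs_exec_bit() {
    [ -f "$1" ] && [ ! -x "$1" ]
}
```

If it reports true for the start.sh under the app directory, `chmod +x` on that file would be the thing to try. The host-side path depends on how the container's /app directory is mapped.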

  15. 33 minutes ago, johnnie.black said:

    I just run chkdsk on my desktop and it's fixed

     

    Forgot to mention: now when doing an upgrade I look at the syslog for lines similar to:

    
    xz decompression failed, data probably corrupt

    If this happens I know it won't reboot properly and chkdsk is needed.

    Ok thanks for the info. I may just wait until the next update to try upgrading again.
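The syslog check described above could be scripted along these lines. This is a sketch: the function name is hypothetical, and the match string is simply the message quoted above.

```shell
# Succeed if the given log file contains the decompression error quoted above,
# which would suggest the flash drive needs chkdsk before rebooting.
flash_looks_corrupt() {
    grep -q 'xz decompression failed' "$1"
}
```

For example, `flash_looks_corrupt /var/log/syslog && echo "run chkdsk on the flash drive before rebooting"`, assuming that is the syslog location.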

  16. 3 minutes ago, johnnie.black said:

    I have some suspicions that, a while back, something changed with the flash drive that makes it corrupt more often. I never used to have issues upgrading, but with the last few releases it sometimes happens (with multiple servers) that after an upgrade (either using the GUI or by manually copying the files) I need to run chkdsk or it will panic on boot; other times I get a lot of corrupted lines scrolling on the server. Like I mentioned, it's only a suspicion.

    So if this is what is occurring here for me, what would the suggested fix be? I see some have had luck simply rebooting the server again which I actually didn't try (only because I had to physically hard power it down). For reference, this actually happened twice. I upgraded from 6.8 to 6.8.1, rebooted, it hung and was inaccessible. Then I reverted to 6.8. I then attempted this process again with the same result and had to pull the USB and revert a second time.