Posts posted by tr3bjockey

  1. Hello, can anyone help with this?

     

I aborted the backup because it was getting huge (I had to exclude some things).  After making the changes, it created a backup.  I then tried to delete the bad backup, which is 14+ GB and has the "-failed" name, and it won't let me delete it.  It has r-x permissions.  How do I delete this?  I'm using binhex-krusader as a file manager.

nowritepermissioninfile.jpg
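
    Would something like this from the Unraid console be the right way to force it, or is there a better option?  (A rough sketch; the path below is just a placeholder for wherever the failed backup actually lives.)

        # add write permission to the read-only tree, then delete it (run as root from the console)
        chmod -R u+w "/mnt/user/Backups/backup-failed"   # placeholder path
        rm -rf "/mnt/user/Backups/backup-failed"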

  2. On 10/30/2023 at 5:36 AM, Frank1940 said:

    There is a setting designed to do this:

[screenshot of the "Shutdown time-out" setting]

     

This setting is one you can determine by timing, in seconds, how long it takes after you manually click the STOP button to take the array offline.  (I consider this setting to be the last thing that can prevent an unclean shutdown.  As I understand it, if the array is still not stopped when this timer expires, Unraid will then force a shutdown, and it will probably be an unclean one!)

     

    There is this section in the Manual:

    https://docs.unraid.net/unraid-os/manual/troubleshooting/#unclean-shutdowns

     

In that documentation, you will find a number of settings to force things to stop depending on which features of Unraid you are using.  Notice the 'warning' about Bash scripts and ssh sessions and how to terminate them!

Thanks for the tip.  My shutdown time-out was set to 90.  I bumped it up to 120.  I timed stopping the array and it took 45 seconds (my cache stats showed 10+ GB).  What's weird is that it had been over an hour since I copied a large file to the array (it gets copied first to the SSD RAID 1 cache drive, then I manually press Move to move it to the mechanical drives).  I assumed that when I stopped the array, it finally dumped the cached data from the move to the mechanical drives.  When I restarted the array, the cache stats showed less than 4 GB.  Any ideas why it holds the data hostage in cache instead of writing it to the array immediately?

     

    I don't have the server on a UPS, so now I'm even more paranoid about corruption in case of power failure.

Please correct me if I'm wrong, but to make sure that there are no corruption issues, every time I copy data to the array I will have to:

1.  after copying, press Move to move it from the SSD cache to the mechanical drives.

    2.  after the move is done, stop all dockers.

    3.  once all dockers are stopped, then stop the array and wait for the array operation section to tell me "stopped. configuration valid"

    4.  If I'm not done for the day using the server, restart the array.

    5.  If I'm done for the day, then go to the power menu and do a shutdown.

     

     

    On 11/1/2023 at 2:17 AM, JorgeB said:

If the previous check was a correcting one, do a non-correcting one now.

Thank you for responding, JorgeB.  I did a non-correcting one and there were no errors.  Is there a command besides "stop array" that I can run to flush the cache without needing to stop the array, to save me some steps as detailed above?
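
    Partly answering my own question after more reading: it looks like the cache is flushed by the mover, which I believe can be invoked manually from the console (or run on a schedule via the mover settings) without stopping the array.  A rough sketch of what I have in mind; please correct me if this is not the right way:

        # kick off the Unraid mover manually to flush cached shares to the array
        /usr/local/sbin/mover

        # watch its progress in the syslog
        tail -f /var/log/syslog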

  3. 7 hours ago, itimpi said:

    Spinning down the disks will not guarantee a clean shutdown.    You would need to stop the array before the shutdown to guarantee this.

    For next time I'll stop the array before the next shutdown.

     

1.  Is there anything in the diagnostics that points to anything being wrong?

2.  Should I perform another parity check? If yes, should I check/uncheck the box?

3.  When do I know it's safe to shut down after I stop the array?

4.  Is there a setting that I can turn on to tell Unraid to flush the cache instantly, without needing to shut down the array to force it?

5.  Is there a version of Unraid, later than my current version, that automatically flushes the cache when a shutdown command is given and does not power off the server until it's done writing?

  4. On 4/2/2023 at 6:26 PM, JonathanM said:

    I'd go a step further, personally I stop the array before doing a shutdown if at all possible. If the array doesn't stop fairly quickly, it's likely that the shutdown process by itself won't complete cleanly.

     

    On 3/29/2023 at 1:23 AM, JorgeB said:

    Was there any unclean shutdown? In any case everything appears to be fine for now, just keep monitoring.

The same issue that happened 7 months ago is happening again.  I've been making sure there are clean shutdowns by clicking the spin-down button first, waiting for all the disks to show that they've spun down, and then clicking the shutdown button.  I still suddenly get 4355 errors on a parity check.  The only thing that has changed since the last parity check on 9/18/23 is that after that parity check, I upgraded from 6.10.3 to 6.12.4.  So this is the first parity check I've done after upgrading the OS.

     

    What do you suggest I do for troubleshooting?

     

FYI, I don't use VMs.  I just use Unraid for Plex.  I don't leave the server up 24x7; I only turn it on to back up my PC, add movies to Plex, and watch movies on Plex.  My disks have power-on hours ranging from 2 to 6 months, but load cycles vary from 611 to 25,000.

parityhistory.jpg

    tower-diagnostics-20231029-1102.zip

  5. On 3/29/2023 at 1:23 AM, JorgeB said:

    Was there any unclean shutdown? In any case everything appears to be fine for now, just keep monitoring.

    No unclean shutdowns.  The only thing that comes to mind is that I did a clean shutdown during a parity check without first cancelling the parity check. 

     

I'm not sure how a clean shutdown proceeds to stop a process like a parity check in progress.  Could the shutdown process time-outs be too short and not allow enough time for the parity check to end cleanly?  Should I cancel the parity check first, then do a clean shutdown?

     

    Again, thank you very much for taking the time to assist me with this issue. 🙂

  6. On 3/27/2023 at 2:01 AM, JorgeB said:

    Run a correcting check, then a non correcting one, all without rebooting, then post new diags.

Thank you very much for your assistance in this.  I appreciate it.  I ran a correcting check last night; it shows 0 errors.  I have not shut down or rebooted.  Now running a non-correcting check that will complete 10 hours from now.  Regardless of the result, I will post another diag as instructed.

     

I'm very puzzled that the correcting check that finished this morning shows 0 now.  I'm guessing this might be what you were expecting to happen, or are you as puzzled as I am pending further analysis of the new diag after completion?

  7. 22 hours ago, BRiT said:

I don't know if those drives are supported by WDidle3 or if they have their own program to adjust their behavior, so do your due diligence to know if that is needed or even supported. But in the past many have had to use that program on WD drives to get their load cycle counts under control.

     

     

Thanks for the tip!  Some people say they're supported, others say not.

I purchased 3 WDC_WD40EZAZ drives a few years ago.  I made one of them my parity drive and the other 2 my data drives.  Two of the drives are showing around 20 thousand load cycles and the 3rd is only showing 414.  The power-on time for the 3 drives is about 9 months total.  The server is only used for Plex for a few hours a week.

     

    My parity drive is showing excessive load cycles (head parking)

    193    Load cycle count    0x0032    194    194    000    Old age    Always    Never    19907

     

    My data drive is showing high levels also.

193    Load cycle count    0x0032    193    193    000    Old age    Always    Never    23004

     

    This data drive is showing normal levels.

193    Load cycle count    0x0032    200    200    000    Old age    Always    Never    414
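
    In case it's useful to anyone checking their own drives, the attribute rows above are what I get from the console with something like this (the device name /dev/sdb is just an example; substitute your own disk):

        # print the SMART attribute table and pull out the load cycle counter
        smartctl -A /dev/sdb | grep -i load_cycle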

  9. For the past month, I'm seeing thousands of errors reported when I click on the parity check button.   

    Some history...

The IO CREST Internal 5 Port Non-RAID SATA III 6GB/s JMB585 SI-PEX40139 is installed, but no cables are plugged into it.  (I stopped using it before 12/1/22 because I was still getting UDMA CRC errors pointing to bad cables, which were all replaced, and I never bothered removing it from the PCIe slot.)

     

I instead use the JEYI NVMe M.2 to 5-port SATA adapter (Internal 5 Port Non-RAID SATA III 6GB/s) plugged into a GLOTRENDS M.2 PCIe X1 adapter.  It was installed before 12/1/22 and it fixed the UDMA CRC errors I was seeing with the IO Crest card above.  I even did a complete (100%) parity check on 12/23/22 and there were 0 errors.  On 1/2/23, a parity check started, but I cancelled it after 4 minutes.  On 1/9/23, I ran another parity check for 10 hours (about 90% done), but I also cancelled it accidentally (due to a shutdown without knowing a parity check was running).  Still 0 errors.  On 2/12/23 I ran a full parity check and got 2252 errors, which I did not notice were showing in the parity check history.  On 3/12/23, I noticed the 2252 errors in the history (this has never happened before in past years) and started a parity check again.  Within 39 minutes, 672 errors were showing and I cancelled it.

     

I only use the server about once or twice a week for a couple of hours, so the SMART power-on total is about 7 months:

9    Power on hours    0x0012    100    100    000    Old age    Always    Never    5159 (7m, 2d, 23h)

     

I recently upgraded Unraid from 6.9.2 to 6.10.3, and the errors might have started happening after that.  If there's some log that has the date of that upgrade, it would be good to know if there's any correlation.  Yes, I know correlation is not causation, lol.
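
    One idea I had (not sure if it's the intended way): the OS files on the flash drive get replaced during an upgrade, and I believe the parity check history is also kept on the flash, so comparing timestamps might show whether the two line up.  Assuming the usual locations:

        # modification dates of the kernel/rootfs on the flash should show when the upgrade was applied
        ls -l /boot/bzimage /boot/bzroot

        # parity check history (date, duration, errors) kept by the webGUI
        cat /boot/config/parity-checks.log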

     

     

    I have attached tower diagnostics.

     

    Questions:

1. Does running further parity checks with "write corrections to parity" enabled cause my currently stored data files to corrupt further, or does this only affect the parity data stored for recovery?  I.e., should I not do this until a cause is found?

     

    2.  Are the errors happening on the actual drives, or is something else failing?

     

3. Should I not be adding any more files to these hard drives?

     

4.  Any suggestions for troubleshooting this?

     

     

     

unraidmain.jpg

     

paritycheckhistorylog.jpg

    tower-diagnostics-20230326-1515.zip

  10. On 2/26/2023 at 4:47 PM, lepa71 said:

     

I'm wondering if this will work in my setup. I have Windows Server 2016 Essentials running on an older Asus P8H67-M EVO motherboard.

     

     

    Do you think using your setup would work better than just the card?

     

Thanks in advance.

I'm only using the card with Unraid.  I did not test this card with any Microsoft operating system.  Maybe ask someone in a Microsoft forum?

  11. On 10/12/2020 at 1:21 PM, Hoopster said:

    With PCIe x1 you can really only get a 2-port SATA card. 

     

    Look for something based on the JMicron JMB582 or ASMedia ASM1061 controller chip.  There are some from Syba, IO Crest, etc.

    I combined this https://www.amazon.com/dp/B09N8MCDTZ?psc=1&ref=ppx_yo2ov_dt_b_product_details

    with this https://www.amazon.com/dp/B09VGTMX7W

     

    I didn't think this would work at all, but so far it's working (past hour). 

     

UPDATE:  This has worked flawlessly and there are no more UDMA CRC error counts.  Going to toss the IO CREST Internal 5 Port Non-RAID SATA III card, which was causing my problems, in the trash.


     

    20221201_110629.jpg

  12. 23 hours ago, Squid said:

    This is my first guess

    /tmp
    total 0
    drwxr-xr-x 3 nobody users  60 Apr 19 12:54 Transcode

    You have (plex I imagine) transcoding to RAM.  No problems with that.  My thought is that in the Plex template you've got /transcode mapped to /tmp and within Plex you're telling it to transcode to /transcode/transcode.  Shouldn't be an issue per se (that I can think of), but I can see tons of very weird things happening if /tmp ever gets deleted by an App.

    According to an article on the unraid forum, it's configured correctly.  Do you think I should change it in some other way?

    "There are two tweaks to be made in order to move your transcoding into RAM.  One is to the Docker Container you are running and the other is a setting from within the Plex web client itself.

     

    Step 1:  Changing your Plex Container Properties

    From within the webGui, click on "Docker" and click on the name of the PlexMediaServer container.  From here, add a new volume mapping:

     

    /transcode to /tmp

     

    Click "Apply" and the container will be started with the new mapping.

     

    Step 2:  Changing the Plex Media Server to use the new transcode directory

Connect to the Plex web interface from a browser (e.g. http://tower:32400/web).  From there, click the wrench in the top right corner of the interface to get to settings.  Now click the "Server" tab at the top of this page.  On the left, you should see a setting called "Transcoder."  Clicking on that and then clicking the "Show Advanced" button will reveal the magical setting that lets you redirect the transcoding directory.  Type "/transcode" in there and click apply and you're all set.  You can tweak some of the other settings if desired to see if that improves your media streaming experience."
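
    For reference, the volume mapping from Step 1 is roughly the equivalent of this in plain docker run form (the image name and the /config path are just placeholders for whatever your Plex template actually uses; /tmp on Unraid is normally RAM-backed tmpfs):

        # map the host's /tmp into the container as /transcode so transcoding happens in RAM
        docker run -d --name plex \
          -v /tmp:/transcode \
          -v /mnt/user/appdata/plex:/config \
          plexinc/pms-docker   # placeholder image name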

  13. I was not watching plex at the time this happened.  I know it still runs in the background. FYI

     

Did you see any errors related to every single drive, including the Unraid USB, disappearing?  Would the system do this if the USB was going bad and disappeared from the system, or would it behave completely differently if that were the case?

     

Still on pass 1/4 for memtest.  30 minutes in and no errors...  I also noticed I was getting a few errors on the parity drive related to a bad cable.  I switched out all the SATA cables with new cables from https://www.amazon.com/gp/product/B01IBA335Y/ last week.  With one of the admins' help, I had an SSD disappear and reappear, and it was corrupt.  Got that added back into the cache pool as a RAID 1.  So far (besides the data loss I had from a partition getting corrupted on my rust spinner), it has looked stable.

     

Do you think I need to do the full 4/4 passes of memtest, or should we have seen something by now, before I reboot with my Unraid USB?  BTW, thanks for the lightning-fast response!

    20220419_145343.jpg

Downloaded memtest86 and am running it off a USB stick...  So far no memory errors on pass 1/4.  Will post a pic of the finished test.

     

    Just don't understand why I can't catch a break.  LOL

     

    Sorry admins for being such a pain in the butt.  I know I've had 3 different issues in the past two weeks.  On the positive side, I think it ran fine for over a year without issues.  (using it about 16 hours a week)

Everything was going fine, except that all of a sudden my drives all disappeared from the array.  4 of the rust spinners are connected to the add-on card (see signature) and 2 of the SSDs are connected to the motherboard.  I watched a movie from Plex yesterday and didn't have any issues.  I copied more stuff to the drive, and right after the copy job... I get the screen below.  1st screen: everything is gone, including the Unraid thumb drive.  Then a few seconds later, I see the drives come back, but they are all unassigned.  Then they all disappear again, and then again show up as unassigned.  So I go into diagnostics to make a diagnostic log file for you guys, and I get an error trying to do that.

     

This week I replaced my 16GB of RAM, which wasn't working, with the Patriot Viper memory sent to me by Patriot as part of a lifetime warranty exchange.  Yup, I bought the RAM in 2008 and they sent me brand-new sticks to replace it under warranty.

     

    1.jpg

    2.jpg

    3.jpg

  16. On 4/16/2022 at 6:14 PM, trurl said:

    Format is NEVER part of rebuild.

    When you formatted the emulated disk, parity was updated so it agreed the disk had been formatted. So rebuilding can only result in a formatted disk.

    Now that actually explains it and makes sense. 

     

    On 4/16/2022 at 6:14 PM, trurl said:

    If the disk was unmountable it would allow you to format it, but you should never format a disk that should have data on it. So you should have stopped at that point for further advice.

     

    You are absolutely correct.

     

     

    On 4/16/2022 at 6:14 PM, trurl said:

    You should have repaired the filesystem instead of formatting, and then rebuilt the repaired filesystem.  https://wiki.unraid.net/Manual/Storage_Management#Drive_shows_as_unmountable

I did attempt a repair of the drive first (see above diagnostics).  I wrongly assumed that formatting prepares the drive to be made usable again after an unmountable condition.  I'll know better for next time, but I will still consult with the admins here to make sure I'm doing the steps correctly.  Thank you for explaining what happened.  🙂

     

    Quick question to clarify.

1.  How can you tell if the disk with the partition damage is being emulated?  I know the disk could have had 50-150 GB of stuff on it, but the files I save there also get saved to another disk, so there's no way to know what was lost.  I don't think I saw the folder icon next to the disk to check.

2.  Is it possible that the disk is not being emulated if the partition is damaged?

3.  If the drive was being emulated, how could you copy just the files from the bad drive to a USB drive?  (See the sketch after this list for what I'm imagining.)

4.  After stopping the array, removing the bad drive, and then restarting the array, is the drive still being emulated?
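
    For question 3, this is roughly what I'm imagining (paths are hypothetical; the emulated disk would still appear as /mnt/diskX and the USB drive would be mounted with Unassigned Devices under /mnt/disks):

        # copy the contents of the emulated disk (e.g. disk2) to a USB drive mounted by Unassigned Devices
        rsync -avh --progress /mnt/disk2/ /mnt/disks/usb_backup/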

     

     

     

  17. On 4/8/2022 at 12:05 AM, JorgeB said:

That error means there's a partition problem, not a filesystem one, so xfs_repair won't help.  If parity is valid you can rebuild the disk and let Unraid recreate the correct partition.  To test, you can stop the array, unassign that disk, and start the array; if the emulated disk mounts and the contents look correct, you can rebuild on top.  If the emulated disk doesn't mount, post new diags.

I unassigned the disk from the array, reassigned it, it said it needed to be formatted, I formatted it, and then it started rebuilding, but it rebuilt a blank disk instead of my 1TB of stuff.  ;-(  I think I screwed up somewhere... maybe I needed to move the disk out of the array and format it there.  Not sure.  I would like to know your opinion on what I might have done wrong, for next time.

  18. On 4/9/2022 at 2:37 AM, opentoe said:

    See if you can get GPARTED to run on your system. It is a powerful partition tool that barely takes up space, but at least you will have a visual of what's going on. Since I didn't want to fool around with my server I created a GPARTED live USB bootable stick. I used a PC I had that was running Windows 10. I just connected the drive in question to the SATA port on the computer and then booted to the USB stick which ran the GPARTED GUI.

     

    Here you can download the GPARTED LIVE ISO

     

    Just "burn" the ISO file with the utility RUFUS to the USB stick and away you go.

Thanks for the tip!  I tried to use GParted and told it to repair, but it seemed like it froze.  Then I walked away for 2 hours; when I came back, the monitor was in sleep mode.  No moving of the mouse or keyboard woke up GParted.  So I went ahead with the recovery procedure, but I messed up.  Lost about a TB of stuff because I didn't pay attention.

  19. 14 hours ago, JorgeB said:

That error means there's a partition problem, not a filesystem one, so xfs_repair won't help.  If parity is valid you can rebuild the disk and let Unraid recreate the correct partition.  To test, you can stop the array, unassign that disk, and start the array; if the emulated disk mounts and the contents look correct, you can rebuild on top.  If the emulated disk doesn't mount, post new diags.

    What would cause a partition problem?

     

Would the rebuild take the same amount of time as a parity check?  (All the drives are the shingled variety.)

     

    Are there issues with doing a rebuild on shingled drives?

So I finally got my IO CREST Internal 5 Port Non-RAID SATA III 6GB/s PCIe X4 Controller Card for Desktop PC (supports SSD and HDD, with low-profile bracket, JMB585 chipset, SI-PEX40139).  Everything was going well, but I suddenly got the "unmountable: unsupported drive" error.  Per the Unraid manual, I started the array in maintenance mode and ran xfs_repair /dev/md2 from the command line.  The issue is with disk 2, so I think I used the correct command.  I restarted the array (unchecked maintenance mode) and the drive still says unmountable, etc.  What am I supposed to do next?

     

    root@Tower:~# xfs_repair /dev/md2
    Phase 1 - find and verify superblock...
            - reporting progress in intervals of 15 minutes
    Phase 2 - using internal log
            - zero log...
            - 20:26:15: zeroing log - 38155 of 38155 blocks done
            - scan filesystem freespace and inode maps...
            - 20:26:15: scanning filesystem freespace - 50 of 50 allocation groups done
            - found root inode chunk
    Phase 3 - for each AG...
            - scan and clear agi unlinked lists...
            - 20:26:15: scanning agi unlinked lists - 50 of 50 allocation groups done
            - process known inodes and perform inode discovery...
            - agno = 15
            - agno = 45
            - agno = 0
            - agno = 30
            - agno = 46
            - agno = 16
            - agno = 47
            - agno = 48
            - agno = 17
            - agno = 49
            - agno = 31
            - agno = 18
            - agno = 32
            - agno = 19
            - agno = 33
            - agno = 20
            - agno = 34
            - agno = 35
            - agno = 21
            - agno = 36
            - agno = 37
            - agno = 22
            - agno = 38
            - agno = 39
            - agno = 23
            - agno = 1
            - agno = 40
            - agno = 41
            - agno = 24
            - agno = 42
            - agno = 43
            - agno = 44
            - agno = 25
            - agno = 26
            - agno = 27
            - agno = 2
            - agno = 28
            - agno = 29
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - agno = 10
            - agno = 11
            - agno = 12
            - agno = 13
            - agno = 14
            - 20:26:15: process known inodes and inode discovery - 3456 of 3456 inodes done
            - process newly discovered inodes...
            - 20:26:15: process newly discovered inodes - 50 of 50 allocation groups done
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - 20:26:15: setting up duplicate extent list - 50 of 50 allocation groups done
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 2
            - agno = 1
            - agno = 7
            - agno = 5
            - agno = 6
            - agno = 3
            - agno = 8
            - agno = 9
            - agno = 10
            - agno = 11
            - agno = 12
            - agno = 13
            - agno = 14
            - agno = 15
            - agno = 16
            - agno = 17
            - agno = 18
            - agno = 19
            - agno = 20
            - agno = 21
            - agno = 22
            - agno = 23
            - agno = 25
            - agno = 26
            - agno = 24
            - agno = 27
            - agno = 28
            - agno = 29
            - agno = 30
            - agno = 31
            - agno = 33
            - agno = 34
            - agno = 35
            - agno = 36
            - agno = 37
            - agno = 38
            - agno = 39
            - agno = 40
            - agno = 41
            - agno = 42
            - agno = 32
            - agno = 43
            - agno = 45
            - agno = 46
            - agno = 48
            - agno = 44
            - agno = 49
            - agno = 4
            - agno = 47
            - 20:26:15: check for inodes claiming duplicate blocks - 3456 of 3456 inodes done
    Phase 5 - rebuild AG headers and trees...
            - 20:26:18: rebuild AG headers and trees - 50 of 50 allocation groups done
            - reset superblock...
    Phase 6 - check inode connectivity...
            - resetting contents of realtime bitmap and summary inodes
            - traversing filesystem ...
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify and correct link counts...
            - 20:26:18: verify and correct link counts - 50 of 50 allocation groups done
    done
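
    If it matters, I believe xfs_repair also has a check-only mode that reports problems without writing anything; I could re-run it that way (with the array started in maintenance mode again) to confirm whether it now thinks the filesystem is clean:

        # dry run: report problems only, make no modifications
        xfs_repair -n /dev/md2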

    tower-diagnostics-20220407-2035.zip

The miser part of me decided to just buy the JMB585 card, since I have a video card currently using an x1 slot (I just need it to play with the BIOS).  I disconnected the SATA drive that was causing the problem and the system booted up fine, I remoted into it fine, and all the dockers seem to be working.  Once I get this card, I'll transfer the 4 spinning-rust drives over to the card and free up the mobo SATA ports they were previously plugged into.  I bought new SATA and power splitters, and I'm not reusing the port that might have been the issue either.  Do you know if I can run two of these cards at the same time?

    61LTsOto3FL._AC_SL1200_.jpg
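
    Side note: once the second card arrives, I assume I can confirm that both controllers are detected from the console with something like this (the exact description string the JMB585 reports may differ, so the grep pattern is just a guess):

        # list PCI devices and filter for SATA controllers
        lspci -nn | grep -i sata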
