Posts posted by elbobo

  1. I made a terrible mistake, still unsure how I did it.

    Background:

    Dual-parity system, previously on 4TB drives, now moving to 6TB; the only 6TB drives in the system are the parity drives

    Did a swap for Parity 1 - Copy Successful, rebuild successful

    Did a swap for Parity 2 - Copy Successful, rebuild currently running

    I then got an alert that Parity 2 has raw read errors (in the thousands), which is odd for a new drive... This is where I discovered my terrible mistake: I had accidentally installed a drive with 3.5 years of spin time on it from my other Unraid box instead of the new 6TB.

     

    At this point I have paused the rebuild of the previous parity drive (now a data disk in the array) because I don't think I should trust Parity 2.

     

    What's the best way to resolve this with the least risk to data? Everything important on the array is backed up offsite... but if I can avoid bringing it back (and losing the unimportant stuff) I would like that.

     

    Should I (and can I even) pull Parity 2 and restart the data rebuild of Disk 4 (the swapped disk), then once that is complete, add a new 6TB drive as Parity 2 and have it start its process?

     

    Still unsure how this drive made it from my "destroy/recycle" pile back to the "on hand to swap if necessary" stack... I will be putting some failsafes in place to prevent that going forward. I cannot believe I did this... I am certainly kicking myself...

  2. I updated last evening and tried to log in today, UI never shows and the log shows:

    Quote

    2020-09-06 13:32:58,725 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22829345373872 for <Subprocess at 22829345240976 with name sickchill in state STARTING> (stdout)>
    2020-09-06 13:32:58,725 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 22829345660736 for <Subprocess at 22829345240976 with name sickchill in state STARTING> (stderr)>
    2020-09-06 13:32:58,726 INFO exited: sickchill (exit status 2; not expected)
    2020-09-06 13:32:58,726 DEBG received SIGCHLD indicating a child quit
    2020-09-06 13:33:01,733 INFO spawned: 'sickchill' with pid 61
    2020-09-06 13:33:01,786 DEBG 'sickchill' stderr output:
    /usr/sbin/python2: can't open file '/opt/sickchill/SickBeard.py': [Errno 2] No such file or directory

    2020-09-06 13:33:01,787 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22829345241696 for <Subprocess at 22829345240976 with name sickchill in state STARTING> (stdout)>
    2020-09-06 13:33:01,787 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 22829344995168 for <Subprocess at 22829345240976 with name sickchill in state STARTING> (stderr)>
    2020-09-06 13:33:01,788 INFO exited: sickchill (exit status 2; not expected)
    2020-09-06 13:33:01,788 DEBG received SIGCHLD indicating a child quit
    2020-09-06 13:33:02,789 INFO gave up: sickchill entered FATAL state, too many start retries too quickly

    Looking at the GitHub repo, it looks like they may have moved to requiring Python 3.x on their end. Is there a fix for this?

     

    Thanks!

    Currently I am just trying to get everything configured over VNC; once that is done, my goal is to use a remote connection tool like TeamViewer or something similar. Unfortunately I am running into an issue with the vanilla build where the mouse just stops working: the pointer will follow the dot from VNC for a while, then the pointer "sticks" and that's the end. The dot still moves, but the arrow stays put, and clicking does not respond to what is under the arrow either. Restarting the VNC window doesn't make a difference; I have to restart the VM, and then it will work again for a short time. So far, it hasn't run long enough for me to get TeamViewer installed. I've now downloaded TeamViewer on another system and hope I can browse to the network and run it before the mouse disassociates. I have tried each of the VNC settings for the mouse with no luck.

    Any pointers would be great (I am running with the vanilla XML after the mouse line was removed)
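    One commonly suggested fix for this kind of VNC pointer drift in libvirt guests is a virtual USB tablet, which reports absolute coordinates so the guest pointer tracks the VNC client's. This is only a sketch, not confirmed against this particular VM; the fragment would go inside the <devices> section of the VM's XML:

```xml
<!-- Absolute-coordinate pointer device; helps VNC clients track the guest cursor -->
<input type='tablet' bus='usb'/>
```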

    Thanks!

  4. Adding one more comment in case it helps with a solution:

    Drive 3, the one I replaced, was throwing a ton of errors, the same as parity. The only SMART issue was UDMA CRC error count, so I am assuming this was also related to the controller. If that's the case, I still have that drive as it was when I removed it from the system, so I could possibly rebuild parity off of it (except it was a 3TB drive, which I upgraded to a 4TB when I replaced it).

    Just throwing that out there so all of my information is available... I won't do anything until I hear back.

  5.  

    Sorry, hopefully the last post on this:

    Got the new LSI card today and installed it. It booted up, but due to the earlier read-error issue the parity drive is in a disabled state.

    Drive 3 is in an enabled state, but because it "rebuilt" while the parity drive was throwing millions of read errors, I do not trust that it is actually rebuilt.

    Where do I go from here? 

    Thank you

    tower-diagnostics-20190331-1707.zip

  6. Is there a controller (or even just a manufacturer) you’d recommend? I built this about 5 years ago and haven’t really looked at changes and recommendations since then. 

    If I replace it, is it as simple as replacing the card, reseating the cables, booting up, then removing and re-adding drive 3 to rebuild again?

    Drive 3 was on the card prior to a swap-out; now it's using the motherboard SATA.

  7. Before I could check for replies it claims it has completed:

    Quote

     

    Total size: 4 TB

    Elapsed time: 8 hours, 3 minutes

    Current position: 4 TB (100.0 %)

    Estimated speed: 753.0 MB/sec

    Estimated finish: completed

     

    It does claim parity is valid, I'm doubtful. 

     

    I'm assuming it's because of the 866 million "read error" messages, but when I try to gather diagnostics I get this:

    Quote

    Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 134094880 bytes) in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(418) : eval()'d code on line 73

     


    I don't know how it (Disk 3) could be complete with only 843,382 writes either; there is 2.24TB of data on that drive.

    The parity drive is about 3 months old and showed no signs of issues during pre-clear. I believe my issue is a bad cable on a 4-drive cage causing the parity drive to have read errors. My concern is that I am currently doing a rebuild of drive #3 in my system, and I can't imagine it will finish correctly with 32 million+ read errors on parity.

    What is the best way to handle this? 

    The log file is 100% filled with read error lines. 

    Do I shutdown, reseat cables and restart the drive rebuild? 

    I had an issue where a port on my MB went bad, causing my cache to come up as bad BTRFS. Since I had recently been given a better system anyway, I decided it would be best to just move everything over.

    I have done that and the system boots. Unfortunately it is saying every drive is "wrong" 

    Apparently the new system is using an _ in the drive name where the old one used a 0. I'm unsure how or why, but this is obviously preventing the array from starting.
    Any help would be greatly appreciated.

     

    Thank you,

    Wrong.png

  10. 16 hours ago, johnnie.black said:

    This shouldn't happen, do you by any chance have the syslog/diags covering the swap?

     

     

    Attached... I misspoke (mistyped) in my previous reply, the failing disk has not been physically pulled, it is in the system but is in the unassigned devices. 

    Thankfully this means my current diagnostics still has the entire process in it. 

    tower-diagnostics-20181201-1413.zip

  11. 20 hours ago, John_M said:

    The quickest way to force a parity re-sync is to stop the array, un-assign parity, start the array without parity, stop the array, re-assign parity, and finally start the array and let it build. Grab new diagnostics - there might be some indication as to what went wrong.

     

    But is that what you want to do? Since Disk 3 is failing it might not be entirely readable, which was the reason for doing the parity swap in the first place - so that you could then rebuild Disk 3 onto a new disk.

     

    The failing disk has been pulled. 

    I made it through the entire parity swap without any issues, and was able to rebuild the failing disk onto the former parity disk also without errors. 

    It was only during the monthly parity check that this issue was discovered. (In hindsight, I should have run a parity check when it was complete.)

     

    I did the parity swap procedure as suggested and everything seemed great. Last night my monthly parity check ran and it came back with 183,141,001 errors. Reading the forum, it appears that the extra 1TB my parity drive has compared to the other drives in the system wasn't properly zeroed during the parity swap procedure.

    In the conversations that followed, there was a recommendation to do another parity sync instead of a correcting parity check, because it is less intensive (write to parity, versus read from parity, compare values, and write if necessary) for all the sectors that have an error.

    I cannot find how to just do a new Parity sync. 

     

    Thank you,

  13. On 11/19/2018 at 6:28 PM, John_M said:

    That disk is failing.

    Thank you. With that being said:

    I have a 4TB drive that can be dropped in, but I know this would need to become the Parity since it is the largest in the system. 

     

    How risky is doing a full new parity build while that disk has these errors? (Should I hunt down a 3TB to avoid the risk?)

    Does it make any sense to move the parity drive over to replace the drive with the errors, when the parity drive has 5 years and 5 days of spin time? Its SMART looks clean, and I cannot afford to replace all 5 at this time, but I can grab another one if it would be silly to go through the work of rebuilding onto such an old drive (and replace the others as time goes on).

  14. 3 hours ago, John_M said:

    Can you mount them manually? You can refer to them as /dev/sdb and /dev/sdc if it's just a temporary measure to copy off your data, or by their /dev/disk/by-id names, maybe in a little script, to save typing, if you want to mount them regularly. Just open a terminal session and use the mount command. You'll have to create their mount points manually first. Something like

    
    mkdir /mnt/disks/HP1
    mount /dev/sdb /mnt/disks/HP1
    # Alternatively,
    # mount /dev/disk/by-id/LOGICAL_VOLUME_5001438010F32760_3600508b1001c3ef5300bf0e49df986fa /mnt/disks/HP1
    mkdir /mnt/disks/HP2
    mount /dev/sdc /mnt/disks/HP2
    # Alternatively,
    # mount /dev/disk/by-id/LOGICAL_VOLUME_5001438010F32760_3600508b1001c3d2b732d4816c14b3a1e /mnt/disks/HP2
    
    ...
    
    umount /mnt/disks/HP1
    rmdir /mnt/disks/HP1
    umount /mnt/disks/HP2
    rmdir /mnt/disks/HP2

    You might need to add a mount option or two.

    I was able to let Unassigned Devices mount the prior one by removing the partition on the newer one. I then created a share with a preference for cache (just for speed) and did:

    cp -r /mnt/disks/* /mnt/cache/RecoverVMS/

    Once that was complete, I connected to the HP server, recreated my 8-disk SSD array as a single RAID 5+0 volume, mounted that as an unassigned device with the same mount point as the previous one, and then did a copy in the reverse direction.
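That round trip boils down to something like the following sketch, with throwaway temp directories standing in for /mnt/disks, /mnt/cache/RecoverVMS, and the recreated volume's mount point (all stand-in paths, not the real ones), plus a final diff to verify nothing was lost:

```shell
# Stand-ins for the real mount points (assumptions, not the actual paths)
disks=$(mktemp -d)     # stands in for /mnt/disks
cache=$(mktemp -d)     # stands in for /mnt/cache/RecoverVMS
restore=$(mktemp -d)   # stands in for the recreated volume's mount point

# Fake a VM image on the old unassigned device
mkdir -p "$disks/HP_LOGICAL_VOLUME"
echo "vm image data" > "$disks/HP_LOGICAL_VOLUME/vm1.img"

# Step 1: copy everything off onto the cache share
cp -r "$disks"/* "$cache"/

# (Recreate the RAID volume on the controller, remount it, then...)

# Step 2: copy back in the reverse direction
cp -r "$cache/HP_LOGICAL_VOLUME/." "$restore"/

# Sanity check: the round trip preserved the files
diff -r "$disks/HP_LOGICAL_VOLUME" "$restore" && echo "round trip verified"
```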

    I have restarted my VMs and they are functioning, I'll count that as a success.

     

    Thank you for all of your help. In the end I decided to risk the data for the opportunity to consolidate the space into a single disk running with its own redundancy. That way the data is better protected in the future.

  15. 19 hours ago, John_M said:

    They could in fact be different, but so long that they are getting truncated somewhere.

    You were right, the names that HP assigned were:

    LOGICAL_VOLUME_5001438010F32760_3600508b1001c3ef5300bf0e49df986fa (sdb)
    LOGICAL_VOLUME_5001438010F32760_3600508b1001c3d2b732d4816c14b3a1e (sdc)

    Unfortunately, it does not appear to be something that can be changed.

     

    If it isn't possible to mount them both, is there any issue with copying the items off the currently mounted unassigned device to the array, recreating the logical volume as a single volume, and then copying the images back? Specifically, I just want to make sure that this shouldn't cause issues with my VMs.

     

    Thank you

    First off, thank you for the plugin; it has been working wonderfully for me for quite a while.

    I'm running on an old HP server, and I had split my 2.5in SAS RAID block into two separate partitions, one used as a cache and one as unassigned space to mount VMs.

    I have now upgraded to a larger SSD cache and was just going to mount the previous cache as a second unassigned device. Unfortunately, when I mount it I can no longer browse the first. It appears the issue is that both SAS volumes are identified by the same ID, so even though I have sdb and sdc, when I change the mount point for one it changes for the other. I tried two different filesystems just to see, and that did not change anything.

    Is it possible to mount them separately?

    If not: if I copy the 147GB partition to my array, then merge the two within the HP utility, and then remount and copy the files back, would I run into any issues within my VMs (if I keep the naming convention)?

    Would you suggest a cp -r /mnt/disks/HP_LOGICAL_VOLUME/* /mnt/cache/ from the shell, or another method to move it?
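One wrinkle with the cp -r .../* form worth noting: the shell glob skips dot-files, which can matter if the volume holds hidden config directories. Copying via the /. form grabs everything. A quick demonstration with throwaway directories (stand-ins for the HP volume and the cache):

```shell
# Throwaway directories; the real paths would be the HP volume and the cache
src=$(mktemp -d); a=$(mktemp -d); b=$(mktemp -d)
touch "$src/visible.img" "$src/.hidden"

cp -r "$src"/* "$a"/   # glob form: misses .hidden
cp -r "$src"/. "$b"/   # dot form: copies everything, dot-files included

ls -A "$a"   # visible.img only
ls -A "$b"   # .hidden and visible.img
```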

     

    Thank you for any assistance or guidance you can provide. 

    UnAssigned.PNG
