Posts posted by elbobo

  1. I made a terrible mistake, still unsure how I did it.

    Background:

    Dual-parity system, previously on 4TB drives, now moving to 6TB; the only 6TB drives in the system are the parity drives

    Did a swap for Parity 1 - Copy Successful, rebuild successful

    Did a swap for Parity 2 - Copy Successful, rebuild currently running

    I then got an alert that Parity 2 has raw read errors (in the thousands), which is odd for a new drive... This is where I discovered my terrible mistake: I had accidentally installed a drive with 3.5 years of spin time on it from my other Unraid box instead of the new 6TB.

     

    At this point I have paused the rebuild of the previous parity drive (now a data disk in the array) because I don't think I should trust Parity 2.

     

    What's the best way to resolve this with the least risk to data? Everything important on the array is backed up offsite... but if I can avoid bringing it back (and losing the unimportant stuff) I would like that.

     

    Should I (and can I even) pull Parity 2 and restart the data rebuild of Disk 4 (the swapped disk), then once that is complete, add a new 6TB drive as Parity 2 and have it start its process?

     

    Still unsure how this drive made it from my "destroy/recycle" pile back to the "on hand to swap if necessary" stack... I will be putting some failsafes in place to prevent that going forward. I cannot believe I did this... I am certainly kicking myself...

  2. I updated last evening and tried to log in today, UI never shows and the log shows:

    Quote

    2020-09-06 13:32:58,725 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22829345373872 for <Subprocess at 22829345240976 with name sickchill in state STARTING> (stdout)>
    2020-09-06 13:32:58,725 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 22829345660736 for <Subprocess at 22829345240976 with name sickchill in state STARTING> (stderr)>
    2020-09-06 13:32:58,726 INFO exited: sickchill (exit status 2; not expected)
    2020-09-06 13:32:58,726 DEBG received SIGCHLD indicating a child quit
    2020-09-06 13:33:01,733 INFO spawned: 'sickchill' with pid 61
    2020-09-06 13:33:01,786 DEBG 'sickchill' stderr output:
    /usr/sbin/python2: can't open file '/opt/sickchill/SickBeard.py': [Errno 2] No such file or directory

    2020-09-06 13:33:01,787 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22829345241696 for <Subprocess at 22829345240976 with name sickchill in state STARTING> (stdout)>
    2020-09-06 13:33:01,787 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 22829344995168 for <Subprocess at 22829345240976 with name sickchill in state STARTING> (stderr)>
    2020-09-06 13:33:01,788 INFO exited: sickchill (exit status 2; not expected)
    2020-09-06 13:33:01,788 DEBG received SIGCHLD indicating a child quit
    2020-09-06 13:33:02,789 INFO gave up: sickchill entered FATAL state, too many start retries too quickly

    Looking at the GitHub repo, it looks like they may have moved to requiring Python 3.x on their end. Is there a fix for this?

     

    Thanks!

    Currently I am just trying to get everything configured over VNC; once that is done, my goal is to use a remote connection tool like TeamViewer or something similar. Unfortunately I am running into an issue with the vanilla build where the mouse just stops working: the pointer will follow the dot from VNC for a while, then the pointer "sticks" and that's the end. The dot still moves, but the arrow stays put, and clicking does not respond to what is under the arrow either. Restarting the VNC window doesn't make a difference; I have to restart the VM, and then it will work again for a short time. So far, it hasn't run long enough for me to get TeamViewer installed. I've now downloaded TeamViewer on another system and hope I can browse to the network and run it before the mouse disassociates. I have tried each of the VNC settings for the mouse with no luck.

    Any pointers would be great (I am running with the vanilla XML after the mouse line was removed)
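    One commonly suggested fix for this kind of VNC pointer drift in libvirt guests is a virtual USB tablet, which reports absolute coordinates so the guest pointer tracks the VNC client's. This is only a sketch, not confirmed against this particular VM; the fragment would go inside the <devices> section of the VM's XML:

```xml
<!-- Absolute-coordinate pointer device; helps VNC clients track the guest cursor -->
<input type='tablet' bus='usb'/>
```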

    Thanks!

  4. Adding one more comment in case it helps with a solution:

    Drive 3, the one I replaced, was throwing a ton of errors, the same as parity. The only SMART issue was UDMA CRC error count, so I am assuming this was also related to the controller. If that's the case, I still have that drive as it was when I removed it from the system, so I could possibly rebuild parity off of it (except it was a 3TB drive, which I upgraded to a 4TB when I replaced it).

    Just throwing that out there so all of my information is available... I won't do anything until I hear back.

  5.  

    Sorry, hopefully the last post on this:

    Got the new LSI card today and installed it. It booted up, but due to the earlier read-error issue the parity drive is in a disabled state.

    Drive 3 is in an enabled state, but because it "rebuilt" while the parity drive was throwing millions of read errors, I do not trust that it is actually rebuilt.

    Where do I go from here? 

    Thank you

    tower-diagnostics-20190331-1707.zip

  6. Is there a controller (or even just a manufacturer) you’d recommend? I built this about 5 years ago and haven’t really looked at changes and recommendations since then. 

    If I replace it, is it as simple as replacing the card, reseating the cables, booting up, then removing and re-adding drive 3 to rebuild again?

    Drive 3 was on the card prior to a swap-out; now it's using the motherboard SATA.

  7. Before I could check for replies it claims it has completed:

    Quote

     

    Total size: 4 TB

    Elapsed time: 8 hours, 3 minutes

    Current position: 4 TB (100.0 %)

    Estimated speed: 753.0 MB/sec

    Estimated finish: completed

     

    It does claim parity is valid, I'm doubtful. 

     

    I'm assuming it's because of the 866 million "read error" messages, but when I try to gather diagnostics I get this:

    Quote

    Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 134094880 bytes) in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(418) : eval()'d code on line 73

     


    I don't know how it (Disk 3) could be complete with only 843,382 writes either; there is 2.24TB of data on that drive.

    The parity drive is about 3 months old and showed no signs of issues during pre-clear. I believe my issue is a bad cable on a 4-drive cage causing the parity drive to have read errors. My concern is that I am currently doing a rebuild of drive #3 in my system, and I can't imagine it will finish correctly with 32 million+ read errors on parity.

    What is the best way to handle this? 

    The log file is 100% filled with read error lines. 

    Do I shutdown, reseat cables and restart the drive rebuild? 

    I had an issue where a port on my MB went bad, causing my cache to come up as bad BTRFS. Since I had recently been given a better system anyway, I decided it would be best to just move everything over.

    I have done that and the system boots. Unfortunately it is saying every drive is "wrong" 

    Apparently the new system is using an _ in the drive name where the old one used a 0. I'm unsure how or why, but this is obviously preventing the array from starting.
    Any help would be greatly appreciated.

     

    Thank you,

    Wrong.png

  10. 16 hours ago, johnnie.black said:

    This shouldn't happen, do you by any chance have the syslog/diags covering the swap?

     

     

    Attached... I misspoke (mistyped) in my previous reply, the failing disk has not been physically pulled, it is in the system but is in the unassigned devices. 

    Thankfully this means my current diagnostics still has the entire process in it. 

    tower-diagnostics-20181201-1413.zip

  11. 20 hours ago, John_M said:

    The quickest way to force a parity re-sync is to stop the array, un-assign parity, start the array without parity, stop the array, re-assign parity, and finally start the array and let it build. Grab new diagnostics - there might be some indication as to what went wrong.

     

    But is that what you want to do? Since Disk 3 is failing it might not be entirely readable, which was the reason for doing the parity swap in the first place - so that you could then rebuild Disk 3 onto a new disk.

     

    The failing disk has been pulled. 

    I made it through the entire parity swap without any issues, and was able to rebuild the failing disk onto the former parity disk also without errors. 

    It was only during the monthly parity check that this issue was discovered. (In hindsight, I should have run a parity check when it was complete.)

     

    I did the parity swap procedure as suggested and everything seemed great. Last night my monthly parity check ran and it came back with 183,141,001 errors. Reading the forum, it appears that the extra 1TB my parity drive has compared to the other drives in the system wasn't properly zeroed during the parity swap procedure.

    In the conversations that followed, there was a recommendation to do another parity sync instead of a correcting parity check, because it is less intensive (write to parity, versus read from parity, compare values, and write if necessary) for all the sectors that have an error.

    I cannot find how to just do a new Parity sync. 

     

    Thank you,

  13. On 11/19/2018 at 6:28 PM, John_M said:

    That disk is failing.

    Thank you. With that being said:

    I have a 4TB drive that can be dropped in, but I know this would need to become the Parity since it is the largest in the system. 

     

    How risky is doing a full new parity build while that disk has these errors? (Should I hunt down a 3TB to avoid the risk?)

    Does it make any sense to move the parity drive over to replace the drive with the errors, when the parity drive has 5 years and 5 days of spin time? Its SMART looks clean, and I cannot afford to replace all 5 at this time, but I can grab another one if it would be silly to go through the work of rebuilding onto such an old drive (and replace the others as time goes on).

  14. 3 hours ago, John_M said:

    Can you mount them manually? You can refer to them as /dev/sdb and /dev/sdc if it's just a temporary measure to copy off your data, or by their /dev/disk/by-id names, maybe in a little script, to save typing, if you want to mount them regularly. Just open a terminal session and use the mount command. You'll have to create their mount points manually first. Something like

    
    mkdir /mnt/disks/HP1
    mount /dev/sdb /mnt/disks/HP1
    # Alternatively,
    # mount /dev/disk/by-id/LOGICAL_VOLUME_5001438010F32760_3600508b1001c3ef5300bf0e49df986fa /mnt/disks/HP1
    mkdir /mnt/disks/HP2
    mount /dev/sdc /mnt/disks/HP2
    # Alternatively,
    # mount /dev/disk/by-id/LOGICAL_VOLUME_5001438010F32760_3600508b1001c3d2b732d4816c14b3a1e /mnt/disks/HP2
    
    ...
    
    umount /mnt/disks/HP1
    rmdir /mnt/disks/HP1
    umount /mnt/disks/HP2
    rmdir /mnt/disks/HP2

    You might need to add a mount option or two.

    I was able to let Unassigned Devices mount the prior one by removing the partition on the newer one. I then created a share with a preference for cache (just for speed) and did:

    cp -r /mnt/disks/* /mnt/cache/RecoverVMS/

    Once that was complete, I connected to the HP server, recreated my 8-disk SSD array as a single RAID 5+0 volume, mounted that as an unassigned device with the same mount point as the previous one, and then did a copy in the reverse direction.
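That round trip boils down to something like the following sketch, with throwaway temp directories standing in for /mnt/disks, /mnt/cache/RecoverVMS, and the recreated volume's mount point (all stand-in paths, not the real ones), plus a final diff to verify nothing was lost:

```shell
# Stand-ins for the real mount points (assumptions, not the actual paths)
disks=$(mktemp -d)     # stands in for /mnt/disks
cache=$(mktemp -d)     # stands in for /mnt/cache/RecoverVMS
restore=$(mktemp -d)   # stands in for the recreated volume's mount point

# Fake a VM image on the old unassigned device
mkdir -p "$disks/HP_LOGICAL_VOLUME"
echo "vm image data" > "$disks/HP_LOGICAL_VOLUME/vm1.img"

# Step 1: copy everything off onto the cache share
cp -r "$disks"/* "$cache"/

# (Recreate the RAID volume on the controller, remount it, then...)

# Step 2: copy back in the reverse direction
cp -r "$cache/HP_LOGICAL_VOLUME/." "$restore"/

# Sanity check: the round trip preserved the files
diff -r "$disks/HP_LOGICAL_VOLUME" "$restore" && echo "round trip verified"
```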

    I have restarted my VMs and they are functioning, I'll count that as a success.

     

    Thank you for all of your help. In the end I decided to risk the data for the opportunity to consolidate the space into a single disk running with its own redundancy. That way the data is better protected in the future.

  15. 19 hours ago, John_M said:

    They could in fact be different, but so long that they are getting truncated somewhere.

    You were right, the names that HP assigned were:

    LOGICAL_VOLUME_5001438010F32760_3600508b1001c3ef5300bf0e49df986fa (sdb)
    LOGICAL_VOLUME_5001438010F32760_3600508b1001c3d2b732d4816c14b3a1e (sdc)

    Unfortunately, it does not appear to be something that can be changed.

     

    If it isn't possible to mount them both, is there any issue with copying the items off the currently mounted unassigned device to the array, recreating the logical volume as a single volume, and then copying the images back? Specifically, I just want to make sure that this shouldn't cause issues with my VMs.

     

    Thank you

    First off, thank you for the plugin; it has been working wonderfully for me for quite a while.

    I'm running on an old HP server, and I had split my 2.5in SAS RAID block into two separate partitions, one used as a cache and one as unassigned space to mount VMs.

    I have now upgraded to a larger SSD cache and was just going to mount the previous cache as a second unassigned device. Unfortunately, when I mount it I can no longer browse the first. It appears the issue is that both SAS volumes are identified by the same ID, so even though I have sdb and sdc, when I change the mount point for one it changes for the other. I tried two different filesystems just to see, and that did not change anything.

    Is it possible to mount them separately?

    If not: if I copy the 147GB partition to my array, then merge the two within the HP utility, and then remount and copy the files back, would I run into any issues within my VMs (if I keep the naming convention)?

    Would you suggest a cp -r /mnt/disks/HP_LOGICAL_VOLUME/* /mnt/cache/ from the shell, or another method to move it?
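One wrinkle with the cp -r .../* form worth noting: the shell glob skips dot-files, which can matter if the volume holds hidden config directories. Copying via the /. form grabs everything. A quick demonstration with throwaway directories (stand-ins for the HP volume and the cache):

```shell
# Throwaway directories; the real paths would be the HP volume and the cache
src=$(mktemp -d); a=$(mktemp -d); b=$(mktemp -d)
touch "$src/visible.img" "$src/.hidden"

cp -r "$src"/* "$a"/   # glob form: misses .hidden
cp -r "$src"/. "$b"/   # dot form: copies everything, dot-files included

ls -A "$a"   # visible.img only
ls -A "$b"   # .hidden and visible.img
```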

     

    Thank you for any assistance or guidance you can provide. 

    UnAssigned.PNG
