Upgrade parity drive with failed drive question

CyberMew · May 20, 2023

I currently have a failed drive (6TB) in my Disk5 slot (red ball icon). My 2 parity drives are 10TB.

I shut down my server, removed the failed drive (6TB), and put in a new 18TB drive.

I am preclearing the 18TB drive now. The array is currently stopped and the Disk5 slot is unassigned. I hope this is the correct move so far; please correct me if I am wrong.

Going by the swap procedure here https://wiki.unraid.net/The_parity_swap_procedure (which I didn't quite grasp; photos would be nice), can I confirm that once the preclear is done, I need to unassign the Parity2 drive (10TB; it could be Parity1, but I am choosing Parity2), reassign that 10TB drive to my Disk5 slot, and then assign the precleared 18TB drive to my Parity2 slot?

If that is correct, I am not sure how that will 'restore' my previous Disk5 data and keep the parity.. Maybe I haven't reached that step yet, but can I assume, when I do those 2 reassignment actions, 2 things will happen:

1. Copy Disk5 data (which is my Parity2 drive data) to the new 18TB drive (in my Parity2 slot). While array is not started.

2. Disk5 remains a drive with parity data. Start array to rebuild Disk5 actual drive data based on the 2 parity drives (10TB and 18TB).

Hope someone can confirm or correct me if I am wrong.. thank you!!

itimpi · May 20, 2023

You have got the actions that will happen in the wrong order! What happens is:

1. The current contents of the 10 TB drive that was parity2 are copied to the new 18 TB parity2 drive.

2. When that completes, the old 10 TB parity2 drive that is now in the disk5 slot is rebuilt with the contents of the emulated disk5 drive.

CyberMew · May 20, 2023

Yes that is what I meant, thanks for helping to clear that up!

May I check, do I do the assignment both at once? My current state is Parity2 (10TB) still assigned, Disk5 empty (previously 6TB failed). 18TB (unassigned) being precleared (and assume it will be cleared for use).

Meaning, do I now (1) unassign the Parity2 slot (10TB drive), (2) assign that 10TB parity2 drive to the Disk5 slot, (3) assign the 18TB drive to the Parity2 slot, (4) copy Disk5 (parity2 contents) to Parity2 (empty 18TB), and (5) start the array to rebuild Disk5 (overwriting its parity2 contents)?

Really appreciate some confirmation as I do not want to screw this up.. thank you!

itimpi · May 20, 2023

You do steps 1) to 3) and then start the array for Unraid to do 4) and 5).
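
The division of labour described above can be written down as a small planning sketch. This is only an illustration of the order of operations from this thread, not anything Unraid itself runs: the function name `plan_parity_swap`, the `SwapPlan` class, and the size checks are assumptions made for the example (the checks simply encode that the new drive has to hold a copy of the old parity and the old parity drive has to hold the failed disk's contents).

```python
from dataclasses import dataclass, field


@dataclass
class SwapPlan:
    user_steps: list = field(default_factory=list)    # done with the array stopped
    unraid_steps: list = field(default_factory=list)  # done by Unraid after Start


def plan_parity_swap(old_parity_tb, new_drive_tb, failed_data_tb,
                     parity_slot="Parity2", data_slot="Disk5"):
    """Return the ordered steps, or raise if the drive sizes cannot work."""
    # The new drive receives a copy of the old parity, so it cannot be smaller.
    if new_drive_tb < old_parity_tb:
        raise ValueError("new drive is smaller than the old parity drive")
    # The old parity drive becomes the data disk, so it must fit the failed disk.
    if old_parity_tb < failed_data_tb:
        raise ValueError("old parity drive is smaller than the failed data disk")
    return SwapPlan(
        user_steps=[
            f"Unassign the {old_parity_tb}TB drive from {parity_slot}",
            f"Assign that {old_parity_tb}TB drive to {data_slot}",
            f"Assign the new {new_drive_tb}TB drive to {parity_slot}",
        ],
        unraid_steps=[
            f"Copy old parity ({old_parity_tb}TB) to the new {parity_slot} drive",
            f"Rebuild {data_slot} onto the old parity drive from the emulated disk",
        ],
    )


# Sizes taken from this thread: 10TB parity, 18TB new drive, 6TB failed disk.
plan = plan_parity_swap(old_parity_tb=10, new_drive_tb=18, failed_data_tb=6)
for number, step in enumerate(plan.user_steps + plan.unraid_steps, start=1):
    print(f"{number}. {step}")
```

The first three printed steps are the ones done by hand with the array stopped; the last two are what Unraid is described as doing once the array is started.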

CyberMew · May 20, 2023

31 minutes ago, itimpi said:

You do steps 1) to 3) and then start the array for Unraid to do 4) and 5).

Thanks but I still have a question.. that would mean the array is started, but Slot5 is invalid data and Parity2 is also empty. Wouldn’t this cause an issue though?

itimpi · May 20, 2023

1 minute ago, CyberMew said:

Thanks but I still have a question.. that would mean the array is started, but Slot5 is invalid data and Parity2 is also empty. Wouldn’t this cause an issue though?

Why would you think this? Step 2) assigns the old parity2 drive as disk5, and step 3) assigns the new parity2 drive. These are both before you start the array.

CyberMew · May 20, 2023

Because earlier you mentioned the array will be started, hence I came to that conclusion. Oh, so you mean that as I start the array, 4 and 5 will happen before the array is actually started and the drives start to be in use?

itimpi · May 20, 2023

2 hours ago, CyberMew said:

Oh, so you mean that as I start the array, 4 and 5 will happen before the array is actually started and the drives start to be in use?

This will initiate steps 14 and 15 that are documented in the online documentation for parity swap procedure.

CyberMew · May 20, 2023

Ah I see! Got it now. Thanks a lot for clearing my doubts! Hope everything goes well 🤞

CyberMew · June 1, 2023

I got around to doing this today, and for some reason Parity 1 slot is not showing the new drive, but Parity 2 slot is able to show/select it. I want to use the bigger HDD for Parity 1 instead. Is this not possible?

Current:

[Attached screenshot: CleanShot 2023-06-02 at 06.41.09]

Edit: after moving the drives around the new drive is somehow appearing under Parity 1 to be selectable. Not sure if this was a bug in 6.9.2.

Edit2: After selecting the new drive in the Parity 1 slot, it does not show up at all! Are the wiki instructions outdated?

Edited June 1, 2023 by CyberMew

JorgeB · June 2, 2023

9 hours ago, CyberMew said:

Edit: after moving the drives around the new drive is somehow appearing under Parity 1 to be selectable. Not sure if this was a bug in 6.9.2.

Looks more like a device problem, post new diags.

CyberMew · June 2, 2023

1 hour ago, JorgeB said:

Looks more like a device problem, post new diags.

I have attached them as requested; hope you can advise on next steps. What happened was that Disk5 errored out and I had to preclear the new drive, so I unassigned Disk5 and started the array to run Preclear on the new disk. Now that it's done, the array has been stopped.

tower-diagnostics-20230602-1746.zip

JorgeB · June 2, 2023

Disk itself looks OK but it's generating this error:

Jun  2 06:38:09 Tower kernel: md: import disk0: lock_bdev error: -13

which I don't remember seeing before, try swapping that disk with one using the onboard SATA controller to rule out some compatibility issue with the HBA.
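
For reference, the `-13` in that log line can be looked up. The sketch below assumes the number follows the usual Linux kernel convention of returning a negated errno value; it only names the code and does not explain why the md driver hit it on this controller. The helper name `describe_kernel_errno` is made up for the example.

```python
import errno
import os


def describe_kernel_errno(code: int) -> str:
    """Map a (possibly negated) errno value to its symbolic name and message."""
    number = abs(code)  # kernel log lines usually show the negated value
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


# -13 is the value from "lock_bdev error: -13"
print(describe_kernel_errno(-13))
```

On a typical Linux system errno 13 is `EACCES` ("Permission denied"), which fits the disk being visible but not importable into the array.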

CyberMew · June 4, 2023

On 6/2/2023 at 6:52 PM, JorgeB said:
Disk itself looks OK but it's generating this error:
Jun  2 06:38:09 Tower kernel: md: import disk0: lock_bdev error: -13
which I don't remember seeing before, try swapping that disk with one using the onboard SATA controller to rule out some compatibility issue with the HBA.

Thanks! I switched it and it seems to be working as expected. The copy ran without errors over the past day and disk5 is rebuilding now.

Edit: is the add on card problematic? Should I change it? Any brands or model to recommend?

Edited June 4, 2023 by CyberMew

CyberMew · June 4, 2023

Also, just to check, is it normal that Docker and VMs are turned off while doing this parity swap and data rebuild?

Docker: Docker Service failed to start.

VMs: Libvirt Service failed to start.

Anything we should know about other services being turned off as well?

itimpi · June 4, 2023

1 minute ago, CyberMew said:

Also, just to check, is it normal that Docker and VMs are turned off while doing this parity swap and data rebuild?

You do not normally need to touch these services.

CyberMew · June 4, 2023

2 minutes ago, itimpi said:

You do not normally need to touch these services.

Thanks for the info, but are they supposed to be temporarily turned off? I didn't see this in the guide, so I am wondering if this is normal or not. Worried if it isn't..

itimpi · June 4, 2023

5 minutes ago, CyberMew said:

Thanks for the info, but are they supposed to be temporarily turned off? I didn't see this in the guide, so I am wondering if this is normal or not. Worried if it isn't..

I have never turned the services off.

CyberMew · June 4, 2023

1 minute ago, itimpi said:

I have never turned the services off.

I am confused by your response because I didn't turn them off. These are my settings:

So any idea why they are stopped? Is it because of the parity swap/data rebuild going on?

JorgeB · June 4, 2023

3 hours ago, CyberMew said:

Docker: Docker Service failed to start.

VMs: Libvirt Service failed to start.

Post new diags.

CyberMew · June 5, 2023

21 hours ago, JorgeB said:

Post new diags.

Have attached as requested (it was after data rebuild). Data rebuild was completed 2 hours ago and I still don’t see Docker and VM up.. so something must be wrong.. thanks a lot for helping to look into it!

tower-diagnostics-20230605-1612.zip

JorgeB · June 5, 2023

I don't see any attempt to start the docker service. Try disabling the service, apply, re-enable, apply, and post new diags.

P.S. you should run an extended SMART test on disk1, appears to be failing.

itimpi · June 5, 2023

You are getting the following error repeating in the syslog:

Jun  4 10:48:01 Tower crond[1974]: failed parsing crontab for user root: #015

You should perhaps post the output of

cat /etc/crond.d/root

if it is not obvious what is causing that.
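
As a hint for reading that log line: `#015` looks like an escaped control character. The sketch below assumes the syslog daemon writes control characters as `#` followed by three octal digits (rsyslog's default behaviour, as far as I know); the function name `decode_syslog_escapes` is invented for the example.

```python
import re


def decode_syslog_escapes(text: str) -> str:
    """Turn '#ooo' octal escapes back into the characters they stand for."""
    return re.sub(r"#([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text)


message = "failed parsing crontab for user root: #015"
decoded = decode_syslog_escapes(message)

# Octal 015 is decimal 13, i.e. a carriage return ("\r").
print(repr(decoded[-1]), ord(decoded[-1]))
```

Under that assumption the line crond could not parse consists of a carriage return, which usually points at a cron entry saved with Windows-style (CRLF) line endings.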

You also have the following:

Jun  5 02:33:34 Tower kernel: sd 10:0:3:0: [sdi] tag#1210 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=6s
Jun  5 02:33:34 Tower kernel: sd 10:0:3:0: [sdi] tag#1210 Sense Key : 0x3 [current] [descriptor] 
Jun  5 02:33:34 Tower kernel: sd 10:0:3:0: [sdi] tag#1210 ASC=0x11 ASCQ=0x0 
Jun  5 02:33:34 Tower kernel: sd 10:0:3:0: [sdi] tag#1210 CDB: opcode=0x88 88 00 00 00 00 02 8f 36 16 20 00 00 04 00 00 00
Jun  5 02:33:34 Tower kernel: blk_update_request: critical medium error, dev sdi, sector 10992621088 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0
Jun  5 02:33:34 Tower kernel: md: disk1 read error, sector=10992621024
Jun  5 02:33:34 Tower kernel: md: disk1 read error, sector=10992621032
Jun  5 02:33:34 Tower kernel: md: disk1 read error, sector=10992621040

which looks like it may be a genuine disk issue so you should consider running an extended SMART test on it as a check.
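
The kernel lines above can be decoded by hand to see that they all describe the same failed read. The sketch below parses the CDB from the log as a SCSI READ(16) command (opcode 0x88) and looks up the sense data in a tiny table containing only the two standard codes that appear here. The function and table names are made up for the example, and reading the 64-sector difference to the md sector number as the partition start offset is an assumption.

```python
# Only the codes that appear in the log above; not a complete SCSI table.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ADDITIONAL_SENSE = {(0x11, 0x00): "UNRECOVERED READ ERROR"}


def decode_read16(cdb_hex: str):
    """Return (starting LBA, number of blocks) from a READ(16) CDB hex dump."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) command")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


lba, blocks = decode_read16("88 00 00 00 00 02 8f 36 16 20 00 00 04 00 00 00")
print("start LBA:", lba)   # same number as the blk_update_request line
print("blocks:", blocks)
print("sense:", SENSE_KEYS[0x3], "/", ADDITIONAL_SENSE[(0x11, 0x00)])
print("offset to first md sector:", lba - 10992621024)
```

The decoded LBA matches the sector in the `blk_update_request` line, so the drive itself reported an unrecovered read error at that location, which is why an extended SMART test on disk1 is the sensible next check.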

There are also a lot of FCP warnings that it might be a good idea to consider tidying up.

CyberMew · June 5, 2023

5 hours ago, JorgeB said:

I don't see any attempt to start the docker service. Try disabling the service, apply, re-enable, apply, and post new diags.

P.S. you should run an extended SMART test on disk1, appears to be failing.

I googled and found a similar issue

I restarted and that seemed to do the trick! Attached new diags just in case.

tower-diagnostics-20230605-2330.zip

I am also running the extended SMART test now. Thanks a lot!

5 hours ago, itimpi said:
You are getting the following error repeating in the syslog:
Jun  4 10:48:01 Tower crond[1974]: failed parsing crontab for user root: #015
You should perhaps post the output of
cat /etc/crond.d/root
if it is not obvious what is causing that.

You also have the following:
Jun  5 02:33:34 Tower kernel: sd 10:0:3:0: [sdi] tag#1210 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=6s
Jun  5 02:33:34 Tower kernel: sd 10:0:3:0: [sdi] tag#1210 Sense Key : 0x3 [current] [descriptor] 
Jun  5 02:33:34 Tower kernel: sd 10:0:3:0: [sdi] tag#1210 ASC=0x11 ASCQ=0x0 
Jun  5 02:33:34 Tower kernel: sd 10:0:3:0: [sdi] tag#1210 CDB: opcode=0x88 88 00 00 00 00 02 8f 36 16 20 00 00 04 00 00 00
Jun  5 02:33:34 Tower kernel: blk_update_request: critical medium error, dev sdi, sector 10992621088 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0
Jun  5 02:33:34 Tower kernel: md: disk1 read error, sector=10992621024
Jun  5 02:33:34 Tower kernel: md: disk1 read error, sector=10992621032
Jun  5 02:33:34 Tower kernel: md: disk1 read error, sector=10992621040
which looks like it may be a genuine disk issue so you should consider running an extended SMART test on it as a check.

There are also a lot of FCP warnings that it might be a good idea to consider tidying up.

Thanks for the suggestions! Yes I should definitely get around to FCP..

The command "cat /etc/crond.d/root" is not working but I think you meant this:

~# cat /etc/cron.d/root
# Generated docker monitoring schedule:
10 0 * * 1 /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/dockerupdate.php check &> /dev/null

# Generated system monitoring schedule:
*/1 * * * * /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null

# Generated mover schedule:
40 4 * * * /usr/local/sbin/mover &> /dev/null

# Generated parity check schedule:
0 0 1 * * /usr/local/sbin/mdcmd check &> /dev/null

# Generated plugins version check schedule:
10 0 * * * /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugincheck &> /dev/null

#Refresh plex tv shows at 3pm everyday 0 15
36 20 * * * logger -tscriptRefreshPlexLibrary5[$$] "Refreshing Plex library 5" && curl "http://127.0.0.1:32400/library/sections/5/refresh?deep=0&X-Plex-Token=xxx" &> /dev/null <- any idea how to remove this?

# Generated array status check schedule:
20 0 * * * /usr/local/emhttp/plugins/dynamix/scripts/statuscheck &> /dev/null

#Running subliminal At minute 0 past every 4th hour. cat /etc/cron.d/root
0 */4 * * * logger -tscriptSubliminal[$$] "Subliminal checking for subs" && /mnt/user/subliminal2/subliminal/checklast2dayspath.sh > /dev/null <- any idea how to remove this as well?

# Generated cron settings for docker autoupdates
0 0 * * 0 /usr/local/emhttp/plugins/ca.update.applications/scripts/updateDocker.php >/dev/null 2>&1

# Generated cron settings for plugin autoupdates
0 0 * * * /usr/local/emhttp/plugins/ca.update.applications/scripts/updateApplications.php >/dev/null 2>&1

# CRON for CA background scanning of applications
34 * * * * php /usr/local/emhttp/plugins/community.applications/scripts/notices.php > /dev/null 2>&1

# Generated ssd trim schedule:
0 3 * * * /sbin/fstrim -a -v | logger &> /dev/null

Not sure what #015 is..
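
One thing that is easy to check is whether the cron file contains stray carriage returns, since they do not show up in `cat` output. A minimal sketch, assuming the file is read as raw bytes; the helper name `lines_with_carriage_return` and the sample data are made up for the example.

```python
def lines_with_carriage_return(data: bytes) -> list:
    """Return 1-based numbers of the lines that contain a carriage return."""
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if b"\r" in line
    ]


# Hypothetical sample: line 2 ends in CRLF and line 3 is a lone CR.
sample = b"# Generated ssd trim schedule:\n0 3 * * * /sbin/fstrim -a -v\r\n\r\n"
print(lines_with_carriage_return(sample))

# Against the real file (path taken from the post above):
# with open("/etc/cron.d/root", "rb") as handle:
#     print(lines_with_carriage_return(handle.read()))
```

Reading in binary mode matters here: opening the file in text mode would silently convert CRLF line endings and hide the characters being looked for.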

JorgeB · June 5, 2023

47 minutes ago, CyberMew said:

I restarted and that seemed to do the trick!

So it's solved?
