Data drive swap

surface · January 24

Hello,

I am upgrading all the drives in one of my unRAID servers. I'm running 6.12.6 on both of my servers.

First, I swapped out the parity drive, going from a 4TB SATA drive to a 10TB SAS drive, following this documentation. After about 20 hours, everything was fine and showed green.

Next, I swapped out Disk1, which was also a 4TB SATA drive, for another 10TB SAS drive following this documentation. However, step 9 in the documentation says "Put a check in the Yes, I'm sure checkbox (next to the information indicating the drive will be rebuilt), and click the Start button", but the checkbox was not there, even though I did see it when swapping the parity drive. So I clicked the Start button anyway. Again, about 20 hours later, it was done. But then I had a red X by the drive, and it said "Device contents emulated".

So I stopped the array, then started it back up. It still showed the same thing. Then I stopped the array again and restarted the server. The server came back up, but the drive still said disabled and contents emulated. So I stopped the array again, unassigned the disk, started the array, stopped the array, and assigned the disk back, and now it says it's rebuilding the data. But this time there is no timer showing how long it has been running or how much longer it is estimated to take.

Is there something I'm missing? I thought the steps were pretty straightforward, apart from the "Yes, I'm sure" checkbox not being there.

surface · January 25

I forgot to mention that the disk showed 3.9 TB used and 6.1 TB available after the initial rebuild, and it still shows that now as it's rebuilding again. All drives in the array use the XFS file system, and there are no SMART errors or disk errors on any of the drives.

I'm fairly new to unRAID, so if I've left out any info please let me know and I'll update. I appreciate any help I can get, and the time you spend helping me.

itimpi · January 25

Once a disk gets disabled (red ‘x’) for any reason then rebuilding it is the way you clear this status.

You are likely to get better informed feedback if you attach your system’s diagnostics zip file to your next post in this thread. It is always a good idea when asking questions to supply your diagnostics so we can see details of your system, how you have things configured, and the current syslog.

The syslog in the diagnostics is the RAM copy and only shows what happened since the reboot. It could be worth enabling the syslog server to get a log that survives a reboot so we can see what happened prior to the reboot. The mirror to flash option is the easiest to set up, but if you are worried about excessive wear on the flash drive you can put your server’s address into the Remote Server field.

surface · January 25

Thank you for your reply. I have attached the diagnostics file. I've also turned on the syslog server on both my servers. Thank you for the suggestion!

diagnostics-20240124-2052.zip

trurl · January 25

You are reading old documentation. The current documentation is available from the links at top and bottom of the forum, and from the 'manual' link in lower right corner of your Unraid webUI.

surface · January 25

11 minutes ago, trurl said:

You are reading old documentation. The current documentation is available from the links at top and bottom of the forum, and from the 'manual' link in lower right corner of your Unraid webUI.

I appreciate you pointing that out. I have found the current documentation regarding replacing a disk to increase capacity. My steps were pretty similar:

Parity check was run and was/is valid.
Stopped the array
Unassigned the disk
Started the array
Clicked the red x to forget the disk
Stopped the array
Shutdown (this isn't in the documentation, but did it for good measure. I hope that didn't mess with anything)
Removed old 4 TB drive
Installed new 10 TB drive
Powered on the server
Logged in and stopped the array
Formatted the disk
Assigned the new disk in place of the old one
Started array, and it started rebuilding.

About 20 hours later it was done, but Disk1 (which was just upgraded from 4 TB to 10 TB) shows "device is disabled, contents emulated". The "Size", "Used" and "Free" columns all showed correctly, with 10 TB, 3.9 TB and 6.1 TB respectively. Is this normal behavior? Was the initial 20+ hours the pre-clear, and now the next 20 hours is rebuilding data?

itimpi · January 25

The fact the drive is disabled suggests a write to it failed. You are likely to get better informed feedback if you attach your system’s diagnostics zip file to your next post in this thread. It is always a good idea when asking questions to supply your diagnostics so we can see details of your system, how you have things configured, and the current syslog.

I am a bit worried by the fact you mentioned a format. That is not part of the normal process of replacing a drive. Users have been known to lose the contents of the emulated drive by accidentally formatting it, and then end up rebuilding an empty drive. However, the figures you quote seem about right, so maybe this is not what you did. I am not sure what you used to do the unneeded format, as the rebuild process would wipe away any format anyway.

surface · January 25

9 hours ago, itimpi said:

The fact the drive is disabled suggests a write to it failed. You are likely to get better informed feedback if you attach your system’s diagnostics zip file to your next post in this thread. it is always a good idea when asking questions to supply your diagnostics so we can see details of your system, how you have things configured, and the current syslog.

Post #4 has my diag attached unless you're asking me to post a new one?

itimpi · January 25

9 minutes ago, surface said:

Post #4 has my diag attached unless you're asking me to post a new one?

Is the rebuild finished? I thought you implied it had failed. In the earlier diagnostics I could only see the rebuild starting, not completing or being cancelled.

surface · January 25

No, it isn't finished. I'll post diag when it's done. Looks like about 6 more hours. Thank you for clarification /hi5

trurl · January 25

16 hours ago, surface said:

Parity check was run and was/is valid.

Stopped the array

unassigned the disk

Started the array

clicked the red x to forget the disk

Stopped the array

Shutdown (this isn't in the documentation, but did it for good measure. I hope that didn't mess with anything)

removed old 4 TB drive

installed new 10 TB drive

Powered on the server

Logged in and stopped the array

Formatted the disk

assigned the new disk in place of the old one

Started array, and it started rebuilding.

Basically, all you have to do is assign the new disk to the same slot as the disk it is replacing, and start the array to begin rebuild. All the rest isn't really necessary.

I am concerned that you mention "format" in the middle of all this though. Format is never part of rebuild. It sounds as if you didn't format the disk in the array, so it should be OK, although it is totally pointless to format a disk that is going to have every bit overwritten during rebuild.

Does the rebuilding disk show all of the data you expect?

surface · January 26

Here are a few screenshots, and the diag. It's still showing as "Device is disabled, contents emulated" after the rebuild, but all the folders are there.

rknas02-diagnostics-20240125-1755.zip

surface · January 26

I just noticed 64 errors that I hadn't noticed before. I also just found this in the syslog:

Jan 25 14:35:34 RKNAS02 kernel: critical target error, dev sdj, sector 19532742384 op 0x1:(WRITE) flags 0x0 phys_seg 64 prio class 2
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742320
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742328
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742336
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742344
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742352
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742360
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742368
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742376
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742384
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742392
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742400
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742408
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742416
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742424
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742432
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742440
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742448
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742456
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742464
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742472
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742480
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742488
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742496
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742504
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742512
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742520
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742528
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742536
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742544
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742552
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742560
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742568
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742576
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742584
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742592
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742600
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742608
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742616
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742624
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742632
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742640
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742648
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742656
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742664
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742672
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742680
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742688
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742696
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742704
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742712
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742720
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742728
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742736
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742744
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742752
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742760
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742768
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742776
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742784
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742792
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742800
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742808
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742816
Jan 25 14:35:34 RKNAS02 kernel: md: disk1 write error, sector=19532742824
Jan 25 14:35:34 RKNAS02 kernel: md: recovery thread: exit status: -4

Maybe this drive is bad?
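Log lines like these can be summarised with a few shell helpers. This is only a rough sketch: the function names are made up for illustration and are not Unraid tools, and it assumes the syslog text has been saved to a file and that the kernel reports sectors in 512-byte units.

```shell
#!/bin/bash
# Hypothetical helpers for summarising "md: diskN write error" syslog lines.

# Count the write errors logged for one array slot (for example disk1).
# Usage: count_md_write_errors <syslog-file> <slot>
count_md_write_errors() {
    grep -c "md: $2 write error" "$1"
}

# Print the lowest and highest failing md sector for one slot.
# Usage: md_error_sector_range <syslog-file> <slot>
md_error_sector_range() {
    grep "md: $2 write error" "$1" \
        | sed 's/.*sector=//' \
        | sort -n \
        | awk 'NR==1 {first=$1} {last=$1} END {if (NR) print first, last}'
}

# Convert a sector number to a byte offset, assuming 512-byte sectors.
# Usage: sector_to_bytes <sector>
sector_to_bytes() {
    echo $(( $1 * 512 ))
}
```

Run against the excerpt above, this counts 64 write errors for disk1, spanning sectors 19532742320 to 19532742824, which matches the 64 errors mentioned. The sector in the "dev sdj" line, 19532742384, works out to 10000764100608 bytes, roughly the 10 TB mark, so the failed writes appear to sit near the very end of a 10 TB disk.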

JorgeB · January 26

SMART looks OK, but run a long SMART test on that disk.

surface · January 26

Just as an FYI, this same issue is happening on my nas02 (the one we've been talking about) as well as nas03. I ran a long SMART test on both the 10TB SAS disks that are having issues on nas02 and nas03 about 15 hours ago. They're almost done. Nas03 is at 100% but looks like it's still running. Nas02 is at 98%. Both drives are 10TB Seagate SAS drives. They were purchased used and came from the same seller.
[Attached image: image.png]

Edited January 26 by surface

JorgeB · January 26

I'm not sure the GUI supports running tests on SAS drives. Up until recently it didn't, but since it shows 100% I guess it does now. You could also try running them manually.

surface · January 26

Manually via a plugin? Or is there a different way you recommend? I guess my question really is, how would I manually run a SMART test on the drive?

JorgeB · January 26

smartctl -t long /dev/sdX
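For context, here is a hedged sketch of how that command might be used together with its companions. `-t long`, `-l selftest` and `-X` are standard smartctl options; the wrapper functions and the `check_dev` guard are hypothetical names added purely for illustration, and /dev/sdj is only an example device.

```shell
#!/bin/bash
# Hypothetical wrappers around smartctl; the function names are illustrative.

# Accept only whole-disk names such as /dev/sdj or /dev/sdaa,
# so a partition like /dev/sdj1 is rejected.
check_dev() {
    case "$1" in
        /dev/sd[a-z]|/dev/sd[a-z][a-z]) return 0 ;;
        *) return 1 ;;
    esac
}

# Start the extended (long) self-test. smartctl returns straight away
# and the test keeps running on the drive in the background.
start_long_test() {
    check_dev "$1" || { echo "expected a device like /dev/sdj" >&2; return 2; }
    smartctl -t long "$1"
}

# Show the drive's self-test log to see whether a test completed or failed.
show_selftest_log() {
    check_dev "$1" || { echo "expected a device like /dev/sdj" >&2; return 2; }
    smartctl -l selftest "$1"
}

# Abort a self-test that is still running.
abort_selftest() {
    check_dev "$1" || { echo "expected a device like /dev/sdj" >&2; return 2; }
    smartctl -X "$1"
}

# Example (run as root on the server):
#   start_long_test /dev/sdj
#   show_selftest_log /dev/sdj
```

The guard is only there so a mistyped name fails before smartctl is called. Running the three smartctl commands directly does the same thing.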

surface · January 26

Both say they're at 100% complete, but they're also showing that they're still running. Should I just stop both, then run that command from terminal?

surface · January 26

I appreciate your quick responses, and sorry for my rapid fire questions.

EDIT: I've started the SMART tests via terminal. I'll update in ~16 hours. Thanks for your help!

Also, here are the logs from the SMART tests:

rknas03-smart-20240126-0725.zip rknas02-smart-20240126-1033.zip

Edited January 26 by surface

surface · February 9

Here is what I did to fix this. I'm not saying it will work for everyone, or even that it is correct. I stopped the array, unassigned the disk, started the array, stopped the array again, and assigned the disk. Then I went to Tools, then New Config, and chose to preserve current assignments. Then I started the array again. My disk was then accepted. The weird thing is that I didn't see this step in the documentation, so I'm not sure this is the correct way of accomplishing this, but it worked for me.

Edited February 9 by surface

itimpi · February 9

3 minutes ago, surface said:

What I did to fix this was, and I'm not saying this is going to work for everyone or even if this is correct, but I stopped the array, unassigned the disk, started the array, stopped the array again, assigned the disk. Then I went to tools, and new config, and preserve current assignments. Then started the array again. My disk was then accepted. The weird thing is I didn't see this step in the documentation. So I'm not sure this is the correct way of accomplishing this, but it worked for me

This approach will lose any updates made to the drive since it was disabled, so you can have data loss. It is normally only a last-ditch attempt after everything else has failed.

The correct approach is covered here in the online documentation accessible via the Manual link at the bottom of the Unraid GUI. In addition, every forum page has a DOCS link at the top and a Documentation link at the bottom. The Unraid OS->Manual section covers most aspects of the current Unraid release.

JonathanM · February 9

53 minutes ago, surface said:

it worked for me

You will need to run a correcting parity check. That will take just as long as the rebuild would have, and the rebuild would have included all the writes that happened to that drive slot after the physical disk was disabled.

trurl · February 10

On 1/26/2024 at 12:36 PM, surface said:

I'll update in ~16 hours

If you had, we probably would have told you how to proceed.

surface · February 10

I appreciate all of your (the mods) help. I had 10 more drives to upgrade and I didn't want to potentially do this 10 more times. So after it worked, I opted to take the data loss, changed the 10 drives all at once, then restored all my files from backup. All 35TB of it.

Again, I appreciate all of you, including the time you take and work you do to help the community
