SOLVED: Rebooted While Contents Emulated. Now Drives Assignments are Lost.


lostwebb
Go to solution Solved by JorgeB,


Hi there, thanks in advance for the help.

 

I had a disk in an errored (disabled) state for a few weeks. Its contents were emulated and I had been using the array. I had to reboot the server after changing some network settings, and when it booted back up it had reset the drive assignments. Now any disk assignments I make show up as a new config, and the only option I have is "parity is valid". I don't remember which disk was disabled, and the logs folder is empty.

 

Is there anything I can do at this point to get the array back to its contents emulated state?

 

I know it's a bit of a mess, but I appreciate the help.

 

Thank you :)

Edited by lostwebb
Added solved tag.
Link to comment
14 minutes ago, JorgeB said:

There's a way, do you know all the disk assignments, single or dual parity?

 

Yes, I know the disk assignments, but I don't recall which disk was disabled. I've looked through all the logs I could find and can't find any sign of which one it was. It is single parity.

 

Thanks Jorge!

Link to comment
12 hours ago, JorgeB said:

That's pretty important, post current diagnostics, if a disk is clearly failing it should be visible on SMART.

I have just realised the answer is in the DISK_ASSIGNMENTS.txt file that I have User Scripts create once a week: I believe it was disk 12 that was disabled. Not sure if you can corroborate that with the diags? I believe it to be accurate, as I remember it was a higher-numbered disk that was disabled, and the User Scripts disk list was created 5 days ago.
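For anyone who wants a similar periodic record, here is a minimal sketch in Python. It is not the User Scripts job mentioned above. It assumes a Linux host where disks appear as symlinks under /dev/disk/by-id, and the function names and output layout are made up for illustration. It only records which device each disk identifier points to; it does not know which array slot a disk is assigned to or whether a disk is disabled, so that would still need to be noted separately.

```python
import os
from datetime import datetime, timezone


def snapshot_disk_ids(by_id_dir="/dev/disk/by-id"):
    """Return sorted (id_name, device_name) pairs, skipping partition links.

    by_id_dir is an assumption: the usual udev location on Linux.
    """
    entries = []
    for name in sorted(os.listdir(by_id_dir)):
        if "-part" in name:  # skip entries such as "...-part1"
            continue
        link = os.path.join(by_id_dir, name)
        device = os.path.basename(os.path.realpath(link))
        entries.append((name, device))
    return entries


def write_snapshot(entries, out_path):
    """Write one 'device<TAB>id' line per disk, with a timestamp header."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(out_path, "w") as fh:
        fh.write(f"# snapshot taken {stamp}\n")
        for name, device in entries:
            fh.write(f"{device}\t{name}\n")
```

Calling `write_snapshot(snapshot_disk_ids(), "/some/path/disk_ids.txt")` from a weekly job would keep a dated copy; the output path here is a placeholder, and storing it somewhere that survives a reboot is the point.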

 

So it looks like I need to get Unraid to believe that disk 12 is disabled again and emulate its contents from parity. I will buy a replacement disk and a second parity right away to stop this from happening again.

 

Thank you for your help.

Screenshot 2021-09-18 at 22.11.34.png

tower-diagnostics-20210918-2212.zip

Link to comment

According to that, yes, it was disk12. The disk itself looks OK, just a lot of CRC errors, which suggests a cable problem. If you didn't write anything to the emulated disk once disk12 got disabled, you could just re-sync parity (after replacing the SATA cables on disk12), but if you want to go back to how it was you can do this:

 

-Since you've already re-assigned all the disks, just make sure all assignments are correct.
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on parity disk(s) will be overwritten. This is normal, as the warning doesn't account for the checkbox, and parity won't be overwritten as long as it's checked)
-Stop array
-Unassign disk12
-Start array (in normal mode now). Ideally the emulated disk will now mount and its contents will look correct; if it doesn't mount, you should run a filesystem check on the emulated disk
-If the emulated disk mounts and contents look correct you can then rebuild to a new disk or the old one.
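
For anyone wanting to check the CRC error count mentioned above outside the diagnostics zip, here is a minimal Python sketch. It assumes you have the text output of `smartctl -A /dev/sdX` for an ATA disk, where attribute 199 (UDMA_CRC_Error_Count) is the usual place interface CRC errors are counted. The function name is hypothetical, and the sample text is made up for illustration; it is not taken from the diagnostics posted in this thread.

```python
def crc_error_count(smart_attr_text):
    """Return the raw UDMA_CRC_Error_Count value, or None if not present.

    Expects the attribute table printed by `smartctl -A`, whose columns are:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smart_attr_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])  # first token of the raw value column
    return None


# Illustrative sample only (invented values, not from this thread).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       29500
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""

print(crc_error_count(SAMPLE))  # prints 12
```

A count that keeps rising after reseating or replacing the cable would point away from the cable; a count that stays flat afterwards is consistent with the cable having been the problem.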
 

 

 

Link to comment
Thank you very much, it has worked perfectly! I did exactly what you said and it came back first time.

 

Thank you!!!

Link to comment
  • lostwebb changed the title to SOLVED: Rebooted While Contents Emulated. Now Drives Assignments are Lost.
  • 3 months later...
On 9/19/2021 at 8:48 AM, JorgeB said:

According to that, yes, it was disk12. The disk itself looks OK, just a lot of CRC errors, which suggests a cable problem. If you didn't write anything to the emulated disk once disk12 got disabled, you could just re-sync parity (after replacing the SATA cables on disk12), but if you want to go back to how it was you can do this:

 

-Since you've already re-assigned all the disks, just make sure all assignments are correct.
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on parity disk(s) will be overwritten. This is normal, as the warning doesn't account for the checkbox, and parity won't be overwritten as long as it's checked)
-Stop array
-Unassign disk12
-Start array (in normal mode now). Ideally the emulated disk will now mount and its contents will look correct; if it doesn't mount, you should run a filesystem check on the emulated disk
-If the emulated disk mounts and contents look correct you can then rebuild to a new disk or the old one.
 

 

 

 

Hi again

 

I have just had the same thing happen again. I had to wait for a disk to arrive, and in the meantime I wrote contents to the emulated disk. Since the original issue I have added a second parity. I rebooted, lost the disk assignments, assigned all the disks back (including the new disk to the missing slot), started in "maintenance mode" with "parity is valid", stopped the array, unassigned the disk, and started in normal mode. I am now getting an unmountable error, and I am unable to start a disk check in maintenance mode. Does the same fix work with dual parity, or is there some other issue?

 

Thanks again for the help.

tower-diagnostics-20211230-2349.zip

Edited by lostwebb
Added diags
Link to comment

There are read errors in multiple disks:

 

Dec 31 11:41:40 Tower kernel: md: disk1 read error, sector=5860527944
..
Dec 31 11:41:40 Tower kernel: md: disk6 read error, sector=5860527944
..
Dec 31 11:41:40 Tower kernel: md: disk2 read error, sector=3911257848

 

Need to fix that first, looks like a connection/power problem.
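
To see at a glance how many disks are affected, the read-error lines can be tallied per disk. This is a minimal Python sketch that assumes the syslog lines have the same shape as the three quoted above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as: "... kernel: md: disk1 read error, sector=5860527944"
READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def read_errors_by_disk(lines):
    """Count md read-error lines per disk name."""
    counts = Counter()
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


log_lines = [
    "Dec 31 11:41:40 Tower kernel: md: disk1 read error, sector=5860527944",
    "Dec 31 11:41:40 Tower kernel: md: disk6 read error, sector=5860527944",
    "Dec 31 11:41:40 Tower kernel: md: disk2 read error, sector=3911257848",
]

for disk, count in sorted(read_errors_by_disk(log_lines).items()):
    print(disk, count)
```

Errors spread across several disks at the same timestamp, as here, fit a shared cause such as a connection or power problem better than several disks failing at once.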
 

Link to comment
Thanks

Screenshot 2021-12-31 at 17.39.17.png

tower-diagnostics-20211231-1739.zip

Link to comment
  • Solution
Dec 31 17:38:45 Tower kernel: attempt to access beyond end of device
Dec 31 17:38:45 Tower kernel: md12: rw=4096, want=7814037064, limit=7814034952
Dec 31 17:38:45 Tower kernel: XFS (md12): last sector read failed

 

This suggests you used a slightly smaller 4TB disk in this position, maybe this one:

 

Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68WT0N0
Serial Number:    WD-WCC4E1826444
User Capacity:    4,000,785,948,160 bytes [4.00 TB]

 

Notice that it's slightly smaller than the other ones:

 

User Capacity:    4,000,787,030,016 bytes [4.00 TB]

 

Maybe a shucked disk?
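
As a quick sanity check on the numbers above, here is a small Python sketch of the arithmetic. It assumes the `want` and `limit` values in the kernel message are counted in 512-byte sectors; the function name is hypothetical.

```python
import re

SECTOR_BYTES = 512  # assumption: want/limit are in 512-byte sectors


def shortfall_sectors(kernel_line):
    """Return want - limit from an 'access beyond end of device' line."""
    match = re.search(r"want=(\d+), limit=(\d+)", kernel_line)
    if match is None:
        return None
    want, limit = int(match.group(1)), int(match.group(2))
    return want - limit


line = "Dec 31 17:38:45 Tower kernel: md12: rw=4096, want=7814037064, limit=7814034952"
missing = shortfall_sectors(line)

# User Capacity values from the two SMART reports quoted above.
capacity_gap = 4_000_787_030_016 - 4_000_785_948_160

print(missing)                  # 2112 sectors
print(missing * SECTOR_BYTES)   # 1081344 bytes
print(capacity_gap)             # 1081856 bytes
```

The filesystem wanted to read about 1.08 MB past the end of the device, and the two disks differ in capacity by almost exactly that amount (the two figures are 512 bytes apart), which is consistent with the smaller disk having been used in that slot.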

Link to comment
Ahh OK, yes, I just bought it off eBay, so it's probably shucked. I have a new one coming next week; I'll replace it with that.

 

How can I get the array back up and running in the meantime? Do I need to add the failing disk back just to assign it and then unassign it?

 

Thanks and happy new year

 

I did the process with the failed disk and it has come back online fine. I have a new disk coming so I will replace the failed one with that and use the shucked one as a new drive in my backup Unraid. 

 

Thank you so much for the help as usual.

Edited by lostwebb
Resolved
Link to comment
