I thought I had it running. Now I am so lost I cannot get the array to start

September 21, 2024

I am an old guy (78) who has loved PCs for a long time. For example, I had a $1900 Apple IIe in the mid-80s. My dream was to get a NAS server running with a Plex/Usenet ecosystem of dockers to transfer, download, and run my TV_shows and Movies. I bought a pro UNRAID license and built a modest array with some 2TB drives and a 3TB parity using instructions mostly from Spaceinvader. Everything was working great for weeks. Lots of shares were viewable by Plex and Windows.

Then I got the bright idea of putting in a refurbished HGST 10TB parity drive as the first step in increasing the capacity of the array. I now understand that there is an issue with replacing a parity drive with a drive significantly larger than the array's drives, but I do not know how to address it. A suggestion from this forum was to run a disk check, which I did, but it did not work.

Also, I could not get the HGST drive to be seen or start up without a special sata power cable that they supply. Plus I needed to use an available HBA SAS/sata cable to get it to work because the mobo sata cable would not.
That is when the problems started. After replacing the 3TB drive with the 10TB and rechecking the cache, I started getting errors in the array. I tried using info from this forum and Reddit to fix them for several days, but I went down a rabbit hole. I kept checking parity, and after the last reboot it told me: Too many wrong and/or missing disks! One disk that UNRAID removed from the array after the reboot showed up as an unassigned drive, but UNRAID will not allow me to put it back into the array. Before this, I lost all my exportable shares, but I could still run about 60% of my content remotely via Plex. This was even with 3 disks showing UNMOUNTABLE. Now, with a dead array, I cannot do anything.

The diag file is below.

I would be so happy and grateful to get my array and exportable shares back without error and to retain at least some of my media files. I can replace missing content if I have a clean array that is working with my shares.

I cannot tell you how important this server is to me personally but I know I am out of my depth in trying to fix it myself.

I am quite new to UNRAID so please be very specific in telling me what I might try.

Even though I have hundreds of hours invested, I will consider starting from scratch but before I go down that road, I want to know what happened and how I can prevent it in the future.

Many thanks to anyone reading this who might be able to help.

Warm regards,

Mike

plexnas-diagnostics-20240921-1332.zip

September 21, 2024

Author

UPDATE: A few minutes after I posted the above, I looked at the SMART info on the drive that Unraid kicked out of the array. After it kicked it out, it put it in unassigned drives and would not allow it to be assigned back into the array. When I checked the SMART download, it was for an entirely different drive, a 120GB SSD that I had put into the server along with 3 others as unassigned. I then took ALL of them out and rebooted.

The server and array came up! The array came up with errors, but it came up. See screenshot. Disk 3 is disabled with contents emulated. Three of the disks, including disk 3, say they are unmountable: unsupported or no file system. All the shares came up with a warning that array shares were unprotected. Also, there is a notice that there are no exportable disk shares. Plex is working again with about 60 percent of the previous content available. I am redoing and reattaching the diagnostics.

As I said above, if I can get the array and shares to show up without negative messages or errors, I will be eternally grateful and will name my next child after you.

plexnas-smart-20240921-1450.zip

September 22, 2024

Community Expert
Solution

There are constant ATA errors spamming the log. Replace the cables for this disk, and post new diags after array start:

Device Model:     WDC WD20EFAX-68B2RN1
Serial Number:    WD-WXW2AB23AURD

September 22, 2024

Community Expert

And make sure that you double check that all SATA data and power connectors are firmly seated after you do what @JorgeB suggested. (The SATA connector design is the poster child for how not to design any connector system!)

September 23, 2024

Author

Many thanks for getting back to me. I switched out the SATA connector with another one that was attached to the mobo. I also have some HBA card SAS/SATA connectors available to try, depending on what the diagnostics tell you. Can you please tell me for my education, what part of the diagnostics log you saw the constant ATA errors? I checked all the SATA connections on the 5 array drives: 4 are SAS/SATA, and one (the one you flagged, WD-WXW2AB23AURD) was connected via mobo SATA. I thought it was OK to mix and match mobo SATA and HBA SAS/SATA in the array. Is that right?

I looked at the tightness of the sata and power connector to the array and all seem to be OK. I checked 4 of the 6 sata connectors to the mobo and they seem OK. To get to the other 2 sata connectors on the mobo I would have to take the large GPU out of the box which I can if you think it is necessary.

I am attaching the diagnostics; the array screenshot remains unchanged from my first post.

Thanks again,

Mike

plexnas-diagnostics-20240923-1141.zip

September 23, 2024

Community Expert

1 minute ago, demanding-chief3698 said:

Can you please tell me for my education, what part of the diagnostics log you saw the constant ATA errors?

In the syslog, they are logged like this:

Sep 21 13:02:39 PlexNAS kernel: ata4.00: status: { DRDY DF ERR }
Sep 21 13:02:39 PlexNAS kernel: ata4.00: error: { ABRT }
Sep 21 13:02:39 PlexNAS kernel: ata4.00: configured for UDMA/133
Sep 21 13:02:39 PlexNAS kernel: ata4: EH complete
Sep 21 13:02:39 PlexNAS kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 21 13:02:39 PlexNAS kernel: ata4.00: irq_stat 0x40000001
Sep 21 13:02:39 PlexNAS kernel: ata4.00: failed command: READ DMA
Sep 21 13:02:39 PlexNAS kernel: ata4.00: cmd c8/00:08:00:b0:0b/00:00:00:00:00/e0 tag 18 dma 4096 in
Sep 21 13:02:39 PlexNAS kernel:         res 61/04:08:00:b0:0b/00:00:00:00:00/e0 Emask 0x1 (device error)
Sep 21 13:02:39 PlexNAS kernel: ata4.00: status: { DRDY DF ERR }
Sep 21 13:02:39 PlexNAS kernel: ata4.00: error: { ABRT }
Sep 21 13:02:40 PlexNAS kernel: ata4.00: configured for UDMA/133
Sep 21 13:02:40 PlexNAS kernel: ata4: EH complete
Sep 21 13:02:40 PlexNAS kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 21 13:02:40 PlexNAS kernel: ata4.00: irq_stat 0x40000001
Sep 21 13:02:40 PlexNAS kernel: ata4.00: failed command: READ DMA
Sep 21 13:02:40 PlexNAS kernel: ata4.00: cmd c8/00:08:00:b0:0b/00:00:00:00:00/e0 tag 31 dma 4096 in
Sep 21 13:02:40 PlexNAS kernel:         res 61/04:08:00:b0:0b/00:00:00:00:00/e0 Emask 0x1 (device error)

No errors so far in the latest diags. Check the filesystem on the 3 unmountable disks, and run it without -n.
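
For anyone who wants to look for these entries in their own syslog, here is a minimal Python sketch that tallies the "failed command" lines per ATA port. It assumes the syslog has already been extracted from the diagnostics zip and read into a string. The function name count_failed_commands and the sample text are hypothetical, made up for illustration, and the pattern has only been matched against the line format quoted above.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "Sep 21 13:02:39 PlexNAS kernel: ata4.00: failed command: READ DMA"
FAILED_CMD = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")


def count_failed_commands(syslog_text):
    """Return a Counter keyed by (ata port, failed command)."""
    counts = Counter()
    for line in syslog_text.splitlines():
        match = FAILED_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


# Hypothetical sample, shortened from the excerpt quoted in this thread.
sample = """\
Sep 21 13:02:39 PlexNAS kernel: ata4.00: failed command: READ DMA
Sep 21 13:02:39 PlexNAS kernel: ata4.00: error: { ABRT }
Sep 21 13:02:40 PlexNAS kernel: ata4.00: failed command: READ DMA
"""

if __name__ == "__main__":
    for (port, command), n in count_failed_commands(sample).most_common():
        print(f"{port}: {n} x {command}")
```

A port that dominates the tally is the one worth tracing back to a physical disk and cable. The sketch does not do that mapping itself.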

September 23, 2024

Author

I ran the check filesystem for the 3 drives. The results are below. Looks to me like Disk 4 is OK and 2 and 3 need attention.

I do not mind losing content, I can replace it, as long as the array is running without error and the shares are protected and exportable.

I think I understand the instructions but I would very much like your advice on exactly what I should do next. At this point I do not understand how to mount an "unmountable" disk.

Again, many thanks for your guidance, it is most appreciated

Disk 2:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Disk 3:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
Log inconsistent (didn't find previous header)
empty log check failed
zero_log: cannot find log head/tail (xlog_find_tail=5)
ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
filesystem to replay the log or use the -L option to destroy the log and
attempt a repair.

Disk 4:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
totally zeroed log
- scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (4:35330) is ahead of log (0:0).
Format log to cycle 7.
done
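
The three results above fall into two patterns: the run stops at the log and mentions the -L option (disks 2 and 3), or it runs through to a final "done" line (disk 4). Here is a rough Python sketch of that reading, assuming the output has been saved as plain text. The function name classify_xfs_repair_output and its three labels are hypothetical, and the checks are based only on the wording pasted in this post, not on any documented xfs_repair output contract.

```python
def classify_xfs_repair_output(text):
    """Classify pasted xfs_repair output as 'log-blocked', 'completed', or 'incomplete'."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Both refused runs in this thread point the user at the -L option.
    if any("-L option" in line for line in lines):
        return "log-blocked"
    # The successful Disk 4 run ends with a bare "done" line.
    if lines and lines[-1] == "done":
        return "completed"
    # Anything else (for example, output that stops mid-phase) is left undecided.
    return "incomplete"


# Shortened copies of the Disk 2 and Disk 4 output from this post.
DISK2_TAIL = """\
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
"""

DISK4_TAIL = """\
Phase 7 - verify and correct link counts...
Maximum metadata LSN (4:35330) is ahead of log (0:0).
Format log to cycle 7.
done
"""

if __name__ == "__main__":
    print("Disk 2:", classify_xfs_repair_output(DISK2_TAIL))
    print("Disk 4:", classify_xfs_repair_output(DISK4_TAIL))
```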

September 23, 2024

Community Expert

Use -L for disks 2 and 3

September 23, 2024

Author

Thank you JorgeB. Sorry to be so thick but should I apply -L to both at the same time or one at a time? If one at a time, should 3 go first since it is disabled?

September 23, 2024

Community Expert

Run it on one disk at a time; running both at once can be slower. The order doesn't matter.

September 23, 2024

Author

Dear JorgeB, I did the following:

stopped the array and restarted in maintenance

ran disk check on disk 2 -L

stopped array

started array

Disk 2 came back!

stopped the array and restarted in maintenance

ran disk check on disk 3 -L

That produced thousands of lines of messages

stopped array

started array

Disk 3 is still disabled and shows up as "Unmountable. Unsupported or no file system".

under Array Operation disk 3 is listed as Unmountable and I am offered the option of formatting it with the note: Format will create a file system in all Unmountable disks.

Is reformatting my next move, or is there something else I should be doing, like rerunning disk check without -n or -L?

The latest diag attached.

plexnas-diagnostics-20240923-1451.zip

September 24, 2024

Community Expert

Don't format; that will delete all data from that disk. Run xfs_repair again and post the full output you get.

September 24, 2024

Author

Dear JorgeB:

I ran xfs_repair on disk 3 with -n, producing thousands of lines of text. It took about 5 minutes to highlight the text for copying. A 1.1 MB file with this output is attached.

Again, many thanks for all your help.

Regards,

Mike

disk repair text.docx

September 24, 2024

Community Expert

With -n it doesn't do anything. Run it again without -n, and attach a txt file instead, please.

September 24, 2024

Author

Reran the repair without -n. The resulting 1.2 MB text file is attached.

Restarted array and drive 3 is still disabled/emulated.

Text file xfs-repair status .txt

September 24, 2024

Community Expert

The end is cut-off, did it finish the repair?

September 24, 2024

Author

Sorry about that. Not sure what happened. I converted it from MS Word to text and it seemed to lose something. Attached is a second attempt at making a txt file plus the original MS Word file. Hope it comes through.

disk repair complete.txt

disk repair doc.docx

September 24, 2024

Community Expert

That was with -n

September 24, 2024

Author

Sorry about that. I will go back and do it again so you can see it tomorrow.

September 24, 2024

Author

I went back and ran it again 3 times without -n just to be sure. The file I sent you earlier is the correct output of this process. It does look like it was cut off, but I do not know what else to do to make it run to (or show) completion. The last lines in this output are:

Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...

Text file xfs-repair status .txt
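
For a log that is too long to read by eye, one way to see how far a run got is to pull out the last "Phase N" line. Below is a minimal Python sketch, assuming the saved output is plain text. The function name last_phase_reached is hypothetical, and the pattern is based only on the phase headings that appear in this thread. For comparison, the Disk 4 run earlier in the thread went on to Phase 7 and a final "done".

```python
import re

# Matches headings such as "Phase 6 - check inode connectivity..."
PHASE_LINE = re.compile(r"^Phase (\d+) - (.+?)\.*$")


def last_phase_reached(text):
    """Return (number, description) of the last 'Phase N - ...' line, or None."""
    last = None
    for line in text.splitlines():
        match = PHASE_LINE.match(line.strip())
        if match:
            last = (int(match.group(1)), match.group(2))
    return last


# The last lines of the output quoted in this post.
TRUNCATED_TAIL = """\
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
"""

if __name__ == "__main__":
    print(last_phase_reached(TRUNCATED_TAIL))
```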

September 24, 2024

Author

JorgeB,

I found the following advice online as one way to get a disabled/emulated drive going again. See below:

"Stop the array, unassign the disk, start the array in maintenance mode, stop it again, and reassign the drive to the same slot. The idea is to start the array temporarily with the drive “missing” so it changes from “disabled” to “emulated” status, then to stop it and “replace” the drive to get it back to “active” status."

My array shares are back and seem to be complete. There is a note that none of the shares are exportable. I assume because of the disabled drive.

Do you think the above approach is advisable?

Thank you for all your help. My array has come a long way back toward normal.

September 25, 2024

Community Expert

10 hours ago, demanding-chief3698 said:

I found the following advice online as one way to get a disabled/emulated drive going again. See below:

You shouldn't have done that with an unmountable disk, post current diags.

September 25, 2024

Author

I did not do it. I was asking if I could. The disk is still disabled and indicates it is unmountable. The last xfs status output is a few posts back. I do not think it finished. Current diags below:

plexnas-diagnostics-20240925-0810.zip

September 25, 2024

Community Expert

1 hour ago, demanding-chief3698 said:

I did not do it.

Good, I misunderstood. If xfs_repair cannot repair the emulated disk, unassign disk3 and see if the actual disk mounts with the UD plugin.

September 25, 2024

Author

JorgeB, many thanks for hanging in there with me.

Please forgive me for being thick; I do not want to make any more stupid mistakes.

From what you said I plan to do the following:

stop the array
unassign disk 3 by clicking "no device"
go to actions in the unassigned disk plugin and try to mount it if it shows up on unassigned drives

Is the above sequence of actions correct?

Again, many thanks, Mike

Solved by JorgeB