
I thought I had it running. Now I am so lost I cannot get the array to start


Go to solution Solved by JorgeB,


Posted

 

 

I am an old guy (78) who has loved PCs for a long time.  For example, I had a $1900 Apple IIe in the mid-80s.  My dream was to get a NAS server running with a Plex/Usenet ecosystem of dockers to transfer, download, and run my TV_shows and Movies.  I bought a pro UNRAID license and built a modest array with some 2TB drives and a 3TB parity using instructions mostly from Spaceinvader.   Everything was working great for weeks.   Lots of shares were viewable by Plex and Windows. 

Then I got the bright idea of putting in a refurbished HGST 10TB parity drive as the first step in increasing the capacity of the array. I now understand that there is an issue with replacing a parity drive with a drive significantly larger than the array's drives, but I do not know how to address it. A suggestion from this forum was to run a disk check, which I did, but it did not work.

Also, I could not get the HGST drive to be seen or start up without a special sata power cable that they supply.  Plus I needed to use an available HBA SAS/sata cable to get it to work because the mobo sata cable would not. 
That is when the problems started. After replacing the 3TB drive with the 10TB and rechecking the cache, I started getting errors in the array. I tried using info from this forum and Reddit to fix them for several days, but I went down a rabbit hole. I kept checking parity, and after my last reboot it told me: Too many wrong and/or missing disks! One disk that UNRAID removed from the array after the reboot showed up as an unassigned drive, but UNRAID will not allow me to put it back into the array. Before this, I lost all my exportable shares but I could still run about 60% of my content remotely via Plex. This was even with 3 disks showing UNMOUNTABLE. Now with a dead array, I cannot do anything.

The diag file is below.  

I would be so happy and grateful to get my array and exportable shares back without error and to retain at least some of my media files.  I can replace missing content if I have a clean array that is working with my shares. 

I cannot tell you how important this server is to me personally but I know I am out of my depth in trying to fix it myself.

I am quite new to UNRAID so please be very specific in telling me what I might try.

Even though I have hundreds of hours invested,  I will consider starting from scratch but before I go down that road, I want to know what happened and how I can prevent it in the future.

Many thanks to anyone reading this who might be able to help.

Warm regards,

Mike

plexnas-diagnostics-20240921-1332.zip

Posted

UPDATE: A few minutes after I posted above, I looked at the SMART info on the drive that Unraid kicked out of the array. After it kicked it out, it put it in unassigned drives and would not allow it to be assigned back into the array. When I checked the SMART download, it was for an entirely different drive, a 120GB SSD that I put into the server along with 3 others as unassigned. I then took ALL of them out and rebooted.

The server and array came up! The array came up with errors, but it came up. See screenshot. Disk 3 is disabled with contents emulated. Three of the disks, including disk 3, say they are unmountable: unsupported or no file system. All the shares came up with a warning that array shares were unprotected. Also, there is a notice that there are no exportable disk shares. Plex is working again with about 60 percent of the previous content available. I am redoing and reattaching the diagnostics.

As I said above, if I can get the array and shares to show up without negative messages or errors, I will be eternally grateful and will name my next child after you.

 

2024-09-21 (2).png

plexnas-smart-20240921-1450.zip

  • Solution
Posted

There are constant ATA errors spamming the log. Replace the cables for this disk and post new diags after array start:

 

Device Model:     WDC WD20EFAX-68B2RN1
Serial Number:    WD-WXW2AB23AURD

 

Posted

Many thanks for getting back to me.  I switched out the sata connector with another one that was attached to the mobo.   I also have some HBA card SAS/sata connectors available to try depending on what the diagnostics tell you.   Can you please tell me for my education, what part of the diagnostics log you saw the constant ATA errors?  I checked all the sata connections on the array drives. Of the 5 array drives, 4 are SAS/sata and one (the one you flagged, WD-WXW2AB23AURD) was connected via mobo sata.  I thought it was OK to mix and match mobo sata and HBA SAS/sata in the array.  Is that right?

I looked at the tightness of the sata and power connector to the array and all seem to be OK.  I checked 4 of the 6 sata connectors to the mobo and they seem OK.  To get to the other 2 sata connectors on the mobo I would have to take the large GPU out of the box which I can if you think it is necessary.

I am attaching the diagnostics; the array screenshot remains unchanged from my first post.

Thanks again,

Mike

 

plexnas-diagnostics-20240923-1141.zip

Posted
1 minute ago, demanding-chief3698 said:

Can you please tell me for my education, what part of the diagnostics log you saw the constant ATA errors?

In the syslog, they are logged like this:

 

Sep 21 13:02:39 PlexNAS kernel: ata4.00: status: { DRDY DF ERR }
Sep 21 13:02:39 PlexNAS kernel: ata4.00: error: { ABRT }
Sep 21 13:02:39 PlexNAS kernel: ata4.00: configured for UDMA/133
Sep 21 13:02:39 PlexNAS kernel: ata4: EH complete
Sep 21 13:02:39 PlexNAS kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 21 13:02:39 PlexNAS kernel: ata4.00: irq_stat 0x40000001
Sep 21 13:02:39 PlexNAS kernel: ata4.00: failed command: READ DMA
Sep 21 13:02:39 PlexNAS kernel: ata4.00: cmd c8/00:08:00:b0:0b/00:00:00:00:00/e0 tag 18 dma 4096 in
Sep 21 13:02:39 PlexNAS kernel:         res 61/04:08:00:b0:0b/00:00:00:00:00/e0 Emask 0x1 (device error)
Sep 21 13:02:39 PlexNAS kernel: ata4.00: status: { DRDY DF ERR }
Sep 21 13:02:39 PlexNAS kernel: ata4.00: error: { ABRT }
Sep 21 13:02:40 PlexNAS kernel: ata4.00: configured for UDMA/133
Sep 21 13:02:40 PlexNAS kernel: ata4: EH complete
Sep 21 13:02:40 PlexNAS kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 21 13:02:40 PlexNAS kernel: ata4.00: irq_stat 0x40000001
Sep 21 13:02:40 PlexNAS kernel: ata4.00: failed command: READ DMA
Sep 21 13:02:40 PlexNAS kernel: ata4.00: cmd c8/00:08:00:b0:0b/00:00:00:00:00/e0 tag 31 dma 4096 in
Sep 21 13:02:40 PlexNAS kernel:         res 61/04:08:00:b0:0b/00:00:00:00:00/e0 Emask 0x1 (device error)

 

No errors so far in the latest diags. Check the filesystem on the 3 unmountable disks, and run it without -n.
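For anyone wanting to spot this pattern themselves, here is a minimal sketch that counts the "failed command" and "error" lines per ATA port in syslog text. The function name `count_ata_errors` and the sample constant are made up for illustration, and the regular expression only assumes the line layout shown in the excerpt above. Mapping a port such as ata4 back to a physical disk is a separate step that is not covered here.

```python
import re
from collections import Counter

# Matches kernel libata lines shaped like the excerpt in this thread, e.g.
#   "Sep 21 13:02:39 PlexNAS kernel: ata4.00: failed command: READ DMA"
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")


def count_ata_errors(syslog_text: str) -> Counter:
    """Count 'failed command:' and 'error:' lines per ATA port (hypothetical helper)."""
    counts = Counter()
    for line in syslog_text.splitlines():
        match = ATA_LINE.search(line)
        if not match:
            continue
        port, message = match.groups()
        # Only these two message types are treated as error indicators here;
        # status and "EH complete" lines are ignored.
        if message.startswith(("failed command:", "error:")):
            counts[port] += 1
    return counts


# A few lines copied from the syslog excerpt above.
SAMPLE = """\
Sep 21 13:02:39 PlexNAS kernel: ata4.00: status: { DRDY DF ERR }
Sep 21 13:02:39 PlexNAS kernel: ata4.00: error: { ABRT }
Sep 21 13:02:39 PlexNAS kernel: ata4: EH complete
Sep 21 13:02:39 PlexNAS kernel: ata4.00: failed command: READ DMA
"""

if __name__ == "__main__":
    for port, n in count_ata_errors(SAMPLE).most_common():
        print(f"{port}: {n} error lines")
```

A port whose count keeps climbing between checks is the one whose cabling or connection deserves a look first.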

Posted

I ran the check filesystem for the 3 drives.  The results are below.   Looks to me like Disk 4 is OK and 2 and 3 need attention.

 

I do not mind losing content,  I can replace it, as long as the array is running without error and the shares are protected and exportable. 

 

I think I understand the instructions but I would very much like your advice on exactly what I should do next.   At this point I do not understand how to mount an "unmountable" disk. 

 

Again, many thanks for your guidance, it is most appreciated

 

Disk 2:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

Disk 3:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
empty log check failed
zero_log: cannot find log head/tail (xlog_find_tail=5)
ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
filesystem to replay the log or use the -L option to destroy the log and
attempt a repair.

 

Disk 4:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
totally zeroed log
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (4:35330) is ahead of log (0:0).
Format log to cycle 7.
done
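The three results above fall into different cases, and a rough sketch like the one below can sort saved xfs_repair output into them. The function name and the four labels are invented for this illustration, and the matching relies only on phrases that appear in the outputs quoted in this thread, so treat it as a reading aid and not as a decision on what to run next.

```python
def classify_xfs_repair(output: str) -> str:
    """Label xfs_repair output using phrases seen in this thread (hypothetical helper)."""
    text = output.lower()
    if "valuable metadata changes in a log" in text:
        # The Disk 2 case: the log should be replayed by mounting, with -L as the fallback.
        return "LOG_DIRTY"
    if "log head and/or tail cannot be discovered" in text:
        # The Disk 3 case: the log cannot be read at all.
        return "LOG_UNREADABLE"
    if text.rstrip().endswith("done"):
        # The Disk 4 case: the run reached its final "done" line.
        return "COMPLETED"
    # Anything else, including output that stops partway through a phase.
    return "INCOMPLETE"


# Short excerpts copied from the outputs posted above.
DISK2 = "ERROR: The filesystem has valuable metadata changes in a log which needs to\nbe replayed."
DISK3 = "ERROR: The log head and/or tail cannot be discovered. Attempt to mount the\nfilesystem"
DISK4 = "Phase 7 - verify and correct link counts...\nFormat log to cycle 7.\ndone\n"

if __name__ == "__main__":
    for name, sample in (("Disk 2", DISK2), ("Disk 3", DISK3), ("Disk 4", DISK4)):
        print(name, classify_xfs_repair(sample))
```

Output that stops before the final "done" line comes back as INCOMPLETE, which is a cue to post the full text for review.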

 

Posted

Dear JorgeB, I did the following:

 

stopped the array and restarted in maintenance

ran disk check on disk 2 -L

stopped array

started array

Disk 2 came back!

stopped the array and restarted in maintenance

ran disk check on disk 3   -L

That produced thousands of lines of messages

stopped array

started array

Disk 3 is still disabled and shows up as "Unmountable. Unsupported or no file system". 

under Array Operation disk 3  is listed as Unmountable and I am offered the option of formatting it with the note:  Format will create a file system in all Unmountable disks.

 

Is reformatting my next move, or is there something else I should be doing, like rerunning the disk check without -n or -L?

 

The latest diag attached.

plexnas-diagnostics-20240923-1451.zip

Posted

I went back and ran it again 3 times without -n just to be sure.   The file I sent you earlier is the correct output of this process.  It does look like it was cut-off but I do not know what else to do to make it go (or show) completion.  The last lines in this output are:  

Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ..
.

 

Text file xfs-repair status .txt

Posted

JorgeB,

 

I found the following advice online as one way to get a disabled/emulated drive going again.  See below:

 

"Stop the array, unassign the disk, start the array in maintenance mode, stop it again, and reassign the drive to the same slot. The idea is to start the array temporarily with the drive “missing” so it changes from “disabled” to “emulated” status, then to stop it and “replace” the drive to get it back to “active” status."

 

My array shares are back and seem to be complete.  There is a note that none of the shares are exportable.  I assume because of the disabled drive.

Do you think the above approach is advisable?

Thank you for all your help.  My array has come a long way back toward normal.

Posted
10 hours ago, demanding-chief3698 said:

I found the following advice online as one way to get a disabled/emulated drive going again.  See below:

You shouldn't have done that with an unmountable disk. Post current diags.

Posted

JorgeB,  many thanks for hanging in there with me.

Please forgive me for being thick; I do not want to make any more stupid mistakes.

From what you said I plan to do the following:

  • stop the array
  • unassign disk 3 by clicking "no device"
  • go to actions in the unassigned disk plugin and try to mount it if it shows up on unassigned drives

Is the above sequence of actions correct?

Again, many thanks, Mike
