demanding-chief3698 Posted September 21, 2024

I am an old guy (78) who has loved PCs for a long time; for example, I had a $1900 Apple IIe in the mid-80s. My dream was to get a NAS server running with a Plex/Usenet ecosystem of Docker containers to transfer, download, and play my TV shows and movies. I bought a Pro Unraid license and built a modest array with some 2TB drives and a 3TB parity drive, using instructions mostly from Spaceinvader. Everything was working great for weeks, and lots of shares were viewable by Plex and Windows.

Then I got the bright idea of putting in a refurbished HGST 10TB parity drive as the first step in increasing the capacity of the array. I now understand that there is an issue with replacing a parity drive with a drive significantly larger than the array's other drives, but I do not know how to address it. A suggestion from this forum was to run a disk check, which I did, but it did not work. Also, I could not get the HGST drive to be seen or start up without a special SATA power cable that they supply, and I needed to use an available HBA SAS/SATA cable to get it to work because the motherboard SATA cable would not.

That is when the problems started. After replacing the 3TB drive with the 10TB and rechecking the cache, I started getting errors in the array. I tried using info from this forum and Reddit to fix them for several days, but I went down a rabbit hole. I kept checking parity, and on the last reboot it told me: "Too many wrong and/or missing disks!" One disk that Unraid removed from the array after a reboot showed up as an unassigned drive, but Unraid will not allow me to put it back into the array. Before this, I lost all my exportable shares, but I could still play about 60% of my content remotely via Plex, even with 3 disks showing UNMOUNTABLE. Now, with a dead array, I cannot do anything. The diag file is below.
I would be so happy and grateful to get my array and exportable shares back without errors and to retain at least some of my media files. I can replace missing content as long as I have a clean, working array with my shares. I cannot tell you how important this server is to me personally, but I know I am out of my depth in trying to fix it myself. I am quite new to Unraid, so please be very specific in telling me what I might try. Even though I have hundreds of hours invested, I will consider starting from scratch, but before I go down that road, I want to know what happened and how I can prevent it in the future. Many thanks to anyone reading this who might be able to help. Warm regards, Mike

plexnas-diagnostics-20240921-1332.zip
demanding-chief3698 Posted September 21, 2024 (Author)

UPDATE, a few minutes after I posted the above: I looked at the SMART info on the drive that Unraid kicked out of the array. After kicking it out, Unraid put it under unassigned devices and would not allow it to be assigned back into the array. When I checked the SMART report, it was for an entirely different drive: a 120 GB SSD that I had put into the server, along with 3 others, as unassigned devices. I then took ALL of them out and rebooted. The server and array came up! The array came up with errors, but it came up. See screenshot. Disk 3 is disabled with contents emulated. Three of the disks, including disk 3, say they are unmountable with an unsupported or no file system. All the shares came up with a warning that array shares were unprotected, and there is also a notice that there are no exportable disk shares. Plex is working again with about 60 percent of the previous content available. I am redoing and reattaching the diagnostics. As I said above, if I can get the array and shares to show up without negative messages or errors, I will be eternally grateful and will name my next child after you.

plexnas-smart-20240921-1450.zip
JorgeB Posted September 22, 2024 (Solution)

There are constant ATA errors spamming the log. Replace the cables for this disk, and post new diags after array start:

Device Model: WDC WD20EFAX-68B2RN1
Serial Number: WD-WXW2AB23AURD
Frank1940 Posted September 22, 2024

And make sure to double-check that all SATA data and power connectors are firmly seated after you do what @JorgeB suggested. (The SATA connector design is the poster child for how not to design a connector system!)
demanding-chief3698 Posted September 23, 2024 (Author)

Many thanks for getting back to me. I switched out the SATA cable with another one attached to the motherboard. I also have some HBA card SAS/SATA connectors available to try, depending on what the diagnostics tell you. For my education, can you please tell me in what part of the diagnostics log you saw the constant ATA errors? I checked all the SATA connections on the 5 array drives: 4 are SAS/SATA, and one (the one you flagged, WD-WXW2AB23AURD) was connected via motherboard SATA. I thought it was OK to mix and match motherboard SATA and HBA SAS/SATA in the array; is that right? I checked the tightness of the SATA and power connectors to the array drives, and all seem to be OK. I checked 4 of the 6 SATA connectors on the motherboard, and they seem OK. To get to the other 2 SATA connectors on the motherboard, I would have to take the large GPU out of the box, which I can do if you think it is necessary. I am attaching the diagnostics; the array screenshot remains unchanged from my first post. Thanks again, Mike

plexnas-diagnostics-20240923-1141.zip
JorgeB Posted September 23, 2024

1 minute ago, demanding-chief3698 said: Can you please tell me for my education, what part of the diagnostics log you saw the constant ATA errors?

In the syslog, they are logged like this:

Sep 21 13:02:39 PlexNAS kernel: ata4.00: status: { DRDY DF ERR }
Sep 21 13:02:39 PlexNAS kernel: ata4.00: error: { ABRT }
Sep 21 13:02:39 PlexNAS kernel: ata4.00: configured for UDMA/133
Sep 21 13:02:39 PlexNAS kernel: ata4: EH complete
Sep 21 13:02:39 PlexNAS kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 21 13:02:39 PlexNAS kernel: ata4.00: irq_stat 0x40000001
Sep 21 13:02:39 PlexNAS kernel: ata4.00: failed command: READ DMA
Sep 21 13:02:39 PlexNAS kernel: ata4.00: cmd c8/00:08:00:b0:0b/00:00:00:00:00/e0 tag 18 dma 4096 in
Sep 21 13:02:39 PlexNAS kernel: res 61/04:08:00:b0:0b/00:00:00:00:00/e0 Emask 0x1 (device error)
Sep 21 13:02:39 PlexNAS kernel: ata4.00: status: { DRDY DF ERR }
Sep 21 13:02:39 PlexNAS kernel: ata4.00: error: { ABRT }
Sep 21 13:02:40 PlexNAS kernel: ata4.00: configured for UDMA/133
Sep 21 13:02:40 PlexNAS kernel: ata4: EH complete
Sep 21 13:02:40 PlexNAS kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 21 13:02:40 PlexNAS kernel: ata4.00: irq_stat 0x40000001
Sep 21 13:02:40 PlexNAS kernel: ata4.00: failed command: READ DMA
Sep 21 13:02:40 PlexNAS kernel: ata4.00: cmd c8/00:08:00:b0:0b/00:00:00:00:00/e0 tag 31 dma 4096 in
Sep 21 13:02:40 PlexNAS kernel: res 61/04:08:00:b0:0b/00:00:00:00:00/e0 Emask 0x1 (device error)

No errors so far in the latest diags. Check the filesystem for the 3 unmountable disks; run it without -n.
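[Editor's note] For readers following along: the syslog is inside the diagnostics zip (and at /var/log/syslog on a running server). One quick, hedged way to spot which ATA port is the noisy one is to count the "failed command" lines per port. The sample file below is a made-up stand-in for the real log, and the path is illustrative:

```shell
# Stand-in sample of the kind of syslog lines quoted above (not the real log).
cat > /tmp/syslog.sample <<'EOF'
Sep 21 13:02:39 PlexNAS kernel: ata4.00: failed command: READ DMA
Sep 21 13:02:39 PlexNAS kernel: ata4.00: error: { ABRT }
Sep 21 13:02:40 PlexNAS kernel: ata4.00: failed command: READ DMA
Sep 21 13:02:40 PlexNAS kernel: ata4.00: error: { ABRT }
Sep 21 13:05:00 PlexNAS kernel: ata2.00: configured for UDMA/133
EOF
# Count failed commands per ATA port; the port with the large count is the suspect.
grep -o 'ata[0-9.]*: failed command' /tmp/syslog.sample | sort | uniq -c
```

To tie the ataN number back to a specific drive, the boot section of the same syslog usually has a line like "ata4.00: ATA-9: WDC WD20EFAX-..., max UDMA/133" identifying the model on that port.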
demanding-chief3698 Posted September 23, 2024 (Author)

I ran the filesystem check for the 3 drives; the results are below. It looks to me like disk 4 is OK, and disks 2 and 3 need attention. I do not mind losing content (I can replace it) as long as the array is running without errors and the shares are protected and exportable. I think I understand the instructions, but I would very much like your advice on exactly what I should do next. At this point I do not understand how to mount an "unmountable" disk. Again, many thanks for your guidance; it is most appreciated.

Disk 2:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

Disk 3:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
empty log check failed
zero_log: cannot find log head/tail (xlog_find_tail=5)
ERROR: The log head and/or tail cannot be discovered. Attempt to mount the filesystem to replay the log or use the -L option to destroy the log and attempt a repair.

Disk 4:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
totally zeroed log
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (4:35330) is ahead of log (0:0).
Format log to cycle 7.
done
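[Editor's note] The repair options named in the output above can be sketched as a shell sequence run from the Unraid console. This is a hedged sketch, not official Unraid instructions: it assumes the array is started in maintenance mode and that /dev/md3 is the md device for disk 3 (repairing the md device, rather than the raw sdX device, keeps parity in sync); the guard makes it a no-op on a machine where that device does not exist:

```shell
# Hedged sketch of the xfs_repair sequence discussed in this thread.
# Assumes: array in maintenance mode, disk in slot 3, filesystem at /dev/md3.
DEV=${DEV:-/dev/md3}
if [ -b "$DEV" ]; then
    xfs_repair -n "$DEV"   # dry run: report problems, change nothing
    xfs_repair "$DEV"      # real repair; if asked, mount/unmount first to replay the log
    # Last resort only, when the log cannot be replayed (may lose recent metadata):
    # xfs_repair -L "$DEV"
else
    echo "no block device at $DEV - start the array in maintenance mode first"
fi
```

Adjust the md number for the slot being repaired; never run this against a mounted filesystem.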
demanding-chief3698 Posted September 23, 2024 (Author)

Thank you, JorgeB. Sorry to be so thick, but should I apply -L to both at the same time or one at a time? If one at a time, should disk 3 go first since it is disabled?
JorgeB Posted September 23, 2024

Run them one at a time; running both at once can be slower. The order doesn't matter.
demanding-chief3698 Posted September 23, 2024 (Author)

Dear JorgeB, I did the following:
- stopped the array and restarted in maintenance mode
- ran the disk check on disk 2 with -L
- stopped the array
- started the array
- Disk 2 came back!
- stopped the array and restarted in maintenance mode
- ran the disk check on disk 3 with -L
- that produced thousands of lines of messages
- stopped the array
- started the array

Disk 3 is still disabled and shows up as "Unmountable: Unsupported or no file system". Under Array Operation, disk 3 is listed as unmountable, and I am offered the option of formatting it, with the note: "Format will create a file system in all Unmountable disks." Is this reformatting my next move, or is there something else I should be doing, like rerunning the disk check without -n or -L? The latest diag is attached.

plexnas-diagnostics-20240923-1451.zip
JorgeB Posted September 24, 2024

Don't format; that will delete all data from that disk. Run xfs_repair again and post the full output you get.
demanding-chief3698 Posted September 24, 2024 (Author)

Dear JorgeB: I ran xfs_repair on disk 3 with -n, producing thousands of lines of text. It took about 5 minutes just to highlight the text for copying. A 1.1 MB file with this output is attached. Again, many thanks for all your help. Regards, Mike

disk repair text.docx
JorgeB Posted September 24, 2024

With -n it doesn't actually change anything; run it again without -n, and attach a txt file instead, please.
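[Editor's note] A tip for capturing long console output as plain text without copying it through Word: pipe the command through `tee`, which shows the output live and writes it to a file at the same time. A minimal sketch, using a stand-in function in place of the real repair command and an illustrative output path:

```shell
# Stand-in for the real command; on the server it would be something like:
#   xfs_repair /dev/md3 2>&1 | tee /boot/xfs-repair-disk3.txt
run_repair() { printf 'Phase 1 - find and verify superblock...\ndone\n'; }
# tee prints the output to the screen AND saves it to a plain .txt file.
run_repair 2>&1 | tee /tmp/xfs-repair-output.txt
```

The saved .txt file can then be attached directly, with no risk of the end being cut off during copy/paste.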
demanding-chief3698 Posted September 24, 2024 (Author)

Reran the repair without -n. The resulting 1.2 MB text file is attached. Restarted the array, and drive 3 is still disabled/emulated.

Text file xfs-repair status .txt
JorgeB Posted September 24, 2024

The end is cut off; did it finish the repair?
demanding-chief3698 Posted September 24, 2024 (Author)

Sorry about that; not sure what happened. I converted it from MS Word to text, and it seemed to lose something. Attached is a second attempt at making a txt file, plus the original MS Word file. Hope it comes through.

disk repair complete.txt
disk repair doc.docx
demanding-chief3698 Posted September 24, 2024 (Author)

Sorry about that. I will go back and do it again so you can see it tomorrow...
demanding-chief3698 Posted September 24, 2024 (Author)

I went back and ran it again 3 times without -n, just to be sure. The file I sent you earlier is the correct output of this process. It does look like it was cut off, but I do not know what else to do to make it reach (or show) completion. The last lines in this output are:

Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...

Text file xfs-repair status .txt
demanding-chief3698 Posted September 24, 2024 (Author)

JorgeB, I found the following advice online as one way to get a disabled/emulated drive going again:

"Stop the array, unassign the disk, start the array in maintenance mode, stop it again, and reassign the drive to the same slot. The idea is to start the array temporarily with the drive 'missing' so it changes from 'disabled' to 'emulated' status, then to stop it and 'replace' the drive to get it back to 'active' status."

My array shares are back and seem to be complete. There is a note that none of the shares are exportable; I assume because of the disabled drive. Do you think the above approach is advisable? Thank you for all your help. My array has come a long way back toward normal.
JorgeB Posted September 25, 2024

10 hours ago, demanding-chief3698 said: I found the following advice online as one way to get a disabled/emulated drive going again.

You shouldn't have done that with an unmountable disk; post current diags.
demanding-chief3698 Posted September 25, 2024 (Author)

I did not do it; I was asking if I could. The disk is still disabled and indicates it is unmountable. The last xfs_repair status output is a few posts back; I do not think it finished. Current diags below:

plexnas-diagnostics-20240925-0810.zip
JorgeB Posted September 25, 2024

1 hour ago, demanding-chief3698 said: I did not do it.

Good, I misunderstood. If xfs_repair cannot repair the emulated disk, unassign disk 3 and see if the actual disk mounts with the UD plugin.
demanding-chief3698 Posted September 25, 2024 (Author)

JorgeB, many thanks for hanging in there with me. Please forgive me for being thick; I do not want to make any more stupid mistakes. From what you said, I plan to do the following:
- stop the array
- unassign disk 3 by clicking "no device"
- if it shows up under unassigned devices, go to actions in the Unassigned Devices plugin and try to mount it

Is the above sequence of actions correct? Again, many thanks, Mike