Unraid unable to boot - kernel panic, not syncing VFS, unable to mount root fs

mattiapsu · March 29

This topic has come up on the forum a few times, and most threads end with a new USB drive. However, my problems started last night and have cascaded to this, so I wanted to get help before I go down the USB route. Logs are attached, and a description of events is below.

1) Last night, a few of my Docker containers were not working. I thought it was due to a bad update; I was able to roll a couple back, and they were working again.

2) This morning at 7:42am - something happened and the server became unresponsive (in the logs)

3) Performed a bad shutdown - power button hold

4) I would normally start back up and see what I've got, but in prep for a new case and additional drives I opened up the case to take inventory, took some pics, closed back up and started up

5) Upon restart, Disk 1 had UDMA CRC errors and went offline. As this can be a connection issue, I did a clean shutdown and made sure all the connections were good.

6) Put it back together and started up, and now I'm getting kernel panic, not syncing VFS, unable to mount root fs

Note: I know I have an error on my cache drive, which I have not reformatted. I had to rebuild a VM when it first errored, but it has been fine since I deleted the corrupted files. I haven't taken the time to move everything off and reformat to hopefully correct it.

Any help is appreciated as I'm totally down now.

unraid-2.log

JorgeB · March 29

First you need to fix the flash drive. You can try redoing it first; if the result is the same, replace it, then post new diags.

mattiapsu · March 29

Back up with a flash restore to the same USB. Disk 1 emulated. Logs and diagnostics attached. I did not attempt to start the array.

syslog.txt oldmain-diagnostics-20240329-1245.zip

JorgeB · March 29

Post new diags with the array started so we can see if the emulated disk1 is mounting.

mattiapsu · March 29

Looks like it's not mounting... unmountable / unsupported or no file system

oldmain-diagnostics-20240329-1307.zip oldmain-syslog-20240329-1708.zip

mattiapsu · March 29

And the SMART report on the drive.

oldmain-smart-20240329-1314.zip

JorgeB · March 29

Check filesystem on the emulated disk1, run it without -n, and if it asks for -L use it

mattiapsu · March 29

Ran the check, without -n, then with -L as instructed. Output is below. It appears that it completed successfully. However, when I start the array (non-maintenance), the drive is still showing unmountable. Again, much appreciated for the support.


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 5
        - agno = 2
        - agno = 4
        - agno = 6
        - agno = 7
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:976460) is ahead of log (1:2).
Format log to cycle 4.
done

Edited March 29 by mattiapsu

JorgeB · March 29

Are you sure? Post new diags after array start in normal mode.

mattiapsu · March 29

Still have the red x and unmountable message across Size-Used-Free.

Disk log:

Mar 29 14:27:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:27:11 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:27:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:27:13 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:32:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:32:11 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:32:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:32:13 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:35:15 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:37:36 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:37:40 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:37:41 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:37:42 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:37:47 OldMain emhttpd: WDC_WD80EMAZ-00WJTA0_7HKHX2BJ (sdf) 512 15628053168
Mar 29 14:37:47 OldMain kernel: mdcmd (2): import 1 sdf 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7HKHX2BJ
Mar 29 14:37:47 OldMain kernel: md: import disk1: (sdf) WDC_WD80EMAZ-00WJTA0_7HKHX2BJ size: 7814026532 
Mar 29 14:37:53 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:37:57 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:37:58 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:37:59 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:37:59 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:40:49 OldMain emhttpd: shcmd (1772): echo 128 > /sys/block/sdf/queue/nr_requests
Mar 29 14:49:20 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:49:22 OldMain emhttpd: WDC_WD80EMAZ-00WJTA0_7HKHX2BJ (sdf) 512 15628053168
Mar 29 14:49:22 OldMain kernel: mdcmd (2): import 1 sdf 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7HKHX2BJ
Mar 29 14:49:22 OldMain kernel: md: import disk1: (sdf) WDC_WD80EMAZ-00WJTA0_7HKHX2BJ size: 7814026532 
Mar 29 14:49:22 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:49:53 OldMain emhttpd: shcmd (1848): echo 128 > /sys/block/sdf/queue/nr_requests
Mar 29 14:54:23 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:54:27 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:54:28 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:54:29 OldMain kernel: ata5.00: configured for UDMA/133
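For anyone reading a similar log, a rough way to tally these link-level messages per ATA port is sketched below in Python. It is only a sketch: the regular expression is built from the message wording in the lines quoted above (other kernel versions may phrase them differently), and the function name `count_ata_events` is made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the log lines quoted above; this wording is an
# assumption and may differ on other kernel versions.
ATA_EVENT = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: (link is slow to respond|COMRESET failed)"
)

def count_ata_events(lines):
    """Return a Counter keyed by (port, event) for link-level ATA messages."""
    counts = Counter()
    for line in lines:
        match = ATA_EVENT.search(line)
        if match:
            counts[match.groups()] += 1
    return counts

# Four lines copied from the disk log above.
sample = [
    "Mar 29 14:27:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)",
    "Mar 29 14:27:11 OldMain kernel: ata5: COMRESET failed (errno=-16)",
    "Mar 29 14:27:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
    "Mar 29 14:27:13 OldMain kernel: ata5.00: configured for UDMA/133",
]

for (port, event), n in sorted(count_ata_events(sample).items()):
    print(f"{port}: {event} x{n}")
```

Feeding it the whole syslog instead of `sample` would show whether the resets keep hitting the same port, which is the kind of pattern that usually points at a cable or power connection.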

oldmain-diagnostics-20240329-1450.zip oldmain-syslog-20240329-1851.zip

Edited March 29 by mattiapsu

JorgeB · March 29

Disabled is normal, but the fs is still corrupt. Did you run xfs_repair from the GUI or manually?

mattiapsu · March 29

I ran it through the GUI... just want to make sure I did it correctly. To repair, it's running Check without -n and, in my case, with the -L argument?

I can follow the article to run manually if that's next.

JorgeB · March 29

Just do it again then, first just without -n, use -L only if asked.
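As a rough illustration of that order of operations, the Python sketch below only builds the two command lines and does not run anything. The device path `/dev/md1` is an assumption standing in for the emulated disk1 (the thread ran the check from the GUI, which picks the device itself), and the helper name `xfs_repair_command` is hypothetical.

```python
import shlex

def xfs_repair_command(device, check_only=False, zero_log=False):
    """Build the argument list for one xfs_repair pass (does not run it)."""
    args = ["xfs_repair"]
    if check_only:
        args.append("-n")  # no-modify mode: report problems only
    if zero_log:
        args.append("-L")  # zero the log; use only if xfs_repair asks for it
    args.append(device)
    return args

# Hypothetical device path for the emulated disk1; adjust for the real system.
DEVICE = "/dev/md1"

first_pass = xfs_repair_command(DEVICE)                # repair, without -n
fallback = xfs_repair_command(DEVICE, zero_log=True)   # only if -L is requested

print(shlex.join(first_pass))
print(shlex.join(fallback))
```

The second command is a fallback, not a default: as the xfs_repair output earlier in the thread warns, -L destroys metadata changes still sitting in the log.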

JorgeB · March 29

Forgot to mention, first make sure you check/replace cables for parity, to see if it fixes all those ATA errors.

mattiapsu · March 29

Thanks, I'll get those cables replaced hopefully soon. Probably when I move to a new case, not a lot of free time right now. Is parity at risk with those ATA errors?

The disk is back up. I can read the emulated contents. It looks good from a file standpoint. There are a few files in lost+found; not sure what those might be, but only a few, and nothing critical for me.

Sorry, don't really know what next steps are.

I found your other posts on rebuilding the drive... I assume I'll just follow those steps.

oldmain-diagnostics-20240329-1516.zip oldmain-syslog-20240329-1917.zip

Edited March 29 by mattiapsu
added logs

JorgeB · March 30

13 hours ago, mattiapsu said:

Is parity at risk with those ATA errors?

It can create other issues, like disk1 needing to be repaired twice. I would really recommend replacing the cable before attempting a rebuild. Also replace the cable for disk1, since most likely it got disabled because of a cable problem.

mattiapsu · March 30

I started the rebuild and got horrible speeds, so I stopped immediately. I replaced the parity cable and reconnected disk 1 at the drive and MB. I got about 25 ATA errors in the first minute, but nothing since and the speeds have averaged about 140MB/sec for the rebuild. Would the errors be any indication of drive integrity or just purely connection? I see some 'slow to respond' messages as well, but don't know if that's just my hardware, as I started this for fun, and never looked for top specs.
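For a sense of scale, a back-of-the-envelope estimate of how long a full rebuild takes at that average speed is sketched below in Python. It assumes the `512 15628053168` pair in the emhttpd log line earlier in the thread is sector size and sector count, that MB means 10^6 bytes, and that the speed stays constant, which it will not in practice.

```python
SECTOR_BYTES = 512            # assumed sector size, from the emhttpd log line
SECTORS = 15_628_053_168      # assumed sector count, from the same line
AVG_SPEED_MB_S = 140          # average rebuild speed reported above

def rebuild_hours(sectors, sector_bytes, speed_mb_s):
    """Estimate rebuild duration in hours at a constant speed (MB = 10^6 bytes)."""
    total_bytes = sectors * sector_bytes
    seconds = total_bytes / (speed_mb_s * 1_000_000)
    return seconds / 3600

print(f"{rebuild_hours(SECTORS, SECTOR_BYTES, AVG_SPEED_MB_S):.1f} hours")
```

Under those assumptions the rebuild comes out at roughly 16 hours, so a rebuild that crawls along far below that pace is a reasonable cue to stop and check connections first.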

I'll plan on replacing more cables when I move cases and add disks in the next month (or if more issues pop up, I'll do sooner).

Any other words of wisdom? Thanks for getting me this far.

syslog.txt oldmain-diagnostics-20240330-0907.zip

JorgeB · March 31

Still looks like a power/connection issue. Besides the SATA cable, also check/replace the power cables.

mattiapsu · April 6

Quick update on this... I had another read error, so I took the server down and rebuilt it into the new case. I replaced as many cables as I had new ones for, and I'm still using the same power splitter off the Molex power cable (I'm using some shucked drives). I'm back up and running, added 2 new drives, and added 1 to the array successfully. It's been up for 2 days with no issues so far.

So in the end, I had to:

1) reload my flash drive from backup

2) replace / reconnect SATA and power cables

3) rebuild an array drive

4) run a scrub on my cache SSD

5) delete and replace my docker and libvirt images, as these were corrupted along the way.

So far so good. Thanks for the help, @JorgeB. I'm sure I'll be back in the future.
