Unraid unable to boot - kernel panic, not syncing VFS, unable to mount root fs


Solved by mattiapsu

This topic has come up on the forum a few times, and most threads end with a new USB drive. However, my problems started last night and have cascaded to this, so I wanted to get help before I start on the USB. Logs are attached and a description of events is below.

 

1) Last night, a few of my Docker containers stopped working. I thought it was due to a bad update, and I was able to roll a couple back and get them working.

2) This morning at 7:42am - something happened and the server became unresponsive (in the logs)

3) Performed a bad shutdown - power button hold

4) I would normally start back up and see what I've got, but in prep for a new case and additional drives I opened up the case to take inventory, took some pics, closed back up and started up

5) Upon restart, Disk 1 had UDMA CRC errors and went offline. As this can be a connection issue, I did a clean shutdown and made sure all the connections were good.

6) Put it back together and started up, and now I'm getting kernel panic, not syncing VFS, unable to mount root fs

 

Note: I know I have an error on my cache drive, which I have not reformatted. I had to rebuild a VM when it first errored, but it has been fine since I deleted the corrupted files. I haven't taken the time to move everything off and reformat to hopefully correct it.

 

Any help is appreciated as I'm totally down now.

unraid-2.log

Posted (edited)

Ran the check without -n, then with -L as instructed. Output is below. It appears to have completed successfully. However, when I start the array (non-maintenance mode), the drive still shows as unmountable. Again, the support is much appreciated.


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 5
        - agno = 2
        - agno = 4
        - agno = 6
        - agno = 7
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:976460) is ahead of log (1:2).
Format log to cycle 4.
done

 

Edited by mattiapsu
Posted (edited)

I still have the red X and the unmountable message across the Size, Used, and Free columns.

Disk log:

Mar 29 14:27:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:27:11 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:27:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:27:13 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:32:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:32:11 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:32:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:32:13 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:35:15 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:37:36 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:37:40 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:37:41 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:37:42 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:37:47 OldMain emhttpd: WDC_WD80EMAZ-00WJTA0_7HKHX2BJ (sdf) 512 15628053168
Mar 29 14:37:47 OldMain kernel: mdcmd (2): import 1 sdf 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7HKHX2BJ
Mar 29 14:37:47 OldMain kernel: md: import disk1: (sdf) WDC_WD80EMAZ-00WJTA0_7HKHX2BJ size: 7814026532 
Mar 29 14:37:53 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:37:57 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:37:58 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:37:59 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:37:59 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:40:49 OldMain emhttpd: shcmd (1772): echo 128 > /sys/block/sdf/queue/nr_requests
Mar 29 14:49:20 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:49:22 OldMain emhttpd: WDC_WD80EMAZ-00WJTA0_7HKHX2BJ (sdf) 512 15628053168
Mar 29 14:49:22 OldMain kernel: mdcmd (2): import 1 sdf 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7HKHX2BJ
Mar 29 14:49:22 OldMain kernel: md: import disk1: (sdf) WDC_WD80EMAZ-00WJTA0_7HKHX2BJ size: 7814026532 
Mar 29 14:49:22 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:49:53 OldMain emhttpd: shcmd (1848): echo 128 > /sys/block/sdf/queue/nr_requests
Mar 29 14:54:23 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:54:27 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:54:28 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:54:29 OldMain kernel: ata5.00: configured for UDMA/133
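
For anyone who wants to summarise a log like the one above instead of eyeballing it, here is a minimal Python sketch that counts the "slow to respond" and "COMRESET failed" messages per ATA port. The function name `count_ata_events` and the event labels are made up for illustration, and the regex only assumes the syslog layout shown in this post.

```python
import re
from collections import Counter

# Matches kernel lines in the format pasted above, e.g.
#   "Mar 29 14:27:11 OldMain kernel: ata5: COMRESET failed (errno=-16)"
# The optional ".00" part covers device lines such as "ata5.00: ...".
ATA_EVENT = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.+)$")


def count_ata_events(lines):
    """Count link-trouble messages, keyed by (port, event label).

    The labels are hypothetical names chosen for this sketch.
    """
    counts = Counter()
    for line in lines:
        match = ATA_EVENT.search(line.rstrip())
        if not match:
            continue  # not an ATA kernel message (e.g. emhttpd lines)
        port, message = match.groups()
        if "COMRESET failed" in message:
            counts[(port, "comreset_failed")] += 1
        elif "slow to respond" in message:
            counts[(port, "slow_to_respond")] += 1
    return counts


# Four lines copied from the disk log in this post.
sample = [
    "Mar 29 14:27:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)",
    "Mar 29 14:27:11 OldMain kernel: ata5: COMRESET failed (errno=-16)",
    "Mar 29 14:27:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
    "Mar 29 14:27:13 OldMain kernel: ata5.00: configured for UDMA/133",
]

if __name__ == "__main__":
    for (port, event), n in sorted(count_ata_events(sample).items()):
        print(port, event, n)
```

Feeding it the full syslog (one line per list entry) would show whether the resets are confined to one port, which is what you would expect from a single bad cable or connector.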

oldmain-diagnostics-20240329-1450.zip oldmain-syslog-20240329-1851.zip

Edited by mattiapsu
Posted (edited)

Thanks, I'll get those cables replaced, hopefully soon. Probably when I move to a new case, as I don't have a lot of free time right now. Is parity at risk with those ATA errors?

 

The disk is back up. I can read the emulated contents, and it looks good from a file standpoint. There are a few files in lost+found. I'm not sure what those might be, but there are only a few and they're not critical for me.

 

Sorry, I don't really know what the next steps are.

 

I found your other posts on rebuilding the drive... I assume I'll just follow those steps.

oldmain-diagnostics-20240329-1516.zip oldmain-syslog-20240329-1917.zip

Edited by mattiapsu
added logs
Link to comment
13 hours ago, mattiapsu said:

Is parity at risk with those ATA errors?

It can create other issues, like disk1 needing to be repaired twice. I would really recommend replacing the cable before attempting a rebuild. Also replace the cable for disk1, since it most likely got disabled because of a cable problem.

Link to comment

I started the rebuild and got horrible speeds, so I stopped immediately. I replaced the parity cable and reconnected disk 1 at the drive and MB. I got about 25 ATA errors in the first minute, but nothing since, and the speeds have averaged about 140MB/sec for the rebuild. Would the errors say anything about drive integrity, or are they purely a connection issue? I see some 'slow to respond' messages as well, but don't know if that's just my hardware, as I started this for fun and never looked for top specs.
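
For a rough sense of what 140MB/sec means for the rebuild, here is a back-of-the-envelope sketch. It assumes the emhttpd line in the earlier disk log ("512 15628053168") gives the sector size and sector count for disk 1, that MB means 10^6 bytes, and that the speed stays constant for the whole run, which it will not on a real drive. The function name `rebuild_hours` is just for illustration.

```python
# Values taken from the emhttpd log line for sdf; their meaning
# (sector size in bytes, number of sectors) is an assumption.
SECTOR_BYTES = 512
SECTORS = 15628053168


def rebuild_hours(total_bytes, mb_per_sec):
    """Hours to write total_bytes at a constant mb_per_sec (1 MB = 10**6 bytes)."""
    seconds = total_bytes / (mb_per_sec * 1_000_000)
    return seconds / 3600


total_bytes = SECTORS * SECTOR_BYTES  # about 8 TB

if __name__ == "__main__":
    print(f"{rebuild_hours(total_bytes, 140):.1f} hours at 140 MB/s")
```

Under those assumptions the rebuild comes out at a little under 16 hours. Real rebuilds slow down toward the inner tracks of the disk, so treat that as a lower bound rather than a prediction.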

 

I plan on replacing more cables when I move cases and add disks in the next month (or sooner, if more issues pop up).

 

Any other words of wisdom? Thanks for getting me this far.

syslog.txt oldmain-diagnostics-20240330-0907.zip

Link to comment
  • Solution

Quick update on this... I had another read error, so I took the server down and rebuilt it in the new case. I replaced as many cables as I had new ones for, and I'm still using the same power splitter off the Molex power cable (I'm using some shucked drives). I'm back up and running, added 2 new drives, and added 1 to the array successfully. It's been up for 2 days with no issues so far.

 

So in the end, I had to:

1) reload my flash drive from backup

2) replace / reconnect SATA and power cables

3) rebuild an array drive

4) run a scrub on my cache SSD

5) delete and replace my docker and libvirt images, as these were corrupted along the way.

 

So far so good. Thanks for the help, @JorgeB. I'm sure I'll be back in the future.
