mattiapsu (author), posted March 29:

This topic has come up on the forum a few times, and most threads end with a new USB. However, my problems started last night and have cascaded to this, so I wanted to get help before I start with the USB. Logs are attached and a description of events is below.

1) Last night, a few of my docker containers were not working. I thought it was due to a bad update, and I was able to roll a couple back and they were working.
2) This morning at 7:42am something happened and the server became unresponsive (see the logs).
3) Performed a bad shutdown (held the power button).
4) I would normally start back up and see what I've got, but in prep for a new case and additional drives I opened up the case to take inventory, took some pics, closed it back up and started up.
5) Upon restart, Disk 1 had UDMA CRC errors and went offline. As this can be a connection issue, I did a clean shutdown and made sure all the connections were good.
6) Put it back together and started up, and now I'm getting a kernel panic: "not syncing: VFS: Unable to mount root fs".

Note: I know I do have an error on my cache drive that I have not fixed by reformatting. I had to rebuild a VM when it first errored, but it has been fine since I deleted the corrupted files. I haven't taken the time to move everything off and reformat to hopefully correct it.

Any help is appreciated, as I'm totally down now.

Attached: unraid-2.log
JorgeB, posted March 29:

First you need to fix the flash drive. You can try redoing it first; if you get the same result, replace it, then post new diags.
mattiapsu (author), posted March 29:

Back up with a flash restore to the same USB. Disk 1 is emulated. Logs and diagnostics attached. I did not attempt to start the array.

Attached: syslog.txt, oldmain-diagnostics-20240329-1245.zip
JorgeB, posted March 29:

Post new diags with the array started so we can see if the emulated disk1 is mounting.
mattiapsu (author), posted March 29:

Looks like it's not mounting... "Unmountable: Unsupported or no file system".

Attached: oldmain-diagnostics-20240329-1307.zip, oldmain-syslog-20240329-1708.zip
mattiapsu (author), posted March 29:

And the SMART report on the drive.

Attached: oldmain-smart-20240329-1314.zip
JorgeB, posted March 29:

Check filesystem on the emulated disk1. Run it without -n, and if it asks for -L, use it.
mattiapsu (author), posted March 29 (edited March 29):

Ran the check without -n, then with -L as instructed. Output is below. It appears to have completed successfully. However, when I start the array (not in maintenance mode), the drive still shows as unmountable. Again, I really appreciate the support.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 5
        - agno = 2
        - agno = 4
        - agno = 6
        - agno = 7
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:976460) is ahead of log (1:2).
Format log to cycle 4.
done
JorgeB, posted March 29:

Are you sure? Post new diags after starting the array in normal mode.
mattiapsu (author), posted March 29 (edited March 29):

Still have the red X and the unmountable message across Size/Used/Free. Disk log:

Mar 29 14:27:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:27:11 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:27:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:27:13 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:32:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:32:11 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:32:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:32:13 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:35:15 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:37:36 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:37:40 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:37:41 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:37:42 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:37:47 OldMain emhttpd: WDC_WD80EMAZ-00WJTA0_7HKHX2BJ (sdf) 512 15628053168
Mar 29 14:37:47 OldMain kernel: mdcmd (2): import 1 sdf 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7HKHX2BJ
Mar 29 14:37:47 OldMain kernel: md: import disk1: (sdf) WDC_WD80EMAZ-00WJTA0_7HKHX2BJ size: 7814026532
Mar 29 14:37:53 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:37:57 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:37:58 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:37:59 OldMain kernel: ata5.00: configured for UDMA/133
Mar 29 14:37:59 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:40:49 OldMain emhttpd: shcmd (1772): echo 128 > /sys/block/sdf/queue/nr_requests
Mar 29 14:49:20 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:49:22 OldMain emhttpd: WDC_WD80EMAZ-00WJTA0_7HKHX2BJ (sdf) 512 15628053168
Mar 29 14:49:22 OldMain kernel: mdcmd (2): import 1 sdf 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7HKHX2BJ
Mar 29 14:49:22 OldMain kernel: md: import disk1: (sdf) WDC_WD80EMAZ-00WJTA0_7HKHX2BJ size: 7814026532
Mar 29 14:49:22 OldMain emhttpd: read SMART /dev/sdf
Mar 29 14:49:53 OldMain emhttpd: shcmd (1848): echo 128 > /sys/block/sdf/queue/nr_requests
Mar 29 14:54:23 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:54:27 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:54:28 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:54:29 OldMain kernel: ata5.00: configured for UDMA/133

Attached: oldmain-diagnostics-20240329-1450.zip, oldmain-syslog-20240329-1851.zip
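The disk log above is a repeating cycle of "link is slow to respond", "COMRESET failed" and "SATA link up" on ata5. When checking whether a cable swap helped, it can be handy to count those events per port across a saved syslog. A minimal Python sketch, assuming the syslog is saved as plain text and that the kernel wording matches the lines quoted here; the event names and the function name are made up for illustration:

```python
import re
from collections import Counter

# Patterns for the kernel messages quoted in this thread. Other kernels or
# controllers may word these differently, so this list is an assumption,
# not an exhaustive set of ATA error messages.
EVENT_PATTERNS = {
    "slow_to_respond": re.compile(r"kernel: (ata\d+): link is slow to respond"),
    "comreset_failed": re.compile(r"kernel: (ata\d+): COMRESET failed"),
    "link_up": re.compile(r"kernel: (ata\d+): SATA link up"),
}


def tally_ata_events(syslog_text: str) -> Counter:
    """Count (port, event) pairs for ATA link-reset messages in a syslog dump."""
    counts: Counter = Counter()
    for line in syslog_text.splitlines():
        for event, pattern in EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), event)] += 1
    return counts


if __name__ == "__main__":
    sample = """\
Mar 29 14:27:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
Mar 29 14:27:11 OldMain kernel: ata5: COMRESET failed (errno=-16)
Mar 29 14:27:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 29 14:27:13 OldMain kernel: ata5.00: configured for UDMA/133
"""
    # Print one line per (port, event) pair found in the sample.
    for (port, event), n in sorted(tally_ata_events(sample).items()):
        print(f"{port} {event}: {n}")
```

Running it on syslogs from before and after replacing a cable gives a quick comparison: if the counts for a port drop to zero, that points at the connection. It does not tell you which drive sits on which ataN port; that mapping has to come from elsewhere, e.g. the boot messages in the diagnostics.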
JorgeB, posted March 29:

Disabled is normal, but the fs is still corrupt. Did you run xfs_repair from the GUI or manually?
mattiapsu (author), posted March 29:

I ran it through the GUI. Just to make sure I did it correctly: to repair, I run Check without -n, and in my case with the -L argument? I can follow the article to run it manually if that's next.
JorgeB, posted March 29:

Just do it again then: first without -n only, and use -L only if asked.
JorgeB, posted March 29:

Forgot to mention: first make sure you check/replace the cables for parity, to see if that fixes all those ATA errors.
mattiapsu (author), posted March 29 (edited March 29: added logs):

Thanks, I'll get those cables replaced, hopefully soon. Probably when I move to a new case; I don't have a lot of free time right now. Is parity at risk with those ATA errors?

The disk is back up. I can read the emulated contents, and it looks good from a file standpoint. There are a few files in lost+found; not sure what those might be, but there are only a few and they're not critical for me.

Sorry, I don't really know what the next steps are. I found your other posts on rebuilding the drive... I assume I'll just follow those steps.

Attached: oldmain-diagnostics-20240329-1516.zip, oldmain-syslog-20240329-1917.zip
JorgeB, posted March 30:

Quoting mattiapsu: "Is parity at risk with those ATA errors?"

It can create other issues, like disk1 needing to be repaired twice. I would really recommend replacing the cable before attempting a rebuild. Also replace the cable for disk1, since most likely it got disabled because of a cable problem.
mattiapsu (author), posted March 30:

I started the rebuild and got horrible speeds, so I stopped it immediately. I replaced the parity cable and reconnected disk 1 at both the drive and the motherboard. I got about 25 ATA errors in the first minute, but nothing since, and the rebuild has averaged about 140 MB/s. Would the errors be any indication of drive integrity, or are they purely a connection issue? I see some 'slow to respond' messages as well, but I don't know if that's just my hardware, as I started this for fun and never looked for top specs. I'll plan on replacing more cables when I move cases and add disks in the next month (or sooner, if more issues pop up). Any other words of wisdom? Thanks for getting me this far.

Attached: syslog.txt, oldmain-diagnostics-20240330-0907.zip
JorgeB, posted March 31:

Still looks like a power/connection issue. Besides the SATA cable, also check/replace the power cables.
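On the question of drive integrity versus connection, one common rule of thumb is to look at the SMART attributes: UDMA_CRC_Error_Count (199) counts transfer errors between the drive and the controller and usually points at the cable, while Reallocated_Sector_Ct (5), Current_Pending_Sector (197) and Offline_Uncorrectable (198) point at the disk itself. A minimal Python sketch that reads the attribute table printed by `smartctl -A` and applies that split; the media/interface grouping and the function names are assumptions for illustration, not something Unraid itself does:

```python
# Attribute IDs commonly read as media problems vs. interface/cabling problems.
# This grouping is a rule of thumb (assumption), not a vendor specification.
MEDIA_ATTRS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}
INTERFACE_ATTRS = {199: "UDMA_CRC_Error_Count"}


def parse_smart_attributes(smartctl_a_output: str) -> dict[int, int]:
    """Map attribute ID -> raw value from the table printed by `smartctl -A`."""
    raw: dict[int, int] = {}
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns:
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            # RAW_VALUE is the 10th column; some attributes append extra text
            # after it (e.g. "35 (Min/Max 20/45)"), so only that token is used.
            try:
                raw[int(fields[0])] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return raw


def classify(raw: dict[int, int]) -> str:
    """Rough verdict: media counters take precedence over the CRC counter."""
    media = any(raw.get(i, 0) > 0 for i in MEDIA_ATTRS)
    interface = any(raw.get(i, 0) > 0 for i in INTERFACE_ATTRS)
    if media:
        return "media: check the drive itself"
    if interface:
        return "interface: suspect SATA cable/connection"
    return "no counted errors in these attributes"
```

On most drives the CRC counter never resets, so compare its raw value before and after a cable swap; a count that stops climbing is the useful signal, not a return to zero.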
mattiapsu (author), posted April 6 (marked as solution):

Quick update on this: I had another read error, so I took the server down and rebuilt it into the new case. I replaced as many cables as I had new ones for, but I'm still using the same power splitter off the Molex power cable (I'm using some shucked drives). I'm back up and running, added 2 new drives, and added 1 of them to the array successfully. It's been up for 2 days with no issues so far.

So in the end, I had to:
1) reload my flash drive from backup
2) replace/reconnect SATA and power cables
3) rebuild an array drive
4) run a scrub on my cache SSD
5) delete and replace my docker and libvirt images, as these were corrupted along the way.

So far so good. Thanks for the help, @JorgeB. I'm sure I'll be back in the future.