
[SOLVED] Odd system behavior, need help repairing


Hi all, 

 

Last night I went to upgrade my unRAID system from 6.2.2 to the latest release and noticed some escalating odd behavior. First, the automated ("click here to upgrade") upgrade didn't work; it said something to the effect of "unable to write to flash". I thought that was odd, so, being normally a Windows guy, I decided to reboot because a "reboot fixes everything". I stopped the array, and as soon as I did, one of the drives became unavailable (red X) and two more became "unknown" (with the expected drive name there and a dropdown to choose a disk). Very odd.

I rebooted the server, unRAID and all the drives came back up fine, and I was able to upgrade the OS. All drives reported as available. Since it had been rebooted several times at this point, I elected to run a parity check. This ran all day (the server doesn't have the fastest processor in the world), and when I came back in the evening to check on it, the parity check was listed as "incomplete" and one of the drives had become unavailable. On the display attached to the unRAID server I noticed a ton of XFS and I/O errors.

I shut down the server and checked all of the drive mountings and the cabling, and all seemed well. I reseated the cables and the drives just to be on the safe side and fired the server back up. When I did, the display was reporting similar I/O errors, and now the web GUI is unresponsive (the page doesn't even load).

I've got it rebooted into "safe mode" now as a precautionary measure, as I am not home and would like to troubleshoot remotely. Can anyone advise what my next steps are? Thanks in advance!

Edited by jfeeser

  • Community Expert

Run a memtest (from the boot menu). That is a good start when you have a number of different issues with changing symptoms. Flaky power supplies have also been known to create similar problems.

EDIT: You could upload a diagnostics file: 'Tools' >>> 'Diagnostics', or type diagnostics on the command line. (The latter puts the file in the logs folder of the flash drive.)
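As a rough sketch of the command-line route: the helper below only finds the newest diagnostics archive so you know which file to attach. It assumes the flash drive is mounted at /boot (so the logs folder is /boot/logs, as the shell prompt later in this thread suggests) and that archives follow the hostname-diagnostics-YYYYMMDD-HHMM.zip naming seen in the attachments here. `latest_diag` is a hypothetical helper, not an unRAID command.

```shell
#!/bin/sh
# Print the newest diagnostics archive in a directory (default: /boot/logs).
# The file names embed a YYYYMMDD-HHMM stamp, so a plain sort orders them
# by time, assuming all archives come from the same host name.
latest_diag() {
    dir="${1:-/boot/logs}"
    ls -1 "$dir"/*-diagnostics-*.zip 2>/dev/null | sort | tail -n 1
}

# On the server itself, after starting the array:
#   diagnostics     # writes the zip to the flash drive's logs folder
#   latest_diag     # prints the path of the file to attach
```

If the directory holds no matching archive, the function prints nothing.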

Edited by Frank1940

  • Community Expert

There's probably filesystem corruption on one of the disks. Start the array and then grab the diags from the CLI like Frank posted.

  • Author

Sounds good. Can I do either of those things remotely from safe mode? I only ask because I don't have physical access to the server right now; it's booted into safe mode, and I'm telnetted in.

  • Community Expert
8 minutes ago, jfeeser said:

Can i do either of those things remotely from safe mode?

 

Start the array and grab the diags, yes.

  • Author

Apologies, how do I accomplish that? It's sad: I know Windows and network gear backwards and forwards, but anything beyond the basics in *nix and I'm kind of out of my depth.

  • Community Expert

SSH into your server; google PuTTY.

  • Author

I at least got that far :)

 

I mean the commands to start the array from within safe mode.

  • Community Expert

I thought you were already in safe mode; if not, there's no need to start in safe mode.

  • Author

Right, what I'm saying is that I'm currently in safe mode and would like to start the array to do the diagnostics you guys mentioned. Can that be done from safe mode, or do I need to reboot into "normal" mode? If I can start the array from safe mode, what are the commands to do so?

  • Community Expert

You start the array normally using the GUI.

  • Author

Hah, silly me. I assumed that safe mode was CLI only and never actually bothered to check if the web GUI worked. Guess the caffeine hasn't kicked in yet. I'll pull the diagnostics and report back.

  • Community Expert

We need the diags after starting the array.

Edited by johnnie.black

  • Author

Here you go. Of note is that when I started the array this time, a _different_ disk showed up as unmountable in addition to the one that had red-X'ed previously. SMART status for _all_ of my drives (even the X'ed-out one) is green.

 

 

feezfileserv-diagnostics-20170515-1009.zip

  • Community Expert

You didn't grab the logs after the errors and before rebooting, so I'm just guessing, but with problems on multiple disks and a SASLP in the system, the controller would be my prime suspect. Don't forget to grab the diags before rebooting if it happens again.

 

For now, run xfs_repair on disk4 (md4):

 

https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS

 

And then rebuild disk9 using the old disk since SMART looks good:

 

http://lime-technology.com/wiki/index.php/Troubleshooting#Re-enable_the_drive

Edited by johnnie.black

  • Author

Thanks.  When doing the xfs_repair on md4, it spits this out:

 

root@feezfileserv:/boot/logs# xfs_repair -v /dev/md4
Phase 1 - find and verify superblock...
        - block cache size set to 663264 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 776022 tail block 775959
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

(this is after stopping the array and restarting it in maintenance mode)

 

Should I just go ahead and do the "xfs_repair -Lv /dev/md4", or is there something else I should try first?
 

 

  • Community Expert

Use -L. It's normal in these cases, and usually there's no data loss.
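The choice between the two xfs_repair invocations in this exchange can be sketched as a small wrapper. This is an illustration under assumptions, not an unRAID tool: `repair_xfs` and `run` are hypothetical helpers, the array is assumed to be started in maintenance mode as described above, and the script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Echo a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# repair_xfs DEVICE [zero-log]
#   Without "zero-log": plain verbose repair (the run that stopped with the
#   "valuable metadata changes in a log" error in this thread).
#   With "zero-log": adds -L, which destroys the log before repairing, so
#   any metadata changes still in the log are discarded.
repair_xfs() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: repair_xfs DEVICE [zero-log]" >&2
        return 2
    fi
    if [ "${2:-}" = "zero-log" ]; then
        run xfs_repair -Lv "$dev"
    else
        run xfs_repair -v "$dev"
    fi
}

# Example (prints the commands only):
#   repair_xfs /dev/md4
#   repair_xfs /dev/md4 zero-log
```

Keeping -L behind an explicit argument mirrors the warning in the xfs_repair output: zeroing the log is a deliberate second step, taken only after the plain repair refuses to proceed and the filesystem cannot be mounted.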

  • Author

Okay.  After all of that Disk 4 re-detected properly, and disk 9 is re-building.  Time to re-verify that all my backups are up to date. :)

 

Thank you SO MUCH for all your help, and putting up with my novice-ness.

  • Author

Looks like I may have spoken too soon... the rebuild of disk 9 seems to have just stopped itself, and the drive is back to a red X. Here's a new diagnostic dump. Any thoughts?

feezfileserv-diagnostics-20170515-1057.zip

  • Community Expert

Like I suspected, it's the SASLP. These sometimes help:

- disable VT-d if not needed

- look for a board BIOS upgrade

- use the controller in a different slot if available

If nothing helps, the best solution is to replace it with an LSI controller.

  • Author

Strange.  Any idea what could've caused the sudden change?  I've been using the server in this configuration for almost a year without incident.

  • Community Expert

It happens to a significant number of users with both the SASLP and the SAS2LP. Any change in hardware or software (like a kernel change from upgrading unRAID) can trigger the issue.

Archived

This topic is now archived and is closed to further replies.
