unRAID unresponsive intermittently



Thanks again.  So I should run reiserfsck for all data drives and the cache drive...correct? 

 

I'm running unRAID 5.0.5.  This is after unRAID v5.0-beta8d...correct?

 

I'm a Linux idiot, so if I don't ask I'll screw it up.  The wiki uses /dev/md1 as the example.  If I want to run reiserfsck on data drive 1, do I enter the same thing, or should the syntax be different?  Typically, I associate sdb with drive 1.

 

The directions say not to, but what would the command be to run reiserfsck on the parity drive?

You must use the md device, not the sd device, or you will invalidate parity. Parity has no filesystem so reiserfsck does not apply.
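To see why the md device matters, here is a toy model of single parity. This is a sketch, not unRAID's driver code. Each "disk" is one made-up byte, parity is the XOR of the data disks (the relationship single parity relies on), and the script shows that a write through the md layer keeps parity in step while a write that bypasses it, as a write to the raw sd device would, leaves parity invalid.

```shell
#!/bin/bash
# Toy model of single parity: one made-up byte per "disk" instead of whole
# drives. Real single parity uses the same XOR relationship, bit by bit.
disk1=0x5A
disk2=0x3C
disk3=0xF0

# Parity is the XOR of all data disks.
parity=$(( disk1 ^ disk2 ^ disk3 ))

check() {
  # Parity is valid when it still equals the XOR of the current data.
  if [ $(( disk1 ^ disk2 ^ disk3 )) -eq "$parity" ]; then
    echo "parity valid"
  else
    echo "parity INVALID"
  fi
}

check

# Write through the md layer: the data is written AND parity is updated
# by XOR-ing out the old value and XOR-ing in the new one.
old=$disk1
disk1=0x11
parity=$(( parity ^ old ^ disk1 ))
check

# Write that bypasses the md layer (like writing to the raw sd device):
# the data changes but parity is never touched.
disk2=0x22
check
```

Running it prints "parity valid", "parity valid", "parity INVALID". It also shows why parity has no filesystem of its own: it is only the XOR of the data disks, so there is nothing on it for reiserfsck to check.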
Link to comment
I recognize that it might not apply, but the wiki's reiserfsck directions state, "VERY IMPORTANT!!!! Do NOT run reiserfsck on the parity drive...running reiserfsck on the parity drive can corrupt it."

 

I want to avoid doing this, and the only way I know I can avoid doing it for sure is to understand the command to accomplish it.  Admittedly, the /dev/md devices are throwing me a bit since they are new to me.  Is it as simple as this?

 

/dev/md1 = Data Drive 1

/dev/md2 = Data Drive 2

/dev/md3 = Data Drive 3

etc.

 

If so, is /dev/md0 the parity drive?

Link to comment

I want to avoid doing this, and the only way I know I can avoid doing it for sure is to understand the command to accomplish it.  Admittedly, the /dev/md devices are throwing me a bit since they are new to me.

By using the md device, any changes made to the disk will also update parity, so parity will be maintained.

Is it as simple as this?

 

/dev/md1 = Data Drive 1

/dev/md2 = Data Drive 2

/dev/md3 = Data Drive 3

etc.

Yes

If so, is /dev/md0 the parity drive?

I don't think md0 exists

 

It is also possible to run it on just the disks, or actually, the partitions, like sda1, sdb1, etc. If using the sd devices, it would be possible to run it on parity, so maybe that is what the warning is about.

 

In any case, use the md devices.
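A dry-run sketch of that advice, using the mapping confirmed above (data disk N is /dev/mdN; parity has no md device). It only prints the reiserfsck commands, nothing is executed, and `check_cmd` is a hypothetical helper that refuses anything that is not an md device, so a raw partition such as /dev/sdb1 (which would bypass parity) is caught before you type it. The cache drive is outside the parity array, so it is checked through its own partition (e.g. /dev/hdc1) and is deliberately not handled here.

```shell
#!/bin/bash
# Dry run only: print, never execute, a read-only reiserfsck check per data disk.
# Mapping from this thread: data disk N -> /dev/mdN. Parity has no md device
# and no filesystem, so it never appears in the list.

# Hypothetical guard: print the command only for /dev/md1 and up.
check_cmd() {
  case "$1" in
    /dev/md[1-9]*) echo "reiserfsck --check $1" ;;
    *) echo "refusing $1: use the md device (/dev/mdN), not sd/hd" >&2
       return 1 ;;
  esac
}

count="${DATA_DISKS:-4}"   # placeholder default: set DATA_DISKS to your disk count
for (( n = 1; n <= count; n++ )); do
  check_cmd "/dev/md$n"
done
```

Review the printed lines, then run them one at a time. Each `reiserfsck --check` is read-only and still asks you to type Yes before it starts.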

Link to comment

Just finished Data Drive 1.  This was the Samsung drive that couldn't complete the 20 gig transfer directly to the share, and it is also the drive that was noticed earlier with an "serror."  It seems like it passed the reiserfsck...correct?  I guess I'll go ahead and do the other drives as well.

reiserfsck_data1.txt

Link to comment

Yes. Check all of the data disks and the cache.

 

If this doesn't fix it, then we need to start isolating components.

 

Select Tools->New Config. Reassign only parity and disk1. Rebuild or check parity and then test. This procedure should be repeated using various combinations of disk controllers and hard drives.

Link to comment

I completed all the data drives.  The log files are attached.  They all reported "no corruptions," so I think that means they passed.  However, there was some odd output for drive 4.  It showed some information about /.custom/couchpotato, and unlike on all the other drives, reiserfsck listed the paths out.  I do have a share called "custom" and a path for /custom on the cache drive.  These hold the plugins I have, but I'm not sure why /.custom/couchpotato would be on disk 4, or why it is hidden.  I guess it could have been a backup, because the couchpotato on the cache drive is labeled "couchpotato_v2", but it is unlike me to save something directly to a disk.  I didn't think I ever had.  I don't even use couchpotato, so I should probably just delete the /.custom folder on Disk 4...don't you think?

 

Also, I would like to complete a reiserfsck on the cache drive, but I'm not clear on the syntax.  The wiki states "reiserfsck --check /dev/sdX1" but my cache drive is IDE (hdc).  Should the syntax be different?

 

Finally, rather than continuing to speculate about what the issue might be from a hardware perspective, I just purchased a used and tested motherboard/cpu/ram bundle.  It should be here next week.  It is the same motherboard I'm using now, so I'm familiar with it and can easily mirror what is in the BIOS.  Also, this new bundle comes with a faster processor (AMD Athlon II X4 630).  This will at least give me enough horsepower for one HD stream in Plex and possibly two.  Regardless, $70 delivered was a small price to pay if this gets the server back up and running.  Once I have it stable, I can look into some more significant upgrades.  Fingers crossed this will help. :)

 

 

reiserfsck_data_4_3_2.txt

Link to comment

Also, I would like to complete a reiserfsck on the cache drive, but I'm not clear on the syntax.  The wiki states "reiserfsck --check /dev/sdX1" but my cache drive is IDE (hdc).  Should the syntax be different?

 

reiserfsck --check /dev/hdc1

 

There is a Log button in the upper right corner of the unRAID webGUI. First make a copy of the current log (syslog.txt) then click on the Log button. Leave the Log window open until the server crashes. Copy and paste the entire contents of the log window into a second text file (syslogB.txt). Attach both syslog.txt and syslogB.txt to a post.

 

That's a good idea, for v6.  I think I'll add it to the Need help? Read me first! post.  It's one less thing at the command line, and it works headless.  It's only good while the network connection is still operational, but it will keep the very last messages showing until the server crashes, loses the network, or is shut down.  It does cause a small amount of constant network activity, though.

Link to comment

RobJ, reverse the order of the log collection, i.e., first click on Log and then copy the current syslog. This way the two logs overlap.
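For anyone who prefers the command line, here is a sketch of the "copy the current syslog" step: it saves a timestamped snapshot to the flash so it survives a reboot, since the live log is held in RAM. It assumes the stock unRAID locations (/var/log/syslog for the log, /boot for the flash); `SYSLOG_SRC`, `FLASH_DIR` and `snapshot_syslog` are hypothetical names added for illustration and testing. Per the correction above, open the Log window first and take the snapshot second.

```shell
#!/bin/bash
# Save a timestamped copy of the current syslog to the flash drive.
# Assumed stock locations; SYSLOG_SRC and FLASH_DIR are hypothetical overrides.
SRC="${SYSLOG_SRC:-/var/log/syslog}"
DEST_DIR="${FLASH_DIR:-/boot}"

snapshot_syslog() {
  local dest
  # No colons in the name: the flash is FAT, which does not allow them.
  dest="$DEST_DIR/syslog-$(date +%Y%m%d-%H%M%S).txt"
  # sync flushes the copy to the stick before anything else can go wrong.
  cp "$SRC" "$dest" && sync && echo "$dest"
}

if [ -r "$SRC" ]; then
  snapshot_syslog
else
  echo "no readable syslog at $SRC" >&2
fi
```

The script prints the path of the saved copy, which is the file to attach alongside the contents of the Log window.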

Link to comment

Thanks RobJ.  The cache seemed to pass with "no corruptions found" as well.  This means all the data drives and the cache drive passed.  See below:

 

root@Tower:~#
root@Tower:~# reiserfsck --check /dev/hdc1
Will read-only check consistency of the filesystem on /dev/hdc1
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Wed Jun 17 12:50:23 2015
###########
Replaying journal: Done.
Reiserfs journal '/dev/hdc1' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
        Leaves 25440
        Internal nodes 176
        Directories 142363
        Other files 116706
        Data block pointers 8661272 (3967 of them are zero)
        Safe links 0
###########
reiserfsck finished at Wed Jun 17 12:52:08 2015
###########
root@Tower:

 

Unless someone tells me differently, it feels like isolating the problem might be very difficult, and it is likely to reside with something on the motherboard.  As much as I would like to know definitively where the problem is, I'm wondering if it might be in the best interest of everyone's time to wait until the new motherboard/cpu/ram bundle comes in.

 

My hope is this upgrade should be pretty simple.  I should simply swap out the hardware, ensure the BIOS mirrors what I have now, and boot into unRAID.  From there I need to ensure the same drive is assigned as parity in the new configuration and start the array.  Does that seem right?

 

 

Link to comment
  • 2 weeks later...

Well, I got the new (used) motherboard/cpu/ram bundle in, and the results haven't been too promising.  Before I go into details, I need to ask if anyone understands how unRAID works with the USB flash.  Is unRAID loaded into memory at startup, with the information on the USB *never* accessed again, OR does unRAID need to access the USB from time to time (for example, when writing to a disk for some reason)?

 

The reason I ask is I'm starting to get some indication that the flash might have been the issue this whole time, that is, if unRAID does go back to the flash from time to time.

Link to comment

unRaid loads the entire OS into RAM at boot time.  After that, it reads the flash drive to load plugins and configuration information.  About the only writes happen when the configuration changes or at shutdown/restart.
Link to comment
Thanks Squid.  Let me ask something specific just so I completely understand.  When I said "writes," I meant writes to the array, not to the USB.  What I'm wondering is whether unRAID needs to read from the USB drive from time to time during normal operation.  I'm especially wondering whether any reads from the USB might be required during writes to the array.

Link to comment

No.  Reads from the USB do not happen during writes to the array.
Link to comment
Sigh...I was afraid of that.  Since that is the case what I'm seeing is leaving me even more confused. 

 

I installed the new motherboard/cpu/ram bundle thinking that my intermittent freezes during writes to the array were possibly due to a bad SATA controller on the motherboard.  This new bundle is used, but it was claimed to be "tested" and working fine.  It is almost identical to the motherboard/ram/cpu I have, so there would be some familiarity.

 

I mirrored the BIOS between the configurations and expected everything to boot just fine.  It didn't, and the resulting behavior wasn't always the same.  So I decided to really rule things out by taking the motherboard/cpu/ram out of the system, and I bought a new power supply in case there was an issue there.

 

Just having the motherboard/cpu/ram bundle out on a table, connected to a new power supply, produces four different boot results.

 

1.  Black screen.  Power is on, but nothing happens.

2.  Hardware Monitor Screen.  A screenshot is below.  I have a feeling this is happening because it doesn't see the USB Flash as bootable for some reason.  Basically, it thinks there is nothing to boot to.

 

hardware monitor.jpg

 

3.  Here is where it gets weird.  About 50% of the time it boots into unRAID like it should: I get the blue screen where I can choose SAFE MODE, MEMTEST, etc.  Shouldn't this be consistent?  Why, upon reboot, does it get here part of the time and not the rest of the time?

 

4.  Finally, I got this result once.  It feels like the system was trying to read the USB, but then failed.

 

syslinux.jpg

 

After seeing all this with the new motherboard/ram/cpu bundle, I elected to try the old system to see if the behavior was different.  For whatever reason it booted from the FLASH into unRAID more consistently (yes, the BIOS settings are identical between the two setups), but not always.  It also displayed the Hardware Monitor screen.

 

Naturally, this all leads me to believe there is an issue with the USB, which is easily remedied, but it still doesn't explain why this server worked fine for five years and then started intermittently freezing during writes to the array.  Not always, but sometimes. 

Link to comment

Have you tried running memtest? (I haven't been following this thread closely.)

 

I did on the old system where it was first exhibiting the intermittent freezes while writing.  It ran for 5 days.  No errors.

Since you replaced the P/S, I would assume that it has the guts to run the system.

 

The booting issues could be a bad flash drive.  I would think that the best course of action would be to first get the system booting consistently.  After that we can concentrate on the freezing issues.  I would try reformatting the flash drive again.  Failing that, I would replace it.  It also wouldn't be a bad idea to go into the BIOS and disable everything that's not needed for unRaid operation, things like the serial port / parallel port.  No need for them to be using system resources (IRQ).  And just so I don't have to go through the entire 5 pages of replies, what m/b is this?

Link to comment

Some success!  I may be celebrating too soon, but I purchased a new flash and started testing it just to see how it would boot with unRAID.  Upon first boot it would give the Hardware Monitor screen again, but subsequent boots seemed to work fine.  It had me wondering whether the legacy flash would do the same and it did.  I elected to connect the drives outside of the case and see if I could spin things up. 

 

I was a little surprised, but it came up in SAFE MODE just fine.  I started transferring files back and forth and things looked smooth.  I was especially concerned with a 25 gig DVR capture I have because on the previous system this long write would almost assuredly cause the system to freeze with no explanation why.  It went through fine this time.

 

Gaining an inch and asking for a mile, I rebooted the server and started it with the plugins, out of SAFE MODE.  Again, I haven't stress tested anything yet, but everything worked as I would expect and without freezes.  This is the furthest I've gotten in a while.

 

I hope to put everything back in the case this weekend and see if I can maintain stability.  This is promising though.  I've purchased so much new equipment I may never know truly where the issue resided, but fortunately it was all pretty inexpensive. 

 

Fingers crossed this case isn't possessed.  Now that everything is working outside the case it has me wondering.  We'll see..

Link to comment

Unfortunately, I'm still seeing erratic behavior.  Earlier everything booted up perfectly.  The system shut down just fine and I went to my daughter's play.  I came back, booted up again, and here is what I found on the screen:

 

usberror1.jpg

 

usberror2.jpg

 

It doesn't appear to be loading unRAID, and the USB might be to blame.  Earlier today I purchased a new USB flash anticipating this might be the issue, copied all the files over to it, and made it bootable.  However, the new USB flash does the same darn thing, and now for whatever reason I can't boot into unRAID.

 

Could some of the files simply be corrupt on the flash?  They seemingly load sometimes and sometimes they don't?  Could this have been causing the freezes, because something wasn't installing correctly from the USB flash but unRAID moved forward anyway?  I'm trying to make some sense of how the USB flash could have affected writing to the array.

 

Anyway, I guess from here I should probably just abandon the old flash altogether, use only the new USB flash, format it, and load a standard version of unRAID 5.0.5 on it...don't you think?  I'll lose my plugins, but those could be loaded back on.  At this point I would just like to get back to some safe data.
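On the question of whether files on the flash could be corrupt: one way to check is to compare the boot files on the stick byte for byte against a fresh copy from the release zip. This is a sketch, assuming the flash is mounted at /boot (or wherever you mount it on another Linux machine) and the zip was unpacked to a hypothetical folder /tmp/unraid-release. The default file list (bzimage, bzroot) is the unRAID 5 kernel and root filesystem images, and `FILES`, `RELEASE_DIR` and `verify_flash` are illustrative names, not part of unRAID.

```shell
#!/bin/bash
# Compare boot files on the flash with fresh copies from the release zip.
FLASH_DIR="${FLASH_DIR:-/boot}"
RELEASE_DIR="${RELEASE_DIR:-/tmp/unraid-release}"   # hypothetical unpack folder
FILES="${FILES:-bzimage bzroot}"

verify_flash() {
  local f status=0
  for f in $FILES; do
    if [ ! -f "$RELEASE_DIR/$f" ]; then
      printf '%-8s %s\n' "NOREF" "$f"; status=1     # nothing to compare against
    elif [ ! -f "$FLASH_DIR/$f" ]; then
      printf '%-8s %s\n' "MISSING" "$f"; status=1
    elif cmp -s "$RELEASE_DIR/$f" "$FLASH_DIR/$f"; then
      printf '%-8s %s\n' "OK" "$f"
    else
      printf '%-8s %s\n' "DIFFERS" "$f"; status=1   # corrupt or a different version
    fi
  done
  return "$status"
}

if [ -d "$RELEASE_DIR" ]; then
  verify_flash
else
  echo "release folder not found: $RELEASE_DIR" >&2
fi
```

If the files were copied to the stick moments ago, Linux may answer the reads from its page cache instead of from the flash itself, so unmount and re-plug the drive before comparing.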

 

 

Link to comment

Good idea... start from scratch with the new USB.  Also try a different port (use one that's not on the same "stack" of ports on the back of the server -> that way it should hopefully wind up on a different controller)

 

Also, you've got at least one drive that's running in IDE mode (I'm assuming that you only have SATA drives).  You'll get better performance if you set the BIOS to AHCI instead of IDE or Legacy.

Link to comment

Thanks for confirming.  I see that Lime Technology only has 5.0.6 available.  I was on 5.0.5.  Is it OK to install 5.0.6 on the new flash and try it even though the array was using 5.0.5?

Link to comment

I would reset the bios to default.  The only change I would make is to ensure the flash is set as the 1st boot device.  Make sure you have it plugged into a USB 2.0 port. If all of that works fine, then make any bios changes one at a time.

 

EDIT:

Also, since you purchased all new hardware, I would also reseat all the boards/cables.

Link to comment

Thanks for confirming.  I see that Lime Technology only has 5.0.6 available.  I was on 5.0.5.  Is it OK to install 5.0.6 on the new flash and try it even though the array was using 5.0.5?

No problems there.  If you're running anything other than the free version you're also going to have to transfer your registration to the new flash drive.
Link to comment

This is crazy.  Using the new flash, I loaded unRAID 5.0.6 on it, a fresh install downloaded from Lime Tech.  I made it bootable and booted the server with it.  I restored the defaults in the BIOS, turned off the serial, parallel, and floppy controllers, gave the USB priority in the boot list, and then booted up.  unRAID loaded perfectly.  I then got the GUID to send to Lime Tech, sent it, and rebooted.  Upon reboot it attempts to load unRAID and I get the same USB errors as before on the old flash.  This is a new flash with a fresh install of unRAID?  The motherboard, cpu, and power supply are all new.  The memory came from the old system, which was memory tested for 5 days with no errors.  Very confusing.

Link to comment
