unRAID Server Release 6.2.0-beta21 Available



Well, I finally got around to setting up a test server for v6.2, and noted a couple of anomalies.  Not sure if these have already been reported (I scanned this thread, but not all of the previous ones) ... but here's what I noted today while setting it up.

 

I used an old system with a C2SEE board with a Pentium E6300, 4GB of RAM, and 10 old 1.5TB and 2TB drives (reduced to 9 after 30 minutes, as noted below).

 

I configured the system with the two newest 2TB drives as parity, plus six 2TB and two 1.5TB data drives, and started the array.

After about 30 minutes I checked on the status and noticed that one of the 2TB drives had already logged 5 read errors, so I decided to simply reduce the array by one drive and stopped the parity sync.

 

Anomaly #1:  I then did a New Config, and assigned the same two parity drives and 7 of the data drives (excluding the one I didn't want to use) ... and then Started the array.    The system did NOT start a parity sync, but claimed that parity was already good  :)    Clearly it was NOT.

 

So I then did another New Config, and assigned only the 7 data drives -- leaving parity unassigned.    I then Started the array; formatted the data drives; stopped the array; assigned the two parity drives ... and Started the array, and the system then started a parity sync with no problem.

 

Anomaly #2:  During the parity sync, I would check on the status about every 30-60 minutes.  More often than not, when I loaded the page, only the "bottom section" was shown -- the area where the disks are displayed on the Main tab (the section labeled "Array Devices") was blank.  A refresh of the page would fix this ... but it happens VERY often.

 

The sync just finished before I wrote this, so I haven't had a chance to really do much with it yet.  I'm planning to copy several TB of data; "fail" a drive (yank it out mid-operation); and then fail a 2nd drive while rebuilding the "failed" one over the next few days.  I presume all will work perfectly, but just thought I'd mention the behaviors I noted when setting it up.

 

Link to comment

Anomaly #1:  I then did a New Config, and assigned the same two parity drives and 7 of the data drives (excluding the one I didn't want to use) ... and then Started the array.    The system did NOT start a parity sync, but claimed that parity was already good  :)    Clearly it was NOT.

 

Are you sure you didn't check the "parity is already valid" checkbox by mistake? It has happened to me a few times, since it's in the same place as the "I'm sure I want to do this" checkbox (when displayed).

 

I've done tens of new configs with single and dual parity, and a parity sync always begins unless the trust-parity box is checked.

Link to comment

Is there any way to still mount the drives without an Internet connection to access my data? I wasn't aware of the requirement, and now I'm unable to get at the shares due to not having access to the net.

Or is there documentation for configuring internal wifi cards?

Link to comment

Is there any way to still mount the drives without an Internet connection to access my data? Wasn't aware of the requirement and now am unable to get hold of the shares due to not having access to the net.

Or is there documentation for configuring internal wifi cards?

 

Probably easiest to downgrade to v6.1, just copy bzroot and bzimage to the flash and reboot.
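
In case it helps anyone unsure which files are involved, here's a minimal sketch of that copy step, assuming the flash drive is mounted at /boot and you have the v6.1 bzroot and bzimage sitting in a local folder (the source path below is just a placeholder), backing up the current files first:

    import shutil
    from pathlib import Path

    FLASH = Path("/boot")                           # unRAID flash drive mount point
    V61_DIR = Path("/mnt/user/temp/unraid-6.1.9")   # placeholder: wherever you unpacked the v6.1 release

    for name in ("bzroot", "bzimage"):
        src = V61_DIR / name
        dst = FLASH / name
        shutil.copy2(dst, FLASH / (name + ".beta21.bak"))   # keep a backup of the 6.2 files
        shutil.copy2(src, dst)                              # drop in the v6.1 files
        print("replaced", dst, "with", src)

    print("Now reboot the server to finish the downgrade.")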

Link to comment


And later on simply revert back by copying originals back? Any considerable risk involved with this move?

Link to comment

And later on simply revert back by copying originals back? Any considerable risk involved with this move?

 

No risk for the array; the only thing is that if you're using dual parity, the second parity drive will be unassigned on v6.1.

 

If you're using VMs, I'm not sure whether they'll keep working after downgrading without any config changes.

Link to comment


Will avoid spinning VMs up. What about dockers, should be fine?

Link to comment

Anomaly #1:  I then did a New Config, and assigned the same two parity drives and 7 of the data drives (excluding the one I didn't want to use) ... and then Started the array.    The system did NOT start a parity sync, but claimed that parity was already good  :)    Clearly it was NOT.

 


Maybe related, but probably not.

I just finished my scheduled 2-month parity check and it came back with 60748078 errors...

 

I was one of the people who had trouble with the freezing server and the resulting hard resets. I also reproduced these freezes and tried my best to make sure all of my disks were indeed spun down before cutting power.

 

I always wondered why, after rebooting, unRAID never started a parity check; I assumed it was thanks to my precautions. I have not yet seen any corrupt data, inside or outside of any VM, so I think (and hope) that the data was written correctly, but because of the lockup in the unRAID driver the changes didn't make it to parity. No disks, parity or data, show any errors or SMART warnings.

While the lock-ups were definitely annoying, I think not starting a parity check when one should run is a more severe issue.

 

The other thing that may have something to do with it: I switched one of the disks from XFS to ReiserFS and back again.

But I always moved every file to another disk before I took the array offline, changed the filesystem, started the array, and hit "Format" in the GUI. I think that should not invalidate parity; or, since it's such an easy task through the GUI, it should at least warn the user that it does and start a new parity sync.

 

However, the most concerning thing for me is that the scheduled parity check runs with "nocorrect", in case I need to manually work out whether parity or the data is wrong.

With 60748078 errors and nocorrect, shouldn't parity be invalidated? It still states "parity valid"...

 

I am tempted to run a manual parity check with "write corrections", but I would be willing to run another "check only" if you think it helps to find the issue.

 

The last option would be that I am simply confused and everything is as it should be, but an explanation would be nice in that case :)

 

Jun  6 01:00:01 unRAID kernel: mdcmd (117): check NOCORRECT
Jun  6 01:00:01 unRAID kernel: 
Jun  6 01:00:01 unRAID kernel: md: recovery thread: check P ...
Jun  6 01:00:01 unRAID kernel: md: using 1536k window, over a total of 5860522532 blocks.
Jun  6 01:00:11 unRAID kernel: md: recovery thread: P incorrect, sector=0
Jun  6 01:00:11 unRAID kernel: md: recovery thread: P incorrect, sector=8
Jun  6 01:00:11 unRAID kernel: md: recovery thread: P incorrect, sector=16
[... identical "P incorrect" lines continue for every 8-sector step through sector=792 ...]
Jun  6 01:00:11 unRAID kernel: md: recovery thread: stopped logging
Jun  6 17:55:24 unRAID kernel: md: sync done. time=60923sec
Jun  6 17:55:24 unRAID kernel: md: recovery thread: completion status: 0
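
If anyone wants to tally those entries from a saved syslog (from the diagnostics zip, for example), here's a rough sketch; note the kernel stops logging individual sectors after a while, so the error counter shown in the webGui is the authoritative total:

    import re
    from pathlib import Path

    SYSLOG = Path("/var/log/syslog")   # or point this at the syslog extracted from the diagnostics zip
    pattern = re.compile(r"md: recovery thread: P incorrect, sector=(\d+)")

    sectors = [int(m.group(1))
               for m in map(pattern.search, SYSLOG.read_text(errors="ignore").splitlines())
               if m]

    print(len(sectors), "'P incorrect' entries logged")
    if sectors:
        print("first logged sector:", sectors[0], "last logged sector:", sectors[-1])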

 

Diag. & screenshots attached...

parity.PNG

scheduler.PNG

unraid-diagnostics-20160606-1915.zip

Link to comment

I just finished my scheduled 2-month parity check and it came back with 60748078 errors...

 

* The syslog unfortunately has been edited, and while it's OK to edit out the files moved, we really need the rest of the syslog.  If there's an issue, the 2 most important parts of the syslog are the part where the errors begin AND the initial setup, all of it.  If either of those is missing, we have to speculate and can't do a very good analysis.  Your syslog is missing the entire beginning, so perhaps there's a syslog.1 or syslog.2 in /var/log?
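
For what it's worth, here's a quick sketch for bundling the current syslog together with any rotated copies so the whole thing can be attached (this assumes plain-text syslog, syslog.1, syslog.2 naming in /var/log):

    from pathlib import Path

    log_dir = Path("/var/log")
    # rotated copies are older, so write them first (highest number = oldest), then the live syslog
    parts = sorted(log_dir.glob("syslog.*"), reverse=True) + [log_dir / "syslog"]

    with open("/boot/full-syslog.txt", "w", errors="ignore") as out:
        for part in parts:
            if part.is_file():
                out.write("===== " + part.name + " =====\n")
                out.write(part.read_text(errors="ignore"))

    print("Wrote /boot/full-syslog.txt -- attach that instead of the trimmed log.")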

 

* You booted some time on the 25th, but the syslog begins on the 29th, and very unfortunately there was some kind of kernel crash on the 28th, involving KVM.  It caused a register dump in the KVM log, but it would be very useful to know what appeared in the syslog.  Once a critical event like that happens, I would never consider the system stable afterwards, and that makes everything that occurred after it suspect.  If you detect a kernel crash of any kind, it's best to reboot, even if the system appears to have recovered.

 

* The unusual behaviors you mentioned do seem wrong, but since the system is 'suspect', there's not much we can conclude, as we don't know if it's truly from a bug, or just a corrupted system from the earlier 'crash' event.

 

* *If* all was fine with the system, then I would have to say that parity had never been built, as every single parity block was wrong, until logging was stopped.

 

* Something I noticed, and have found in other 6.2-beta21 diagnostics with no second parity drive assigned: the second parity drive is marked as DISK_NP_DSBL, not DISK_NP; that is, it's marked as "disabled", not "not present".  And the vars for the system do not count the second parity drive, but they do count a disabled drive and an invalid drive, even though you don't have any invalid or disabled drives.

    [mdDisabledDisk] => (null)

    [mdInvalidDisk] => (null)

    [mdMissingDisk] => (null)

    [mdNumDisks] => 6

    [mdNumDisabled] => 1

    [mdNumInvalid] => 1

    [mdNumMissing] => 0

...

    [sbNumDisks] => 7

You have 5 data drives and 1 parity drive, making 6 array drives, none of which are invalid or disabled.  I'm not sure what the 7 is counting.  One conjecture: when looping through the drive count, if the parity function checked whether there are disabled disks, it would see a positive count and possibly check the var (mdDisabledDisk) for its index number, which would evaluate to zero, meaning skip the parity drive.  This conjecture seems improbable though, as more users would have hit this situation too.
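
Purely to illustrate that conjecture (this is hypothetical pseudologic, not actual unRAID code): if the sync loop consulted the disabled-drive counter and then looked at mdDisabledDisk for the slot to skip, a null value falling back to 0 would point at the parity slot:

    # Hypothetical illustration of the conjecture above -- NOT real unRAID code.
    md_num_disabled = 1       # the phantom count coming from the DISK_NP_DSBL second-parity slot
    md_disabled_disk = None   # "(null)" in the vars dump

    for slot in range(6):     # slot 0 = parity, slots 1-5 = data drives in this example
        if md_num_disabled and slot == (md_disabled_disk or 0):
            print("slot", slot, "skipped as 'disabled' -> parity would never be written")
            continue
        print("slot", slot, "included in the sync")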

Link to comment

What about dockers, should be fine?

 

Yes.

 

After removing the USB flash drive from the box, its filesystem got damaged somehow and wasn't mountable. After resolving that issue, the downgrade was a success, to a degree: no Docker containers or VM images can be seen under their respective tabs in the webUI. Can someone confirm whether this is expected?

 

Go to the Docker tab and click Add Container; all your previous dockers should appear under the user-defined templates. Just add them again and the settings will remain the same.

Link to comment

Just a brief update, we are pretty confident we have discovered the bug causing deadlocks and system hangs and are in the process of testing patched code now before rolling out a new release.  Thank you all for your patience with us as we worked to get to the bottom of this very nasty bug.

 

May I vote for a beta release that just fixes this nasty bug (e.g. 6.2.0-beta21a)?

 

After countless hard reboots caused by frozen unRAID machines, several of my drives are gone. No cable problem, no adapter problem; they are just dead.

 

If a machine freezes it is no longer usable. Even a graceful IPMI shutdown is not possible; I have to use the power reset from IPMI. I don't know how many forced power-offs a typical drive survives ...

 

IMHO, it doesn't make sense to test Docker, KVM or whatever if a core functionality like copying one file over SMB can freeze a machine.

 

Just my 0.02.

 

Thanks for listening.

 

Link to comment

After countless hard reboots caused by frozen unRAID machines, several of my drives are gone. No cable problem, no adapter problem; they are just dead.

Stop Array.  Go to Settings --> Disk Settings and change the num_stripes tunable from 1028 to 8192.  Save.  Start Array.

 

That should work around the deadlock / unresponsive web UI issues during heavy I/O for now.
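
If you want to double-check the change from the console, here's a small sketch; I'm assuming the setting is persisted as md_num_stripes in /boot/config/disk.cfg (the key name and location are my assumption; the Settings page is the supported way to view it):

    from pathlib import Path

    cfg = Path("/boot/config/disk.cfg")   # assumed location of the persisted disk settings
    for line in cfg.read_text().splitlines():
        if line.startswith("md_num_stripes"):
            print(line)                   # expect something like md_num_stripes="8192" after the change
            break
    else:
        print("md_num_stripes not found -- the key name may differ on this release")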

 

The setting above has fixed the lockups for me. Have you applied this temporary workaround, and do you still get lockups?

Link to comment
This topic is now closed to further replies.