unRAID Server Release 6.1.3 Available


limetech


It seems to have sped up and is now at over 100MB/sec with 62% done. Weird. I guess we'll see how quick it does it when it finishes overnight!

Looking more carefully at your screenshot, your average speed is not that bad: 929GB in 2:44h is about 95MB/s average. If nothing else was using the array, the momentary slowdown could be a disk hitting some slow sectors.
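For the curious, the arithmetic behind that figure: 2:44 is 9,840 seconds, so 929 GB × 1024 MB/GB ÷ 9,840 s ≈ 97 MB/s (or ≈ 94 MB/s if you count a GB as 1,000 MB); either way it comes out to roughly 95 MB/s.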

 

I also experience slowdowns during parity checks.

I just checked the performance of my disks, and one disk shows a lower average speed; the graph shows low performance over the first 2 TB of the disk (see attachment). Can this somewhat troublesome disk cause the bad speeds?

I would suggest attaching a SMART report for disk 11.  Something is going on with that disk and/or its interface (i.e., its controller or cabling).

All my disks are in CSE-M35T drive cages. I already relocated this disk to another slot by swapping disks, so it is using another cable and channel on the controller. It still shows the same result.

The SMART report is attached.

 

The short test is not sufficient.

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     24865         -

 

 

The drive spin down timer needs to be disabled temporarily and a long/extended test of the whole surface needs to be executed.
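If anyone wants to do that from the command line rather than the GUI, a rough sketch (assuming the drive is /dev/sdX; the per-disk spin-down delay can also be set to 'Never' in the webGui while the test runs):

# keep the drive from spinning down during the test (0 disables the standby timer)
hdparm -S 0 /dev/sdX

# start the long/extended self-test; it runs in the background on the drive itself
smartctl -t long /dev/sdX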

196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
5   Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0

Nothing else seems to stand out.

 

Another choice might be to run a badblocks test in read-only mode over the whole drive, which may or may not trigger any events for weak sectors.  However, badblocks and kernel reads are retried, so the SMART long/extended test will reveal a problem earlier.
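For reference, a non-destructive read-only badblocks pass over the whole drive would look something like this (a sketch; sdX is your device, and -b 4096 simply matches the 4K sector size of most large drives):

# read-only surface scan with progress (-s) and verbose output (-v)
badblocks -sv -b 4096 /dev/sdX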

 

The extended test will take approximately 8~9 hours to finish, as estimated by the SMART recommended polling time.

Extended self-test routine
recommended polling time:     ( 492) minutes.

 

I started a SMART extended self-test this morning in the GUI; it is now at 80%. It has been running for about 7 hours.

 

Will it show any output?
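For anyone wondering the same: the test runs on the drive itself, so it produces no console output, but from the command line the progress and final result can be read back at any time (a sketch, assuming the device is /dev/sdX):

# while the test runs, this shows "Self-test execution status" with the percent remaining
smartctl -a /dev/sdX | grep -A 1 "Self-test execution"

# once finished, the result appears at the top of the self-test log
smartctl -l selftest /dev/sdX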

Link to comment

The only thing that I saw was the Throughput_Performance.  You have a number of Hitachi Deskstar 7K3000 drives in this array.  Have a look at the SMART reports of those disks to see if anything stands out.

You can do this quickly by clicking on 'Disk 11' on the 'Main' tab and then on the 'Attributes' tab on that page.  Look at your other Hitachi Deskstar 7K3000s and see how that number compares.
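If you'd rather compare them from a terminal than click through each disk in the GUI, a quick sketch (it simply prints the Throughput_Performance attribute line for every sd device; drives that don't report that attribute just print nothing):

for d in /dev/sd[a-z]; do
  echo "== $d"
  smartctl -A $d | grep -i Throughput_Performance
done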

 

This value is comparable on these drives.

Link to comment

I attach the SMART report taken after the extended test.

 

SMART report is essentially perfect.

 

The only thing wrong is the Power_On_Hours VALUE of 97.  After 25000 hours of operation, it says it's used up only 3% of expected life!  That extrapolates out to 800,000 hours, an absurdly optimistic projection!
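For anyone checking that figure: the normalized value has dropped 3 points out of 100 over roughly 25,000 hours, so a straight-line extrapolation gives 25,000 / 0.03 ≈ 833,000 hours, i.e. the ~800,000 hours mentioned above.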

Link to comment

Perhaps your interpretation of that value is wrong? Or that of the manufacturers.

Link to comment

The numbers aren't percentages => I'm not sure just how they compute the value, but I've found that it drops rapidly after about 30,000 hours, but not much until then.    I've got drives with over 50,000 hours that show values in the mid-30's for this parameter.

 

In any event, I wouldn't worry about the power-on-hours value anyway ... it's not a parameter that will cause SMART to fail a disk => it's simply reporting the # of hours.

 

Link to comment

I made the 1-hour parity check test with all the v6 versions I could find in my backups.

Unless noted otherwise, all tests were done without Docker, cache_dirs, or other use of the webGui (it was only used to start and stop the parity check), with no other activity on the array, and with the nocorrect option.

All tests were done after a fresh boot, waiting approx. 5 minutes after boot for things to settle down; for the Docker tests I waited approx. half an hour to make sure the startup jobs were done.

Average speeds were calculated by me as current_position*1024/3600.
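(As a worked example of that formula, the first 6.1.3 row below: 326 × 1024 / 3600 ≈ 92.7 MB/s.)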

 

Version  Position at 1H    Avg. Speed    Notes
6.1.3    326 GB (10.9%)     92.73 MB/s   Correct
6.1.3    341 GB (11.4%)     97.00 MB/s   NoCorrect
6.1.2    377 GB (12.6%)    107.24 MB/s   Test #1
6.1.2    380 GB (12.7%)    108.09 MB/s   Test #2
6.1.1    391 GB (13.0%)    111.22 MB/s
6.1.0    378 GB (12.6%)    107.52 MB/s
6.0.1    384 GB (12.8%)    109.23 MB/s

6.1.3    151 GB (5.0%)      42.95 MB/s   With Docker & Cache Dirs
6.1.2    178 GB (5.9%)      50.63 MB/s   With Docker & Cache Dirs
6.1.0    173 GB (5.8%)      49.21 MB/s   With Docker & Cache Dirs
6.0.1    216 GB (7.2%)      61.44 MB/s   With Docker & Cache Dirs

(I lost my 6.1.1 backup because of rotation, so I can't test 6.1.1 with Docker.)

 

Hope this is clear enough and helps to get this issue fixed in the next version.

Link to comment

SMART seems perfect yet read performance seems degraded.

diskspeed_sdu.zip

Link to comment

My GUESS would be that this drive is having problems reading some rather large areas on that drive.  The first thing to realize is that read errors are expected on all modern hard drives; there are error-detecting and error-correcting schemes to address this.  As I understand it, when a read operation fails the checking step, the disk will re-read the data, and if that fails it will make a third try.  If that is unsuccessful, there is a longer error-correcting encoding, requiring an even larger block of data, that is used to see if the data can be reconstructed.  (Of course, all of this takes much more time than a read where the data is correct or easily correctable.  Normally this would probably not be noticeable in average usage, but when you are doing hundreds of millions of reads in a few hours --- like in a parity check --- it can become significant.)  I am not sure at what point the Hitachi read algorithm decides that a sector should be marked for reallocation, but obviously this disk has not yet reached that point.

 

I am sure the question that you are asking is, "Should I replace this disk?"  You have twenty disks in this array.  With single (error-correcting) parity, you can easily recover from a single disk failure.  What you don't want is for another disk to fail and then have problems with this drive during the rebuild.  You have to decide what your level of tolerance is for such an event.

Link to comment
My GUESS would be that this drive is having problems reading some rather large areas on that drive.

 

This is quite possible.  I have seen three drives which exhibited very poor read performance/errors on one area of the drive.  In all cases the SMART test 'passed'.

 

The last one, a WD 3TB Red, was showing an ever-increasing Raw Read Error Rate.  Movies played from this drive would show spells of 'Buffering....' for minutes at a time.  Read errors were showing in the unRAID syslog, but the drive never 'red-balled'.  I replaced the drive with another, similar drive; the system is now back to normal.  I attempted a preclear on this dodgy drive, but never had long enough between power cuts for the pre-read to complete.

 

I will attempt to RMA this drive, which is only nine months old.

Link to comment

I've read most of this thread, but not every thread on the board. Has the slow parity check/build problem been identified? This usually takes about 14 hours on my setup (6 drives on motherboard SATA, 2 on a 1430SA), not 31 hours!  :-\

 

 

Is that from an HP MicroServer? I've never had any issue with mine, always good constant speed in every unRAID release, but I only have 4 array disks + cache.

 

My last parity check:

 

Last checked on Wed 28 Oct 2015 06:20:53 AM GMT (today), finding 0 errors. 
Duration: 10 hours, 12 minutes, 35 seconds. Average speed: 108.9 MB/sec

Yes, that's what I got before this release. Are you running 6.1.3? I have an Adaptec 1430SA as well, so this release appears not to like it (and others like the SAS2LP?)

 

This issue has been resolved !!  8) 8)

 

Thanks to Eric Schultz, who found that changing nr_requests to a value lower than the default of 128, for all drives with certain interface characteristics [specifically any drive using the 'ATA8-ACS' spec], resolves this issue.    The simplest thing is to just do it for all of your drives ... i.e. type the following line for EVERY drive in the array:

 

echo 8 > /sys/block/sdX/queue/nr_requests

 

... where sdX is, of course, the actual identifier for each drive (sda, sdb, etc.).    No need to do this for your cache drive or USB flash drive.
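If you have a lot of drives, a small loop saves some typing.  This is just a sketch -- the device names below are placeholders, so substitute your actual array devices -- and note the setting does not survive a reboot unless the lines are also added to your startup script (e.g. the go file on the flash drive):

# apply the lower request queue depth to each array device (replace sdb sdc sdd with your own)
for dev in sdb sdc sdd; do
  echo 8 > /sys/block/$dev/queue/nr_requests
done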

 

Note that the 1430SA cards have a Marvell chip on them, which is why they are causing this issue.

 

In any event, Eric has clearly isolated this issue -- I tried it and it works perfectly => my parity check speeds are back to what I always had with v4 and v5.

 

Eric's detailed post is here (and follows through the rest of the thread):  http://lime-technology.com/forum/index.php?topic=42629.msg417261#msg417261

 

 

Link to comment

Dang it, Ed stole my thunder!  :-[

 

-Eric

Link to comment

The numbers aren't percentages => I'm not sure just how they compute the value, but I've found that it drops rapidly after about 30,000 hours, but not much until then.    I've got drives with over 50,000 hours that show values in the mid-30's for this parameter.

 

Technically, you're correct of course, they aren't percentages.  But when the number range runs from 100 to 1 (or 200 to 1, so you divide in half), and until your report they had always appeared to me to behave linearly, they behave just like the equivalent percentage number.  In general, I have found that if you assume relatively ideal conditions over a 5-year period, you get hour numbers from 60,000 to 100,000 (depending on drive quality?), and the manufacturers appear to have set that as the scale maximum for Power_On_Hours.  They assume that at least a few of their drives might reach that.  That makes the current value a useful number to check.

 

You indicate you have observed nonlinear behavior, something I haven't seen and haven't heard of before.  I'll have to take a closer look and watch for that; I may have been wrong.  At the same time, though, you generally purchase only top-quality drives (or near it), which would be expected to have a longer lifetime.  If you set the max life at 150,000 hours (is it a WD Red NAS or HGST NAS?), then you get the numbers you indicated, mid 30's at 50,000.  But there's no rule for what vendors will do.  Once we understand how a vendor is setting it though, it makes the number more predictable and interesting.
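To put the linear assumption being discussed into a formula: VALUE ≈ 100 × (1 − power_on_hours / H_max), where H_max is whatever maximum lifetime the vendor has chosen for the scale; the disagreement here is really about what H_max is and whether the decline is actually linear at all.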

Link to comment

... Once we understand how a vendor is setting it though, it makes the number more predictable and interesting.

 

Therein lies the real key  :) :)

... unfortunately, the logic by which they set the current values -- and, for that matter, the basic meaning of many of the parameters -- is a tightly held secret that the vendors don't provide any insight into.    Different vendors show differing sets of parameters, and none of them really provide details on which parameters are "important" and which are simply "interesting".

 

Link to comment

Note that if you read on past my original post, my setup with a 1430SA does not in fact have a severe slowdown during parity build.

 

ETA: I'll try it with a parity check later. 

Link to comment

It just shows the value of the configurability of unRAID, combined with the value of the user forum and its contributors.

A combination of ATA8-ACS drives and Marvell chipsets causes a problem, and luckily a user, Mr. Schultz this time, finds a setting that can be changed. Hats off to him. Job well done!

I still have a SAS2LP card in stock, but now use an M1015 and the like.

My slowdown issue seems to be solved too, with the help of a script from, again, another user on this forum: diskspeed.sh. It showed that one disk had severe read problems, bringing the parity check to a crawl. After replacing the disk the situation was fixed. It showed once again that parity check speed is determined by the slowest disk in the array.

 

 

Link to comment

What is the status of the webGui hanging issue? I still have problems with an unresponsive webGui. After an unRAID server restart, the webGui is unreachable for 5-6 minutes. All other services work normally, and ping to the unRAID server is fine.

When I switch between tabs in the webGui, it sometimes freezes for a minute or more.

unRAID version: 6.1.3 - if I remember correctly, I had the same problems with the 6.1.2 version.

Link to comment

I have the same issue, but I just hit refresh in the browser, which gets things moving again. It's been happening for a few months, but since a refresh fixes it I didn't really think about it.

Link to comment
