
Posts posted by pwm

  1. Note that a disk reporting one offline uncorrectable sector doesn't mean the disk has to be toast.

     

     But it does mean that, statistically, there is an increased probability that the drive will degrade further - or fail completely - within a limited time span. Some disks just pick up a bad sector from a surface defect that wasn't noticed during the original factory scan. But the problem may not be just a tiny spot: it can be a larger bad surface area, or an issue with a head or some other part of the drive, in which case it is dangerous to continue using the drive.

     

     It also means there is one sector that can't be read back correctly, because the error correction code (ECC) for that sector isn't strong enough to correct the bit errors. If you already know the contents of that sector and overwrite it, the disk can remap the data to a spare sector, giving your RAID a full set of disks with all-correct data again.

     

     As johnnie.black notes, you most definitely do not want to rebuild your parity at this stage, since the current parity is one way to recompute the contents that should have been stored in the offline uncorrectable sector (unless you happen to have a backup of the specific file that uses this particular disk sector).

     

     Anyway - after an extended SMART scan, the disk will be able to tell which sector it finds the first error on. The scan may also turn up additional bad sectors.
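     As a rough sketch of how to run those checks with smartmontools (smartctl must be installed, and /dev/sdX is a placeholder for the affected disk - not something from the original post):

```shell
#!/bin/sh
# check_sectors filters a `smartctl -A` attribute dump down to the three
# counters relevant here. It reads stdin, so it can also be fed a saved
# report. smartctl comes from the smartmontools package; /dev/sdX is a
# placeholder device name.
check_sectors() {
    grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}

# Typical use against the live disk (needs root):
#   smartctl -A /dev/sdX | check_sectors
#   smartctl -t long /dev/sdX      # start the extended self-test
#   smartctl -l selftest /dev/sdX  # afterwards: log lists the LBA of the first error
```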

  2. Your problem sounds a lot like a heat issue. If heat is the cause, the first test run lasts longer, while an immediate restart of the test fails earlier because the affected component reaches its critical temperature more quickly.

     

     Note that the load on the PSU is high during a parity scan. Unstable power can make the CPU, memory, motherboard, ... malfunction and give exactly this kind of issue. Yet you can run any number of CPU or memory tests afterwards without seeing a problem, since individual component tests put a lower total load on the system, letting the PSU supply stable voltages. Component tests also don't raise the temperature inside the box as much. During a parity scan you have both a high power load on the CPU and generally warmer intake air for the PSU.
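     One way to catch this is to log temperatures while the parity scan runs and see whether the failure point lines up with a climb. A minimal sketch - it assumes lm-sensors and smartmontools are installed, and both /dev/sdb and the "Core 0" sensor label are placeholders, not anything from the original post:

```shell
#!/bin/sh
# drive_temp extracts the raw value (column 10) of SMART attribute 194
# (Temperature_Celsius) from a `smartctl -A` dump on stdin.
drive_temp() {
    awk '$2 == "Temperature_Celsius" { print $10 }'
}

# Log CPU and drive temperature once a minute; start in the background
# just before the parity check, e.g.:  log_temps >> /boot/temp.log &
log_temps() {
    while :; do
        printf '%s cpu:%s drive:%s\n' \
            "$(date '+%F %T')" \
            "$(sensors | awk '/^Core 0/ { print $3; exit }')" \
            "$(smartctl -A /dev/sdb | drive_temp)"
        sleep 60
    done
}
```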

     

     

  3. Thanks for the answer. I don't run any docker apps on the unit - I have a separate application server machine. But beta4 is so old that it still suffers from a timing issue in the RAID code that makes it regularly report a write error when copying new files to the unit. I more-or-less stopped using this specific NAS for anything other than streaming out data, as I got tired of waiting for a fix for this specific problem.

     

     But if this update goes well, then I have an unRAID 5 installation to update too - and there is documentation covering the steps for going from version 5 to version 6.

  4.  

    Ok, first and foremost, can you reboot into safe mode to see if the parity check performance issue continues?  I want to see if this is plugin related since I see you're using a bunch.  Just a quick "sanity check" before we move on in troubleshooting...

     

    Thought I uninstalled the old plugins from v5, guess not. I did have apcupsd and unmenu running, disabled unmenu and booted into safe mode, showing 1.1MB/s now, current position 533MB after 5 minutes, going to let it continue to run.

     

     EDIT: Think I might've found my problem. Went through and read a large file from /mnt/disk* for each drive with dd; all drives came back with 100+ MB/s except one. That drive is dying, isn't it?

     

    That fixed it, swapped drives around now rebuilding at ~110MB/s. Guess I had a drive decide to fail at the exact same time I decided to try beta6. Thanks for the help jonp

     Have you verified that you really had a bad drive? You might also have had an issue with the cabling, resulting in transfer errors and the disk transfer mode switching from a fast DMA mode to an extremely slow and CPU-intensive PIO mode.

     

    On the other hand - lack of automatic supervision of the disks means that a disk can be bad for quite some time without it being noticed.
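     The per-disk read test described above can be sketched like this (the exact dd invocation and the dmesg patterns are assumptions about typical Linux kernel messages, not unRAID specifics):

```shell
#!/bin/sh
# read_speed reads a file sequentially and prints dd's summary line,
# which includes the achieved throughput.
read_speed() {
    dd if="$1" of=/dev/null bs=1M 2>&1 | tail -n 1
}

# Test one large file from each data disk:
#   for d in /mnt/disk*; do
#       f=$(find "$d" -type f -size +100M 2>/dev/null | head -n 1)
#       [ -n "$f" ] && { echo "== $d =="; read_speed "$f"; }
#   done
#
# A cabling problem often shows up in the kernel log as the transfer
# mode being stepped down:
#   dmesg | grep -iE 'limiting SATA link speed|PIO[0-9]|exception Emask'
```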

  5. I definitely hope plugin support will not go away.

     

    I don't want virtual machines for the task. I have no cache disk and if I had room for a cache disk I would use that space/cable for one more data disk.

     

    And I want my disks to sleep which is an advantage we get from booting from a USB thumb drive. With virtual machines, we'll get lots of extra processes that will regularly want to make disk accesses.

  6. Just noticed something interesting while wondering why a new N54L with two 4TB Seagate disks only builds parity at 40MB/s.

     

     Most subsystems think that

    /dev/sda = flash thumb drive

    /dev/sdb = 4TB disk

    /dev/sdc = 4TB disk

     

     When using hdparm -t (or -T) to test transfer speeds, the mapping is:

    /dev/sda = 4TB disk

    /dev/sdb = flash thumb drive

    /dev/sdc = 4TB disk

     

    For other hdparm commands, the devices are in the expected order.

     

     I have never seen hdparm goof like this on any other system - is there any special driver-layer code in unRAID that might give this weird behavior?
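     One way to rule out confusion about which physical drive each letter currently points at is to go through the persistent symlinks instead of the letters. A sketch, assuming GNU ls column layout and standard udev /dev/disk/by-id links (none of this is from the original post):

```shell
#!/bin/sh
# resolve_letters turns `ls -l /dev/disk/by-id` output on stdin into
# "persistent-id -> sdX" pairs, skipping partition entries. Column
# positions ($9 = name, $10 = "->", $11 = target) assume GNU ls
# long-listing format.
resolve_letters() {
    awk '$10 == "->" && $9 !~ /-part[0-9]+$/ {
        n = split($11, p, "/"); print $9, "->", p[n]
    }'
}

# Typical use, plus a serial-number cross-check of a suspect device:
#   ls -l /dev/disk/by-id | resolve_letters
#   hdparm -I /dev/sda | grep -i 'serial number'
```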
