Unraid

My unRaid is in shambles, please help me fix it...

September 30, 2010

Okay, my server is messed up... I haven't even turned it on in 3 weeks because I've been busy, but it's time to fix it so I can listen to music and watch movies again...

First, I have HPA on a drive:

Sep 29 18:22:29 Tower kernel: sde1

Sep 29 18:22:29 Tower kernel: ata5.00: HPA detected: current 2930275055, native 2930277168

Sep 29 18:22:29 Tower kernel: ata5.00: ATA-7: SAMSUNG HD154UI, 1AG01118, max UDMA7

Sep 29 18:22:29 Tower kernel: ata5.00: 2930275055 sectors, multi 0: LBA48 NCQ (depth 31/32)

I tried this:

root@Tower:~# hdparm -N p2930277168 /dev/sde

and got this:

/dev/sde:

setting max visible sectors to 2930277168 (permanent)

SET_MAX_ADDRESS failed: Input/output error

HDIO_DRIVE_CMD(identify) failed: Input/output error

More seriously, my parity is useless because I have two bad drives.

I did a rebuild-tree on one of them, and even after that it's still not mountable. The other one gives me a "Bad root block 0" error every time I try any kind of reiserfsck.

I've tried juggling the drives around to different backplanes/cables/power cables, and the results have been identical; I'm thinking I just got unlucky, rather than having a problem in the machine.

Any advice on data recovery for these two drives?

Lastly, I do get an odd warning in the syslog every time I turn it on:

Tower kernel: ACPI Warning: Incorrect checksum in table [OEMB] - 01, should be F4 (20090903/tbutils-314)

What does this mean? Is it dire?

September 30, 2010

Before anything else, you need to disable the HPA (Save BIOS to disk option) in the BIOS... otherwise when one disk fails, the system can then wipe out the next one as it looks for some place to put the HPA.

September 30, 2010

Author

My motherboard doesn't have HPA... the drive was used in my desktop before which did.

September 30, 2010

My motherboard doesn't have HPA... the drive was used in my desktop before which did.

Then you don't need to do anything if you don't want to. The few megabytes occupied by the HPA do no harm.
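
For scale, the size of the hidden area can be worked out from the two sector counts in the "HPA detected" syslog line above. This is only a sketch of the arithmetic; the 512-byte sector size is an assumption, not something the log states.

```python
# Sector counts copied from the "HPA detected" syslog line in the first post.
current_sectors = 2930275055  # sectors currently visible to the OS
native_sectors = 2930277168   # sectors the drive really has

# Assumption: 512-byte logical sectors (typical for drives of this era).
SECTOR_BYTES = 512

hidden_sectors = native_sectors - current_sectors
hidden_bytes = hidden_sectors * SECTOR_BYTES
hidden_mib = hidden_bytes / (1024 * 1024)

print(f"{hidden_sectors} sectors hidden, about {hidden_mib:.2f} MiB")
```

Under that assumption the HPA on this drive hides 2113 sectors, roughly 1 MiB.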

October 1, 2010

Author

Okay cool... guess HPA only matters if it's changing, or on the parity drive (cause it would make it smaller than the equivalent data drives?)

Anyways, magically, one of my disks is all good again!!! :D

It was the most full too...

Going to tinker with the other problematic drive for a bit before I rebuild parity...

And I need to CC35 up some of those drives on CC34 before they can go off...

October 1, 2010

Author

just got a kernel panic while I was telnet'd...

Message from syslogd@Tower at Thu Sep 30 23:13:54 2010 ...

Tower kernel: Oops: 0000 [#1] SMP

Message from syslogd@Tower at Thu Sep 30 23:13:54 2010 ...

Tower kernel: Stack:

Message from syslogd@Tower at Thu Sep 30 23:13:54 2010 ...

Tower kernel: Process emhttp (pid: 1584, ti=c41a6000 task=f779c000 task.ti=c41a6000)

Message from syslogd@Tower at Thu Sep 30 23:13:54 2010 ...

Tower kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/host0/port-0:3/end_device-0:3/target0:0:3/0:0:3:0/block/sde/stat

Message from syslogd@Tower at Thu Sep 30 23:13:54 2010 ...

Tower kernel: CR2: 00000000f95b4000

Message from syslogd@Tower at Thu Sep 30 23:13:54 2010 ...

Tower kernel: EIP: [<f95aa8e4>] md_cmd_proc_read+0x41/0x54 [md_mod] SS:ESP 0068:c41a7ef4

Message from syslogd@Tower at Thu Sep 30 23:13:54 2010 ...

Tower kernel: Call Trace:

Message from syslogd@Tower at Thu Sep 30 23:13:54 2010 ...

Tower kernel: Code: 55 f0 e8 6a c2 b8 c7 8d 50 01 29 f2 39 d3 7c 0b 8b 45 0c 89 d3 c7 00 01 00 00 00 8b 45 f0 89 d9 81 c6 ac fd 5a f9 c1 e9 02 89 38 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 5a 89 d8 5b 5e 5f 5d c3 55 89

October 2, 2010

Author

Removed that drive... it was only 1TB anyways, rather get a 2TB in there...

Anyways, I finished rebuilding the tree on the other drive. It did save most of the data, but the drive is slow now; it takes a long time to load directories and such. Is there some kind of defrag I should be doing? Or should I move the files to a freshly cleared drive, then reclear it?

October 2, 2010

Author

Okay, after deleting all the lost+found fragments, the speed is okay again.

I have now fixed all the drives, and rebuilt the parity... while it was rebuilding the parity, it said it had 75 sync errors... but everything was fine, and I used the unRaid for 5-6 hours.

Now I see a green circle flashing next to my disk2... 75 errors found. What does this mean?

October 2, 2010

Okay, after deleting all the lost+found fragments, the speed is okay again.

I have now fixed all the drives, and rebuilt the parity... while it was rebuilding the parity, it said it had 75 sync errors... but everything was fine, and I used the unRaid for 5-6 hours.

Now I see a green circle flashing next to my disk2... 75 errors found. What does this mean?

The flashing green indicator means the disk has gone to sleep.

Now, do another parity check. No errors should be found.

October 3, 2010

Author

Okay, did a parity check overnight, it came up clean - 0 errors.

However, disk2 is up to 199 errors (they popped up before I ran the parity check)

Here's part of the syslog, where all the errors occurred: http://pastebin.com/KCVUtg1y

Mostly a lot of:

handle_stripe read error: 24432/2, count: 1

md: disk2 read error
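
To see how the read errors are spread across disks, the syslog can be tallied with a few lines of Python. This is a minimal sketch: the pattern is based only on the "md: diskN read error" message quoted above, and the "Tower kernel:" prefix on the sample lines is an assumption about the surrounding log format.

```python
import re
from collections import Counter

# Matches the "md: disk2 read error" style message quoted in the post.
READ_ERROR = re.compile(r"md: (disk\d+) read error")

def count_read_errors(lines):
    """Return a Counter mapping disk name -> number of md read-error lines."""
    counts = Counter()
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Illustrative lines only; the "Tower kernel:" prefix is assumed.
sample = [
    "Tower kernel: handle_stripe read error: 24432/2, count: 1",
    "Tower kernel: md: disk2 read error",
    "Tower kernel: md: disk2 read error",
]

for disk, n in count_read_errors(sample).items():
    print(disk, n)
```

On a real server the same function could be fed the lines of the saved syslog file instead of the sample list.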

October 3, 2010

Okay, did a parity check overnight, it came up clean - 0 errors.

However, disk2 is up to 199 errors (they popped up before I ran the parity check)

Here's part of the syslog, where all the errors occurred: http://pastebin.com/KCVUtg1y

Mostly a lot of:

handle_stripe read error: 24432/2, count: 1

md: disk2 read error

Those errors represent a sector that was unreadable (an uncorrectable media error).

Now, get a smart report on that drive to see what has actually happened on it. The sector(s) have probably already been re-allocated.

I think disk2 is /dev/sdf

the command would then be:

smartctl -d ata -a /dev/sdf

It is good there were no additional parity errors. That is a good sign.

October 3, 2010

Author

Now, get a smart report on that drive to see what has actually happened on it. The sector(s) have probably already been re-allocated.

I think disk2 is /dev/sdf

the command would then be:

smartctl -d ata -a /dev/sdf

Yes, it is sdf... here's the result: http://pastebin.com/pVA88hGY

October 3, 2010

It shows 1 sector pending re-allocation. (it should be re-allocated the next time the sector is written to)
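
For anyone who wants to pull the sector-health attributes out of a saved smartctl report instead of reading it by eye, here is a rough sketch. It assumes the usual ten-column attribute table that smartctl prints for ATA drives, and the sample text is a made-up illustration, not the report from the pastebin link above.

```python
# Attributes that usually matter when a disk reports unreadable sectors.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

def watched_raw_values(report_text):
    """Return {attribute_name: raw_value} for the watched SMART attributes.

    Assumes rows shaped like:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    found = {}
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found

# Hypothetical excerpt for illustration only.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
"""

print(watched_raw_values(sample))
```

A non-zero Current_Pending_Sector raw value is the "pending re-allocation" count referred to above.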

October 3, 2010

Author

Okay, thanks!

From now on I'll be running parity checks at least weekly, and keeping a precleared drive ready to go the second a disk fails...

October 3, 2010

Okay, thanks!

From now on I'll be running parity checks at least weekly, and keeping a precleared drive ready to go the second a disk fails...

Most of us do it monthly, and we do it in NOCORRECT mode. That way, if a data disk mis-behaves and returns trash it does not corrupt the parity that might be used to re-construct it.

October 3, 2010

Most of us do it monthly, and we do it in NOCORRECT mode. That way, if a data disk mis-behaves and returns trash it does not corrupt the parity that might be used to re-construct it.

What if the misbehaving one is the parity disk?

October 3, 2010

Most of us do it monthly, and we do it in NOCORRECT mode. That way, if a data disk mis-behaves and returns trash it does not corrupt the parity that might be used to re-construct it.

What if the misbehaving one is the parity disk?

It is performed in no-correction mode. You'll still see the "parity errors" occur on the interface, but they will not be automatically corrected. Determining which drive is the cause of the parity errors is a whole different problem. It could be any one of them, but at least you get to attempt to figure it out from other errors/symptoms in the log files.

If you decide the data disk is correct, you can run the normal "Check", which will change parity to reflect what is being read from the data disks.

October 3, 2010

It is performed in no-correction mode. You'll still see the "parity errors" occur on the interface, but they will not be automatically corrected. Determining which drive is the cause of the parity errors is a whole different problem. It could be any one of them, but at least you get to attempt to figure it out from other errors/symptoms in the log files.

If you decide the data disk is correct, you can run the normal "Check", which will change parity to reflect what is being read from the data disks.

This actually is what we need to think about.

From what I can see so far, unRAID takes a "trust the data disks" approach: whenever there is a parity mismatch, the parity disk is corrected with new parity. However, with one parity disk for 20 data disks, chances are that the 20 data disks together are more likely to produce trash data (an uncorrectable read error, for example) than the single parity disk during a parity check.

However, even when taking a "trust the parity disk" approach by using a NOCORRECT parity check, there is not enough information in the syslog to tell us whether the parity disk really is that trustworthy.

October 3, 2010

However, even when taking a "trust the parity disk" approach by using a NOCORRECT parity check, there is not enough information in the syslog to tell us whether the parity disk really is that trustworthy.

We are not "trusting" parity by using the NOCORRECT mode. It is just performing a parity verify instead of a parity verify/correction.

There is absolutely no way for anything to know which disk(s) are incorrect if parity is not as expected. All we know is that adding all the bits together across a specific bit position comes out to an odd number instead of an even one.

Other clues in the syslog might let you identify a bad drive... for example, one where a read failure resulted in all zeros being returned.
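
The point about even parity can be shown with a toy model. This is a sketch only: it uses one byte per "disk" and plain XOR, which is how single even parity behaves in general, and it is not unRAID's actual driver code. The byte values are arbitrary.

```python
from functools import reduce
from operator import xor

def xor_all(blocks):
    """XOR all blocks together; 0 means every bit position has even parity."""
    return reduce(xor, blocks, 0)

# Three toy "data disks", one byte each (arbitrary values).
data = [0b10110010, 0b01101100, 0b11100001]
parity = xor_all(data)

# A healthy array: data plus parity XORs to zero in every bit position.
healthy = xor_all(data + [parity])

# Case 1: a single bit flips on data disk 1.
bad_data = list(data)
bad_data[1] ^= 0b00000100
mismatch_data = xor_all(bad_data + [parity])

# Case 2: the same bit flips on the parity disk instead.
mismatch_parity = xor_all(data + [parity ^ 0b00000100])

# Both cases produce the identical mismatch, so the check alone reveals
# the bit position that is wrong but not the disk that is wrong.
print(healthy, bin(mismatch_data), bin(mismatch_parity))

# If some other clue identifies disk 1 as the failed one, its contents
# can be rebuilt from the remaining disks plus parity.
rebuilt = xor_all([data[0], data[2], parity])
print(rebuilt == data[1])
```

This is why a parity check can only report that a mismatch exists, and why the syslog or SMART data has to supply the hint about which drive to distrust.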

October 3, 2010

There is absolutely no way for anything to know which disk(s) are incorrect if parity is not as expected. All we know is that adding all the bits together across a specific bit position comes out to an odd number instead of an even one.

Other clues in the syslog might let you identify a bad drive... for example, one where a read failure resulted in all zeros being returned.

That is actually one thing that really troubles me. From time to time there were parity errors during periodic parity checks, and none of those occasions showed any hardware issue such as a bad drive or SATA errors; all I can see is perfectly clean syslog content.

October 3, 2010

There is absolutely no way for anything to know which disk(s) are incorrect if parity is not as expected. All we know is that adding all the bits together across a specific bit position comes out to an odd number instead of an even one.

Other clues in the syslog might let you identify a bad drive... for example, one where a read failure resulted in all zeros being returned.

That is actually one thing that really troubles me. From time to time there were parity errors during periodic parity checks, and none of those occasions showed any hardware issue such as a bad drive or SATA errors; all I can see is perfectly clean syslog content.

The reiserfs file systems on the data disks have journals that allow them to recover from an unexpected power failure. The parity disk has no such journal. It is not uncommon to have a few parity errors if you take a power hit.

There is no simple solution other than checksums on the files themselves.
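
One way to keep per-file checksums is a small manifest of hashes that gets re-checked later. The sketch below is just one possible approach using SHA-256 from the Python standard library; the function names are made up for this example and nothing here is an unRAID feature.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Return {relative_path: sha256 hex digest} for every file under root."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

def changed_files(root, manifest):
    """List files from an earlier manifest that are now missing or differ."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

The idea would be to build the manifest while the data is known to be good, store it somewhere off the array, and run changed_files after a suspicious parity check to see which disk's files actually changed.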

October 3, 2010

The reiserfs file systems on the data disks have journals that allow them to recover from an unexpected power failure. The parity disk has no such journal. It is not uncommon to have a few parity errors if you take a power hit.

That is true. However, I have a UPS for my server and always perform a clean shutdown, yet I still get parity errors during checks from time to time, which certainly concerns me.

October 3, 2010

The reiserfs file systems on the data disks have journals that allow them to recover from an unexpected power failure. The parity disk has no such journal. It is not uncommon to have a few parity errors if you take a power hit.

That is true. However, I have a UPS for my server and always perform a clean shutdown, yet I still get parity errors during checks from time to time, which certainly concerns me.

Mine too.. I've never had an un-expected parity error in 5 years. If you do, and you did not have a power failure, and you always cleanly shut down, then odds are you have some flaky hardware involved. (and it could be absolutely anything, from the power supply to the RAM to the motherboard itself, to the disks.)

Joe L.

Archived

This topic is now archived and is closed to further replies.
