Performance and Stability issues [SOLVED]

November 10, 2014

Hey guys -

I'm a newbie to unraid. Just set up a new server about a week ago with the following specs:

Asrock Q1900-ITX motherboard/cpu

8GB ram

Parity disk: Seagate Barracuda 3TB ST3000DM001

Disk 1: Seagate Barracuda 2TB ST2000DL003

Disk 2: Samsung Spinpoint F1 1TB HD103UJ

I'm having some strange issues. Random reboots every few days. Nothing in the syslog for it except a notification that disks 1 and 0 are spinning down.

Nov 10 03:00:32 Myklene kernel: mdcmd (27): spindown 0

Nov 10 03:00:32 Myklene kernel: mdcmd (28): spindown 1

When doing a parity check, for a while the check runs pretty fast (100+ MB/sec), but then drops to about 8 MB/sec and I start seeing a ton of writes to the parity disk. No errors, just many, many writes.

Any insight or ideas of how I can further troubleshoot? I was thinking of maybe trying a different sata controller, but don't want to shell out the $$ on a guess.

UPDATE

SO - 1 replaced hard drive and 2 motherboards later I finally figured out the issue. Turns out my PSU was crapping out. It finally died completely. New PSU and I'm back in business. While I was at it, I precleared the parity drive, precleared and put in a new WD-Red as the (currently) single data drive and removed the other two old drives -- one of them I'll probably put through a preclear and see what SMART has to say; if it looks good I'll add it back into the array. My guess is that the FS errors were introduced by a crash (due to failing PSU) during a data write.

Thanks for the help everyone!

-Justin

November 10, 2014

Post a syslog and smart report for the drives in question.

November 10, 2014

Author

Any specific time you want me to post a syslog from? On the last unexpected reboot, the only thing I saw in syslog was spin down on drives 0 and 1.

SMART reports are attached.

sdb.txt = Parity

sdc.txt = disk1

sda.txt = disk2

sdb.txt

sda.txt

sdc.txt

November 10, 2014

Whatever is available from before the last crash I suppose.

It gives people a picture of your hardware.

November 10, 2014

Author

Parity check is currently running (9.39 MB/sec, 7% done).
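For a sense of scale, here is a rough sketch of how long the rest of the check would take if that rate held. It assumes the 3TB parity disk is 3e12 bytes and that "MB" means 1e6 bytes; the function name and the 100 MB/sec comparison figure are only illustrative.

```python
# Rough ETA for a parity check at a steady rate.
# Assumptions (not from any unRAID tool): 3 TB = 3e12 bytes, 1 MB = 1e6 bytes.

def remaining_hours(disk_bytes, fraction_done, rate_mb_per_s):
    """Hours left if the current rate holds for the rest of the check."""
    remaining_bytes = disk_bytes * (1.0 - fraction_done)
    seconds = remaining_bytes / (rate_mb_per_s * 1e6)
    return seconds / 3600.0

PARITY_BYTES = 3e12

slow = remaining_hours(PARITY_BYTES, 0.07, 9.39)   # rate reported above
fast = remaining_hours(PARITY_BYTES, 0.07, 100.0)  # rate seen early in the check
print(f"at 9.39 MB/s: {slow:.1f} h left, at 100 MB/s: {fast:.1f} h left")
```

Under those assumptions the slow rate turns an overnight job into one that takes several days.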

As I keep pulling SMART reports, I'm watching Raw_Read_Error_Rate continue to rise on both Seagate drives (sdb and sdc).

sdb was at 60936288 when I first pulled it 30 min ago -- it is now at 128554080

sdc was at 162892136. Now it is 210346840

I know Raw_Read_Error_Rate doesn't necessarily indicate a problem, but these values look to be going through the roof.
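One way to sanity-check these numbers: it is commonly reported (though, as far as I know, not officially documented by the vendor) that Seagate drives pack this raw value as a 48-bit number, with the error count in the high 16 bits and the number of read operations in the low 32 bits. The sketch below decodes the four readings above under that assumption; the function name is made up for illustration.

```python
# Decode a Seagate-style Raw_Read_Error_Rate raw value.
# ASSUMPTION: 48-bit value, high 16 bits = error count,
# low 32 bits = operation count. This layout is community lore,
# not something confirmed in this thread.

def split_seagate_raw(raw):
    """Return (error_count, operation_count) under the assumed layout."""
    raw &= (1 << 48) - 1
    return raw >> 32, raw & 0xFFFFFFFF

readings = {
    "sdb": (60936288, 128554080),
    "sdc": (162892136, 210346840),
}

for dev, values in readings.items():
    for raw in values:
        errors, ops = split_seagate_raw(raw)
        print(dev, raw, "errors:", errors, "operations:", ops)
```

All four readings are below 2^32, so under this interpretation the error portion is zero and the rising number is just the operation counter ticking up while the parity check reads the disks.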

I don't have the syslog from before last crash, but here's since last crash. Huh, I definitely didn't notice this before but I see a lot of errors in there like this:

Nov 10 12:35:48 Myklene kernel: REISERFS error (device md1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [605 45099 0x0 SD]

Nov 10 12:35:48 Myklene kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 189497997. Fsck?
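To see how widespread these are, a short script along these lines could tally the REISERFS errors per device and error code. This is only a sketch: the regular expression is based on the two lines above, and other message variants may not match it.

```python
import re
from collections import Counter

# Matches lines like:
#   ... kernel: REISERFS error (device md1): vs-13070 reiserfs_read_locked_inode: ...
# The pattern is inferred from the two sample lines only.
PATTERN = re.compile(r"REISERFS error \(device (\w+)\): (\S+)")

def count_reiserfs_errors(lines):
    """Count REISERFS errors keyed by (device, error code)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

sample = [
    "Nov 10 12:35:48 Myklene kernel: REISERFS error (device md1): vs-13070 "
    "reiserfs_read_locked_inode: i/o failure occurred trying to find stat data "
    "of [605 45099 0x0 SD]",
    "Nov 10 12:35:48 Myklene kernel: REISERFS error (device md1): vs-5150 "
    "search_by_key: invalid format found in block 189497997. Fsck?",
    "Nov 10 03:00:32 Myklene kernel: mdcmd (27): spindown 0",
]

for (device, code), n in sorted(count_reiserfs_errors(sample).items()):
    print(device, code, n)
```

To run it against a real log, pass an open file object (for example the extracted syslog) instead of the `sample` list.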

syslog.zip

November 10, 2014

sda shows the value below. Monitor it to see whether it increases.

It indicates CRC errors in the communication path, i.e. a potential cable or removable drive bay issue.

199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 2360
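If you want to track that counter between reports, a sketch like the following could help. It assumes the usual `smartctl -A` column order (ID, name, flag, value, worst, threshold, type, updated, when failed, raw value); the class and function names are made up for illustration.

```python
from typing import NamedTuple

class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    flag: str
    value: int
    worst: int
    thresh: int
    attr_type: str
    updated: str
    when_failed: str
    raw: str  # kept as text, since some raw values carry extra annotations

def parse_attr(line):
    """Parse one attribute row from a smartctl attribute table."""
    parts = line.split(None, 9)
    if len(parts) != 10:
        raise ValueError(f"unexpected attribute line: {line!r}")
    return SmartAttr(
        int(parts[0]), parts[1], parts[2],
        int(parts[3]), int(parts[4]), int(parts[5]),
        parts[6], parts[7], parts[8], parts[9].strip(),
    )

def raw_increase(old_line, new_line):
    """Difference between the leading integers of two raw values."""
    old = int(parse_attr(old_line).raw.split()[0])
    new = int(parse_attr(new_line).raw.split()[0])
    return new - old

first = "199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 2360"
attr = parse_attr(first)
print(attr.attr_id, attr.name, attr.raw)
```

Feeding `raw_increase` the same row from an older and a newer report gives the growth in between; any positive result on attribute 199 points back at the cable or connector.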

The other drives look OK, albeit a bit old.

Did you pre-clear and verify the parity drive?

Describe your hardware in more detail, e.g. PSU, CPU, RAM brand, etc.

I see that none of these drives have had a SMART long test.

I like to have people run that, as it draws a line in the sand: a log record in the SMART data showing when the surface was last verified.

Keep in mind this will take many hours. You will need to stop the array and disable any spin-down timers.

You can trigger all 3 SMART long tests at the same time.

Each test runs inside the drive's own firmware.

Then leave the machine alone for the amount of time for the longest smart long test.

Then capture another smart log and tuck it away.

Look for pending sectors or other abnormal attributes.
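The steps above can be sketched roughly as follows. The `/dev/sda`, `/dev/sdb` and `/dev/sdc` paths are assumed from the device names in this thread, and the helper names are made up; `smartctl -t long` starts the test and `smartctl -a` prints the full report. By default the script only prints the commands rather than running them.

```python
import subprocess

def long_test_commands(devices):
    """Commands that start a SMART extended (long) self-test on each drive."""
    return [["smartctl", "-t", "long", dev] for dev in devices]

def report_commands(devices):
    """Commands that dump the full SMART report, to run after the tests finish."""
    return [["smartctl", "-a", dev] for dev in devices]

def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # smartctl uses non-zero exit codes for various status bits,
            # so do not treat them as fatal here.
            subprocess.run(cmd, check=False)

# Assumed device paths; adjust to match your own system.
DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc"]

run_all(long_test_commands(DEVICES))
```

Since the tests run in the drives themselves, all three can be started back to back; the reports are only worth capturing once the longest test has had time to finish.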

November 10, 2014

Nov 10 12:35:48 Myklene kernel: REISERFS error (device md1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [605 45099 0x0 SD]

Nov 10 12:35:48 Myklene kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 189497997. Fsck?

This isn't good. It could indicate bad sectors on the drive, but since the drive has never really been scanned by SMART, there's no telling whether the problem lies in the surface or in the higher-level format. The drives need a SMART long test for confidence, then a reiserfsck on the high-level format to make sure everything is OK.

November 10, 2014

Author

ok, kicked off the SMART long tests. Looks like they'll be done in about 6 hours; I'll post them when they are complete.

As far as other detailed hardware specs, here you go:

Motherboard: Asrock Q1900-ITX

CPU: Intel J1900

Memory: 2x 4GB PC3-1333 (don't know manufacturer, will have to look when I get home from work)

Antec 550W PSU

Nothing else special I can think of, no hot-swap bays or anything on the drives.

Not sure if I pre-cleared and verified the parity drive. I just put the OS on the flash disk, booted it up, logged in and configured the drives; IIRC, it took several hours to do its thing before I was able to bring the array online.

November 10, 2014

IIRC, it took several hours to do its thing before I was able to bring the array online.

unRAID probably did its own internal blind clear. However, I'm not sure it does that to the parity drive.

In any case, keep an eye on the drive with the UDMA CRC errors. If that number increases, check or change the cable.

So far the SMART logs do not show anything, let's see after a scan.

The REISERFS format errors do have me concerned. Until they go away, I would not consider the drives reliable.

I.e. do not store data on those drives yet, or if you already have, try to copy that data off or ensure you have a backup.

November 11, 2014

Author

OK, I ran the long SMART tests. They are attached. I am now getting these REISERFS errors constantly. I have also attached syslog since my last reboot (I manually rebooted this morning so I could change out the sata cable on sdb).

[EDIT] - I forgot to take the array offline before running the long test. Running again now...

Changing out the cable on sdb didn't seem to have any effect.

-Justin

November 11, 2014

OK, I ran the long SMART tests. They are attached. I am now getting these REISERFS errors constantly.

Reiserfs errors indicate that there is some file system corruption at the reiserfs level on the drive(s). To fix this you need to run reiserfsck against the drive while in maintenance mode.
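As a rough illustration, the devices to check can be pulled straight from the syslog. The sketch below assumes the `mdN` name in each error maps to `/dev/mdN` and only builds a read-only `reiserfsck --check` command for each one; it does not run anything, and the function name is made up.

```python
import re

# Inferred from the error lines posted earlier in this thread.
DEVICE_RE = re.compile(r"REISERFS error \(device (md\d+)\)")

def fsck_commands(syslog_lines):
    """Build a check-only reiserfsck command for every device with errors."""
    devices = set()
    for line in syslog_lines:
        match = DEVICE_RE.search(line)
        if match:
            devices.add(match.group(1))
    # ASSUMPTION: the md device is reachable as /dev/<name> in maintenance mode.
    return [["reiserfsck", "--check", f"/dev/{dev}"] for dev in sorted(devices)]

log = [
    "Nov 10 12:35:48 Myklene kernel: REISERFS error (device md1): vs-5150 "
    "search_by_key: invalid format found in block 189497997. Fsck?",
]

for cmd in fsck_commands(log):
    print(" ".join(cmd))
```

Start with the check-only pass and read its output before trying any repair option.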

November 11, 2014

Might be better to just start over and preclear all drives before trusting them.

November 11, 2014

Author

OK, here's the smart reports after long test. Other than old-age, anything that indicates problems with any of these?

sdb= Parity

sdc = disk1

sda = disk2

sda_long.txt

sdb_long.txt

sdc_long.txt

November 12, 2014

OK, here's the smart reports after long test. Other than old-age, anything that indicates problems with any of these?

sdb= Parity

sdc = disk1

sda = disk2

Those SMART reports look fine.

November 12, 2014

Author

OK, here's the smart reports after long test. Other than old-age, anything that indicates problems with any of these?

sdb= Parity

sdc = disk1

sda = disk2

Those SMART reports look fine.

Thanks itimpi. So I'm stumped then. Given the storm of REISERFS errors, think perhaps I have a bad drive controller?

November 12, 2014

Thanks itimpi. So I'm stumped then. Given the storm of REISERFS errors, think perhaps I have a bad drive controller?

Once you get a reiserfs error, it is going to continue occurring until you have successfully completed a reiserfsck run to fix it.

November 12, 2014

If you haven't done so already and there isn't any data on the disks, follow the earlier advice and run at least one preclear pass on all drives.

With screen you can run all at once.

November 12, 2014

Author

My plan is to drop a new 3TB WD Red in there (coming tomorrow). I'll preclear it, and run with just that and the parity drive and see how that goes. Will report back.

Thanks everyone!

November 26, 2014

Author

First post updated with solution.
