can someone please interpret my disk speed data?



Hi,

I'm having a lot of trouble building a NAS.

I bought all new hardware to build the NAS:

 

Intel Core i3 4130 3.40GHz Socket 1150 3MB Cache

Asus B85M-G Socket 1150 VGA DVI HDMI 6+2 Channel HD Audio mATX Motherboard

8GB Corsair XMS3 DDR3 1600MHz C11

3x WD 2TB Green Desktop Hard Drive SATA 3.5

Be Quiet Straight Power 400W Fully Wired 80+ Gold Power Supply

 

And initially things seemed OK, but then I experienced extremely slow uploads.

I ran memtest and found that my RAM had over 8000 errors, so I RMA'd it and got some Kingston memory. Memtest said the Kingston was error-free.

I decided to start over, reformat, and do a parity sync. However, this is taking a VERY long time (2 MB/s), so I suspect I might also have a problem with one or more of my WD Green 2TB HDDs.

I've run hdparm and these are the results from the 3 drives:

 

/dev/sdd:
Timing cached reads:   16940 MB in  2.00 seconds = 8487.40 MB/sec
Timing buffered disk reads:  60 MB in  3.01 seconds =  19.90 MB/sec

root@Tower:~# hdparm -tT /dev/sdc

/dev/sdc:
Timing cached reads:   16738 MB in  2.00 seconds = 8386.48 MB/sec
Timing buffered disk reads: 350 MB in  3.00 seconds = 116.50 MB/sec
root@Tower:~# hdparm -tT /dev/sdb

/dev/sdb:
Timing cached reads:   13020 MB in  2.00 seconds = 6520.22 MB/sec
Timing buffered disk reads:  10 MB in  3.17 seconds =   3.15 MB/sec
root@Tower:~#

Obviously they vary quite a bit in the buffered disk read timings.

sdb in particular looks very slow compared to the others at 3.15 MB/sec.

 

Should I return this disk and get a new one?

 

Also, sdd looks very slow (19.90 MB/sec) compared to sdc (116.50 MB/sec)
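
A minimal sketch for pulling just the buffered-read figure out of the hdparm output, so the three drives can be compared side by side. The function names buffered_rate and compare_drives are made up for illustration; the sketch assumes hdparm is installed, that it is run as root, and that nothing else (such as a parity sync) is reading the disks at the same time.

```shell
# Extract the "buffered disk reads" rate (the number before "MB/sec")
# from hdparm output supplied on stdin.
buffered_rate() {
    awk '/Timing buffered disk reads/ { print $(NF-1) }'
}

# Run the buffered read test (hdparm -t) on each device given as an
# argument and print one "device rate" line per drive.
compare_drives() {
    for dev in "$@"; do
        rate=$(hdparm -t "$dev" | buffered_rate)
        printf '%s %s MB/sec\n' "$dev" "$rate"
    done
}

# Usage:
#   compare_drives /dev/sdb /dev/sdc /dev/sdd
```

Running the test two or three times per drive helps rule out a one-off slow reading.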

 

thanks for any help.

Link to comment

No, I didn't preclear. Should I have?

 

Yes.

 

Then check the SMART reports.

For now, check the SMART reports with

smartctl -a /dev/sd?

where ? is the drive letter of each drive in the array.

Post them and we can take a look.

 

Post a syslog as well; something may jump out that points to the issue.

Link to comment

thanks for that.

 

 

Here's sdb:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   166   165   021    Pre-fail  Always       -       4675
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   198   198   000    Old_age   Always       -       382
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       151
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       57
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       33
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3582
194 Temperature_Celsius     0x0022   128   120   000    Old_age   Always       -       19
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

 

sdc:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   168   167   021    Pre-fail  Always       -       4591
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       151
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       57
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       31
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3887
194 Temperature_Celsius     0x0022   128   120   000    Old_age   Always       -       19
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

 

sdd:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   167   165   021    Pre-fail  Always       -       4641
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       60
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       151
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       57
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       30
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3744
194 Temperature_Celsius     0x0022   128   119   000    Old_age   Always       -       19
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged
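
In the three reports above, the raw values for reallocated sectors, pending sectors, offline uncorrectable sectors and UDMA CRC errors are all 0. The one visible difference is Seek_Error_Rate, whose raw value is 382 on sdb and 0 on sdc and sdd. A small sketch for filtering a report down to those rows follows; the function name smart_key_attrs is hypothetical, and the choice of attribute IDs (5, 7, 197, 198, 199) is an assumption about which rows are worth a first look, not an official list.

```shell
# Read `smartctl -a` (or `smartctl -A`) output on stdin and print the
# attribute name and raw value for a handful of attribute IDs.
# NF >= 10 restricts the match to rows of the attribute table.
smart_key_attrs() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 7 || $1 == 197 || $1 == 198 || $1 == 199) {
        print $2, $NF
    }'
}

# Usage:
#   smartctl -a /dev/sdb | smart_key_attrs
```

How a raw value should be read is vendor-specific, so a non-zero number here is a prompt for further testing rather than proof of a fault.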

Link to comment

It looks like you have a parity check that is still running and has not finished; that will slow everything down.

Perhaps I missed something.

 

Nov  7 10:02:17 Tower kernel: mdcmd (13): start NEW_ARRAY

Nov  7 10:02:18 Tower kernel: mdcmd (14): check CORRECT

..

Nov  7 18:38:58 Tower login[2023]: ROOT LOGIN  on '/dev/pts/0' from 'acer'

 

At the end of mine I have

# grep sync /var/log/syslog

Nov  7 03:50:54 unRAID1 kernel: md: sync done. time=31713sec

Nov  7 03:50:55 unRAID1 kernel: md: recovery thread sync completion status: 0

 

So until you see "md: sync done." in the syslog, the machine is going to be pretty busy.
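
The check above can be wrapped in a small helper. This is only a sketch: the function name sync_status is made up, and the default path /var/log/syslog is taken from the grep command in this post.

```shell
# Report whether the syslog contains an "md: sync done" line.
# $1: path to the syslog (defaults to /var/log/syslog).
sync_status() {
    log=${1:-/var/log/syslog}
    if grep -q 'md: sync done' "$log"; then
        # Show the most recent completion line, including its time= field.
        grep 'md: sync done' "$log" | tail -n 1
    else
        echo 'sync not finished (no "md: sync done" line yet)'
    fi
}

# Usage:
#   sync_status
#   sync_status /boot/syslog-copy.txt   # hypothetical saved copy
```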

Link to comment

Hi and thanks.

My real concern is that the parity check is so slow.

It's done 20% after running for 15 hours and is now at 200 KB/s.

Surely that indicates that something is wrong?

I agree that sounds slow. With 2TB drives I would only expect it to take something on the order of 4 hours.

 

Does the syslog you provided cover the period where the parity sync is running? If there are any problems with any of the drives I would expect there to be error reports in the syslog.
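
To put the numbers in perspective, here is a rough arithmetic sketch. It uses the "total of 1953514552 blocks" figure from the posted syslog and assumes those are 1 KiB blocks (which is consistent with a 2 TB drive) and that the quoted rates are steady. The function name sync_eta_minutes is made up for illustration.

```shell
# Estimate the duration of a full pass at a constant rate.
# $1: size in 1 KiB blocks (assumed unit of the syslog "blocks" figure)
# $2: rate in KiB per second
# Prints whole minutes; integer division rounds down.
sync_eta_minutes() {
    echo $(( $1 / $2 / 60 ))
}

# Usage:
#   sync_eta_minutes 1953514552 200      # at the reported 200 KB/s
#   sync_eta_minutes 1953514552 117187   # at roughly 120 MB/s
```

At 200 KB/s the estimate comes out at roughly 113 days, while at about 120 MB/s it is a little over four and a half hours, which is in line with the few-hours expectation above.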

Link to comment

hi,

and thanks for replying.

Yes, the syslog covers everything from when I booted the machine to install unRAID up until about 2am this morning. At that time it had been doing a parity sync for about 9 hours.

Looking at the latest syslog in unMENU after 15 hours, I can see errors.

The 3 error lines are:

 

Nov  7 10:02:17 Tower logger:        missing codepage or helper program, or other error (Errors)

Nov  7 10:02:17 Tower logger:        missing codepage or helper program, or other error (Errors)

Nov  7 10:02:17 Tower emhttp: disk2 mount error: 32 (Errors)

 

 

I've uploaded the latest syslog.

This is the section with errors :

Nov  7 10:02:17 Tower logger: mount: wrong fs type, bad option, bad superblock on /dev/md1,
Nov  7 10:02:17 Tower logger:        missing codepage or helper program, or other error (Errors)
Nov  7 10:02:17 Tower logger:        In some cases useful info is found in syslog - try
Nov  7 10:02:17 Tower logger:        dmesg | tail  or so
Nov  7 10:02:17 Tower logger: 
Nov  7 10:02:17 Tower emhttp: _shcmd: shcmd (36): exit status: 32 (Other emhttp)
Nov  7 10:02:17 Tower emhttp: disk1 mount error: 32 (Errors)
Nov  7 10:02:17 Tower emhttp: shcmd (37): rmdir /mnt/disk1 (Other emhttp)
Nov  7 10:02:17 Tower emhttp: shcmd (38): mkdir /mnt/disk2 (Routine)
Nov  7 10:02:17 Tower emhttp: shcmd (39): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md2 /mnt/disk2 |& logger (Other emhttp)
Nov  7 10:02:17 Tower logger: mount: wrong fs type, bad option, bad superblock on /dev/md2,
Nov  7 10:02:17 Tower logger:        missing codepage or helper program, or other error (Errors)
Nov  7 10:02:17 Tower logger:        In some cases useful info is found in syslog - try
Nov  7 10:02:17 Tower logger:        dmesg | tail  or so
Nov  7 10:02:17 Tower logger: 
Nov  7 10:02:17 Tower emhttp: _shcmd: shcmd (39): exit status: 32 (Other emhttp)
Nov  7 10:02:17 Tower emhttp: disk2 mount error: 32 (Errors)
Nov  7 10:02:17 Tower emhttp: shcmd (40): rmdir /mnt/disk2 (Other emhttp)
Nov  7 10:02:17 Tower kernel: REISERFS warning (device md1): sh-2021 reiserfs_fill_super: can not find reiserfs on md1 (Minor Issues)
Nov  7 10:02:17 Tower kernel: REISERFS warning (device md2): sh-2021 reiserfs_fill_super: can not find reiserfs on md2 (Minor Issues)
Nov  7 10:02:18 Tower emhttp: shcmd (41): /usr/local/sbin/emhttp_event disks_mounted (Other emhttp)
Nov  7 10:02:18 Tower emhttp_event: disks_mounted (Other emhttp)
Nov  7 10:02:18 Tower kernel: mdcmd (14): check CORRECT (unRAID engine)
Nov  7 10:02:18 Tower kernel: md: recovery thread woken up ... (unRAID engine)
Nov  7 10:02:18 Tower kernel: md: recovery thread syncing parity disk ... (unRAID engine)
Nov  7 10:02:18 Tower kernel: md: using 1536k window, over a total of 1953514552 blocks. (unRAID engine)
Nov  7 10:02:19 Tower emhttp: shcmd (42): :>/etc/samba/smb-shares.conf (Other emhttp)
Nov  7 10:02:19 Tower avahi-daemon[1247]: Files changed, reloading.
Nov  7 10:02:19 Tower emhttp: Restart SMB... (Other emhttp)
Nov  7 10:02:19 Tower emhttp: shcmd (43): killall -HUP smbd (Minor Issues)
Nov  7 10:02:19 Tower emhttp: shcmd (44): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service (Other emhttp)
Nov  7 10:02:19 Tower avahi-daemon[1247]: Files changed, reloading.
Nov  7 10:02:19 Tower avahi-daemon[1247]: Service group file /services/smb.service changed, reloading.
Nov  7 10:02:19 Tower emhttp: shcmd (45): ps axc | grep -q rpc.mountd (Other emhttp)
Nov  7 10:02:19 Tower emhttp: _shcmd: shcmd (45): exit status: 1 (Other emhttp)
Nov  7 10:02:19 Tower emhttp: shcmd (46): /usr/local/sbin/emhttp_event svcs_restarted (Other emhttp)
Nov  7 10:02:19 Tower emhttp_event: svcs_restarted (Other emhttp)

syslog-2014-11-08_1.txt

Link to comment

If it were me, I would start looking at the cables. Make sure there is no interference or poor paths. Maybe even replace them with other spares.

I might even try re-arranging the cables.

 

It does sound odd to me that it's taking so long. It should not be that slow. However, nothing jumped out at me in the syslog.

If you decide to stop the parity check and re-arrange, make sure you configure the array not to start.

Then you can check your raw hdparm speeds without interference.

You can also use dd to check your speeds like this.

 

To see disks.

fdisk -l | grep 'Disk /'

 

where ? = drive letter. 

400MB read test -> dd of=/dev/null bs=4096 count=102400 if=/dev/sd?

4.2GB read test -> dd of=/dev/null bs=4096 count=1024000 if=/dev/sd?

 

here's an example of mine

root@unRAIDx:/mnt/disk3/filedb# fdisk -l | grep 'Disk /'

 

Disk /dev/sda: 1073 MB, 1073741824 bytes

Disk /dev/sdb: 4000.8 GB, 4000787030016 bytes

Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes

Disk /dev/sdd: 4000.8 GB, 4000787030016 bytes

Disk /dev/sde: 4000.8 GB, 4000787030016 bytes

Disk /dev/sdg: 15.9 GB, 15937306624 bytes

 

root@unRAIDx:/mnt/disk3/filedb# dd of=/dev/null bs=4096 count=102400 if=/dev/sde

102400+0 records in

102400+0 records out

419430400 bytes (419 MB) copied, 3.38214 s, 124 MB/s

 

root@unRAIDx:/mnt/disk3/filedb# dd of=/dev/null bs=4096 count=1024000 if=/dev/sde

1024000+0 records in

1024000+0 records out

4194304000 bytes (4.2 GB) copied, 28.7707 s, 146 MB/s
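
The same read test can be wrapped in a small function and looped over several drives. This is a sketch only; read_test is a made-up name, and the default count of 102400 blocks of 4096 bytes (419430400 bytes) mirrors the shorter test above.

```shell
# Sequential read test: read COUNT blocks of 4096 bytes from SRC and
# discard them. dd writes its statistics to stderr, so stderr is merged
# into stdout and only the final summary line is kept.
read_test() {
    src=$1
    count=${2:-102400}
    dd if="$src" of=/dev/null bs=4096 count="$count" 2>&1 | tail -n 1
}

# Usage, with the array stopped:
#   for d in /dev/sdb /dev/sdc /dev/sdd; do
#       echo "$d"
#       read_test "$d"             # about 419 MB
#       read_test "$d" 1024000     # about 4.2 GB
#   done
```

A repeated read of the same region may be served partly from the page cache, so the first run on each drive is the most telling one.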

Link to comment

Hi,

Thanks for that .

I've decided to stop the parity check and preclear all disks instead.

2 are performing OK (120 MB/s) but one is very, very slow (about 7 MB/s).

 

Given the same prior circumstances with hardware, I would not expect the speed to change.

That's why I gave you the quick read test.

 

You cannot expect unRAID to do anything faster than a Linux/Unix-based dd read.

Link to comment

 

 


Did as you suggested, and my sdb drive is definitely underperforming:

3.9 MB/s compared to 45 and 49 MB/s for the other 2:

 

Disk /dev/sda: 7851 MB, 7851737088 bytes
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
root@Tower:~# dd of=/dev/null bs=4096 count=102400 if=/dev/sdd
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 9.24476 s, 45.4 MB/s
root@Tower:~# dd of=/dev/null bs=4096 count=102400 if=/dev/sdc
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 8.50575 s, 49.3 MB/s
root@Tower:~# dd of=/dev/null bs=4096 count=102400 if=/dev/sdb
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 107.331 s, 3.9 MB/s

Link to comment

Move it to another slot and/or change the cable; you need to isolate whether it's the drive itself.

The other choice is to run the manufacturer's diagnostic software or trigger a SMART long test.

 

 

It may not be the drive, it may be the controller, cable, connector, etc.

 

 

The SMART long test reads every sector internally in the drive, thus avoiding the interface.
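
A minimal sketch of triggering the long test and reading the result with smartctl, using its `-t long` and `-l selftest` options. The wrapper function names and the SMARTCTL variable are made up for illustration; the variable only exists so the command can be swapped for a stand-in when previewing what would run.

```shell
# Command used to talk to the drive; override SMARTCTL to preview.
SMARTCTL=${SMARTCTL:-smartctl}

# Start the drive's extended (long) self-test. The test runs inside the
# drive in the background and can take several hours on a 2 TB disk.
start_long_test() {
    "$SMARTCTL" -t long "$1"
}

# Print the self-test log; check it after the test has had time to finish.
show_selftest_log() {
    "$SMARTCTL" -l selftest "$1"
}

# Usage:
#   start_long_test /dev/sdb
#   show_selftest_log /dev/sdb
```

If the long test completes without error but reads over the SATA link are still slow, that points more toward the cable, port or controller than the platters.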

Link to comment

thanks.

When I've finished preclearing the other 2 drives, I'll power down and mess about with the cables.

The problematic drive in question sdb has the identifier WDC_WD20EZRX-00D8PB0_WD-WCC4M9YLCTRE.

Will this be written on the drive somewhere? Presumably this is a serial number?

Link to comment


 

Serial Number should be on a sticker on the drive.
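
A small sketch for picking the serial out of an identifier like the one quoted above. It assumes the identifier has the form VENDOR_MODEL_SERIAL, so that the serial is everything after the last underscore; serial_from_id is a made-up helper name.

```shell
# Print the part of an identifier that follows the last underscore.
# Assumption: identifiers look like VENDOR_MODEL_SERIAL.
serial_from_id() {
    printf '%s\n' "${1##*_}"
}

# Usage:
#   serial_from_id WDC_WD20EZRX-00D8PB0_WD-WCC4M9YLCTRE
#
# The drive can also report its own serial:
#   smartctl -i /dev/sdb      # look for the "Serial Number:" line
```

The sticker may print the serial with or without the leading "WD-", so compare the trailing characters.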

Link to comment
