SAS Drives and unRAID - Using parity drives causes read errors


TristanP


For those not familiar with the subreddit /r/JDM_WAAAT, I have built their NSFW system for use with unRAID, using 15 SAS drives from a recently decommissioned SAN at my work. Here's the hardware:

 

Motherboard - Gigabyte GA-7PESH2 (Latest available BIOS from Gigabyte and SAS controller flashed into IT mode)

CPU - 2x Xeon E5-2630L

Memory - 128GB ECC DDR3-1333 (PC3-10600) in 16 sticks

SAS Expander - HP 24 Bay Expander Card (487738-001)

SAS Drives - 15x Seagate 3TB Constellation SAS drives (ST33000650SS)

Cache Drive - Crucial MX500 1TB SSD (CT1000MX500SSD1(Z))

Various cables, fans and heat sinks

Rosewill 4U Chassis (RSV-L4500)

 

Using the latest unRAID (v6.6.6), all the drives format normally and without error with the mkfs.xfs command, giving the following output:

 

meta-data=/dev/sdi1              isize=512    agcount=4, agsize=183141659 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=732566633, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=357698, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
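As a quick sanity check on the mkfs.xfs output above, the numbers can be multiplied out. This is only a sketch: the constants are copied by hand from that output, and the md figure comes from the "over a total of 2930266532 blocks" line in the syslog further down, on the assumption that md counts in 1 KiB blocks.

```python
# Cross-check the mkfs.xfs figures (values copied by hand from the output above).
BLOCK_SIZE = 4096             # data: bsize=4096
DATA_BLOCKS = 732_566_633     # data: blocks=732566633
AG_COUNT = 4                  # meta-data: agcount=4
AG_SIZE = 183_141_659         # meta-data: agsize=183141659 blks
MD_BLOCKS_1K = 2_930_266_532  # syslog "over a total of ... blocks" (ASSUMED 1 KiB units)

fs_bytes = DATA_BLOCKS * BLOCK_SIZE
print(f"filesystem size: {fs_bytes:,} bytes (~{fs_bytes / 1e12:.2f} TB)")

# XFS splits the data section into allocation groups; the last AG may be shorter.
ag_capacity = AG_COUNT * AG_SIZE
print(f"AG capacity {ag_capacity:,} blocks; last AG short by {ag_capacity - DATA_BLOCKS}")

# If md counts 1 KiB blocks, the array device and the filesystem are the same size.
print("md size matches filesystem:", MD_BLOCKS_1K * 1024 == fs_bytes)
```

The result is about 3.00 TB, consistent with a 3TB drive, so the command-line format covered the whole partition.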

 

I can assign all these drives as normal drives and unRAID will mount the drives after I tell it to format them.

However, once I put any of the drives into the parity role, starting the array produces disk read errors for pretty much every sector until I kill the parity rebuild.

From initial troubleshooting, the drives work fine on the command line, but something in unRAID freaks out when I try to use a drive in the parity role.

Can someone shed some light on what's going on? It's currently beyond me. If I've left out any useful logs, let me know and I'll get them up.

 

Syslog snapshot:

Dec 22 18:58:31 Tower emhttpd: Mounting disks...

Dec 22 18:58:31 Tower emhttpd: shcmd (3441): /sbin/btrfs device scan

Dec 22 18:58:32 Tower root: Scanning for Btrfs filesystems

Dec 22 18:58:32 Tower emhttpd: shcmd (3442): mkdir -p /mnt/disk1

Dec 22 18:58:32 Tower emhttpd: shcmd (3443): mount -t btrfs,xfs,reiserfs -o noatime,nodiratime /dev/md1 /mnt/disk1

Dec 22 18:58:32 Tower kernel: XFS (md1): Mounting V5 Filesystem

Dec 22 18:58:32 Tower kernel: XFS (md1): Ending clean mount

Dec 22 18:58:32 Tower emhttpd: shcmd (3444): xfs_growfs /mnt/disk1

Dec 22 18:58:32 Tower root: meta-data=/dev/md1 isize=512 agcount=4, agsize=183141659 blks

Dec 22 18:58:32 Tower root: = sectsz=512 attr=2, projid32bit=1

Dec 22 18:58:32 Tower root: = crc=1 finobt=1 spinodes=1 rmapbt=0

Dec 22 18:58:32 Tower root: = reflink=0

Dec 22 18:58:32 Tower root: data = bsize=4096 blocks=732566633, imaxpct=5

Dec 22 18:58:32 Tower root: = sunit=0 swidth=0 blks

Dec 22 18:58:32 Tower root: naming =version 2 bsize=4096 ascii-ci=0 ftype=1

Dec 22 18:58:32 Tower root: log =internal bsize=4096 blocks=357698, version=2

Dec 22 18:58:32 Tower root: = sectsz=512 sunit=0 blks, lazy-count=1

Dec 22 18:58:32 Tower root: realtime =none extsz=4096 blocks=0, rtextents=0

Dec 22 18:58:32 Tower emhttpd: shcmd (3445): sync

Dec 22 18:58:32 Tower emhttpd: shcmd (3446): mkdir /mnt/user

Dec 22 18:58:32 Tower emhttpd: shcmd (3447): /usr/local/sbin/shfs /mnt/user -disks 2 -o noatime,big_writes,allow_other -o remember=0 |& logger

Dec 22 18:58:32 Tower shfs: stderr redirected to syslog

Dec 22 18:58:32 Tower emhttpd: shcmd (3449): /usr/local/sbin/update_cron

Dec 22 18:58:32 Tower emhttpd: Starting services...

Dec 22 18:58:32 Tower emhttpd: shcmd (3451): chmod 0777 '/mnt/user/appdata'

Dec 22 18:58:32 Tower emhttpd: shcmd (3452): chown 'nobody':'users' '/mnt/user/appdata'

Dec 22 18:58:32 Tower emhttpd: shcmd (3453): chmod 0777 '/mnt/user/domains'

Dec 22 18:58:32 Tower emhttpd: shcmd (3454): chown 'nobody':'users' '/mnt/user/domains'

Dec 22 18:58:32 Tower emhttpd: shcmd (3455): chmod 0777 '/mnt/user/system'

Dec 22 18:58:32 Tower emhttpd: shcmd (3456): chown 'nobody':'users' '/mnt/user/system'

Dec 22 18:58:32 Tower emhttpd: shcmd (3471): /usr/local/sbin/mount_image '/mnt/user/system/docker/docker.img' /var/lib/docker 20

Dec 22 18:58:32 Tower kernel: BTRFS: device fsid dd886698-5798-4d20-a9af-8e60fa1e339b devid 1 transid 8 /dev/loop2

Dec 22 18:58:32 Tower kernel: BTRFS info (device loop2): disk space caching is enabled

Dec 22 18:58:32 Tower kernel: BTRFS info (device loop2): has skinny extents

Dec 22 18:58:33 Tower root: Resize '/var/lib/docker' of 'max'

Dec 22 18:58:33 Tower kernel: BTRFS info (device loop2): new size for /dev/loop2 is 21474836480

Dec 22 18:58:33 Tower emhttpd: shcmd (3473): /etc/rc.d/rc.docker start

Dec 22 18:58:33 Tower root: starting dockerd ...

Dec 22 18:58:33 Tower avahi-daemon[2656]: Joining mDNS multicast group on interface docker0.IPv4 with address 172.17.0.1.

Dec 22 18:58:33 Tower avahi-daemon[2656]: New relevant interface docker0.IPv4 for mDNS.

Dec 22 18:58:33 Tower avahi-daemon[2656]: Registering new address record for 172.17.0.1 on docker0.IPv4.

Dec 22 18:58:33 Tower kernel: IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready

Dec 22 18:58:36 Tower root: Starting docker_load

Dec 22 18:58:36 Tower emhttpd: shcmd (3487): /usr/local/sbin/mount_image '/mnt/user/system/libvirt/libvirt.img' /etc/libvirt 1

Dec 22 18:58:36 Tower kernel: BTRFS info (device loop3): disk space caching is enabled

Dec 22 18:58:36 Tower kernel: BTRFS info (device loop3): has skinny extents

Dec 22 18:58:36 Tower root: Resize '/etc/libvirt' of 'max'

Dec 22 18:58:36 Tower kernel: BTRFS info (device loop3): new size for /dev/loop3 is 1073741824

Dec 22 18:58:36 Tower emhttpd: shcmd (3489): /etc/rc.d/rc.libvirt start

Dec 22 18:58:36 Tower root: Starting virtlockd...

Dec 22 18:58:36 Tower root: Starting virtlogd...

Dec 22 18:58:36 Tower root: Starting libvirtd...

Dec 22 18:58:36 Tower kernel: tun: Universal TUN/TAP device driver, 1.6

Dec 22 18:58:36 Tower kernel: mdcmd (38): check correct

Dec 22 18:58:36 Tower kernel: md: recovery thread: recon Q ...

Dec 22 18:58:36 Tower kernel: md: using 1536k window, over a total of 2930266532 blocks.

Dec 22 18:58:36 Tower kernel: virbr0: port 1(virbr0-nic) entered blocking state

Dec 22 18:58:36 Tower kernel: virbr0: port 1(virbr0-nic) entered disabled state

Dec 22 18:58:36 Tower kernel: device virbr0-nic entered promiscuous mode

Dec 22 18:58:36 Tower dhcpcd[2290]: virbr0: new hardware address: d2:65:0a:e4:b2:ce

Dec 22 18:58:36 Tower dhcpcd[2290]: virbr0: new hardware address: 52:54:00:e3:45:77

Dec 22 18:58:36 Tower kernel: mpt2sas_cm0: log_info(0x3112043b): originator(PL), code(0x12), sub_code(0x043b)

Dec 22 18:58:36 Tower kernel: sd 7:0:3:0: [sde] Unaligned partial completion (resid=172376, sector_sz=512)

Dec 22 18:58:36 Tower kernel: sd 7:0:3:0: [sde] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x08

Dec 22 18:58:36 Tower kernel: sd 7:0:3:0: [sde] tag#1 Sense Key : 0x5 [current]

Dec 22 18:58:36 Tower kernel: sd 7:0:3:0: [sde] tag#1 ASC=0x10 ASCQ=0x1

Dec 22 18:58:36 Tower kernel: sd 7:0:3:0: [sde] tag#1 CDB: opcode=0x7f, sa=0x9

Dec 22 18:58:36 Tower kernel: sd 7:0:3:0: [sde] tag#1 CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00

Dec 22 18:58:36 Tower kernel: sd 7:0:3:0: [sde] tag#1 CDB[10]: 00 00 54 40 00 00 54 40 00 00 00 00 00 00 04 00

Dec 22 18:58:36 Tower kernel: print_req_error: protection error, dev sde, sector 21568

Dec 22 18:58:36 Tower kernel: md: disk1 read error, sector=21504

Dec 22 18:58:36 Tower kernel: md: recovery thread: multiple disk errors, sector=21504

Dec 22 18:58:36 Tower kernel: md: disk1 read error, sector=21512

Dec 22 18:58:36 Tower kernel: md: recovery thread: multiple disk errors, sector=21512

Dec 22 18:58:36 Tower kernel: md: disk1 read error, sector=21520

Dec 22 18:58:36 Tower kernel: md: recovery thread: multiple disk errors, sector=21520

Dec 22 18:58:36 Tower kernel: md: disk1 read error, sector=21528

(etc. until parity is cancelled)
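To make the failing request easier to read, here is a small Python sketch that decodes the CDB bytes the kernel printed for sde. It assumes the standard SCSI READ(32) layout (opcode 0x7f, service action 0x0009). For the sense data, Sense Key 0x5 is ILLEGAL REQUEST and ASC/ASCQ 0x10/0x01 is LOGICAL BLOCK GUARD CHECK FAILED. The 64-sector partition offset is an assumption: it matches unRAID's 4K-aligned partition start and reproduces the md sector (21504) from the kernel sector (21568).

```python
# Decode the failing command from the syslog above.
# Field offsets follow the SCSI READ(32) CDB layout; PARTITION_START is an ASSUMPTION.

CDB_HEX = (
    "7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00 "
    "00 00 54 40 00 00 54 40 00 00 00 00 00 00 04 00"
)

SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x10, 0x01): "LOGICAL BLOCK GUARD CHECK FAILED"}


def decode_read32(cdb: bytes) -> dict:
    """Pull the fields of interest out of a 32-byte variable-length READ(32) CDB."""
    if len(cdb) != 32 or cdb[0] != 0x7F:
        raise ValueError("not a 32-byte variable-length CDB")
    return {
        "additional_cdb_length": cdb[7],
        "service_action": int.from_bytes(cdb[8:10], "big"),  # 0x0009 = READ(32)
        "rdprotect": cdb[10] >> 5,                           # top 3 bits of byte 10
        "lba": int.from_bytes(cdb[12:20], "big"),
        "expected_ref_tag": int.from_bytes(cdb[20:24], "big"),
        "transfer_length": int.from_bytes(cdb[28:32], "big"),  # in logical blocks
    }


fields = decode_read32(bytes.fromhex(CDB_HEX))
for name, value in fields.items():
    print(f"{name:>22}: {value} (0x{value:x})")

print("sense:", SENSE_KEYS[0x5], "/", ASC_ASCQ[(0x10, 0x01)])

PARTITION_START = 64  # ASSUMPTION: unRAID data partition begins at sector 64
print("md sector:", fields["lba"] - PARTITION_START)
```

The decoded LBA is 21568, the same sector as the kernel's "protection error" line, and RDPROTECT is 1. A READ(32) with a non-zero RDPROTECT field asks the drive to check T10 protection information on the data it returns, which is consistent with that error. Note also that the log pairs the sde error with "md: disk1 read error", so the failing reads are against disk1 rather than the parity drive.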

41 minutes ago, TristanP said:

Using the latest unRAID (v6.6.6) all the drives will format normally without error with the mkfs.xfs command

You must let Unraid format the drives in the webUI after adding them to the array. And parity isn't a formatted drive.
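To illustrate why parity isn't a formatted drive, here is a simplified toy model of single (P) parity, assuming equal-sized disks: each parity byte is the XOR of the bytes at the same offset on every data disk. There is no filesystem on parity, and building it means reading every sector of every data disk, so a read error on any data disk (disk1 in the syslog above) interrupts the sync. As I understand it, unRAID's second parity disk uses a different, RAID-6-style calculation that this sketch does not cover.

```python
# Toy model of single (P) parity: each parity byte is the XOR of the bytes
# at the same offset on every data disk. No filesystem is involved.
from __future__ import annotations

from functools import reduce


def parity(disks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length 'disks'."""
    if len({len(d) for d in disks}) != 1:
        raise ValueError("all disks must be the same length in this toy model")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def rebuild(missing_index: int, disks: list[bytes], p: bytes) -> bytes:
    """Recover one lost disk by XOR-ing the surviving disks with parity."""
    survivors = [d for i, d in enumerate(disks) if i != missing_index]
    return parity(survivors + [p])


disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
p = parity(disks)
print(p.hex())                             # bb99ff99
print(rebuild(1, disks, p) == disks[1])    # True
```

Because every parity byte depends on the same offset of every data disk, the sync has to read each data disk end to end, which is where a bad read on a data disk shows up.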

 

43 minutes ago, TristanP said:

I can assign all these drives as normal drives and unRAID will mount the drives after I tell it to format them.

So the command line formatting was pointless.

 

Syslog snippets are seldom sufficient. Best not to even post them.

 

Go to Tools - Diagnostics and attach the complete diagnostics zip file to your next post.

Quote

You must let Unraid format the drives in the webUI after adding them to the array. And parity isn't a formatted drive. 

That is true, but at the time I could not assume any drive was good.

 

Quote

So the command line formatting was pointless.

It seems that in a support forum, people still think snide comments count as help.

 

You might think it was pointless, but it told me that the problem seems to reside in unRAID. Command line formatting showed that the drive could be formatted, mounted, written to, and dd'ed to; it told me that the issue seems to lie in unRAID trying to format the drive.

 

Diagnostic log attached.

tower-diagnostics-20181223-1407.zip
