Unraid and Unassigned Devices show different device names

hawihoney · December 11, 2018

I followed the steps described in the FAQ entry linked below to build a RAID1 from two Unassigned Devices. This RAID1 was working perfectly. This morning one drive, the first of the RAID1, was no longer mounted. The share still exists and the data can be read. I think the RAID1 is degraded but working:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?tab=comments#comment-462135

The device "sdaa" is shown within the Unassigned Devices section of the Main page but does not exist on the Dashboard. Conversely, the device "sdg" is shown on the Dashboard but not on the Main page, neither in the Unraid pools nor in the Unassigned Devices section. It's not possible to mount the device.

The result of a short self test of the device was "No Errors Logged". But syslog shows lots of BTRFS errors (syslog attached).

1.) Why is there a mismatch in device names between Unraid and Unassigned Devices?

2.) What are the steps to mount the first disk again? If it's a BTRFS thing, how do I fix that? And if the disk is faulty, how do I replace that disk in this RAID1?

Any help is highly appreciated. Many thanks in advance.

syslog.zip

JorgeB · December 11, 2018

The log is not complete, but by the looks of it one of the devices, sdaa, dropped offline and there are millions of read/write errors:

Dec 11 04:40:10 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdaa1 errs: wr 13290219, rd 5115312, flush 112592, corrupt 0, gen 0

See here on how to monitor a btrfs pool:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582

You should also reboot and post the complete diagnostics: Tools -> Diagnostics.

hawihoney · December 11, 2018

Thank you very much for your fast answer.

The BTRFS stats of that RAID1 pool show massive errors. Wondering why SMART does not report anything:

root@Tower:~# btrfs dev stats /mnt/disks/UD01/
[/dev/sdaa1].write_io_errs    17472484
[/dev/sdaa1].read_io_errs     8525841
[/dev/sdaa1].flush_io_errs    124495
[/dev/sdaa1].corruption_errs  0
[/dev/sdaa1].generation_errs  0
[/dev/sdi1].write_io_errs    0
[/dev/sdi1].read_io_errs     0
[/dev/sdi1].flush_io_errs    0
[/dev/sdi1].corruption_errs  0
[/dev/sdi1].generation_errs  0
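
For readers who want to check these counters automatically, here is a minimal sketch that parses output in the format shown above and lists the devices with any non-zero counter. The function names and the `SAMPLE` constant are made up for this illustration; only the counter names and values come from the output posted in this thread.

```python
import re
from collections import defaultdict

# Matches lines such as "[/dev/sdaa1].write_io_errs    17472484"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Return {device: {counter: value}} parsed from `btrfs dev stats` output."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match["dev"]][match["counter"]] = int(match["value"])
    return dict(stats)


def devices_with_errors(stats):
    """Return the devices that have at least one non-zero error counter."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


# Sample copied from the output posted in this thread.
SAMPLE = """\
[/dev/sdaa1].write_io_errs    17472484
[/dev/sdaa1].read_io_errs     8525841
[/dev/sdaa1].flush_io_errs    124495
[/dev/sdaa1].corruption_errs  0
[/dev/sdaa1].generation_errs  0
[/dev/sdi1].write_io_errs    0
[/dev/sdi1].read_io_errs     0
[/dev/sdi1].flush_io_errs    0
[/dev/sdi1].corruption_errs  0
[/dev/sdi1].generation_errs  0
"""

if __name__ == "__main__":
    print(devices_with_errors(parse_dev_stats(SAMPLE)))
```

On a live system the text would come from running `btrfs dev stats` on the pool's mount point instead of from `SAMPLE`.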

So I would replace that BTRFS RAID1 disk "sdaa". Is it as easy as the Unraid style? Stop Unraid server, replace disk, restart machine and click on something?

The original post, where I got the information on how to build the RAID1, is not very clear to me regarding replacing a disk. What are the steps?

BTW, any idea about the device mismatch?

itimpi · December 11, 2018

6 minutes ago, hawihoney said:

BTW, any idea about the device mismatch?

The sd? type names are assigned dynamically by Linux and can change between boots so should not be relied on.

This particular case can be explained by the device originally getting a sd? type name that Unraid recognizes, and then getting a new name after a disconnect/reconnect, which UD (which handles devices changing dynamically) is using.
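
To see which sd? name a given disk currently has, one option is to look at the persistent symlinks that udev normally creates under `/dev/disk/by-id`. The sketch below is only an illustration: the function name is invented here, and it assumes that directory exists on your system (it returns an empty mapping if it does not).

```python
import os


def persistent_names(by_id_dir="/dev/disk/by-id"):
    """Map each persistent ID in by_id_dir to the kernel name it currently points at."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping  # directory missing (e.g. not a Linux system): nothing to report
    for entry in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, entry)
        if os.path.islink(path):
            # Resolve the symlink and keep only the final component, e.g. "sdg"
            mapping[entry] = os.path.basename(os.path.realpath(path))
    return mapping


if __name__ == "__main__":
    for disk_id, kernel_name in persistent_names().items():
        print(f"{disk_id} -> {kernel_name}")
```

Running it before and after a disconnect/reconnect should show the same ID pointing at a different sd? name, which is the mismatch described above.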

JorgeB · December 11, 2018

1 hour ago, hawihoney said:

Wondering why SMART does not report anything:

Because most of the time these errors are the result of a connection problem, i.e., that's what you get if a disk drops offline. I would need the full diagnostics to check the SMART report, taken after a reboot/power cycle so the disk comes back online, though you should also grab and post the current diags before rebooting.

hawihoney · December 11, 2018

Took diagnostics before reboot (attached).

After the reboot the drive is online again. Will both drives balance automatically (I don't know if this is the correct wording, I'm a monkey issuing BTRFS commands others give me)?

I have two identical Unraid servers. Both are rock solid. I never had cabling issues, power outages or similar problems.

Both Unraid servers make use of Unassigned Devices RAID1 pools. All BTRFS pools show read/write errors. Only the BTRFS RAID1 pools! Unraid itself is rock solid, did not have to change Array or Cache devices for ages.

Have to say that I have some concerns regarding BTRFS ...

tower-diagnostics-20181211-1009.zip

JorgeB · December 11, 2018

29 minutes ago, hawihoney said:

Took diagnostics before reboot (attached).

This is when the device dropped:

Dec  8 16:19:30 Tower kernel: sd 6:0:5:0: device_block, handle(0x000e)
Dec  8 16:19:32 Tower kernel: sd 6:0:5:0: device_unblock and setting to running, handle(0x000e)
Dec  8 16:19:32 Tower kernel: sd 6:0:5:0: [sdg] Synchronizing SCSI cache
Dec  8 16:19:32 Tower kernel: sd 6:0:5:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Dec  8 16:19:34 Tower kernel: mpt3sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
### [PREVIOUS LINE REPEATED 5 TIMES] ###
Dec  8 16:19:34 Tower kernel: scsi 6:0:5:0: [sdg] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Dec  8 16:19:34 Tower kernel: mpt3sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Dec  8 16:19:34 Tower kernel: scsi 6:0:5:0: [sdg] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 1f 84 f9 c0 00 00 02 58 00 00
Dec  8 16:19:34 Tower kernel: print_req_error: I/O error, dev sdg, sector 528808384
Dec  8 16:19:34 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Dec  8 16:19:34 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Dec  8 16:19:34 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Dec  8 16:19:34 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Dec  8 16:19:34 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 5, flush 0, corrupt 0, gen 0

The btrfs errors are the result of the disk dropping offline, so it's a hardware problem. SMART looks fine, so I suggest swapping cables/backplane with another disk; if issues with this disk persist, it's likely a disk problem.

31 minutes ago, hawihoney said:

Will both drives balance automatically (I don't know if this is the correct wording, I'm a monkey issuing BTRFS commands others give me)?

As mentioned in the FAQ entry I linked, you should run a scrub on the pool and make sure there are no uncorrectable errors.
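
As a rough sketch of what that could look like when scripted: the helper below builds a foreground scrub command followed by a stats check, and only executes them when asked to. The function names and the dry-run default are assumptions made for this illustration, and the FAQ entry's exact commands are not reproduced here. The mount point `/mnt/disks/UD01` is the one from the earlier post.

```python
import subprocess


def scrub_commands(mount_point):
    """Build the btrfs commands for a foreground scrub followed by a stats check."""
    return [
        # -B keeps the scrub in the foreground and prints a summary when it finishes
        ["btrfs", "scrub", "start", "-B", mount_point],
        # per-device error counters, same command as used earlier in the thread
        ["btrfs", "dev", "stats", mount_point],
    ]


def run_scrub(mount_point, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in scrub_commands(mount_point):
        print(" ".join(cmd))
        if not dry_run:
            # check=False so the stats command still runs if the scrub reports errors
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    run_scrub("/mnt/disks/UD01")  # dry run by default: only prints the commands
```

Look at the scrub summary for uncorrectable errors. The counters from `btrfs dev stats` are cumulative, so the old errors will still be listed until the stats are reset.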

32 minutes ago, hawihoney said:

Have to say that I have some concerns regarding BTRFS ...

btrfs has some issues, but you can't blame it for disks dropping offline.
