ryanborstelmann

Members
  • Content count: 44
  • Joined

  • Last visited

Community Reputation

2 Neutral

About ryanborstelmann

  • Rank: Advanced Member


  1. ryanborstelmann

    "Array has 9 disks with read errors"

    Update: since disabling disk spin-down, I have had 0 read errors. Honestly, I have no idea what the issue was - maybe my HBA freaked out or something. I'll let it run for a week or two to be 100% sure it's stable with no errors, then do some poking around to figure out what the cause could be. Thanks for the guidance, folks!
  2. ryanborstelmann

    ZFS plugin for unRAID

    I was able to resolve it by running chmod 777 /mnt/docker/*. I'm unsure why this works while the copy under /mnt/user/docker gets by with non-777 permissions, but that's fine. I should go back someday and find out what's actually going on, but for my Plex server it doesn't matter that things are 777. Thanks!
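
    A less blunt alternative than 777, if anyone wants to try it, is to give the tree the owner the containers actually expect. This is only a sketch - 99:100 (nobody:users) is the usual unRAID default, some containers use their own PUID/PGID (911 shows up in the listings below), and the folder name here is just a placeholder:

        # Match ownership to what the container runs as, rather than opening permissions to everyone.
        # "some_container" is a placeholder; adjust 99:100 to the PUID/PGID in the container template.
        chown -R 99:100 /mnt/docker/some_container
        chmod -R u+rwX,g+rwX /mnt/docker/some_container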
  3. ryanborstelmann

    ZFS plugin for unRAID

    No dice - I set /mnt/docker as my ZFS mountpoint, and I still get permissions issues. Here's my zpool setup and the transfer of the CA Backup to the new pool - maybe there's something I need to add or change in it? I stop Docker before the configuration, change where my docker.img file is located in the unRAID settings, then start it back up afterwards.

    zpool create docker -m /mnt/docker mirror scsi-350000393a819195c scsi-350000393a8195a74 mirror scsi-350000393b82b5d94 scsi-350000393b82b6a2c
    zfs set compression=lz4 docker
    zfs set atime=off docker
    tar xzvf /mnt/user/backups/docker/2018-09-12\@03.01/CA_backup.tar.gz -C /mnt/docker/
    cp /mnt/user/docker/docker.img /mnt/docker/
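
    A quick sanity check after running those commands (just a sketch - the dataset name, properties, and mountpoint are the ones from the commands above):

        # Confirm the pool is healthy and the properties/mountpoint actually took effect
        zpool status docker
        zfs get compression,atime,mountpoint docker
        # Make sure the restored tree really lives on the ZFS dataset and not on the array
        df -h /mnt/docker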
  4. ryanborstelmann

    ZFS plugin for unRAID

    Trying this out. Will report back findings. Thanks!
  5. ryanborstelmann

    ZFS plugin for unRAID

    Hey all, is there anything I should be aware of regarding folder permissions when using ZFS on unRAID? I've moved my Docker appdata folder from /mnt/user/docker to /zfs/docker (using the ZFS plugin to create a zpool outside my array):

    root@NAS01:~# zfs list
    NAME     USED  AVAIL  REFER  MOUNTPOINT
    docker  20.1G   518G  20.1G  /zfs/docker

    root@NAS01:~# zpool status
      pool: docker
     state: ONLINE
      scan: none requested
    config:

        NAME                        STATE     READ WRITE CKSUM
        docker                      ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            scsi-350000393a819195c  ONLINE       0     0     0
            scsi-350000393a8195a74  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            scsi-350000393b82b5d94  ONLINE       0     0     0
            scsi-350000393b82b6a2c  ONLINE       0     0     0

    errors: No known data errors

    I restored my CA Backup of my Docker appdata folder, and it seems to have kept all my permissions in place:

    root@NAS01:~# ls -ln /zfs/docker
    total 4267660
    drwxrwxrwx  8    0    0          19 Jul 27 16:14 Community_Applications_USB_Backup/
    drwxrwxrwx  3    0    0           3 Jul 27 16:14 appdata/
    drwxrwxrwx  3    0    0           7 Sep 12 00:11 1/
    drwxr-xr-x  3    0    0           4 Aug  9 13:59 2/
    drwxrwxrwx  3    0    0           3 Aug  8 14:51 3/
    drwxrwxrwx  3    0    0           4 Jul 27 16:14 4/
    -rw-rw-rw-  1   99  100 53687091200 Sep 12 11:23 docker.img
    drwxr-xr-x  4  911  911           4 Jul 27 16:14 5/
    drwxrwxrwx  6  911  911           6 Jul 27 16:14 6/
    drwxrwxrwx  7 1000 1000          18 Sep 11 10:10 7/
    drwxr-xr-x  6    0    0          15 Jul 27 16:14 8/
    drwxr-xr-x  2 1000  100           3 Jul 27 16:14 9/
    drwxrwxrwx  3 1000  100           6 Sep 12 03:07 10/
    drwx------  6 1000  100          10 Jul 27 16:38 11/
    drwxrwxrwx  4    0    0           4 Jul 27 16:38 12/
    drwxrwxrwx  7 1000  100          15 Sep 12 02:19 13/
    drwxr-xr-x  2  911  911           4 Jul 27 16:38 14/
    drwxrwxr-x  7 1000  100          15 Sep 12 02:36 15/
    drwxr-xr-x  2 1000  100           2 Jul 27 16:39 16/
    drwxrwxrwx  8 1000  100          13 Sep 12 03:07 17/
    drwx------  4    0    0           6 Jul 27 16:39 18/
    drwxr-xr-x  5 1000  100          10 Jul 27 16:39 19/
    drwxrwxrwx  5 1000  100           5 Jul 27 16:39 20/
    drwx------  7  911  911           7 Jul 27 16:40 21/

    root@NAS01:~# ls -ln /mnt/user/docker
    total 8357876
    drwxrwxrwx  1    0    0         328 Jul 27 16:14 Community_Applications_USB_Backup/
    drwxrwxrwx  1    0    0          20 Jul 27 16:14 appdata/
    drwxrwxrwx  1    0    0         147 Sep 12 11:11 1/
    drwxr-xr-x  1    0    0          47 Aug  9 13:59 2/
    drwxrwxrwx  1    0    0          38 Aug  8 14:51 3/
    drwxrwxrwx  1    0    0          37 Jul 27 16:14 4/
    -rw-rw-rw-  1   99  100 53687091200 Sep 12 10:50 docker.img
    drwxr-xr-x  1  911  911          34 Jul 27 16:14 5/
    drwxrwxrwx  1  911  911          57 Jul 27 16:14 6/
    drwxrwxrwx  1 1000 1000        4096 Sep 11 10:10 7/
    drwxr-xr-x  1    0    0         212 Jul 27 16:14 8/
    drwxr-xr-x  1 1000  100          25 Jul 27 16:14 9/
    drwxrwxrwx  1 1000  100          53 Sep 12 11:23 10/
    drwx------  1 1000  100         191 Jul 27 16:38 11/
    drwxrwxrwx  1    0    0          40 Jul 27 16:38 12/
    drwxrwxrwx  1 1000  100         238 Sep 12 11:11 13/
    drwxr-xr-x  1  911  911          44 Jul 27 16:38 14/
    drwxrwxr-x  1 1000  100         232 Sep 12 11:12 15/
    drwxr-xr-x  1 1000  100           6 Jul 27 16:39 16/
    drwxrwxrwx  1 1000  100         208 Sep 12 11:11 17/
    drwx------  1    0    0          66 Jul 27 16:39 18/
    drwxr-xr-x  1 1000  100         184 Jul 27 16:39 19/
    drwxrwxrwx  1 1000  100          41 Jul 27 16:39 20/
    drwx------  1  911  911          71 Jul 27 16:40 21/

    (obfuscated the folder names)

    Yet when I move my Docker volumes from /mnt/user/docker/xyz to /zfs/docker/xyz, the containers have all sorts of permissions issues writing to their config folders. For example, Plex won't start, UNMS throws permissions errors on its config folder, etc. I can't find any differences in UIDs/GIDs between the two docker folders, but every container I've tried so far has the same issue. Any thoughts on what I'm missing?
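
    Since ls -ln only shows the top level, one way to dig deeper is to dump ownership and mode for every path in both trees and diff them (a sketch - it assumes GNU find, which unRAID ships, and the two paths from the listings above):

        # Record mode, numeric UID/GID, and path for everything under each tree, then diff.
        # Any container whose config directory differs between the two lists is a suspect.
        (cd /mnt/user/docker && find . -printf '%M %U %G %p\n' | sort) > /tmp/perms_old.txt
        (cd /zfs/docker      && find . -printf '%M %U %G %p\n' | sort) > /tmp/perms_new.txt
        diff /tmp/perms_old.txt /tmp/perms_new.txt | less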
  6. ryanborstelmann

    "Array has 9 disks with read errors"

    They're hardware errors, but they make no sense, so I'm looking to roll back the only things that changed just before I started seeing them. There's no way the exact same sector on 6 disks failed at once, and there's even less of a chance that it happened 100 times in one second, all on the same sectors. Even a bad cable, bad seating, a failing HBA, etc., would show different sectors going bad, or entire drives going bad. Even if 6 or 9 drives did fail at once, there's essentially zero chance it would be the same sector on all of them, let alone many times over. How is unRAID even looking at the exact same sector on 6 drives at the same time to decide there's a read error? Is it possible this is a kernel bug and it isn't actually reporting correctly? How is SMART not seeing any read errors while the unRAID kernel is? Sorry for all the questions, but this is so odd that I'd like to wrap my head around it before buying new hardware.
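
    One way to double-check the "SMART sees nothing" side of this (a sketch - the device names are just the ones from the syslog snippets in this thread, and the exact output differs between SATA and SAS drives):

        # Ask each affected drive for its health verdict and error log.
        # Clean results here, while the kernel keeps logging I/O errors, point away
        # from the platters and toward the HBA / backplane / cabling path.
        for d in sdw sdz sdaa sdab sdac sdae; do
            echo "=== /dev/$d ==="
            smartctl -H /dev/$d
            smartctl -l error /dev/$d
        done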
  7. ryanborstelmann

    "Array has 9 disks with read errors"

    Again overnight, at 4:33 AM, 6 of the drives all reported the same bad sectors. The only things that happened were that CA Backup ran at 3:00 and finished at 3:30, and the drives all spun down at 3:58 AM.

    Full log: https://ghostbin.com/paste/ja7mh

    Relevant snippet:

    Sep 12 04:33:33 NAS01 kernel: sd 7:0:20:0: timing out command, waited 180s
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:20:0: [sdw] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:20:0: [sdw] tag#18 Sense Key : 0x2 [current]
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:20:0: [sdw] tag#18 ASC=0x4 ASCQ=0x2
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:20:0: [sdw] tag#18 CDB: opcode=0x28 28 00 43 95 86 78 00 04 00 00
    Sep 12 04:33:33 NAS01 kernel: print_req_error: I/O error, dev sdw, sector 1133872760
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:23:0: timing out command, waited 180s
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:23:0: [sdz] tag#21 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:23:0: [sdz] tag#21 Sense Key : 0x2 [current]
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:23:0: [sdz] tag#21 ASC=0x4 ASCQ=0x2
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:23:0: [sdz] tag#21 CDB: opcode=0x88 88 00 00 00 00 00 43 95 86 78 00 00 04 00 00 00
    Sep 12 04:33:33 NAS01 kernel: print_req_error: I/O error, dev sdz, sector 1133872760
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:24:0: timing out command, waited 180s
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:24:0: [sdaa] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:24:0: [sdaa] tag#22 Sense Key : 0x2 [current]
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:24:0: [sdaa] tag#22 ASC=0x4 ASCQ=0x2
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:24:0: [sdaa] tag#22 CDB: opcode=0x28 28 00 43 95 86 78 00 04 00 00
    Sep 12 04:33:33 NAS01 kernel: print_req_error: I/O error, dev sdaa, sector 1133872760
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:25:0: timing out command, waited 180s
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:25:0: [sdab] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:25:0: [sdab] tag#23 Sense Key : 0x2 [current]
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:25:0: [sdab] tag#23 ASC=0x4 ASCQ=0x2
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:25:0: [sdab] tag#23 CDB: opcode=0x28 28 00 43 95 86 78 00 04 00 00
    Sep 12 04:33:33 NAS01 kernel: print_req_error: I/O error, dev sdab, sector 1133872760
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:27:0: timing out command, waited 180s
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:27:0: [sdac] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:27:0: [sdac] tag#24 Sense Key : 0x2 [current]
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:27:0: [sdac] tag#24 ASC=0x4 ASCQ=0x2
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:27:0: [sdac] tag#24 CDB: opcode=0x28 28 00 43 95 86 78 00 04 00 00
    Sep 12 04:33:33 NAS01 kernel: print_req_error: I/O error, dev sdac, sector 1133872760
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:29:0: timing out command, waited 180s
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:29:0: [sdae] tag#25 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:29:0: [sdae] tag#25 Sense Key : 0x2 [current]
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:29:0: [sdae] tag#25 ASC=0x4 ASCQ=0x2
    Sep 12 04:33:33 NAS01 kernel: sd 7:0:29:0: [sdae] tag#25 CDB: opcode=0x88 88 00 00 00 00 00 43 95 86 78 00 00 04 00 00 00
    Sep 12 04:33:33 NAS01 kernel: print_req_error: I/O error, dev sdae, sector 1133872760
    Sep 12 04:34:33 NAS01 kernel: md: disk10 read error, sector=1133872696
    Sep 12 04:34:33 NAS01 kernel: md: disk11 read error, sector=1133872696
    Sep 12 04:34:33 NAS01 kernel: md: disk15 read error, sector=1133872696
    Sep 12 04:34:33 NAS01 kernel: md: disk21 read error, sector=1133872696
    Sep 12 04:34:33 NAS01 kernel: md: disk22 read error, sector=1133872696
    Sep 12 04:34:33 NAS01 kernel: md: disk23 read error, sector=1133872696
    Sep 12 04:34:33 NAS01 kernel: md: disk10 read error, sector=1133872704
    Sep 12 04:34:33 NAS01 kernel: md: disk11 read error, sector=1133872704
    Sep 12 04:34:33 NAS01 kernel: md: disk15 read error, sector=1133872704
    Sep 12 04:34:33 NAS01 kernel: md: disk21 read error, sector=1133872704
    Sep 12 04:34:33 NAS01 kernel: md: disk22 read error, sector=1133872704
    Sep 12 04:34:33 NAS01 kernel: md: disk23 read error, sector=1133872704
    Sep 12 04:34:33 NAS01 kernel: md: disk10 read error, sector=1133872712
    Sep 12 04:34:33 NAS01 kernel: md: disk11 read error, sector=1133872712
    Sep 12 04:34:33 NAS01 kernel: md: disk15 read error, sector=1133872712
    Sep 12 04:34:33 NAS01 kernel: md: disk21 read error, sector=1133872712
    Sep 12 04:34:33 NAS01 kernel: md: disk22 read error, sector=1133872712
    Sep 12 04:34:33 NAS01 kernel: md: disk23 read error, sector=1133872712
    Sep 12 04:34:33 NAS01 kernel: md: disk10 read error, sector=1133872720
    Sep 12 04:34:33 NAS01 kernel: md: disk11 read error, sector=1133872720
    Sep 12 04:34:33 NAS01 kernel: md: disk15 read error, sector=1133872720
    Sep 12 04:34:33 NAS01 kernel: md: disk21 read error, sector=1133872720
    Sep 12 04:34:33 NAS01 kernel: md: disk22 read error, sector=1133872720
    Sep 12 04:34:33 NAS01 kernel: md: disk23 read error, sector=1133872720
    Sep 12 04:34:33 NAS01 kernel: md: disk10 read error, sector=1133872728
    Sep 12 04:34:33 NAS01 kernel: md: disk11 read error, sector=1133872728
    Sep 12 04:34:33 NAS01 kernel: md: disk15 read error, sector=1133872728
    Sep 12 04:34:33 NAS01 kernel: md: disk21 read error, sector=1133872728
    Sep 12 04:34:33 NAS01 kernel: md: disk22 read error, sector=1133872728
    Sep 12 04:34:33 NAS01 kernel: md: disk23 read error, sector=1133872728

    Thoughts? My next course of action is to roll back some of the changes I made this past weekend (disable drive spin-down, etc.) just in case something isn't happy.
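
    For anyone skimming the full log, a rough way to summarize it (a sketch - assumes unRAID's /var/log/syslog):

        # Count I/O errors per device, then list the distinct sectors being reported
        grep 'print_req_error: I/O error' /var/log/syslog | awk '{print $(NF-2)}' | sort | uniq -c
        grep 'print_req_error: I/O error' /var/log/syslog | awk '{print $NF}' | sort -n | uniq -c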
  8. ryanborstelmann

    (Solved) Can't Stop Array When VM is Running

    Update: this was because I had "Hibernate" selected as my "what to do when the array stops" option. I changed it to "Shut Down" and it works like a charm. I'll troubleshoot and search a bit, and submit a new post if needed about my apparent inability to hibernate my Windows VM.
  9. ryanborstelmann

    "Array has 9 disks with read errors"

    I re-seated everything, and after ~6 hours I don't see any read errors. I'll report back after 24-48 hours if I don't see anything new. Really appreciate the input from everyone!
  10. Hi all,

    UnRAID OS: 6.5.3
    virtio drivers: 0.1.141-1

    I have a Windows Server 2016 VM running, with the QEMU guest agent installed via the latest virtio drivers. When I go to stop my array, libvirt fires a stop, but nothing happens. I can still RDP to the VM; it hasn't received a call from the hypervisor - it's as if nothing happened at all. I can, however, stop the VM via the VMs dashboard in the unRAID Web UI: the shutdown command is fired to the guest OS and it shuts down properly, as expected.

    I just see this in my syslog when stopping the array:

    Sep 11 12:22:22 NAS01 root:
    Sep 11 12:22:22 NAS01 root: /dev/sdd:
    Sep 11 12:22:22 NAS01 root: setting standby to 0 (off)
    Sep 11 12:22:23 NAS01 emhttpd: Stopping services...
    Sep 11 12:22:23 NAS01 emhttpd: shcmd (226): /etc/rc.d/rc.libvirt stop

    The process then hangs until I RDP to the VM and stop it from inside, or stop it via the unRAID dashboard. I waited a few minutes; it doesn't seem to kill itself after some number of seconds or anything like that.

    The QEMU guest agent appears to be running just fine:

    PS C:\Users\Administrator> Get-Service QEMU-GA

    Status   Name               DisplayName
    ------   ----               -----------
    Running  QEMU-GA            QEMU Guest Agent

    If it matters, my Web UI is running on port 81 (I know this used to be an issue with the 'stop array' script; unsure if it still is with regard to VMs).
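
    If anyone is debugging the same thing, one way to confirm the guest agent channel actually works from the unRAID side is to poke it with virsh (a sketch - substitute the VM's real domain name as shown by virsh list):

        # A JSON reply to guest-ping means the agent is reachable from the host, so the
        # problem would be in how the array-stop path requests the shutdown, not the agent.
        virsh list --all
        virsh qemu-agent-command "Windows Server 2016" '{"execute":"guest-ping"}'
        # Explicitly ask libvirt to shut the guest down via the agent
        virsh shutdown "Windows Server 2016" --mode=agent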
  11. ryanborstelmann

    "Array has 9 disks with read errors"

    Good point - that's why I reset the stats (they reset after reboots too). I noticed the count incremented again this morning (logs above) after I reset it last night. The server had almost zero activity at that time (between 2 AM and 6 AM or so), which is strange too. So it's definitely continuing to increment. I'll focus on re-seating everything, and I might snag a new LSI HBA too; if the HBA isn't the issue, I'll at least have a spare for the day it does fail. Let's hope it's not the backplane, as SAS2 backplanes are relatively expensive ($300 on eBay at a quick glance), and I only paid $600 for the server as it is 😀
  12. ryanborstelmann

    "Array has 9 disks with read errors"

    One other thing I see in the syslog right as the read errors are noted is this:

    Sep 11 05:50:42 NAS01 kernel: sd 8:0:25:0: timing out command, waited 180s
    Sep 11 05:50:42 NAS01 kernel: sd 8:0:25:0: [sdac] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 11 05:50:42 NAS01 kernel: sd 8:0:25:0: [sdac] tag#23 Sense Key : 0x2 [current]
    Sep 11 05:50:42 NAS01 kernel: sd 8:0:25:0: [sdac] tag#23 ASC=0x4 ASCQ=0x2
    Sep 11 05:50:42 NAS01 kernel: sd 8:0:25:0: [sdac] tag#23 CDB: opcode=0x28 28 00 43 95 5f f8 00 04 00 00
    Sep 11 05:50:42 NAS01 kernel: print_req_error: I/O error, dev sdac, sector 1133862904
    Sep 11 05:50:53 NAS01 kernel: sd 8:0:27:0: timing out command, waited 180s
    Sep 11 05:50:53 NAS01 kernel: sd 8:0:27:0: [sdad] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 11 05:50:53 NAS01 kernel: sd 8:0:27:0: [sdad] tag#15 Sense Key : 0x2 [current]
    Sep 11 05:50:53 NAS01 kernel: sd 8:0:27:0: [sdad] tag#15 ASC=0x4 ASCQ=0x2
    Sep 11 05:50:53 NAS01 kernel: sd 8:0:27:0: [sdad] tag#15 CDB: opcode=0x28 28 00 43 95 5f f8 00 04 00 00
    Sep 11 05:50:53 NAS01 kernel: print_req_error: I/O error, dev sdad, sector 1133862904
    Sep 11 05:51:06 NAS01 kernel: sd 8:0:28:0: timing out command, waited 180s
    Sep 11 05:51:06 NAS01 kernel: sd 8:0:28:0: [sdae] tag#57 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 11 05:51:06 NAS01 kernel: sd 8:0:28:0: [sdae] tag#57 Sense Key : 0x2 [current]
    Sep 11 05:51:06 NAS01 kernel: sd 8:0:28:0: [sdae] tag#57 ASC=0x4 ASCQ=0x2
    Sep 11 05:51:06 NAS01 kernel: sd 8:0:28:0: [sdae] tag#57 CDB: opcode=0x88 88 00 00 00 00 00 43 95 5f f8 00 00 04 00 00 00
    Sep 11 05:51:06 NAS01 kernel: print_req_error: I/O error, dev sdae, sector 1133862904

    I'll try physically re-seating everything tonight and see if the issue persists.
  13. ryanborstelmann

    "Array has 9 disks with read errors"

    There are no SATA cables, as it's all backplane-based - just a SAS cable or two to each backplane. I can try re-seating the controller, but this server has been in the rack for ~2 years now, and I somehow doubt it's a physical seating/cabling issue that just magically started on Saturday. But I'll pull the server from the rack tonight and double-check it all to be sure. I can also order a new M1015 from eBay to be 1000% sure on that front as well.

    If I reset the stat counters on the dashboard, they do come back in time as well.

    If I were having true read errors, would those show in a SMART report? Those reports are coming back clean.

    One final thing that's interesting to me is that the syslog shows the read error on the exact same sector of several disks. Example:

    Sep 11 05:51:06 NAS01 kernel: md: disk10 read error, sector=1133863832
    Sep 11 05:51:06 NAS01 kernel: md: disk22 read error, sector=1133863832
    Sep 11 05:51:06 NAS01 kernel: md: disk23 read error, sector=1133863832
    Sep 11 05:51:06 NAS01 kernel: md: disk10 read error, sector=1133863840
    Sep 11 05:51:06 NAS01 kernel: md: disk22 read error, sector=1133863840
    Sep 11 05:51:06 NAS01 kernel: md: disk23 read error, sector=1133863840
    Sep 11 05:51:06 NAS01 kernel: md: disk10 read error, sector=1133863848
    Sep 11 05:51:06 NAS01 kernel: md: disk22 read error, sector=1133863848
    Sep 11 05:51:06 NAS01 kernel: md: disk23 read error, sector=1133863848
    Sep 11 05:51:06 NAS01 kernel: md: disk10 read error, sector=1133863856
    Sep 11 05:51:06 NAS01 kernel: md: disk22 read error, sector=1133863856
    Sep 11 05:51:06 NAS01 kernel: md: disk23 read error, sector=1133863856

    Notice how the sector is the same across three drives. That doesn't seem likely or normal. Thoughts on that?

    Here is my entire syslog output, grepped for the read errors: https://hastebin.com/yusipoyula.pl
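
    For reference, the per-disk counts in that paste can be pulled straight out of syslog (a sketch - assumes unRAID's /var/log/syslog):

        # Count md read errors per array disk, then see how many distinct sectors are involved
        grep 'md: disk' /var/log/syslog | grep 'read error' | awk '{print $7}' | sort | uniq -c
        grep 'md: disk' /var/log/syslog | grep 'read error' | awk -F'sector=' '{print $2}' | sort -n | uniq -c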
  14. ryanborstelmann

    "Array has 9 disks with read errors"

    Disks with errors:

    Disk10/sdac | 256 errors
    Disk4/sdu | 6 errors
    Disk11/sdab | 256 errors
    Disk13/sdy | 128 errors
    Disk15/sdaa | 256 errors
    Disk14/sdt | 6 errors
    Disk16/sds | 6 errors
    Disk22/sdad | 256 errors
    Disk23/sdae | 128 errors

    Most of my SMART reports show "Pre-Fail" and "Old Age" in the "Type" column. I presume that's because many of the disks are 5+ years old, though I can't find any sector failures.

    There is one Dell M1015 flashed to IT mode (aka LSI SAS2008) handling all of these disks across two SAS2 backplanes: a SuperMicro BPN-SAS2-846EL and a BPN-SAS2-826EL. All of the affected disks are on the 846EL, which currently has 16 disks attached to it. Power comes from 2x SuperMicro power supplies (I think 1200W each, but I'd have to double-check), each hooked up to its own APC BR1500G UPS for surge & UPS protection.
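
    One way to confirm that the affected drives all sit behind the same HBA port / expander path (a sketch - the by-path symlinks are standard on Linux, and the grep list is just the device names above):

        # Drives that share a failing expander, backplane, or HBA port will share
        # most of their physical path string in the symlink names.
        ls -l /dev/disk/by-path/ | grep -E 'sds|sdt|sdu|sdy|sdaa|sdab|sdac|sdad|sdae'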