hypyke

Members
  • Posts: 196

Everything posted by hypyke

  1. I missed the faq post because I misread it as specifically about the technical aspects of running the docker based on the comment below the link. That was indeed my mistake. I do appreciate all the kind comments though.
  2. What is the advantage of this docker over the existing plugin?
  3. Nope. I will if it happens again. Still, not deleting a docker during a config change because of a typo would be nice.
  4. I wasn't aware of that and had to go searching. Still, you have to re-setup the docker settings and such. It's not nice.
  5. I am not sure if this needs to be a bug report or a feature request, or if it's just "the way it works". If you edit a docker container and that edit fails, you lose the container. For example, I added a path and left off the leading slash: Container path: something instead of Container path: /something. The change is accepted, and when you apply it the docker is deleted first, then the reconfigure fails, and you end up with no docker. Should there be some validation, either on Apply or at least before the old docker is deleted? It's an easy mistake to make.
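The kind of pre-apply check asked for here could be as small as the sketch below. It is only an illustration: `validate_container_path` and `safe_to_apply` are hypothetical helpers, not part of Unraid or Docker, and they encode just the one rule from this post (a container path has to be absolute, i.e. start with a slash).

```python
import posixpath


def validate_container_path(path: str) -> list[str]:
    """Return a list of problems with a container-side mount path.

    Hypothetical helper for illustration only; an empty list means the
    path passed these basic checks.
    """
    problems = []
    stripped = path.strip()
    if not stripped:
        problems.append("container path is empty")
    elif not posixpath.isabs(stripped):
        problems.append(f"container path {stripped!r} must start with '/'")
    return problems


def safe_to_apply(container_paths: list[str]) -> bool:
    """Allow the old container to be removed only if every path validates."""
    return all(not validate_container_path(p) for p in container_paths)


if __name__ == "__main__":
    print(validate_container_path("something"))   # the typo from the post
    print(validate_container_path("/something"))  # the intended value
```

The point of `safe_to_apply` is ordering: the check runs before anything is deleted, so a typo leaves the existing container untouched.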
  6. With the array stopped, an image is broken on the Main tab. The file is sum.png. See the attached screenshot.
  7. I am really up the creek now with this issue. I have a Z390 ASRock board and a SanDisk Ultra Flair (not Fit), and it's giving me the same kind of issues: some sort of corruption after reboots, and then the empty /boot issue. Unplugging and replugging seems to have no effect. Once it's messed up, a reinstall is the only thing I can do. The drive tests fine in Windows, and I've reformatted and reinstalled using the correct procedure, which only fixes it for a while. When I plug my old SanDisk USB 2.0 drive in, it seems to work fine. Of course I've already done a key replacement, so the old one will be blacklisted on the next upgrade. I have no idea what to do. I have 2 Pro keys: one is now on this drive that won't work, and the other is on an old drive that's only 256 MB. I could upgrade that one but I am unsure if it will be stable.
  8. There are some old topics in the forum and the docs aren't very clear. I now have the current version and I'm running docker containers and such. What is the correct way to cleanly power down the array from the CLI? The powerdown command has no help. Will it power down safely? I am trying to script it from another machine so I can power everything down in case of a storm or something. Thanks.
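For the "script it from another machine" part, a minimal sketch is below. It assumes key-based SSH access as root and a host name of `tower` (both placeholders), and it simply calls the `powerdown` command named in the question. Whether that command stops the array cleanly is exactly what this post is asking, so the sketch defaults to a dry run that only prints what it would do.

```python
import subprocess


def build_powerdown_command(host: str, user: str = "root") -> list[str]:
    """Build the ssh invocation; nothing is executed here.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    which is what an unattended script wants.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=10",
        f"{user}@{host}",
        "powerdown",  # assumption: the command from the post is the right one
    ]


def request_powerdown(host: str, dry_run: bool = True) -> int:
    """Run (or, by default, just print) the remote powerdown request."""
    cmd = build_powerdown_command(host)
    if dry_run:
        print("would run:", " ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    request_powerdown("tower", dry_run=True)  # "tower" is a placeholder host
```

A UPS or storm-watch script on the other machine would call `request_powerdown(host, dry_run=False)` once the right command has been confirmed.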
  9. Cool. To note though: if Sonarr is running as a plugin, then all its writes would go to local shares rather than network shares, and I don't think that would trigger the issue I was (and am) having. I think it's specific to Sonarr writing to unraid over the network. I THINK.
  10. EDIT: This might be the wrong thread for the issue I am having. I am really not sure so, whatever..... :-)

      I still don't have a clear indication of what is causing the problem, to be honest. There is a core issue, and then there is a residual issue caused by the core issue, I think, but I am stable now and not really testing anymore. I can tell you that stopping the drive spin downs and turning off the cache drive (and thus the mover) did not solve the problem for me, so I don't think the mover causes the issue. I think it is affected by the residual issue. However, I also haven't had a lockup for two weeks, and I think that was due to one change that doesn't involve unraid: I stopped Sonarr (which runs on another server, not unraid) from renaming and moving files to unraid. I had turned this on a while back so that I could stop using a different media sorter. I went back to my old media sorter and it's been stable. This is hardly conclusive, I know, but here is what I found:

      The Core Issue

      Something (in my case, Sonarr) writes over the network and triggers the core issue. This causes the duplicate samba processes, as its write has failed and it's retrying over and over again. Once the array is in this state it must be hard rebooted. All attempts to stop the array manually or kill processes fail. The hard reboot CAUSES THE RESIDUAL ISSUE. Now, while the core issue is active, samba stops responding to network requests but it also locks files on the array, so while in this state the mover will not run either. It starts and then hangs due to the general state of the array and its processes. When you find the machine in this state it SEEMS the mover has caused the issue, but it happens to me with the cache drive disabled completely.

      The Residual Issue

      After the hard reboot, the filesystem on some or all of the drives is in a bad state. Transactions need to be replayed and sorted and the FS needs to be checked. Unraid, not knowing what happened before, does not know to do this for you. After every single crash since I have been troubleshooting, unraid simply reboots and tries to do a parity check; it does not check for issues with the transaction logs on the disk filesystems. Yet if I stop the array, put it into maintenance mode and check the FS manually, I get a ton of replaying journal messages, and they are ALWAYS on disks that were hung before the reboot (I can check that by looking at lsof output before rebooting). So, if you DON'T do the filesystem checks after a reboot, then the mover can hang AGAIN because of FS issues, and it SEEMS like it is the same issue. I verified this by shutting down all the apps I have on the net which write to the array and running the mover immediately after a reboot WITHOUT running FS checks. But this lockup is not the same, because samba is not misbehaving.

      So, how did I test?

      • I turned off disk spindown and the cache drive altogether at Tom's suggestion. Those have been off since he asked, so all of my testing since is with them in that state, eliminating the mover or disk spindown as the cause of the issue.
      • I shut down both applications on my network that do writes to the array, at different times, to find the culprit. Sonarr and Emby are the two programs and neither runs on the array itself. Sonarr moves files to the array and Emby writes some metadata from time to time.
      • I created different users for every machine on my network so I could trace the one causing the samba issues.

      What I found was:

      • The one causing/triggering the issues was always the server running both Emby and Sonarr.
      • Reads never seem to cause the problems. Writes are what trigger the CORE issue, and ONLY writes over the network.
      • With both Emby and Sonarr shut down the array was stable.
      • After every hard reboot, FS checks needed to be run first, or even writing data to the array using Windows Explorer would cause lockups and failures.
      • Eventually I ran Emby only for a while and it was still stable. When I turned Sonarr back on the problem came back.

      So, I reconfigured Sonarr to stop writing to the array at all and started using MetaBrowser, which I used to use, to sort and write my TV progs to the array. In this configuration my array has been up for an hour shy of two full weeks.

      The only conclusions I can draw are:

      • Writes over the network are causing the issue, but not all writes.
      • I have no idea if it's Sonarr, or the WAY Sonarr is doing its writes, or if it's Samba that is the problem. (I think we can assume that it's not Sonarr exclusively that is the issue because not everyone having problems is running it.)
      • I have no idea if Reiserfs is a factor in the CORE issue. (Suggestions in this thread seem to point that way.)

      I don't know that this post does anything but add more confusion to the issue. :-/
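The "look at lsof output before rebooting" step can be turned into a quick tally of which array disks have files held open. The sketch below is an assumption-laden illustration: it expects the array disks to be mounted as `/mnt/diskN`, and the sample text is made up in lsof's usual column layout, not captured from this system.

```python
import re
from collections import Counter

# Assumption: array disks are mounted as /mnt/disk1, /mnt/disk2, ...
DISK_PATH = re.compile(r"/mnt/disk(\d+)(?:/|$)")


def disks_with_open_files(lsof_text: str) -> Counter:
    """Count open-file entries per array disk in saved lsof output."""
    counts = Counter()
    for line in lsof_text.splitlines():
        match = DISK_PATH.search(line)
        if match:
            counts[f"disk{match.group(1)}"] += 1
    return counts


# Made-up sample for illustration only.
SAMPLE = """\
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
smbd    4321 root   30w   REG    9,3     1024   42 /mnt/disk3/TV/show.mkv
smbd    4322 root   31w   REG    9,3     1024   42 /mnt/disk3/TV/show.mkv
smbd    4400 root   12r   REG    9,7     2048   77 /mnt/disk7/Movies/film.mkv
"""

if __name__ == "__main__":
    for disk, count in sorted(disks_with_open_files(SAMPLE).items()):
        print(disk, count)
```

Saving that tally before the hard reboot gives a list to compare against the disks that later show journal replays during the manual filesystem check.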
  11. That is the speed at which the parity drive is read without any other drives to read and calc parity. OK, so are there some sample parity check and rebuild times using the 8TB drives with a 4+4 RAID 0 parity? That would be my main concern.
  12. So I am trying to understand the speeds here. Considering pkn's last post, it SOUNDS like speeds are great with the archive drives as long as you use a non-truncating RAID card and two 4TBs for parity. What does the "free-falling" speed mean? 330MB/s sounds insane to me, but I am only hardware and my parity and rebuilds top out around 80MB/s.
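For a rough sense of what those two speeds mean in wall-clock time, here is a back-of-the-envelope sketch for one full pass over an 8 TB drive. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant average speed, which real parity checks and rebuilds do not hold, so treat the numbers as a lower bound rather than a benchmark.

```python
def full_pass_hours(capacity_tb: float, avg_mb_per_s: float) -> float:
    """Hours to read or write a whole drive once at a constant average speed."""
    capacity_bytes = capacity_tb * 1e12
    seconds = capacity_bytes / (avg_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    for speed in (80, 330):  # the two MB/s figures mentioned in the post
        print(f"8 TB at {speed} MB/s: {full_pass_hours(8, speed):.1f} h")
```

At a steady 80MB/s that works out to a bit under 28 hours per pass, and at 330MB/s to a bit under 7 hours, which is why the difference matters so much for rebuild windows.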
  13. I just tried that and I didn't get any redball. Yeah, me neither, I run smart reports on a schedule, daily, at night, and never had a drive redball like that. Both times mine dropped, I was streaming video.
  14. Thanks for the link. I set up a dual boot with it on my flash drive. The next time I have to reboot or have issues I think I will give RC5 a shot. I'm also very interested to see if my parity check and rebuild speeds go up, like so many have reported. Currently, I am stable. Last night I copied a bunch of stuff to the array, via the cache drive, and the mover ran without issue later that evening.
  15. I'd like to have RC5 as a fallback. Anyone know where I can get a copy? I've dug through the forums but can't find any links.
  16. Interesting! Definitely keep me posted, either via this thread or PM. I ordered a couple of new mini-SAS to SATA cables from Monoprice with the same intention. If that doesn't fix the issue I'm going to pull the trigger on the IBM controller as well.

      My plan seems to have worked so far.

      • I shut down the array
      • Upgraded to RC11
      • Swapped out my super.dat with one from before the most recent failure
      • Moved the cache to the disk9 spot and the red-balled, but good, drive to the cache drive slot
      • Powered up and started a parity check. It found about 1100 changes, which makes some sense, and since the data on disk9 and the rest of the array is "known good", correcting parity should be the way to go.

      Data seems intact and the array is stable for now. Not sure if I am going to go ahead and do the cable swap once the cables are here or wait to see if I have problems again.
  17. This seems to describe the problem I am having but they are in a windows environment. - http://hardforum.com/showthread.php?t=1584144 Here is a thread in this forum that seems to be the same issue - http://lime-technology.com/forum/index.php?topic=25779.msg225810 This thread suggests there is some strange underlying incompatibility between certain mobos and the Supermicro card. - http://lime-technology.com/forum/index.php?topic=26391.msg231144 (funny since my mobo is a Supermicro) Should I return this card for the SAS2LP? Maybe an IBM M1015?
  18. Thanks. That was the kind of help I was hoping for. I am also pretty sure it's not a drive issue since the drive it happened to before was replaced. The part that gets me is that it's the same port. If it had happened to a different port I would be sure it was the controller card or cable. I have a backup of the superblock.dat from after the March 5th rebuild where it was once again stable. Since I am 95% certain the current disk9 drive is OK I could restore that and get back to a protected state asap. The other idea I had was to shutdown, move my cache drive to the disk9 slot, put a replacement spare in the old cache drive spot and rebuild. That way if the same port fails again it won't break parity. I will research the firmware issue. Currently it's on 3.1.0.15N which is what it shipped with. I still welcome any additional notes or analysis.
  19. I found the old syslog which I'd saved as well: Mar 5 22:02:33 Tower2 kernel: sd 1:0:4:0: [sdr] command f276e840 timed out Mar 5 22:02:33 Tower2 kernel: sd 1:0:4:0: [sdr] command f276e000 timed out Mar 5 22:02:39 Tower2 kernel: sd 1:0:4:0: [sdr] command f76d2480 timed out Mar 5 22:02:39 Tower2 kernel: sas: Enter sas_scsi_recover_host busy: 3 failed: 3 Mar 5 22:02:39 Tower2 kernel: sas: trying to find task 0xc4b1a140 Mar 5 22:02:39 Tower2 kernel: sas: sas_scsi_find_task: aborting task 0xc4b1a140 Mar 5 22:02:39 Tower2 kernel: sas: sas_scsi_find_task: task 0xc4b1a140 is aborted Mar 5 22:02:39 Tower2 kernel: sas: sas_eh_handle_sas_errors: task 0xc4b1a140 is aborted Mar 5 22:02:39 Tower2 kernel: sas: ata15: end_device-1:4: cmd error handler Mar 5 22:02:39 Tower2 kernel: sas: ata11: end_device-1:0: dev error handler Mar 5 22:02:39 Tower2 kernel: sas: ata12: end_device-1:1: dev error handler Mar 5 22:02:39 Tower2 kernel: sas: ata13: end_device-1:2: dev error handler Mar 5 22:02:39 Tower2 kernel: sas: ata14: end_device-1:3: dev error handler Mar 5 22:02:39 Tower2 kernel: sas: ata15: end_device-1:4: dev error handler Mar 5 22:02:39 Tower2 kernel: sas: ata16: end_device-1:5: dev error handler Mar 5 22:02:39 Tower2 kernel: sas: ata17: end_device-1:6: dev error handler Mar 5 22:02:39 Tower2 kernel: ata15.00: exception Emask 0x1 SAct 0x7 SErr 0x0 action 0x6 frozen Mar 5 22:02:39 Tower2 kernel: ata15.00: failed command: READ FPDMA QUEUED Mar 5 22:02:39 Tower2 kernel: sas: ata18: end_device-1:7: dev error handler Mar 5 22:02:39 Tower2 kernel: ata15.00: cmd 60/00:00:37:5b:d6/02:00:28:00:00/40 tag 0 ncq 262144 in Mar 5 22:02:39 Tower2 kernel: res 41/04:af:88:5a:d6/00:00:28:00:00/40 Emask 0x1 (device error) Mar 5 22:02:39 Tower2 kernel: ata15.00: status: { DRDY ERR } Mar 5 22:02:39 Tower2 kernel: ata15.00: error: { ABRT } Mar 5 22:02:39 Tower2 kernel: ata15.00: failed command: READ FPDMA QUEUED Mar 5 22:02:39 Tower2 kernel: ata15.00: cmd 60/00:00:37:59:d6/02:00:28:00:00/40 tag 
1 ncq 262144 in Mar 5 22:02:39 Tower2 kernel: res 41/04:af:88:5a:d6/00:00:28:00:00/40 Emask 0x5 (timeout) Mar 5 22:02:39 Tower2 kernel: ata15.00: status: { DRDY ERR } Mar 5 22:02:39 Tower2 kernel: ata15.00: error: { ABRT } Mar 5 22:02:39 Tower2 kernel: ata15.00: failed command: READ FPDMA QUEUED Mar 5 22:02:39 Tower2 kernel: ata15.00: cmd 60/00:00:37:5d:d6/02:00:28:00:00/40 tag 2 ncq 262144 in Mar 5 22:02:39 Tower2 kernel: res 41/04:af:88:5a:d6/00:00:28:00:00/40 Emask 0x1 (device error) Mar 5 22:02:39 Tower2 kernel: ata15.00: status: { DRDY ERR } Mar 5 22:02:39 Tower2 kernel: ata15.00: error: { ABRT } Mar 5 22:02:39 Tower2 kernel: ata15: hard resetting link Mar 5 22:02:42 Tower2 kernel: mvsas 0000:01:00.0: Phy4 : No sig fis Mar 5 22:02:42 Tower2 kernel: drivers/scsi/mvsas/mv_sas.c 1521:mvs_I_T_nexus_reset for device[4]:rc= 0 Mar 5 22:02:45 Tower2 kernel: drivers/scsi/mvsas/mv_sas.c 1951:Release slot [0] tag[0], task [c4b1b900]: Mar 5 22:02:45 Tower2 kernel: sas: sas_ata_task_done: SAS error 8a Mar 5 22:02:45 Tower2 kernel: ata15.00: failed to IDENTIFY (I/O error, err_mask=0x11) Mar 5 22:02:45 Tower2 kernel: sas: sas_form_port: phy4 belongs to port4 already(1)! 
Mar 5 22:02:45 Tower2 kernel: ata15.00: revalidation failed (errno=-5) Mar 5 22:02:48 Tower2 kernel: ata15: hard resetting link Mar 5 22:02:53 Tower2 kernel: ata15.00: qc timeout (cmd 0xec) Mar 5 22:02:53 Tower2 kernel: ata15.00: failed to IDENTIFY (I/O error, err_mask=0x5) Mar 5 22:02:53 Tower2 kernel: ata15.00: revalidation failed (errno=-5) Mar 5 22:02:53 Tower2 kernel: ata15: hard resetting link Mar 5 22:02:56 Tower2 kernel: mvsas 0000:01:00.0: Phy4 : No sig fis Mar 5 22:02:56 Tower2 kernel: drivers/scsi/mvsas/mv_sas.c 1521:mvs_I_T_nexus_reset for device[4]:rc= 0 Mar 5 22:02:59 Tower2 kernel: drivers/scsi/mvsas/mv_sas.c 1951:Release slot [0] tag[0], task [c4b1b900]: Mar 5 22:02:59 Tower2 kernel: sas: sas_ata_task_done: SAS error 8a Mar 5 22:02:59 Tower2 kernel: ata15.00: failed to IDENTIFY (I/O error, err_mask=0x11) Mar 5 22:02:59 Tower2 kernel: ata15.00: revalidation failed (errno=-5) Mar 5 22:02:59 Tower2 kernel: ata15.00: disabled Mar 5 22:02:59 Tower2 kernel: ata15: EH complete Mar 5 22:02:59 Tower2 kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 Mar 5 22:02:59 Tower2 kernel: sd 1:0:4:0: [sdr] Unhandled error code Mar 5 22:02:59 Tower2 kernel: sd 1:0:4:0: [sdr] Result: hostbyte=0x04 driverbyte=0x00 Mar 5 22:02:59 Tower2 kernel: sd 1:0:4:0: [sdr] CDB: cdb[0]=0x28: 28 00 28 d6 5d 37 00 02 00 00 Mar 5 22:02:59 Tower2 kernel: end_request: I/O error, dev sdr, sector 685137207 Mar 5 22:02:59 Tower2 kernel: sd 1:0:4:0: [sdr] Unhandled error code Mar 5 22:02:59 Tower2 kernel: sd 1:0:4:0: [sdr] Result: hostbyte=0x04 driverbyte=0x00 Mar 5 22:02:59 Tower2 kernel: sd 1:0:4:0: [sdr] CDB: cdb[0]=0x28: 28 00 28 d6 59 37 00 02 00 00 Mar 5 22:02:59 Tower2 kernel: end_request: I/O error, dev sdr, sector 685136183 Mar 5 22:02:59 Tower2 kernel: sd 1:0:4:0: [sdr] Unhandled error code Mar 5 22:02:59 Tower2 kernel: sd 1:0:4:0: [sdr] Result: hostbyte=0x04 driverbyte=0x00 Mar 5 22:02:59 Tower2 kernel: sd 1:0:4:0: [sdr] CDB: cdb[0]=0x28: 28 00 28 d6 5b 37 00 02 00 00 
Mar 5 22:02:59 Tower2 kernel: end_request: I/O error, dev sdr, sector 685136695 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137144/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137152/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137160/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137168/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137176/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137184/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137192/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137200/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137208/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137216/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137224/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137232/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137240/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137248/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 
685137256/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137264/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137272/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137280/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137288/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137296/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137304/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137312/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137320/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137328/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137336/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137344/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error Mar 5 22:02:59 Tower2 kernel: handle_stripe read error: 685137352/9, count: 1 Mar 5 22:02:59 Tower2 kernel: md: disk9 read error
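The `CDB:` lines in this log can be cross-checked against the `end_request` sector numbers. The bytes shown start with opcode 0x28, which is SCSI READ(10): bytes 2 to 5 hold the big-endian logical block address and bytes 7 to 8 the transfer length in blocks. The sketch below decodes that layout; `decode_read10` is an illustrative helper name, and the byte count in the comment assumes 512-byte logical sectors.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return {"lba": lba, "blocks": blocks}


if __name__ == "__main__":
    # CDB copied from the log above.
    result = decode_read10("28 00 28 d6 5d 37 00 02 00 00")
    print(result)
    # With 512-byte sectors (an assumption), blocks * 512 is the request size.
    print(result["blocks"] * 512)
```

For the first CDB in the log this gives LBA 685137207, the same number as the `I/O error, dev sdr, sector 685137207` line, and 512 blocks, which at 512 bytes each is the 262144 shown after `ncq` in the failed READ FPDMA QUEUED commands. So the three failing requests are ordinary 256 KiB reads at consecutive offsets, not reads of odd or out-of-range addresses.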
  20. Here are the relevant entries in the log - Mar 10 16:49:42 Tower2 kernel: sd 11:0:4:0: [sdr] command f76fe600 timed out (Drive related) Mar 10 16:49:42 Tower2 kernel: sd 11:0:4:0: [sdr] command f76fe6c0 timed out (Drive related) Mar 10 16:49:42 Tower2 kernel: sd 11:0:4:0: [sdr] command f76fea80 timed out (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: Enter sas_scsi_recover_host busy: 3 failed: 3 (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: trying to find task 0xc4916280 (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: sas_scsi_find_task: aborting task 0xc4916280 (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: sas_scsi_find_task: task 0xc4916280 is aborted (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: sas_eh_handle_sas_errors: task 0xc4916280 is aborted (Errors) Mar 10 16:49:42 Tower2 kernel: sas: trying to find task 0xc4917180 (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: sas_scsi_find_task: aborting task 0xc4917180 (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: sas_scsi_find_task: task 0xc4917180 is aborted (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: sas_eh_handle_sas_errors: task 0xc4917180 is aborted (Errors) Mar 10 16:49:42 Tower2 kernel: sas: trying to find task 0xc4916140 (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: sas_scsi_find_task: aborting task 0xc4916140 (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: sas_scsi_find_task: task 0xc4916140 is aborted (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: sas_eh_handle_sas_errors: task 0xc4916140 is aborted (Errors) Mar 10 16:49:42 Tower2 kernel: sas: ata15: end_device-11:4: cmd error handler (Errors) Mar 10 16:49:42 Tower2 kernel: sas: ata11: end_device-11:0: dev error handler (Errors) Mar 10 16:49:42 Tower2 kernel: sas: ata12: end_device-11:1: dev error handler (Errors) Mar 10 16:49:42 Tower2 kernel: sas: ata13: end_device-11:2: dev error handler (Errors) Mar 10 16:49:42 Tower2 kernel: sas: ata14: end_device-11:3: dev error handler (Errors) Mar 10 
16:49:42 Tower2 kernel: sas: ata15: end_device-11:4: dev error handler (Errors) Mar 10 16:49:42 Tower2 kernel: ata15.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x6 frozen (Errors) Mar 10 16:49:42 Tower2 kernel: sas: ata16: end_device-11:5: dev error handler (Errors) Mar 10 16:49:42 Tower2 kernel: ata15.00: failed command: READ FPDMA QUEUED (Minor Issues) Mar 10 16:49:42 Tower2 kernel: ata15.00: cmd 60/00:00:90:b8:a8/02:00:04:00:00/40 tag 0 ncq 262144 in (Drive related) Mar 10 16:49:42 Tower2 kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors) Mar 10 16:49:42 Tower2 kernel: sas: ata17: end_device-11:6: dev error handler (Errors) Mar 10 16:49:42 Tower2 kernel: ata15.00: status: { DRDY } (Drive related) Mar 10 16:49:42 Tower2 kernel: sas: ata18: end_device-11:7: dev error handler (Errors) Mar 10 16:49:42 Tower2 kernel: ata15.00: failed command: READ FPDMA QUEUED (Minor Issues) Mar 10 16:49:42 Tower2 kernel: ata15.00: cmd 60/00:00:90:b6:a8/02:00:04:00:00/40 tag 1 ncq 262144 in (Drive related) Mar 10 16:49:42 Tower2 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) (Errors) Mar 10 16:49:42 Tower2 kernel: ata15.00: status: { DRDY } (Drive related) Mar 10 16:49:42 Tower2 kernel: ata15.00: failed command: READ FPDMA QUEUED (Minor Issues) Mar 10 16:49:42 Tower2 kernel: ata15.00: cmd 60/00:00:90:ba:a8/02:00:04:00:00/40 tag 2 ncq 262144 in (Drive related) Mar 10 16:49:42 Tower2 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) (Errors) Mar 10 16:49:42 Tower2 kernel: ata15.00: status: { DRDY } (Drive related) Mar 10 16:49:42 Tower2 kernel: ata15: hard resetting link (Minor Issues) Mar 10 16:49:44 Tower2 kernel: mvsas 0000:01:00.0: Phy4 : No sig fis (Drive related) Mar 10 16:49:44 Tower2 kernel: drivers/scsi/mvsas/mv_sas.c 1521:mvs_I_T_nexus_reset for device[4]:rc= 0 (System) Mar 10 16:49:44 Tower2 kernel: sas: sas_form_port: phy4 belongs to port4 already(1)! 
(Drive related) Mar 10 16:49:50 Tower2 kernel: ata15.00: qc timeout (cmd 0x27) (Drive related) Mar 10 16:49:50 Tower2 kernel: ata15.00: failed to read native max address (err_mask=0x4) (Minor Issues) Mar 10 16:49:50 Tower2 kernel: ata15.00: HPA support seems broken, skipping HPA handling (Minor Issues) Mar 10 16:49:50 Tower2 kernel: ata15.00: revalidation failed (errno=-5) (Minor Issues) Mar 10 16:49:50 Tower2 kernel: ata15: hard resetting link (Minor Issues) Mar 10 16:49:52 Tower2 kernel: mvsas 0000:01:00.0: Phy4 : No sig fis (Drive related) Mar 10 16:49:52 Tower2 kernel: drivers/scsi/mvsas/mv_sas.c 1521:mvs_I_T_nexus_reset for device[4]:rc= 0 (System) Mar 10 16:49:56 Tower2 kernel: drivers/scsi/mvsas/mv_sas.c 1951:Release slot [0] tag[0], task [c4916140]: (System) Mar 10 16:49:56 Tower2 kernel: sas: sas_ata_task_done: SAS error 8a (Errors) Mar 10 16:49:56 Tower2 kernel: ata15.00: failed to set xfermode (err_mask=0x11) (Minor Issues) Mar 10 16:49:56 Tower2 kernel: ata15.00: limiting speed to UDMA/133:PIO3 (Minor Issues) Mar 10 16:49:56 Tower2 kernel: sas: sas_form_port: phy4 belongs to port4 already(1)! 
(Drive related) Mar 10 16:49:58 Tower2 kernel: ata15: hard resetting link (Minor Issues) Mar 10 16:50:03 Tower2 kernel: ata15.00: qc timeout (cmd 0xec) (Drive related) Mar 10 16:50:03 Tower2 kernel: ata15.00: failed to IDENTIFY (I/O error, err_mask=0x5) (Errors) Mar 10 16:50:03 Tower2 kernel: ata15.00: revalidation failed (errno=-5) (Minor Issues) Mar 10 16:50:03 Tower2 kernel: ata15.00: disabled (Errors) Mar 10 16:50:03 Tower2 kernel: ata15.00: device reported invalid CHS sector 0 (Drive related) Mar 10 16:50:03 Tower2 last message repeated 2 times Mar 10 16:50:03 Tower2 kernel: ata15: hard resetting link (Minor Issues) Mar 10 16:50:06 Tower2 kernel: mvsas 0000:01:00.0: Phy4 : No sig fis (Drive related) Mar 10 16:50:06 Tower2 kernel: drivers/scsi/mvsas/mv_sas.c 1521:mvs_I_T_nexus_reset for device[4]:rc= 0 (System) Mar 10 16:50:06 Tower2 kernel: ata15: EH complete (Drive related) Mar 10 16:50:06 Tower2 kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 (Drive related) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Unhandled error code (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Result: hostbyte=0x04 driverbyte=0x00 (System) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] CDB: cdb[0]=0x28: 28 00 04 a8 ba 90 00 02 00 00 (Drive related) Mar 10 16:50:06 Tower2 kernel: end_request: I/O error, dev sdr, sector 78166672 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166608/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Unhandled error code (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166616/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: Result: hostbyte=0x04 driverbyte=0x00 (System) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] CDB: cdb[0]=0x28: 28 00 04 a8 b6 90 00 02 00 00 (Drive related) Mar 10 16:50:06 Tower2 kernel: md: 
disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: end_request: I/O error, dev sdr, sector 78165648 (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166624/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166632/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166640/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166648/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Unhandled error code (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Result: hostbyte=0x04 driverbyte=0x00 (System) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] CDB: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166656/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: cdb[0]=0x28: 28 00 04 a8 b8 90 00 02 00 00 Mar 10 16:50:06 Tower2 kernel: end_request: I/O error, dev sdr, sector 78166160 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166664/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166672/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166680/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166688/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166696/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: 
handle_stripe read error: 78166704/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] READ CAPACITY(16) failed (Drive related) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166712/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Result: hostbyte=0x04 driverbyte=0x00 (System) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Sense not available. (Drive related) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166720/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166728/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] READ CAPACITY failed (Drive related) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Result: hostbyte=0x04 driverbyte=0x00 (System) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Sense not available. 
(Drive related) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166736/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166744/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166752/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Truncating mode parameter data from 8226 to 512 bytes (Drive related) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166760/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Got wrong page (Drive related) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166768/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: sd 11:0:4:0: [sdr] Assuming drive cache: write through (Drive related) Mar 10 16:50:06 Tower2 kernel: sdr: detected capacity change from 3000592982016 to 0 (Drive related) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166776/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166784/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166792/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166800/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166808/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: 
handle_stripe read error: 78166816/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166824/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166832/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166840/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166848/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166856/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166864/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166872/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166880/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166888/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166896/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166904/9, count: 1 (Errors) Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166912/9, count: 1 (Errors)
  21. Syslog attached, zipped due to size. I recently swapped out two older controllers, an Adaptec 4-port and a Promise 4-port PCI, for a new Supermicro 8-port SAS/SATA PCI Express card (AOC-SASLP-MV8). Everything seemed fine until disk9 dropped off the controller on May 5th with stripe errors exactly like those in the attached log. I swapped that disk for a brand-new one and tested the suspect disk in another system, where it seems to be OK. The rebuild went fine and the system had been chugging along until today. Once again, while streaming video, I got a stutter and freeze. I went to the unRAID server and disk9 has indeed red-balled yet again. Could this be a bad cable? I am using 3ware mini-SAS to SATA cables. If so, why would it only happen once in a while? Any analysis would be helpful. Thanks. syslog-2013-03-10.zip
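For anyone wanting to check whether the errors are confined to one disk, something like the following could tally the `md: diskN read error` lines in a syslog. This is only a rough sketch: the pattern is taken from the log excerpt in this thread, and the function name `count_read_errors` is made up for illustration.

```python
import re
from collections import Counter

# Pattern taken from lines such as "md: disk9 read error" in the excerpt.
READ_ERROR = re.compile(r"md: (disk\d+) read error")


def count_read_errors(log_text):
    """Return a Counter mapping disk name -> number of md read errors."""
    return Counter(READ_ERROR.findall(log_text))


# Three entries copied from the excerpt, joined the way the forum flattened them.
sample = (
    "Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors) "
    "Mar 10 16:50:06 Tower2 kernel: handle_stripe read error: 78166704/9, count: 1 (Errors) "
    "Mar 10 16:50:06 Tower2 kernel: md: disk9 read error (Errors)"
)

for disk, count in count_read_errors(sample).items():
    print(disk, count)
```

If every hit is for the same disk slot even after the drive itself was replaced, that would point toward the port, cable, or controller rather than the drive, though the count alone cannot prove it.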
  22. OK, there are two bits of info I have found: http://lime-technology.com/wiki/index.php?title=Plugin/webGui/Share_Settings and http://lime-technology.com/forum/index.php?topic=22146 But I can't say I understand them completely. It sounds like the settings can get you into a state where, if a disk is full, data will never get written elsewhere. If that's the case, how do you correct this without changing split-level?
  23. I think I am having the same issue. I am running rc10. Everything is normal until:
      Jan 30 03:42:31 Tower2 logger: ./TV/Parades End/Season 1/metadata/S01E02 - Episode 2-mediainfo.data
      Jan 30 03:42:31 Tower2 logger: .d..t...... TV/
      Jan 30 03:42:31 Tower2 logger: .d..t...... TV/Parades End/
      Jan 30 03:42:31 Tower2 logger: .d..t...... TV/Parades End/Season 1/
      Jan 30 03:42:31 Tower2 logger: .d..t...... TV/Parades End/Season 1/metadata/
      Jan 30 03:42:31 Tower2 logger: >f+++++++++ TV/Parades End/Season 1/metadata/S01E02 - Episode 2-mediainfo.data
      Jan 30 03:42:31 Tower2 shfs/user0: shfs_create: open: /mnt/disk6/TV/Parades End/Season 1/metadata/S01E02 - Episode 2-mediainfo.data (28) No space left on device
      Jan 30 03:42:31 Tower2 logger: rsync: open "/mnt/user0/TV/Parades End/Season 1/metadata/S01E02 - Episode 2-mediainfo.data" failed: No space left on device (28)
      Jan 30 03:42:31 Tower2 logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1042) [sender=3.0.7] (Errors)
      Jan 30 03:42:31 Tower2 logger: ./TV/Parades End/Season 1/metadata/S01E05 - Episode 5-mediainfo.data
      Jan 30 03:42:31 Tower2 logger: >f+++++++++ TV/Parades End/Season 1/metadata/S01E05 - Episode 5-mediainfo.data
      Jan 30 03:42:31 Tower2 shfs/user0: shfs_create: open: /mnt/disk6/TV/Parades End/Season 1/metadata/S01E05 - Episode 5-mediainfo.data (28) No space left on device
      Jan 30 03:42:31 Tower2 logger: rsync: open "/mnt/user0/TV/Parades End/Season 1/metadata/S01E05 - Episode 5-mediainfo.data" failed: No space left on device (28)
      Jan 30 03:42:31 Tower2 logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1042) [sender=3.0.7] (Errors)
      Jan 30 03:42:31 Tower2 logger: ./TV/Parades End/Season 1/metadata/S01E03 - Episode 3-mediainfo.data
      Jan 30 03:42:31 Tower2 logger: >f+++++++++ TV/Parades End/Season 1/metadata/S01E03 - Episode 3-mediainfo.data
      Jan 30 03:42:31 Tower2 shfs/user0: shfs_create: open: /mnt/disk6/TV/Parades End/Season 1/metadata/S01E03 - Episode 3-mediainfo.data (28) No space left on device
      Jan 30 03:42:31 Tower2 logger: rsync: open "/mnt/user0/TV/Parades End/Season 1/metadata/S01E03 - Episode 3-mediainfo.data" failed: No space left on device (28)
      Jan 30 03:42:31 Tower2 logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1042) [sender=3.0.7] (Errors)
      Jan 30 03:42:31 Tower2 logger: ./TV/Parades End/Season 1/metadata/S01E01 - Episode 1-mediainfo.data
      Jan 30 03:42:31 Tower2 logger: >f+++++++++ TV/Parades End/Season 1/metadata/S01E01 - Episode 1-mediainfo.data
      Jan 30 03:42:31 Tower2 shfs/user0: shfs_create: open: /mnt/disk6/TV/Parades End/Season 1/metadata/S01E01 - Episode 1-mediainfo.data (28) No space left on device
      Jan 30 03:42:31 Tower2 logger: rsync: open "/mnt/user0/TV/Parades End/Season 1/metadata/S01E01 - Episode 1-mediainfo.data" failed: No space left on device (28)
      Jan 30 03:42:31 Tower2 logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1042) [sender=3.0.7] (Errors)
      Jan 30 03:42:31 Tower2 logger: ./TV/Parades End/Season 1/metadata/S01E04 - Episode 4-mediainfo.data
      Jan 30 03:42:31 Tower2 logger: >f+++++++++ TV/Parades End/Season 1/metadata/S01E04 - Episode 4-mediainfo.data
      Jan 30 03:42:31 Tower2 shfs/user0: shfs_create: open: /mnt/disk6/TV/Parades End/Season 1/metadata/S01E04 - Episode 4-mediainfo.data (28) No space left on device
      Jan 30 03:42:31 Tower2 logger: rsync: open "/mnt/user0/TV/Parades End/Season 1/metadata/S01E04 - Episode 4-mediainfo.data" failed: No space left on device (28)
      Jan 30 03:42:31 Tower2 logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1042) [sender=3.0.7] (Errors)
      Also, aflores3, what GUI is that in your screenshots? I assume it's some kind of skin.
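To see at a glance which array disk the failed writes are landing on, a small script along these lines could scan the syslog for the `shfs_create` failures. It is only a sketch: the pattern is based on the lines in the excerpt above, and `full_disks` is a hypothetical helper name, not part of unRAID.

```python
import re
from collections import Counter

# Based on excerpt lines such as:
#   shfs_create: open: /mnt/disk6/<path> (28) No space left on device
# "(28)" is the errno value printed alongside the message in the excerpt.
FULL_DISK = re.compile(
    r"shfs_create: open: /mnt/(disk\d+)/.*?\(28\) No space left on device"
)


def full_disks(log_text):
    """Count 'No space left on device' create failures per /mnt/diskN."""
    return Counter(FULL_DISK.findall(log_text))


# Two entries copied from the excerpt, joined the way the forum flattened them.
sample = (
    "Jan 30 03:42:31 Tower2 shfs/user0: shfs_create: open: "
    "/mnt/disk6/TV/Parades End/Season 1/metadata/S01E02 - Episode 2-mediainfo.data "
    "(28) No space left on device "
    "Jan 30 03:42:31 Tower2 shfs/user0: shfs_create: open: "
    "/mnt/disk6/TV/Parades End/Season 1/metadata/S01E05 - Episode 5-mediainfo.data "
    "(28) No space left on device"
)

for disk, failures in full_disks(sample).items():
    print(disk, failures)
```

In the excerpt every failure is on disk6, which fits the idea that the share keeps targeting the one disk that already holds the show's folder even though it is full.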