John_M

Members
  • Content Count

    3805
  • Days Won

    10

John_M last won the day on November 29 2018

John_M had the most liked content!

Community Reputation

239 Very Good

About John_M

  • Rank
    Away for much longer than I expected

Converted

  • Gender
    Male
  • Location
    London

  1. Did you try running without dockers and VMs?
  2. Try not using the Marvell controller.
  3. From the screenshot it looks like your boot flash is corrupt. What is that reference to nvidia-smi in line 4? Are you not using stock Unraid? I suggest you do until you get the problem fixed.
  4. You have a Marvell-based disk controller:
         03:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller [1b4b:9235] (rev 11)
     It's based on the 9235 chip, which seems much less troublesome than the RAID-enabled 9230, which has caused a lot of problems recently for a number of people. I've actually got the 9235 in one of my servers and it has never caused me any problems but different people's experiences vary so much. Interestingly, the 9235 isn't included on the original list, while the 9230 is:
  5. Yes, Unraid keeps track of disks by their serial numbers so the actual port a disk is connected to doesn't matter. That's the case with motherboard SATA ports and simple add-in cards. There's clearly something different about your setup that's causing the issue that you're seeing but as you've provided no information I can only guess. Are you using a hardware RAID controller or SAS disks, perhaps? Post your diagnostics for both configurations.
  6. You don't say which version you're using but that's a known issue with the current stable release. It's fixed in version 6.8, which is available for testing.
  7. Also make sure that your server is whitelisted by any adblockers you're using.
  8. If it's set to autostart it will fail with a "too many missing disks" error. The work-around would be to prevent the server from restarting automatically when power is restored. It's much better to start it manually after an outage because you can wait a while until you are confident that the power is not likely to drop out again. Remember that your system is vulnerable at this point, with partially depleted batteries and quite possibly insufficient capacity to allow for another controlled shutdown. Ideally you would wait until the UPS has fully charged but I can't imagine many people being willing to do that!
  9. More recent threads here: and here:
  10. The news broke in this thread and at first some people were either not affected or could work round the problem by disabling IOMMU. With each new Linux kernel the situation has become worse and we are now at the point where anyone using one of the controllers on the list risks having the disks connected to it drop offline randomly. There is one Marvell controller that isn't on the list and seems less problematic than the rest: it's the 9235, which is the non-RAID capable version of the 9230, which ironically seems to be one of the more problematic. I still use the one I mention in this thread in one of my servers, though I wouldn't want to appear to be encouraging anyone to use it because I'm fully expecting it to break one day with a new Linux kernel. For the moment it's fine, but some people are affected worse than others anyway. In other servers I used to use the popular SAS2LP-MV8 controller, which was capable of controlling eight SATA disks straight out of the box, but I've given it up in favour of LSI-based controllers. They're the best choice for controlling eight SATA disks. For a simple 2-port SATA controller, the ones that use the ASMedia ASM1061 or ASM1062 chips are reliable.
Look at the syslog from the very first diagnostics you posted:

    Nov 12 17:24:24 TrueSource kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    Nov 12 17:24:24 TrueSource kernel: ata11.00: failed command: WRITE DMA EXT
    Nov 12 17:24:24 TrueSource kernel: ata11.00: cmd 35/00:40:d0:dd:24/00:05:05:00:00/e0 tag 9 dma 688128 out
    Nov 12 17:24:24 TrueSource kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Nov 12 17:24:24 TrueSource kernel: ata11.00: status: { DRDY }
    Nov 12 17:24:24 TrueSource kernel: ata11: hard resetting link
    Nov 12 17:24:25 TrueSource kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    Nov 12 17:24:30 TrueSource kernel: ata11.00: qc timeout (cmd 0xec)
    Nov 12 17:24:31 TrueSource kernel: ata11.00: failed to IDENTIFY (I/O error, err_mask=0x4)
    Nov 12 17:24:31 TrueSource kernel: ata11.00: revalidation failed (errno=-5)
    Nov 12 17:24:31 TrueSource kernel: ata11: hard resetting link
    Nov 12 17:24:31 TrueSource kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    Nov 12 17:24:40 TrueSource kernel: ata12.00: exception Emask 0x0 SAct 0xfd80000 SErr 0x0 action 0x6 frozen
    Nov 12 17:24:40 TrueSource kernel: ata12.00: failed command: READ FPDMA QUEUED
    Nov 12 17:24:40 TrueSource kernel: ata12.00: cmd 60/00:98:38:e0:15/01:00:0d:00:00/40 tag 19 ncq dma 131072 in
    Nov 12 17:24:40 TrueSource kernel: res 40/00:00:00:b4:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    Nov 12 17:24:40 TrueSource kernel: ata12.00: status: { DRDY }

The SATA link between the controller and one disk begins to fail (ata11), and then the link to another disk also begins to fail (ata12).
Eventually we see this:

    Nov 12 17:25:16 TrueSource kernel: sd 12:0:0:0: [sdi] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
    Nov 12 17:25:16 TrueSource kernel: sd 12:0:0:0: [sdi] tag#10 CDB: opcode=0x8a 8a 00 00 00 00 00 05 24 dd d0 00 00 05 40 00 00
    Nov 12 17:25:16 TrueSource kernel: print_req_error: I/O error, dev sdi, sector 86302160
    Nov 12 17:25:16 TrueSource kernel: md: disk2 write error, sector=86302096
    Nov 12 17:25:16 TrueSource kernel: md: disk2 write error, sector=86302104

which fills the whole of the rest of the syslog. If we look at the SMART report for Disk2 we see that it's empty, indicating that the disk had dropped offline. That can happen with any controller and it usually indicates a bad SATA cable, or a bad power supply (the PSU itself, or the cabling, splitters, etc), or a bad controller, or bad drive electronics. It's usually a cable problem, unless the controller is a Marvell. Looking at your most recent diagnostics, Disk2 is showing its SMART report, which shows it to be healthy and connected to the controller, but it's likely to drop again. If you do a search of the forums or just type "unraid marvell" into Google you'll find numerous examples of people having the same problem as you.
  11. Disk2 is back online but it's attached to a Marvell 9230 controller, which is known to be problematic with Linux and drops disks randomly. I would recommend using a different SATA or SAS controller.
  12. That isn't how it works. Files aren't destined eventually to end up on any particular disk when they are written to cache. The destination is decided when the mover runs. Split level takes precedence over allocation method.
  13. Go to Tools -> Diagnostics and post the resulting zip file.
  14. Well, it's up to you how you proceed from here. I can understand that not having your router will be a nuisance (that's the reason mine is a physical box) but on the other hand you want to get to the bottom of the problem. Your call.
  15. If you don't start the pfSense VM does the server remain stable?
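
The syslog reading in post 10 above (first ata11 starts failing, then ata12) can be roughly automated. The following is only a sketch: the function name, the regular expression and the list of "trouble" phrases are illustrative assumptions drawn from the messages quoted in that post, not an exhaustive catalogue of libata errors.

```python
import re
from collections import Counter

# Matches kernel messages such as
#   "... kernel: ata11.00: failed command: WRITE DMA EXT"
#   "... kernel: ata11: hard resetting link"
# and captures the port name (ata11) and the message text.
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Phrases treated as signs of link trouble. This list is an assumption
# based only on the excerpt quoted in the post, not a complete set.
TROUBLE = (
    "exception Emask",
    "failed command",
    "hard resetting link",
    "qc timeout",
    "failed to IDENTIFY",
    "revalidation failed",
)

def count_ata_trouble(lines):
    """Return a Counter mapping each ATA port to its number of trouble messages."""
    counts = Counter()
    for line in lines:
        m = ATA_LINE.search(line)
        if m and any(phrase in m.group(2) for phrase in TROUBLE):
            counts[m.group(1)] += 1
    return counts

# A few lines taken from the excerpt in the post.
sample = [
    "Nov 12 17:24:24 TrueSource kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Nov 12 17:24:24 TrueSource kernel: ata11.00: failed command: WRITE DMA EXT",
    "Nov 12 17:24:24 TrueSource kernel: ata11: hard resetting link",
    "Nov 12 17:24:25 TrueSource kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "Nov 12 17:24:40 TrueSource kernel: ata12.00: failed command: READ FPDMA QUEUED",
]

for port, n in sorted(count_ata_trouble(sample).items()):
    print(port, n)
```

To run it against a real log, pass an open file instead of the sample list, for example the syslog extracted from a diagnostics zip. A port that racks up many such messages points at the link to that disk (cable, power, controller or drive electronics), as described in the post.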
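
Several posts above hinge on spotting a Marvell controller in the diagnostics. As a hedged sketch, lines in the format quoted in post 4 (which looks like `lspci -nn` output) could be filtered on the vendor ID `1b4b` that appears there. The function name and regular expression are assumptions for illustration, and the second sample line is made up to show a non-matching device.

```python
import re

# Vendor ID as it appears in the line quoted in post 4: [1b4b:9235].
MARVELL_VENDOR_ID = "1b4b"

# Captures the PCI slot, the device description and the [vendor:device] pair.
LSPCI_LINE = re.compile(
    r"^(?P<slot>\S+) (?P<cls>.+?) \[[0-9a-f]{4}\]: (?P<name>.+) "
    r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)

def marvell_controllers(lspci_lines):
    """Return (slot, device_id, name) for every line whose vendor ID is Marvell's."""
    found = []
    for line in lspci_lines:
        m = LSPCI_LINE.match(line.strip())
        if m and m.group("vendor") == MARVELL_VENDOR_ID:
            found.append((m.group("slot"), m.group("device"), m.group("name")))
    return found

sample = [
    # Line quoted in post 4.
    "03:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9235 "
    "PCIe 2.0 x2 4-port SATA 6 Gb/s Controller [1b4b:9235] (rev 11)",
    # Hypothetical non-Marvell line, invented purely for contrast.
    "00:1f.2 SATA controller [0106]: Example Vendor AHCI Controller [ffff:0001]",
]

for slot, device, name in marvell_controllers(sample):
    print(slot, device, name)
```

This only flags the vendor. Whether a given chip (9230, 9235 and so on) is troublesome still has to be judged from the forum reports discussed in the posts.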