dev_guy

Everything posted by dev_guy

  1. @MacModMachine Thank you for sharing your experiences with Unraid. They mirror my own. The simple fact is TrueNAS runs perfectly on the exact same hardware on which Unraid wrongly disabled perfectly good drives, right down to every SATA cable being the same. Tom, and the Unraid fan boys, need to acknowledge this is a real issue and stop blaming it on convenient excuses when it's clearly an Unraid problem. I no longer trust Unraid for storage and just use it as a last-resort backup and Docker platform, given how many times Unraid has disabled perfectly good drives on hardware that no other OS has issues with. It's a very real problem and Unraid users shouldn't accept the whole hardware blame game. If Unraid disables a drive, and that drive passes diagnostics, just try a better NAS operating system instead of replacing your cable, controller, motherboard, power supply, etc., as the Unraid faithful insist you should. Unraid disables perfectly good drives on perfectly good hardware, but the people who matter pretend the problem doesn't exist.
  2. Thank you @limetech for responding. The above is exactly the point I've been trying to make. A supposedly fault-tolerant storage array is not basic Linux file I/O, and Unraid should not rely on an ancient 1993 file system (XFS), a Linux distro with very few fans (Slackware), and FUSE, which has many known problems. But, as you said, that's exactly what Unraid is doing. Unraid does not handle drive I/O errors in a fault-tolerant way. That's why there are so many documented examples of perfectly good drives being disabled by Unraid on this forum, Reddit, and elsewhere. Blaming it all on SATA cables and power supplies is just a convenient excuse when the exact same drive, cable, and power supply work perfectly with other storage operating systems. This is very much an Unraid problem. Why does Unraid disable drives that are not eligible for warranty replacement and work perfectly even when subjected to a 24+ hour torture test with the exact same cable, power supply, SATA interface, motherboard, etc.? And why do those same drives, cables, etc. work perfectly with TrueNAS, Open Media Vault, and others? The problem is even worse in that a wrongly disabled drive leaves the array in a fragile state, and the parity rebuild, which can take days, can trigger another I/O error and cause unnecessary data loss. This is not a minor problem. It is very much an Unraid issue and it would be great if Unraid addressed it instead of blaming it on cables. What's the harm in adding some decent retry code, especially given the marginal foundation Unraid is built on? To put it simply: Unraid should not disable drives that work perfectly under other operating systems with the same cable, power supply, SATA interface, and motherboard, and that are not even eligible for warranty replacement. By kicking perfectly good drives out of the array, Unraid is needlessly putting user data at risk.
  3. This is the typical advice for those who get a drive disabled by Unraid when the problem is most often Unraid itself. Hello, Microsoft? "My Windows PC crashed and won't boot!" Microsoft: "Have you tried unplugging it and plugging it back in?" Yeah, that's useless advice in this case. The main issue here is how Unraid handles, and perhaps even creates, file I/O errors. It's not about SATA cables when any other operating system, or even drive diagnostic software, is perfectly happy with the same cable, drive, etc.
  4. Putting it in perspective, none of those attempts appear to be retries. Unraid was just blindly trying to proceed when there obviously was a problem. And those 1000 attempts happened in around 1 second or less, so there was no attempt to allow a normal recovery period for a mechanical hard drive. As a hypothetical example, if a drive gets jostled during an I/O operation it may issue errors and then have to mechanically recalibrate the heads. If Unraid just continues to beat on the drive while it's recalibrating, it's likely to keep getting errors until the recalibration is completed. By the time the drive has sorted itself out, Unraid has already disabled it. A much better strategy would be for Unraid to retry on the FIRST error. If the retry fails, wait a few seconds and try again. Instead, Unraid seems to just blindly issue new I/O operations and push what's often a perfectly good drive off the edge of a cliff. Unraid seems to be creating its own mess here, often triggered by a brief transient glitch. EDIT: I should add the problem could be at different levels. Slackware is not an especially well regarded distro these days and struggles with newer hardware. Unraid apparently further modifies the Slackware kernel and layers other things on top of it, like the known-buggy FUSE. So, in terms of file I/O, Unraid isn't built on a very solid foundation, especially compared to mdadm, ZFS, etc.
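The retry-then-pause strategy described above can be sketched in a few lines. This is purely illustrative: `read_fn` and `block` are hypothetical placeholders, not anything from Unraid's actual I/O path.

```python
import time

def read_with_retries(read_fn, block, retries=3, delay_s=2.0):
    """Attempt a block read, pausing between retries so a drive that is
    recalibrating after a transient glitch has time to recover.

    read_fn and block are placeholders for whatever low-level read
    primitive the storage layer actually uses."""
    last_error = None
    for attempt in range(1 + retries):
        try:
            return read_fn(block)   # first try, then up to `retries` more
        except IOError as err:
            last_error = err
            time.sleep(delay_s)     # give the drive time to settle
    # Only after repeated, spaced-out failures is the drive considered suspect.
    raise last_error
```

The point is simply that a few seconds of patience on the first error is cheap compared to disabling a drive and forcing a multi-day rebuild.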
  5. Looking at the syslog for the above, I see a bunch of I/O errors in immediate succession. I don't see any retries of the same I/O command or sector. The drive (sdi) is also obviously still connected and communicating, as it later passes a SMART read and is successfully unmounted, with both logged before the server is shut down. I'm not saying the issue is never a bad cable or connection. I'm just saying Unraid could handle such errors better and seems far more sensitive to such issues.
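One rough way to check a syslog for the pattern described above, i.e. whether any logged I/O error ever targets the same sector twice (which would look like a retry) versus a stream of errors on all-new sectors, is to group the errors by device and sector. The regex below assumes typical kernel log wording, which varies by kernel version:

```python
import re
from collections import Counter

# Matches typical kernel I/O error lines such as:
#   "blk_update_request: I/O error, dev sdi, sector 123456"
# The exact wording varies by kernel version; this pattern is an assumption.
ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")

def sector_error_counts(log_lines):
    """Count I/O errors per (device, sector). A count > 1 for the same
    sector would suggest a retry; all-distinct sectors suggest the kernel
    just kept issuing new requests against a struggling drive."""
    counts = Counter()
    for line in log_lines:
        m = ERROR_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts
```

Feeding it the lines from a syslog like the one above would show whether errors cluster on repeated sectors or simply march forward.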
  6. Because the drives, cables, everything test fine without touching anything, including the connections. So your concept of "bad connections" has the same end result, which is that it needlessly puts data at risk by disabling drives that no other software or operating system seems to have any issues with. Rebuilding data can be a 24+ hour process of continuous pounding on all the drives in the array, which can trigger additional problems when the array is already in a fragile state.
  7. I've already explained multiple times that I don't have log data of a drive being disabled due to Unraid's unfortunate default log setup. I don't know what you mean by drives being "disconnected" as I don't think that's part of the SATA interface. The drives I'm talking about are very much still connected, in that you can read the SMART data, run self-tests on them, etc., but Unraid has disabled them from the array. If something is "disconnecting" drives, it's likely Unraid, NOT the hardware.
  8. @splendidthunder I appreciate your input and support around this being something Unraid could greatly improve if they'd only acknowledge the issue. There are ways to have Unraid write log data to an SSD cache drive/array which I, unfortunately, didn't know about until relatively recently, as I never encountered that option in any of the docs, setup guides, etc. It's such a basic thing they should make it more obvious, and arguably even put the logs on your SSD cache by default if you have one. And even if you don't have a cache, they could write the logs to the system folder on the array if you don't enable drive spin down. But instead, by default, your log is destroyed every time you power down or reboot your Unraid server, which conveniently destroys the evidence of what happened to your wrongly disabled, perfectly good drive. I've learned the hard way to reboot ASAP into Linux to get the data off disabled drives out of fear the emulated data will also become corrupted, as every access to the emulated data, and the rebuild itself, beats on all the drives. I've lost data that way. But the reboot, by default, destroys the logs. You are absolutely correct Unraid should be optimized for consumer-grade hardware and not put data at risk by wrongly disabling drives that suffer a brief transient problem. Disabling a drive very significantly puts data at risk and should be a last resort instead of happening all too often when there is no ongoing problem. I get that Unraid is trying to maintain the integrity of the parity, but kicking perfectly good drives out of the array is not the best option and greatly increases the risk of data loss. And you are also correct this is a common issue and the Unraid fans just blame your cables, drives, power supply, SATA interface, etc. The real issue for me is the exact same hardware works perfectly with TrueNAS, Open Media Vault, etc., and the disabled drive/cable/controller/power supply passes even extensive extended diagnostics.
This seems to be something Unraid just wants to pretend isn't a problem when it really is. People complain about TrueNAS/FreeNAS being demanding on hardware but, honestly, it's been way more trouble free on the exact same hardware Unraid didn't like.
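Since the stock log lives in RAM and dies with a reboot, one DIY workaround (separate from Unraid's own syslog options mentioned above) is to periodically mirror the volatile log to persistent storage. A minimal sketch, assuming illustrative paths: `/var/log/syslog` for the in-RAM log and `/mnt/cache/syslog-backup` standing in for an SSD cache location:

```python
import shutil
import time
from pathlib import Path

def mirror_log(src="/var/log/syslog", dst_dir="/mnt/cache/syslog-backup",
               interval_s=60, iterations=None):
    """Copy the volatile syslog to persistent storage on a timer so the
    evidence of a disabled drive survives a reboot. The paths are
    illustrative placeholders; run this in the background via cron, a
    user script, or similar. iterations=None means run forever."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    while iterations is None or count < iterations:
        src_path = Path(src)
        if src_path.exists():
            # copy2 preserves the timestamp so you can tell when it was taken
            shutil.copy2(src_path, dst / src_path.name)
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_s)
```

It's crude compared to a proper remote or mirrored syslog, but it means the last minute of log activity before a drive gets disabled is actually recoverable.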
  9. I appreciate that. I'm an electrical engineer by education with extensive experience in the IT world. The SATA interface was designed to be robust and seems to work very well for years on end, even with really marginal hardware, cables, etc., in nearly every application EXCEPT Unraid, which is all too happy to disable even new NAS-grade drives connected with quality cables to expensive SATA cards for no good reason. I'm also repeating myself in that I don't have log data to analyze because, by default, Unraid log data is destroyed by even a reboot. But the evidence is Unraid is well known to disable perfectly good drives for no good reason. You keep bringing up the question of what Unraid is supposed to do if there's a drive error. How about a few retries? Honestly, there's lots of evidence Unraid doesn't attempt any sort of reasonable error recovery when it encounters what could be a brief transient error reading or writing a drive. That's perhaps the biggest issue here. Unraid seems to just fall on its face at the slightest I/O error. As I mentioned previously, my dog might be chasing his toy and accidentally run into my Unraid server sitting on the floor. Unraid might be doing some file I/O when the server is jostled, and the result seems to be that Unraid would rather disable a drive because of my clumsy dog than retry the same operation a second later when it would work perfectly. That's the problem the Unraid fan boys don't want to acknowledge. The whole "I've been running Unraid for years with no issues" argument is roughly the same as the "I've never had a drive fail so I don't need to do backups" argument. My anecdotal evidence of multiple perfectly good Unraid drives being disabled is just as valid as anyone else's anecdotal evidence of not having Unraid problems. I'd really like to hear Tom weigh in on this as it's a genuinely serious issue. Disabling a drive is not a trivial thing.
The recovery process can easily result in permanent unrecoverable data loss, and that risk is entirely preventable if Unraid wouldn't disable perfectly good drives in the first place.
  10. And to clarify, if a disk can't be written to at all, or is experiencing easily reproducible problems, I have NO issue with it being disabled and kicked out of the array. But that's not what we're discussing here. Basically, if a drive works perfectly with everything you can throw at it after Unraid disabled it, something is wrong with Unraid.
  11. But that's the whole point. It's EXTREMELY unlikely the disk is that broken when the same disk, untouched, works perfectly. When Unraid disables a disk the first thing I do is shut down the server and reboot it into Linux from a USB thumb drive. And, guess what, the drive Unraid disabled works just fine, passes all diagnostics, has no unrecoverable errors in the SMART data, etc. If it's a full moon and the wind blows the wrong way Unraid seems to disable perfectly good drives. I don't know exactly why but that's very much my experience and the experience of many others. I sincerely doubt Unraid has any sort of robust retry logic and instead simply disables a drive at the first hint of trouble.
  12. And that goes back to my comments about Unraid having a fragile parity scheme. Let's face it: Unraid doesn't use a proven open-source fault-tolerant storage stack like Linux mdadm, ZFS, etc. AFAIK, it uses a unique proprietary parity system built on top of an ancient 1993 file system (XFS), the rather buggy and unstable FUSE layer on top of that, and some proprietary "glue" to make it all sort of work. Yeah, it has some advantages, but it clearly has some disadvantages as well, both in performance and in disabling perfectly good drives.
  13. As I've said, that's not been my experience at all. I've had the opposite experience. Every drive Unraid has disabled, and there have been several over the years, has been perfectly fine, often with only a single CRC error in the SMART data. There have been countless other posts on this forum about drives that test fine, have nothing but one or more CRC errors, and yet were disabled by Unraid. The common wisdom is to replace the SATA cable, the disk controller, the power supply, etc. But, in reality, the user might be better off replacing Unraid with a more fault-tolerant operating system that doesn't cry wolf, or claim the sky is falling, when there is no real problem with hardware other NAS operating systems are perfectly happy with.
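For anyone wanting to check this on their own drives: interface CRC errors show up as SMART attribute 199 (`UDMA_CRC_Error_Count`) in `smartctl -A` output. A small sketch for pulling that raw value out, assuming the usual ATA attribute table layout with the raw value in the last column:

```python
def udma_crc_count(smartctl_output):
    """Extract the raw UDMA_CRC_Error_Count (SMART attribute 199) from
    `smartctl -A` text output. Returns None if the attribute isn't
    present. The column layout is assumed to be the standard ATA
    attribute table, with the raw value in the last field."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[-1])
    return None
```

A drive with a count of 0 or 1 here, that also passes extended self-tests, is exactly the "perfectly good drive" being described in these posts.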
  14. The problem is it can take 24+ hours of pounding on ALL the drives in the array to rebuild a drive that was wrongly disabled and that opens the door to causing another failure during the rebuild process. There is clearly something more sensitive and failure prone in the criteria Unraid uses to disable drives.
  15. I can only repeat what I've already said: a large number of Unraid users have had perfectly good drives disabled, while this almost never happens with other NAS operating systems. Even some very experienced users here have admitted it's a side effect of how Unraid's parity works. I can't factually dispute whether there is or isn't a write error, but it doesn't really matter if the result is a perfectly good drive being disabled. A drive that will pass 24+ hours of continuous testing and then go on to work perfectly in a TrueNAS ZFS or Open Media Vault RAID array for a year or more. And, as I've said, this is true even with the exact same SATA cables, controller, motherboard, power supply, chassis, etc. It's also true that none of the drives Unraid has disabled on me were eligible for warranty replacement, as they pass all diagnostics and work perfectly with everything but Unraid.
  16. Thanks for supporting my position. I'm kind of amazed how the Unraid fans try to dismiss these issues as being some other problem when everything points to the real issue being Unraid. There's a huge amount of evidence that Unraid wrongly disables perfectly good drives. Yet, rather than acknowledge the issue, Unraid, and their fan boys, seem to prefer to ignore and/or discredit anyone who suffered potential data loss from this very real issue. It's really disappointing. And, as is nearly always the case lately, Tom Mortensen, the man responsible for Unraid, is silent on the issue despite being all too happy to take our money.
  17. You seem to be a devout Unraid fan boy for whom Unraid can do no wrong. But the reality is Unraid increasingly does a lot wrong, as can be easily proven.
  18. You seem to live in a fantasy world. I've had multiple Unraid servers disable perfectly good drives multiple times. You seem to want to ignore the real issues here.
  19. I have 107 posts to the forum and have been an Unraid user for many years so this is hardly my first post. But I appreciate your comments. I'm aware of how CRC errors work and know they're a function of the SATA protocol, firmware, etc. My point is they only seem to be an issue with Unraid. I've literally had Unraid disable a drive and then used the exact same hardware, drives, cables, motherboard, chassis, power supply, etc. to run TrueNAS Core and it's still running perfectly with zero issues. TrueNAS is perfectly happy with the exact same hardware Unraid wasn't happy with. I've also never had Open Media Vault, Synology, or Qnap disable a perfectly good drive. I realize that's anecdotal evidence but many others here have had similar experiences with Unraid disabling perfectly good drives that test fine and/or no other OS has an issue with. There's a lot of evidence it's an Unraid issue. Unraid seems unique in its propensity to disable perfectly good drives for even a single CRC error. I can't prove if the CRC error triggered a write error but I've changed my logging to try and trap such events in the future. Regardless, in my experience, 4 other NAS operating systems don't have this problem but Unraid does. It's as simple as that. It can be a serious problem for Unraid to wrongly disable a drive as the rebuild process can create other problems and/or data can be lost in other ways. Part of the issue here is Unraid's parity system is inherently fragile. A single bit error anywhere in the array can destroy your ability to recover from a drive failure. And, to make it worse, the process of rebuilding parity after a failure is rather likely to trigger a new CRC error and perhaps disable another drive. If two drives are disabled a normal Unraid config falls on its face. The good news, if you can call it that, is you're only likely to lose the files on the failed drive(s) not the entire array. 
In many cases, the data on the "failed" Unraid drive is just fine and is 100% readable despite Unraid disabling the drive. But a user has to know the best practice is arguably to get the data off that drive before they attempt any recovery or replacing the drive. Unraid creates problems and endangers data while TrueNAS, OMV, Synology, and Qnap don't have similar issues even, in some cases, on the exact same drives/hardware.
  20. No argument but there have been issues with support for the Realtek RTL8125 family of 2.5Gb ethernet interfaces lagging behind in Unraid as documented at the start of this thread. Many popular Linux distros supported the RTL8125 before Unraid did. I was trying to be helpful especially in explaining how Realtek may be a better choice than Intel in some cases.
  21. Yeah, I assumed as much; that's why I mentioned Slackware. Interestingly, I've seen references that Intel is dropping support for FreeBSD in some cases, including never providing drivers for their I225V 2.5 Gb ethernet interface (the only 2.5 Gb Intel option I've seen used). So FreeBSD-based applications, like TrueNAS Core and pfSense, are currently at the mercy of Intel and can only support Realtek for 2.5 Gb. For everyone who thinks Intel is the safer choice, that's changing. At least Realtek still supports FreeBSD, which is more than Intel can be bothered to do. I don't know much about the Slackware kernel, but I do wish Unraid was Debian based like TrueNAS Scale, which tends to have the best driver support for consumer and small business hardware in the Linux world.
  22. I've seen some conflicting info on this issue. I can say that with 6.11.3 the Realtek RTL8125 is working for me, and it doesn't seem to matter if there's a suffix on the chip part number (i.e. RTL8125B, RTL8125BG, sometimes known as "Dragon", etc.). I've tried a variety of hardware. Sometimes the driver seems slow to initially sort out the connection, but once past that it seems to work fine. Using iperf3 I've verified consistent 2.4 Gbit speeds, which is the same as I get with an Intel I225V 2.5Gb interface, for the many who love to hate on Realtek. All things being equal I'd rather have Intel, but the fact is the Realtek interface is FAR more common both on motherboards and on reasonably priced NICs. And, in the real world, they seem to deliver similar performance. I can say I've been unable to get an RTL8125-based USB3 interface to work, but that didn't surprise me and is likely a bad idea regardless. As others have pointed out here, 2.5Gb is going to become a relatively dominant standard for wired networking with consumer-grade gear. While Unraid's write performance directly to the array is fairly miserable, below 1 Gbit ethernet even with turbo write enabled, an Unraid server writing to an SSD cache or a pool greatly benefits from 2.5 Gb ethernet. Even spinning drives can sustain 220 MB/sec on their outer tracks, which is nearly twice what 1 Gbit ethernet can manage. Hopefully Unraid and Slackware will prioritize 2.5 Gbit and 10 Gbit ethernet going forward as faster ethernet hardware becomes common.
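For anyone repeating the throughput check mentioned above: running `iperf3 -c <server> -J` emits a JSON report, and the receiver-side throughput lives under `end.sum_received.bits_per_second` (treat the field path as an assumption if your iperf3 version differs). A small sketch for turning that into Gbit/s:

```python
import json

def received_gbits(iperf3_json_text):
    """Extract receiver-side throughput in Gbit/s from the JSON report
    produced by `iperf3 -c <host> -J`. The field names follow iperf3's
    usual JSON layout; verify them against your version's output."""
    report = json.loads(iperf3_json_text)
    bps = report["end"]["sum_received"]["bits_per_second"]
    return bps / 1e9
```

A result around 2.4 Gbit/s, as reported above, is about what a healthy 2.5 Gb link delivers after protocol overhead.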
  23. I appreciate your support and the support of others here. It's one of the best things Unraid has going for it. I do understand about hardware compatibility issues, the changing Linux kernel, etc. I've run Unraid on everything from a Supermicro Xeon rack server with ECC RAM to a J5040 build that draws only 12 watts from the wall with the drives in standby, along with a few other hardware platforms. With various builds I've used only the motherboard SATA ports, 2 different LSI 8 port PCIe cards, and ASMedia PCIe cards as well. I've used a wide variety of drives with Unraid including brand new WD and Seagate NAS drives. In all my years of using Unraid on a wide variety of hardware, one thing that's been consistent is CRC errors resulting in perfectly good drives being disabled. As I've said, it's not a problem I've ever had with any other NAS/server OS or commercial NAS products like Qnap. It's also not been specific to just one hardware platform, type of SATA controller, drive, etc. The reason I have two LSI 8 port boards is I replaced the first one when I started having CRC issues. I first changed out the cables, as is common wisdom here, and, when that didn't solve the problem, I replaced the entire LSI board, as is also common wisdom here. This was in an enterprise-grade Supermicro Xeon rack server. It wasn't the cables, LSI controller, or the drives. The problem, in my opinion, is Unraid. That exact same hardware, right down to the drives Unraid wrongly disabled, worked flawlessly for over a year with FreeNAS/TrueNAS and ZFS. So yeah, I blame the Unraid OS for some things. But I'll stop beating the dead horse. I do appreciate the support here from everyone regardless.
  24. @Hoopster Thanks for all that. I agree there's a lot to like about Unraid. But I'm not sure about "The team at Limetech does a great job listening to users and addressing concerns" as they often ignore well documented problems release after release. I'm not at all trying to push Unraid to its limits as you suggest. For me, Unraid falls on its face just trying to be even the most basic file server when it disables a perfectly good drive in the array, creating all sorts of headaches in doing so. To me it doesn't matter if Unraid is great at hosting Docker apps, or passing through a graphics card to a VM, if it often fails at the basic task of being a file server by kicking perfectly good drives out of service and creating lots of pain in doing so. That's an issue the "team at Limetech" seems to want to ignore.
  25. Unraid has arguably more support than any of the other DIY NAS/server OS options except perhaps FreeNAS/TrueNAS, and that's great. I've read countless Unraid guides, spent time with the official Unraid docs, watched countless videos by Spaceinvader and many others, all on how to optimally set up Unraid, and somehow I've never seen anything about local persistent logs. Unraid tries to be a beginner-friendly OS, but it quickly turns into a geek fest in so many ways, especially when things go wrong like having a perfectly healthy drive disabled in your array. It's all too easy to make obscure mistakes that could cost you a big portion of your data. Many on this forum have ended up restoring from corrupt parity, having to reformat a drive disabled by Unraid destroying all the data on it, having data lost during an unnecessary, lengthy parity rebuild, being confused about how Unraid emulates a disabled drive, trying to replace a drive in the array and losing data doing so, having their array completely disabled because Unraid mistakenly thinks two drives have failed, having Unraid's parity become corrupted because their server was simply physically bumped, etc. The list goes on. Unraid has a lot of compelling advantages, which is why I paid for multiple licenses for my multiple Unraid servers. But Unraid also seems to be a fragile snowflake compared to most other network storage solutions, which are generally far more robust. At least I'll now be able to have persistent logs to capture Unraid's failures, so thanks again for that.