FreeMan

Members
  • Posts

    1411

Everything posted by FreeMan

  1. Does that also explain the files that the mover failed to copy? Sent from my moto g(7) using Tapatalk
  2. 120GB is A LOT of file system overhead!
  3. Additionally, can anyone explain this math: how does 360 GB minus 224 MB leave only 238 GB free?
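     To put a number on the gap in that question, here is a minimal Python sketch of the arithmetic. It assumes decimal units (1 GB = 1000 MB), and the function name is just for illustration. It only shows how much space is unaccounted for, not why.

     ```python
     def unaccounted_gb(total_gb, used_mb, reported_free_gb, mb_per_gb=1000):
         """Return the space that is neither reported as used nor as free."""
         # Free space you would expect if "used" were the only consumer.
         expected_free_gb = total_gb - used_mb / mb_per_gb
         return expected_free_gb - reported_free_gb

     # Figures from the post: 360 GB pool, 224 MB left behind, 238 GB shown free.
     gap = unaccounted_gb(360, 224, 238)
     print(f"{gap:.1f} GB unaccounted for")
     ```

     That works out to roughly 120 GB that the pool reports as neither used nor free.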
  4. In an attempt to clear my cache pool so I can replace a failing drive, I've set the use cache setting for all the previously "cache only" shares to be "yes", and I've run the mover. When it finished, it left 224MB of data on the cache pool. After turning on mover logging, my log is full of messages like this: The only things that appear to be left are a few dockers' config data in the appdata directory on my cache. UNRAID somehow is seeing them, but then, when it attempts to move them, it can't find them. Why would this be? Could it be that these files are in the failing areas of the SSD that I'm trying to replace and may be lost forever? If that's the issue, would reinstalling these 4 dockers after I've replaced the cache drive and moved everything else back likely resolve the issue? nas-diagnostics-20211023-1829.zip
  5. Ah, gotcha, the service itself, not just the dockers. Thank you! And this is why I double check... Happily, UNRAID and my hardware are stable enough that I don't often deal with these things...
  6. @JorgeB now that I've got my replacement SSD, just to confirm:
       1. Set all shares using this cache pool to "yes" (from "Prefer" or "Only")
       2. Stop all dockers (appdata is the main thing on the cache and I don't want dockers trying to write there while the rest of this happens)
       3. Run the mover to get everything off the cache
       4. Physically swap out the drive
       5. Rebuild the cache pool (since a drive will have been removed, it won't recognize it properly and will want to do that, right?)
       6. Set all shares using the pool back to cache: "Prefer" or "Only", as they were originally
       7. Run the mover to get everything back to the pool
       8. Restart dockers.
  7. Happy Halloween to me then. I guess I get to go shopping. Now I need to figure out when I bought that drive and see if there's any warranty left on it. Thanks
  8. I have 4 of these messages being reported on my UNRAID Dashboard this morning: I presume that this means the SSD is failing and needs to be replaced immediately, or is it possible that it's just a cable gone bad? nas-diagnostics-20211022-0649.zip
  9. Sorry if this has been addressed - I was lazy and didn't read all 18 pages... Is anyone aware of this issue https://github.com/Koenkk/zigbee2mqtt/issues/8663 that is apparently a driver bug and requires a kernel update to fix? It seems to cause various ZigBee related issues, though not specifically related to MQTT...
  10. Well, that does seem to have done the trick. This, fortunately, seems to have been unnecessary. I know it's not a big deal to delete the img and start over, but why mess if I don't have to... Joy... I've got another disk throwing some CRC errors, too. Don't really want to have to replace 2 drives at the same time. The spinning disk started throwing errors after I physically moved the server just a bit while it was running. It could be as simple as a cable that's a smidge loose. That's what I'm counting on, anyway. Unfortunately, my plan for an easy to access setup isn't as easy as I thought it would be, so it's a bit of a pain to get to the server now. I do need to shut it down and double check all the cables. I just need to muster the oomph to do it.
  11. Additionally, some of the dockers have a Question mark symbol instead of their usual icon:
  12. I ran out of cache disk space - my fault! I'm clearing unnecessary files and fixing configs so it doesn't happen again. However, various dockers won't respond, and attempting to restart them gives an Error 403. I'm now at 63% cache space utilization, so there shouldn't be any issues there. nas-diagnostics-20210917-0712.zip Is this simply a case of rebooting the server or are there other troubleshooting tips I should try first?
  13. Well, my problem is fixed! We just had the AC replaced and while they were doing it, I had them install a new vent in my office right in front of my server. With a steady dose of cooled air blowing up its skirt, the server temps stayed sane and I had 2 drives occasionally hit 40°C, but nothing higher during a parity check. Of course, I'll have to get a cover for heating season because I don't want to cook the server, but it'll run nice and frosty now during the summer.
  14. I moved them as I've been doing for ages. I've gone back and forth between using cache and not, but don't recall having run into this before. Based on your question, I copied them from the array to the cache-based temp directory, deleted them from the array, copied them back from cache to the array and deleted them from cache. Now there are no files on the cache. Thanks! Thanks. I have been very much aware of the "don't cross the streams" admonition for years. Since I'm using Krusader and both directory windows work from "/media" I assumed (with all inherent danger, obviously) that it would work properly since it's always worked that way in the past. I guess I know better now, and will be sure to "copy/delete" instead of "move" from here on out.
  15. I received a warning from Fix Common Problems that I have files on my cache drive for a share that's set to not use cache. Here's the config for the "Sport" share: Browsing the cache pool shows that recent files are there for the Sport share: I copied the files from a temp directory (on Cache) to the Sport share using Krusader. Here's the path as reported by Krusader: And here is the mapping for /media from the Krusader config: Why is Krusader writing these files to the cache pool instead of waking up the drive (if necessary) and writing them directly to the array? nas-diagnostics-20210830-1450.zip
  16. Just for fun, today, the Cache Utilized Percentage is almost right, but the actual amount used is off. 175GB out of 360 is 48.61%. 147GB out of 360 is 40.83%. So either the math is wrong or it's not finding all 360GB of available cache space.
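      The two percentages quoted there can be double-checked with a short Python sketch. The function name is made up for illustration, and it assumes the full 360 GB is the denominator in both cases.

      ```python
      def utilization_pct(used_gb, total_gb):
          """Return used space as a percentage of total, rounded to 2 places."""
          return round(100 * used_gb / total_gb, 2)

      # Both figures from the post, measured against the same 360 GB pool.
      for used in (175, 147):
          print(f"{used} GB of 360 GB = {utilization_pct(used, 360)}%")
      ```

      Both figures are consistent with a 360 GB pool, so the arithmetic in the post holds. The mismatch has to come from one of the two displayed inputs, not from the division.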
  17. This had, initially, fixed the display issue. However, it's back. These were from yesterday: And this is from this morning: No amount of adjusting the time frame will cause the Grafana reported utilization to get back in sync with the WebGUI. To address the other issues noted in your original response:
      * The mover is running at 01:10. It is currently 06:56, it's long since completed its task, the first shots were from sometime after noon yesterday
      * As noted previously, the query is pulling `"path" = '/mnt/cache'`. If you have a specific recommendation on how to modify it to pull in the 3 individual drives that make up the cache pool, I'll be happy to make that mod to see if it makes a difference.
      I suppose this isn't critical, as UUD is a nice addition, and I'm relying on the WebGUI being accurate as the last word, however, it's mildly annoying. I'm willing to test out suggestions, but I'm not going to be heavily digging into finding a solution myself. I'd like to say that this is a recent change (though I don't believe I've changed anything in either my UNRAID or Grafana setups that would have caused this), but it may well have been off from day one and I just never noticed. Out of curiosity, I just loaded UUD v1.5 and it is reporting the same incorrect number that v1.6 is, so this is probably nothing new.
  18. If the plugin was ever updated (I'm guessing not since the newest version I see is 2.0.0 and my plugin is dated 2018.02.11), your changes don't seem to have worked for me. I have auto updates turned on, but don't recall having seen the change come through (doesn't mean it didn't, just that I don't recall). My speed tests have been reliably failing before & since your post. I appreciate your efforts and hope that it does get updated! The V0.3.4 change appears to be working for me as well. A manual test functions, now to wait for my regularly scheduled test to ensure all is good. Thank you for this work-around!
  19. Have you notified whoever asked you to post diagnostics that they are up here? Maybe describe the issues you're having in more detail and someone else may be able to take a look. Most modern CPUs will throttle back if they get too hot, and will probably shut the computer down if temps continue to go up. You'd probably need to look at the docs for your motherboard to determine if it has that feature and where in the BIOS settings it may be. The Parity Check Tuning plugin can be set to pause a parity check or disk rebuild if disk temps get too hot, but it won't shut down the whole server.
  20. That is one. I wasn't aware of any, but figured they'd have a report somewhere. It is decidedly inconclusive. The first thing I noted was their extremely cool temps - the min temps any of my drives report in SMART history is about 30°C (86°F). Right now my "server room" is about 25°C (77°F) and I've got drives spinning between 36-44°C. My SSDs are always reporting either 30 or 33°C (1 @ 30, 2 @ 33). They never change (makes me a bit suspicious, but they're cool enough I'm not concerned). In general, it seems that occasionally hitting 50°C isn't quite the "instant death" I was initially fearing, but it is best if they don't get that toasty.
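      For anyone who thinks in Fahrenheit, the conversions quoted there can be reproduced with a minimal Python sketch. The helper name is illustrative only.

      ```python
      def c_to_f(celsius):
          """Convert degrees Celsius to degrees Fahrenheit."""
          return celsius * 9 / 5 + 32

      # Temperatures mentioned in the post, in degrees Celsius.
      for temp_c in (25, 30, 36, 44, 50):
          print(f"{temp_c}°C = {c_to_f(temp_c):.1f}°F")
      ```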
  21. Interesting, I set the time frame to 1 hour and it sync'd the numbers. I set it back to 24 hr and it remained correct. I'm not sure where in the query I would need to make modifications, since it's selecting on "path" = '/mnt/cache'. I do have all 3 drives in the pool specified in the Cache Drive(s) drop down at the top of the page. And I don't care about the multiple images enough to be bothered, but thanks for the tip!
  22. I've just noticed an interesting inconsistency. As reported by the WebGUI: As reported by UUD: I know that the UUD is only refreshing every 30 seconds, but trust me, my system is NOT capable of writing 62GB to the cache drives in 30 seconds. Forgive this second image. I've tried deleting it twice, but it persists...
  23. After a brief DuckDuckGo search... Here's an undated PDF from Icy Dock with scary warnings about how heat kills drives. Of course, they want to sell you their docks to keep your drives cool, so one should take it with a grain of salt. Here's a 2020 page from ACKP claiming that "prolonged operation at temperatures under 20°C (68°F) or above 50°C (122°F)" will shorten a drive's lifespan. Of course, they want to sell you cooling solutions for your NOC, so there's a grain of salt with this one, too. Both of those actually reference the same white paper from National Instruments, so there is at least some credibility (or, at least, consistency) to them. The NI paper states: Of course, NI wants to sell you their hardware for running tests on your equipment, and they want to sell you the "extended life" option if your conditions are outside those ranges, so again, a grain of salt. Finally, I found a Tom's Hardware story from 2007 reporting on a Google Labs research paper (404, I couldn't find it at the Wayback Machine, maybe it's me). Tom's summary indicates that heat is a factor, but not the only or even biggest factor in drive death. According to their summary, Google didn't (yet) have any particular parameters that were credible in predicting drive death. It does, however, mention that drives operating in cooler temperatures did seem to die more frequently than drives operating hotter and that only at "very high" temps did the trend reverse. 
Tom's quotes may very well be the source of a lot of the conventional wisdom at this board:
      * Once a drive is past the infant mortality stage (about 6 months of high activity), death rate drops until about 5 years have passed
      * Age alone isn't necessarily a factor; after 3 years, death rate stabilizes at about 8%
      * Drives with SMART scan errors are 10x as likely to die as those that don't have scan errors
      * While 85% of drives with one reallocation error survive more than 8 months after the error, the overall death rate is 3-6x higher after the first allocation error than those without errors
      * 56% of all their drive failures had no SMART warnings at all.
All in all, it sounds like drive temp isn't the worst thing. All the things I found (again, just a quick scan) said that up to 50°C operating temp is OK. Of course, cooler is going to be better, but you don't want the drive reaching for a sweatshirt, either. I still haven't found anything from BackBlaze, and they seem to be the preferred go-to for drive life metrics. Wonder if they do have anything on causes of failure, or just statistics...
  24. This is more in line with what I believed and understood, but I've certainly got no proof one way or the other. Thank you for your input. I wonder if anybody has done / can find some research on what effect temp really has on drive lifespan. Sounds like something BackBlaze might have. I may see if they've got something.
  25. Interesting, and thanks for the feedback @Hoopster. I just don't think I'd ever seen a drive hit above about 40-41°C before, even during a parity check. I've got 5-in-3 cages, and this is the first time in quite a while that I've actually had 4 drives in any one cage. I've seen some comments about the IronWolf running hot, so maybe with it running hot and 4 drives (even though it was next to a drive that wasn't part of the array and spun down), the whole mess was just hotter than I'm used to. I'd still welcome others' input, feedback, comments.