jhyler

Members
  • Posts: 41

Community Answers

  1. Much experimentation later I concluded it must be the power supply. Replaced it and things seem to have settled down.
  2. Could be - I recently replaced my 10+ year old flash drive. The new one is 64GB (just what I happened to have), so obviously I'm not using the vast majority of the space. Is there anything I can do to resolve this? Something that will mark sectors as bad? (A rough sketch follows this list.)
  3. I was starting Unraid 6.12.6 today after an unclean shutdown. I also took the opportunity to plug in a new (precleared) disk. I glanced at the console during boot and saw line after line of this: nnn:nn,nn nnn:nn,nn nnn:nn,nn nnn:nn,nn nnn:nn,nn nnn:nn,nn nnn:nn,nn nnn:nn,nn where n is a digit. At the end of the list was the message "These errors will not be automatically corrected". All of the above is from memory - the screen cleared and boot continued before I could write it down - so consider it approximate. After the boot, the array seemed fine. I looked in the syslog; those messages are not there. I grepped around in /var and didn't find them anywhere else, either. I did find a bunch of BTRFS errors and warnings in the log, which is strange because I use XFS exclusively. Can anyone enlighten me as to (1) what these messages are and what they mean, (2) where, if anywhere, they are logged, and (3) what I should be doing about them? (A sketch of one way to re-check is after this list.) Thanks! tower-diagnostics-20240203-1141.zip
  4. I have an opportunity to pick up an old but near-mint Norco RPC-4220 case for cheap. The problem is that the backplane in this beast requires 5 Molex power connectors, each one powering four 3.5-inch SATA drives. (It actually requires 6, the last one powering fans behind the backplane, but I could always swap the fans out.) The case is built for an ATX power supply, but I don't know of any modern power supplies that come with that many Molex connectors. They seem to be almost an afterthought these days - you get one, maybe two cables with Molex connectors. The power supply I have on hand (an EVGA Supernova 850 G7) has three SATA power cables with four connectors each, and one cable with three Molex connectors. I know I could use Molex splitters to get more connectors, and/or use adapters to convert a SATA power cable to Molex, but that concerns me, since each connector ultimately ends up powering four drives (rough numbers in the sketch after this list). How would you deal with this case? Am I worrying over nothing - should I just go ahead and adapt SATA power to Molex and/or use Molex splitters? Is there a different power supply that would be a better fit for this case? (I'm thinking 850 watts would be about what I need.) Or should I just pass on the case entirely? Thanks for your advice.
  5. Thanks for looking at it. The parity check/rebuild was not manually initiated; Unraid automatically started it after I assigned the second parity disk and started the array. Referring to the first syslog, you'll find it starting at line 2343 (15:13:20). The SATA link drops almost immediately afterwards and the drive goes offline (a sketch for pulling the link events out of a syslog follows this list). It almost makes me wonder about the power supply, but the system had been working fine, and all I really did was replace two WD Red 10TB drives with 22TB drives. (The specs say they consume less than 2 watts more each, but that won't be the peak.) The second syslog I uploaded looks like it got truncated somehow; I probably did take that log after it was too late.
  6. I am at wit's end on this one. I'd been meaning to upgrade my parity drives (I use two) to larger ones for some time now. Around Christmastime WD had a pretty good price on a bundle of two 22TB Red disks, so I went ahead and bought them. When I got them, I precleared both (just to get some burn-in use), then stopped the array, replaced the old disks with the new ones, and restarted to let the parity build. After a few minutes the rebuild failed on parity disk 2 with write errors. After trying some things with no luck, I let parity rebuild on just the new Parity 1 disk, which succeeded. I RMA'd the other disk. I now have a replacement for that "failed" disk and precleared it successfully. I then assigned it as Parity 2 and let a parity rebuild start. The rebuild stopped after 4.5 hours with write errors again. Plus I can neither stop the parity rebuild nor stop the array. This being the second new Parity 2 disk, it's getting hard to believe in disk errors now. I did the usual trick of starting in maintenance mode, unassigning, and reassigning. Then I moved the disk to a different slot in the case (so the adapter and cables are different) and tried the rebuild again. This time the parity rebuild fails in just a minute. Maybe Unraid just doesn't like 22TB disks? I'm attaching two sets of diagnostics, one from after the first failed parity rebuild and one from after the most recent rebuild. Neither one contains a SMART report for the Parity 2 disk, so I got one while in maintenance mode (the sketch after this list shows one way to capture it) and am uploading that too. That report looks clean. EDIT: I've removed the second diagnostics; they were taken too late and there's nothing useful there. Help! This has gotten beyond me. If I go back to WD and tell them that the disk they sent to replace my RMA'd disk has the same problems the first one did, I expect they're going to want some hard evidence. first fail, tower-diagnostics-20240123-1521.zip tower-smart-20240123-1550.zip
  7. I've been looking at the documentation and forum to see what the recommendations for flash drives are these days. As best I can tell, the most recent recommendations are still: USB 2, not USB 3; at least 2GB but no more than 32GB; favorite: the Samsung BAR Plus 32GB. These strike me as rather dated, and the 32GB BAR is basically unobtainium these days unless you want a used one or to pay a heavy premium over larger sizes. Are there more current recommendations? Thanks!
  8. Sometime during the night my Unraid system locked up. I checked the log and it looks like my flash drive is beginning to fail. I've backed up the flash drive and am ordering a new one, but it will be a day or two before I can rebuild. In the meantime, I have to leave the system running because it's used for other things as well (nothing that uses the array drives), but I want to spin down the array disks. I'm not starting the array because I'm unwilling to do a parity check for this. Can anyone describe a method to do this? (One possible approach is sketched after this list.) Thanks! (I am not posting diagnostics because I don't want this thread to be about my particular system; I'd like to limit the thread to answering the above question.)
  9. Thanks for the replies, folks. Not sure how we got onto the subject of backups, but that really isn't the point for me. Like most people, I suppose, my array contains data I could lose with no issue, data which, if lost, would be expensive or difficult to recreate, and essential data that must be preserved. My backups are planned and managed accordingly. Data survivability is not the issue. Data availability is what can keep me up nights. I won't go into the details of me and my business; suffice it to say it's very important to have near-real-time access to a lot of data, and having to get a backup retrieved and reloaded would be looked at as a serious failure. Unfair, maybe, but such is life. (Digression - you may be tempted at this point to wonder why I don't have multiple servers, then. I have definitely thought about it, and it may eventually happen. For now I just keep spare parts. Redundant servers create their own set of issues, and even if I did have them, I'd still be faced with this question. End digression.) It's interesting to note that in back-to-back posts we have (1) make decisions based on product track records (i.e., rely on statistics) and (2) "be careful with statistics". Both statements are correct, if seemingly contradictory. As Frank points out, statistics describe populations, not elements of a population. I buy drives based on track records because it's better than not doing so. But statistics don't tell me whether that particular 4.5-star disk I just bought is any good. Which is what happened here. And it's why I asked for comments on how other people address the issue of whether the individual disks they buy out of good-track-record populations are individually good ones. Not for proof positive, which is of course impossible, but I was hoping for "I run this script and it's weeded out most of my clunkers". Not having gotten any replies like that, I'm starting to think most people do rely on statistics without realizing it. Which I suppose is probably fine for many if not most Unraid users. Thanks again for the input.
  10. Thanks for the reply, Frank1940. What you describe is indeed a possibility, hard to know except in retrospect. My main concern now is data integrity in emergencies. I, like I hope a lot of other people, keep a spare, precleared disk on hand. When a disk starts showing signs of failure I can swap it in immediately, before things get out of hand. Now I am increasingly concerned that my hot spare could be ready to fail me too, because I didn't stress-test it sufficiently. In my situation that could be bad, bad, bad. So how can I regain confidence? My immediate thought is that in preparing a standby replacement disk, I will first use it to replace one of my "archive" array disks that I know isn't being written to anymore, and let Unraid rebuild that disk onto the as-received new disk. Afterwards I'll replace the original disk and trust the array. Then - if the rebuild succeeded - I'll preclear the new disk and set it aside as a spare. The downside of doing that is it puts a lot of I/O on all the other disks, enough to make me think that keeping two ready-spare disks might be wise. In other words, that solution may be overkill. Which is why I asked the community here what they do to gain confidence in their new disks - a question which, unfortunately, remains unanswered.
  11. It ain't preclear. Or at least preclear ain't enough. I posted another thread today where it was determined that a brand new disk, bought straight from the manufacturer, was put through a full 2-day run of preclear, passed with flying colors, and then immediately failed with read errors when added to the array as a parity disk. I didn't do that preclear because I wanted the disk zeroed - it was going to be a parity disk, after all. I ran preclear because I was under the impression that it also acted as a sort of stress test that would get the disk to fail if it was going to. And maybe it does, for some kinds of errors. But obviously it's not enough. Certainly it's not impossible that the disk failed after the preclear completed, but the overwhelming likelihood is that there was some kind of manufacturing defect that preclear didn't find but that showed up almost instantly when the disk went into the array. So. Let's say I want to do something to a new disk when I acquire it - some test or series of tests such that, if the disk passes, there is a strong likelihood it will work in the array. Preclear is part of it - because usually I do want the disk zeroed, and it may find some errors - but what other test steps should I be taking? (One possible sequence is sketched after this list.) Anybody else feel this way? What do you do?
  12. I am in the process of rebuilding parity on a larger disk. Due to other errors reported in an earlier thread, I am rebuilding on a single parity disk instead of the two I had previously. I am about 6% through the rebuild and just got informed of read errors on one of my older data disks. Since I still have the older parity disks, I believe I can recover from this. What I plan to do is this: let the rebuild complete and hope for no other errors, or at least no other disks with errors. After the rebuild completes, stop the array and power down. Replace the newly built parity drive with the two old ones. Remove the data drive with the read errors and replace it with a new (precleared) drive (which will have to be larger than the original). Restart the array and let it rebuild the data disk. After the rebuild, stop and shut down again. Pull out the two old parity drives and replace them with the one that was just rebuilt. It's now invalid, so rebuild it yet again. I am letting the current rebuild complete, at least for now, because I'm not entirely sure the steps above will work, and it may still be useful to have that disk valid apart from the sectors with read errors. I also noticed that, as the parity rebuild started, all the data disks got a small number of writes - I wasn't expecting that and it worries me. But that rebuild has over a day to go before it completes. I'd appreciate your advice as to whether the steps above should work, whether I should let the current rebuild complete, and anything else I should know about this situation. Also, how do I "force rebuild" a parity disk? I suppose in my case that's just a parity check by any other name. Thanks!
  13. I moved the drive to where Parity 1 was and tried; it got a bunch of read errors almost immediately. I've since pulled it out, put the other new drive in its place, and restarted a single-parity rebuild. So far it's working - it at least got past the 0.3% point where it had stopped before. How - HOW - does a disk get through an almost 2-day-long preclear, all those pre-clear and post-clear reads succeeding, and then fail with read errors immediately afterwards? The only difference I can think of is that when being precleared it was connected directly to a SATA port, and then I moved it to an HBA. That shouldn't matter, though - right? (A sketch for checking the drive's link-error counters is after this list.) Come to think of it, something like this happened to me not long after the WD Red 10TB disks first came out: there was something in how the disks spun down that something in Unraid or Linux didn't like, and it took another couple of Unraid releases before they were working reliably. These new disks are the new Red 22TB models; I wonder if something like that could be going on. Of course the new disk that is working is 22TB also, so that seems unlikely. Though now that I'm looking, I see other people on the forum having issues with 22TB drives. Thanks for the help, I appreciate it.
  14. Diagnostics attached, sorry about that. Things may be worse than I thought. The array operations page shows the elapsed rebuild time increasing, but the current position (0.3%) and speed are not changing. The array devices page shows no disk activity on any disk. Help! tower-diagnostics-20231227-1003.zip
  15. I needed to replace both parity drives in my perfectly healthy system with larger ones. So I got two new drives and put them both through preclear to stress them a bit first. They finished successfully. I then stopped and shut down, pulled the old (valid) parity drives, and moved the precleared drives into the slots that had contained the old parity drives I just pulled. I then booted the system, assigned the new drives to the parity and parity 2 slots, and started the array to rebuild the parity drives. The rebuild stopped almost immediately with the new parity 2 drive 'X'd out with a bunch of errors. Obviously I'm skeptical, because it had just gotten through a preclear. So I shut down again, rechecked cabling and the seating of things, and tried again. Now the system is rebuilding Parity 1, but (what I should have thought of earlier) the second parity drive remains 'X'd out. So I'm basically, accidentally, building a single-parity system. Being a bit paranoid, I am going to (1) let that finish and (2) not touch anything else, so the old parity drives remain valid for the time being. When the rebuild finishes, I will then want to un-'X' that other drive and get it added again as the second parity drive. How do I do that? If possible, I'd prefer a solution that doesn't involve magic "don't mess this up or you're dead" command-line commands. Thanks!
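
Sketches referenced in the posts above

For the flash-drive question in post 2: one option is a surface scan plus a FAT-level bad-cluster check. This is only a sketch - /dev/sdX1 is a placeholder for the flash partition, and it assumes the flash is FAT-formatted and is not mounted while being checked (for example, checked from another Linux machine).

    # Read-only surface scan (e2fsprogs badblocks); reports unreadable
    # blocks without writing anything.
    badblocks -sv /dev/sdX1

    # fsck.fat from dosfstools: -t tests for and marks bad clusters,
    # -a repairs automatically, -w writes changes immediately, -v verbose.
    fsck.fat -vatw /dev/sdX1

Whether marking clusters bad on a flash drive that is already acting up is worth it, rather than simply replacing the drive, is a separate judgment call.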
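
For post 3: if those boot-time messages came from the filesystem check of the boot flash (an assumption - the output was quoted from memory), a read-only re-check might reproduce them without changing anything, and the kernel ring buffer sometimes still holds console output that never reaches syslog. /dev/sda1 is a placeholder for the flash partition, and the check is best run with /boot unmounted.

    # Read-only FAT check: -n means check only, write no changes.
    fsck.fat -n /dev/sda1

    # Kernel/console messages that were never copied into syslog may still
    # be in the ring buffer until it wraps.
    dmesg | less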
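
For the Norco backplane question in post 4, the concern boils down to per-connector current on the 12 V rail at spin-up. The figures below are rough rule-of-thumb assumptions, not datasheet values for any particular drive; the point is that four drives spinning up at once on one connector is on the order of 8 A at 12 V, which is why daisy-chained splitters and SATA-to-Molex adapters draw extra scrutiny, and why staggered spin-up changes the picture.

    # Back-of-envelope 12 V budget per backplane connector (all assumptions).
    drives_per_conn=4
    spinup_ma=2000     # rough per-drive spin-up draw on 12 V, in mA
    idle_ma=600        # rough per-drive steady-state draw on 12 V, in mA

    echo "Peak at spin-up: $(( drives_per_conn * spinup_ma / 1000 )) A per connector @ 12 V"
    echo "Steady state   : $(( drives_per_conn * idle_ma / 1000 )) A per connector @ 12 V"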
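
For post 5: one way to pull the SATA link events out of the syslog in a diagnostics zip is a pattern match on the usual libata kernel messages. The patterns below are the generic ones, not anything confirmed to be in this particular log.

    # Show ATA/SATA link events with line numbers, to compare against the
    # rebuild start at line 2343 mentioned in the post.
    grep -nEi 'hard resetting link|SATA link (up|down)|link is slow|frozen|failed command' syslog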
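
For post 6: the SMART report grabbed in maintenance mode can be captured with smartctl. The device name and output filename are placeholders; for a drive behind some HBAs the device type may need to be spelled out.

    # Full SMART/device report for the suspect Parity 2 disk.
    smartctl -x /dev/sdX > parity2-smart.txt

    # If the controller hides the drive behind a SCSI translation layer:
    # smartctl -x -d sat /dev/sdX > parity2-smart.txt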
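
For the spin-down question in post 8: with the array left stopped, the data and parity disks are not mounted, so one option is simply to send each of them a standby command. This is a sketch - the device list is a placeholder, and anything that later touches a disk (a SMART poll, for instance) can wake it again.

    # Send each array disk to standby and confirm its power state.
    for d in /dev/sd[b-f]; do   # adjust to the actual array devices
        hdparm -y "$d"          # -y issues STANDBY IMMEDIATE (spin down now)
    done
    hdparm -C /dev/sdb          # -C reports active/idle vs standby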
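
For post 11's "what other test steps" question, one sequence sometimes suggested (not an official Unraid procedure) combines the drive's own self-tests with a full write/verify pass. It is destructive - badblocks -w overwrites the entire disk - so it only makes sense before the disk holds any data. The device name is a placeholder.

    # 1) Quick self-test; check the result with 'smartctl -l selftest'
    #    before moving on.
    smartctl -t short /dev/sdX

    # 2) Four-pass destructive write/verify of the whole surface. The large
    #    block size keeps the block count within badblocks' limits on very
    #    big disks; expect days of runtime on a 22TB drive.
    badblocks -wsv -b 8192 /dev/sdX

    # 3) Full vendor surface self-test, then compare reallocated/pending
    #    sector counts against the values recorded before the run.
    smartctl -t long /dev/sdX
    smartctl -a /dev/sdX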
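
For post 13: when a disk reads cleanly on a motherboard SATA port but throws errors behind an HBA, the drive's own error log and its SATA PHY event counters can help separate media problems from link or cabling problems (climbing CRC counts point at the latter). Sketch only; the device name is a placeholder.

    smartctl -l error /dev/sdX     # drive-side ATA error log
    smartctl -l sataphy /dev/sdX   # SATA PHY event counters (CRC/link errors)
    smartctl -A /dev/sdX           # attribute 199, UDMA_CRC_Error_Count, is
                                   # the one that climbs with a bad link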