Pauven

Members
  • Posts: 747
  • Days Won: 7

Everything posted by Pauven

  1. I would highly recommend, at least to start, using the original slots where the replacement drives are being detected: remove the replacements and install the originals there. For now, don't touch any of the other "good" drives, as that could compound the problem, especially if you start to lose track of which drives are which. Keep it simple.
  2. Okay, slow down. Up to this point, most of the advice has been either about running tests or about hypothesizing options. Anytime you think you're ready to take action, please post your planned steps here for review and approval. Every action you take brings you one step closer to losing data if it turns out to be the wrong one. I believe your data is still intact, so don't give up hope. But slow down and work with the guys here, and don't do anything that hasn't been reviewed and approved. Did you follow this guidance to back up the current flash drive first, before restoring from backup? From a planning perspective, we need to know what options remain.
  3. Hey guys, I'm just chiming in here as jkwaterman is a friend of mine, and we've already been chatting via email - I sent him here for expert advice. I'm super happy to see JorgeB, trurl and Frank1940 are helping out - you guys are sharp, so I know he's in good hands. I read through everything, and I do have a few thoughts. Everything you guys are suggesting pretty much matches what I've advised via email, so we're all already on the same page. Restoring the super.dat from his Apr 2023 backup is a great idea, but I think that only applies if he didn't change any drives between that backup and the point where the drives failed out. If we send him down this path, I think he first needs to confirm he didn't upgrade/swap/add any drives after the backup, and he should also make a new backup of his current (bad) config, in case this goes sideways and he wants to get back to the current state. I wanted to point this out since I didn't see anyone ask this particular question. I also strongly agree with trying to use the original failed drives, and that he should run SMART tests to validate the drives are okay before re-using them. One thing I'm not sure about is whether, if he uses the old drives, he should use the Trust Parity feature (I assume that's still a feature; it's been a decade since I last did this). I'm imagining that he's got two paths forward with the old drives. He could recreate the array config using all the original drives, use Trust Parity so parity won't be rebuilt, and then immediately swap out the two suspect drives and rebuild onto the replacements. Basically, with this approach he's using the GUI to recreate the pre-failure drive config, and then manually failing/upgrading the drives. Alternatively, he could again recreate the array config using all the original drives, but without Trust Parity, and instead rebuild parity from the data on the suspect drives.
This second approach sounds slightly riskier, as we're trusting the suspect drives to survive the parity rebuild, and unfortunately we don't know the nature of the errors that started this whole fiasco. I know for a fact that he has started the array numerous times in disk emulation mode, so data could have been written to the array. Additionally, we are both users of the My Movies software, which has a habit of updating local movie data from online contributions that other users continually submit, and this metadata in turn gets written to the array. It's probably safe to assume that My Movies was running at some point during disk emulation mode, so the current parity data no longer matches the data on the failed drives. I just wanted to point this out so that we all know to trust either the parity data or the suspect drive data, but not both, and to expect the two data sources to be slightly out of sync with each other. Note that the updates from My Movies are trivial and will automatically be reapplied if he reverts to the old drive data, so there's no risk of data loss there. One question I had myself: is it possible to manually fix the drive config, via text editing, so that the parity drives are re-added to the array in a trusted state, but the 2 failed drives are still shown as missing/wrong/replaced? I was thinking there was a way to accomplish this via text file edits, but I really don't know. I helped with his server build, and this power supply has 62A on +12V, if I'm not mistaken. Thanks for helping jkwaterman out, guys, I know we both really appreciate it!!!
  4. Thanks Rysz. Actually, it's my signature that's really outdated, hah! But I was still on 6.9.2, and I had to upgrade to 6.10+ just to use the URL method. I'm on the latest 6.12 now, and I was able to install from URL. I assume it's the same MergerFS release as the CA version. I like MergerFS, and it's working as I hoped. But it's not perfect. The "Create" option is the same for files and directories, and I was finding that it would create a directory, write some files to it, drop below the minimum free space, and then create a new directory on a different branch. Considering that I'm backing up uncompressed Blu-rays, typically around 45 GB in size, I need the min free space for creating a directory to be at least 45 GB higher than the min free space for creating files. To solve this, I customized the mirror.sh script someone else wrote (which creates each directory right before files are written to it, rather than creating all the empty directories first and then copying files). I changed it to create directories based on a 100 GB min free space, and to evaluate my MergerFS branches in a particular sequence. I was then able to configure MergerFS with a much lower 4 GB min free space, which only applies to files since my script creates the directories. Combined with MergerFS's "ep" (existing path) policies, MergerFS now writes the backup files to wherever my backup script created the directories. This allows me to keep each Blu-ray disc directory whole on a single drive, and all my MergerFS branches fill up one-by-one. I'm in backup nirvana!!!
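The directory-first idea above can be sketched in Python. This is not the actual mirror.sh (a shell script not shown here); it is a minimal illustration assuming hypothetical branch paths, listed in the order they should fill, plus the 100 GB directory threshold from the post. File writes are then left to MergerFS; an existing-path create policy such as `category.create=epff` together with `minfreespace=4G` is one assumed way such a setup might be configured, not necessarily the one used here.

```python
import os
import shutil

GB = 1000 ** 3  # decimal gigabytes, matching how drive sizes are usually quoted

# Hypothetical branch paths, listed in the order they should fill up.
BRANCHES = ["/mnt/disks/backup1", "/mnt/disks/backup2", "/mnt/disks/backup3"]
DIR_MIN_FREE = 100 * GB  # threshold for starting a new directory (from the post)


def pick_branch(branches, free_bytes, min_free=DIR_MIN_FREE):
    """Return the first branch, in list order, with at least min_free bytes free.

    free_bytes maps branch path -> free bytes. Returns None if nothing qualifies.
    """
    for branch in branches:
        if free_bytes.get(branch, 0) >= min_free:
            return branch
    return None


def create_backup_dir(rel_path, branches=BRANCHES, min_free=DIR_MIN_FREE):
    """Create rel_path on the first qualifying branch and return its full path."""
    free = {b: shutil.disk_usage(b).free for b in branches if os.path.isdir(b)}
    branch = pick_branch(branches, free, min_free)
    if branch is None:
        raise RuntimeError("no branch has enough free space for a new directory")
    target = os.path.join(branch, rel_path)
    os.makedirs(target, exist_ok=True)
    return target
```

Because the directory exists on only one branch, an existing-path policy has only one place to put the files, which is what keeps each disc directory whole. The fixed branch order is what makes the branches fill one-by-one rather than evenly.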
  5. A year ago I created an easy, affordable backup solution for my Unraid server: essentially just a stack of external USB drives that I mounted with Unassigned Devices and joined together in a BTRFS JBOD-style pool. With 5x 16TB drives, this gave me a single 80TB storage volume. At the time, this solution seemed perfect. I had a backup script that used rsync to copy my files to the single mount point, and I thought that BTRFS filled up each drive one-by-one. Since my Unraid data is basically already a backup of my physical data, having this portable backup volume that could be stored offsite was more than I needed, even without any built-in redundancy. This week, while adding a new 20TB drive to expand this pool to 100TB, I learned I had made several mistakes in my backup solution. First, when adding the new drive I made a few mistakes and ended up corrupting the BTRFS pool. And since my pool had no redundancy, BTRFS prohibits mounting it in RW mode to fix it, so the only option was to start over, recreate the entire pool, and back up the original 80TB of data again. That was painful enough. But in redoing all this, I discovered that BTRFS balances automatically, writing each file to the drive with the most free space. With the nature of the data I'm storing, losing a single drive would now make the entire backup worthless, as I need each directory to remain whole on a single drive and can't lose any files inside each directory. While my BTRFS backup pool is better than nothing, it's way too fragile for me to keep using. While researching solutions, I came across MergerFS and eventually this thread. This sounds like the right type of solution. My core requirements are to plug in my USB drives, mount them as a single filesystem, and run a backup script to copy any new/altered data to my backup pool, with data filling up each drive, one-by-one, before moving on to the next drive.
That way, if I lose a drive, I only lose the data backed up to that one drive, plus any directories that happened to span the transition between drives. Sorry for the long lead-in. Now to my questions: Is the plugin on CA yet? I searched and can't find it, so I'm assuming I have to install it via URL. Can someone help me with the configuration? I read through the MergerFS GitHub page, and there are tons of options, and the examples don't seem to apply to my use case. I'm a bit overwhelmed. I need commands for configuring, mounting, unmounting, and expanding the pool. Thanks! -Paul
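The difference between the two placement behaviours described above can be shown with a small simulation. It models allocation per file, as the post describes it; real btrfs allocates space in larger chunks rather than per file, so treat this as a simplification. The drive and directory sizes are made-up illustrative numbers (roughly 45-unit directories, echoing the 45 GB Blu-ray figure).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Drive:
    name: str
    free: int                               # free space in arbitrary units (think GB)
    dirs: set = field(default_factory=set)  # directories with at least one file here


def place_most_free(drives, directory, size):
    """Write one file to whichever drive currently has the most free space.

    No capacity check: the small demo below never fills the pool.
    """
    target = max(drives, key=lambda d: d.free)
    target.free -= size
    target.dirs.add(directory)


def place_fill_first(drives, directory, size):
    """Write one file to the first drive, in order, that still has room for it."""
    for d in drives:
        if d.free >= size:
            d.free -= size
            d.dirs.add(directory)
            return
    raise RuntimeError("pool is full")


def split_directories(drives):
    """Directories whose files ended up on more than one drive."""
    counts = Counter(name for d in drives for name in d.dirs)
    return {name for name, n in counts.items() if n > 1}


def simulate(policy, n_drives=3, drive_size=100, n_dirs=4, files_per_dir=5, file_size=9):
    drives = [Drive(f"disk{i + 1}", drive_size) for i in range(n_drives)]
    for k in range(n_dirs):
        for _ in range(files_per_dir):
            policy(drives, f"dir{k}", file_size)
    return split_directories(drives)
```

With these numbers, `simulate(place_most_free)` splits every directory (`{"dir0", "dir1", "dir2", "dir3"}`), so losing any one drive damages all of them, while `simulate(place_fill_first)` splits only `{"dir2"}`, the directory that happens to span the transition between drives.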
  6. I did see that, but then you appended an edit and I thought you were changing your answer, hence my confusion. I currently have 78.6 TB of data backed up in this pool, as-is. If I follow those steps, is there any risk I could lose that data and have to repopulate the backup? It was over a week of copying, and I don't want to have to do that again. If I'm understanding you correctly, I can remove the 5 disks, delete the history, then insert the disks 1 at a time and rename each to the same pool name, delete my history again just to make sure, and then the next time I bring in all 5 drives at the same time, they will appear as a single pool. Does that sound right?
  7. So does that mean this isn't possible? Sorry, I got confused. Since the mount point has to be the disk label, and it won't let me rename to an existing value, that makes it impossible to do the solution you offered, right?
  8. I just tried changing the Mount Point to all be the same, and it won't let me. It reports "Fail". I think it is because it's changing the disk label and the mount point at the same time. Is there a trick to doing this? Errors in the log: Apr 10 17:24:51 Tower unassigned.devices: Error: Device '/dev/sdx1' mount point 'Frankenstore' - name is reserved, used in the array or by an unassigned device.
  9. Hey dlandon. Thanks so much for this awesome tool. I've been using it for years, and it's been a big help for certain tasks. One of the things I occasionally use it for is a removable btrfs JBOD pool of 5 USB HDDs. It's so easy to plug it in, mount it, run my rsync backup job, then put it back in offline storage when I'm done. I love that I don't have to stop/start the array to use it, that I don't get warnings when I unplug it, and that I don't get any Fix Common Problems warnings for duplicated data on a cache disk. I was recently sharing my solution with some fellow users, and I discovered that the tutorial for creating the btrfs drive pool for UD had been removed. I reached out to JorgeB and he restored that post, so now I have those instructions again. While working with these other users on how to set up this hot-pluggable backup pool, and comparing with how it works using stock Unraid pools, a few things cropped up that I wanted to ask you about. After all, UD is the best tool for creating hot-pluggable drive pools that are normally stored offline, but there are a couple of things Unraid pools do a bit better. First, after mounting the 1st pool device, the buttons to mount the other devices remain enabled. One of my fellow users got confused, clicked mount on all devices, and then saw the pool was mounted multiple times. Would it be possible both to make it more obvious that all the drives in the pool are now mounted, and to disable/hide the mount button on the other drives? Currently the only indication is the partition size on the mounted drive. Perhaps the other drives that got mounted in the pool could even be indented beneath the parent, to better indicate what is going on.
Second, would it be possible to add a GUI feature for adding a partition to an existing pool? I believe Unraid pools let you do this, but in UD you have to go out to the command line and run the btrfs dev add... command to add the partition to a mount point. I know it's a pretty easy command line, but some users are very uncomfortable with the command line and prefer the GUI approach. I know most people seem to think that Unraid pools are the only game in town now; even your own documentation recommends them. But for hot-pluggable, removable drive pools, UD is so much better, and I hope you continue to support and enhance this capability. Thanks!!! Paul
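A GUI "add partition to pool" button would presumably wrap a single command. The post elides the exact command it uses; the sketch below only shows the general btrfs-progs form, `btrfs device add [-f] <device> <mounted path>`, wrapped in Python with a dry run by default. The partition and mount point in the usage line are hypothetical examples (a UD mount under `/mnt/disks/` is assumed).

```python
import shlex
import subprocess


def btrfs_add_device_cmd(partition, mount_point, force=False):
    """Build the btrfs-progs command that adds `partition` to the filesystem mounted at `mount_point`."""
    cmd = ["btrfs", "device", "add"]
    if force:
        cmd.append("-f")  # overwrite an existing filesystem signature on the partition
    return cmd + [partition, mount_point]


def add_device(partition, mount_point, force=False, dry_run=True):
    """Print the command (dry run, the default) or actually execute it."""
    cmd = btrfs_add_device_cmd(partition, mount_point, force)
    if dry_run:
        print(shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)
```

For example, `add_device("/dev/sdx1", "/mnt/disks/Frankenstore")` only prints the command. After a real run, `btrfs filesystem show <mount point>` should list the new device as part of the filesystem.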
  10. Awesome, thank you Johnnie (should I still call you Johnnie, or Jorge, or something else?), that's exactly what I needed. I was surprisingly close in my recreation of the steps based upon my research, but was full of doubt. You're extremely helpful as always. 😊 Another user did a test and discovered he was able to mount a pool created in Unraid using UD, no big surprise I guess since these are just standard btrfs pools. So for some users it might be easier to create the pool using Unraid, remove it, delete the definition, and then use UD from then on for hot-plugging. I would definitely use the Unraid pools feature if it more gracefully handled hot-pluggable backup pools, and didn't require the stop/start. I'm not complaining, though, since UD does this extremely well.
  11. Hey Johnnie/@JorgeB, I could use some help on this. Side note: your new username and logo had me all confused; I couldn't figure out how you seemed to have been here for years/decades, yet I didn't recognize the name. I finally figured out your provenance, though I'm still baffled by the username change. Anyway, on to my issue. I created a portable backup drive pool, as described above, with Unassigned Devices back when I was running 6.8, using the directions you linked to above. I plug it in 2-4 times a year and do a backup, and it's fantastic. Those instructions (which I think you wrote) have since been deleted, since the preferred way is to use the multiple drive pools feature in 6.9. But the functionality in 6.9 is not the same. If you create a drive pool and then unplug it, Unraid is unhappy about the missing drives. You can make the warnings go away if you delete the pool, but then you have to make sure you recreate the pool with all the drives back in the correct order before doing your next incremental. You also have to stop/start the array to make any changes to the pool. Unassigned Devices did this particular task so much better: no warnings, no need to delete the config, just plug it in and mount it, and no need to stop the array. While I can understand that the preferred method is to use Unraid for multiple permanent drive pools, I don't understand why the documentation for doing it with UD was deleted, as that still serves a niche. I'm trying to help some other users get up and running with the solution I'm using, and since I can't find the documentation I can't fully help them. I think there were some command lines I used when setting up the btrfs pool as JBOD, possibly related to formatting, but I don't recall. I also need to expand my UD backup drive pool soon; I almost ran out of space on my last backup, so I need a 6th drive, and I'm worried I won't be able to do this correctly without the instructions.
Even the UD support thread points to the now deleted instructions, and the internet archive doesn't have any successful copies of the FAQ. Is this something you can help with, or point me to someone who can? Thanks! Paul
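For the JBOD part of the setup, one common approach (an assumption here, not a reconstruction of the deleted instructions) is to convert the pool's data chunks to the btrfs `single` profile with `btrfs balance start -dconvert=single -mconvert=<profile> <path>`, then confirm the result with `btrfs filesystem df <path>`. The sketch builds that command and parses the `df` output; the raid1 metadata default is likewise an assumption, and the mount path is whatever UD uses for the pool.

```python
import re


def balance_convert_cmd(mount_point, data="single", metadata="raid1"):
    """btrfs-progs command that rewrites existing chunks into the given profiles.

    data="single" gives JBOD-like placement; the raid1 metadata default is an
    assumption, not necessarily what the original UD instructions used.
    """
    return ["btrfs", "balance", "start",
            f"-dconvert={data}", f"-mconvert={metadata}", mount_point]


def parse_profiles(df_output):
    """Map block-group type -> profile from `btrfs filesystem df` output.

    Lines look like "Data, single: total=..., used=...".
    """
    profiles = {}
    for line in df_output.splitlines():
        m = re.match(r"\s*(\w+), (\w+):", line)
        if m:
            profiles[m.group(1)] = m.group(2)
    return profiles
```

After adding a 6th drive, checking that `parse_profiles(...)["Data"]` still reads `single` is a quick sanity check that the pool is still laid out as JBOD. Note that raid1 metadata needs at least two devices in the pool.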
  12. Thanks JorgeB. I've followed your advice and ripped out the Highpoint 2760A. I installed a couple of Dell H310s, combined with 8 SATA ports on my motherboard, to get back to 24 ports. So far it's been smooth sailing, but my Call Trace problems don't usually crop up for a couple of weeks, so I'm not in the clear yet. Fingers crossed.
  13. I've been searching the forum, trying to see if any other users have the same issue. I do see plenty of call trace reports, but so far none have matched mine. My log just keeps repeating the same info over and over. What I posted above was just the errors, here's the full detail for a complete error segment:
    Apr 1 13:50:20 Tower kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
    Apr 1 13:50:20 Tower kernel: rcu: 10-....: (2 GPs behind) idle=61e/1/0x4000000000000002 softirq=118394498/118394499 fqs=10418875
    Apr 1 13:50:20 Tower kernel: (detected by 8, t=42541182 jiffies, g=291127741, q=55535900)
    Apr 1 13:50:20 Tower kernel: Sending NMI from CPU 8 to CPUs 10:
    Apr 1 13:50:20 Tower kernel: NMI backtrace for cpu 10
    Apr 1 13:50:20 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
    Apr 1 13:50:20 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Professional Gaming, BIOS P4.80 07/18/2018
    Apr 1 13:50:20 Tower kernel: RIP: 0010:mvs_slot_complete+0x31/0x45f [mvsas]
    Apr 1 13:50:20 Tower kernel: Code: 00 00 41 56 41 55 41 54 55 53 89 c3 48 6b cb 58 48 83 ec 18 89 44 24 10 83 c8 ff 89 74 24 14 4c 8d 34 0f 4d 8b be 08 fd 00 00 <4d> 85 ff 0f 84 16 04 00 00 49 83 bf e8 00 00 00 00 0f 84 08 04 00
    Apr 1 13:50:20 Tower kernel: RSP: 0018:ffffc900003c0e78 EFLAGS: 00000286
    Apr 1 13:50:20 Tower kernel: RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
    Apr 1 13:50:20 Tower kernel: RDX: 0000000000000000 RSI: 0000000000010000 RDI: ffff888138a80000
    Apr 1 13:50:20 Tower kernel: RBP: ffff888138a80000 R08: 0000000000000001 R09: ffffffffa02eda65
    Apr 1 13:50:20 Tower kernel: R10: 00000000d007f000 R11: ffff8881049a9800 R12: 0000000000000000
    Apr 1 13:50:20 Tower kernel: R13: 0000000000000000 R14: ffff888138a80000 R15: 0000000000000000
    Apr 1 13:50:20 Tower kernel: FS: 0000000000000000(0000) GS:ffff888fdee80000(0000) knlGS:0000000000000000
    Apr 1 13:50:20 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Apr 1 13:50:20 Tower kernel: CR2: 00000000002a925a CR3: 0000000281a36000 CR4: 00000000003506e0
    Apr 1 13:50:20 Tower kernel: Call Trace:
    Apr 1 13:50:20 Tower kernel: <IRQ>
    Apr 1 13:50:20 Tower kernel: mvs_int_rx+0x85/0xf1 [mvsas]
    Apr 1 13:50:20 Tower kernel: mvs_int_full+0x1e/0xa4 [mvsas]
    Apr 1 13:50:20 Tower kernel: mvs_94xx_isr+0x4d/0x60 [mvsas]
    Apr 1 13:50:20 Tower kernel: mvs_tasklet+0x87/0xa8 [mvsas]
    Apr 1 13:50:20 Tower kernel: tasklet_action_common.isra.0+0x66/0xa3
    Apr 1 13:50:20 Tower kernel: __do_softirq+0xc4/0x1c2
    Apr 1 13:50:20 Tower kernel: asm_call_irq_on_stack+0x12/0x20
    Apr 1 13:50:20 Tower kernel: </IRQ>
    Apr 1 13:50:20 Tower kernel: do_softirq_own_stack+0x2c/0x39
    Apr 1 13:50:20 Tower kernel: __irq_exit_rcu+0x45/0x80
    Apr 1 13:50:20 Tower kernel: common_interrupt+0x119/0x12e
    Apr 1 13:50:20 Tower kernel: asm_common_interrupt+0x1e/0x40
    Apr 1 13:50:20 Tower kernel: RIP: 0010:arch_local_irq_enable+0x7/0x8
    Apr 1 13:50:20 Tower kernel: Code: 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 0f 1f 44 00 00 c3 fa 66 0f 1f 44 00 00 c3 fb 66 0f 1f 44 00 00 <c3> 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31 d2 39 c5
    Apr 1 13:50:20 Tower kernel: RSP: 0018:ffffc9000016fea0 EFLAGS: 00000246
    Apr 1 13:50:20 Tower kernel: RAX: ffff888fdeea2380 RBX: 0000000000000002 RCX: 000000000000001f
    Apr 1 13:50:20 Tower kernel: RDX: 0000000000000000 RSI: 00000000238d7f23 RDI: 0000000000000000
    Apr 1 13:50:20 Tower kernel: RBP: ffff888105d0d800 R08: 00028e38c0a3fe38 R09: 00028e3ab9ddf5c0
    Apr 1 13:50:20 Tower kernel: R10: 0000000000000045 R11: 071c71c71c71c71c R12: 00028e38c0a3fe38
    Apr 1 13:50:20 Tower kernel: R13: ffffffff820c8c40 R14: 0000000000000002 R15: 0000000000000000
    Apr 1 13:50:20 Tower kernel: cpuidle_enter_state+0x101/0x1c4
    Apr 1 13:50:20 Tower kernel: cpuidle_enter+0x25/0x31
    Apr 1 13:50:20 Tower kernel: do_idle+0x1a6/0x214
    Apr 1 13:50:20 Tower kernel: cpu_startup_entry+0x18/0x1a
    Apr 1 13:50:20 Tower kernel: secondary_startup_64_no_verify+0xb0/0xbb
Comparing with others, and taking a closer look at the output in my log, I'm noticing a few too many [mvsas]-related entries. That's the driver for my Marvell-based Highpoint 2760A 24-port SAS controller. For years Fix Common Problems has been warning me about my Marvell-based controller, but I've ignored those warnings, as I've had no issues with it since buying it in 2013: almost 9 years of trouble-free operation, all the way through 6.8.3. Maybe I'm jumping to conclusions and the issue is something else. Can anyone tell?
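A quick way to back up the "too many [mvsas] entries" observation is to tally stack frames by kernel module. The sketch below is a rough heuristic, not a diagnosis: it assumes frames look like the ones in the excerpt above (`function+0xOFF/0xLEN [module]`, optionally prefixed by `RIP: 0010:`), and counts frames without a `[module]` tag as `core`.

```python
import re
from collections import Counter

# Matches a stack frame such as "kernel: mvs_int_rx+0x85/0xf1 [mvsas]",
# including the "kernel: RIP: 0010:func+0x../0x.. [mod]" form.
FRAME = re.compile(
    r"kernel: (?:RIP: \w+:)?\s*([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def module_counts(log_text):
    """Count stack frames per kernel module; untagged frames count as 'core'."""
    counts = Counter()
    for m in FRAME.finditer(log_text):
        counts[m.group(2) or "core"] += 1
    return counts
```

Run against the full segment above, every module-tagged frame belongs to `mvsas`, which is consistent with the suspicion that the Marvell driver is involved.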
  14. This still appears to be an issue for me, and I need help to move forward. Quick recap: last year I upgraded to 6.9.2 and had issues with Seagate IronWolf (actually Exos) drives, plus the issue described here. I thought it was all related. I ended up rolling back to 6.8.3, and the issues went away. A little over a month ago, 6.8.3 stopped working correctly for me, I believe due to an incompatible Unassigned Devices update. About a week ago I decided to apply the Seagate drive fix (disabling EPC) and try upgrading to 6.9.2 again. I thought everything was successful: multiple spin-ups/spin-downs, a record-fast parity check, and a perfectly working GUI, Dockers and VMs. I thought I was in the clear. Which brings us to today. Being the 1st of the month, the parity check kicked off at 2am. When I woke up this morning, I found that parity check progress was stalled at 0.1% after 6+ hours, and several hours later I can confirm it's not moving. In general, the GUI feels responsive and lets me browse around, but I noticed that the Dashboard presents no data, the drive temps don't appear to be updating, and the CPU/MB temps and fan speeds are wrong and frozen. I connected to the terminal and ran mdcmd status to see if the parity check was actually running, but mdResyncPos is frozen at 9283712. Best I can tell, Unraid is frozen, even though the GUI isn't hung. First things first, I decided to run diagnostics. An hour later, it still reads "Starting diagnostics collection...". JorgeB is right.
I checked Unraid's System Log, and it is full of Call Trace errors:
    Apr 1 10:53:07 Tower nginx: 2022/04/01 10:53:07 [error] 9804#9804: *2157788 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.1.218, server: , request: "GET /Dashboard HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "tower", referrer: "http://tower/Main"
    Apr 1 10:53:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
    Apr 1 10:53:19 Tower kernel: Call Trace:
    Apr 1 10:56:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
    Apr 1 10:56:19 Tower kernel: Call Trace:
    Apr 1 10:59:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
    Apr 1 10:59:19 Tower kernel: Call Trace:
    Apr 1 11:02:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
    Apr 1 11:02:19 Tower kernel: Call Trace:
    Apr 1 11:05:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
    Apr 1 11:05:19 Tower kernel: Call Trace:
    Apr 1 11:08:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
    Apr 1 11:08:19 Tower kernel: Call Trace:
Since running Diagnostics didn't work, I'm not sure what my next step should be. Do I need to gather more info, or is the issue already confirmed as a Ryzen on Linux Kernel related issue? Are there any solutions?
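The "is the parity check actually moving?" test above can be automated by taking two `mdcmd status` snapshots a few minutes apart and comparing `mdResyncPos`. This is a sketch assuming `mdcmd` is on the PATH (Unraid only) and prints `key=value` lines that include `mdResyncPos`, as the reading quoted above implies.

```python
import subprocess
import time


def parse_status(text):
    """Parse key=value lines, as printed by `mdcmd status`, into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            status[key.strip()] = value.strip()
    return status


def resync_advanced(before, after):
    """True if mdResyncPos moved forward between two parsed snapshots."""
    return int(after.get("mdResyncPos", "0")) > int(before.get("mdResyncPos", "0"))


def read_status(timeout_s=60):
    """Run `mdcmd status` and parse it.

    timeout_s limits how long we wait; a process blocked inside the kernel
    may still fail to exit even after being killed.
    """
    out = subprocess.run(["mdcmd", "status"], capture_output=True, text=True,
                         check=True, timeout=timeout_s)
    return parse_status(out.stdout)


def check_progress(interval_s=300):
    """Take two snapshots interval_s seconds apart and report whether the resync moved."""
    first = read_status()
    time.sleep(interval_s)
    return resync_advanced(first, read_status())
```

A `False` result across several intervals matches the symptom described here (mdResyncPos stuck at 9283712). A hang or timeout in `read_status` itself is also a useful signal that the system, not just the GUI, is wedged.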
  15. Big thanks to @Cessquill for the write-up, and to all the other contributors to this thread! I just successfully fixed this on my server. I finally took the plunge after waiting half a year, as I started having major compatibility issues with Unassigned Devices on 6.8.3 and couldn't put off upgrading any longer. I followed all the steps Cessquill outlined, and disabled EPC on my four Seagate ST8000NM0055 drives. I used the latest SeaChest Utility files from Seagate's website, downloaded yesterday. They appear to have changed again, and this time I used the files from path: \Linux\Non-RAID\centos-7-x86_64\ Oddly, unlike @optiman's experience with his ST8000NM0055s, mine all had Low Current Spinup disabled, so I didn't mess with that. I'm also running on a Marvell-based controller, which is probably a useful data point showing that this issue doesn't just affect LSI controllers. The last time I upgraded to 6.9.x I had major issues and could not get beyond 66GB of my parity rebuild, which is why I rolled back to 6.8.3. After applying the EPC fix and upgrading to 6.9.2, I've done multiple drive spindowns/spinups with no issues, and a full parity check that completed in record time. It's perhaps too early to celebrate, but it does seem like the issue is resolved on my setup. I also have two pre-cleared ST8000VN0022s that are not in my array. They had both EPC and Low Current Spinup (Ultra Low) enabled. I decided to leave Low Current Spinup alone, but went ahead and disabled EPC on both of these drives. They will migrate into my array in the coming months, so I don't know yet how they'll behave. I don't even know if I would have had issues with them, but since other users here mentioned them I decided to play it safe. I also used SeaChest_Info to examine my other non-Seagate drives (surprisingly, it works), and found that EPC exists and is enabled on my HGST_HUH728080ALE drives, but those don't cause any problems.
I kinda hate that these Seagate Exos 8TB drives are such a good value, as they've become my chosen upgrade path, so now I'll have to remember to disable EPC on all new drives going forward. While I do like the HGST drives better, the price premium is just too much for a server this large. Thanks again!!! -Paul
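Since disabling EPC now has to be repeated on every new drive, a small wrapper can keep the per-drive commands consistent. This is a hedged sketch: the binary name is a placeholder (the real SeaChest file names carry a version/platform suffix that changes between releases), and the `-d <device>` and `--EPCfeature disable` flags are assumptions to confirm against the tool's own `--help` before running anything. The `/dev/sgN` device names are hypothetical examples.

```python
import shlex
import subprocess

# Placeholder: the real binary name includes a version/platform suffix that
# changes between SeaChest releases.
POWER_CONTROL = "./SeaChest_PowerControl"


def epc_disable_cmd(sg_device, tool=POWER_CONTROL):
    """Command that disables the EPC feature on one drive (flag names assumed)."""
    return [tool, "-d", sg_device, "--EPCfeature", "disable"]


def disable_epc(sg_devices, dry_run=True):
    """Print (default) or run the EPC-disable command for each listed drive."""
    cmds = [epc_disable_cmd(dev) for dev in sg_devices]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

Keeping the dry run as the default means the command list can be reviewed first, and only drives identified as affected are included. Low Current Spinup is deliberately not touched here, matching the choice described above.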
  16. If this is true, then why does the problem only appear after upgrading to Unraid 6.9.x? I'd been running 6.8.3 for a long time without issues. Last spring I upgraded to 6.9.2 and bam! The issues hit immediately. I never did the EPC fix. The problem was incessant on 6.9.x, and I didn't want to risk losing data playing around with drive settings, as I had 2 drives out and was already risking data loss, so I rolled back to 6.8.3 and the problem went away. Half a year later, it's been smooth sailing on 6.8.3. I stayed on 6.8.3 because it works and there wasn't anything in the 6.9.x branch I needed. Even the very first post here mentions that the problems started with 6.9.0, which 100% matches my experience. Perhaps what you are saying is that the problem lies in the Linux kernel or one of the various drivers that were upgraded in the 6.9.x releases, and not in any of LimeTech's Unraid code. That may be true, though I'm not sure I've seen it clearly detailed in this thread exactly where the problem lies, so I would appreciate pointers to any additional information I may have missed. It certainly seems reasonable to me that since a change in 6.9.x broke this, another change in 6.10.x could fix it, so I'm not inclined to give up hope entirely. And there have been many times LimeTech has chased down bugs in other components on behalf of their users; this issue has been reported to them in more than one ticket, so they should be aware of it, though disappointingly I've never seen them weigh in on the topic.
  17. I can confirm that ST8000NM0055 drives are most definitely affected by this issue. This bit me hard when I upgraded to v6.9.2 back in April. I had to roll back to 6.8.3 to recover from a dual-drive "failure" and inability to rebuild on 6.9.2, and never attempted any of the fixes posted here. I felt extremely lucky to escape without losing data, and I'm still running 6.8.3. optiman, glad to read this worked for you. Since it has been a couple weeks, is your system still okay? I'm starting to feel a little trapped on 6.8.3, so I'll probably have to apply this fix. Since we both have ST8000NM0055 drives, your results matter most to me. I was hopeful that this was a bug in 6.9.x that would be fixed in 6.10, and that I wouldn't need to do the drive fix. Came here to see if anyone had tested this on 6.10 without applying these fixes, but no dice. Paul
  18. Not the restore Unraid version feature (which I used), but rather restoring the flash drive from a backup. I had to manually copy some config files from the flash drive backup to get 6.8.3 working correctly, and it took me a while to figure out which files needed restoring. Some type of automation here would have been nice. It would be really cool if it were integrated into the restore Unraid version feature: it could prompt to optionally restore certain files from an existing flash drive backup. That could certainly be the issue. But there's no way I'm going back to 6.9.2 on my production server to gather diags once it fails. I'm still 4 hours away from a full recovery, and I'm not into S&M. I know it's my personal perspective, but I feel that if 6.9.x has issues as bad as this, it shouldn't be considered "stable". I wasn't gearing up for a testing run; I was upgrading my production server to a "stable" dot-dot-two release, with a reasonable expectation that the kinks had been worked out, and with no awareness that I could be signing up for data loss. I was completely unprepared to deal with these issues, and my main goal was simply surviving.
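The "which files needed restoring?" hunt can be narrowed down by listing files that differ between the current flash config and the backup copy. This is a minimal sketch using Python's `filecmp`; it assumes the flash drive's config lives at `/boot/config` and uses a hypothetical backup location. It only produces a list of candidates to review, not a decision about what to copy back.

```python
import filecmp
import os


def differing_files(current_dir, backup_dir, prefix=""):
    """List paths (relative to the tree roots) whose contents differ between
    current_dir and backup_dir, or that exist only in the backup."""
    cmp = filecmp.dircmp(current_dir, backup_dir)
    found = [os.path.join(prefix, name) for name in cmp.diff_files + cmp.right_only]
    for sub in cmp.common_dirs:
        found += differing_files(os.path.join(current_dir, sub),
                                 os.path.join(backup_dir, sub),
                                 os.path.join(prefix, sub))
    return sorted(found)
```

Usage, with a hypothetical backup path: `differing_files("/boot/config", "/mnt/user/backups/flash/config")`. A restore prompt like the one suggested above could present exactly this kind of list and let the user tick which files to bring back.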
  19. Cross-posting here for greater user awareness since this was a major issue - on 6.9.2 I was unable to perform a dual-drive data rebuild, and had to roll-back to 6.8.3. I know a dual-drive rebuild is pretty rare, and don't know if it gets sufficiently tested in pre-release stages. Wanted to make sure that users know that, at least on my hardware config, this is borked on 6.9.2. Also, it seems the infamous Seagate Ironwolf drive disablement issue may have affected my server, as both of my 8TB Ironwolf drives were disabled by Unraid 6.9.2. I got incredibly lucky that I only had two Ironwolfs, so data rebuild was an option. If I had 3 of those, recent data loss would likely have resulted. Paul
  20. As a long-time Unraid user (over a decade now, and loving it!), I rarely have issues (glossing right over those Ryzen teething issues). It is with that perspective that I want to report that there are major issues with 6.9.2. I'd been hanging on to 6.8.3, avoiding the 6.9.x series as the bug reports seemed scary. I read up on 6.9.2 and finally decided that with two dot-dot patches it was time to try it. My main concern was that my two 8 TB Seagate Ironwolf drives might experience this issue: I had a series of unfortunate events that make it extremely difficult to figure out what transpired, and in what order, so I'll just lay it all out. I'd been running 6.9.2 for almost a week, and I felt I was in the clear. I hadn't noticed any drives going offline. Two nights ago (4/27), my power strip somehow turned off - either circuit protection kicked in, or possibly a dog stepped on the power button. Regardless, I didn't discover this before my UPS was depleted and the server shut itself down. Yesterday, after getting the server started up again, I was surprised to see my two Ironwolf drives had red X's next to them, indicating they were disabled. I troubleshot this for a while, finding nothing in the logs, so it's possible that a Mover run I kicked off manually yesterday (which would have been writing to these two drives) caused them to go offline on spin-up (as described in the issue linked above), but that the subsequent power failure caused me to lose the logs of this event. [NOTE: I've since discovered that the automatic powerdown from the UPS failure was forced, which triggered diagnostics, so those logs weren't lost after all - diagnostics attached!!!] I was concerned that the Mover task had written the latest data only to the emulated array, so a rebuild seemed the right path forward to ensure I didn't lose any data.
I had to jump through hoops to get Unraid to rebuild these two drives from parity - apparently you have to un-select them, start/stop the array, then re-select them before Unraid will offer the option to rebuild. Just a critique from a long-time user: this was not obvious, and it seems like there should be a button to force a drive back into the array without all these obstacles. Anyway, now to the real troubles. Luckily, I only have two Ironwolf drives, and with my dual parity (thanks LimeTech!!!), this was a recoverable situation. The rebuild only made it to about 46 GB before stopping. Unraid appeared to think the rebuild was still progressing, but it was obviously stalled. I quickly scanned through the log, finding no errors but lots of warnings related to the swapper being tainted. At this point, I discovered that even though the GUI was responsive (nice work GUI gang!), the underlying system was pretty much hung. I couldn't pause or cancel the data rebuild, and I couldn't power down or reboot, either through the GUI or through the command line. Issuing a command in the terminal would hang the terminal. Through the console I issued a powerdown, and after a while it said it was doing it forcefully, but it hung on collecting diagnostics. I finally resorted to a 10-second power button press to force the server off (and diagnostics are missing). I decided that the issue could be those two Ironwolf drives, and since I had two brand new Exos drives of the same capacity, I swapped those in and started the data rebuild with them instead. I tried this twice, and the rebuild never made it further than about 1% (an ominous 66.6 GB was the max rebuilt). At this point, I really didn't know whether I had an actual hardware failure (the power strip issue was still in my thoughts) or a software issue, but with a dual-drive failure and a fully unprotected 87 TB array, I felt more pressure to quickly resolve the issue than to gather more diagnostics (sorry not sorry).
So I rolled back to 6.8.3 (so glad I made that flash backup; I really wish there was a restore function) and started the data rebuild again last night. This morning, the rebuild is still running great after 11 hours. It's at 63% complete, and should wrap up in about 6.5 hours based on history.

So something changed between 6.8.3 and 6.9.2 that is causing this specific scenario to fail. I know a dual-drive rebuild is a pretty rare event, and I don't know if it has received adequate testing on 6.9.x. The Seagate Ironwolf drive issue is bad enough, but that's a known issue with multiple topics and possible workarounds. The complete inability to rebuild data to two drives simultaneously seems like a new and very big issue, and it persisted even after removing the Ironwolf drives.

I will tentatively offer that I may have done a single drive rebuild, upgrading a drive from 3 TB to an 8 TB Ironwolf, on 6.9.2. Honestly, I can't recall now if I did this before or after upgrading to 6.9.2, but I'm pretty sure it was after. So on my system, I believe I was able to perform a single drive rebuild, and only the dual-drive rebuild was failing.

I know we always get in trouble for not including diagnostics, so I am including a few files:

The 20210427-2133 diagnostics are from the forced powerdown two nights ago, on 6.9.2, when the UPS ran out of juice, and before I discovered that the two Ironwolf drives were disabled. Note that they might already be disabled in these diags; I have no idea what to look for in there.
The 20210420-1613 diagnostics are from 6.8.3, the day before I upgraded to 6.9.2. I think I hit the diagnostics button by accident, but figured it wouldn't hurt to include them.
The 20210429-0923 diagnostics are from right now, after downgrading to 6.8.3, with the rebuild still in progress.

Paul

tower-diagnostics-20210427-2133.zip
tower-diagnostics-20210429-0923.zip
tower-diagnostics-20210420-1613.zip
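The 6.5-hour figure quoted above is "based on history", but it can also be sanity-checked with a simple linear extrapolation from 11 hours at 63%. This is only a rough sketch: it assumes a constant average rebuild rate, which real rebuilds rarely hold (throughput typically varies across the disk surface), and `remaining_hours` is a hypothetical helper, not anything Unraid exposes.

```python
def remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Estimate time left, ASSUMING a constant average rebuild rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    projected_total = elapsed_hours / fraction_done  # total duration at the current average rate
    return projected_total - elapsed_hours

eta = remaining_hours(11, 0.63)
print(f"{eta:.1f} hours left")  # 6.5 hours left
```

The linear estimate (about 6.46 hours) lands right on the historical figure, so the 6.8.3 rebuild looked to be on a normal pace.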
  21. Everything these fine gents wrote is correct. I stopped development of UTT after Unraid v6.8 came out. There was some chatter that even v6.8 had some tunables that affected performance, and that what LimeTech was doing didn't work perfectly on all hardware. But as you can see, it has been quiet here for well over a year, so I'm guessing the issues weren't significant enough for users to chase solutions. And perhaps LT did resolve some of the earlier v6.8 performance issues that a few users experienced. Ultimately, my perspective is that beginning with v6.8, LT was actively working on internalizing performance tuning, and UTT is no longer needed. Additionally, the original major performance issue I experienced on my hardware, which led me to create this tool, has been gone since v6.8. So even if there are performance issues affecting some hardware configs, I lack the motivation and the time to troubleshoot them by revamping this code. I willingly pass the mantle on to anyone else who needs to refine the code for newer Unraid versions. My shift has ended.
  22. Thanks Johnnie, this is exactly the info I needed. I have created a "Frankenstore" backup solution (pic below) using five 16 TB USB drives. These are cheap drives at ~$310 each, and even with 3D printing an exoskeleton for portability, wiring up a single power supply, and using a 7-port USB 3.0 hub with toggle switches, my total cost for an 80 TB backup solution is under $1700. The final solution is extremely portable, making it easy to take offsite for security. The 10 A 12 V power supply could easily support 6 drives, and possibly even 7, so I have a bit of room to grow to 96 or even 112 TB of backup capacity in the future, though for the next year 80 TB is plenty. The toggle switches on the USB hub are really cool, as they allow me to control the power-up order and get the same disk ID each time, though I'm not sure if that matters with the BTRFS pool. Of course, at such a low cost, I am expecting drive failures. Since this is primarily just an offline backup for my main array, I'm cool with taking that risk.

      When I read through your linked instructions, you talk about replacing a drive, but not specifically replacing a failed drive. Is the process the same, or will it be different? I'm assuming that with a JBOD, I only lose the data on the failed drive, plus any files that were split across the failed drive and another drive. I don't suppose there is a way to prevent splitting files across drives in a pool, is there?

      Also, with Unraid v6.9 in the wings, is using UD still the right way to go? I'm running 6.8.3, and I do not run betas or even RCs on my production server. Do you know if my UD BTRFS JBOD pool will migrate to v6.9's new multi-pool functionality, or would I have to recreate it from scratch and redo my backup? Thanks! Paul
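The capacity and power figures in this post can be tabulated with a few lines of arithmetic. This is a minimal sketch, assuming the whole 10 A at 12 V budget is shared evenly by the drives and nothing else draws from that supply; the constant and function names are hypothetical. The per-drive current it prints is only a ceiling to compare against each drive's datasheet, where the 12 V spin-up current is usually well above idle draw.

```python
PSU_VOLTS = 12.0     # supply voltage from the post
PSU_AMPS = 10.0      # supply current rating from the post
DRIVE_TB = 16        # capacity of each USB drive
DRIVE_COST_USD = 310 # approximate price per drive

def pool_capacity_tb(n_drives: int) -> int:
    """Raw JBOD capacity: sum of drive sizes (no redundancy)."""
    return n_drives * DRIVE_TB

def amps_per_drive(n_drives: int) -> float:
    """Even split of the supply's current rating (an ASSUMPTION, not a measured draw)."""
    return PSU_AMPS / n_drives

print(f"Supply budget: {PSU_VOLTS * PSU_AMPS:.0f} W")
print(f"Five drives cost about ${5 * DRIVE_COST_USD}")
for n in (5, 6, 7):
    print(f"{n} drives: {pool_capacity_tb(n)} TB, {amps_per_drive(n):.2f} A each")
```

With 7 drives the even share drops to about 1.43 A each, which is why 6 looks comfortable and 7 only "possibly" works, depending on the real spin-up figures.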
  23. I don't know that DPI really matters for monitors like it does for printing. Most monitors are 120 DPI or below. What is probably more important is simply having a physical size large enough to cover a high-resolution 4K monitor. So if every banner image was sized to 3840 x 200, that would be enough resolution to cover 4K widths, and it would easily scale down to lower resolutions, e.g. 1920 x 100 for a standard Full HD monitor. I don't know if there is an official banner height, but when I investigated it a while back I came up with 91 pixels high, which seems a little odd. Perhaps it is correct, I don't know. If it is 91 pixels high, then we might want to target 3840 x 182 as a banner size and scale down from there. But then again, that might cause problems at even lower resolutions, as narrower windows would zoom in further, and keeping the aspect ratio locked would cause the image to run out of pixels height-wise. Perhaps we have to plan for a minimum width, e.g. 960 x 91, which would scale up to 1920 x 182 and 3840 x 364. If every banner was 3840 x 364, that is still a reasonable 1.4 megapixel image. EDITED to correct some crazy typos. I'm actually decent at math... no really.

      I'm sure you're right, but I was thinking it might be nice to have some artist community commentary on the requirements before making a feature request. Perhaps other banner creators have some unique needs that I haven't thought about.
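The banner sizes above all come from scaling with a locked aspect ratio. Here is a small sketch of that arithmetic, assuming the tentative 960 x 91 minimum from the post (not an official Unraid banner spec); `scale_to_width` is a hypothetical helper, not part of any Unraid API.

```python
def scale_to_width(width: int, height: int, target_width: int) -> tuple[int, int]:
    """Scale (width, height) to target_width, keeping the aspect ratio locked."""
    return target_width, round(height * target_width / width)

def megapixels(width: int, height: int) -> float:
    return width * height / 1_000_000

# Tentative minimum banner size from the post (an assumption, not an official spec).
BASE = (960, 91)

for target in (960, 1920, 3840):
    w, h = scale_to_width(*BASE, target)
    print(f"{w} x {h} = {megapixels(w, h):.2f} MP")

print(scale_to_width(3840, 200, 1920))  # (1920, 100)
```

Doubling the width from 960 to 1920 to 3840 doubles the height each time (91 to 182 to 364), so the 4K version works out to about 1.40 MP.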