Pauven

Members
  • Posts

    742
  • Days Won

    7

Pauven last won the day on August 11 2019

Pauven had the most liked content!

Converted

  • Gender
    Male
  • Location
    Atlanta Metro Area

Pauven's Achievements

Enthusiast (6/14)

122

Reputation

  1. I did see that, but then you appended your edit and I thought you were changing your answer, hence my confusion. I currently have 78.6 TB of data backed up in this pool, as-is. If I follow those steps, is there any risk I could lose that data and have to repopulate the backup? It took over a week of copying, and I don't want to have to do that again. If I'm understanding you correctly, I can remove the 5 disks, delete the history, then insert each disk one at a time and rename it to the same pool name, delete my history again just to make sure, and then the next time I bring in all 5 drives at the same time, they will appear as a single pool. Does that sound right?
  2. So does that mean this isn't possible? Sorry, I got confused. Since the mount point has to be the disk label, and it won't let me rename to an existing value, that makes it impossible to do the solution you offered, right?
  3. I just tried changing the Mount Point to all be the same, and it won't let me. It reports "Fail". I think it is because it's changing the disk label and the mount point at the same time. Is there a trick to doing this? Errors in the log:
     Apr 10 17:24:51 Tower unassigned.devices: Error: Device '/dev/sdx1' mount point 'Frankenstore' - name is reserved, used in the array or by an unassigned device.
  4. Hey dlandon. Thanks so much for this awesome tool. I've been using it for years, and it's been a big help for certain tasks.

     One of the things I occasionally use it for is a removable btrfs JBOD drive pool of 5 USB HDDs. It's so easy to plug it in, mount it, run my rsync backup job, then put it back in offline storage when I'm done. I love the fact that I don't have to stop/start the array to use it, that I don't get warnings when I unplug it, and that I don't get any Fix Common Problems warnings for duplicated data on a cache disk.

     I was recently sharing my solution with some fellow users, and I discovered that the tutorial for how to create the btrfs drive pool for UD was removed. I reached out to JorgeB and he restored that post, so now I have those instructions again.

     While working with these other users on how to do this hot-pluggable backup pool, and comparing with how it works using stock Unraid pools, a few things cropped up that I wanted to ask you about. After all, UD is the best tool for creating hot-pluggable drive pools that are normally stored offline, but there are a couple of things Unraid pools do a bit better.

     First, when mounting the 1st pool device, the buttons to mount the other devices remain enabled. One of my fellow users got confused, clicked mount on all devices, and then saw the pool was mounted multiple times. Would it be possible to both make it more obvious that all the drives in the pool are now mounted, and to disable/hide the mount button on the other drives? Currently the only indication is the partition size on the mounted drive. Perhaps the other drives that got mounted in the pool could even be inset to the right, beneath the parent, to better indicate what is going on.

     Second, would it be possible to add a feature in the GUI to add a partition to an existing pool? I believe that Unraid pools let you do this, but in UD you have to go out to the command line and run the btrfs dev add... command to add the partition to a mount point. I know it's a pretty easy command, but some users are very uncomfortable with the command line and prefer the GUI approach.

     I know most people seem to think that Unraid pools are the only game in town now; even your own documentation states to use them. But for hot-pluggable, removable drive pools, UD is so much better. I hope you continue to support and enhance this capability.

     Thanks!!! Paul
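     For anyone who wants to script the command-line step described in the post above, here is a minimal Python sketch. It only wraps the standard `btrfs device add <device> <mount point>` invocation. The device path `/dev/sdy1` and the mount point `/mnt/disks/Frankenstore` are hypothetical placeholders, and both helper names are made up for illustration. It defaults to a dry run that just returns the command string, so nothing is touched until you opt in.

```python
import shlex
import subprocess
from typing import List


def build_btrfs_add_command(device: str, mount_point: str) -> List[str]:
    """Build the argv for adding a device to an already mounted btrfs pool."""
    return ["btrfs", "device", "add", device, mount_point]


def add_device_to_pool(device: str, mount_point: str, dry_run: bool = True) -> str:
    """Return the command as a string; only execute it when dry_run is False."""
    cmd = build_btrfs_add_command(device, mount_point)
    if not dry_run:
        # Needs root and a mounted pool; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Hypothetical device and mount point, shown as a dry run only.
    print(add_device_to_pool("/dev/sdy1", "/mnt/disks/Frankenstore"))
```

     Keeping the dry run as the default is deliberate: with 78.6 TB on the pool, it seems safer to read the exact command first and only then rerun with `dry_run=False`.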
  5. Awesome, thank you Johnnie (should I still call you Johnnie, or Jorge, or something else?), that's exactly what I needed. I was surprisingly close in my recreation of the steps based upon my research, but was full of doubt. You're extremely helpful as always. 😊 Another user did a test and discovered he was able to mount a pool created in Unraid using UD, no big surprise I guess since these are just standard btrfs pools. So for some users it might be easier to create the pool using Unraid, remove it, delete the definition, and then use UD from then on for hot-plugging. I would definitely use the Unraid pools feature if it more gracefully handled hot-pluggable backup pools, and didn't require the stop/start. I'm not complaining, though, since UD does this extremely well.
  6. Hey Johnnie/@JorgeB, I could use some help on this.

     Side note, your new username and logo had me all confused. I couldn't figure out how you seemed to have been here for years/decades, yet I didn't recognize the name. I finally figured out your provenance, though I'm still baffled by the username change.

     Anyway, to my issue. I created a portable backup drive pool, as described above, with Unassigned Devices back when I was running 6.8, using the directions you linked to above. I plug it in 2-4 times a year and do a backup, and it's fantastic.

     Those instructions (which I think you wrote) have since been deleted, since the preferred way is to use the multiple drive pools feature in 6.9. But the functionality in 6.9 is not the same. If you create a drive pool and then unplug it, Unraid is unhappy about the missing drives. You can make the warnings go away if you delete the pool, but then you have to make sure you recreate the pool with all the drives back in the correct order before doing your next incremental. You also have to stop/start the array to make any changes to the pool.

     Unassigned Devices did this particular task so much better. No warnings, don't have to delete the config, just plug it in and mount it, don't have to stop the array.

     While I can understand that the preferred method is to use Unraid for multiple permanent drive pools, I don't understand why the documentation for doing it with UD was deleted, as that still serves a niche.

     I'm trying to help some other users get up and going with the solution I'm using, and since I can't find the documentation I can't fully help them. I think there were some command lines I used when setting up the btrfs pool as JBOD, possibly related to formatting, but I don't recall.

     I also need to expand my UD backup drive pool soon. I almost ran out of space on my last backup, so I need a 6th drive, and I'm worried I won't be able to do this correctly without the instructions.

     Even the UD support thread points to the now deleted instructions, and the Internet Archive doesn't have any successful copies of the FAQ.

     Is this something you can help with, or point me to someone who can?

     Thanks! Paul
  7. Thanks JorgeB. I've followed your advice and ripped out the Highpoint 2760A. I installed a couple Dell H310's, combined with 8 SATA ports on my motherboard, to get back to 24 ports. So far it's been smooth sailing, but my Call Trace problems don't usually crop up for a couple weeks, so I'm not in the clear yet. Fingers crossed.
  8. I've been searching the forum, trying to see if any other users have the same issue. I do see plenty of call trace reports, but so far none have matched mine. My log just keeps repeating the same info over and over. What I posted above was just the errors, here's the full detail for a complete error segment:
     Apr 1 13:50:20 Tower kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
     Apr 1 13:50:20 Tower kernel: rcu: 10-....: (2 GPs behind) idle=61e/1/0x4000000000000002 softirq=118394498/118394499 fqs=10418875
     Apr 1 13:50:20 Tower kernel: (detected by 8, t=42541182 jiffies, g=291127741, q=55535900)
     Apr 1 13:50:20 Tower kernel: Sending NMI from CPU 8 to CPUs 10:
     Apr 1 13:50:20 Tower kernel: NMI backtrace for cpu 10
     Apr 1 13:50:20 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
     Apr 1 13:50:20 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Professional Gaming, BIOS P4.80 07/18/2018
     Apr 1 13:50:20 Tower kernel: RIP: 0010:mvs_slot_complete+0x31/0x45f [mvsas]
     Apr 1 13:50:20 Tower kernel: Code: 00 00 41 56 41 55 41 54 55 53 89 c3 48 6b cb 58 48 83 ec 18 89 44 24 10 83 c8 ff 89 74 24 14 4c 8d 34 0f 4d 8b be 08 fd 00 00 <4d> 85 ff 0f 84 16 04 00 00 49 83 bf e8 00 00 00 00 0f 84 08 04 00
     Apr 1 13:50:20 Tower kernel: RSP: 0018:ffffc900003c0e78 EFLAGS: 00000286
     Apr 1 13:50:20 Tower kernel: RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
     Apr 1 13:50:20 Tower kernel: RDX: 0000000000000000 RSI: 0000000000010000 RDI: ffff888138a80000
     Apr 1 13:50:20 Tower kernel: RBP: ffff888138a80000 R08: 0000000000000001 R09: ffffffffa02eda65
     Apr 1 13:50:20 Tower kernel: R10: 00000000d007f000 R11: ffff8881049a9800 R12: 0000000000000000
     Apr 1 13:50:20 Tower kernel: R13: 0000000000000000 R14: ffff888138a80000 R15: 0000000000000000
     Apr 1 13:50:20 Tower kernel: FS: 0000000000000000(0000) GS:ffff888fdee80000(0000) knlGS:0000000000000000
     Apr 1 13:50:20 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Apr 1 13:50:20 Tower kernel: CR2: 00000000002a925a CR3: 0000000281a36000 CR4: 00000000003506e0
     Apr 1 13:50:20 Tower kernel: Call Trace:
     Apr 1 13:50:20 Tower kernel: <IRQ>
     Apr 1 13:50:20 Tower kernel: mvs_int_rx+0x85/0xf1 [mvsas]
     Apr 1 13:50:20 Tower kernel: mvs_int_full+0x1e/0xa4 [mvsas]
     Apr 1 13:50:20 Tower kernel: mvs_94xx_isr+0x4d/0x60 [mvsas]
     Apr 1 13:50:20 Tower kernel: mvs_tasklet+0x87/0xa8 [mvsas]
     Apr 1 13:50:20 Tower kernel: tasklet_action_common.isra.0+0x66/0xa3
     Apr 1 13:50:20 Tower kernel: __do_softirq+0xc4/0x1c2
     Apr 1 13:50:20 Tower kernel: asm_call_irq_on_stack+0x12/0x20
     Apr 1 13:50:20 Tower kernel: </IRQ>
     Apr 1 13:50:20 Tower kernel: do_softirq_own_stack+0x2c/0x39
     Apr 1 13:50:20 Tower kernel: __irq_exit_rcu+0x45/0x80
     Apr 1 13:50:20 Tower kernel: common_interrupt+0x119/0x12e
     Apr 1 13:50:20 Tower kernel: asm_common_interrupt+0x1e/0x40
     Apr 1 13:50:20 Tower kernel: RIP: 0010:arch_local_irq_enable+0x7/0x8
     Apr 1 13:50:20 Tower kernel: Code: 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 0f 1f 44 00 00 c3 fa 66 0f 1f 44 00 00 c3 fb 66 0f 1f 44 00 00 <c3> 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31 d2 39 c5
     Apr 1 13:50:20 Tower kernel: RSP: 0018:ffffc9000016fea0 EFLAGS: 00000246
     Apr 1 13:50:20 Tower kernel: RAX: ffff888fdeea2380 RBX: 0000000000000002 RCX: 000000000000001f
     Apr 1 13:50:20 Tower kernel: RDX: 0000000000000000 RSI: 00000000238d7f23 RDI: 0000000000000000
     Apr 1 13:50:20 Tower kernel: RBP: ffff888105d0d800 R08: 00028e38c0a3fe38 R09: 00028e3ab9ddf5c0
     Apr 1 13:50:20 Tower kernel: R10: 0000000000000045 R11: 071c71c71c71c71c R12: 00028e38c0a3fe38
     Apr 1 13:50:20 Tower kernel: R13: ffffffff820c8c40 R14: 0000000000000002 R15: 0000000000000000
     Apr 1 13:50:20 Tower kernel: cpuidle_enter_state+0x101/0x1c4
     Apr 1 13:50:20 Tower kernel: cpuidle_enter+0x25/0x31
     Apr 1 13:50:20 Tower kernel: do_idle+0x1a6/0x214
     Apr 1 13:50:20 Tower kernel: cpu_startup_entry+0x18/0x1a
     Apr 1 13:50:20 Tower kernel: secondary_startup_64_no_verify+0xb0/0xbb
Comparing with others, and taking a closer look at the output in my log, I'm noticing a few too many [mvsas] related entries. That's for my Marvell-based Highpoint 2760A 24-port SAS controller. For years Fix Common Problems has been warning me about my Marvell-based controller, but I ignored those warnings since I've never had any issues with it since I bought it in 2013. Almost 9 years of trouble-free operation all the way through 6.8.3. Maybe I'm jumping to conclusions and the issue is something else. Can anyone tell?
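     The "too many [mvsas] entries" observation in the post above can be checked mechanically rather than by eye. The following is a rough Python sketch, not a diagnostic tool: it tallies stack frames in a pasted syslog excerpt by the module tag in square brackets. The regular expression, the `(core kernel)` label for untagged frames, and the function name are all assumptions made for illustration. The sample lines are copied from the trace in the post.

```python
import re
from collections import Counter

# Matches a frame such as "mvs_int_rx+0x85/0xf1 [mvsas]" after the "kernel:"
# prefix, including "RIP: 0010:symbol+off/len" lines. The "[module]" tag is
# optional; frames without one belong to the core kernel image.
FRAME_RE = re.compile(
    r"kernel:\s+(?:RIP: \d+:)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?\s*$"
)

CORE = "(core kernel)"  # label chosen for this sketch


def count_frames_by_module(log_text):
    """Count call-trace frames per kernel module in a syslog excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            counts[match.group("mod") or CORE] += 1
    return counts


SAMPLE = """\
Apr 1 13:50:20 Tower kernel: RIP: 0010:mvs_slot_complete+0x31/0x45f [mvsas]
Apr 1 13:50:20 Tower kernel: Call Trace:
Apr 1 13:50:20 Tower kernel: mvs_int_rx+0x85/0xf1 [mvsas]
Apr 1 13:50:20 Tower kernel: mvs_int_full+0x1e/0xa4 [mvsas]
Apr 1 13:50:20 Tower kernel: tasklet_action_common.isra.0+0x66/0xa3
Apr 1 13:50:20 Tower kernel: __do_softirq+0xc4/0x1c2
"""

if __name__ == "__main__":
    for module, n in count_frames_by_module(SAMPLE).most_common():
        print(module, n)
```

     A module that dominates the interrupt-side frames of a stalled CPU is only a hint about where the CPU was spinning, not proof of a driver bug, so the result is best treated as a pointer for further questions.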
  9. This appears to still be an issue for me. Need help to move forward. Quick recap: Last year I upgraded to 6.9.2 and had issues with Seagate IronWolf (actually Exos) drives, plus the issue described here. I thought it was all related. I ended up rolling back to 6.8.3, and the issues went away. A little over a month ago, 6.8.3 stopped working correctly for me, I believe due to an incompatible Unassigned Devices update. About a week ago I decided to apply the Seagate drive fix (disabling EPC) and try upgrading to 6.9.2 again. I thought everything was successful. Multiple spin-ups/spin-downs, a record-fast parity check, and a perfectly working GUI and Dockers and VM's, I thought I was in the clear. Which brings us to today. Being the 1st of the month, the parity check kicked off at 2am. When I woke up this morning, I found that parity check progress was stalled at 0.1% after 6+ hours, and several hours later I can confirm it's not moving. In general, the GUI feels responsive, letting me browse around, but I noticed that the Dashboard presents no data, the drive temps don't appear to be updating, and the CPU/MB temp and fan speeds are wrong and frozen. I connected to the Terminal and ran an mdcmd status to see if the parity check was actually running, but the mdResyncPos is frozen at 9283712. Best I can tell, it seems like Unraid is frozen, even though the GUI isn't hung. First things first, I decided to run diagnostics. An hour later, it still reads "Starting diagnostics collection...". JorgeB is right. 
I checked Unraid's System Log, and it is full of Call Trace errors:
     Apr 1 10:53:07 Tower nginx: 2022/04/01 10:53:07 [error] 9804#9804: *2157788 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.1.218, server: , request: "GET /Dashboard HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "tower", referrer: "http://tower/Main"
     Apr 1 10:53:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
     Apr 1 10:53:19 Tower kernel: Call Trace:
     Apr 1 10:56:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
     Apr 1 10:56:19 Tower kernel: Call Trace:
     Apr 1 10:59:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
     Apr 1 10:59:19 Tower kernel: Call Trace:
     Apr 1 11:02:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
     Apr 1 11:02:19 Tower kernel: Call Trace:
     Apr 1 11:05:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
     Apr 1 11:05:19 Tower kernel: Call Trace:
     Apr 1 11:08:19 Tower kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: G S 5.10.28-Unraid #1
     Apr 1 11:08:19 Tower kernel: Call Trace:
Since running Diagnostics didn't work, I'm not sure what my next step should be. Do I need to gather more info, or is the issue already confirmed as a Ryzen on Linux Kernel related issue? Are there any solutions?
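     The "is the parity check actually moving?" test described in the post above (run mdcmd status, then look at mdResyncPos again later) could be sketched in Python roughly as follows. This assumes the status output is one `key=value` pair per line, which is an assumption about the format. Only the field name `mdResyncPos` and the value 9283712 come from the post; the function names and the other sample values are hypothetical.

```python
def parse_status(text):
    """Parse 'key=value' lines into a dict, ignoring lines without '='."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def resync_stalled(earlier_status, later_status):
    """True if mdResyncPos did not advance between two status captures."""
    before = int(parse_status(earlier_status)["mdResyncPos"])
    after = int(parse_status(later_status)["mdResyncPos"])
    return after <= before


if __name__ == "__main__":
    # Two captures taken some minutes apart; the position never moved.
    first = "mdState=STARTED\nmdResyncPos=9283712\n"
    second = "mdState=STARTED\nmdResyncPos=9283712\n"
    print("stalled" if resync_stalled(first, second) else "progressing")
```

     Feeding it two captures saved a few minutes apart gives a yes/no answer without relying on the Dashboard, which was itself not updating in this case.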
  10. Big thanks to @Cessquill for the write-up, and for all the other contributors to this thread! I just successfully fixed this on my server. I finally took the plunge after waiting half a year, as I started having major compatibility issues with Unassigned Devices on 6.8.3, and couldn't put off upgrading anymore. I followed all the steps that Cessquill outlined, and disabled EPC on my four Seagate ST8000NM0055 drives. I used the latest SeaChest Utility files from Seagate's website, downloaded yesterday. They appear to have changed again, and this time I used the files from path: \Linux\Non-RAID\centos-7-x86_64\ Oddly, unlike @optiman's experience with his ST8000NM0055s, mine all had Low Current Spinup disabled, so I didn't mess with that. I'm also running on a Marvell-based controller, which probably creates a unique data point showing that this issue doesn't just affect LSI controllers. Last time I upgraded to 6.9.x, I had major issues, and could not get beyond 66GB of my parity rebuild, which is why I rolled back to 6.8.3. After applying the EPC fix and upgrading to 6.9.2, I've done multiple drive spin-downs/spin-ups with no issues, and a full parity check which completed in record time. It's perhaps too early to celebrate, but it does seem like the issue is resolved on my setup. I also have two pre-cleared ST8000VN0022s that are not in my array. They had both EPC and Low Current Spinup (Ultra Low) enabled. I decided to leave Low Current Spinup alone, but went ahead and disabled EPC for both of these drives. These will migrate into my array in the coming months, so I don't know yet how they'll behave. I don't even know if I would have had issues with them, but since other users here mentioned them I decided to play it safe. I also used SeaChest_Info to examine my other non-Seagate drives (surprisingly it works), and found that EPC exists and is enabled on my HGST_HUH728080ALE drives, but those don't cause any problems. 
I kinda hate that these Seagate Exos 8TB drives are such a good value, as they've become my chosen upgrade path, so now I'll have to remember to disable EPC on all new drives going forward. While I do like the HGST drives better, the price premium is just too much for a server this large. Thanks again!!! -Paul
  11. If this is true, then why does the problem only appear after upgrading to Unraid 6.9.x? I'd been running on 6.8.3 for a long time without issues. Last spring I upgraded to 6.9.2 and bam! The issues hit immediately. I never did the EPC fix. The problem was incessant on 6.9.x, and I didn't want to risk losing data playing around with drive settings, as I had 2 drives out and was already risking data loss, so I rolled back to 6.8.3, and the problem went away. Half a year later and it's been smooth sailing on 6.8.3. I stayed on 6.8.3 because it works and there wasn't anything in the 6.9.x branch that I needed. Even the very first post here mentions that the problems started with 6.9.0, which 100% matches my experience. Perhaps what you are saying is that the problem lies in the Linux kernel or one of the various drivers that were upgraded in the 6.9.x releases, and the issue is not in any of LimeTech's Unraid code. That may be true, though I'm not sure I've seen it clearly detailed in this thread exactly where the problem lies, so I would appreciate pointers to any additional information I may have missed. It certainly seems reasonable to me that since a change in 6.9.x broke this, another change in 6.10.x could fix it, so I'm not inclined to give up hope entirely. And there have been many times LimeTech has chased down bugs in other components on behalf of their users. This issue has been reported to them in more than one ticket, so they should be aware of it, though disappointingly I've never seen them weigh in on the topic.
  12. I can confirm that ST8000NM0055 drives are most definitely affected by this issue. This bit me hard when I upgraded to v6.9.2 back in April. I had to roll back to 6.8.3 to recover from a dual-drive "failure" and inability to rebuild on 6.9.2, and never attempted any of the fixes posted here. I felt extremely lucky to escape without losing data, and I'm still running 6.8.3. optiman, glad to read this worked for you. Since it has been a couple weeks, is your system still okay? I'm starting to feel a little trapped on 6.8.3, so I'll probably have to apply this fix. Since we both have ST8000NM0055 drives, your results matter most to me. I was hopeful that this was a bug in 6.9.x that would be fixed in 6.10, and that I wouldn't need to do the drive fix. Came here to see if anyone had tested this on 6.10 without applying these fixes, but no dice. Paul
  13. Not the restore Unraid version feature (which I used) but rather a restore flash drive from backup. I had to manually copy some config files from the flash drive backup to get 6.8.3 working correctly. It took me a while to figure out which files needed restoring. Some type of automation here would have been nice. It would be really cool if it was integrated into the restore Unraid version feature, so it could prompt to optionally restore certain files from an existing flash drive backup. That could certainly be the issue. But there's no way I'm going back to 6.9.2 on my production server to gather diags once it fails. I'm still 4 hours away from a full recovery, and I'm not into S&M. I know it's my personal perspective, but I feel that if 6.9.x has issues as bad as this, it shouldn't be considered "stable". I wasn't gearing up for a testing run, I was upgrading my production server to a "stable" dot-dot-two release, with a reasonable expectation that the kinks were worked out, and with no awareness that I could be signing up for data loss. I was completely unprepared to deal with these issues, and my main goal was simply surviving.
  14. Cross-posting here for greater user awareness since this was a major issue: on 6.9.2 I was unable to perform a dual-drive data rebuild, and had to roll back to 6.8.3. I know a dual-drive rebuild is pretty rare, and I don't know if it gets sufficiently tested in pre-release stages. I wanted to make sure that users know that, at least on my hardware config, this is borked on 6.9.2. Also, it seems the infamous Seagate IronWolf drive disablement issue may have affected my server, as both of my 8TB IronWolf drives were disabled by Unraid 6.9.2. I got incredibly lucky that I only had two IronWolfs, so data rebuild was an option. If I had 3 of those, data loss would likely have resulted. Paul