Geck0

Everything posted by Geck0

  1. Hi, I'm mainly using Seagate IronWolf Pros in my rig. The latest ones, which are 18TB-20TB, are showing up as SkyHawks when I register them on Seagate's registration portal. Does anybody know if there is a reason for this, before I contact them? On another note, I'm starting to rethink my unRaid ecosystem; I seem to have had at least 3 IronWolf Pro failures in a couple of years. The only reason I've stuck with them is the 5-year warranty. Although, it looks like the Toshiba NAS Pro also has 5 years.
  2. ST18000NT001-3LU101 Seagate IronWolf Pro NAS. Installed for a year as a precleared spare, then added to the array for a couple of days before showing pre-allocation and uncorrectable errors. It will be going back as an RMA warranty claim.
  3. Hi, This isn't a request for help, it's more of a situation recovery for those that end up in the same place. Jump to the bold underline below if you just want the final solution.

     I've been on a mission to consolidate photos that are on different devices onto unRaid. Going through a bunch of portable SSDs and thumb drives, I plugged in one of the SSDs, which contained an opnSense installation. From what I can tell, it mounted in root. The GUI became unresponsive, although I could still ssh into the server. There was high CPU usage on shfs and then gzip (can't explain that), and I couldn't do a graceful shutdown, even after unmounting the disks. I had to do a hard shutdown.

     After a reboot, the GPU wouldn't pass through and I realised that one of my cache drives had disappeared. I tried rebooting and checking CMOS, but the drive wasn't listed. It wasn't listed under the shell's df command either (why would it be, when it's not in the CMOS). I started the array without it and everything seemed normal, except the cache drive contained all my VMs.

     Long story short, I came across a post where somebody had a similar issue and couldn't recover from a reboot, but did after a power cycle. The power cycle helped rediscover the drive, and after adding it back into the array, it seems everything is fine, with no extra steps needed. I'm running a parity check to see if there is anything else. Losing the VMs would be a pain, but not as much as removing the m.2 drive, as it's inside the motherboard and it's a watercooled setup. I think it's time to build a second unRaid server that can do scheduled backups. The other lesson learnt is to be careful what is automounting, including your own unlabelled drives; see the sketch below for a quick way to check.
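     For anyone who lands here, this is how to see what a newly plugged-in drive has automounted before it causes trouble. A minimal sketch using standard Linux tools; /dev/sdX1 is a placeholder for whatever the rogue partition turns out to be:

       # List block devices with filesystem type, label and mount point
       lsblk -o NAME,FSTYPE,LABEL,SIZE,MOUNTPOINT

       # Show everything actually mounted, to spot anything unexpected
       findmnt --real

       # Unmount the offending partition by device or by path
       umount /dev/sdX1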
  4. My two cents (FWIW): I used to be loyal to the SanDisk brand, from SD cards for photography right up to SSDs. Since becoming an unRaid user, I've burnt through at least 2 SanDisk USB sticks; one was a SanDisk Cruzer and I forget the other. As a side note, I've also burnt through a SanDisk SSD, acting as a cache drive, in less than 6 months. I just don't think SanDisk has the same credibility it used to have. I've burnt through one Samsung key, which looks identical to the Samsung FIT and probably is one. An identical one is currently running my unRaid and has been great for the last couple of years. I also own at least three Samsung portable SSDs, T5 and T7, with no reliability issues. I pretty much trust the Samsung product range, if not the company itself, because of their poor customer service.
  5. Okay, thanks for replying guys. Appreciate it.
  6. Hi JorgeB et al, I've had an interesting week. Drive 5 started failing today. It kicked off with reallocated sectors, which increased from 17 to 126 within 4 hours and then up to 215 after another 45 mins. It also came up with a pending sector count of 1, which later returned to normal. The disk then went offline after becoming "uncorrectable is 1" and entering "Disk 5 in error". Fortunately, I still had a brand new 18TB on standby, already hooked up, and I've started a rebuild. The original disk can still be mounted, but I've left this alone for now, in case the rebuild fails. I've not had two drives with errors in the same week before. Can you advise if there is anything else I should consider? I'm not aware that a faulty cable or disk controller could cause this issue; I'm just wondering if there is anything else to look at. The two drives this week are both IronWolf Pro, purchased a couple of years apart. The one that is failing today is only a couple of years old. It failed the extended SMART test and dropped like a rock from there. I'm starting to rethink the quality of Seagate's drives. nexus-diagnostics-20240306-1704.zip
  7. Hi JorgeB, I've completed the extended SMART test; it came back as "completed without error". The extended SMART test results; The parity check completed today and came back with no issues. I was running a backup of my Nextcloud data and noticed in the logs that a number of Excel files had an md5 hash difference from the last backup. All of them are on disk1. I've only just found them and still need to compare to see if there is an issue with the server-side ones, as it may be the backup drive that's at fault here. However, it makes me nervous that there are other issues as well; I don't back up the entire drive, just the important data. I'm not great at reading SMART results: is it worth swapping out the drive and performing a rebuild from parity? Do you mean corrected from parity or reallocated sectors? I'm not sure what happens in this instance, but I have this concern that corrupted files have been written to parity. Any input would be appreciated.
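     For anyone wanting to run the same kind of check, a minimal sketch of comparing hashes between the server copy and the backup copy (all paths are placeholders for my actual share layout):

       # Hash one file on each side and compare by eye
       md5sum /mnt/disk1/nextcloud/somefile.xlsx
       md5sum /mnt/backup/nextcloud/somefile.xlsx

       # Or hash whole trees and diff the two listings
       (cd /mnt/disk1/nextcloud && find . -type f -exec md5sum {} + | sort -k2 > /tmp/server.md5)
       (cd /mnt/backup/nextcloud && find . -type f -exec md5sum {} + | sort -k2 > /tmp/backup.md5)
       diff /tmp/server.md5 /tmp/backup.md5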
  8. Hi Jorge, I cancelled the existing parity check and put the array into maintenance mode. I'm currently running an extended SMART test and will revert after it completes. I have a new drive on standby if I need to swap out. Thanks for taking the time to respond.
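     For reference, the command-line equivalent of kicking off the extended test, as a minimal sketch (smartctl ships with Unraid; /dev/sdf is a placeholder for the suspect drive):

       # Start the extended (long) self-test; the drive runs it internally
       smartctl -t long /dev/sdf

       # Check progress, then read the self-test log once it finishes
       smartctl -c /dev/sdf
       smartctl -l selftest /dev/sdf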
  9. Hi, the weekend has greeted me with read errors on Drive 1. Unraid was in the middle of a monthly parity check, which I've now paused until I've received some feedback. Under "Fix Problems", it states this drive hasn't been disabled, and a short SMART test shows no errors. Disk log information is ..... and, more importantly, the diagnostics are attached. I would appreciate it if somebody could cast their eye over this; I've taken docker offline and paused the parity check. nexus-diagnostics-20240302-0902.zip
  10. This is the solution. Thanks for your help, saved me some time.
  11. I came across this post elsewhere. I thought it may be useful for those people still struggling. The Resizable BAR setting in the BIOS may be an issue for some people; if you still cannot get your card passed through after all of the above, then read this: solved-rtx-3090-gpu-passthrough-just-displays-a-black-screen-with-qemu Hope this helps others that may be at their wits' end.
  12. These are my unRaid boot settings. Note that I don't have the UEFI boot option enabled, despite it being on as an option in my BIOS ("OTHER PCI DEVICE ROM PRIORITY: UEFI Only"). Leave CSM enabled in the BIOS. Don't set unRaid to boot into GUI mode; you can connect to it remotely via VNC. Remember, you're trying to pass through your graphics card; you cannot have it running the GUI to do this, unless you're planning on using two graphics cards and giving one to the GUI on boot. It doesn't sound like that's your plan. On another note, Spaceinvader, like a lot of people on Unraid, is simply smarter than I am and his guides are great, but they simply didn't work for me in getting the graphics card to pass through. I'm tempted to upgrade my BIOS, as it's several updates old, but I shudder at the thought, as it's likely to break the IOMMU setup for my VMs. Repost your settings in-line in your next post if you need further help. Nobody is going to download all those BMPs; you can literally print-screen and paste into your post.
  13. I have an Aorus X570 Extreme. I know it's a different motherboard, but the BIOS will be similar (methinks). Have a look at this guide post that I did; it saved a few people the headache that I went through. I didn't download all your images to look at, it's too much work; I suggest adding them to your post in-line instead of as attachments. The first section shows my BIOS settings. Note that I used: GPU RTX 3090 PASSTHROUGH TO WINDOWS 10 GUIDE WITHOUT VBIOS (CODE 43 BYPASS)
  14. Hi, Change this in your nginx configuration file. Mine is done in the nginx instance on my reverse proxy. Are you using a reverse proxy? If you're still stuck, respond and I will see if I can help with examples.
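     In case it helps in the meantime, here is an illustrative example only (not the exact change from my setup) of the sort of directives that usually need adjusting in an nginx reverse proxy in front of Nextcloud; the server name and backend address are placeholders:

       server {
           listen 443 ssl;
           server_name cloud.example.com;

           # nginx's default 1M body limit breaks large uploads
           client_max_body_size 0;

           location / {
               proxy_pass http://192.168.1.10:8080;
               # Pass the real client details through to the backend
               proxy_set_header Host $host;
               proxy_set_header X-Real-IP $remote_addr;
               proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
               proxy_set_header X-Forwarded-Proto $scheme;
           }
       }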
  15. Hi, if you've installed or upgraded to Nextcloud 27 and above, you will find that the command path that you've entered no longer works. I figured this out yesterday. Instead, to access occ, open a shell in the container with docker exec -it nextcloud /bin/bash and then run occ followed by the rest of the command that you want. For example, see the sketch below. Let me know if that helps or if you have found anything else that I may have missed.
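     A minimal sketch of what that looks like end to end (assuming your container is named nextcloud; the occ subcommands are just examples):

       # Open a shell inside the Nextcloud container
       docker exec -it nextcloud /bin/bash

       # Then, inside the container, run occ with whatever you need
       occ status
       occ maintenance:mode --on

     If occ is on the container's PATH, you can also do it in one step without an interactive shell: docker exec -it nextcloud occ status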
  16. ECC post Hey /0, Check this post, it may be the answer to your query; I seem to recall seeing this not too long ago.
  17. I use Discord for notifications to the app on my phone. It works really well.
  18. Hi Veer, did you get this sorted? I wrote a fairly decent guide for this on the forum. I did this without the bios dump file.
  19. @Michael_P My tolerance is low. I wasn't referring to removing the drive, but to excluding it from share folders to stop any further writes to it. I haven't got much of a choice at the moment; I have to wait a couple of weeks for a new drive to show up. The one that's causing me aggravation was actually a brand new, redundant drive which I had in my case for a couple of months, ready to go in case there was an issue like this or low disk space. I ended up allocating it to the array recently and probably should have ordered another one at the same time; now it's a 2-week wait for a new cache drive and a replacement drive for the array to show up.
  20. @JorgeB Thanks for your input. I'm going to keep it running. I've ordered a new 20TB, but it will take a couple of weeks to get here. @itimpi I agree with your comments. Cabling was the only issue I could think of. I'll use it as normal and keep an eye on the reallocation count. I don't know if it's necessary to exclude the drive from share folders in the meantime? I'm guessing a nearly empty new drive, with no increase in reallocations, is a relatively low risk, as there is space for further reallocation. For those that follow, I hadn't seen this on the Unraid wiki page: Understanding Smart Reports
  21. Hi, My latest disk, disk 5 (an IronWolf Pro ST16000NE000), has a Reallocated Sector Count today of 352. This drive is my newest addition. One of my cache pools, a single SSD IronWolf Pro, failed yesterday and had to be removed. I took it out of the pool but didn't disconnect it, and it was still causing the log to fill up. This resulted in not being able to shut down the server properly, and I eventually had to do a hard shutdown. I'm now seeing this, which may not be related, but all my other drives have a Reallocated Sector Count of 0. To me, it seems quite a high number for a new drive. I'll do an extended SMART test in a sec, but I wanted to get some advice on whether I should move the data off there and run an extended preclear, or keep going in the meantime until I receive a new drive and let parity rebuild it? I don't want to face data corruption, so whatever is best and safest to do. Currently, there is very little on the drive. The only other thing I can think of is if I managed to knock a cable when removing the failed SSD, which I doubt; it's a spacious case. The drives are connected to an LSI controller. I've attached the diagnostics. Any help is always appreciated; I probably need to do a bit of reading on what to look out for on these SMART reports. nexus-diagnostics-20230516-1817.zip
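     For anyone else learning to read these, a minimal sketch of pulling just the attributes most worth watching (/dev/sdX is a placeholder for the drive in question):

       # Dump all SMART attributes for the drive
       smartctl -A /dev/sdX

       # Filter to the usual trouble indicators
       smartctl -A /dev/sdX | grep -Ei 'reallocated|pending|uncorrect'

     Non-zero raw values on Reallocated Sector Count, Current Pending Sector or Offline Uncorrectable are the ones that generally warrant a closer look.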
  22. I ran btrfs dev stats -z /mnt/cache_nvme/ as per the excellent post by JorgeB linked earlier in this thread, and a scrub afterwards. I already had the script running to detect pool errors. The issue appeared after the original drop-out that sparked off this post; I didn't realise the stats needed resetting. For those that follow, I suggest seeing this post: BTRFS Issues
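     The full sequence, as a minimal sketch (the pool path matches my setup; yours will differ):

       # Print the per-device error counters for the pool
       btrfs dev stats /mnt/cache_nvme/

       # Reset the counters so stale errors stop being re-reported
       btrfs dev stats -z /mnt/cache_nvme/

       # Scrub the pool to verify checksums, then check the result
       btrfs scrub start /mnt/cache_nvme/
       btrfs scrub status /mnt/cache_nvme/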
  23. It occurred to me that the btrfs stats would still be static after the first issue of the second drive dropping out. I ran this; ...and again; I ran an rsync before this, twice to two separate drives as a backup, so I'm hoping this may be resolved. Is it possible that, after re-adding the second drive back into the pool and rebuilding, the btrfs error messages were still reporting the original issue? Is there a way to see what files were affected by the 6 corruptions? I've also added my latest diagnostics: nexus-diagnostics-20230423-2024.zip
  24. It appears that it's only the one drive. Would the following steps be reasonable? 1. Use Mover to move the contents of the cache to the array. 2. Take the cache offline, reformat, and then use Mover to move appdata, etc. back to the cache. I've run a balance and a scrub since originally posting this new problem.