GMAsterAU

Members
  • Posts

    32

Everything posted by GMAsterAU

  1. Thank you Dave, I didn't think to look. I was working on a solution to essentially implement a similar function using user scripts. Maybe you will find some of the code useful in implementing the logic:

         #!/bin/bash
         #VARS=$(awk -F= '/\[/{prefix=$0; next} $1{$1=$1; print prefix $0}' OFS='=' "/var/local/emhttp/var.ini")
         check_interval=300   # seconds between checks
         parity="off"
         counter=0
         Script="/boot/config/plugins/user.scripts/scripts/PLACEHOLDER/script"

         #### check if parity check is running
         paritycheck() {
             # flatten var.ini into "[section]key=value" lines
             VARS=$(awk -F= '/\[/{prefix=$0; next} $1{$1=$1; print prefix $0}' OFS='=' "/var/local/emhttp/var.ini")
             # pull out the quoted value of mdResyncPos (treated as "not running" when 0)
             test=$(grep -wn "mdResyncPos" <<< "$VARS")
             test=$(grep -oP '"\K[^"\047]+(?=["\047])' <<< "$test")
             #echo "$test"
             if [ "$test" == 0 ] && [ "$counter" == 0 ]; then
                 echo "Parity Check / Sync / Rebuild not running."
                 echo "Stopping here"
                 exit 0
             elif [ "$test" == 0 ] && [ "$counter" != 0 ]; then
                 # operation finished while we were waiting
                 parity="off"
                 runscript
             elif [ "$test" != 0 ]; then
                 if [ "$counter" == 0 ]; then
                     echo "Parity Check / Sync / Rebuild in progress."
                     parity="on"
                 fi
             fi
         }

         #### continue to recheck if parity is running
         paritycheckstatus() {
             while [ "$parity" == "on" ]
             do
                 echo "Parity operation still running"
                 echo "Waiting $check_interval seconds to check again"
                 ((counter=counter+1))
                 echo "$counter"
                 sleep "$check_interval"
                 paritycheck
             done
         }

         #### run script when parity check has completed
         runscript() {
             bash "$Script"
         }

         ## execute arraycheck
         paritycheck
         paritycheckstatus

     My Bash is not the cleanest, but it technically did the job; however, I had issues reliably executing the whole sequence from the main server: start it, run the parity check and then shut down again.
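A possibly simpler way to read the resync position, as a minimal sketch only. It assumes `var.ini` stores the value on its own line as `mdResyncPos="<number>"`, which is what the grep pattern in the script above implies. The function names `resync_pos` and `parity_running` are hypothetical and not part of any plugin.

```shell
#!/bin/bash
# Hypothetical helper: print the value of mdResyncPos from an ini-style file.
# Assumes a line of the form mdResyncPos="12345" (double quotes optional).
resync_pos() {
    local ini_file="$1"
    awk -F= '$1 == "mdResyncPos" { gsub(/"/, "", $2); print $2; exit }' "$ini_file"
}

# Hypothetical helper: succeeds (exit status 0) while the position is
# present and non-zero, i.e. while an operation appears to be in progress.
parity_running() {
    local pos
    pos=$(resync_pos "$1")
    [ -n "$pos" ] && [ "$pos" != "0" ]
}
```

It could then be used as `parity_running /var/local/emhttp/var.ini && echo "still running"`, which avoids flattening the whole file before searching it.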
  2. I just noticed that Version 2023.07.08 has an option for 'Shutdown server when array operation completes:', but it is greyed out. Is there a condition that has to be met to enable it? Thank you for all the great work on this plugin.
  3. Just tested it on my system, and the original files in the extra folder were deleted before the NerdTools packages were installed. Is there any info I can provide to help with this? (Screenshots: before, after.)
  4. You are totally right. I assumed that due diligence was implied with any purchase, especially with mission-critical hardware. (I have had very good experiences with the sellers I am referring to, and they often specify that this is local stock under local warranty; see below.) I actually have another story to share on this note. In Australia we are in a bit of a funny position where quite a few American companies sell stock under Australian company names or divisions of their parent corporations, while the items get shipped from overseas or from OEM stock. One of those is Newegg. They have a localised presence in Australia, and I once found an amazing deal there for some hardware. I ordered it and found that I had to return some of the items. Newegg instructed me to contact the manufacturer, as I was outside the US. So I did, and it turned out that the warranty on the items sold was restricted because the products were not intended for the Australian market. After asking further, I was told that there is no physical or spec difference; certain serial numbers are simply destined for certain locations. What they offered was a discount on a repair/return of a product that was faulty out of the box, even though under Australian law they have to provide me with a refund/exchange. So I went back to Newegg, who after much debate (I did not expect to have to argue with a customer rep about whether I was entitled to an RMA a few days after the items arrived...) agreed to a refund under their international refund policy. It turns out that the warranty period they offer is far shorter than the manufacturer warranty, which I would only have had access to had I purchased the item locally in Australia. Long story short: make sure not only that the sellers are reputable, but also that you know where the item comes from (thinking of Amazon here in particular, even though I personally have never had an issue with warranty returns via Amazon).
  5. Over the years I have migrated from WD to Seagate, and within Seagate from IronWolf/IronWolf Pro to EXOS. In the beginning, the reasons were better pricing and speeds. When I started out experimenting with 3TB WD Red drives, I found that they were just not as performant as Seagate drives of a similar cost. As I learnt to better handle UNRAID and it started to mature substantially, its deployment in my environment also increased; in the beginning it was essentially a glorified homelab, but now it was becoming mission critical. The move to Seagate followed a thorough evaluation of drives, and I decided to go with 8TB IronWolf drives (ST8000VN004). Unfortunately for me, those particular drives were plagued by constant issues, with drives randomly dropping offline. This wasted a lot of my time and pushed me to upgrade to 12TB drives (ST12000VN0008) just to escape the hellhole of constantly having to RMA drives (to the point where I got on a first-name basis with the local Seagate rep) and then waiting for the array to rebuild, with the associated downtime. [Just a brief note: this appears to be predominantly an issue related to certain(?) LSI controllers. Once I moved the drives to a backup server without LSI, they have been working fine.] The 12TB IronWolf drives fared a lot better; however, I must have gotten a bad batch, and I was experiencing hardware failures at a much higher rate than expected. Drives would last anywhere from a couple of days to a couple of months. You may be familiar with the bathtub curve. Here I was with a semi-stable system that was really not meeting our needs, while also needing to expand the storage one more time as our operational needs grew. That is when the 8-drive limit came into play: should one go for lower, cheaper capacity spread over more drives, or not?
I started experimenting with IronWolf Pro and EXOS drives and found that the difference in $/TB in my location was generally minimal, so it came down to features. IronWolf Pro is marketed as a prosumer drive for a NAS of up to 24 drives, with slightly worse performance across the board compared to EXOS. In the end I settled on the EXOS X16 16TB drives (ST16000NM001G). I have not experienced any drive failures in over a year of 24x7 operation (other than one DOA, but that can happen). I agree with @KingfisherUK: the higher speeds are a welcome bonus. Additionally, where I am, there are far better deals available for EXOS drives, and they often show up in large quantities on commercial IT sites at low prices, discounted by as much as 50% off RRP. These sites are definitely not geared towards the regular home consumer. At the end of the day it comes down to budget, and if you can save the money, go for it.
  6. @Squid thank you for a great plugin, which I personally use every day and which enables me and the community to run our servers in such customised ways! Out of curiosity, which version of cron is currently used by the plugin? When I use crontab.guru, it warns that some expressions are non-standard and may not run. Do you have any experience with expressions such as 'run every 3rd month, Jan-Dec'?
  7. OK, for me the (semi-)permanent solution was: set the MB BIOS to 'PME wake from S5 state' and add ethtool -s eth0 wol g at the end of the GO file. Now, as long as I do not unplug the computer, I can WOL and shut down without issues. If I unplug the machine, then I have to start it manually once and can proceed from there.
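As a sketch of the GO-file step: the helper below appends the `ethtool` line only if an identical line is not already present, so running it twice does not duplicate the entry. The function name `ensure_line` is hypothetical, and `/boot/config/go` in the usage note is assumed to be the GO file location; adjust it if yours differs.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file unless an identical line exists.
ensure_line() {
    local line="$1" file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage would be along the lines of `ensure_line 'ethtool -s eth0 wol g' /boot/config/go`. The setting still has to be reapplied on every boot, which is why it lives in the GO file rather than being set once.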
  8. I am having the exact same issue. When I start the machine, go into the BIOS and then shut down, WOL works without issue; somehow the shutdown procedure overwrites the 'Wake-on:' setting of eth0 to 'd'. Did you ever find a permanent solution? I am also finding that the Sleep plugin does not allow me to wake the machine up again, no matter which letter is set for eth0. Update: when I set the controller manually to g with ethtool -s eth0 wol g, I can wake the server from sleep and it retains the setting.
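To confirm which setting is active before and after a shutdown, the current flag can be read from the `ethtool` output. A minimal sketch, assuming the output contains a line of the form `Wake-on: g` (the 'Wake-on:' setting quoted above) and that the interface is `eth0`; the function name `wol_flag` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read ethtool output on stdin and print the active
# Wake-on flag (for example "g" for magic packet, "d" for disabled).
# The "Supports Wake-on:" line is skipped because its first field differs.
wol_flag() {
    awk '$1 == "Wake-on:" { print $2; exit }'
}
```

For example, `ethtool eth0 | wol_flag` should print `d` when the flag has been reset and `g` after running `ethtool -s eth0 wol g`.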
  9. oh! thank you for that! It is true that I had no idea that there were any issues. It already corrected over 50 errors. I will post an update when it is complete. Thank you again!
  10. Pool Device is called "Backupmachine" and the new disk is sdg also called "Backupmachine 4" tower-diagnostics-20210716-2010.zip
  11. Thanks JorgeB. I just added another disk to the pool and nothing is happening. It was renamed automatically in sequence, but it is not encrypted (it does not have the lock symbol on the disk icon) and shows no I/O at all. Am I missing something, or do I have to empty the pool and redefine it from scratch to expand it?
  12. I am curious about the expandability and behaviour of pool devices (6.9.2). My current setup is the following: 3x3TB HDDs, with System and Metadata in RAID 1 and Data in RAID 5, as recommended by Spaceinvader in one of his videos about pool devices. He recommends RAID 5 because it is fault tolerant within itself. I use it for TimeMachine backups, as I have had issues with speed and I/O bog-downs on the main array. My question is about what would happen in either of these scenarios: 1) One of the disks fails. Can I just swap it out, and will it rebalance and recalculate appropriately? 2) I want to expand the size of the pool with a matching disk of equal size. Can I just add it to the pool and have it incorporated automatically in an appropriate manner, or does it require moving all data off and then setting up the pool device again?
  13. I just discovered the same issue. Copying 300 GB to my BTRFS encrypted Cache (RAID 5) makes the IO Wait time go through the roof, rendering the server useless for the time of copying. It completely normalises after the transfer is complete. I am running 6.9.1. Happy to run tests and provide information, as I can see this thread has been open for a long time. I sympathise with you @CowboyRedBeard
  14. What an expensive experience. Follow-up question in terms of damage control: is a folder tree saved anywhere so we can see which files/folders were lost in the process?
  15. @itimpi UFS Explorer does not look like it is producing anything valuable for me, probably largely because only some of the physical data was written to the drive before it was formatted. Now that I have accepted that the data is lost, I am trying to use this as a learning experience. I may be misunderstanding this and clutching at straws here, but is there any way to 'roll back' a parity state, or are all writes to parity permanent and irreversible? If the parity was intact and emulating the disks, then perhaps the parity should still exist to get parts of that data back?
  16. Hi all, I had two drives with the manufacturer for replacement. They arrived and I installed them for a parity rebuild; however, we suffered a major power outage. When the power came back on, I started the server and saw that the UI was saying that the two disks were unreadable and needed to be formatted. Even though it warned me that formatting is never part of a parity rebuild, I went ahead with it, only to discover that half the data was lost. I know that this is my mistake, but somehow it made sense in my mind that the power outage may have caused some issue that made the drives unreadable to UNRAID. The disks were XFS encrypted. Is there any way to recover any of the data, or is it gone forever? I am currently trying to see if I can discover any files via UFS Explorer.
  17. How unfortunate. Well either way thank you very much for your help with this. It looks like after a lot of trial and error I have reached a stable server configuration again.
  18. @JorgeB I am amazed! So far the swap has worked fine. No errors have been reported under the same use after more than 2 days, when it previously used to show errors after 1 day. What I did was switch the RAID card's connections from the 8TB Seagate IronWolf drives to two 3TB WD drives, and so far there are no issues. Is there a way to raise this with UNRAID? I imagine mine is not an isolated case.
  19. sure! I will try that tomorrow morning and report back
  20. yes they are; both are 8TB IronWolf
  21. thank you @trurl. Do you have a recommendation on how to proceed? I was thinking to wait for the replacement drives to arrive, rebuild the array and I also have a tiny cooling fan coming for the RAID card as I have read that temperature issues can lead to corruption.
  22. Hi all, I did a lot of poking and testing, and the symptoms keep getting weirder. As a note, all of these issues started and have persisted with Version 6.9.0-rc2.
      1. Currently I have 2 disks sent away for replacement, and they are missing from the array as discussed above.
      2. I have discovered that two of the 4 disks connected to the RAID card (Silverstone ECS04) show errors after about 1-2 days of server uptime. However, when I restart the server the errors are cleared and everything is good again until the cycle restarts.
      2.1 In response, I have increased the cooling for the RAID card; however, when I checked its temperatures, they did not exceed the manufacturer's recommendations.
      3. As part of the whole 'weird errors are happening' situation, I have also discovered that user shares do not show up in the 'SHARES' menu, and when I connect to the server I can only see select trees while everything else is missing, requiring a restart to fix. (Screenshots: before restart, after restart.)
      I have a couple of key questions to understand what is going on:
      1. Does anyone know what kind of errors Unraid is recording and counting in the Main menu when the disk error count goes up?
      2. Why do these errors get reset?
      3. What could lead to me having to restart the server to get it all sorted?
      4. What governs the shares information, and where is it stored? Am I looking at a failed RAM module perhaps?
      Thanks for all your help with this.
      tower-diagnostics-20210226-0643_before restart.zip tower-diagnostics-20210226-0702_after_restart.zip
  23. thanks JorgeB I will give that a try and see what happens. It is still unclear to me how the SMART stats stay ok and the disk has been totally fine including a complete parity check.
  24. After almost a month without issues, the identical disk issue appeared again. Overnight at 2 am DISK 1 showed read errors again. Once again there are no SMART errors reported as far as I can see. Following our previous discussion, the bottom line then is that the disk is failing in spite of no SMART errors? Is there any way to know what kind of errors these are? tower-diagnostics-20210131-0809.zip