rclifton

Everything posted by rclifton

  1. From Unraid 6.9 onward, btrfs volumes are supposed to be mounted with the discard=async mount option, which should mean that trim isn't needed. However, I have found that if I stop running trim, after a few days the performance of my NVMe cache drive drops dramatically, so I have left it installed. It could just be my use case, I don't know. My cache drive sees several hundred GB of writes daily, with the files then moved from the cache onto either my array or sent to my TrueNAS server running on another machine. I still see the details of how much space was recovered every time trim executes, and it doesn't conflict with V-Rising or any other games I host. How is your drive connected to your system? Is it NVMe? Connected to an onboard SATA port? Or is it connected to an HBA or some other way?
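     For anyone who wants to check the same things on their own box, these two commands cover it (a sketch; it assumes the cache pool is mounted at /mnt/cache, so adjust the path for your setup):

     ```bash
     # Show the mount options for the cache pool; look for discard=async in the list
     grep -w '/mnt/cache' /proc/mounts

     # Run a one-off manual trim and report how much space was reclaimed
     fstrim -v /mnt/cache
     ```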
  2. I run trim hourly on my cache drive and have no issues with stuck threads or anything else. The world has been up 12 days since the last reboot and everything is fine. The one difference I do notice between us, though, is that my cache is formatted btrfs, not xfs. Not sure why it would matter, but perhaps that is the issue if it is somehow related to trim.
  3. I've been using this container for years and it's been great! Thanks first of all for all of them, as I use quite a few! Something has broken, though, and I'm not sure where/what it is. I was on the server yesterday messing with some commands for the chia container I had just installed and saw that SAB had just downloaded something, so I know it was working as late as yesterday afternoon. This morning when I logged in I noticed the container was stopped, which I thought was odd. Trying to restart it resulted in an immediate "execution error" popup. I tried looking at the logs, but it looked like the container was dying before it even started. Not sure what else to do, I deleted the container, thinking I would just download it again and everything would be fine, except when I try to pull down the container I get the following message and the pull fails:
     Pulling image: binhex/arch-sabnzbd:latest
     IMAGE ID [960334309]: Pulling from binhex/arch-sabnzbd.
     IMAGE ID [701d67ccb854]: Already exists.
     IMAGE ID [ebf9b61b3eda]: Already exists.
     IMAGE ID [c295ce7a4387]: Already exists.
     IMAGE ID [6295d38d4fc8]: Already exists.
     IMAGE ID [61bd528f9496]: Already exists.
     IMAGE ID [5ec61f70bff6]: Pulling fs layer.
     IMAGE ID [02f939f4c3b7]: Pulling fs layer.
     IMAGE ID [312948fc17be]: Pulling fs layer.
     TOTAL DATA PULLED: 0 B
     Error: open /var/lib/docker/tmp/GetImageBlob914982683: no such file or directory
     **EDIT** Nevermind. Slowly over the course of today most of my containers stopped and would not restart. It appears something was corrupted somehow; I deleted the docker directory and was able to pull down all my containers again. Everything is back to 100% now...
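     For anyone else who ends up in the same spot, the recreate-the-image route is roughly this (a sketch, assuming the stock vdisk location; Settings -> Docker shows the actual path on your system and is also where the service is stopped and restarted):

     ```bash
     # 1. Settings -> Docker -> Enable Docker: No   (stops the service)
     # 2. Delete the Docker vdisk; this is the usual default location, but
     #    double-check the path shown on the Settings -> Docker page first
     rm /mnt/user/system/docker/docker.img

     # 3. Settings -> Docker -> Enable Docker: Yes  (recreates an empty image)
     # 4. Apps -> Previous Apps reinstalls the containers from their saved
     #    templates; appdata and container settings are untouched by all of this
     ```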
  4. Because I will do a parity sync once everything is back in place, but mainly because copying 4TB+ of data without parity runs at about 160MB/s, while with parity it's more like 70-90MB/s. I've got backups of everything, so I really don't see the need to write parity for all that data when I'm going to sync it all after the copy anyway.
  5. Thanks! And now that I've done this and can see what has happened, I'll mention it in case someone in the future finds this post: reformatting the drive, I assume, essentially zeroes out parity's record of that drive, as Unraid will not attempt to rebuild the old data after you format it, so make sure you have a backup of what's on that drive specifically and not just a full backup. I have now disabled my parity drive and am copying over the contents of what was on that specific drive. Once that is finished I will put the parity drive back in place and run a parity sync. Thank you for all the help!
  6. Yes, you did, and I thank you for the help. But at the risk of sounding like an idiot, basically I'm asking what I do after I format it. Do I recopy the data back to the same drive, put it back, and Unraid thinks everything's A-OK now? Do I put the drive I copied everything to into the old drive's spot and Unraid will figure it out? Or something entirely different that I'm not seeing? I guess I'm kind of confused, because telling me to format the drive doesn't help when you also said parity can't help me; if I format the drive, what fixes the problem of getting the data back safely? I hope all of this actually makes sense, thanks...
  7. Hi, on Sunday I had a freakout moment and noticed some data was missing (thread here). I followed the link that was posted in one of the responses, ran the restore command, and copied the data to a spare drive. I then went one step further and copied all the data off the array using Krusader to some USB drives. After doing all that I ran the btrfs check --repair /dev/md4 command and it found and corrected errors. The problem is that after rebooting, the system is still reporting the same errors on drive 4. I'm not really sure what to do at this point. I know I could just blow it up and restore from my backups, but I'm not sure I really want to go that route before exhausting any other avenues. Could I run the btrfs check --repair command again, then power down the system, remove the drive, format a spare, copy the contents from the old drive that I already backed up onto the new drive, and then restart? Or something else? I'm kind of stuck, as I'm not really sure where to go from here... I pasted the output of the btrfs check --repair command below:
     The operation will start in 10 seconds.
     Use Ctrl-C to stop it.
     10 9 8 7 6 5 4 3 2 1
     Starting repair.
     Opening filesystem to check...
     Checking filesystem on /dev/md4
     UUID: f10d28a7-144b-4b86-8489-c8be6efcc9f8
     [1/7] checking root items
     Fixed 0 roots.
     [2/7] checking extents
     parent transid verify failed on 87638016 wanted 25935 found 25928
     parent transid verify failed on 87638016 wanted 25935 found 25928
     Ignoring transid failure
     bad block 87638016
     ERROR: errors found in extent allocation tree or chunk allocation
     [3/7] checking free space cache
     [4/7] checking fs roots
     parent transid verify failed on 87638016 wanted 25935 found 25928
     Ignoring transid failure
     (the above line repeated several hundred times and was removed)
     Ignoring transid failure
     Wrong key of child node/leaf, wanted: (65781, 1, 0), have: (256, 1, 0)
     Wrong generation of child node/leaf, wanted: 25928, have: 25935
     Deleting bad dir index [6934,96,5] root 5
     Deleting bad dir index [63235,96,3] root 5
     Deleting bad dir index [63235,96,4] root 5
     Deleting bad dir index [63235,96,5] root 5
     Deleting bad dir index [1264,96,18] root 5
     Deleting bad dir index [1254,96,5] root 5
     Deleting bad dir index [6934,96,6] root 5
     ERROR: errors found in fs roots
     found 4022054645760 bytes used, error(s) found
     total csum bytes: 0
     total tree bytes: 30851072
     total fs tree bytes: 2523136
     total extent tree bytes: 26148864
     btree space waste bytes: 7056862
     file data blocks allocated: 545990963200 referenced 544234590208
     Thanks for any help!
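     In case it helps anyone who lands here later, the general shape of the read-only recovery route looks something like this (just a sketch: /dev/md4 is disk 4 on my system, the array is started in maintenance mode so the filesystem isn't mounted, and the destination path is whatever spare disk you have mounted):

     ```bash
     # Copy everything btrfs can still read off the damaged filesystem; restore
     # is read-only with respect to the source device
     btrfs restore -v /dev/md4 /mnt/disks/spare/

     # Read-only check afterwards to see whether the errors are still reported
     btrfs check --readonly /dev/md4
     ```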
  8. I have not yet. Most of them, if I'm reading right, simply copy the data to another drive. I plan to do that later this afternoon and then run the check --repair command, and if it fails I'll nuke it all and copy it all back over. Either way it looks like I'll be copying all the data; I just don't have enough USB drives for a complete backup, so I plan to pick some up later this afternoon.
  9. I'm not sure at this point if it would be easier to copy as much of the array as possible onto some USB drives and then just nuke it and start over, or nuke it and reload my backup, which is on a system I literally just moved to my sister's house a few weeks ago and will probably take a week at least to download... Sometimes this is all a little too much like actual work lol...
  10. I've got a spare drive; can I just pull this drive, replace it with the spare, and then rebuild from parity? I recently moved my server into a new case and I'm wondering if something happened during that move (I very briefly attempted to use a different controller card and think that might have caused this). If I can just remove and replace the drive, I'll reformat the old one on another system, add it back to the server, and run preclear on it to see if there is actually a real issue with the drive or if I caused it.
  11. Attached is a copy of the diagnostics. I ran a btrfs scrub on drive 4 with the "repair corrupted blocks" option checked. It found 1 error and said it was uncorrectable. I'm still seeing:
      Nov 9 00:00:42 Tower kernel: BTRFS error (device md4): parent transid verify failed on 87638016 wanted 25935 found 25928
      Nov 9 00:00:42 Tower kernel: BTRFS error (device md4): parent transid verify failed on 87638016 wanted 25935 found 25928
      spamming the log, and at this point I'm hoping someone else has dealt with this in the past and has some suggestions.. Thanks
      tower-diagnostics-20201108-2357.zip
  12. Today being Sunday, I was doing weekly housekeeping on my server and noticed that 2 directories on my server were suddenly empty!! The directories /downloads and TV/TV are empty: every file, sub-folder, etc. GONE! I have no drives in a degraded state and no errors showing on any of the drives. I'm currently running a parity check to be sure, but highly doubt it will find anything, as one just ran on Nov 1st and everything was fine. I have only 3 user accounts on this server: the root account, a backup account and a generic account. I control the passwords to all 3, no one else knows them, and they are 16-char, randomly generated passwords, so again, not something someone could get easily. No other data is missing or affected, just those 2 directories. The only apps on the server that access them are SABnzbd, Radarr, Sonarr and DelugeVPN (I use the binhex containers for all of the above), but they also have access to a number of other directories that are fine. I access /TV/TV from my PC as well as 3 MiTV boxes in my house; they all use the generic account with a password to access it, and I rarely access the /downloads directory, and only from my PC. I'm not sure what to make of it at this point. It's VERY odd that it's everything inside these 2 directories, yet the directories themselves are still there and just fine, and the free/used space remains exactly the same as if the files were still there. I was watching TV last night until after midnight, so I KNOW at least some of it was still there at that point. Is there a log or some other journal that tracks file deletions that I am unaware of that might show what happened? Thanks, and sorry for the long rambling post; I'm really scratching my head at this point as to what the heck happened, and frankly I'm slightly nervous about starting anything back up again until I figure out exactly what happened!
      **EDIT** I can see that at least some, but hopefully all, of the files are actually still there if I use Krusader and look at the individual disks. I'm just not sure why the directory within the share shows empty, or how to actually fix it so that the files that are on the individual disk start showing within said directory on the actual share.
      **EDIT 2** I see the log is filling up with:
      Nov 8 17:13:48 Tower kernel: BTRFS error (device md4): parent transid verify failed on 87638016 wanted 25935 found 25928
      Nov 8 17:13:48 Tower kernel: BTRFS error (device md4): parent transid verify failed on 87638016 wanted 25935 found 25928
      Nov 8 17:13:48 Tower kernel: BTRFS error (device md4): parent transid verify failed on 87638016 wanted 25935 found 25928
      just being spammed. I assume this means something happened to device md4. After I figure out which device md4 is, would a btrfs scrub command fix the issue, if anyone knows? I'm a bit hesitant to just start trying stuff. While I do have backups of everything, it would be a REAL pain to restore 12TB+ of data...
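      For anyone comparing what the user share shows against what is physically on each disk, a couple of quick listings do it (a sketch; disk 4 and the TV/TV share are just the names from my setup, and md4 in the syslog corresponds to disk 4 in the webGUI):

      ```bash
      # The merged user-share view (what Samba and the containers see)
      ls -la /mnt/user/TV/TV | head

      # The same directory straight off the individual data disk
      ls -la /mnt/disk4/TV/TV | head
      ```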
  13. Is anyone else suddenly having an issue with the container not letting their drives spin down anymore? Sometime between now and the last update I noticed that my drives were no longer spinning down. After spending the better part of this weekend trying to track it down, I've discovered that it is this container. If I spin my drives down manually, after about 35 to 50 seconds the container will make a write to the array, in a specific order and size every time. A few seconds later the remaining drives will spin up and it will make another write. If I spin them down again, the exact same thing happens. This is a fairly new issue, as I have always been able to spin the array down without issue in the past. Did something change? Anyone else seeing the same behavior?
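      In case anyone else wants to catch the write in the act, one option is to watch a spun-down disk for filesystem events (a sketch; inotifywait is not part of stock Unraid, it comes with the inotify-tools package, and /mnt/disk1 is just an example path):

      ```bash
      # Print every create/modify/delete under the disk as it happens,
      # with a timestamp and the full path that was touched
      inotifywait -m -r -e create,modify,delete \
          --timefmt '%H:%M:%S' --format '%T %w%f %e' /mnt/disk1
      ```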
  14. The RTL8117 on that board is for management only, using Asus's Control Center software. You should be using the Intel NIC as the default network connection (eth0) on your setup.
  15. Nevermind, I figured out what the issue was. User error =(
  16. Fixed!! As soon as I ran that command it started running as it should.. Thanks a bunch!!
  17. Here is the result of the docker exec command:
      -rw------- 1 root root 5654 Feb 6 13:15 /ddclient.conf
      The run command when I update the container is:
      root@localhost:# /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d --name='ddclient' --net='bridge' -e TZ="America/Los_Angeles" -e HOST_OS="Unraid" -e 'PUID'='99' -e 'PGID'='100' -v '/mnt/cache/appdata/ddclient/':'/config':'rw' 'linuxserver/ddclient'
      38e494fbb43ced3a2a5262aba904fc95d8592d23dc2ec4ac14c0b47c68f5b5a3
      I've tried completely removing and reinstalling a couple of times now and end up with the same results every time.
  18. Loving the container, I've had it up and running for a while now, but today I noticed that something has broken. I'm not sure exactly when it happened, but I do know that I updated a domain this past weekend and ddclient updated its IP, so it was working as late as Sunday. However my log is now filled with:
      readline() on closed filehandle FD at /usr/bin/ddclient line 1130.
      stat() on closed filehandle FD at /usr/bin/ddclient line 1117.
      Use of uninitialized value $mode in bitwise and (&) at /usr/bin/ddclient line 1118.
      readline() on closed filehandle FD at /usr/bin/ddclient line 1130.
      WARNING: file /ddclient.conf: Cannot open file '/ddclient.conf'. (Permission denied)
      I'm not sure how the permissions on the config file were changed, and I'm really unsure of how to fix it, as the Linux command line is not something I have a lot of experience with.. Thanks!
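      The exact command that fixed this isn't quoted in the thread, but given the root-owned, 600-permission listing a couple of posts up, the usual approach with linuxserver.io containers is to hand the config back to the PUID/PGID from the docker run line (99:100 here); a hedged sketch using the appdata path from that run command:

      ```bash
      # Give the config to the user/group the container runs as (PUID=99, PGID=100)
      chown 99:100 /mnt/cache/appdata/ddclient/ddclient.conf
      chmod 600 /mnt/cache/appdata/ddclient/ddclient.conf

      # Restart the container so ddclient re-reads the file
      docker restart ddclient
      ```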
  19. Thanks for the heads up!! Never even thought to look at the github page for some reason lol..
  20. Hi, not sure if this is the right place: I installed the "nut influxdb exporter" container and it points to this post as the support thread. Anyway, I'm trying to use it to bring in my UPS stats and have run into a problem. My UPS is a CyberPower OR2200PFCRT2Ua. When I install the container, if I delete the entry for WATTS the container works, but the data is incorrect: it shows a usage of only about 44W, when the UPS front panel indicates the load is actually 172W. In NUT I have to configure the setup as:
      UPS Power and Load Display Settings: Manual
      UPS Output Volt Amp Capacity (VA): 2200
      UPS Output Watt Capacity (Watts): 1320
      If I do this, then in Unraid all the UPS information is displayed correctly on the dashboard. However, if I enter 1320 into the WATTS entry of the container, it instantly stops after starting and displays the following error message:
      [DEBUG] Connecting to host
      Connected successfully to NUT
      [DEBUG] list_vars called...
      Traceback (most recent call last):
        File "/src/nut-influxdb-exporter.py", line 107, in <module>
          json_body = construct_object(ups_data, remove_keys, tag_keys)
        File "/src/nut-influxdb-exporter.py", line 85, in construct_object
          fields['watts'] = watts * 0.01 * fields['ups.load']
      TypeError: can't multiply sequence by non-int of type 'float'
      So to get accurate data I need to enter the WATTS info, but then the container doesn't like it. If I omit the WATTS info the container runs but reports the wrong info. Any help is appreciated, and sorry if this is perhaps the wrong thread...
      *EDIT* As an aside, I did some digging: my UPS is reporting ups.load as 14. If I do the math in the last line, watts (1320) * 0.01 * ups.load (14), I get 184.8W. The front panel is reporting 185W currently. So the math is right; it just appears that maybe one of the entries isn't being seen as an actual number for some reason.
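      For anyone wanting to sanity-check the same numbers from the command line, the NUT client can pull the raw values the exporter multiplies (a sketch; "ups" is a placeholder for whatever the UPS is actually named in NUT, and 1320 is my configured watt capacity):

      ```bash
      # Raw load value (a percentage) as NUT reports it
      upsc ups@localhost ups.load

      # Reproduce the exporter's math by hand: watts * 0.01 * ups.load
      awk -v load="$(upsc ups@localhost ups.load)" 'BEGIN { print 1320 * 0.01 * load }'
      ```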
  21. I think what he was saying is that now that he is back on 6.6.7, he checked and the queue depth is 1 on 6.6.7 as well. That would mean the speculation that NCQ in 6.7 might be part of the problem with that release is incorrect, since for him queue depth was 1 on both 6.6.7 and 6.7, yet he has no issues with 6.6.7. Or at least that's how I read what he said, anyway.
  22. After pulling my hair out for the last week looking for what I originally assumed was probably a network issue, I found this thread, which describes the issue I'm having exactly. My system is a dual Xeon 2650 setup with 96GB of RAM, dual LSI SAS2008 cards, and two cache drives connected to the onboard SATA controller (Intel C600/X79 chipset) in RAID1. Mover is currently configured to run hourly, as my cache drives are relatively small at 120GB for the number of users in my household (8). I was already planning to jump to a 1TB NVMe drive, but guess I may need to seriously consider downgrading, as my wife's identical twin lives with us, which means WAF x2 is a major issue! 😱 Is there anything major to look out for when downgrading?
  23. Thanks, any idea why I am just now seeing this message? Since I've never seen it before I would assume that means it actually was working up until yesterday when I first started seeing these messages.
  24. I'm suddenly having a very strange problem with my cache drive and am not really sure what the cause could be. This morning while checking logs/updating containers etc., I noticed the following entries:
      Jan 3 18:00:36 Tower kernel: print_req_error: critical target error, dev sdd, sector 230729786
      Jan 3 18:00:36 Tower kernel: BTRFS warning (device sdd1): failed to trim 1 device(s), last error -121
      Jan 3 18:00:36 Tower root: /etc/libvirt: 920.8 MiB (965472256 bytes) trimmed
      Jan 3 18:00:36 Tower root: /var/lib/docker: 8.6 GiB (9238429696 bytes) trimmed
      The first line doesn't concern me too much; I've seen it ever since I first set Unraid up and assume it's just a bad sector on the SSD (it's over 4 years old at least). The second line, however, is new and concerns me. I did a Google search and found a post by someone from December, and the consensus seemed to be to check your cables. All my drives are in hot-swap bays on my Supermicro server, so I powered the system down, swapped the drive into a new bay and powered it back on. The problem remained. What's interesting is that even though it says trim failed, the next two lines appear to show that it actually did complete the trim. I'm really not sure what the problem could be and was hoping someone more experienced with the inner workings of Unraid could point me in the right direction. Thanks, diagnostic info is posted below...
      tower-diagnostics-20190103-1850.zip
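      For anyone hitting the same failed-to-trim warning, a few read-only checks show whether the drive, and whatever controller path it sits behind, actually exposes TRIM (a sketch; sdd is just the device name from my log):

      ```bash
      # Zeroes in DISC-GRAN/DISC-MAX mean the kernel sees no discard support on this path
      lsblk --discard /dev/sdd

      # Ask the drive itself whether it advertises TRIM (SATA devices)
      hdparm -I /dev/sdd | grep -i trim

      # SMART health, including reallocated/pending sector counts
      smartctl -a /dev/sdd
      ```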
  25. Yes, actually, I did read them. Neither of them really explains the massively random difference in transfer speeds that I see. I mean, 22MB/s vs 108MB/s is a pretty wide margin, is it not? Wouldn't you be wondering if there was something seriously wrong if you frequently saw a swing like that moving files around? By the way, that first transfer was my kids' movie folder, so all fairly large files. That second transfer, as you can see, is music: obviously much smaller individual files, and yet it's FASTER!