Garbonzo

Everything posted by Garbonzo

  1. I have been sticking with macvlan for some time now, but I want to switch to ipvlan to see if it stops the roughly once-a-month crash I am having... I don't currently have the skill to figure out WHY the crash is happening, but I do keep getting the message telling me to switch to ipvlan. Back when the macvlan issue started, I read through the "help" and it was pretty confusing and complicated; since I wasn't having problems, I put it on the back burner. Now I am trying to switch and see that there might be issues depending on which Docker/Unraid versions I am on, and so on... plus my mediastack runs on a custom network, and the latest instructions I just read through said to put anything that needs to be proxied on "bridge", so I am wondering where I will land there. If anyone has good, current info on just making the switch, I would appreciate it. -G
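     For reference, a minimal sketch of checking the custom network from the console after flipping Settings > Docker from macvlan to ipvlan; the network name "mediastack" is taken from the post, while the container names are placeholders only:

        # Confirm the user-defined network still exists and see which containers are attached to it.
        docker network ls
        docker network inspect mediastack --format '{{range .Containers}}{{.Name}} {{end}}'

        # If it ever has to be recreated, a plain user-defined bridge is enough for
        # containers that a reverse proxy reaches by name (placeholder containers below).
        docker network create mediastack
        docker network connect mediastack swag
        docker network connect mediastack sonarr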
  2. Disk5 started throwing some errors while I was out of town for Xmas. It has been emulated since then, and I am going to try to move forward with it today, but I may not have time to open the case and move some drives around (the 2x 14tb drives I added just last month run hot where they are mounted without another fan, so I want to relocate them a bit). I am not sure whether I should move the contents of Disk5 to the mostly empty Disk6 and just replace Disk5 later, but at least have the array back operating in the green... or remove and re-add Disk5 and run some tests/diags to make sure I am OK. This seems to happen whenever I have a hard drive problem, errors of any kind really: I just don't feel like I know where to START or the best way to BEGIN reading/trying to understand what is going on... I understand there are many variables with hard drives, but is there any resource that will give me some direction, like: look "here" to determine the type of error, then, based on that, the probable reasons WHY it occurred? Also, I have many of these shucked 8tb Seagate drives from Costco (SMR) that are problematic for multiple reasons. Several have been removed over time for various "reasons", but they are working without issue in other devices (both in TrueNAS Scale and a 5-bay USB enclosure using drive pooling). Other than doing a pre-clear, how can I check whether one of these drives would be BETTER to use as a temporary replacement for Disk5 until I can get ANOTHER new drive? The NEW Seagate 14tb drives that Costco had turned out to be the newer dual-actuator Exos 2X14 drives. A step up for sure! (but they do run even warmer than the SMR drives). So whenever they get them back in stock, I am planning to replace ALL of the 8tb's over this coming year... as I CAN (hopefully NOT as I HAVE TO). Anyway, every time I have an issue, I post the diags and get my problem solved, but I never feel like I have learned how to figure anything out for myself moving forward. Obviously I appreciate the help, but I would like to be moving toward self-sufficiency for these kinds of situations. So, two related questions: the best way to handle the disabled (red X) drive TODAY, and the best way to understand why/what is the cause behind the failure. TIA, -G ezra-diagnostics-20231226-0540.zip
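     As a starting point for "which drive is actually in better shape", a minimal sketch of the SMART checks that usually matter; /dev/sdX is a placeholder, and the attribute list is a rule of thumb rather than anything from the thread:

        # Match serial numbers to device names first (serials are shown on Unraid's Main tab).
        lsblk -o NAME,SIZE,MODEL,SERIAL

        # Attributes worth comparing between the failing disk and a candidate spare:
        # Reallocated_Sector_Ct (5), Reported_Uncorrect (187),
        # Current_Pending_Sector (197), Offline_Uncorrectable (198).
        smartctl -A /dev/sdX | grep -E 'Reallocated_Sector|Reported_Uncorrect|Current_Pending|Offline_Uncorrectable'

        # A long self-test reads the whole surface (many hours on an 8-14tb drive).
        smartctl -t long /dev/sdX
        smartctl -a /dev/sdX | grep -A8 'Self-test log'    # check the result afterwards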
  3. Somehow the Windows Server VM that is connected to these terrible drives for the purpose of a second "backup" had its network discovery turned off. I need to better understand the plugin anyway, especially the common script in general; it's over my head atm, tbh, and I really would love a way to delay the mount until the VM is online... that is something I have to do manually every time I restart the server, and would LOVE to automate! But sincere thanks for helping me focus in and find the problem I was having with the mount... it surprised me that the plugin searched and found the server and share and let me set it up again. Though to be fair, when I pinged the EzraWin server it showed the IP address, it just didn't return any pings, so I guess DNS resolved the IP; I am still shocked it took the username/password and showed me the mountable shares... but really, thanks!
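     On the "delay the mount until the VM is online" wish, a minimal sketch of a startup script that waits for the VM to answer pings before mounting. The rc.unassigned call and the share path are assumptions about the Unassigned Devices CLI, so adjust them to whatever actually performs the mount on this system:

        #!/bin/bash
        # Run via the User Scripts plugin, "At Startup of Array".
        HOST="EzraWin"        # VM name from the post
        TIMEOUT=600           # give up after 10 minutes
        elapsed=0
        until ping -c1 -W2 "$HOST" >/dev/null 2>&1; do
            sleep 10
            elapsed=$((elapsed + 10))
            [ "$elapsed" -ge "$TIMEOUT" ] && { echo "$HOST never came up, skipping mount" >&2; exit 1; }
        done
        # Assumed Unassigned Devices command; "//EZRAWIN/Backup" is a placeholder share.
        /usr/local/sbin/rc.unassigned mount "//EZRAWIN/Backup"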
  4. Ok, back home, and here are the new diags. I really do appreciate you taking a look! ezra-diagnostics-20231210-1702.zip
  5. yeah, sorry, that was my bad; I forgot I had turned that on while trying to figure this out, when I first noticed it a few days ago after the update... I will get on that and post back soon. Thanks for the quick reply though!
  6. I had been using UD without issues before updating from 6.12.4 to 6.12.6, but I have not been able to mount anything since. The log shows this happening every few seconds... here are my diags: ezra-diagnostics-20231210-1212.zip I really don't know where to start and haven't been able to google anything particularly helpful on my own, so I'm asking for help from those who know... TIA -G
  7. Yep, I had actually forgotten that, thank you... and the way you describe doing things seems the most practical as well. Last time I was swapping a dead drive; this time I will have the peace of mind of still having the old drive, so thanks for the help on that. I was thinking I may just add the second 14 to the array and leave the other drives, so I have more room to move things around for the time being. There currently is no parity2; I had two in the past but had to pull one for another use, sorry if I was confusing.
  8. Sorry to reuse a solved issue, but this is the conclusion (hopefully) and maybe it will help someone else. I have replaced a parity drive in the past; I followed a video from Spaceinvaderone and it went fine, as far as I recall. Here is where I am today: I finally bought 2 x 14tb drives (to start upgrading past the 8tb drives I am using; shucked drives again, but I have my reasons for that, unfortunately). I was originally thinking I'd replace the parity drive and either add or replace one of the other drives (so I can maybe use the 8's I pull in my TrueNAS experiment). I realize it is still basically "a spin of the wheel" as to which drive will die first, but if there are any insights in the log as to which data drive to pull (besides the parity), it is over my head, so any assistance is welcome. I hate to @ people, but @JorgeB has been helpful in the past, and I didn't want to PM in case this info helps someone else find their way to a solution of sorts... It seems like I should be able to leave the 8tb parity in place, add one of the 14tbs as parity2, then remove the first parity drive (although if parity2 stays named that way, it will mess with my OCD for sure). But I don't recall if there was a reason that wasn't/isn't the way to go. Then either just add the other 14 to the main array, and leave or remove the WORST 8tb offender <- please advise which that "could" be. I will re-watch the videos I used last time (or newer ones if they are out there) prior to attempting this, but since the two drives are going to finish pre-clearing today, I may have time to shuck them and put them in this evening... I just wanted to get some experienced perspectives on the best approach in case I am missing something obvious. Again, I am sorry to throw this out there in such a vague way; I will be doing more research before pulling the trigger, but if I can't get to it tonight it will probably be another week or more, so I was trying to preemptively get some assistance about order of operations, etc. TIA! -G ezra-diagnostics-20231116-0854.zip
  9. Right! I guess I just had to process the fact that having 2 parity drives protects against 2 failures INCLUDING one of the parity drives (which could happen to me). So, as long as it covers that case, that's the upside I need right now. Thanks again.
  10. Ok, these were just the only drives that showed the "agno = x" lines, so I wasn't 100% sure... but if that looks OK, I just wanted thoughts on adding a second parity drive (it would be another drive of similar age/use) to cover me a little more... unless there is some downside in doing that (other than having to dismantle my TrueNAS test box, but that's fine). It should give me extra protection until I can physically replace the bad drive with a larger/better drive. Thanks for everything guys, really! -G
  11. I realize this thread is like 5 yrs old, but considering the drive issues I am dealing with (you are actually helping me with them currently), I thought something like this might be a good idea. I was wondering if there is a specific tool for Unraid that you recommend for creating and reconciling the checksums. TIA
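     A minimal sketch of the underlying create-then-reconcile idea using standard tools (the share path is a placeholder, and this is not a recommendation from the thread):

        # Create: hash every file in a share and keep the list on the flash drive.
        mkdir -p /boot/checksums
        cd /mnt/user/Media            # placeholder share
        find . -type f -print0 | xargs -0 md5sum > /boot/checksums/Media.md5

        # Reconcile later: re-hash and report only files that changed or are missing.
        cd /mnt/user/Media
        md5sum -c --quiet /boot/checksums/Media.md5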
  12. Sorry about the formatting (or lack of):

      Phase 1 - find and verify superblock...
              - reporting progress in intervals of 15 minutes
      Phase 2 - using internal log
              - zero log...
              - 09:18:57: zeroing log - 521728 of 521728 blocks done
              - scan filesystem freespace and inode maps...
              - 09:19:00: scanning filesystem freespace - 32 of 32 allocation groups done
              - found root inode chunk
      Phase 3 - for each AG...
              - scan and clear agi unlinked lists...
              - 09:19:00: scanning agi unlinked lists - 32 of 32 allocation groups done
              - process known inodes and perform inode discovery...
              - agno = 0 through agno = 31 (all 32 allocation groups, reported out of order)
              - 09:19:22: process known inodes and inode discovery - 135040 of 134944 inodes done
              - process newly discovered inodes...
              - 09:19:22: process newly discovered inodes - 32 of 32 allocation groups done
      Phase 4 - check for duplicate blocks...
              - setting up duplicate extent list...
              - 09:19:22: setting up duplicate extent list - 32 of 32 allocation groups done
              - check for inodes claiming duplicate blocks...
              - agno = 0 through agno = 31 (all 32 allocation groups, reported out of order)
              - 09:19:22: check for inodes claiming duplicate blocks - 135040 of 134944 inodes done
      Phase 5 - rebuild AG headers and trees...
              - 09:19:27: rebuild AG headers and trees - 32 of 32 allocation groups done
              - reset superblock...
      Phase 6 - check inode connectivity...
              - resetting contents of realtime bitmap and summary inodes
              - traversing filesystem ...
              - traversal finished ...
              - moving disconnected inodes to lost+found ...
      Phase 7 - verify and correct link counts...
              - 09:19:45: verify and correct link counts - 32 of 32 allocation groups done
      done

      So I assume there are some other flags or something I need to include, but nothing sticks out to me looking at that... other than some things being ambiguous, like "verify" instead of "verifying" and "rebuild" instead of "rebuilding"... it almost reads like it is telling ME to do that (but some lines have timestamps, so it appears it is just what the process is doing). -g
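     For context, a minimal sketch of the usual check/repair sequence on an Unraid array disk in maintenance mode; the md device name varies by Unraid release (/dev/md4 on older versions, /dev/md4p1 on newer ones), "disk4" is just the example from this thread, and -L is only for the refuses-to-run case:

        xfs_repair -n /dev/md4     # -n = check only, report what WOULD be fixed
        xfs_repair /dev/md4        # same command without -n actually makes the repairs
        # xfs_repair -L /dev/md4   # only if repair refuses to run because the log
                                   # cannot be replayed (recent metadata may be lost)
        # Running against the md device (not /dev/sdX) keeps parity in sync.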
  13. I had just run it without the -n; this was the first run after that. Next time I can take it down I will re-run it and see what it looks like. Thanks, I will try to get to it tomorrow. -g
  14. As a side question, was there anything (besides studying the logs on a regular basis) that would have clued me in to the disk4 situation? You have always been helpful in pointing out this kind of thing whenever I post a diagnostics, but I feel like maybe I should have seen a warning (similar to the hdd temp alerts or something), though I could be wrong. Anyway, thanks for pointing it out; I think it's fixed, I just wanted to know how to catch these things if/as they occur. I am not sure if that is just diligence (scanning the log for a keyword or two on a regular basis) or some other plugin or setting I am missing... that type of question. cheers, -g

      EDIT: Ok, so not OK, disk4 still shows (after removing the -n):

      Phase 1 - find and verify superblock...
              - reporting progress in intervals of 15 minutes
      Phase 2 - using internal log
              - zero log...
              - 10:29:34: zeroing log - 521728 of 521728 blocks done
              - scan filesystem freespace and inode maps...
              - 10:29:36: scanning filesystem freespace - 32 of 32 allocation groups done
              - found root inode chunk
      Phase 3 - for each AG...
              - scan (but don't clear) agi unlinked lists...
              - 10:29:36: scanning agi unlinked lists - 32 of 32 allocation groups done
              - process known inodes and perform inode discovery...
              - agno = 0 through agno = 31 (all 32 allocation groups, reported out of order)
              - 10:29:56: process known inodes and inode discovery - 134912 of 134816 inodes done
              - process newly discovered inodes...
              - 10:29:56: process newly discovered inodes - 32 of 32 allocation groups done
      Phase 4 - check for duplicate blocks...
              - setting up duplicate extent list...
              - 10:29:56: setting up duplicate extent list - 32 of 32 allocation groups done
              - check for inodes claiming duplicate blocks...
              - agno = 0 through agno = 31 (all 32 allocation groups, reported out of order)
              - 10:29:56: check for inodes claiming duplicate blocks - 134912 of 134816 inodes done
      No modify flag set, skipping phase 5
      Phase 6 - check inode connectivity...
              - traversing filesystem ...
              - traversal finished ...
              - moving disconnected inodes to lost+found ...
      Phase 7 - verify link counts...
              - 10:30:13: verify and correct link counts - 32 of 32 allocation groups done
      No modify flag set, skipping filesystem flush and exiting.

      I really want to understand... I need to learn more about file systems (other than NTFS) as well, and find some good videos on this type of stuff, because reading and understanding it takes me a lot longer to grok the situation. Based on other threads, I'm unsure whether there are additional flags I should use with xfs_repair OR whether I need to rebuild the disk (that may only apply if the drive is disabled; this one is mounting fine atm). So ALSO - ORDER OF OPERATIONS (considering that replacing the parity is playing into this game). Again, a sincere thanks for the help. (As a side note, disk5 has some agno's listed as well, but disks 1-3 seem fine.) -g
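     On the "how would I have caught this sooner" question, a minimal sketch of a scheduled script that greps the live syslog for filesystem/disk complaints and raises an Unraid notification; the notify script path is assumed from current releases:

        #!/bin/bash
        # Run hourly via the User Scripts plugin.
        PATTERN='XFS .*(corrupt|error)|I/O error|blk_update_request'
        if matches=$(grep -iE "$PATTERN" /var/log/syslog); then
            /usr/local/emhttp/webGui/scripts/notify \
                -i warning \
                -s "Possible disk/filesystem errors in syslog" \
                -d "$(echo "$matches" | tail -n 5)"
        fi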
  15. I was talking about the 55 listed in the error column on the Array Devices tab, but looking back at my email alerts from the previous month with errors, there were 3000+, so it was a different "event". I REALLY thought I had replaced parity when I pulled the parity2 drive that was in there... It may be a week or two until I research (and get a little more coin together for) something like 16tb+ enterprise drives, at minimum a single drive to start with for parity. SO, in the meantime, I think I will pull 2 of these 8tb drives from the TrueNAS box and get them running pre-clears in unraid... then rebuild them as parity/parity2 (not sure if I can do that in one go)... unless replacing disk4 is a better option (although these are going to be in that 3+ yr range as well), as it looks like it was disk5 I replaced earlier this year. I'd love to be on something like 4 x 20tb drives by the end of the year if possible, so I can feel SOLID about this media server and use the other drives to learn new stuff with... I just have so many issues to resolve on this unraid server... this macvlan-to-ipvlan thing has been confusing for me: by the time I had read and re-read how to set up ipvlan with multiple NICs, that seemed to be fixed in 6.12.4 (yet Fix Common Problems still gripes). But mainly I need to track down why accessing this server has seemed SO LAGGY for the last few months, from both wired and wireless connections... it's possible that it's network related, but learning Wireshark at the same time has me moving VERY slowly in figuring out what is going on... Here is all I REALLY know: when I first upgraded/migrated from dual Xeon X5680's with 72gb RAM to an AMD Ryzen 7 1700 with 32gb, it seemed to perform with a similar "feel" to the old server for a few months... then got worse... I always planned to upgrade to a 5000-series CPU (I never even intended to use the 1700, it just came with the MB I bought), but I don't want to throw money at a problem atm if that isn't the main issue... I will start the correcting parity check now and move on from there... oh, and the file system check on disk4 and the extended SMART test @JorgeB recommended as well, and I guess posting the extended SMART results would be req'd. Again, thanks for your help, I will report back and/or mark solutions afterwards. -G
  16. I am on the 6.12.4 release. First, I am using shucked Seagate 8tb drives from Costco (a bad habit I started years ago). I replaced parity and Disk4 earlier this year. I had a bad cable off my HBA that led to some CRC errors on a few drives, but that resolved when the cable was replaced... (I believe, since they haven't increased since). Now I am getting a 187 Reported_Uncorrect error, which iirc can be a bad sign. My plan is to start moving to larger/enterprise drives (or at least NOT more SMR drives), starting with parity... but I haven't had the time + money to sit down and research deals (so if anyone has GOOD info on that front, TIA!). So, for now I am trying to determine the best path forward... I have another 8tb or three sitting in an old Dell T7500 I am messing with TrueNAS Scale on, but they are as old or older than the others and have many of the CRC errors from the bad cables mentioned (though those haven't increased since moving them), and they can certainly be pulled to replace the parity drive here (or add a second)... whatever is the best course of action... that is what I am really asking, I guess... what to do, what to do... As a second (related) question: I had a bad shutdown last night, and after like 20 mins I hard reset and it started a parity check... that is all normal behavior. However, the last 2 times it has done a parity check it has found errors (today, like 2476), and it doesn't tell me anywhere I can see whether the errors were fixed. Do I need to do anything (like start a parity check manually with the "write corrections" box checked)? And that error count of 55 confused me (not sure if those are new errors since the 2476 or in addition to them)... I just need some clarification on that part (it might be helpful if it were more explicit about whether they are being repaired the first time OR whether you need to do something after it is completed). I guess I am wondering if I still have all of the errors because I didn't correct them last time this happened (the first time I got errors on a parity check), or if these are NEW errors since that time. And here are the diags: ezra-diagnostics-20230923-1552.zip I really appreciate the help, and thanks in advance. -G
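     As a quick way to separate the "bad cable" symptoms from the "bad drive" symptoms mentioned above, a minimal sketch (the device name is a placeholder): UDMA_CRC_Error_Count (199) points at cabling and never resets, while Reported_Uncorrect (187) and pending/reallocated sectors are about the drive itself:

        # Cable-side counter vs drive-side counters, side by side.
        smartctl -A /dev/sdX | grep -E 'UDMA_CRC_Error_Count|Reported_Uncorrect|Current_Pending_Sector|Reallocated_Sector_Ct'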
  17. Thank you so much, you are always so helpful... I really am trying to put in the time to learn how to do basic things like this... appreciate the help. -G
  18. I have a similar issue, and am trying to learn HOW to look at the diagnostics and figure this type of thing out without a GUI. I hate having to ask for help every time I run up against an issue (which, let's face it, with shucked SMR drives comes up more often than I would like)... But as with many things in a community like this, help is case by case, post diagnostics, etc. DirsyncPro is giving me an i/o error / unable to analyze /mnt/sourcefiles... and saying "structure needs cleaning"... I am attaching a log/diag here even though this is someone else's solved topic from 3 yrs ago... I realize that is probably terribly improper, but I don't yet understand if the point is just to fix my issue or to make it easier for the next person with the problem by keeping it in one place... so guidance here is appreciated as well. As always, thanks in advance -G ezra-diagnostics-20230809-1129.zip
  19. Thank you. And I am pretty sure I knew that about pre-clearing ssd drives, but this was one that wasn't mountable last week... and you helped me with the problem that turned out to be memory timing issues, so I was pretty sure the drive was fine, but wanted to use it as another pool for something else... just thought I would check with those that know more... thanks again for the assist. -g
  20. Hi. I haven't paid attention to the log when pre-clearing an SSD before... the pre/post read speeds are about what I would expect (consumer SATA SK Hynix Gold S31 1tb drive), but the actual zeroing??? Wtf? Basically 30 min > 80 min > 40 min... and it was complaining about temps (who'd complain about 69?):
      24-03-2023 02:40  Unraid device dev3 temperature  Alert [EZRA] - device dev3 overheated (69 C)  SHGS31-1000GS-2_FS0BN96451080C415 (dev3)  alert
      24-03-2023 01:08  Unraid device dev3 temperature  Alert [EZRA] - device dev3 overheated (68 C)  SHGS31-1000GS-2_FS0BN96451080C415 (dev3)  alert
      But really, I suppose my question is: can that all be temperature? It went from 32 to 68 in about 25 mins of pre-read... I am trying to determine if the drive needs to be RMA'd or anything... it's 15 months into a 5 yr warranty period... Anyway, insights appreciated! TIA -g p.s. That 8MB/s: I am wondering if that is an average for the last 25% or just an instant read?
  21. Oh, I had done that as well (with the default -n) when I started in maintenance mode... it had a lot of stuff it could apparently fix... I was just hesitant to do it. So it seems like what is happening is that the array is working without disk4; things that are on disk4 are just NOT AVAILABLE, yet parity isn't throwing a fit or emulating it, so I don't know how that is acting as it should... but I need to figure it out... I didn't want to commit any changes to the file system without making sure I was doing the right thing. Anyway, while typing all of that, I went back and ran it without the -n and it fixed things, and the drive mounted fine... now I will figure out the bad parity drive situation. Thanks again. -g
  22. Everyone was helpful with my cache drive issue. Now for a problem I somehow just created. I rebuilt my disk5 a few days ago; the new drive is good. I then made the foolish mistake of pulling the power plug of (what I thought at the time was) the OLD disk5... turns out the nice easy-to-read labels I affixed to my drives somehow(?) got borked and it was actually parity2... So that happened. I then shut down because I wasn't understanding how a drive that was physically unplugged was showing in Unassigned Devices. I know that sounds confusing enough; at that point disk1-disk5 (5 having been replaced, and the old disk5 not erased but visible in unassigned drives) were all present, and parity2 had a red X and something like 2065 read errors. At that point I hadn't realized it was mislabeled, but I needed to reboot anyway... and all of a sudden disk4 is UNMOUNTABLE. To be fair, disk4 was on the same power cable from the PSU as the drive I stupidly removed (the OLD disk5), which, frankly, was the reason in my head why parity2 was throwing errors at that moment... turns out it was much simpler: I had unplugged it. So now I am in this boat: the old disk5 is for sure bad, but I think disk4 is something simple, and parity2 can probably be fixed/rebuilt. Really, both of those drives have CRC errors from a bad cable situation I believe to be resolved... but they are older as well... Either way, I need disk4 back online and working correctly asap; I can then sort parity2 out (or fly with one parity until I get more reliable hardware). So I am attaching diagnostics at this point, but I need to get disk4 (which isn't showing as emulated with an X) working (and ideas about a parity2 fix are appreciated). TIA (again) -G ezra-diagnostics-20230316-0435.zip
  23. ok, I can't see a way for me to move it (though mods probably can); should I just delete and repost when I have time tonight? I should have looked more carefully at where I was posting, I suppose. Thanks. -g edit: actually, I hadn't looked, but it is now in VM Engine, which isn't where I posted, so I assume it was moved by someone who knows where it should go... so thanks!
  24. Hi, this may have been asked/answered, but I didn't have luck searching... I think the title says it all, but basically I have a VM with a manual primary vdisk named... it changes names when I change the name of the VM (this doesn't seem to happen if it is in a location other than where it "would be" under auto, but if it is in the "auto" location, even when set to manual, it seems to change). It's not a big deal; I was just trying to rename a VM with "-old" so I could create a new one named the same as the original, and realized it switched the "manual" designation to "auto" and renamed the folder in the domains directory. I repeated it and got the same result... so I then manually used a different folder (that didn't match the VM name) and this didn't happen. Resolving it was simple enough; I'm just curious if this is a choice that was made for "reasons" or just an oversight... it doesn't seem like it should change if it is set to "manual". TIA -g
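     A minimal sketch for confirming which vdisk path a VM actually ended up with after a rename, independent of what the Unraid form shows; the VM name is a placeholder:

        virsh list --all
        virsh dumpxml "Win10-old" | grep -A2 "<disk type='file'"
        # The <source file='.../vdisk1.img'/> line shows where the GUI pointed the
        # vdisk (under "auto" that is typically .../domains/<vm name>/vdisk1.img).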
  25. yeah, they generally would stay on (they do now, so...) just as they would if I moved them inside. I just didn't want to blow away my NTFS drivepool on a whim... figured I would talk about it in public a bit first... ALSO, on the hardware I just moved my unraid from (a Dell T7500, 2x Xeon X5680 w/72gb ECC DDR3) I was considering playing with TrueNAS Scale... but honestly I have a raspi3 or an old notebook (built-in UPS) I could throw OMV on as well... but doing it in the main box just seems like a "fun" way to go... I am still gonna mess with TrueNAS and other stuff bare metal on the Dell, so ideally I will just be using some smaller drives in there to tinker with. I have a 2nd 5-bay with smaller drives... the "backup" one totals 30tb, the other one only 11-12tb... Thanks again @JorgeB, you just helped me on my cache drive issue yesterday (working fine after moving to the new cache drive; the other drive is likely fine, just corrupted from my memory being OC'd initially, so I maybe should pre-clear it before using it elsewhere). -g