KRiSX

Everything posted by KRiSX

  1. Updated my backup server this morning from RC8, all good! Bit more nervous about my main server, but I'll give it a go later in the week. It's on 6.9.2 (I keep it on stable) and running an Intel 11th gen w/ Plex, so I want to be sure there are no issues there before I go for it. Thanks for the release!
  2. Drives are recognised by their serial numbers, so as long as the drives are reported the same way, it should be fine (quick way to check below).
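A quick way to double-check how each drive is being reported, assuming you have shell access to the server (these are standard lsblk/udev views, nothing unRAID-specific):
     # list devices with the model and serial number they report
     lsblk -o NAME,MODEL,SERIAL,SIZE
     # the same identity shows up as stable /dev/disk/by-id/ links
     ls -l /dev/disk/by-id/ | grep -v part
If the serials come through unchanged, the assignments should carry over.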
  3. I have since replaced my Adaptec controller with an LSI 9207-8i, and so far so good. I had to move all my data and format all my disks again, and I've received absolutely no errors so far, including on the 10TB that gave me issues. I really feel this was a case of my Adaptec not being able to properly handle 10TB disks, but they are on the support list, so who knows... either way I'm in a so-far-so-good situation and will continue to monitor. The only weird thing I'm seeing now is the Fix Common Problems plugin saying write cache is disabled on all my disks connected to this controller, but speeds seem fine to me (150-250MB/s). I had 6 drives running flat out doing moves this past week and was hitting 550-650MB/s across them, so I think it's a false report or it simply doesn't matter.
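In case anyone wants to verify the write-cache state themselves rather than relying on the plugin report, this is the sort of thing to run - a rough sketch only, /dev/sdX is a placeholder, and drives behind a SAS HBA may need sdparm rather than hdparm:
     # show the current write-cache setting (SATA drives)
     hdparm -W /dev/sdX
     # SAS/SCSI path: query or set the WCE (write cache enable) bit
     sdparm --get WCE /dev/sdX
     sdparm --set WCE /dev/sdX
If it genuinely is off, hdparm -W1 /dev/sdX turns it back on for SATA, though given the speeds above it may well just be a reporting quirk.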
  4. So I went ahead and stopped the array and restarted it to see if that would change anything - it didn't. After finding some other threads referencing some of the errors I'm seeing, I tried sgdisk -v, which led me to running sgdisk -e. Neither of which seemed to do a whole lot. I fired up Storman (the Adaptec docker) to check the status from there and noticed the drive was "Ready" instead of JBOD like it should have been. Instead of adjusting this, as I was pretty confident I would destroy the data doing so, I connected the drive directly to my motherboard instead. Tested whether the drive was mountable, and it is. Tried including it in my array, but it was unmountable and wanted to format, so I removed it again and now I'm moving the data off it via Unassigned Devices. Have managed to move off a couple hundred gigs without issue at this point, so I really still don't know for sure what the issue is, but at least data loss appears to be minimal (if any). After I get the data off I'll remove the drive and test it elsewhere. The rough sequence of commands is below for anyone hitting the same thing.
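A sketch of that sequence - the drive letter is a placeholder and will differ per system, and the mount path is just an example (Unassigned Devices does the equivalent from the GUI):
     # verify the partition table and report any problems it finds
     sgdisk -v /dev/sdX
     # relocate the backup GPT structures to the actual end of the disk
     sgdisk -e /dev/sdX
     # once the drive is on the motherboard, mount it read-only and copy the data off
     mkdir -p /mnt/rescue
     mount -o ro /dev/sdX1 /mnt/rescue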
  5. Hi all, I woke up this morning to 46 read errors on a brand new 10TB IronWolf drive that successfully passed a full preclear. I'm not running parity, but I used preclear on the drive to give it a good test/workout before putting data on it. Due to some shuffling of data and drives I'm doing using unBALANCE, this particular drive is currently 98% full and I now seem to be getting errors with it. My first suspicion is a controller issue, as it wouldn't be the first time I've had a bad time with my Adaptec and unRAID, and I do have an LSI controller on the way, but right now I want to work out whether I've got a faulty drive or whether it's my controller as I suspect. Diagnostics attached. Highlights are...
     kernel: blk_update_request: I/O error, dev sdt, sector 6497176416 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
     kernel: md: disk16 read error, sector=6497176352
     kernel: XFS (md16): metadata I/O error in "xfs_da_read_buf+0x9e/0xfe [xfs]" at daddr 0x183430b20 len 8 error 5
     kernel: sd 1:1:27:0: [sdt] tag#520 access beyond end of device
     kernel: blk_update_request: I/O error, dev sdt, sector 10754744120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
     kernel: md: disk16 read error, sector=10754744056
     kernel: XFS (md16): metadata I/O error in "xfs_da_read_buf+0x9e/0xfe [xfs]" at daddr 0x281085ef8 len 8 error 5
     kernel: sd 1:1:27:0: [sdt] tag#568 access beyond end of device
     kernel: blk_update_request: I/O error, dev sdt, sector 19327352984 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
     kernel: md: disk16 read error, sector=19327352920
     kernel: md: disk16 read error, sector=19327352928
     kernel: md: disk16 read error, sector=19327352936
     kernel: md: disk16 read error, sector=19327352944
     kernel: XFS (md16): metadata I/O error in "xfs_imap_to_bp+0x5c/0xa2 [xfs]" at daddr 0x480000058 len 32 error 5
I pulled down the diagnostics initially after I'd triggered an extended SMART test. I then noticed the SMART test was stopped by the host ("host reset", it says) and more errors appeared, so I re-downloaded the logs... new items are...
     rc.diskinfo[11028]: SIGHUP received, forcing refresh of disks info.
     kernel: sd 1:1:27:0: [sdt] tag#678 access beyond end of device
     kernel: print_req_error: 4 callbacks suppressed
     kernel: blk_update_request: I/O error, dev sdt, sector 11491091352 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
     kernel: md: disk16 read error, sector=11491091288
     kernel: md: disk16 read error, sector=11491091296
     kernel: md: disk16 read error, sector=11491091304
     kernel: md: disk16 read error, sector=11491091312
     kernel: XFS: metadata IO error: 21 callbacks suppressed
     kernel: XFS (md16): metadata I/O error in "xfs_imap_to_bp+0x5c/0xa2 [xfs]" at daddr 0x2acec2358 len 32 error 5 (repeated 6 times)
     kernel: sd 1:1:27:0: [sdt] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
     kernel: sdt: detected capacity change from 0 to 10000831348736
     kernel: GPT:Primary header thinks Alt. header is not at the end of the disk.
     kernel: GPT:19524464639 != 19532873727
     kernel: GPT:Alternate GPT header not at the end of the disk.
     kernel: GPT:19524464639 != 19532873727
     kernel: GPT: Use GNU Parted to correct GPT errors.
     kernel: sdt: sdt1
I'm going to try triggering an extended test again now. Hopefully I'm right and I'll have these issues solved by replacing the controller, but I need to work it out for sure if possible. Thanks
UPDATE: second extended test failed, same message as above ("kernel: GPT:Primary header thinks Alt. header is not at the end of the disk.") - I still strongly suspect it's a controller issue, but would love some feedback.
newbehemoth-diagnostics-20220307-0811.zip
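For context on those GPT lines: the kernel sees 19532873728 sectors, but the backup GPT header sits at LBA 19524464639, which usually suggests the partition table was written while the controller was presenting slightly less of the disk (RAID controllers often reserve a chunk for their own metadata). A hedged way to sanity-check it, with sdt as reported above:
     # size the kernel currently sees, in 512-byte sectors
     blockdev --getsz /dev/sdt
     # where the partition table thinks the disk ends
     sgdisk -p /dev/sdt
     # if the data is intact and only the backup header is misplaced, this moves it
     # to the real end of the disk - but only after the data is safely copied off
     sgdisk -e /dev/sdt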
  6. Found this thread as I'm trying to decide on buying an LSI HBA to replace my Adaptec 6805, but I've got 12 x ST8000VN004s (and another 5 x 6TB IronWolf drives)... so I'm starting to think I'll just stay put. Everything has been pretty solid for me except the other day, when my controller had a fit and killed my parity (it happened while I was extracting a bunch of zip files and copying large amounts of data to the array). I'm currently running without parity until I decide what I'm doing (ran without for years on DrivePool so I'm not bothered ultimately). That said, this seems to only really be a problem for people that spin down drives? I've generally got all my drives spinning 24/7, so perhaps it won't be an issue?
  7. Yeah, I believe I can get an HP H220, but I'll have to flash it myself. Shouldn't be too hard (rough sketch of the usual steps below).
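I haven't flashed this particular card yet, so treat this as a rough sketch of the usual LSI SAS2308 cross-flash procedure from an EFI shell - the firmware/BIOS file names are examples, the exact steps vary by board, and it's worth following a proper guide before erasing anything:
     sas2flash -listall                          # confirm the card is detected, note the current firmware
     sas2flash -o -e 6                           # erase the existing flash (advanced mode) - don't reboot yet
     sas2flash -o -f 9207-8i-it.bin -b mptsas2.rom   # write the IT-mode firmware and (optionally) the boot ROM
     sas2flash -listall                          # verify the new version took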
  8. After my adventure yesterday with my Adaptec having a heart attack, I think I'm going to swap it out for an LSI, and I'm aiming for a 9207-8i... however, googling now has me paranoid about fakes - any opinions on these listings?
https://www.ebay.com.au/itm/175133386946?epid=1385797553&hash=item28c6c368c2:g:pEwAAOSwPP1h9NgJ&frcectupt=true
https://www.ebay.com.au/itm/185266212380?epid=25025684119&hash=item2b22ba0e1c:g:Y6gAAOSwLHFh6-tx&frcectupt=true
https://www.ebay.com.au/itm/143372493429?hash=item2161aaa275:g:1TAAAOSwAUZhb8B2&frcectupt=true
Either that or I play it safe and go for something like the SilverStone ECS04, which I believe I'd then have to flash, but at least I'd know it's legit... didn't realise this was a problem in the world, but there seems to be plenty of talk about it!
  9. Yeah, that seems to be the go from what I've seen on here. I've had this Adaptec for years and it's served me well... I guess if it keeps giving me trouble I can look into an LSI. Thanks for the replies.
  10. Is there anything I can do? Would it be worth adjusting those timeouts, or is that not relevant these days?
  11. Hey all, bit freaked out right now. I have been fast approaching finishing my build and transferring everything across from my old setup when I've just hit a whole boatload of errors on both parity drives and 2 of the drives that had data copying to them. I've stopped doing any transfers and don't want to touch anything until I know what to do next. Logs attached. Essentially I'm seeing a whole heap of "md: diskX write error" messages. I am now also seeing a lot of "Failing async write on buffer block" messages. For the record, all of these drives that now have errors showing have been in service for years without issue. I didn't preclear them, I just let unRAID do its clear and then formatted them, as it seems that is acceptable with known good drives. Hopefully this isn't the end of the world and I can simply resolve it with a parity rebuild or something along those lines.
UPDATE #1: The array appears to not be allowing any writes at this point either. Going to stop all my docker containers, and have kicked off extended tests on the 2 x 6TBs that have kicked up errors - have done a short test on the 2 x 8TB parity drives and they are showing as OK, maybe I need to do an extended test though?
UPDATE #2: I've stopped the array as it was non-stop generating the same "Failing async write on buffer block" lines in the logs for 10 different blocks. When stopping the array I also noticed "XFS (md13): I/O Error Detected. Shutting down filesystem" and "XFS (md13): Please unmount the filesystem and rectify the problem(s)" - so perhaps disk 13 really isn't as good as I thought?
UPDATE #3: Restarted the array to see what would happen. The array started, appears to be writable now, no errors being produced in the logs - parity is offline. Going to keep everything else (docker) shut down until the SMART tests are complete on the 2 x 6TBs, unless someone advises me otherwise.
UPDATE #4: Looking at the logs a bit harder, it seems my controller (Adaptec 6805) had a bit of a meltdown, which is why I think the errors occurred. I've since restarted the server, which has cleared all the errors, but parity is still disabled. I'm going to continue running without parity until after the extended SMART test finishes on the 2 x 6TBs, and at this point may just keep it disabled until I've finished moving data across anyway. I also ran XFS checks on each disk to be sure they were all healthy. Not sure there is much else to do apart from wait for the scans to finish and then rebuild parity. Would still appreciate any feedback anyone may have.
I also found this article... seems old... but I confirmed the timeout is set to 60... would changing it per drive as instructed cause any issue (sketch below)? https://ask.adaptec.com/app/answers/detail/a_id/15357/~/error%3A-aacraid%3A-host-adapter-abort-request
newbehemoth-diagnostics-20220218-1114.zip
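Re the timeout question: on a current kernel the per-drive value lives in sysfs, so checking or raising it is non-destructive and resets on reboot unless you script it. A minimal sketch (90 is just an example value, sdt a placeholder):
     # current SCSI command timeout for one drive, in seconds
     cat /sys/block/sdt/device/timeout
     # raise it for every sd* device, along the lines the Adaptec article suggests
     for t in /sys/block/sd*/device/timeout; do echo 90 > "$t"; done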
  12. Fair points, too many variables to assume it's just safe to go ahead... oh well, I'm 8 hours in with another 8 hours to go on the correcting check. Will the logs show what files (if any) are affected, or is it a case of if the files are there then you're all good? I'm assuming the latter, and right now anything on here could be re-obtained without any hassle, I just want my system healthy overall.
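For anyone searching later: whatever the check logged ends up in the syslog, so something like this pulls the relevant lines out (the exact wording can vary between unRAID versions):
     # show parity-check related lines from the current syslog
     grep -i "parity" /var/log/syslog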
  13. OK thanks, I'll trigger that off now before I move more data over then. Am I correct in saying all I need to do is hit the Check button on Main with the "write corrections" box ticked, or is there more to it? Is there a way to make it do this by default in the event this happens again, so I don't have to tie up my disks for 16+ hours twice?
  14. Hey all, I had an unclean shutdown yesterday due to a power cut - the UPS options either didn't work or weren't configured correctly, I'm not sure yet and will be testing and fixing that soon. Upon powering the server back on I of course got the automatic parity check from the unclean shutdown, which resulted in 116 errors, and I just want to check whether I need to do anything else in this situation. Looking at the syslog I see "Parity Check Tuning: automatic Non-Correcting Parity Check finished (116 errors)". Should I run the parity check again with "Write corrections to parity" enabled, or will this be fine? I didn't select or do anything to influence the parity check and everything seems to be working. I'm just in the middle of my transfer from DrivePool, so want to be sure I'm good to keep going with something like this occurring.
  15. Just installed for the first time on 6.9.2 (new user) and it was going crazy after signing in. Have removed it for now based on this thread, oh well! Will keep an eye out for updates.
  16. Thanks, and sorry for the delay - I've been busy clearing off disks in DrivePool, so have only just had a chance to boot unRAID again. All appears to be OK at the moment, but I'm not sure if I should use spin-down or not at this stage. newbehemoth-diagnostics-20220128-1200.zip
  17. Update: parity is built and I've restarted a couple of times, stopped and started the array, even spun down disks (annoyingly, about 14 of them spun back up shortly after for some reason)... can't get it to break again... very odd, but it seems stable now at least *crosses fingers*
  18. Hi all, I'm currently looking to build/transition to unRAID from my existing Windows DrivePool setup and had something very weird happen during the parity build.
Due to the large amount of data I'm dealing with, I'm having to be quite careful doing this transition, so right now my unRAID setup consists of 2 x 8TBs for parity, 1 x 1TB and 1 x 750GB SSDs for cache, and a single 1TB HDD just to get the ball rolling (will be removing/replacing it as soon as I free up some of my other drives - DrivePool's "put everything everywhere" approach is making this harder than I'd like).
Anyway, yesterday I started the lengthy parity build process. My server has about 30 drives in it all up, 24 of which are used for my main storage (including the 2 I'm now dedicating to parity). I have been hoping to take advantage of disk spin-down to save on power, as I've traditionally been running all drives spun up for years on my existing setup, so I had spin-down set to 1 hour. During the build process I checked back from time to time and did see the drives not in the array spin down, and thought to myself "awesome, that works exactly as I'd expect", until I noticed the main parity drive was spun down! This didn't seem right, but I left it running. I also noticed all the drives would spin back up from time to time despite not being accessed or part of the array. It seems that SMART is triggering them to spin up, based on what I see in the system log - more on this later.
8+ hours later the build finished and said all is good. I thought to myself that is highly unlikely, but OK - if it's wrong a verify will fix it, or maybe I'll just run a verify to be sure. At this point I hadn't added my cache drives, so I shut down the array to prepare those drives and add them. Upon restarting the array I was extremely shocked to see that another 1TB drive I'd been playing with (so it was formatted in XFS but not in any array) had taken the place of the main 8TB parity drive! The array started and said all was OK when it clearly wasn't. It was after midnight at this point and I was about ready to ditch the whole thing, but I'm very keen to make the switch. So I shut down the array, formatted the 2 x 8TBs, performed a new config and started the build again - but this time I set drive spin-down to Never. Going on nearly 10 hours in, with another hour and a half or so to go (slower than last time), parity is nearly finished building, however I am quite concerned that this will happen again, so I will need to do some testing before I go trusting my data to unRAID.
I have a couple of theories as to what happened/what caused it, but thought I'd get it all down and see what the experts think. My main theory is that, due to the majority of my drives being connected via an Adaptec 6805 and Intel SAS Expander, the spin-down options simply don't work or play nice with the system. The drives are all individual JBOD and unRAID sees them just fine, but as mentioned earlier the drives on the Adaptec kept spinning back up for SMART checks, while some other random disks I have connected to the motherboard SATA ports did not - they stayed spun down as expected. This to me says I shouldn't be using spin-down on the Adaptec, perhaps?
Another thing I've noticed is that after a reboot of the whole server, drives pop into the Historical Devices list when literally nothing has changed with them. I believe this is part of the Unassigned Devices plugin, but it does concern me, because why would it think the drives have changed or been removed when they haven't? (I did see an update for Unassigned Devices late last night that mentioned drives being identified incorrectly or something along those lines - so perhaps that's fixed now, but thought I'd mention it anyway.)
Sorry for the long post, but I've been performing a lot of testing before taking the leap with this and I'm starting to wonder if I should continue or not. Appreciate any help or advice. Thanks
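One small thing that might help anyone chasing the same spin-up behaviour: you can check a drive's power state without waking it, and tell smartctl to skip drives that are in standby - a rough sketch, /dev/sdX is a placeholder, and it may not work for every drive sitting behind a RAID controller:
     # report the drive's power state (standby/idle/active) without spinning it up
     hdparm -C /dev/sdX
     # query SMART attributes, but bail out instead of waking the drive if it's spun down
     smartctl -n standby -A /dev/sdX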
  19. Thanks for that. Honestly, nothing that important is stored on here; if I lost it I'd be upset, but it wouldn't be the end of the world. For some of the harder-to-replace things, yeah, that's what I use the 18s for currently, so maybe I'll just stick to that and use 2 of my 8s for parity... it will be a long while before I'd go bigger than 8s, I feel, but I guess I can cross that road in the future. Thanks for the reply.
  20. Hey all, I'm currently planning out/deciding on migrating to unRAID from my old trusty DrivePool setup. I've just spent the last couple of days moving to a new motherboard, CPU, etc., which was a bit of a nightmare partly due to my own stubbornness and Windows being Windows, but anyway, I am currently trying to work out the best way forward, as much as I'd like to get away with just keeping the devil I know. I started playing around with unRAID on the new board before swapping it out and I can certainly see why people rave about it. Just about anything I could think of, I found a solution for or another way to think about it, and I feel that a platform like unRAID is more likely to keep working when upgrading hardware in the future without hours of hassle.
On to the main point of my post... I currently have the following disks in my DrivePool:
     5 x 6TB
     12 x 8TB
     6 x 4TB
     23 drives total
I don't currently run with any kind of duplication or parity, but I feel if I go to unRAID I probably should/may as well. I'm slowly working on removing the 4TBs and replacing them with 8s, but for right now it all needs to stay in the box as I need the space (was about 7TB free, but I'm cleaning up and should have about 15-20TB free by the time I move over). I have 2 x 18TB WD USB drives that I have used for cold storage of some data, and while I know very well that parity isn't a replacement for backup, should I consider shucking these drives to use them for parity given the volume and size of the drives I have? Is 2 x 18TB overkill? Should I just use 2 of my 8TBs instead? I guess that is my main question: what would be the best parity setup for my setup? Keen for any feedback or advice! Cheers!