clowncracker

Members
  • Posts

    109

Posts posted by clowncracker

  1. 7 hours ago, JorgeB said:

    Log shows possible power/connections issues with parity and disk9, but they may be unrelated to the sync errors, should still be fixed though, you sure no unclean shutdown since last check?

    There has been no unclean shutdown since the last check.  Both of those drives are on power splitters though, not sure if that makes a huge difference.

    https://www.amazon.com/gp/product/B012BPLW08/ref=ppx_yo_dt_b_asin_title_o01_s00?ie=UTF8&psc=1

    Attached are my PSU specs.

    Just for my own knowledge, how did you identify that the issues were with disk9 and parity?

    image.png

  2. 1 hour ago, itimpi said:

    I can see quite a lot of 

    Power-on or device reset occurred

    messages in the syslog, but not sure what device(s) they refer to.

    Any way to check which device it is referring to?  That can be any drive or any cable, it's near impossible to troubleshoot.
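    One possible way to narrow this down is to tally the reset messages by the SCSI address the kernel prints in front of them. The sketch below assumes the syslog lines have the usual `sd host:channel:target:lun:` prefix; the sample lines and the `count_resets` helper are hypothetical and are not taken from the attached diagnostics.

```python
import re
from collections import Counter

# Assumed line shape: "... kernel: sd 7:0:3:0: Power-on or device reset occurred"
RESET_RE = re.compile(r"\bsd (\d+:\d+:\d+:\d+): Power-on or device reset occurred")


def count_resets(lines):
    """Return a Counter mapping SCSI address (host:channel:target:lun) to reset count."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines for illustration only.
sample = [
    "Nov  7 10:01:02 Tower kernel: sd 7:0:3:0: Power-on or device reset occurred",
    "Nov  7 10:01:09 Tower kernel: sd 7:0:5:0: Power-on or device reset occurred",
    "Nov  7 10:02:15 Tower kernel: sd 7:0:3:0: Power-on or device reset occurred",
    "Nov  7 10:02:16 Tower kernel: some unrelated message",
]

resets = count_resets(sample)
for address, count in resets.most_common():
    print(address, count)
```

    If the prefix is present, an address can usually be matched to a drive letter through `/sys/class/scsi_device/<address>/device/block/`, or through the earlier `sd <address>: [sdX] Attached SCSI disk` lines in the same syslog.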

  3. I think I found the issue: it looks like mover started running during the parity check.  I thought that couldn't normally happen, but that explains the writes to drive 3 and the parity drives.  I noticed the amount of free storage on my cache was increasing, so I paused the parity check and sure enough mover was running.  Once mover is done I'll resume the parity check and see if that solves the problem.

  4. I've started running my monthly non-correcting parity check and it was going smoothly until about an hour ago.  Now the estimated finish is in ~30 days.  It looks like a lot of writing is happening to disk 3, but I don't know what is happening:

    image.png.9ca98f455a608d3da45a0eeacdc01985.png

    image.png

     

  5. 2 hours ago, itimpi said:

    If you read @JorgeB's answer, he recommends a sync (which is effectively a correcting check) followed by a non-correcting check to make sure everything is now good.


    So I shouldn't be deselecting any drives then?  What is the functional difference between doing a sync by creating a new config and just doing a correcting parity check?

  6. 4 minutes ago, JorgeB said:

    After a new config with a missing disk, parity won't be valid, so you first need to sync parity, then run a non-correcting parity check.

    Based on my comments what do you think about checking the parity drives?  Do you think it's worth the risk, especially since they are essentially brand new?

    Something else I'm curious about.  If I start the array with a new config and a disk missing, won't I just lose the data currently on the disk?  Since it isn't on the array and it won't be emulated since the disk doesn't exist in the new array.
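    A small worked example may help show why parity stops being valid once a disk is left out of the new config. This is a sketch of single XOR parity only, with made-up two-byte "disks"; the second parity drive uses a different calculation that is not modeled here, and `xor_parity` is a hypothetical helper.

```python
from functools import reduce


def xor_parity(disks):
    """XOR the given disks together byte by byte (single parity only)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


# Made-up disk contents for illustration.
disk1 = bytes([0x0F, 0xA0])
disk2 = bytes([0x33, 0x0C])
disk3 = bytes([0x55, 0xFF])

# Parity as it was computed while disk3 was part of the array.
old_parity = xor_parity([disk1, disk2, disk3])

# Parity the array needs once disk3 is left out of the config.
new_parity = xor_parity([disk1, disk2])

# With the old parity, a missing disk3 can still be reconstructed (emulated).
rebuilt_disk3 = xor_parity([disk1, disk2, old_parity])

print(old_parity.hex(), new_parity.hex(), rebuilt_disk3.hex())
```

    In this toy model the old parity no longer matches the remaining disks, which is why a sync is needed first. Once parity is re-synced without disk 3, the array can no longer reconstruct that disk's contents from parity.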

  7. Do you think it's worth testing the parity drives even if they are brand new?  I wasn't having any issues until a drive failed mid-parity check (non-correcting), so it makes me think the parity drives cannot be the problem.  If I did want to test them, I should remove both parity disks and put in a new drive and rebuild parity?

    Part of me is concerned about rebuilding the parity disks if one of the data disks might be an issue.  If another drive fails during a parity rebuild, then at that point won't I have corrupted data permanently?


    If not testing parity, which disk do you think is most likely the culprit?  Disk 3 that was just recently replaced that started the whole problem?  Disk 4 that has some reallocated sector counts? Or Disk 10, which is the newest disk in the array?

  8. 19 minutes ago, JorgeB said:

     

    Like this:

     

    I'm going to assume the parity disks aren't the issue, since they've both been replaced in the past 4 months with brand new drives.  I'm going to test disk 3 that was just replaced (which caused all of these issues to begin with), disk 4 that has sector count issues and disk 10 (which is the newest drive in the array).

    So to confirm:
    1) Save a backup of super.dat (I'll just use the file from the diagnostics).

    2) Stop the array.

    3) Tools > New config, selecting all in the Preserve current assignments section.

    4) DESELECT DISK 3 - making sure Parity is Valid is UNCHECKED.

    5) Start the array.

    6a) Run a parity check, if there are issues I should restart the process with another drive.

    6b) If there are no issues, run another parity check to make sure there are no issues.

    7) Once I've identified the drive with problems, stop the array and rebuild it with a new disk.

    8) Run two parity checks to make sure there are no issues.
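    For step 1, a timestamped copy is one simple way to keep a backup of super.dat before touching the config. The sketch below is a generic file-copy helper and is not an Unraid tool; `backup_file` is a hypothetical name, and the demo runs against a throwaway file standing in for the real super.dat (assumed to live under /boot/config on the flash drive).

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_file(source, backup_dir):
    """Copy source into backup_dir with a timestamp suffix and return the new path."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.name}.{stamp}.bak"
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    return target


# Demo on a throwaway file; on the server the source would be the real super.dat.
tmp = Path(tempfile.mkdtemp())
fake_super = tmp / "super.dat"
fake_super.write_bytes(b"example")

copy = backup_file(fake_super, tmp / "backups")
print(copy.name)
```

    Python is not necessarily available on the server itself, so the same idea could also be done by copying the file by hand from the flash drive on another PC.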

    image.png

  9. 7 hours ago, JorgeB said:

    Could also be controller or a disk, if it's a disk it can be a pain to find out which one, you basically need to test without one disk at a time, try the controller first if possible, and note that memtest not finding errors does not completely rule out RAM, if you have more than one RAM stick try with one at a time, also remember that after any change you need to run two checks, since the first one can still find errors.

    I honestly don't think it's a hardware issue (memory, controller, cables, etc).  It might be disk related, but how would I go about testing that?  Stopping the array, disabling a disk, starting the server and just running a parity check?  If that's the case how would I actually go about fixing the issue?

  10. On 11/2/2023 at 4:05 PM, trurl said:

    Looks OK.

     

    Make sure you make a backup of flash 

    Sorry for the delay, but I've been out of town for a few days.  I just started another parity check.  I've already found 7685 errors in the past 10 minutes, so I'm concerned there is still a problem.  I decided to cancel the parity check and restart my server.  The config still looks correct, and I've attached another diagnostics log.

    clowncracker-diagnostics-20231107-1044.zip

  11. 6 minutes ago, trurl said:

    You might check your flash drive in your PC to make sure it isn't corrupt.

    Flash drive looks fine.  Just plugged it in and it looks like there are no issues.

    I guess I could rebuild disk 3 again, but I am concerned with the number of parity errors it was throwing out yesterday when I ran it after the rebuild.  It also might have been throwing the parity errors because for some reason disk 3 wasn't recognized correctly after I did the Check Filesystem.  Maybe I should have done the Check Filesystem before I rebuilt disk 3?  It did show disk 3 only having 5TB after the rebuild, which is weird.  The screenshot for that is in the original post.