
JT Marshall

Members
  • Posts

    21
  • Joined

  • Last visited

Posts posted by JT Marshall

  1. I added "1,200" as the warning threshold for raw read error rate and raw write error rate in my SMART attribute notifications, per something I read on here. During my parity check I'm now getting read error rate warnings for (sdo). The first reading was 327,693, and 30 minutes later it was 3,473,409. I'm not entirely sure what that means, but a value an order of magnitude higher 30 minutes later doesn't seem good.
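For anyone reading later: the raw value is the last column of `smartctl -A` output, and on some drive models that raw number is a composite counter rather than a plain error count, so its absolute size can be misleading on its own. A minimal sketch of pulling the value out and applying the 1,200 threshold (the sample line below is made up; a real check needs root and the actual device):

```shell
# Hypothetical single attribute line captured from `smartctl -A /dev/sdo`
# (running smartctl itself requires root and a real drive):
sample='  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       327693'

# The raw value is the last whitespace-separated field of the line.
raw=$(printf '%s\n' "$sample" | awk '{print $NF}')

# Warn when it crosses the 1,200 threshold mentioned above.
if [ "$raw" -gt 1200 ]; then
  echo "WARNING: raw read error rate is $raw (threshold 1200)"
fi
```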

  2. 25 minutes ago, itimpi said:

    I hope you find it useful.   The plugin has been providing this sort of functionality ever since Unraid 6.7 and the core Unraid system still has some way to go before it catches up with the full functionality offered by the plugin.  
     

    If you encounter any issues using the plugin, feel free to bring them up in the plugin’s support thread.

    Much appreciated! I have it installed and I’m excited about never seeing parity during the day again!

  3. 16 hours ago, ChatNoir said:

    No misunderstanding on your part.

    6.10.3 suffered a regression on this new feature. It will be fixed in the next release, which will probably be 6.11, currently in open RC.

    That’s too bad.  That feature was part of the reason I was excited about 6.10. I didn’t know about Parity Check Tuning so I’ll give that a try. 

  4. 4 hours ago, itimpi said:

    If you do not want to try the 6.11 rc then use the Parity Check Tuning plugin to manage this instead as it will work on the Unraid release you are currently running.  You also get additional functionality that you can invoke with the plugin that might be of use.

    This is perfect. Thanks for the recommendation. 

  5. Well, another month has gone by and my parity check still is not running incrementally. Once a check starts, it runs until completion. What am I doing wrong? I expected this to start a parity check at 10pm on the first day of the month, run for 5 hours, pause, and do the same the next day, repeating until complete. Am I misunderstanding this?

     

    Screen Shot 2022-08-02 at 8.04.30 PM.png

  6. 3 minutes ago, itimpi said:

    No idea on the scheduling, but I notice that you have it set to be correcting.  
     

    We normally recommend that scheduled checks are set to non-correcting, so that a misbehaving drive does not inadvertently corrupt parity. Then, if errors are reported, you can try to work out why, and only trigger a correcting check manually once you are sure all the hardware is behaving itself.

     Discrepancy noted, and corrected. 😉 

  7. I recently changed my Parity Schedule to run under custom conditions.  I wanted it to run every other month on the first Saturday of the month and only to run for 5 hours a day until parity is complete.  Here is how I set it:

     

    Scheduled parity check: Custom

    Day of the week: Saturday

    Week of the Month: First week

    Time of the day: 00:00

    Month of the year: January, March, May, July, September, November

    Write corrections to parity: Yes

    Cumulative parity check: Yes

    Accumulation frequency: Daily

    Accumulation duration: 5 hours

     

    Today is the first day of the run, and as of 10:55 my scheduled parity check is still running. My understanding was that it would pause at 05:00 and resume daily until parity is complete. Are my assumptions incorrect, or is this a bug buried in the custom schedule? In the interim I will switch my schedule to monthly and see if that works. Thoughts?
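For what it's worth, the start-day logic described above can be written down as a simple predicate to sanity-check which dates should kick off a run. This is only a sketch of the schedule as I set it, not anything from Unraid itself (the function name is made up):

```shell
# Succeeds (exit 0) when the given date matches "first Saturday of
# January/March/May/July/September/November" per the custom schedule above.
# Arguments: day-of-week (1=Mon..7=Sun), day-of-month, zero-padded month.
is_scheduled_start_day() {
  dow=$1
  dom=${2#0}   # strip a leading zero so "08" compares as 8
  mon=$3
  case "$mon" in
    01|03|05|07|09|11) ;;        # every other month, starting January
    *) return 1 ;;
  esac
  [ "$dow" -eq 6 ] || return 1   # Saturday
  [ "$dom" -le 7 ]               # first week of the month: days 1-7
}

# Example: test today's date (GNU date format specifiers).
if is_scheduled_start_day "$(date +%u)" "$(date +%d)" "$(date +%m)"; then
  echo "a cumulative parity check would start at 00:00 today"
fi
```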

  8. 8 hours ago, bonienl said:

    Delete the file "sensors.conf" on your flash device in the folder /config/plugins/dynamix.system.temp

    I looked in my /boot folder with

    find /boot -iname "sensors.conf"

    and found nothing. /boot/config/plugins doesn't have a dynamix.system.temp folder, I believe because I removed that plugin as well. I checked /etc, found a sensors.conf in /etc/sensors.d/, removed it, rebooted, and now I have no Airflow!

    THANKS! Your guidance was helpful.
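For anyone else chasing this, the search can be widened to cover both locations that came up in this thread in one pass. Just a sketch (the helper name is made up; the directories are the ones mentioned above and may differ on your system):

```shell
# Print any sensors.conf found under the given directories,
# silently skipping directories that do not exist.
find_sensors_conf() {
  for dir in "$@"; do
    [ -d "$dir" ] && find "$dir" -iname 'sensors.conf'
  done
  return 0
}

# The two locations that came up in this thread:
find_sensors_conf /boot/config/plugins/dynamix.system.temp /etc/sensors.d
```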

  9. So, I added the Dynamix System Autofan plugin, which added the Airflow section to my Dashboard. Ultimately the plugin didn't work out for me, so I removed it. Now I have the Airflow section orphaned on my Dashboard. How do I go about removing it?

    ScreenShot2020-03-12at8_59_47PM.png

  10. Okay, these are diagnostics I took last night and then again this morning. It would appear that shortly after posting this, the looping of the SAS card stopped and the preclear progress picked back up. Maybe I'm just being overly cautious with the build, but I really didn't understand what was happening and would still like to understand what I was seeing. Another symptom is that the drives connected to that SAS controller are not showing up in the Unassigned Devices section like their predecessors. I know I'm throwing a lot of information out there and don't expect it all to be relevant.

    bifrost-diagnostics-20191003-0558.zip bifrost-diagnostics-20191003-1319.zip

  11. Sorry if I'm not in the right place on this. I'm a new Unraid user and just "completed" my first Unraid build.

    So I built this server:

    AMD Ryzen Threadripper 2920X 12-Core @ 3500 MHz

    Gigabyte Technology Co., Ltd. X399 AORUS XTREME-CF

    Nvidia Quadro P2000

    2 LSI SAS 3008-8i for connectivity to Hot-Swap bays

    It seemed to be running well for a few days until I started adding drives and running a preclear to test the new drives.  I recently noticed that 2 of the drives seemed to have stalled out at about 90% of the pre-read.  When I looked at the log I saw this:

    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: port enable: SUCCESS
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: search for end-devices: start
    Oct 2 21:17:43 FileServer kernel: scsi target12:0:2: handle(0x0009), sas_addr(0x4433221100000000)
    Oct 2 21:17:43 FileServer kernel: scsi target12:0:2: enclosure logical id(0x500605b000648d80), slot(3)
    Oct 2 21:17:43 FileServer kernel: scsi target12:0:3: handle(0x000a), sas_addr(0x4433221106000000)
    Oct 2 21:17:43 FileServer kernel: scsi target12:0:3: enclosure logical id(0x500605b000648d80), slot(4)
    Oct 2 21:17:43 FileServer kernel: scsi target12:0:4: handle(0x000b), sas_addr(0x4433221107000000)
    Oct 2 21:17:43 FileServer kernel: scsi target12:0:4: enclosure logical id(0x500605b000648d80), slot(5)
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: search for end-devices: complete
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: search for end-devices: start
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: search for PCIe end-devices: complete
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: search for expanders: start
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: search for expanders: complete
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: _base_fault_reset_work: hard reset: success
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: removing unresponding devices: start
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: removing unresponding devices: end-devices
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: Removing unresponding devices: pcie end-devices
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: removing unresponding devices: expanders
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: removing unresponding devices: complete
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: scan devices: start
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: scan devices: expanders start
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: break from expander scan: ioc_status(0x0022), loginfo(0x310f0400)
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: scan devices: expanders complete
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: scan devices: end devices start
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: break from end device scan: ioc_status(0x0022), loginfo(0x310f0400)
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: scan devices: end devices complete
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: scan devices: pcie end devices start
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: break from pcie end device scan: ioc_status(0x0021), loginfo(0x3003011d)
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: pcie devices: pcie end devices complete
    Oct 2 21:17:43 FileServer kernel: mpt3sas_cm1: scan devices: complete
    Oct 2 21:17:43 FileServer kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    Oct 2 21:17:44 FileServer kernel: sd 12:0:4:0: Power-on or device reset occurred
    Oct 2 21:17:44 FileServer kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    Oct 2 21:17:44 FileServer kernel: mpt3sas_cm1: fault_state(0x5862)!
    Oct 2 21:17:44 FileServer kernel: mpt3sas_cm1: sending diag reset !!
    Oct 2 21:17:45 FileServer kernel: mpt3sas_cm1: diag reset: SUCCESS
    Oct 2 21:17:45 FileServer rc.diskinfo[6400]: SIGHUP ignored - already refreshing disk info.
    Oct 2 21:17:45 FileServer kernel: mpt3sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
    Oct 2 21:17:45 FileServer kernel: mpt3sas_cm1: _base_display_fwpkg_version: complete
    Oct 2 21:17:45 FileServer kernel: mpt3sas_cm1: LSISAS3008: FWVersion(16.00.01.00), ChipRevision(0x02), BiosVersion(08.37.00.00)
    Oct 2 21:17:45 FileServer kernel: mpt3sas_cm1: Protocol=(
    Oct 2 21:17:45 FileServer kernel: Initiator
    Oct 2 21:17:45 FileServer kernel: ,Target
    Oct 2 21:17:45 FileServer kernel: ),
    Oct 2 21:17:45 FileServer kernel: Capabilities=(
    Oct 2 21:17:45 FileServer kernel: TLR
    Oct 2 21:17:45 FileServer kernel: ,EEDP
    Oct 2 21:17:45 FileServer kernel: ,Snapshot Buffer
    Oct 2 21:17:45 FileServer kernel: ,Diag Trace Buffer
    Oct 2 21:17:45 FileServer kernel: ,Task Set Full
    Oct 2 21:17:45 FileServer kernel: ,NCQ
    Oct 2 21:17:45 FileServer kernel: )
    Oct 2 21:17:45 FileServer kernel: mpt3sas_cm1: sending port enable !!

    This sequence of messages, repeating over and over, led me down the Google path to this Bug Link and this other Bug Link. I'm wondering what I should be doing here. Am I screwed? Is this something I can fix, or should I look into returning my HBAs? I'm hoping for a little guidance here...
