• [6.9.2] SMART settings wiped occasionally


    codefaux
    • Minor

    Hi there. Happy user of unRAID for an eternity. Well, ish, but it's usually hardware's fault, not unRAID's fault.

     

    Usually.

     

    I recently had to upgrade my motherboard, and in the process, the ordering of my disk controllers got shuffled. No big deal, I had to change the SMART Settings, 3ware-related values per-disk. I wrote a handy script to identify disks for me under the terminal, just have to retype each one...

     

    I entered each value, clicked Done, moved to the Identity section, verified it was the disk it was meant to be, then clicked the right arrow for the next disk. I eventually looped back to the start, as one does. When I scanned through them to verify the changes once more, they were mostly erased and back to default. Literally two disks did not reset themselves, for some reason. So, as I was changing them, it was (seemingly) occasionally eating the config file. I changed a few more, then went back, and they were gone again.

     

    I assured myself I would gather logs and submit them, and then ..life happened, and I don't have logs of the event in question. I'll gladly recreate it if required.

     

    In the meantime, I wrote a simple Bash one-liner to:

     - scan three connected 3ware controllers, 12 disks per controller (my configuration)

     - output the disk serial number in the format used by unRAID to identify my disks

     - set up their tags in proper format to write into smart-one.cfg

     

    It is:

     - Stupid

     - Lowest-effort

     - Functional

    It is not:

     - Designed to be copy-pasted blindly by those who don't understand exactly what every word does

     - Adaptive in any way

     - Likely to help many

     

    Sharing it in case anyone needs it. I would imagine not many do, but this might be adapted to work on other controllers, or just more intelligently in general -- honestly this is a "my own tool, works where I need it" sort of thing.

     

    for ctl in $(seq 0 2); do for dev in $(seq 0 11); do SER=$(smartctl -a /dev/twa$ctl -d 3ware,$dev | grep Serial | awk '{print substr($0,length-7,10)}'); if [ -n "$SER" ]; then echo "[1AMCC_${SER}000000000000]"; echo smType=\"-d 3ware\"; echo smPort1=\"$dev\"; echo smDevice=\"twa$ctl\"; fi; done; done > /boot/config/smart-one.cfg
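
    For anyone adapting this, here is the same loop split into small functions as a rough sketch. The function names (emit_entry, serial_suffix, scan_3ware) are made up for illustration, and it still assumes three twa controllers with 12 ports each and the 1AMCC_ ID prefix seen on this system. Unlike the one-liner, it stops at the first line containing "Serial".

```shell
# Print one smart-one.cfg section for a disk.
# $1 = 8-character serial suffix, $2 = controller index, $3 = port index
emit_entry() {
  local ser="$1" ctl="$2" dev="$3"
  printf '[1AMCC_%s000000000000]\n' "$ser"
  printf 'smType="-d 3ware"\n'
  printf 'smPort1="%s"\n' "$dev"
  printf 'smDevice="twa%s"\n' "$ctl"
}

# Read smartctl output on stdin and print the last 8 characters
# of the first line that mentions "Serial".
serial_suffix() {
  awk '/Serial/ { print substr($0, length($0) - 7); exit }'
}

# Walk controllers 0-2 and ports 0-11 (this machine's layout) and
# emit a section for every port that reports a serial number.
scan_3ware() {
  local ctl dev ser
  for ctl in 0 1 2; do
    for dev in $(seq 0 11); do
      ser=$(smartctl -a "/dev/twa$ctl" -d "3ware,$dev" | serial_suffix)
      if [ -n "$ser" ]; then
        emit_entry "$ser" "$ctl" "$dev"
      fi
    done
  done
}

# Usage (overwrites the existing file, so back it up first):
#   scan_3ware > /boot/config/smart-one.cfg
```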

     

     

    I can't attach a relevant diagnostics.zip file right now because the error in question was actually kind of a long time ago and I forgot to keep it. If required, I will go about recreating it and attach a new diagnostics zip, or more appropriate logs if required/requested.

     

     

    EDIT: Is this also related to this flood of repeated messages? I get this SO MUCH, it literally 100% filled the log partition, it makes checking kernel logs absolutely useless, etc etc...

     

    [ 4430.813330] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.813566] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.813593] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.814564] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.814844] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.815029] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.815116] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.815293] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.815376] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.815538] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.815748] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.815804] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.816047] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.816068] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.816712] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.816777] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.816942] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.817006] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.817212] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.817290] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.817479] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.817603] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.817739] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.817911] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.822157] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.822451] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.822602] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.822627] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.822879] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.822917] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.822921] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.823180] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.825032] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.825312] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.825334] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.825590] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.832226] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.832556] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.832587] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.832880] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.834260] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.834525] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.834583] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.834874] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.836188] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.836488] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.836535] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.836822] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.837487] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.837752] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.837777] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.838064] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.838086] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.838346] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.838367] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.838625] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.839466] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.839732] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.839754] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.840013] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.840517] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.840780] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.840835] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.841065] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.841093] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.841335] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.841356] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.841613] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.843096] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.843203] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.843377] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.843397] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.843502] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.843527] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.843653] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.843811] 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.847624] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.848119] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4430.848178] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4430.848449] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4431.062903] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
    [ 4431.063016] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
    [ 4431.071819] 3w-9xxx: scsi3: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x80.
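
    When the log is flooded like this, it can help to collapse it into counts per distinct message. A minimal sketch, assuming the bracketed timestamp format shown above; summarize_klog is a made-up name:

```shell
# Strip the leading "[ seconds.micros] " timestamp from each kernel log
# line on stdin, then count identical messages, most frequent first.
summarize_klog() {
  sed -E 's/^[[:space:]]*\[ *[0-9]+\.[0-9]+\] //' | sort | uniq -c | sort -rn
}

# Usage:
#   dmesg | summarize_klog | head
```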

     




    User Feedback

    Recommended Comments

    Don't know why you posted a bug report (without Diagnostics).

     

    This seems to be more about your RAID controller (not recommended for these reasons and more) than about Unraid. 

    Link to comment
    8 hours ago, trurl said:

    Don't know why you posted a bug report (without Diagnostics).

    Oh, I can help with that.

    1 - Because there's a bug.

    2 - Because unRAID's configuration management scripts are eating perfectly formatted, correct SMART configuration data

    3 - Because I don't have the Diagnostics

    4 - Because I thought there might be enough information here to actually figure out something, IE there's only a (relatively) small section of code responsible for handling the SMART Settings for the disks

    5 - Because my diagnostic.zip got lost, as I said, and because if it included the log directory it would've been over 128MB, as indicated by the full log partition.

     

    I apologize that the logic I used to arrive at seeking help was simply beyond your grasp, but now you know. Had you simply said, "We can't help at all without a diagnostic.zip" I would've been dubious but accepting. I'm curious, what was your logic in replying to a post seeking help, while also not providing anything that could be mistaken as helpful? Is this what unRAID support is? Is this guy important?

     

     

     

     

    8 hours ago, trurl said:

    This seems to be more about your RAID controller (not recommended for these reasons and more) than about Unraid. 

     

    Could you explain to me how my RAID controller is erasing the config settings from smart-one.cfg on the USB drive which isn't connected to it?

    Can you explain to me how my RAID controller is causing in-memory scripts to lose their contents?

    I'm deeply curious, because I'm using what I have. I don't care what's ideal, I don't have that and I can't afford it. Unless you wish to provide me with a suitable replacement suggestion (Four SAS slots per PCIe card, ideally an HBA) maybe let someone else weigh in. I know EXCEEDINGLY WELL why using RAID controllers sucks, I'm DEALING WITH IT.

     

    I don't care if they're recommended, they're "supported" and there's a bug with unRAID's configuration manipulation that is erasing the contents of its file. This is not "RAID controller related" any more than that I need to use the PROVIDED CONFIGURATION METHODS to adjust settings because my controller does not support direct polling of disks.

    Link to comment
    6 hours ago, trurl said:

     

    Thanks for the reference, but it covers up to eight disks on one controller. This does not support my use case, as I require a minimum of 12 disks per controller, preferably 16 -- as indicated by "Four SAS slots per PCIe card" in my post above.

     

    ---

    EDIT: Upon further review, the link -clearly- has options for higher port counts, but they are well outside my price range. An SAS expander and a smaller controller may be an option, but then I fear for my bandwidth, with 30+ disks. I accept that using a RAID controller in JBOD mode is not ideal, but it's the option I can afford - these controllers were $12 each, and provide up to 16 JBOD disks apiece via four internal SAS connectors. With proper smartctl flags, they even work reasonably well with unRAID. I now have a price goal for my next upgrade, but it's definitely gonna be a while. In the meantime, I'd like to fix what I've got.

    ---

     

    I'm also using SAS cabling (obviously, to those who know most of the technical terminology I've used) both internal and external to my casing, so a controller with SATA connectors simply won't cut it without me also having to buy eight 4x-SATA-host-to-SAS adapter cables.

     

    Furthermore, I have neither the need nor the desire to change my controller and disk layout, nor do I have the finances. This worked before, it'll continue to work. In fact, while I'm mentioning it, this worked a few updates ago without kernel log flooding, so something recently changed.

     

    Oh! While I'm at it, thank you for being helpful, but I'd ask that unless you intend to be helpful WITH MY PROBLEM, please go away. There is a bug in unRAID that I intend to see addressed or at least acknowledged. I don't need to convert from RAID controllers to a direct HBA. My RAID controllers are in JBOD mode, that's the closest I can get to ideal where I'm at.

     

    "Buy something different" is not a support answer, it is not helpful, it is lazy and annoying and petulant. I get that it's not ideal, but there is infrastructure in place to make it work, and THAT INFRASTRUCTURE ISN'T WORKING. I'd like to get it fixed, rather than being avoidant and spending money I simply don't have, to acquire hardware I'm still unable to find in the first place. You obviously don't understand my requirements, my setup, my situation, or even the problem I'm attempting to address despite your derailing.


    Please stop sidetracking my support thread. I don't need new disk controllers, I need someone to look into the issue I'm raising. Your first post was antagonistic, your second post was minimum-effort and didn't even begin to address the requirements I laid out in terminology you should understand if you're posing as a support agent on a forum for a storage-based OS. Your presence here is negative, I'm asking politely and professionally for help and you seem like a troll. All I want is for my fileserver to run properly.

    Edited by codefaux
    Further review, clarification
    Link to comment

    I apologize for being/sounding like an ass. I'm dealing with a lot of crap right now (including some issues with a hardware upgrade last night, thus not being able to provide diagnostics.zip presently) and I really just wanted someone to address the problem listed above -- unRAID seems to eat some/all of the contents of an important config file at some point when changing lots of entries one at a time.

     

    As a coder, if someone tells me "there's a bug with X" in my project, sometimes I can find it just looking through the area carefully, knowing it misbehaves. I understand that may not be the case here, and I understand internal devs may not be willing to DO that without further information. It's reasonable to say "We need the diagnostics.zip to proceed" as a dev.

     

    I'll stabilize this stack of crap and reproduce the bug again (if I can, entropy hates me lately) and happily send the zip along. I don't intend to be unreasonable, I just anticipated any efforts to help would be toward the resolution of the bug, not toward changing everything I've got and spending money to avoid it.

    Link to comment

    Do you need to be on 6.9.2 for the new motherboard? Which vers was it working on previously? Is it an option to downgrade to a different unraid vers in the short term?

     

    I have looked at the smartctl code and it is still using ioctl for 3ware so not sure those messages in the log can be suppressed unless smartctl is updated by the owners, this would be outside of Limetech's control.

     

    Current diags will be helpful, but if you can't provide them at present, are you able to post smart-one.cfg, disks.cfg?

    Link to comment
    40 minutes ago, codefaux said:

    unRAID seems to eat some/all of the contents of an important config file at some point when changing lots of entries one at a time.

    This is a known issue, there are already multiple bug reports about this, e.g.:

     

    Link to comment
    1 hour ago, SimonF said:

    Do you need to be on 6.9.2 for the new motherboard? Which vers was it working on previously? Is it an option to downgrade to a different unraid vers in the short term?

     

    I have looked at the smartctl code and it is still using ioctl for 3ware so not sure those messages in the log can be suppressed unless smartctl is updated by the owners, this would be outside of Limetech's control.

     

    Current diags will be helpful, but if you can't provide them at present, are you able to post smart-one.cfg, disks.cfg?

    The new motherboard has only run 6.9.2 and up until today I was experiencing significant instability -- the same instability I upgraded motherboards to escape, much to my dismay. I may have found the issue (c-states) and a workaround (disabling them) and expect to be stable enough to provide the diagnostic.zip if this is the case -- and if it's even still required (see below, apparently it's a known issue anyway.)

     

    I could downgrade to an older version of unRAID, especially if it will quiet the logspam because JUST WOW. I don't remember which version I was running before these issues cropped up, is there one you might recommend?

     

    Also, for what it's worth -- my smart-one.cfg is *likely* to be absolutely useless as a diagnostic, because I wrote it myself from scratch with the bash snippet in my first post, using clever code to scan my connected controllers and match them to named entries etc etc.. If it's wrong, it's because I wrote it wrong. It seems to be working though, except for still spamming log messages every several minutes. Attaching it regardless, as well as disk.cfg -- I assume you meant disk.cfg and not disks.cfg, correct?

     

    smart-one.cfg  disk.cfg

     

     

    41 minutes ago, JorgeB said:

    This is a known issue, there are already multiple bug reports about this, e.g.:

     

    Fantastic, but to be fair the single referenced bug report (of seemingly multiple bug reports, I understand) reports specifically that they could not set warning/critical temperature, not that other settings were also being reset. Perhaps someone should ask the poster or a mod to clarify the title and/or contents of that report, if it's an issue, as the title and contents directly exclude my issue due to their level of specificity and also not mentioning other settings disappearing. Thanks to my additional, seemingly extra report, the scope of the problem is a bit more clearly defined now, so I believe it has merit. Is there disagreement?

     

    EDIT: Hey also, if this is a "known issue" -- is there some manner of note somewhere? Did I miss a "Known Issues" thread or something? This could have saved me a considerable amount of effort and time, where should I look for your Known Issues in the future? I think I just figured out another problem I'm having (as mentioned) and would love to browse your Known Issues list. Maybe in the future I won't have to post at all, because this time I looked and didn't find anyone with SMART settings clearing EXCEPT the one you linked, which I already knew about but ignored because it seemed like it could be a __different__ issue..

    Edited by codefaux
    Link to comment
    4 hours ago, codefaux said:

    is there one you might recommend?

    It would be 6.8.3 as I believe the issue started with 6.9, but will depend on hardware, i.e. if your new MB needs a newer kernel etc.

     

    4 hours ago, codefaux said:

    I assume you meant disk.cfg and not disks.cfg, correct?

    Yes

     

    If you went straight to 6.9.2 from previous vers, then you could look at the changes file in the previous dir as below.

     

    root@unraid:/boot/previous# cat changes.txt 
    ### Version 6.8.2 2020-01-26

    Link to comment
    13 hours ago, SimonF said:

    It would be 6.8.3 as I believe the issue started with 6.9, but will depend on hardware, i.e. if your new MB needs a newer kernel etc.

     

    Yes

     

    If you went straight to 6.9.2 from previous vers, then you could look at the changes file in the previous dir as below.

     

    root@unraid:/boot/previous# cat changes.txt 
    ### Version 6.8.2 2020-01-26

     

    Thank you very much. I'm using a pair of Xeon X5670 CPUs -- nothing I have is 'new' enough to need modern support..frankly it's become clear that it's quite the opposite, in my case.

     

    Looks like 6.8.2 lives in the /previous folder. My understanding - and I'm being paranoid-cautious about this because of how much time I've blown on this so far - is that I should back up the existing files to, say, backup-6.9.2 and then copy the files from /boot/previous into /boot and all should be well, correct? Will I need to downgrade plugins? I assume there's no settings migration required, but obviously a full-USB backup beforehand would be wise. 
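
    Spelled out as a rough sketch, the manual swap described above might look like this. The helper name revert_to_previous is made up, and it assumes every regular file in previous/ belongs in the flash root, which is worth checking before running anything like it on a real flash drive:

```shell
# Hypothetical helper: for each file in <boot>/previous, save the copy
# currently in <boot> to a backup directory, then overwrite it.
# $1 = flash mount point (e.g. /boot), $2 = backup directory name
revert_to_previous() {
  local boot="$1" backup="$1/$2" f name
  if [ ! -d "$boot/previous" ]; then
    echo "no $boot/previous directory" >&2
    return 1
  fi
  mkdir -p "$backup" || return 1
  for f in "$boot"/previous/*; do
    [ -f "$f" ] || continue
    name=${f##*/}
    # Keep a copy of the file about to be overwritten, if there is one.
    if [ -f "$boot/$name" ]; then
      cp "$boot/$name" "$backup/$name" || return 1
    fi
    cp "$f" "$boot/$name" || return 1
  done
}

# Usage (after taking a full flash backup):
#   revert_to_previous /boot backup-current
```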

     

    Am I missing anything?

    Link to comment

    If you click on the flash drive from the Main window you will have an option to backup flash.

     

    image.thumb.png.74e7f8386fd325900ac955d5038dec03.png

     

    In Tools > Update OS you can revert to the previous version by clicking Restore.

     

    image.thumb.png.a276e5dc089a5d0af573a04cae1168f4.png

     

    The other thing you will need to do is put the cache drive back if you have one. As part of 6.9 the cache config gets moved into the multipool config files.

     

    
    Reverting back to 6.8.3

    If you have a cache disk/pool it will be necessary to either:

     - restore the flash backup you created before upgrading (you did create a backup, right?), or
     - on your flash, copy 'config/disk.cfg.bak' to 'config/disk.cfg' (restore 6.8.3 cache assignment), or
     - manually re-assign storage devices assigned to cache back to cache

    This is because to support multiple pools, code detects the upgrade to 6.9.0 and moves the 'cache' device settings out of 'config/disk.cfg' and into 'config/pools/cache.cfg'.  If you downgrade back to 6.8.3 these settings need to be restored.
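
    The second option in the release note (copying disk.cfg.bak back over disk.cfg) could be sketched as below. The function name and the disk.cfg.6.9 safety copy are illustrative additions, not something from the release notes:

```shell
# Restore the pre-6.9 cache assignment from disk.cfg.bak.
# $1 = config directory on the flash (defaults to /boot/config)
restore_cache_assignment() {
  local cfg="${1:-/boot/config}"
  if [ ! -f "$cfg/disk.cfg.bak" ]; then
    echo "no disk.cfg.bak in $cfg" >&2
    return 1
  fi
  # Keep the 6.9-era file under a hypothetical name before overwriting it.
  if [ -f "$cfg/disk.cfg" ]; then
    cp "$cfg/disk.cfg" "$cfg/disk.cfg.6.9" || return 1
  fi
  cp "$cfg/disk.cfg.bak" "$cfg/disk.cfg"
}

# Usage (with the array stopped):
#   restore_cache_assignment /boot/config
```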
    
     

     

    With regard to plugins, it depends which ones you use. Have you installed any new ones like Nvidia etc.? Those only work on 6.9.

    image.png

    Link to comment

    Oh, so much fantastic information, literally half an hour after I just jumped the gun and did as I said. Haha... I got home and had crashy stuff again, the whole docker macvlan thing which I've now discovered ISN'T my hardware dying slowly but is in fact a third issue I've had with the new 6.9 series, so I jumped RIGHT on reverting.

     

     

    The backup/revert options would have been nice, and I probably should've seen them, but I actually did just move the files around on the flash drive. It seems to have survived.

     

    You're right, Cache didn't re-mount after restart, and thus Docker didn't start, but I was able to figure that out pretty quickly since my Cache disk was listed next to my Docker disk in Unassigned devices. I just re-added it and started the array, nothing caught fire so far.

     

    I WAS using the Unraid-Kernel-Helper from Community Apps --

     

    I'm not sure if this is one of the "new ones like Nvidia etc" which you reference, but after a quick read it seems I should still be able to use it -- please let me know if I'm wrong.

     

    Thanks!

     

     

    Link to comment
    16 minutes ago, codefaux said:

    I should still be able to use it

    I know it used to support 6.8.3 but not sure if that has been deprecated now as plugins on 6.9 can install modules.

     

    What parts did you enable in @ich777's plugin?

    Link to comment
    2 minutes ago, SimonF said:

    I know it used to support 6.8.3 but not sure if that has been deprecated now as plugins on 6.9 can install modules.

     

    What parts did you enable in @ich777's plugin?

    I actually cannot seem to find how to install the Unraid Kernel Helper template from CA again, so it seems that may be the case.

     

    Honestly, I only remember installing the Docker container template, allowing it to compile, then copying the requisite files from the container to the boot partition. I don't think I used the plugin, or adjusted any options much.

     

    All I needed it for was NVidia support for the Docker, though. It appears there's still a prebuilt 6.8.3 available on that page, so I'll likely just use that and call it good enough -- unless that's not a good approach.

    Link to comment
    On 4/28/2021 at 7:04 AM, SimonF said:

    It would be 6.8.3 as I believe the issue started with 6.9, but will depend on hardware, i.e. if your new MB needs a newer kernel etc.

     

    I finally got around to re-configuring my SMART values after reverting days ago. Turns out the 6.8.x smart-one.cfg stores entries by diskX/parity/cache, and starting with 6.9 they are stored by disk ID (in my case, 1AMCC_etcetc as seen previously), so my handy one-liner only works with 6.9+. Other than that, almost everything is normal.

     

    After re-writing all of the configurables for SMART, I'm still getting one instance of the SG_IO message sporadically, instead of one instance per disk for 31 disks..

    image.png.e79af75d323042b4e3784a5e6a45712c.png

     

     

    As you can see it's not all that often. I've attached a diagnostic.zip this time, since it isn't over-filled with useless garbage.

     

    I SUSPECT the single message is related to my single unassigned device, but it is also configured correctly and reporting in the webUI so that also feels unlikely.

    image.thumb.png.15218cef368ed5cdb65b321f2cbfb249.png

     

    Honestly this is less of a "how do I fix this" and more "I really hope someone can explain this behavior because I am so curious" -- does anyone have a guess as to what's handled differently, internally? The unassigned device seems like the only one that would be handled out-of-band, yes?

     

    Clearly I can ignore that level of spam. I just like knowing why too. Naturally curious, and all that.

     

    Beyond that, no further needs. Thanks a ton guys.

    codefaux-tower-diagnostics-20210430-1739.zip

    Edited by codefaux
    Forgot to attach the damn ZIP derp
    Link to comment
    7 hours ago, codefaux said:

    I just like knowing why too. Naturally curious, and all that.

    Looking at the logs etc is /dev/sdp an Unassigned Disk? Smart is failing for it so likely that is the one producing the error. 

     

    smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
    Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

    Smartctl open device: /dev/sdp failed: AMCC/3ware controller, please try adding '-d 3ware,N',
    you may need to replace /dev/sdp with /dev/twlN, /dev/twaN or /dev/tweN
     

    Link to comment
    57 minutes ago, SimonF said:

    Smart is failing

    I mean, it probably was when I first booted or some such, but it shows in the webUI as valid.

    See below, lol

     

    image.thumb.png.b836706f12f9db33f0feaacd1dd19aeb.png

     

     

    And under Identity;

    image.thumb.png.0f7699f8ef5995f12dde70d196de6a45.png

     

     

    Interestingly, in the webUI, there's nowhere to put SMART configuration data for the Unassigned Device. I put it into smart-one.cfg by hand, using a little script I had written to reference which drives, and guessed that 'sdp' was the header it would look under given the URL patterns matched the smart-one.cfg header patterns... So, the webUI is using the information I gave it which it wasn't designed for, but something else in the system...isn't?

     

    I've uploaded a new diagnostic, taken between screenshots with the SMART data showing in the webUI. Huh.

     

    Well, that turned out to be more interesting than I expected. Thanks for humoring me.

    codefaux-tower-diagnostics-20210501-0127.zip

    Edited by codefaux
    Link to comment
    2 hours ago, codefaux said:

    SMART configuration data for the Unassigned Device.

    What do you get if you run

     

    cat /sys/block/sdp/queue/rotational 

     

    UD pre-6.9 tries to spin down spinners every 15 mins using hdparm. It looks at the value above to determine whether the device is an SSD or a spinner. I am guessing that because the RAID card may be reporting it as a spinner (value 1), UD is trying to spin it down.
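
    As a rough illustration of that check (not UD's actual code), reading the flag from sysfs might look like this. is_rotational is a made-up name, and the optional second argument exists only so the function can be pointed at a test directory:

```shell
# Succeed if the kernel reports the block device as rotational.
# $1 = device name without /dev/ (e.g. sdp), $2 = sysfs root (default /sys)
is_rotational() {
  local dev="$1" sysfs="${2:-/sys}"
  [ "$(cat "$sysfs/block/$dev/queue/rotational" 2>/dev/null)" = "1" ]
}

# Usage:
#   if is_rotational sdp; then echo "treated as a spinner"; fi
```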

     

    It also runs the following to get the drive temperature, which I think is the likely cause of the log entries.

     

    /usr/sbin/smartctl -n standby -A $dev | /bin/awk 'BEGIN{t="*"} $1=="Temperature:"{t=$2;exit};$1==190||$1==194{t=$10;exit} END{print t}'

     

    But it isn't doing the override for the controller with -d .......
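
    To see what the awk filter does on its own, here it is as a standalone function, plus how the controller override could be passed by hand. This is a sketch, not the plugin's code: parse_temp is a made-up name, and /dev/twa1 with -d 3ware,7 are simply the values for this particular disk.

```shell
# Read "smartctl -A" output on stdin and print the drive temperature:
# the value after "Temperature:", or the raw value (10th column) of
# attribute 190 or 194. Prints "*" if none of those lines is present.
parse_temp() {
  awk 'BEGIN{t="*"} $1=="Temperature:"{t=$2;exit} $1==190||$1==194{t=$10;exit} END{print t}'
}

# Usage, with the 3ware override supplied manually:
#   /usr/sbin/smartctl -n standby -A -d 3ware,7 /dev/twa1 | parse_temp
```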

     

    Do you get an error for hdparm -S180 /dev/sdp?

    Edited by SimonF
    Link to comment
    16 hours ago, SimonF said:

    cat /sys/block/sdp/queue/rotational 

     

    root@Tower:~# cat /sys/block/sdp/queue/rotational
    1

    You are correct in this much, sir. The RAID controller is definitely reporting it as a spinner, despite it being distinctly non-rotational.

     

     

     

    16 hours ago, SimonF said:

    /usr/sbin/smartctl -n standby -A $dev | /bin/awk 'BEGIN{t="*"} $1=="Temperature:"{t=$2;exit};$1==190||$1==194{t=$10;exit} END{print t}'

     

    Manually running that (replacing $dev and removing everything after the pipe) I get an error, but no new log entry.

    root@Tower:~# /usr/sbin/smartctl -n standby -A /dev/sdp
    smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
    Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
    
    Smartctl open device: /dev/sdp failed: AMCC/3ware controller, please try adding '-d 3ware,N',
    you may need to replace /dev/sdp with /dev/twlN, /dev/twaN or /dev/tweN

    I'd paste the kernel log around that time, but it's difficult to capture a lack of a change, lol. I also ran it with proper parameters (/dev/twa1 -d 3ware,7) and it returns normal data, still no kernel log message.

     

     

     

     

    16 hours ago, SimonF said:

    hdparm -S180 /dev/sdp

    root@Tower:~# hdparm -S180 /dev/sdp
    
    /dev/sdp:
     setting standby to 180 (15 minutes)
     HDIO_DRIVE_CMD(setidle) failed: Invalid argument

    Well that gave AN error message in the kernel log, but not the one I'm seeing spammed.

    [181527.479504] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.

     

     

     

     

    Link to comment

    "/usr/sbin/hdparm -C $dev 2>/dev/null | /bin/grep -c standby"

     

    is used for checking if the drive is spun down.

     

    Try uninstalling the UD plugin just to see if the errors go away.

    Link to comment
    On 5/2/2021 at 12:08 AM, SimonF said:

    "/usr/sbin/hdparm -C $dev 2>/dev/null | /bin/grep -c standby"

     

     

    Two lines at the exact same time, both a different message than the logspam.

    [364448.237758] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
    [364448.238036] 3w-9xxx: scsi2: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.

     

     

    On 5/2/2021 at 12:08 AM, SimonF said:

    uninstalling the UD plugin

    If you mean the Unassigned Devices plugin, I'm using that. Stores some of my Docker container volumes. I could likely work around it temporarily if you really suspect that's the cause.

    Link to comment

    I just checked your poll timer in Disk Settings and it is set to 5 mins. Are you able to change it to, say, 10 mins to see if the log entries then happen every 10 mins?
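
    One way to check whether the messages follow the poll timer is to print the time between bursts of the SG_IO message. A minimal sketch, assuming the bracketed kernel timestamps seen earlier in this thread; sgio_gaps is a made-up name:

```shell
# Read kernel log lines on stdin and print the gap, in whole seconds,
# between successive "deprecated SCSI ioctl" messages whenever that gap
# is at least $1 seconds (default 60), so lines within a burst are skipped.
sgio_gaps() {
  awk -v min_gap="${1:-60}" '
    /deprecated SCSI ioctl/ {
      if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
        # Drop the surrounding brackets and convert to a number.
        ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
        if (seen && ts - last >= min_gap) printf "%.0f\n", ts - last
        last = ts
        seen = 1
      }
    }'
}

# Usage:
#   dmesg | sgio_gaps 60
```

    If the gaps cluster around 300 with the timer at 5 mins and move to around 600 after changing it to 10 mins, that would point at the polling.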

     

     

    Link to comment



