Posts posted by rsbuc

  1. Hello! I was having an issue before where my incremental parity checks were not reading the disk temperatures correctly when the disks had spun down (they were reporting "Temp=*").

     

    I have updated to the latest version of the Parity Tuning Script, and now the script doesn't appear to be collecting/detecting the disk temperature at all anymore.

     

    Here is a snippet from the syslog (with Testing logs enabled):

     

    ***

     

     

    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ----------- MONITOR begin ------
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR /boot/config/forcesync marker file present
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR manual marker file present
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR parityTuningActive=1, parityTuningPos=886346616
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR appears there is a running array operation but no Progress file yet created
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... appears to be manual parity check
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR MANUAL record to be written
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR Current disks information saved to disks marker file
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR written header record to  progress marker file
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... appears to be manual parity check
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR written MANUAL record to  progress marker file
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR Creating required cron entries
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Created cron entry for scheduled pause and resume
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Created cron entry for 6 minute interval monitoring
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   updated cron settings are in /boot/config/plugins/parity.check.tuning/parity.check.tuning.cron
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR CA Backup not running, array operation paused
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... no action required
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR global temperature limits: Warning: 50, Critical: 55
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR plugin temperature settings: Pause 3, Resume 8
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   array drives=0, hot=0, warm=0, cool=0, spundown=0, idle=0
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Array operation paused but not for temperature related reason
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ----------- MONITOR end ------
     

    ***

     

    The Parity Check Tuning log clearly shows "Warm=0, Cool=0, Spundown=0", but there are several disks above 55°C.

     

    And here's a screenshot of the disk temps in the web UI.

     

    (thanks again for reading this message)

    2023-03-11_diskTemps_Screenshot 2023-03-11 133428.png

  2. 2 hours ago, itimpi said:

    The plugin always treats the case where the temperature is returned as '*' due to spindown as 'cool' so there needs to be something else going on.   I will see if I can work out what it is from the syslog you provided.

    No worries, I appreciate the effort. If you'd like more info, let me know.

  3. On 12/16/2022 at 8:16 AM, itimpi said:

    If you think the plugin is not correctly resuming when drives cool down, then perhaps you can try turning on the Testing level of logging in the plugin and sending me the resulting logs as that will allow me to see the fine detail of what the plugin is doing under the covers.   Testing the temperature related stuff is extremely tricky as my systems do not suffer from heat issues so I have to artificially try to set up tests to simulate temperature issues.

    I've finally had a few mins to test this out with the TESTING log mode enabled. I think you were hinting at what I've seen.

     

    When the array goes into 'overheat mode' and the parity check pauses, the disks eventually spin down and the temperature value in the log goes to "Temp=*" instead of showing an actual temperature value. As a result, the Parity Check Tuning script doesn't see a valid numerical temperature to resume the parity check process.
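
    Just to illustrate what I mean (this is my own rough sketch, not the plugin's actual code, and using smartctl is just my assumption about how temperatures get read): if you poll a spun-down drive while telling smartctl not to wake it, no temperature comes back at all, so any script relying on a numeric value has nothing to compare against the resume threshold.

    ****

    #!/bin/bash
    # Rough illustration only -- not the plugin's real logic.
    # Poll each drive without waking it; a spun-down drive returns no
    # temperature attribute, so the value comes back empty ("*"/unknown).
    for dev in /dev/sd?; do
        # -n standby makes smartctl exit early if the drive is spun down
        temp=$(smartctl -n standby -A "$dev" 2>/dev/null \
               | awk '/Temperature_Celsius/ {print $10; exit}')
        if [ -z "$temp" ]; then
            echo "$dev: spun down, no temperature reported"
        else
            echo "$dev: ${temp} C"
        fi
    done

    ****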

     

    After waiting ~12 minutes, I manually clicked 'spin up disks', and 6 minutes later the parity check process resumed, as the plugin was able to see the temperature values once the disks were spun up.

     

    I'm attaching my syslog.

    syslog.txt

  4. 2 hours ago, itimpi said:

    This is the basic behaviour as long as the time is within the overall time slot set for an increment.   How long a temperature related pause will last depends on how quickly your drives cool down to reach the resume temperature threshold.  The plugin will take into account if you have set specific temperature threshold settings at the Unraid level on a drive over-riding the global ones.  You may find the Debug logging level helps with a basic understanding of what the plugin is doing without having to know too much detail of the underlying mechanisms being used.

     

    Once you get outside the time slot for the overall increment then the plugin will pause the check and the temperature related pause/resume will stop happening (until the time comes around to start the next increment).

     

    If I can provide any further clarification then please ask.  As a new user if you can think of items I could add to the built-in help that would have helped you then please feel free to suggest them.

    Interesting -- I've enabled Debug logging, and that totally demystifies a lot of what the plugin is doing (thanks for that). Here is what I'm seeing (I'm sure I have a bad setting or something): I start the parity check, it runs for an hour or so, then the hard drives hit their temperature limit and the parity check pauses. The drives spin down and cool off, but the plugin doesn't seem to resume the parity operations.

     

    If I "Spin up all disks" it will detect the drive temperatures as being cool again and resume the parity check.

     

    Are there special disk settings that I need to enable for this to work properly?

    (Also, thanks again for trying to help me out!)

    Screenshot 2022-12-15 152350.png

    Screenshot 2022-12-15 152505.png

  5. 17 hours ago, itimpi said:

    I think you are over-thinking this!     You only want to set the increment pause/resume times to define the maximum time period you want the parity check to potentially run.

     

    You then set the temperature related pause resume values and as long as you are within the increment period the plugin will pause/resume the check based on disk temperatures.    You may also want to have aggressive spin down times on the drives as experience has shown that simply keeping them spinning even if no I/O is taking place significantly extends the cool down time.

    Hello! Am I understanding this correctly? The plugin will pause the parity operation when the disks reach the temperature threshold and wait until the temps fall below the threshold value -- then does the script immediately resume the parity operations, or will it only attempt to resume on the 'Increment resume time' schedule?

  6. Hey Everyone! I've been trying to get the "Increment Frequency/Custom" working for what I need, but I'm struggling.

     

    I have cooling issues with my Unraid server, and my goal is to have the parity check 'Pause when disks overheat', then have the Custom increment frequency keep the parity operation paused for ~30 mins to let the disks cool down, and then resume (or at least check whether the disks have cooled down enough before resuming).

     

    Clearly my cron skills are weak. Is there an "Increment Resume Time" and "Increment Pause Time" combination that someone can suggest?
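
    For example, this is roughly what I was picturing, assuming the Custom increment frequency takes standard 5-field cron expressions (that part is just my guess):

    ****

    # Guesses only -- standard 5-field cron syntax (minute hour day month weekday).
    # "Increment Pause Time": pause the check on the hour...
    0 * * * *
    # "Increment Resume Time": ...and try to resume 30 minutes later,
    # giving the disks roughly 30 minutes to cool down each cycle.
    30 * * * *

    ****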

     

    (thanks again for all the awesome features in the Parity Check Tuning plugin!)

     

  7. Write errors during array expansion/rebuild

     

    Hey guys, first let me say that I've been an Unraid user for several years. I've had to visit the forum on numerous occasions and have usually found solutions (or others with similar problems) for my issues (thank you all).

     

    But this time I haven't found an identical issue.

     

    I am running Unraid 6.0-rc3 in my Norco 24-bay chassis (with an off-the-shelf Intel/Asus CPU and motherboard).

     

    Here is what has happened so far.

     

    I decided to swap out disk17 in my Unraid to upgrade its size; I was going from a 6TB to an 8TB drive. The 8TB drive I had pre-cleared ~6 times without any issues.

     

    I stopped the array as I normally would, removed disk17, waited ~30 secs, installed the new 8TB disk into the disk17 position, and waited ~30 secs for the drive to be detected.

     

    I selected the new 8TB disk in the Unraid UI (in the disk17 position), chose 'Rebuild/Expand', and started the array.

     

    The array began rebuilding, and the following morning (today) the rebuild had completed, but instead of all of the disks having a 'green ball' next to them, disk17 had a 'gray triangle'. Checking the syslog, I found a bunch of 'write errors'.

     

    I stopped the array, reseated disk17 in the array, and rebooted Unraid.

     

    Unraid started, but disk17 still has a 'gray triangle' on it - and in my dashboard view, I have a 'Data is Invalid' warning at the bottom.

     

    What should I try next?

     

    Attached is my syslog and a couple of screenshots.

    syslog.zip

    Capture1.JPG

    Capture2.JPG

  8. That is correct. My initial plan was to replace an existing 4TB disk with a 6TB disk, but numerous write errors caused the expansion/rebuild to fail.

     

    When I reseated the 6TB disk, I tried to restart the array with the same 6TB disk that had the write errors. The array started with that drive showing "unformatted", and it began rebuilding the parity disk for the array with disk19 "missing" (since it was showing unformatted).

     

    I didn't catch it until it had already started to rebuild parity.

     

    So I stopped the array, replaced disk19 with a fresh 6TB disk, formatted it, and started the array (parity) rebuild.

  9. Hey guys, just figured I'd post an update on my issue, in case anyone runs into it in the future.

     

    I stopped the array, reseated the drive, and started the array again. Unraid detected the drive, but detected it as "Unformatted", so it rebuilt parity for the array with a missing/blank disk19.

     

    I stopped the array rebuild, replaced disk19 with a new disk, formatted disk19, and rebuilt the array again (which regenerated the parity with a blank disk19).

     

    Everything appears to be fine, but I will have to copy the data from the old disk19 to the new disk19.

     

    Thanks again for all the help!

    rsbuc

  10. Sadly no, I didn't run a pre-clear on it (I think I learned a valuable lesson about skipping the preclear).

     

    I can't seem to run a SMART test on it. When I 'spin up all drives', every drive except disk19 goes 'green ball', but disk19 stays as a gray triangle.

     

    When I try to run a SMART test on it, it says that the drive needs to be spun up (but disk19 will not spin up).
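
    For anyone who finds this later, this is what I was attempting from the console (standard smartctl usage, nothing Unraid-specific; sdX is just a placeholder for disk19's device node). It still requires the drive to actually spin up, which is where mine gets stuck:

    ****

    smartctl -i /dev/sdX          # confirm the drive identifies at all
    smartctl -t short /dev/sdX    # kick off a short self-test
    smartctl -a /dev/sdX          # read the attributes and self-test results afterwards

    ****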

     

  11. Hey guys, I've been running Unraid for quite some time, and I couldn't be happier; it's been great.

     

    My system info is:

    Unraid V6.0-rc3

    24-bay Norco chassis, 21 disks

     

    I tried to replace one of my existing 4TB drives (disk19) with a 6TB drive (which I've done numerous times before). This time, when I restarted the array to start the expansion, the rebuild began, but when I checked on it a couple of hours later the rebuild had stopped, the newly replaced disk had a gray triangle on it, and my 'Parity status' was 'Data is invalid'. The array is started, and it's serving data fine.

     

    When I check the logs I see...

     

    ****

     

    Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#4 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee 7a 08 00 00 04 00 00 00

    Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042160648

    Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00

    Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#5 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee 76 08 00 00 04 00 00 00

    Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042159624

    Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164680

    Dec 17 11:22:38 UNRAID kernel: md: md_do_sync: got signal, exit...

    Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164688

    Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00

    Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164696

    Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#6 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee 72 08 00 00 04 00 00 00

    Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042158600

    Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164704

    Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164712

    Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164720

    Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164728

    Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164736

    Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00

    Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164744

    Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#7 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee aa 08 00 00 04 00 00 00

    Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042172936

    Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164752

     

    ****

    and the "disk19 write error, sector=....." continues for numerous pages.

     

    It looks like disk19 went bad during the expansion; what is my suggested next step here?

    Am I safe to stop the array, remove disk19, and replace it with another 6TB drive?

    Will it rebuild correctly?

     

    Any help would be appreciated!

  12. Hey BDHarrington, in my attempt to troubleshoot my issue I ended up disabling all of my plugins, but I still had strange/random lockups about once a week (followed by a lengthy consistency check).

     

    I considered dropping my system memory down to 4GB (from 16GB) and doing a fresh install without any plugins, but just as I was about to remove the DIMMs, version 6.0b3 was released, so instead I installed 6.0b3 and it's been working great. Previously my uptime wouldn't be longer than a week without a strange lockup; now it's 64 days of uptime.

     

    On a side note, I have since moved my Plex service to a VM on my Hyper-V host. I couldn't be happier with v6.0b3!

  13. Thanks guys for all the replies. I thought it might have been the 'OpenSSH' plugin since that was the most recent one I installed, but after 4 days of running with OpenSSH disabled, the same Out of Memory errors popped back up.

     

    I then tried Tony's suggestion (echo 65536 > /proc/sys/vm/min_free_kbytes), which seems to be limiting the maximum amount of memory that Linux is using (before the change, Unraid would hover around 12GB of memory usage; now it's hovering around 7GB).
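
    For anyone following along, this is the change I applied. The sysctl line is just the equivalent way of setting it, and putting the command in /boot/config/go so it survives a reboot is my understanding of the usual Unraid approach (corrections welcome):

    ****

    # one-off, takes effect immediately (what I ran):
    echo 65536 > /proc/sys/vm/min_free_kbytes

    # equivalent form via sysctl:
    sysctl -w vm.min_free_kbytes=65536

    # to persist across reboots, I added the echo line to /boot/config/go
    ****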

     

    But the problems are still here; I'll try disabling more plugins to see which one is causing my pains.

     

    Unless anyone has any other ideas?

  14. Hey guys, first let me say thanks for all the help. I've been able to search through the forums for most of the other issues I've encountered and found enough help to resolve them.

     

    My setup is as follows:

     

    unRAID Version: unRAID Server Pro, Version 5.0

    Motherboard: ASUSTeK COMPUTER INC. - P8Z77-V LK

    Processor: Intel® Core i3-2120 CPU @ 3.30GHz - 3.3 GHz

    Cache: L1-Cache = 128 kB (max. 128 kB)

    L2-Cache = 512 kB (max. 512 kB)

    L3-Cache = 3072 kB (max. 3072 kB)

    Memory: 16384 MB (max. 32 GB)

    BANK 0 = 4096 MB, 1333 MHz

    BANK 1 = 4096 MB, 1333 MHz

    BANK 2 = 4096 MB, 1333 MHz

    BANK 3 = 4096 MB, 1333 MHz

    Network: eth0: 1000Mb/s - Full Duplex

    Uptime: 4 days, 7 hrs, 5 mins, 15 secs

     

    Running plugins

     

    Native Plugins

    OpenSSH

    Plex Media Server

     

     

    Unmenu Plugins

    Plex

    bwm-ng

    htop

    lsof

    unraid status alert email

    monthly parity check

     

     

    Recently (maybe over the past 2-3 weeks) I've been having an issue while playing back media through Plex Media Server, where the content will start to stream and then, after a few minutes, the stream will stop. When I try to restart it, Plex Media Server is no longer running.

     

    I've checked my syslog, and it shows an 'out of memory' condition followed by it killing PMS

     

    ****

    Dec  3 13:53:46 STORSERV kernel: Out of memory: Kill process 11279 (Plex Media Serv) score 2 or sacrifice child

    Dec  3 13:53:46 STORSERV kernel: Killed process 11279 (Plex Media Serv) total-vm:316424kB, anon-rss:33892kB, file-rss:11112kB

    ****

     

    I'm sure something is leaking memory, but I'm not savvy enough to identify which plugin it is.

     

    I've tried the SwapFile plugin without success, and I've tried reinstalling some of the plugins without any change either.

     

    I can usually just restart the Plex Media Server and it'll come back online, but I'd prefer to find the plugin that's causing the problem rather than treat the symptom.

     

    I'll attach my full syslog as well, in case that will help.

     

     

    vmstat shows

    ****

    Linux 3.9.6p-unRAID.

    root@STORSERV:~# vmstat

    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----

    r  b  swpd  free  buff  cache  si  so    bi    bo  in  cs us sy id wa

    2  1      0 3484316 145364 12294496    0    0    9    1  17  21 10  7 73 11

     

    ****

     

    free -l shows

    ****

    root@STORSERV:~# free -l

                total      used      free    shared    buffers    cached

    Mem:      16542820  13750824    2791996          0    145800  12984332

    Low:        768168    694676      73492

    High:    15774652  13056148    2718504

    -/+ buffers/cache:    620692  15922128

    Swap:            0          0          0

     

    ****
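
    To try to narrow down which plugin/process is actually eating the memory, I'm planning to log the top consumers periodically with something like this (just a rough approach on my side, written to the flash drive so it survives a crash):

    ****

    #!/bin/bash
    # Log the top memory users and the low-memory counters every 10 minutes,
    # so I can see what has grown when the next out-of-memory kill happens.
    while true; do
        date >> /boot/memlog.txt
        ps aux --sort=-rss | head -n 10 >> /boot/memlog.txt
        grep -E 'MemFree|LowFree' /proc/meminfo >> /boot/memlog.txt
        sleep 600
    done

    ****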

     

    Any help would be greatly appreciated!

    Thanks

    rs

    syslog-2013-12-03.zip

  15. Is there anything in common between the drives that exhibit this problem, for example on the same controller?  I guess you're using all the PCI-E slots on that motherboard for disk controllers, correct?  It's possible there is an issue with a disk controller being plugged into a certain slot.

     

    I have 21 disks, all on the SuperMicro PCIe controller cards. Disk1 (the original disk in the post) was on Controller1 (port0), Disk17 is on Controller3 (port1), and Disk8 is on Controller1 (port1).

     

    The 5 failures I've had on these SV35.5 drives have happened across almost all controllers and all ports. Nothing follows a pattern, except that all of the drives that have exhibited this behavior are the same model.

     

    I'm wondering if these drives issue strange SMART responses, and are being falsely detected as failing...

  16. So I've caught it doing it again. I checked the server today; everything was green-balled (but the drives were sleeping). I checked the syslog, and it was reporting the same errors as before:

     

    Jan  9 08:47:12 STORE emhttp: mdcmd: write: Input/output error

    Jan  9 08:47:12 STORE kernel: mdcmd (276): spindown 17

    Jan  9 08:47:12 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5

    Jan  9 08:47:22 STORE emhttp: mdcmd: write: Input/output error

    Jan  9 08:47:22 STORE kernel: mdcmd (277): spindown 17

    Jan  9 08:47:22 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5

    Jan  9 08:47:32 STORE emhttp: mdcmd: write: Input/output error

    Jan  9 08:47:32 STORE kernel: mdcmd (278): spindown 17

    Jan  9 08:47:32 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5

    Jan  9 08:47:42 STORE emhttp: mdcmd: write: Input/output error

    Jan  9 08:47:42 STORE kernel: mdcmd (279): spindown 17

    Jan  9 08:47:42 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5

     

    These were the same messages it was spamming before when the drive failed. Previously I stopped the array, and that caused the drive to redball; this time I decided to 'Spin up all disks' to see what it would do.

     

    Then it displayed (sdy is disk17)

     

    Jan  9 20:08:19 STORE kernel: md: disk17: ATA_OP e3 ioctl error: -5

    Jan  9 20:08:19 STORE kernel: mdcmd (4331): spinup 20

    Jan  9 20:08:19 STORE ata_id[20757]: HDIO_GET_IDENTITY failed for '/dev/sdy'

    Jan  9 20:08:19 STORE kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO

     

    All drives spun up and showed a solid green ball; shortly after that, Disk17 redballed and then Disk8 also redballed.

     

    Jan  9 20:30:16 STORE kernel: mdcmd (4390): spinup 20

    Jan  9 20:30:25 STORE kernel: md: disk8 read error

    Jan  9 20:30:25 STORE kernel: handle_stripe read error: 36112/8, count: 1

    Jan  9 20:30:25 STORE kernel: Buffer I/O error on device md8, logical block 4514

    Jan  9 20:30:25 STORE kernel: lost page write due to I/O error on md8

    Jan  9 20:30:25 STORE kernel: md: disk8 read error

    Jan  9 20:30:25 STORE kernel: handle_stripe read error: 36120/8, count: 1

    Jan  9 20:30:25 STORE kernel: Buffer I/O error on device md8, logical block 4515

    Jan  9 20:30:25 STORE kernel: lost page write due to I/O error on md8

     

    I powered the Unraid box down and back up. Disk8 was detected (and shows green), and so was Disk17 (and shows red). I have replaced disk17 with a fresh 3TB Seagate SV35.5 drive, and the array is rebuilding.

     

    Any suggestions or ideas? I went through my notes, and this has actually happened 5 times so far, each time on a Seagate 3TB SV35.5 drive.

     

    I have posted the syslog; I cut out a bunch of the repeated error messages just to keep the log small for posting.

     

    Thanks!

    syslog.20130109.txt

  17. Hey guys, I've been having a weird issue with my Unraid box, where (seemingly at random) a drive (always model Seagate 3TB SV35.5 "ST3000VX000") will start issuing errors.

     

    Syslog will start to show something like...

     

    kernel: md: disk1: ATA_OP e0 ioctl error: -5

    mdcmd: write: Input/output error
    kernel: mdcmd (121)

     

    At this point, the drive is still green-balled and reporting no issues in the web interface. If I stop the array, the drive will redball and say that it's missing (and it's missing from the drop-down menu).

     

    Rebooting the system will detect the missing drive (it shows in the drop-down), but the array is stopped, saying that the disk is DISK_DSBL.

     

    This has happened twice before, each time with a different Seagate ST3000VX000 (3TB) drive (each drive was connected to a different controller/cable/power connector). I replaced the drive each time, the array rebuilt it, and I never thought too much of it.

     

    But this is the third time, so I figured I need to look into it.

     

    My configuration is:

     

    Unraid Pro ver: 5.0-rc8a, installed on a Kingston DT_101 8GB USB drive

    Motherboard: Asus P8Z77-V LK

    CPU: Intel i3-2120

    Memory: 16GB (4x4GB)

    SAS Controllers: 3x Supermicro AOC-SAS2LP-MV8

    Power supply: 850W PSU

     

    I have a mix of drives in my machine, mostly Seagates, but only a handful of 3TB SV35.5s.

     

    The SMART status shows the following:

     

     

    smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

    Home page is http://smartmontools.sourceforge.net/

     

    === START OF INFORMATION SECTION ===

    Device Model:    ST3000VX000-9YW166

    Serial Number:    [cut]

    Firmware Version: CV13

    User Capacity:    3,000,592,982,016 bytes

    Device is:        Not in smartctl database [for details use: -P showall]

    ATA Version is:  8

    ATA Standard is:  ATA-8-ACS revision 4

    Local Time is:    Mon Jan  7 16:46:10 2013 EST

    SMART support is: Available - device has SMART capability.

    SMART support is: Enabled

     

    === START OF READ SMART DATA SECTION ===

    SMART overall-health self-assessment test result: PASSED

     

    General SMART Values:

    Offline data collection status:  (0x00) Offline data collection activity

    was never started.

    Auto Offline Data Collection: Disabled.

    Self-test execution status:      (  0) The previous self-test routine completed

    without error or no self-test has ever

    been run.

    Total time to complete Offline

    data collection: ( 592) seconds.

    Offline data collection

    capabilities: (0x73) SMART execute Offline immediate.

    Auto Offline data collection on/off support.

    Suspend Offline collection upon new

    command.

    No Offline surface scan supported.

    Self-test supported.

    Conveyance Self-test supported.

    Selective Self-test supported.

    SMART capabilities:            (0x0003) Saves SMART data before entering

    power-saving mode.

    Supports SMART auto save timer.

    Error logging capability:        (0x01) Error logging supported.

    General Purpose Logging supported.

    Short self-test routine

    recommended polling time: (  1) minutes.

    Extended self-test routine

    recommended polling time: ( 255) minutes.

    Conveyance self-test routine

    recommended polling time: (  2) minutes.

    SCT capabilities:       (0x10b9) SCT Status supported.

    SCT Feature Control supported.

    SCT Data Table supported.

     

    SMART Attributes Data Structure revision number: 10

    Vendor Specific SMART Attributes with Thresholds:

    ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

      1 Raw_Read_Error_Rate    0x000f  116  099  006    Pre-fail  Always      -      115549888

      3 Spin_Up_Time            0x0003  094  093  000    Pre-fail  Always      -      0

      4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      278

      5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0

      7 Seek_Error_Rate        0x000f  072  060  030    Pre-fail  Always      -      15450467

      9 Power_On_Hours          0x0032  099  099  000    Old_age  Always      -      1169

    10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0

    12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      25

    184 Unknown_Attribute      0x0032  100  100  099    Old_age  Always      -      0

    187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0

    188 Unknown_Attribute      0x0032  100  100  000    Old_age  Always      -      0

    189 High_Fly_Writes        0x003a  001  001  000    Old_age  Always      -      159

    190 Airflow_Temperature_Cel 0x0022  067  056  045    Old_age  Always      -      33 (Lifetime Min/Max 25/43)

    191 G-Sense_Error_Rate      0x0032  100  100  000    Old_age  Always      -      0

    192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      17

    193 Load_Cycle_Count        0x0032  100  100  000    Old_age  Always      -      1919

    194 Temperature_Celsius    0x0022  033  044  000    Old_age  Always      -      33 (0 21 0 0)

    197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

    198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0

    199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0

     

    SMART Error Log Version: 1

    No Errors Logged

     

    SMART Self-test log structure revision number 1

    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

    # 1  Short offline      Completed without error      00%      1169        -

     

    SMART Selective self-test log data structure revision number 1

    SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

        1        0        0  Not_testing

        2        0        0  Not_testing

        3        0        0  Not_testing

        4        0        0  Not_testing

        5        0        0  Not_testing

    Selective self-test flags (0x0):

      After scanning selected spans, do NOT read-scan remainder of disk.

    If Selective self-test is pending on power-up, resume after 0 minute delay.

     

    Does anyone have any ideas?

    smart.log.txt

  18. And as strangely as this problem started, after swapping all hardware (except the drives, the controller cards, and the backplanes), a motherboard BIOS flash appears to have fixed this issue, or at least it has stopped locking up for now.

     

    On a side note, the Asus P8Z77-V LK motherboard has some weird behavior with the Super Micro AOC-SAS2LP-MV8 cards.

     

    If you use one of the cards in the 3rd PCIe x16 slot (which will normally function at x2 due to chipset limitations) and then force that slot to x4 in the BIOS (which disables one of the PCIe x1 slots), the performance on that slot drops to almost nothing, like 2-4MB/s write performance. When I undo the change and drop the slot back to the default x2 speed, I get 50-60MB/s writes.

     

    Thanks for all the help guys!
