rsbuc

Everything posted by rsbuc

  1. Sure! See attached. This log shows my array going from normal temps to 57C, which is above my CRIT level. syslog.txt
  2. Hello! I was having an issue before where my incremental parity checks were not reading the disk temperatures correctly once the disks had spun down (they were reporting "=*"). I have updated to the latest version of the Parity Check Tuning script, and now the script doesn't appear to be collecting/detecting the disk temperatures at all anymore. Here is a snippet from the syslog (with Testing logs enabled):
***
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ----------- MONITOR begin ------
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR /boot/config/forcesync marker file present
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR manual marker file present
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR parityTuningActive=1, parityTuningPos=886346616
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR appears there is a running array operation but no Progress file yet created
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... appears to be manual parity check
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: Manual Correcting Parity-Check
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR MANUAL record to be written
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR Current disks information saved to disks marker file
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR written header record to progress marker file
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... appears to be manual parity check
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR written MANUAL record to progress marker file
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR Creating required cron entries
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: Created cron entry for scheduled pause and resume
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: Created cron entry for 6 minute interval monitoring
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: updated cron settings are in /boot/config/plugins/parity.check.tuning/parity.check.tuning.cron
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR CA Backup not running, array operation paused
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... no action required
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR global temperature limits: Warning: 50, Critical: 55
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR plugin temperature settings: Pause 3, Resume 8
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: array drives=0, hot=0, warm=0, cool=0, spundown=0, idle=0
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: Array operation paused but not for temperature related reason
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ----------- MONITOR end ------
***
The plugin clearly reports "drives=0, hot=0, warm=0, cool=0, spundown=0", yet there are several disks above 55C. I'm also attaching a screenshot of the disk temps in the webUI. I pulled those lines out of the syslog with the grep below. (Thanks again for reading this message.)
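For anyone following along, this is just plain grep against the stock /var/log/syslog location; adjust the path if your syslog lives elsewhere:
****
# Pull only the plugin's lines out of the syslog and show the latest batch
grep 'Parity Check Tuning' /var/log/syslog | tail -40
****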
  3. No worries, I appreciate the effort. If you'd like more info, let me know.
  4. I've finally had a few minutes to test this out with the TESTING log mode enabled, and I think you were hinting at what I've seen. When the array goes into 'overheat mode' and the parity check pauses, the disks eventually spin down, and the temperature value in the log changes to "Temp=*" instead of an actual number, so the Parity Check Tuning script never sees a valid numerical temperature to trigger a resume. After waiting ~12 minutes, I manually clicked 'Spin up all disks', and 6 minutes later the parity check resumed, since the script could read the temperature values once the disks were spun up. I'm attaching my syslog; a sketch of how I understand the failing check is below. syslog.txt
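To make sure I'm describing it right, here's a minimal sketch of the resume check as I understand it. This is my guess, not the plugin's actual code; it assumes temps can be read from Unraid's /var/local/emhttp/disks.ini state file (where spun-down drives show temp="*"), and the threshold arithmetic is assumed too:
****
#!/bin/bash
# Hypothetical resume check (my guess at the logic, not the plugin's code):
# resume only when every drive reads below the threshold. Spun-down drives
# show temp="*" in disks.ini, which fails the numeric test, so the check
# stalls until something spins the drives back up.
RESUME_TEMP=47   # assumed: critical limit (55) minus plugin resume setting (8)

while read -r temp; do
  if [ "$temp" = "*" ]; then
    echo "a spun-down drive has no reading -- resume blocked"
    exit 1
  elif [ "$temp" -ge "$RESUME_TEMP" ]; then
    echo "a drive is still at ${temp}C -- resume blocked"
    exit 1
  fi
done < <(grep '^temp=' /var/local/emhttp/disks.ini | tr -d '"' | cut -d= -f2)
echo "all drives cool -- safe to resume"
****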
  5. Interesting. I've enabled Debug logging, and that totally demystifies a lot of what the plugin is doing (thanks for that). Here is what I'm seeing (I'm sure I have a bad setting or something): I start the parity check, it runs for an hour or so, then the hard drives hit their temperature limit and the parity check pauses. The drives spin down and cool off, but the plugin doesn't resume the parity operation. If I 'Spin up all disks', it detects the drive temperatures as being cool again and resumes the parity check. Are there special disk settings that I need to enable for this to work properly? (Also, thanks again for trying to help me out!)
  6. Hello! Am I understanding this correctly? The plugin will pause the parity operation when the disks reach the temperature threshold and wait until the temps fall back below the threshold value -- and then will the script immediately resume the parity operation, or will it only attempt to resume on the 'Increment resume time' schedule?
  7. Hey everyone! I've been trying to get "Increment Frequency/Custom" working for what I need, but I'm struggling. I have cooling issues with my Unraid box, and my goal is to let the parity check 'Pause when disks overheat', have the custom increment frequency keep the parity operation paused for ~30 minutes so the disks can cool down, and then resume (or at least check whether the disks have cooled enough before resuming). Clearly my cron skills are weak -- is there an 'Increment Resume Time' and 'Increment Pause Time' that someone can suggest? My best guess so far is below. (Thanks again for all the awesome features in the Parity Check Tuning plugin!)
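Here's what I was going to try, assuming the custom fields take standard 5-field cron expressions (that's an assumption on my part -- please correct me if the plugin expects something else):
****
# Guessed custom increment entries (standard 5-field cron syntax assumed).
# Run for the first half of each hour, pause for the second half,
# giving overheated disks ~30 minutes to cool between increments.
#
# Increment Resume Time: top of each hour
0 * * * *
# Increment Pause Time: half past each hour
30 * * * *
****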
  8. Thanks for the advice. I upgraded to the latest version, and I've been able to swap out 3 disks so far with larger disks, and I received no errors on rebuild. Maybe it was a funny driver version for my SAS cards in that specific version of Unraid. Thanks again!
  9. Write errors during array expansion/rebuild
Hey guys, first let me say that I've been an Unraid user for several years; I've had to visit the forum on numerous occasions and have usually found solutions to similar problems (thank you all). But this time I haven't found an identical issue. I am running Unraid 6.0-rc3 in my Norco 24-bay chassis (with an off-the-shelf Intel/Asus CPU/motherboard). Here is what has happened so far. I decided to swap out disk17 in my Unraid box to upgrade its size, going from a 6TB to an 8TB drive. The 8TB drive I had pre-cleared ~6 times without any issues. I stopped the array as I normally would, removed disk17, waited ~30 secs, installed the new 8TB disk into the disk17 position, and waited ~30 secs for the drive to be detected. I selected the new 8TB disk in the Unraid UI (in the disk17 position), selected 'Rebuild/Expand', and started the array. The array began rebuilding, and the following morning (today) the rebuild had completed, but instead of all of the disks having a 'green ball' next to them, disk17 had a 'gray triangle'. Checking the syslog, I found a bunch of 'Write Errors'. I stopped the array, reseated disk17, and rebooted Unraid. Unraid started, but disk17 still has a 'gray triangle' on it, and in my dashboard view I have a 'Data is Invalid' warning at the bottom. What should I try next? Attached are my syslog and a couple of screenshots. syslog.zip
  10. That is correct. My initial plan was to replace an existing 4TB disk with a 6TB disk, but numerous write errors caused the expansion/rebuild to fail. When I reseated the 6TB disk, I tried to restart the array with the same 6TB disk that had the write errors; the array started with that drive showing "unformatted", and it started to rebuild the parity disk for the array with disk19 "missing" (since it was showing unformatted). I didn't catch it until it had already started to rebuild parity. So I stopped the array, replaced disk19 with a fresh 6TB disk, formatted it, and started the array (parity) rebuild.
  11. Hey guys, just figured I'd update my issue in case anyone hits it in the future. I stopped the array, reseated the drive, and started the array again. Unraid detected the drive, but it detected it as "Unformatted", so it rebuilt parity for the array with a missing/blank disk19. I stopped the array rebuild, replaced disk19 with a new disk, formatted disk19, and rebuilt the array again (which regenerated the parity with a blank disk19). Everything appears to be fine, but I will have to copy the data from the old disk19 to the new disk19. Thanks again for all the help! rsbuc
  12. I'll stop the array tomorrow and reseat the drive (it plugs into a backplane on the Norco); the other drives on that backplane appear to be fine.
  13. Sadly no, I didn't run a pre-clear on it (I think I learned a valuable lesson about skipping the preclear). I can't seem to run a SMART test on it: when I 'Spin up all disks', every drive except disk19 goes 'green ball', but disk19 stays a gray triangle. When I try to run a SMART test on it, it says the drive needs to be spun up (but disk19 will not spin up).
  14. Hey guys, I've been running Unraid for quite some time and I couldn't be happier; it's been great. My system info: Unraid v6.0-rc3, 24-bay Norco chassis, 21 disks. I tried to replace one of my existing 4TB drives (disk19) with a 6TB drive (which I've done numerous times before), but this time, when I restarted the array to start the array expansion, the rebuild began, and when I checked on it a couple of hours later the rebuild had stopped, the newly replaced disk had a gray triangle on it, and my 'Parity status' was 'Data is invalid'. The array is started and is serving data fine. When I check the logs I see:
****
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#4 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee 7a 08 00 00 04 00 00 00
Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042160648
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#5 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee 76 08 00 00 04 00 00 00
Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042159624
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164680
Dec 17 11:22:38 UNRAID kernel: md: md_do_sync: got signal, exit...
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164688
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164696
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#6 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee 72 08 00 00 04 00 00 00
Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042158600
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164704
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164712
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164720
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164728
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164736
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164744
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#7 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee aa 08 00 00 04 00 00 00
Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042172936
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164752
****
The "disk19 write error, sector=..." lines continue for numerous pages. It looks like disk19 went bad during the expansion. What is my suggested next step here? Am I safe to stop the array, remove disk19, and replace it with another 6TB drive? Will it rebuild correctly? Any help would be appreciated! (Before pulling the drive I was planning to check its SMART state with the commands below.)
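These are stock smartctl invocations, nothing Unraid-specific; /dev/sdw is the device name from my log above, so substitute your own if you're following along:
****
# Full SMART report -- the attributes worth eyeballing here are
# Reallocated_Sector_Ct, Current_Pending_Sector and UDMA_CRC_Error_Count
# (CRC errors point at the cable/backplane rather than the platters).
smartctl -a /dev/sdw

# Queue a short self-test, then re-check the self-test log a few minutes later
smartctl -t short /dev/sdw
smartctl -l selftest /dev/sdw
****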
  15. Hey BDHarrington, in my attempt to troubleshoot my issue I ended up disabling all of my plugins, but I still had strange/random lockups about once a week (followed by a lengthy consistency check). I considered dropping my system memory down to 4GB (from 16GB) and doing a fresh install without any plugins, but just as I was about to remove the DIMMs, version 6.0b3 was released, so I installed 6.0b3 instead and it's been working great. Previously my uptime wouldn't exceed a week without a strange lockup; now it's at 64 days. On a side note, I have since moved my Plex service to a VM on my Hyper-V host. I couldn't be happier with v6.0b3!
  16. Thanks, guys, for all the replies. I thought it might have been the OpenSSH plugin, since that was the most recent one I installed, but after 4 days of running with OpenSSH disabled, the same out-of-memory errors popped back up. I then tried Tony's suggestion (echo 65536 > /proc/sys/vm/min_free_kbytes), which seems to be limiting the maximum amount of memory that Linux is using (before the change, Unraid would hover around 12GB of memory usage; now it's hovering around 7GB). But the problems are still here, so I'll try disabling more plugins to see which one is causing my pains, unless anyone has any other ideas? (I've also made the tweak survive reboots; see below.)
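For anyone else trying this: the sysctl value resets on every reboot, so I appended the same line to my go file as well (assuming the stock /boot/config/go location):
****
# Apply immediately
echo 65536 > /proc/sys/vm/min_free_kbytes

# Re-apply at every boot by adding the same line to the go script
echo 'echo 65536 > /proc/sys/vm/min_free_kbytes' >> /boot/config/go
****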
  17. Hey guys, first let me say thanks for all the help; I've been able to search through the forums for most of the other issues I've encountered and found enough to resolve them. My setup is as follows:
unRAID Version: unRAID Server Pro, Version 5.0
Motherboard: ASUSTeK COMPUTER INC. - P8Z77-V LK
Processor: Intel® Core i3-2120 CPU @ 3.30GHz - 3.3 GHz
Cache: L1 = 128 kB (max. 128 kB), L2 = 512 kB (max. 512 kB), L3 = 3072 kB (max. 3072 kB)
Memory: 16384 MB (max. 32 GB) - BANK 0 through BANK 3, each 4096 MB @ 1333 MHz
Network: eth0: 1000Mb/s - Full Duplex
Uptime: 4 days, 7 hrs, 5 mins, 15 secs
Running plugins:
Native plugins: OpenSSH, Plex Media Server, Unmenu
Unmenu plugins: Plex, bwm-ng, htop, lsof, unraid status alert email, monthly parity check
Recently (maybe over the past 2-3 weeks) I've been having an issue while playing back media through Plex Media Server: the content will start to stream and then stop after a few minutes, and when I try to restart it, Plex Media Server is no longer running. I've checked my syslog, and it shows an 'out of memory' condition followed by the kernel killing PMS:
****
Dec 3 13:53:46 STORSERV kernel: Out of memory: Kill process 11279 (Plex Media Serv) score 2 or sacrifice child
Dec 3 13:53:46 STORSERV kernel: Killed process 11279 (Plex Media Serv) total-vm:316424kB, anon-rss:33892kB, file-rss:11112kB
****
I'm sure something is leaking memory, but I'm not savvy enough to identify which plugin it is. I've tried the SwapFile plugin without success, and I've tried reinstalling some of the plugins without any change either. I can usually just restart Plex Media Server and it'll come back online, but I'd prefer to find the plugin that's causing the problem rather than treat the symptom. I'll attach my full syslog as well, in case that helps. vmstat shows:
****
Linux 3.9.6p-unRAID.
root@STORSERV:~# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff    cache   si   so   bi   bo   in   cs us sy id wa
 2  1      0 3484316 145364 12294496    0    0    9    1   17   21 10  7 73 11
****
free -l shows:
****
root@STORSERV:~# free -l
             total       used       free     shared    buffers     cached
Mem:      16542820   13750824    2791996          0     145800   12984332
Low:        768168     694676      73492
High:     15774652   13056148    2718504
-/+ buffers/cache:     620692   15922128
Swap:            0          0          0
****
Any help would be greatly appreciated! (In the meantime I've started logging the top memory consumers; see the loop below.) Thanks, rs syslog-2013-12-03.zip
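This is just a crude watcher I cobbled together, assuming the ps on unRAID 5 supports -eo and --sort; it logs the biggest memory consumers to the flash drive every 10 minutes so I can see what grows between Plex crashes:
****
#!/bin/bash
# Append a timestamped snapshot of the 5 biggest RSS consumers every 10 min
while true; do
  date >> /boot/memlog.txt
  ps -eo rss,pid,comm --sort=-rss | head -6 >> /boot/memlog.txt
  echo '----' >> /boot/memlog.txt
  sleep 600
done
****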
  18. What is really strange is that my parity drive, which is the same SV35.5 model that is causing me problems, has never disconnected on its own like all of the other drives. I think I have had to replace almost ALL of the SV35.5 drives I own, except the parity drive.
  19. I have 21 disks, all on the Supermicro PCIe controller cards. Disk1 (the original disk in the post) was on Controller1 (port 0), disk17 is on Controller3 (port 1), and disk8 is on Controller1 (port 1). The 5 failures I've had on these SV35.5 drives have happened across almost all controllers and all ports. Nothing follows a pattern, except that all of the drives that have exhibited this behavior are the same model. I'm wondering if these drives issue strange SMART responses and are being falsely detected as failing...
  20. Hi there, the power supply is an 850-watt Silverstone, which was brand new back in September 2011 (along with most of the other hardware).
  21. So I've caught it doing it again. I checked the server today: everything was green-balled (but the drives were sleeping), and the syslog was reporting the same errors as before:
****
Jan 9 08:47:12 STORE emhttp: mdcmd: write: Input/output error
Jan 9 08:47:12 STORE kernel: mdcmd (276): spindown 17
Jan 9 08:47:12 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5
Jan 9 08:47:22 STORE emhttp: mdcmd: write: Input/output error
Jan 9 08:47:22 STORE kernel: mdcmd (277): spindown 17
Jan 9 08:47:22 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5
Jan 9 08:47:32 STORE emhttp: mdcmd: write: Input/output error
Jan 9 08:47:32 STORE kernel: mdcmd (278): spindown 17
Jan 9 08:47:32 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5
Jan 9 08:47:42 STORE emhttp: mdcmd: write: Input/output error
Jan 9 08:47:42 STORE kernel: mdcmd (279): spindown 17
Jan 9 08:47:42 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5
****
These are the same messages it was spamming before when the drive failed. Previously I stopped the array, and that caused the drive to redball; this time I decided to 'Spin up all disks' to see what it would do. Then it displayed (sdy is disk17):
****
Jan 9 20:08:19 STORE kernel: md: disk17: ATA_OP e3 ioctl error: -5
Jan 9 20:08:19 STORE kernel: mdcmd (4331): spinup 20
Jan 9 20:08:19 STORE ata_id[20757]: HDIO_GET_IDENTITY failed for '/dev/sdy'
Jan 9 20:08:19 STORE kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
****
All drives spun up and showed a solid green ball; shortly after that, disk17 redballed and then disk8 also redballed:
****
Jan 9 20:30:16 STORE kernel: mdcmd (4390): spinup 20
Jan 9 20:30:25 STORE kernel: md: disk8 read error
Jan 9 20:30:25 STORE kernel: handle_stripe read error: 36112/8, count: 1
Jan 9 20:30:25 STORE kernel: Buffer I/O error on device md8, logical block 4514
Jan 9 20:30:25 STORE kernel: lost page write due to I/O error on md8
Jan 9 20:30:25 STORE kernel: md: disk8 read error
Jan 9 20:30:25 STORE kernel: handle_stripe read error: 36120/8, count: 1
Jan 9 20:30:25 STORE kernel: Buffer I/O error on device md8, logical block 4515
Jan 9 20:30:25 STORE kernel: lost page write due to I/O error on md8
****
I have powered the Unraid box down and back up; disk8 was detected (and shows green), and so was disk17 (but it shows RED). I have replaced disk17 with a fresh 3TB Seagate SV35.5 drive and the array is rebuilding. Any suggestions or ideas? I went through my notes, and this has now happened 5 times, each on a Seagate 3TB SV35.5 drive. I have posted the syslog; I cut out a bunch of the repeated error messages just to keep the log small for posting. (Next time it happens I'll try querying the drive directly; see the commands below.) Thanks! syslog.20130109.txt
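These are standard hdparm/smartctl calls, not anything Unraid-specific; whether they'll get a response when the md driver is already getting -5 ioctl errors is anyone's guess:
****
# Ask the drive for its power state directly (bypassing emhttp/md):
hdparm -C /dev/sdy          # should print "active/idle" or "standby"

# If HDIO_GET_IDENTITY is failing, force smartctl to treat it as plain ATA:
smartctl -a -d ata /dev/sdy
****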
  22. Hey guys, I've been having a weird issue with my Unraid box where (seemingly at random) a drive (always model Seagate 3TB SV35.5 "ST3000VX000") will start issuing errors. The syslog will start to show something like:
****
kernel: md: disk1: ATA_OP e0 ioctl error: -5
mdcmd: write: Input/output error
kernel: mdcmd (121)
****
At this point, the drive is still green-balled and reporting no issues in the web interface. If I stop the array, the drive will redball and show as missing (it disappears from the drop-down menu). Rebooting the system detects the missing drive (it shows in the drop-down), but the array stays stopped, saying that disk is DISK_DSBL. This has happened twice before, each time with a different Seagate ST3000VX000 (3TB) drive (each drive was connected to a different controller/cable/power connector). I replaced the drive each time, the array rebuilt it, and I never thought too much of it. But this is the third time, so I figured I need to look into it. My configuration:
Unraid Pro ver: 5.0-rc8a, installed on a Kingston DT_101 8GB USB drive
Motherboard: Asus P8Z77-V LK
CPU: Intel i3-2120
Memory: 16GB (4x4GB)
SAS controllers: 3x Supermicro AOC-SAS2LP-MV8
Power supply: 850 watt PSU
I have a mix of drives in my machine, mostly Seagates, but only a handful of 3TB SV35.5s. The SMART status shows the following:
****
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: ST3000VX000-9YW166
Serial Number: [cut]
Firmware Version: CV13
User Capacity: 3,000,592,982,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Mon Jan 7 16:46:10 2013 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 592) seconds.
Offline data collection capabilities: (0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x10b9) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f 116   099   006    Pre-fail Always  -           115549888
  3 Spin_Up_Time            0x0003 094   093   000    Pre-fail Always  -           0
  4 Start_Stop_Count        0x0032 100   100   020    Old_age  Always  -           278
  5 Reallocated_Sector_Ct   0x0033 100   100   036    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000f 072   060   030    Pre-fail Always  -           15450467
  9 Power_On_Hours          0x0032 099   099   000    Old_age  Always  -           1169
 10 Spin_Retry_Count        0x0013 100   100   097    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   020    Old_age  Always  -           25
184 Unknown_Attribute       0x0032 100   100   099    Old_age  Always  -           0
187 Reported_Uncorrect      0x0032 100   100   000    Old_age  Always  -           0
188 Unknown_Attribute       0x0032 100   100   000    Old_age  Always  -           0
189 High_Fly_Writes         0x003a 001   001   000    Old_age  Always  -           159
190 Airflow_Temperature_Cel 0x0022 067   056   045    Old_age  Always  -           33 (Lifetime Min/Max 25/43)
191 G-Sense_Error_Rate      0x0032 100   100   000    Old_age  Always  -           0
192 Power-Off_Retract_Count 0x0032 100   100   000    Old_age  Always  -           17
193 Load_Cycle_Count        0x0032 100   100   000    Old_age  Always  -           1919
194 Temperature_Celsius     0x0022 033   044   000    Old_age  Always  -           33 (0 21 0 0)
197 Current_Pending_Sector  0x0012 100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0010 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x003e 200   200   000    Old_age  Always  -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status                  Remaining LifeTime(hours) LBA_of_first_error
# 1  Short offline   Completed without error 00%       1169            -

SMART Selective self-test log data structure revision number 1
 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1       0       0 Not_testing
    2       0       0 Not_testing
    3       0       0 Not_testing
    4       0       0 Not_testing
    5       0       0 Not_testing
Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
****
Does anyone have any ideas? Only a short self-test has been run so far; an extended one (commands below) might show more. smart.log.txt
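Standard smartctl usage; substitute the actual device for /dev/sdX (the ~255-minute estimate comes from the report above):
****
# Kick off the extended (long) self-test -- the drive estimates ~255 minutes
smartctl -t long /dev/sdX

# Check progress / results afterwards in the self-test log
smartctl -l selftest /dev/sdX
****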
  23. And as strangely as this problem started, after swapping all hardware (except the drives, the controller cards, and the backplanes), a motherboard BIOS flash appears to have fixed this issue, or at least it's stopped locking up for now. On a side note, the Asus P8Z77-V LK motherboard has some weird behavior with the Supermicro AOC-SAS2LP-MV8 cards: if you put one of the cards in the 3rd PCIe x16 slot (which normally functions at x2 due to chipset limitations) and then force that slot to x4 in the BIOS (which disables one of the PCIe x1 slots), the write performance on that slot drops to nothing, like 2-4MB/s. When I undo the change and drop the slot back to the default x2 speed, I get 50-60MB/s writes. (A quick dd test like the one below is enough to see the difference.) Thanks for all the help, guys!
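For anyone who wants to reproduce the measurement (assuming GNU dd, which supports oflag=direct to bypass the page cache; the scratch path is just an example, so pick a disk with free space):
****
# Write 1 GB directly to a disk mount and report throughput when done
dd if=/dev/zero of=/mnt/disk1/ddtest.bin bs=1M count=1024 oflag=direct
rm /mnt/disk1/ddtest.bin   # clean up the scratch file
****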
  24. Just swapped out the memory with another kit; same problem :\ I'm going to try swapping the CPU and motherboard out next. Here's hoping it's one of those! Thanks
  25. Hey Dgaschk, I've gone through the UEFI BIOS on my Asus motherboard and tried to match the settings on the Intel BIOS page, but the problem still exists. Anyone else have any ideas? Thanks!