Posts posted by rsbuc
-
Hello! I was having an issue before where my incremental parity checks were not reading the disk temperatures correctly when the disks had spun down (they were reporting "*").
I have updated to the latest version of the Parity Tuning Script, and now the script doesn't appear to be collecting/detecting the disk temperature at all anymore.
Here is a snippet from the syslog (with Testing logs enabled):
***
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ----------- MONITOR begin ------
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR /boot/config/forcesync marker file present
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR manual marker file present
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR parityTuningActive=1, parityTuningPos=886346616
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR appears there is a running array operation but no Progress file yet created
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... appears to be manual parity check
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: Manual Correcting Parity-Check
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR MANUAL record to be written
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR Current disks information saved to disks marker file
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR written header record to progress marker file
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... appears to be manual parity check
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR written MANUAL record to progress marker file
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR Creating required cron entries
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: Created cron entry for scheduled pause and resume
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: Created cron entry for 6 minute interval monitoring
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: updated cron settings are in /boot/config/plugins/parity.check.tuning/parity.check.tuning.cron
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR CA Backup not running, array operation paused
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... no action required
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR global temperature limits: Warning: 50, Critical: 55
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR plugin temperature settings: Pause 3, Resume 8
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: array drives=0, hot=0, warm=0, cool=0, spundown=0, idle=0
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG: Array operation paused but not for temperature related reason
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ----------- MONITOR end ------
***
The Parity Check Tuning log clearly shows "warm=0, cool=0, spundown=0", yet there are several disks above 55°C.
Here's a screenshot of the disk temps in the web UI.
(thanks again for reading this message)
-
2 hours ago, itimpi said:
The plugin always treats the case where the temperature is returned as '*' due to spindown as 'cool' so there needs to be something else going on. I will see if I can work out what it is from the syslog you provided.
No worries, I appreciate the effort. If you'd like more info, let me know.
-
On 12/16/2022 at 8:16 AM, itimpi said:
If you think the plugin is not correctly resuming when drives cool down, then perhaps you can try turning on the Testing level of logging in the plugin and sending me the resulting logs as that will allow me to see the fine detail of what the plugin is doing under the covers. Testing the temperature related stuff is extremely tricky as my systems do not suffer from heat issues so I have to artificially try to set up tests to simulate temperature issues.
I've finally had a few minutes to test this out with the TESTING log mode enabled, and I think this is what you were hinting at.
When the array goes into 'overheat mode' and the parity check pauses, the disks eventually spin down and the temperature value in the log goes to "Temp=*" instead of showing an actual temperature, so the Parity Check Tuning script doesn't see a valid numerical temperature value to resume the parity check.
After waiting ~12 minutes, I manually clicked 'spin up disks', and 6 minutes later the parity check resumed, since the script could see the temperature values once the disks were spun up.
I'm attaching my syslog.
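For anyone following along, here is my rough understanding of the mechanism (an assumption on my part, not the plugin's actual code): smartctl can be told not to wake a sleeping drive, so a spun-down disk yields no numeric temperature at all, which would explain the "Temp=*" entries.

```shell
# Hypothetical sketch of how a monitoring script might read a drive temp.
# With "-n standby", smartctl exits early (status 2) rather than spinning
# the drive up, so a sleeping disk produces no attribute line to parse:
#   smartctl -n standby -A /dev/sdX
# Parsing the Celsius value out of a sample SMART attribute line:
sample='194 Temperature_Celsius 0x0022 033 044 000 Old_age Always - 33 (0 21 0 0)'
temp=$(echo "$sample" | awk '$2 == "Temperature_Celsius" {print $10}')
echo "$temp"   # prints 33; a spun-down disk would leave this empty
```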
-
2 hours ago, itimpi said:
This is the basic behaviour as long as the time is within the overall time slot set for an increment. How long a temperature related pause will last depends on how quickly your drives cool down to reach the resume temperature threshold. The plugin will take into account if you have set specific temperature threshold settings at the Unraid level on a drive over-riding the global ones. You may find the Debug logging level helps with a basic understanding of what the plugin is doing without having to know too much detail of the underlying mechanisms being used.
Once you get outside the time slot for the overall increment then the plugin will pause the check and the temperature related pause/resume will stop happening (until the time comes around to start the next increment).
If I can provide any further clarification then please ask. As a new user if you can think of items I could add to the built-in help that would have helped you then please feel free to suggest them.
Interesting. I've enabled Debug logging, and that demystifies a lot of what the plugin is doing (thanks for that). Here is what I'm seeing (I'm sure I have a bad setting or something): I start the parity check, it runs for an hour or so, then the hard drives hit their temperature limit and the parity check pauses. The drives spin down and cool off, but the plugin doesn't seem to resume the parity operation.
If I 'Spin up all disks', it will detect the drive temperatures as being cool again and resume the parity check.
Are there special disk settings that I need to enable for this to work properly?
(Also, thanks again for trying to help me out!)
-
17 hours ago, itimpi said:
I think you are over-thinking this! You only want to set the increment pause/resume times to define the maximum time period you want the parity check to potentially run.
You then set the temperature related pause resume values and as long as you are within the increment period the plugin will pause/resume the check based on disk temperatures. You may also want to have aggressive spin down times on the drives as experience has shown that simply keeping them spinning even if no I/O is taking place significantly extends the cool down time.
Hello! Am I understanding this correctly? The plugin will pause the parity operation when the disks reach the temperature threshold and wait until the temps fall below the resume threshold; does the script then immediately resume the parity operation, or will it only attempt to resume at the next 'Increment resume time' in the schedule?
-
Hey everyone! I've been trying to get the "Increment Frequency/Custom" setting working for what I need, but I'm struggling.
I have cooling issues with my Unraid box. My goal is to have the parity check 'Pause when disks overheat', then have the custom increment frequency keep the parity operation paused for ~30 minutes to let the disks cool down, then check whether the disks have cooled enough and resume parity operations.
Clearly my cron skills are weak; is there an "Increment Resume Time" and "Increment Pause Time" that someone can suggest?
(thanks again for all the awesome features in the Parity Check Tuning plugin!)
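For the record, the kind of schedule I was fumbling toward, written in standard cron field order (these times are just my illustration, not the plugin's defaults or a recommendation):

```shell
# Hypothetical increment entries (cron fields: min hour dom mon dow):
#   0  * * * *   -> "Increment Resume Time": resume the check on the hour
#   30 * * * *   -> "Increment Pause Time":  pause it at half past
# i.e. run in ~30-minute increments with 30-minute cool-down gaps between.
```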
-
Thanks for the advice. I upgraded to the latest version, and I've been able to swap out 3 disks so far for larger ones with no errors on rebuild. Maybe it was a funny driver version for my SAS cards in that specific version of Unraid.
Thanks again!
-
Write errors during array expansion/rebuild
Hey guys, first let me say that I've been an Unraid user for several years; I've had to visit the forum on numerous occasions and have usually found solutions to similar problems (thank you all).
But this time I haven't found an identical issue.
I am running Unraid 6.0-rc3 in my Norco 24-bay chassis (with an off-the-shelf Intel/ASUS CPU/motherboard).
Here is what has happened so far.
I decided to swap out disk17 in my Unraid to upgrade its size, going from a 6TB to an 8TB drive. I had pre-cleared the 8TB drive ~6 times without any issues.
I stopped the array as I normally would, removed disk17, waited ~30 seconds, installed the new 8TB disk into the disk17 position, and waited ~30 seconds for the drive to be detected.
I selected the new 8TB disk in the Unraid UI (in the disk17 position), selected 'Rebuild/Expand', and started the array.
The array began rebuilding, and the following morning (today) the rebuild had completed, but instead of all of the disks having a green ball next to them, disk17 had a gray triangle. Checking the syslog, I found a bunch of write errors.
I stopped the array, reseated disk17, and rebooted Unraid.
Unraid started, but disk17 still has a gray triangle, and in my dashboard view I have a 'Data is Invalid' warning at the bottom.
What should I try next?
Attached are my syslog and a couple of screenshots.
-
That is correct; my initial plan was to replace an existing 4TB disk with a 6TB disk, but numerous write errors caused the expansion/rebuild to fail.
When I reseated the 6TB disk, I tried to restart the array with the same 6TB disk that had the write errors. The array started with that drive showing "unformatted", and it began rebuilding the parity disk for the array with disk19 "missing" (since it was showing unformatted).
I didn't catch it until it had already started to rebuild parity.
So I stopped the array, replaced disk19 with a fresh 6TB disk, formatted it, and started the array (parity) rebuild.
-
Hey guys, just figured I'd update my issue, in case anyone runs into it in the future.
I stopped the array, reseated the drive, and started the array again. Unraid detected the drive, but detected it as "Unformatted", so it rebuilt parity for the array with a missing/blank disk19.
I stopped the array rebuild, replaced disk19 with a new disk, formatted it, and rebuilt the array again (which regenerated the parity with a blank disk19).
Everything appears to be fine, but I will have to copy the data from the old disk19 to the new disk19.
Thanks again for all the help!
rsbuc
-
I'll stop the array tomorrow and reseat the drive (it plugs into a backplane on the Norco); the other drives on that backplane appear to be fine.
-
Sadly no, I didn't run a pre-clear on it (I think I learned a valuable lesson about skipping the preclear).
I can't seem to run a SMART test on it: when I 'Spin up all disks', every drive except disk19 goes green ball, but disk19 stays a gray triangle.
When I try to run a SMART test, it says the drive needs to be spun up (but disk19 will not spin up).
-
Hey guys, I've been running Unraid for quite some time, and I couldn't be happier; it's been great.
My system info:
Unraid v6.0-rc3
24-bay Norco chassis, 21 disks
I tried to replace one of my existing 4TB drives (disk19) with a 6TB drive (which I've done numerous times before). This time, when I restarted the array to kick off the expansion, the rebuild started, but when I checked on it a couple of hours later the rebuild had stopped, the newly replaced disk had a gray triangle on it, and my Parity status was 'Data is invalid'. The array is started, and it is serving data fine.
When I check the logs I see...
****
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#4 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee 7a 08 00 00 04 00 00 00
Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042160648
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#5 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee 76 08 00 00 04 00 00 00
Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042159624
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164680
Dec 17 11:22:38 UNRAID kernel: md: md_do_sync: got signal, exit...
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164688
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164696
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#6 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee 72 08 00 00 04 00 00 00
Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042158600
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164704
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164712
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164720
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164728
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164736
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164744
Dec 17 11:22:38 UNRAID kernel: sd 9:0:9:0: [sdw] tag#7 CDB: opcode=0x8a 8a 00 00 00 00 00 f0 ee aa 08 00 00 04 00 00 00
Dec 17 11:22:38 UNRAID kernel: blk_update_request: I/O error, dev sdw, sector 4042172936
Dec 17 11:22:38 UNRAID kernel: md: disk19 write error, sector=4042164752
****
The "disk19 write error, sector=..." messages continue for numerous pages.
It looks like disk19 went bad during the expansion. What is my suggested next step here?
Am I safe to stop the array, remove disk19, and replace it with another 6TB drive?
Will it rebuild correctly?
Any help would be appreciated!
-
Hey BDHarrington, in my attempt to troubleshoot my issue I ended up disabling all of my plugins, but I still had strange/random lockups about once a week (followed by a lengthy consistency check).
I considered dropping my system memory down to 4GB (from 16GB) and doing a fresh install without any plugins, but just as I was about to pull the DIMMs, version 6.0b3 was released, so I installed 6.0b3 instead and it has been working great. Previously my uptime wouldn't exceed a week without a strange lockup; now it's at 64 days.
On a side note, I have since moved my Plex service to a VM on my Hyper-V host. I couldn't be happier with v6.0b3!
-
Thanks, guys, for all the replies. I thought it might have been the OpenSSH plugin, since that was the most recent one I installed, but after 4 days of running with OpenSSH disabled, the same out-of-memory errors popped back up.
I then tried Tony's suggestion (echo 65536 > /proc/sys/vm/min_free_kbytes), which seems to limit the maximum amount of memory Linux uses (before the change, Unraid would hover around 12GB of memory usage; now it hovers around 7GB).
But the problems are still here, so I'll try disabling more plugins to see which one is causing my pains.
Unless anyone has any other ideas?
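In case anyone copies Tony's tweak: here is how I applied it, as a sketch assuming the stock Unraid /boot/config/go file (adjust to your setup). Note the bare echo into /proc does not survive a reboot.

```shell
# Check the kernel's current free-memory reserve (value is in kB):
sysctl vm.min_free_kbytes
# Apply the tweak now (equivalent to the echo into /proc):
sysctl -w vm.min_free_kbytes=65536
# Make it persistent across reboots by appending to the Unraid go file:
echo 'echo 65536 > /proc/sys/vm/min_free_kbytes' >> /boot/config/go
```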
-
Hey guys, first let me say thanks for all the help; I've been able to search through the forums for most of the other issues I've encountered and found enough help to resolve them.
My setup is as follows:
unRAID Version: unRAID Server Pro, Version 5.0
Motherboard: ASUSTeK COMPUTER INC. - P8Z77-V LK
Processor: Intel® Core i3-2120 CPU @ 3.30GHz - 3.3 GHz
Cache: L1-Cache = 128 kB (max. 128 kB)
L2-Cache = 512 kB (max. 512 kB)
L3-Cache = 3072 kB (max. 3072 kB)
Memory: 16384 MB (max. 32 GB)
BANK 0: 4096 MB, 1333 MHz
BANK 1: 4096 MB, 1333 MHz
BANK 2: 4096 MB, 1333 MHz
BANK 3: 4096 MB, 1333 MHz
Network: eth0: 1000Mb/s - Full Duplex
Uptime: 4 days, 7 hrs, 5 mins, 15 secs
Running plugins
Native Plugins
OpenSSH
Plex Media Server
Unmenu Plugins
Plex
bwm-ng
htop
lsof
unraid status alert email
monthly parity check
As of recent (maybe over the past 2-3 weeks) I've been having an issue while playing back media through Plex Media Server: the content will start to stream, and then after a few minutes the stream will stop. When I try to restart it, Plex Media Server is no longer running.
I've checked my syslog, and it shows an 'out of memory' condition followed by the kernel killing PMS:
****
Dec 3 13:53:46 STORSERV kernel: Out of memory: Kill process 11279 (Plex Media Serv) score 2 or sacrifice child
Dec 3 13:53:46 STORSERV kernel: Killed process 11279 (Plex Media Serv) total-vm:316424kB, anon-rss:33892kB, file-rss:11112kB
****
I'm sure something is leaking memory, but I'm not savvy enough to identify which plugin it is.
I've tried the SwapFile plugin without success, and I've tried reinstalling some of the plugins without any change either.
I can usually just restart Plex Media Server and it'll come back online, but I'd prefer to find the plugin that's causing the problem rather than just treat the symptom.
I'll attach my full syslog as well, in case that helps.
vmstat shows
****
Linux 3.9.6p-unRAID.
root@STORSERV:~# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 1 0 3484316 145364 12294496 0 0 9 1 17 21 10 7 73 11
****
free -l shows
****
root@STORSERV:~# free -l
total used free shared buffers cached
Mem: 16542820 13750824 2791996 0 145800 12984332
Low: 768168 694676 73492
High: 15774652 13056148 2718504
-/+ buffers/cache: 620692 15922128
Swap: 0 0 0
****
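Reading those numbers back (my own arithmetic): most of the "used" memory is actually buffers and page cache, which the kernel can reclaim, and the sum matches the "-/+ buffers/cache" free column exactly. The tighter number is probably the "Low" row (only ~768 MB on this 32-bit build), which may be what the OOM killer is reacting to.

```shell
# Figures from the free -l output above, all in kB:
free_kb=2791996
buffers_kb=145800
cached_kb=12984332
# Memory genuinely available to applications = free + buffers + cached:
echo $((free_kb + buffers_kb + cached_kb))   # 15922128 kB, matching the
                                             # "-/+ buffers/cache" free column
```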
Any help would be greatly appreciated!
Thanks
rs
-
What is really strange is that my parity drive, which is the same SV35.5 model that is causing me problems, has never disconnected on its own (unlike all of the other drives). I think I have had to replace almost ALL of the SV35.5 drives I own, except the parity drive.
-
Is there anything in common between the drives that exhibit this problem, for example on the same controller? I guess you're using all the PCI-E slots on that motherboard for disk controllers, correct? It's possible there is an issue with a disk controller being plugged into a certain slot.
I have 21 disks, all on the SuperMicro PCIe controller cards. Disk1 (the original disk in the post) was on controller 1 (port 0), disk17 is on controller 3 (port 1), and disk8 is on controller 1 (port 1).
The 5 failures I've had on these SV35.5 drives have happened across almost all controllers and ports. Nothing follows a pattern, except that all of the drives exhibiting this behavior are the same model.
I'm wondering if these drives issue strange SMART responses and are being falsely detected as failing...
-
Hi there, the power supply is an 850-watt Silverstone, which was brand new back in September 2011 (along with most of the other hardware).
-
So I've caught it doing it again. I checked the server today; everything was green-balled (but the drives were sleeping). I checked the syslog, and it was reporting the same errors as before:
Jan 9 08:47:12 STORE emhttp: mdcmd: write: Input/output error
Jan 9 08:47:12 STORE kernel: mdcmd (276): spindown 17
Jan 9 08:47:12 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5
Jan 9 08:47:22 STORE emhttp: mdcmd: write: Input/output error
Jan 9 08:47:22 STORE kernel: mdcmd (277): spindown 17
Jan 9 08:47:22 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5
Jan 9 08:47:32 STORE emhttp: mdcmd: write: Input/output error
Jan 9 08:47:32 STORE kernel: mdcmd (278): spindown 17
Jan 9 08:47:32 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5
Jan 9 08:47:42 STORE emhttp: mdcmd: write: Input/output error
Jan 9 08:47:42 STORE kernel: mdcmd (279): spindown 17
Jan 9 08:47:42 STORE kernel: md: disk17: ATA_OP e0 ioctl error: -5
These were the same messages it was spamming before when the drive failed. Previously I stopped the array, and that caused the drive to redball; this time I decided to 'Spin up all disks' to see what it would do.
It then displayed (sdy is disk17):
Jan 9 20:08:19 STORE kernel: md: disk17: ATA_OP e3 ioctl error: -5
Jan 9 20:08:19 STORE kernel: mdcmd (4331): spinup 20
Jan 9 20:08:19 STORE ata_id[20757]: HDIO_GET_IDENTITY failed for '/dev/sdy'
Jan 9 20:08:19 STORE kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
All drives spun up and showed a solid green ball. Shortly after that, disk17 redballed, and then disk8 redballed as well.
Jan 9 20:30:16 STORE kernel: mdcmd (4390): spinup 20
Jan 9 20:30:25 STORE kernel: md: disk8 read error
Jan 9 20:30:25 STORE kernel: handle_stripe read error: 36112/8, count: 1
Jan 9 20:30:25 STORE kernel: Buffer I/O error on device md8, logical block 4514
Jan 9 20:30:25 STORE kernel: lost page write due to I/O error on md8
Jan 9 20:30:25 STORE kernel: md: disk8 read error
Jan 9 20:30:25 STORE kernel: handle_stripe read error: 36120/8, count: 1
Jan 9 20:30:25 STORE kernel: Buffer I/O error on device md8, logical block 4515
Jan 9 20:30:25 STORE kernel: lost page write due to I/O error on md8
I have powered the Unraid box down and back up. Disk8 was detected (and shows green), and so was disk17 (but it shows red). I have replaced disk17 with a fresh 3TB Seagate SV35.5 drive, and the array is rebuilding.
Any suggestions or ideas? I went through my notes, and this has actually happened 5 times so far, each time on a Seagate 3TB SV35.5 drive.
I have posted the syslog; I cut out a bunch of the repeated error messages just to keep the log small for posting.
Thanks!
-
Hey guys, I've been having a weird issue with my Unraid box where (seemingly at random) a drive (always a Seagate 3TB SV35.5, model "ST3000VX000") will start issuing errors.
The syslog will start to show something like:
kernel: md: disk1: ATA_OP e0 ioctl error: -5
mdcmd: write: Input/output error
kernel: mdcmd (121)
At this point the drive is still green-balled and reporting no issues in the web interface. If I stop the array, the drive will redball and say that it's missing (and it's missing from the drop-down menu).
Rebooting the system will detect the missing drive (it shows in the drop-down), but the array is stopped, saying that the disk is DISK_DSBL.
This has happened twice before, each time with a different Seagate ST3000VX000 (3TB) drive (each drive was connected to a different controller/cable/power connector). I have replaced the drive each time, the array has rebuilt it, and I've never thought too much of it.
But this is the third time, so I figured I need to look into it.
My configuration is:
Unraid Pro ver. 5.0-rc8a, installed on a Kingston DT_101 8GB USB drive
Motherboard: Asus P8Z77-V LK
CPU: Intel i3-2120
Memory: 16GB (4x4GB)
SAS Controllers: 3x Supermicro AOC-SAS2LP-MV8
Power supply: 850-watt PSU
I have a mix of drives in my machine, mostly Seagates, though only a handful of 3TB SV35.5s.
The SMART status shows the following:
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: ST3000VX000-9YW166
Serial Number: [cut]
Firmware Version: CV13
User Capacity: 3,000,592,982,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Mon Jan 7 16:46:10 2013 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 592) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x10b9) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 116 099 006 Pre-fail Always - 115549888
3 Spin_Up_Time 0x0003 094 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 278
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 15450467
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1169
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 25
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 001 001 000 Old_age Always - 159
190 Airflow_Temperature_Cel 0x0022 067 056 045 Old_age Always - 33 (Lifetime Min/Max 25/43)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 17
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 1919
194 Temperature_Celsius 0x0022 033 044 000 Old_age Always - 33 (0 21 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1169 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Does anyone have any ideas?
-
And as strangely as this problem started, after swapping all hardware (except the drives, the controller cards, and the backplanes), a motherboard BIOS flash appears to have fixed this issue, or at least it has stopped locking up for now.
On a side note, the Asus P8Z77-V LK motherboard has some weird behavior with the SuperMicro AOC-SAS2LP-MV8 cards.
If you use one of the cards in the 3rd PCIe x16 slot (which normally functions at x2 due to chipset limitations) and then force that slot to x4 in the BIOS (which disables one of the PCIe x1 slots), the write performance on that slot drops to nothing, like 2-4 MB/s. When I undo the change and drop the slot back to the default x2 speed, I get 50-60 MB/s writes.
Thanks for all the help guys!
-
Just swapped out the memory with another kit; same problem. :\
I'm going to try swapping out the CPU and motherboard next.
Here's hoping it's one of those!
thanks
-
Hey Dgaschk, I've gone through the UEFI BIOS on my ASUS motherboard and tried to match the settings on the Intel BIOS page.
But the problem still exists.
Anyone else have any ideas?
Thanks!
[Plugin] Parity Check Tuning
in Plugin Support
Posted
Sure! see attached
This log shows my array going from normal temps to 57°C, which is above my Critical level.
syslog.txt