ruthlessone Posted December 21, 2014: I am on 6.0-beta12. Two days ago I walked by the server and saw it lit up like the Rockefeller Xmas tree with disk activity. I logged into the console and saw that a parity check had started on its own. I have parity checks set to run manually, and I had just run one a week ago with 0 errors. At some point during the check the array went offline, and the Parity drive size is now listed as 0. All the drives were in the right locations, so I clicked to start the array. It came right up and began a parity check. At this point the parity disk has a yellow ball and the parity status is invalid. Thoughts?
Frank1940 Posted December 22, 2014: First, attach a syslog so that people smarter than me can review it for problems. Be sure you get the whole thing, and don't edit it if it is too long; instead, compress it and upload the zipped file. Second, get a SMART report on your parity drive. Be sure to get that syslog first! PS: Do you have a UPS on your system?
ruthlessone Posted December 22, 2014: Thanks, Frank. When I view the log from the link, it only has activity from the past hour, nowhere near what I expect is needed. http://192.168.1.119/log/syslog
Frank1940 Posted December 22, 2014: Replying to ruthlessone's post above: What you have posted is a link to your own server. No one but you can see it. (First, we can't get to the server, and second, even if we could, we don't have your server password.) Here are the instructions on how to get a syslog: http://lime-technology.com/forum/index.php?topic=9880.msg94514#msg94514
ruthlessone Posted December 23, 2014: A sincere thanks, Frank, for your help. I understand I posted my LAN address; I was simply showing the syslog I was trying to retrieve, in hopes there was a better, more effective way to pull more than that short bit of data, which you kindly provided. Attached in the zip are the syslog and the SMART report. An update while I do that: after a restart, the parity check had reached 46% when I logged in this morning. Then everything froze, the monitor connected to the server showed mdcmd spindown messages for the drives, and I could not access the server. Attachment: syslog.zip
RobJ Posted December 23, 2014: You have two Marvell-based 8-port SATA cards, and it appears that the first one crashed (at Dec 23 03:44:16). The first 4 drives attached to it reported errors simultaneously, so simultaneously that some error info was lost, and too simultaneously to be considered a drive problem (unless it was caused by a power spike). The exception handlers for all 4 seemed to think they had successfully recovered the drives, but the controller was clearly unstable, and the trouble quickly spread to the kernel itself, with multiple Call Traces and finally an Oops, at which point the system was probably no longer fully accessible (or fully operational). I think your drives are probably fine.

This occurred partway through the mover session. I don't see a direct connection, but the mover was already reporting numerous "No space left on device" errors while working on 'MoviesScrape'. The crash did not happen, though, until 3 minutes after it began moving 'TV Shows'. It may or may not be related to the mover; it may just be related to high disk activity, a power glitch, or an issue with that disk controller card.

A little off-topic, but I noticed that it is reporting a BIOS from 2006! That seems rather surprising to me. The syslog does report a number of workarounds during boot and setup. It's possible that a newer BIOS would have better virtualization support, and perhaps other benefits too. However, I wouldn't attempt a BIOS upgrade without a safe way to restore what you currently have.
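RobJ's key inference is that four ports erroring in the same second points at the controller rather than the drives. One rough way to check a syslog for that pattern yourself is to group libata exception lines by timestamp and count how many distinct ports fire together. This is a minimal sketch, not an unRAID tool: the helper names and script name are hypothetical, and it assumes the classic syslog prefix ("Dec 23 03:44:16 Tower kernel: ...") and libata's "ataN.00: exception Emask ..." message shape, either of which may differ on your system. It also counts the mover's "No space left on device" lines.

```python
import re
import sys
from collections import defaultdict

# ASSUMPTION: classic syslog prefix plus a libata exception message, e.g.
# "Dec 23 03:44:16 Tower kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen"
LIBATA_EXCEPTION = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+)(?:\.\d+)?:\s+exception Emask"
)


def simultaneous_port_errors(lines, min_ports=2):
    """Map each timestamp (one-second resolution) to the ATA ports that logged
    an exception in that second, keeping only seconds with >= min_ports ports."""
    ports_by_second = defaultdict(set)
    for line in lines:
        match = LIBATA_EXCEPTION.match(line)
        if match:
            # ata3.00 and ata3.01 collapse to the same port, ata3
            ports_by_second[match.group("stamp")].add(match.group("port"))
    return {
        stamp: sorted(ports, key=lambda port: int(port[3:]))  # numeric order: ata3 < ata10
        for stamp, ports in ports_by_second.items()
        if len(ports) >= min_ports
    }


def count_no_space_errors(lines):
    """Count lines containing the 'No space left on device' error text."""
    return sum("No space left on device" in line for line in lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_syslog.py syslog.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        log_lines = handle.readlines()
    for stamp, ports in simultaneous_port_errors(log_lines).items():
        print(f"{stamp}: {len(ports)} ports at once -> {', '.join(ports)}")
    print("'No space left on device' lines:", count_no_space_errors(log_lines))
```

A single failing disk tends to show up as repeated exceptions on one port; several ports in the same second, all behind one card, is the pattern RobJ describes. Mapping ataN numbers to a specific card still has to be done from the boot section of the syslog.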
ruthlessone Posted December 23, 2014: Thanks for all that info. Are new SATA cards needed? The server only began having this problem about 4-5 days ago. Should I take the server down? It keeps starting a parity check and then stopping at some point. It is currently 59 minutes in, with 1 day 3 hours to go. I am not sure there is a newer BIOS for the Supermicro board I have. I went to the website and downloaded the BIOS there, and it has the same date; see attached photo.
itimpi Posted December 23, 2014: You might want to check that the SuperMicro cards are properly seated in the motherboard slots. I have seen similar symptoms when such cards are not firmly seated, and I guess vibration can cause momentary issues.
JonathanM Posted December 23, 2014: Replying to ruthlessone's question about whether new SATA cards are needed: Have you run an extended (~24 hour) memtest session since the trouble started? A RAM chip going bad can cause all sorts of weird havoc.
Archived
This topic is now archived and is closed to further replies.