February 6, 2009 Hello everyone, I am quite new to the unRAID server community. I recently set up an unRAID Server Pro and had no issues until today. I configured everything and started copying files to my server on the first day after getting it all going. I then shut down the server for a few days, as I was not going to be using it until I finished my HTPC anyway. Today I started it up to get some files from it, and after running for about 30-45 minutes it was no longer accessible. I couldn't get to it from telnet, the web interface, or Windows Explorer. It had obviously crashed. I switched off the machine, let it boot back up, and everything seemed fine. It started a parity check because of the unclean shutdown, no big deal. After about another 30 minutes, dead again! It never finished the parity check. I pulled the flash drive out and put it in my laptop. It was full. The syslog file was huge, so huge my laptop almost died trying to open it. I think it was 255 MB total. I rebooted and the same thing happened, so I shut it back off. This time when I booted it back up, it said "Initial Configuration", like it had reset. Something is or was wrong, and it is writing thousands of entries to the log file. It starts eating up resources, the server runs out of memory... you know how it goes. I need help figuring out what is wrong with my server. What else do you need from me in order to look at what the problem could be, besides the syslog, which is too big to attach? Attached is the NEW syslog from the third reboot. Thank you in advance for any help!!
February 6, 2009 Author I couldn't attach the file. If anyone is interested in helping, I can email both of them.
February 6, 2009 Use pastebin.com to post segments that look relevant. Choose permanent storage when you post it there.
February 6, 2009 Author
Feb 5 03:20:42 Tower kernel: [ 198.164064] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:20:42 Tower kernel: [ 198.164072] ata1.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:20:42 Tower kernel: [ 198.164074] res 51/04:01:0b:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:20:42 Tower kernel: [ 198.164249] hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Feb 5 03:20:42 Tower kernel: [ 198.164252] hdb: drive_cmd: error=0x04 { DriveStatusError }
Feb 5 03:20:42 Tower kernel: [ 198.164255] ide: failed opcode was: 0xb0
Feb 5 03:20:42 Tower kernel: [ 198.164349] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:20:42 Tower kernel: [ 198.164354] ata4.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:20:42 Tower kernel: [ 198.164356] res 51/04:01:0b:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:20:42 Tower emhttp[1381]: get_temperature: ioctl (smart): Input/output error
Feb 5 03:20:42 Tower kernel: [ 198.167692] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:20:42 Tower kernel: [ 198.167697] ata3.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:20:42 Tower kernel: [ 198.167698] res 51/04:01:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:20:42 Tower kernel: [ 198.168585] hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Feb 5 03:20:42 Tower kernel: [ 198.168588] hda: drive_cmd: error=0x04 { DriveStatusError }
Feb 5 03:20:42 Tower kernel: [ 198.168590] ide: failed opcode was: 0xb0
Feb 5 03:20:42 Tower emhttp[1384]: get_temperature: ioctl (smart): Input/output error

A lot of these...

Feb 5 03:22:51 Tower emhttp[1451]: get_temperature: ioctl (smart): Input/output error
Feb 5 03:22:51 Tower kernel: [ 327.892027] ata4.00: configured for UDMA/133
Feb 5 03:22:51 Tower kernel: [ 327.892037] ata4: EH complete
Feb 5 03:22:51 Tower kernel: [ 327.892074] ata1.00: configured for UDMA/133
Feb 5 03:22:51 Tower kernel: [ 327.892079] ata1: EH complete
Feb 5 03:22:51 Tower kernel: [ 327.896515] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:22:51 Tower kernel: [ 327.896520] ata4.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:22:51 Tower kernel: [ 327.896522] res 51/04:01:0b:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:22:51 Tower kernel: [ 327.896656] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:22:51 Tower kernel: [ 327.896661] ata1.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:22:51 Tower kernel: [ 327.896662] res 51/04:01:0b:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:22:51 Tower kernel: [ 327.913090] ata3.00: configured for UDMA/133
Feb 5 03:22:51 Tower kernel: [ 327.913095] ata3: EH complete
Feb 5 03:22:51 Tower kernel: [ 327.951899] ata4.00: configured for UDMA/133
Feb 5 03:22:51 Tower kernel: [ 327.951902] ata4: EH complete
Feb 5 03:22:51 Tower kernel: [ 327.951946] ata1.00: configured for UDMA/133
Feb 5 03:22:51 Tower kernel: [ 327.951949] ata1: EH complete
Feb 5 03:22:52 Tower kernel: [ 327.956383] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:22:52 Tower kernel: [ 327.956388] ata4.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:22:52 Tower kernel: [ 327.956389] res 51/04:01:0b:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:22:52 Tower kernel: [ 327.956518] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:22:52 Tower kernel: [ 327.956523] ata1.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:22:52 Tower kernel: [ 327.956524] res 51/04:01:0b:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:22:52 Tower kernel: [ 327.973314] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:22:52 Tower kernel: [ 327.973319] ata3.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:22:52 Tower kernel: [ 327.973321] res 51/04:01:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:22:52 Tower kernel: [ 328.011881] ata4.00: configured for UDMA/133
Feb 5 03:22:52 Tower kernel: [ 328.011890] ata4: EH complete
Feb 5 03:22:52 Tower kernel: [ 328.011897] ata1.00: configured for UDMA/133
Feb 5 03:22:52 Tower kernel: [ 328.011901] ata1: EH complete
Feb 5 03:22:52 Tower emhttp[1453]: get_temperature: ioctl (smart): Input/output error
Feb 5 03:22:52 Tower emhttp[1452]: get_temperature: ioctl (smart): Input/output error
Feb 5 03:22:52 Tower kernel: [ 328.024204] sd 4:0:0:0: [sdc] 240121728 512-byte hardware sectors (122942 MB)
Feb 5 03:22:52 Tower kernel: [ 328.027121] sd 4:0:0:0: [sdc] Write Protect is off
Feb 5 03:22:52 Tower kernel: [ 328.027124] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Feb 5 03:22:52 Tower kernel: [ 328.031256] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 5 03:22:52 Tower kernel: [ 328.032861] ata3.00: configured for UDMA/133
Feb 5 03:22:52 Tower kernel: [ 328.032871] ata3: EH complete
Feb 5 03:22:52 Tower kernel: [ 328.035720] sd 1:0:0:0: [sda] 240121728 512-byte hardware sectors (122942 MB)
Feb 5 03:22:52 Tower kernel: [ 328.037564] sd 1:0:0:0: [sda] Write Protect is off
Feb 5 03:22:52 Tower kernel: [ 328.037566] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
Feb 5 03:22:52 Tower kernel: [ 328.040908] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 5 03:22:52 Tower kernel: [ 328.043175] sd 4:0:0:0: [sdc] 240121728 512-byte hardware sectors (122942 MB)
Feb 5 03:22:52 Tower kernel: [ 328.044864] sd 4:0:0:0: [sdc] Write Protect is off
Feb 5 03:22:52 Tower kernel: [ 328.044867] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Feb 5 03:22:52 Tower kernel: [ 328.044907] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 5 03:22:52 Tower kernel: [ 328.098113] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:22:52 Tower kernel: [ 328.098119] ata3.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:22:52 Tower kernel: [ 328.098120] res 51/04:01:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:22:52 Tower kernel: [ 328.152655] ata3.00: configured for UDMA/133
Feb 5 03:22:52 Tower kernel: [ 328.152658] ata3: EH complete
Feb 5 03:22:52 Tower kernel: [ 328.214631] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:22:52 Tower kernel: [ 328.214636] ata3.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:22:52 Tower kernel: [ 328.214638] res 51/04:01:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:22:52 Tower kernel: [ 328.272521] ata3.00: configured for UDMA/133
Feb 5 03:22:52 Tower kernel: [ 328.272524] ata3: EH complete
Feb 5 03:22:52 Tower kernel: [ 328.339452] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:22:52 Tower kernel: [ 328.339457] ata3.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:22:52 Tower kernel: [ 328.339459] res 51/04:01:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:22:52 Tower kernel: [ 328.392383] ata3.00: configured for UDMA/133
Feb 5 03:22:52 Tower kernel: [ 328.392388] ata3: EH complete
February 6, 2009 Author Here is where I see it start to crash...
Feb 5 03:37:14 Tower emhttp[1494]: get_temperature: ioctl (smart): Input/output error
Feb 5 03:37:14 Tower kernel: [ 1189.743753] sd 3:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
Feb 5 03:37:14 Tower kernel: [ 1189.749252] sd 3:0:0:0: [sdb] Write Protect is off
Feb 5 03:37:14 Tower kernel: [ 1189.749258] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Feb 5 03:37:14 Tower kernel: [ 1189.766142] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 5 03:37:14 Tower kernel: [ 1189.774548] sd 3:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
Feb 5 03:37:14 Tower kernel: [ 1189.777913] sd 3:0:0:0: [sdb] Write Protect is off
Feb 5 03:37:14 Tower kernel: [ 1189.777918] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Feb 5 03:37:14 Tower kernel: [ 1189.785070] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 5 03:39:45 Tower kernel: [ 1340.260741] md0: parity incorrect: 48521144
Feb 5 03:39:45 Tower kernel: [ 1340.260772] md0: parity incorrect: 48521152
Feb 5 03:39:45 Tower kernel: [ 1340.260802] md0: parity incorrect: 48521160
Feb 5 03:39:45 Tower kernel: [ 1340.260830] md0: parity incorrect: 48521168
Feb 5 03:39:45 Tower kernel: [ 1340.260855] md0: parity incorrect: 48521176
Feb 5 03:39:45 Tower kernel: [ 1340.260882] md0: parity incorrect: 48521184
Feb 5 03:39:45 Tower kernel: [ 1340.260908] md0: parity incorrect: 48521192
Feb 5 03:39:45 Tower kernel: [ 1340.260934] md0: parity incorrect: 48521200
Feb 5 03:39:45 Tower kernel: [ 1340.260961] md0: parity incorrect: 48521208
Feb 5 03:39:45 Tower kernel: [ 1340.260985] md0: parity incorrect: 48521216
Feb 5 03:39:45 Tower kernel: [ 1340.261012] md0: parity incorrect: 48521224
Feb 5 03:39:45 Tower kernel: [ 1340.261038] md0: parity incorrect: 48521232
Feb 5 03:39:45 Tower kernel: [ 1340.261066] md0: parity incorrect: 48521240
Feb 5 03:39:45 Tower kernel: [ 1340.261157] md0: parity incorrect: 48521248
Feb 5 03:39:45 Tower kernel: [ 1340.261182] md0: parity incorrect: 48521256
Feb 5 03:39:45 Tower kernel: [ 1340.261207] md0: parity incorrect: 48521264
Feb 5 03:39:45 Tower kernel: [ 1340.261231] md0: parity incorrect: 48521272
Feb 5 03:39:45 Tower kernel: [ 1340.261257] md0: parity incorrect: 48521280
Feb 5 03:39:45 Tower kernel: [ 1340.261278] md0: parity incorrect: 48521288
Feb 5 03:39:45 Tower kernel: [ 1340.261303] md0: parity incorrect: 48521296
Feb 5 03:39:45 Tower kernel: [ 1340.261326] md0: parity incorrect: 48521304
Feb 5 03:39:45 Tower kernel: [ 1340.261349] md0: parity incorrect: 48521312
Feb 5 03:39:45 Tower kernel: [ 1340.276728] md0: parity incorrect: 48521320
Feb 5 03:39:45 Tower kernel: [ 1340.276751] md0: parity incorrect: 48521328
Feb 5 03:39:45 Tower kernel: [ 1340.276770] md0: parity incorrect: 48521336
Feb 5 03:39:45 Tower kernel: [ 1340.276790] md0: parity incorrect: 48521344
Feb 5 03:39:45 Tower kernel: [ 1340.276811] md0: parity incorrect: 48521352
Feb 5 03:39:45 Tower kernel: [ 1340.276831] md0: parity incorrect: 48521360
Feb 5 03:39:45 Tower kernel: [ 1340.276852] md0: parity incorrect: 48521368
Feb 5 03:39:45 Tower kernel: [ 1340.276872] md0: parity incorrect: 48521376
Feb 5 03:39:45 Tower kernel: [ 1340.276893] md0: parity incorrect: 48521384
Feb 5 03:39:45 Tower kernel: [ 1340.276917] md0: parity incorrect: 48521392
Feb 5 03:39:45 Tower kernel: [ 1340.276935] md0: parity incorrect: 48521400
Feb 5 03:39:45 Tower kernel: [ 1340.276955] md0: parity incorrect: 48521408
Feb 5 03:39:45 Tower kernel: [ 1340.276977] md0: parity incorrect: 48521416
Feb 5 03:39:45 Tower kernel: [ 1340.276999] md0: parity incorrect: 48521424
Feb 5 03:39:45 Tower kernel: [ 1340.277021] md0: parity incorrect: 48521432
Feb 5 03:39:45 Tower kernel: [ 1340.277041] md0: parity incorrect: 48521440
Feb 5 03:39:45 Tower kernel: [ 1340.277063] md0: parity incorrect: 48521448
Feb 5 03:39:45 Tower kernel: [ 1340.277085] md0: parity incorrect: 48521456
Feb 5 03:39:45 Tower kernel: [ 1340.277107] md0: parity incorrect: 48521464
Feb 5 03:39:45 Tower kernel: [ 1340.277130] md0: parity incorrect: 48521472
Feb 5 03:39:45 Tower kernel: [ 1340.277154] md0: parity incorrect: 48521480
Feb 5 03:39:45 Tower kernel: [ 1340.277177] md0: parity incorrect: 48521488
Feb 5 03:39:45 Tower kernel: [ 1340.277198] md0: parity incorrect: 48521496
Feb 5 03:39:45 Tower kernel: [ 1340.277221] md0: parity incorrect: 48521504
Feb 5 03:39:45 Tower kernel: [ 1340.277242] md0: parity incorrect: 48521512
Feb 5 03:39:45 Tower kernel: [ 1340.277264] md0: parity incorrect: 48521520
Feb 5 03:39:45 Tower kernel: [ 1340.277288] md0: parity incorrect: 48521528
Feb 5 03:39:45 Tower kernel: [ 1340.277311] md0: parity incorrect: 48521536
Feb 5 03:39:45 Tower kernel: [ 1340.277332] md0: parity incorrect: 48521544
Feb 5 03:39:45 Tower kernel: [ 1340.277352] md0: parity incorrect: 48521552
Feb 5 03:39:45 Tower kernel: [ 1340.277373] md0: parity incorrect: 48521560
Feb 5 03:39:45 Tower kernel: [ 1340.277393] md0: parity incorrect: 48521568
Feb 5 03:39:45 Tower kernel: [ 1340.278253] md0: parity incorrect: 48521576
Feb 5 03:39:45 Tower kernel: [ 1340.278273] md0: parity incorrect: 48521584
Feb 5 03:39:45 Tower kernel: [ 1340.278293] md0: parity incorrect: 48521592
Feb 5 03:39:45 Tower kernel: [ 1340.278312] md0: parity incorrect: 48521600
Feb 5 03:39:45 Tower kernel: [ 1340.278331] md0: parity incorrect: 48521608
Feb 5 03:39:45 Tower kernel: [ 1340.278353] md0: parity incorrect: 48521616
Feb 5 03:39:45 Tower kernel: [ 1340.278370] md0: parity incorrect: 48521624
Feb 5 03:39:45 Tower kernel: [ 1340.278387] md0: parity incorrect: 48521632
Feb 5 03:39:45 Tower kernel: [ 1340.278405] md0: parity incorrect: 48521640
Feb 5 03:39:45 Tower kernel: [ 1340.278425] md0: parity incorrect: 48521648
Feb 5 03:39:45 Tower kernel: [ 1340.278444] md0: parity incorrect: 48521656
Feb 5 03:39:45 Tower kernel: [ 1340.278463] md0: parity incorrect: 48521664
Feb 5 03:39:45 Tower kernel: [ 1340.278480] md0: parity incorrect: 48521672
Feb 5 03:39:45 Tower kernel: [ 1340.278499] md0: parity incorrect: 48521680
Feb 5 03:39:45 Tower kernel: [ 1340.278519] md0: parity incorrect: 48521688
Feb 5 03:39:45 Tower kernel: [ 1340.278538] md0: parity incorrect: 48521696
Feb 5 03:39:45 Tower kernel: [ 1340.278560] md0: parity incorrect: 48521704
February 6, 2009 Author Last bit...
Feb 5 03:41:51 Tower emhttp[1345]: driver cmd: nocheck
Feb 5 03:41:51 Tower kernel: [ 1466.224768] mdcmd (13): nocheck
Feb 5 03:41:51 Tower kernel: [ 1466.239876] md: md_do_sync() got signal, exit...
Feb 5 03:41:52 Tower kernel: [ 1466.937415] md: recovery thread sync completion status: -4
Feb 5 03:41:52 Tower kernel: [ 1466.979872] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:41:52 Tower kernel: [ 1466.979881] ata1.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:41:52 Tower kernel: [ 1466.979883] res 51/04:01:0b:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:41:52 Tower kernel: [ 1466.980023] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 5 03:41:52 Tower kernel: [ 1466.980028] ata4.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
Feb 5 03:41:52 Tower kernel: [ 1466.980030] res 51/04:01:0b:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
Feb 5 03:41:52 Tower kernel: [ 1467.039780] ata1.00: configured for UDMA/133
Feb 5 03:41:52 Tower kernel: [ 1467.039786] ata1: EH complete
Feb 5 03:41:52 Tower kernel: [ 1467.039840] ata4.00: configured for UDMA/133
Feb 5 03:41:52 Tower kernel: [ 1467.039843] ata4: EH complete
Feb 5 03:41:52 Tower kernel: [ 1467.042774] hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Feb 5 03:41:52 Tower kernel: [ 1467.042778] hdb: drive_cmd: error=0x04 { DriveStatusError }
Feb 5 03:41:52 Tower kernel: [ 1467.042781] ide: failed opcode was: 0xb0
February 6, 2009 Clearly you have a problem. The first thing I'd recommend you do is run a memory test. (When you boot, one of the options in the unRAID boot menu is a memory test.) Run it for at least 2 cycles (many people run it overnight) to make sure that your memory is working properly. The second thing you should check is the drive cabling. If you are using docks or backplanes, I consider them part of the cabling. If a cable (including internal cabling in the dock or backplane) is not making a solid connection, it can wreak havoc. This is more common than you might think. In fact, I'd say it is the #1 most common cause of problems, and these faults can be hard to track down. I'm not sure how your syslog wound up on your USB stick unless you did something to put it there. Please post details of your configuration, including motherboard, controllers, drives, memory, version of unRAID, add-ons installed, any Linux mods you might have made, etc., so we have a complete picture of your setup. I would not use the array in its current state.
February 6, 2009 The syslog growing so large is probably what is shutting down the server. You definitely have drive problems, on at least 5 drives. That seems like too many, so there may be something fundamentally wrong with the motherboard and/or multiple disk controllers. Turn off any overclocking, check for badly configured memory, CPU, bus, or other BIOS settings, and make sure the CPU, motherboard chipsets, and drives are not too hot... Please try to extract the very first 100K of the syslog, and use a service like http://pastebin.com/ to make it available. 150K or 200K would be even better, but the first 100 kilobytes are the most important.
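For anyone who cannot open a syslog this large in an editor, here is a minimal sketch of one way to pull out just the first 100K. It assumes Python is available on the machine holding the copy of the file; the function name `head_bytes` and the file names in the usage note are placeholders for illustration, not anything unRAID provides.

```python
from pathlib import Path


def head_bytes(src, dst, limit=100 * 1024):
    """Copy at most the first `limit` bytes of src into dst.

    Only `limit` bytes are ever read, so a 255 MB syslog is never
    loaded into memory in full.
    """
    with open(src, "rb") as f:
        chunk = f.read(limit)
    # If the read filled the whole limit, it may have stopped mid-line,
    # so trim back to the last newline to keep only complete entries.
    if len(chunk) == limit:
        cut = chunk.rfind(b"\n")
        if cut != -1:
            chunk = chunk[: cut + 1]
    Path(dst).write_bytes(chunk)
    return len(chunk)
```

Called as `head_bytes("syslog", "syslog-first-100k.txt")`, it writes an excerpt small enough to paste; pass `limit=200 * 1024` for the larger excerpt mentioned above.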
February 7, 2009 Author Update: I turned on S.M.A.R.T., which was disabled in the BIOS, and now it seems to be running fine. After I enabled it, the parity check finished, and when I save the syslog to disk1 it is now very small and has no errors. It looked like unRAID was trying to get the temperature from the drives and kept logging the errors, so that is what led me to look at why it couldn't get the temperatures, which led me to change the BIOS setting to enable SMART. This seemed to get rid of the errors that you see in the syslog I posted. Whether that is what was causing it to crash in the end, I don't know. Has anyone ever had problems like this when SMART wasn't enabled?
February 7, 2009 "get_temperature: ioctl (smart): Input/output error" These *are* the normal indication of SMART being disabled, but they are not normally accompanied by all of the exceptions and drive errors. They certainly don't explode the syslog, even past 100K. I didn't bother to mention them, because they seemed like a very minor problem at the time (plus I forgot!) compared to the real problems with the drive errors on so many drives. I'm happy it is working better for you, and that was a good thing to find and fix, but I have to say I'll be very surprised if that is the only problem. I have never before seen a case where that single change fixed major drive problems. If possible, could you still make that syslog available? And I would continue testing: try another parity check, copy a large number of files, and grab another syslog after that.
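When comparing the old and new syslogs, it can help to collapse the flood of repeats into counts per distinct message. Below is a rough sketch that assumes the line layout seen in the excerpts posted in this thread (date, time, host name, then the message); the regular expressions and the name `summarize` are illustrative guesses, not part of any unRAID tool.

```python
import re
from collections import Counter

# Leading "Feb 5 03:20:42 Tower " (month, day, time, host name)
PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+")
# Kernel uptime stamp such as "[ 198.164064] "
KTIME = re.compile(r"\[\s*\d+\.\d+\]\s*")
# Process id such as the "[1381]" in "emhttp[1381]"
PID = re.compile(r"\[\d+\]")


def summarize(lines):
    """Count log lines after removing the parts that differ on every repeat."""
    counts = Counter()
    for line in lines:
        msg = PREFIX.sub("", line.strip())
        msg = KTIME.sub("", msg)
        msg = PID.sub("", msg)
        if msg:
            counts[msg] += 1
    return counts
```

Running `summarize(open("syslog", errors="replace"))` and printing `most_common(20)` gives a quick picture of which ports (ata1, ata3, ata4, ...) still report errors after a change such as enabling SMART. Messages that embed a changing value, like the sector number in "parity incorrect", are still counted separately.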
February 7, 2009 Author Thanks again for all the help, guys. I am currently running a memtest. After that, I am going to boot up, do a parity check, copy a few gigs of data over to it, and grab another syslog. Yesterday I moved a 3.5 GB ISO over with no problems. I have one question about the syslog. Every time you restart the server, the current syslog clears, correct? So a syslog covers from boot-up until the moment you grab it? I have so many syslogs now, I hope I can find the correct one!! Thanks! I would upload the syslog, but the uploader is full...
February 7, 2009
"Thanks again for all the help guys. I am currently doing a memtest. After that, I am going to boot up, do a parity check, copy a few gigs of data over to it and grab another syslog. Yesterday, I moved a 3.5 gig iso over with no problems. I have one question about syslog. Every time you restart the server, the current syslog clears, correct? A syslog, is from boot up till when you grab the syslog?"
Correct, each reboot creates a brand new syslog. If you use the powerdown script to shut down instead of the Shutdown button on the web interface, it will automatically save a copy of the syslog to your flash drive on every shutdown.
"I have so many syslogs now, I hope i can find the correct one!! Thanks! I would upload the syslog, but the uploader is full..."
Look at pastebin.com as a way to easily upload and share your syslog and/or SMART reports until Tom gets more space for uploaded files. For image uploads I recommend ImageShack, and for other kinds of uploads (e.g., .zip files) I recommend Mediafire.