October 20, 2008
During a Parity-Check I've been getting a large number of errors. From checking the forum it seemed that I should let it finish; however, I now have a massive number of errors (248,846) and it estimates it will take something over 500,000 minutes to complete at 14 KB/sec LOL! Zero Sync errors... The files on the data drives are intact. I'm running an older version, unRAID Server Basic 4.3-beta6 (yeah yeah, I know, but it has worked wonderfully for me and I'm Linux challenged...). I did install and run the smartctl utility but the results are basically Greek to me. I'm not even sure if I should have run it while it was doing the parity check... I'm not sure what my next steps should be... I have Spinrite so I was thinking of running that on the parity drive, and I have some spare cables so I'll replace the cable; maybe a memory test is in order, or should I force it to just redo the parity??? I've attached the smart.txt and syslog.txt. I'm definitely lost so any suggestions would be appreciated!
October 20, 2008
Parity errors do not necessarily indicate that the parity drive itself is at fault. Please post your syslog (you did not attach it). It will have the clues needed to let folks assist you. Running the "smart" status test is no problem... I did not see anything specific in it. It is unlikely that Spinrite will find anything wrong with the parity drive... Something has caused one of your drives to reduce its throughput... We have no idea which one until we can look at the syslog.
Joe L.
October 20, 2008 (Author)
Hi Joe, Finally got the log attached. I didn't realize I couldn't use .rar, sorry for the delay.
October 20, 2008
It appears as if drive /dev/hdb is the one with the errors.

Nov 21 07:16:37 unRAID kernel: [ 74.706287] hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Nov 21 07:16:37 unRAID kernel: [ 74.706295] hdb: drive_cmd: error=0x04 { DriveStatusError }
Nov 21 07:16:37 unRAID kernel: [ 74.706298] ide: failed opcode was: 0xb0
Nov 21 07:16:37 unRAID emhttp[1226]: get_temperature: ioctl (smart): Input/output error
Nov 21 07:16:37 unRAID emhttp[1193]: shcmd (9): killall -w smbd nmbd
Nov 21 07:16:38 unRAID emhttp[1193]: shcmd (10): /usr/sbin/nmbd -D
Nov 21 07:16:38 unRAID emhttp[1193]: shcmd (11): /usr/sbin/smbd -D
Nov 21 07:16:38 unRAID emhttp[1193]: driver cmd: start STOPPED
Nov 21 07:16:38 unRAID kernel: [ 75.818474] mdcmd (3): start
Nov 21 07:16:38 unRAID kernel: [ 75.820624] unraid: allocated 7030kB
Nov 21 07:16:38 unRAID kernel: [ 75.820830] md1: running, size: 245117344 blocks
Nov 21 07:16:38 unRAID kernel: [ 75.820846] md2: running, size: 293036152 blocks
Nov 21 07:16:38 unRAID kernel: [ 75.850060] hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Nov 21 07:16:38 unRAID kernel: [ 75.850067] hdb: drive_cmd: error=0x04 { DriveStatusError }
Nov 21 07:16:38 unRAID kernel: [ 75.850070] ide: failed opcode was: 0xb0

Oh yes, you are running a beta version of unRAID with lots of debugging messages enabled. They will crash your machine once they fill the syslog, as they will use up all your memory. A critical bug was fixed in 4.3.1. It is described here:
[pre]
Patch release 4.3.1 is available. This fixes yet another bug with "clearing new disks".

unRAID Server 4.3.1 Release Notes
=================================
IMPORTANT: this release includes a critical bug fix. If you are running 4.3-beta4 or higher, and have added a New disk to an existing array, parity may be incorrect, and you must run a Parity-Check operation.
Note: If Parity-Check detects any parity mismatches, they will be corrected by the parity check process.
[/pre]
Basically, when a new disk was added to the array, it was not zeroed; therefore, you will get exactly the parity errors you are seeing... If you let the parity check finish, the next run should be OK. You do have a hardware error causing the slow parity check rate, but that is probably not related to the parity errors. Please consider upgrading to the current version of unRAID. The process only involves replacing two files on your flash drive, bzroot and bzimage, and rebooting. Everything else will stay the same.
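To make the "not zeroed" point concrete, here is a minimal Python sketch, assuming single parity computed as a byte-wise XOR across the data disks. The disk contents and the helper names (`xor_parity`, `count_mismatches`) are made up for illustration, and the real unRAID driver works on blocks rather than individual bytes. A zeroed new disk leaves the existing parity valid, while a disk that still holds old data produces a mismatch wherever it is non-zero.

```python
from functools import reduce

def xor_parity(disks):
    """Return the byte-wise XOR of equally sized data disks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

def count_mismatches(parity, disks):
    """Count byte positions where stored parity differs from recomputed parity."""
    return sum(p != q for p, q in zip(parity, xor_parity(disks)))

# Two existing data disks and the parity that was computed from them.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAB, 0xCD, 0xEF, 0x01])
parity = xor_parity([disk1, disk2])

# A properly cleared new disk versus one that still holds stale data.
zeroed_disk = bytes(4)
stale_disk = bytes([0x00, 0xFF, 0x00, 0x7F])

print(count_mismatches(parity, [disk1, disk2, zeroed_disk]))  # 0
print(count_mismatches(parity, [disk1, disk2, stale_disk]))   # 2
```

XOR with zero changes nothing, which is why clearing a disk before adding it keeps the old parity correct.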
October 20, 2008 (Author)
Hi Joe, thanks so much for the lightning fast reply! The trouble is that I was going to let the parity check finish (and it is still running); however, it is showing something well over 500,000 minutes, almost a year LOL! BTW: What do you use to open the log files? When I use WordPad it doesn't recognize linefeeds or carriage returns or something and it all comes out very hard to decipher. Or do you manually clean it up? Thanks again for your help!
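As a rough sanity check on that estimate, the arithmetic can be sketched in Python. This assumes the check has to read a 500 GB parity drive from start to finish (the drive size is mentioned later in the thread), that "KB" means 1000 bytes, and that the 14 KB/sec rate never improves. The function name is hypothetical.

```python
def minutes_to_read(size_bytes, rate_bytes_per_sec):
    """Minutes needed to read size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_sec / 60

size = 500 * 1000**3   # assumed 500 GB parity drive
rate = 14 * 1000       # 14 KB/sec as reported by the web interface

minutes = minutes_to_read(size, rate)
days = minutes / (60 * 24)
print(round(minutes), round(days))
```

Under those assumptions the result is a little under 600,000 minutes, which is consistent with the "well over 500,000 minutes" shown on screen and works out to more than a year.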
October 20, 2008
I know... you will probably not want to let it finish... As I said, it looks like a hardware issue with drive /dev/hdb. It is the one with the errors. You might try a "reboot" to see if the disk controller will come back up to normal speed. I use "vim", an improved version of the "vi" editor, on Windows to view syslog files. With all the debugging messages in your syslog, it is difficult to see anything.
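For readers wondering how `status=0x51` turns into `{ DriveReady SeekComplete Error }` in the syslog excerpt, here is a small Python sketch that decodes the value bit by bit. The three names printed by the kernel come from the log itself; the remaining names follow the usual ATA status register layout and should be treated as assumptions here.

```python
# Bit masks of the ATA status register, highest bit first.
# Only DriveReady, SeekComplete and Error appear in the posted log;
# the other names are assumptions based on the standard register layout.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

def decode_status(value):
    """Return the names of all status bits set in value."""
    return [name for bit, name in STATUS_BITS.items() if value & bit]

print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
```

The `get_temperature: ioctl (smart): Input/output error` line right after it in the log suggests the command that failed was a SMART query rather than a data read.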
October 20, 2008 (Author)
OK, I just read that the first several gigs can take longer and then things may speed up. I'm in no real hurry so I'll give it until tomorrow and see if things speed up any. It has been going for 10 to 12 hours so far though. Any suggestions on what to do about drive /dev/hdb? Try Spinrite on that or just replace the drive? Thanks for the tip on "vim", I'll give that a try. Oh, and should I clear (or delete) the syslog file now, or when I upgrade, to get rid of all the junk in it? Sorry for all the questions, your help is greatly appreciated!
October 20, 2008
What kind of cable are you using with hdb? If it's an IDE drive, do you have it jumpered appropriately, i.e. cable select, MASTER, or SLAVE? Can you run hdparm -Iia /dev/hda
October 20, 2008 (Author)
Hi WeeboTech, Yes it is an IDE drive and it is set for cable select (as near as I can tell without stopping the parity check and pulling the drive).

Linux 2.6.24.4-unRAID.
root@unRAID:~# hdparm -Ia /dev/hda
/dev/hda: No such file or directory
root@unRAID:~# hdparm -Ia /dev/hdb

/dev/hdb:
 readahead = 256 (on)

ATA device, with non-removable media
    Model Number: Maxtor 6Y250P0
    Serial Number: Y63Z16GE
    Firmware Revision: YAR41BW0
Standards:
    Used: ATA/ATAPI-7 T13 1532D revision 0
    Supported: 7 6 5 4
Configuration:
    Logical         max     current
    cylinders       16383   65535
    heads           16      1
    sectors/track   63      63
    --
    CHS current addressable sectors: 4128705
    LBA user addressable sectors: 268435455
    LBA48 user addressable sectors: 490234752
    device size with M = 1024*1024: 239372 MBytes
    device size with M = 1000*1000: 251000 MBytes (251 GB)
Capabilities:
    LBA, IORDY(can be disabled)
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 16 Current = 16
    Advanced power management level: unknown setting (0x0000)
    Recommended acoustic management value: 192, current value: 254
    DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_VERIFY command
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    NOP cmd
       *    DOWNLOAD_MICROCODE
            Advanced Power Management feature set
            SET_MAX security extension
       *    Automatic Acoustic Management feature set
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
Security:
    Master password revision code = 65534
            supported
    not     enabled
    not     locked
    not     frozen
    not     expired: security count
    not     supported: enhanced erase
HW reset results:
    CBLID- above Vih
    Device num = 1 determined by CSEL
Checksum: correct
root@unRAID:~#
October 20, 2008
I would STOP the parity check now! When it is reporting that many errors, it is almost certain that it is not calculating with one of the drives, and is therefore now 'correcting' the parity WITHOUT that drive, essentially ruining your parity. We need to see the current syslog to tell you the best next step, but even if we correct an issue, you will need to re-run the parity check, and expect the same huge number of parity errors, to undo the damage being done now! Next problem: you unfortunately included an old syslog, dated Nov 21, 2007, when you had 3 drives (a 250GB and two 300GB's) and were running v4.2.1 with a Basic license. It does not include the 500GB drive reported in that SMART report. We really need to see a current syslog, captured before you reboot. You will probably need to zip it before attaching.
October 20, 2008
I did not pick up on the fact that the syslog did not include the drive in the SMART report. Good catch. He is running one of the versions of unRAID that did not properly clear a drive when adding one to an array. The errors might be the old (bad) parity being corrected. Of course, I'd feel much better if his disk was performing at a normal speed. I thought the parity was probably bad already, and the errors were actually corrections, but I did not have a current syslog... As I said, good catch.
Joe L.
October 20, 2008
When you upload a new syslog I'll take another look. The drive itself supports higher UDMA modes, so it is not being initialized correctly. What kind of cable are you using? Also, it is being reported as secondary. Perhaps you should switch to the other header. I've had issues in the past when a drive is on the secondary port with the primary port dangling. Also check your BIOS to see if there are settings enabled that prevent it from going into higher DMA modes.
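The "supports higher UDMA modes" observation comes from the `DMA:` line of the hdparm output posted above, where the asterisk marks the mode currently in use. A small Python sketch for pulling that out follows; it assumes the line lists modes in ascending order, as in the posted output, and the function name is hypothetical.

```python
def dma_modes(hdparm_dma_line):
    """Return (active_mode, highest_listed_mode) from an hdparm 'DMA:' line."""
    modes = hdparm_dma_line.split(":", 1)[1].split()
    # hdparm prefixes the currently selected mode with an asterisk.
    active = [m.lstrip("*") for m in modes if m.startswith("*")]
    listed = [m.lstrip("*") for m in modes]
    return (active[0] if active else None), listed[-1]

# The DMA line exactly as posted in the thread.
line = "DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6"
active, highest = dma_modes(line)
print(active, highest)  # udma2 udma6
```

Here the drive is running at udma2 although it lists modes up to udma6, which is the gap being pointed out.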
October 21, 2008 (Author)
OK, sorry guys, it has been a while since I had to use a syslog and I forgot you had to "generate" it! It was too big to post, even zipped, so hopefully this link will work. http://www.mediafire.com/?sharekey=8bb61a8710ebc743d2db6fb9a8902bda
October 21, 2008
Well, they were right, looks like one, probably two bad cables. The IDE drive hdb reports it is not passing the 80-wire test:

Oct 19 13:43:25 unRAID kernel: hdb: Maxtor 6Y250P0, ATA DISK drive
Oct 19 13:43:25 unRAID kernel: hdb: host max PIO4 wanted PIO255(auto-tune) selected PIO4
Oct 19 13:43:25 unRAID kernel: hdb: host side 80-wire cable detection failed, limiting max speed to UDMA33

Either that is a very bad IDE cable, or a 40-wire cable, which is very poor for hard drives but OK for your CD or DVD drives. Replace that cable! This Maxtor drive did not report any other errors though, it was just very slow. The Maxtor 300GB SATA drive sdb is fine, no errors at all.

The drive causing all the trouble is your parity drive sda, a fairly new Maxtor/Seagate 500GB. It would reset with a time delay over and over, which of course slowed everything way, way down. The problem does not look like a physical drive problem, but most likely a bad cable, or possibly a bad controller. It was quickly slowed to UDMA/33:PIO4 and was never dropped lower, but with the constant waiting on it, and then resetting it over and over, it was hardly readable or writable. I don't know why it was not disabled. It was apparently in an odd state that made the system think it just needed one more reset, and then it would work. After doing that thousands of times, you would think it would learn!

One other possibility: this is a newer Seagate-made SATA II drive that may have come with a SATA I jumper on the back for compatibility with some older SATA ports. Is it possible that you removed that jumper? Since it is connected to an older SATA I port, try re-installing the jumper (if there is one). This may be a case where it is required for backward compatibility. Otherwise, replace its SATA cable, and then try connecting this drive to a different SATA port, onboard if possible.
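With a syslog full of debugging messages, it can help to filter for the lines that matter. The following Python sketch groups two kinds of messages quoted in this thread by device name. The regular expressions are written against the exact lines posted here and are an assumption about the message format; other kernel versions may word them differently, and the function and label names are hypothetical.

```python
import re

# Patterns modelled on log lines quoted in this thread (an assumption,
# not an exhaustive list of kernel messages).
PATTERNS = {
    "cable": re.compile(r"(\w+): host side 80-wire cable detection failed"),
    "drive_error": re.compile(
        r"(\w+): drive_cmd: status=0x[0-9a-f]+ \{[^}]*Error[^}]*\}"
    ),
}

def scan_syslog(lines):
    """Map each device name to the sorted list of problem labels seen for it."""
    hits = {}
    for line in lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.setdefault(match.group(1), set()).add(label)
    return {device: sorted(labels) for device, labels in hits.items()}

sample = [
    "Oct 19 13:43:25 unRAID kernel: hdb: Maxtor 6Y250P0, ATA DISK drive",
    "Oct 19 13:43:25 unRAID kernel: hdb: host side 80-wire cable detection failed, limiting max speed to UDMA33",
    "Nov 21 07:16:37 unRAID kernel: [ 74.706287] hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }",
]
print(scan_syslog(sample))  # {'hdb': ['cable', 'drive_error']}
```

To run it against a real file, pass an open file object instead of the sample list, for example `scan_syslog(open("syslog.txt"))`.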
October 21, 2008 (Author)
Hi RobJ, I had already replaced the IDE cable and started the parity check when I saw your response, so I didn't change the SATA cables, at least not yet. The parity check went fairly fast but I did get a bunch of Sync errors. I've read here that they can be caused by cables so I'll probably replace them. I'll also do the upgrade as Joe L. suggested earlier, sounds pretty painless! It was an 80-conductor cable but it had been in the system quite a while and was one of the round ones, which I've also heard can cause problems even though they do seem to give better air flow. I thought I was over touchy cables after SCSI years ago. LOL! I ended up with about 1,534 parity errors and 37,000+ sync errors, so hopefully replacing the cable will address that. It seemed to be running fairly quickly so I'm not sure if there is still something wrong with sda. I don't remember removing any jumper but I'll look in the morning to be sure. I've attached the new syslog and screenshot. Thanks to all you guys I think I'm getting closer to having my system back!!! Oops, the file was still a touch too big so here is the URL. http://www.mediafire.com/?sharekey=8bb61a8710ebc743ab1eab3e9fa335caaced129f9c795011
October 21, 2008
A sync error and a parity error are usually synonymous terms. If you are getting errors on the physical parity drive, that is different. That means that the drive is returning read errors and unRAID is trying to force sector remaps to occur. It could mean a bad drive, bad cable, bad backplane, or bad controller. You should run smartctl on the drive to see if there are reallocated sectors. That would point to a bad drive. A sync error just means that your parity calculation was wrong. unRAID will correct this by updating the parity drive. Maybe the message should read "sync errors corrected". So if you have a specific drive problem that caused, say, 100 sync errors and then you fix the problem, you should expect 100 sync errors on the next parity check, where it puts things right again. Make sense?
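The "sync errors corrected" behaviour can be illustrated with a toy Python model. It assumes single XOR parity and a check pass that rewrites parity wherever it disagrees with the data disks; the function name and the sample bytes are invented, and this is not unRAID's actual code. The first pass reports and fixes the mismatches, and a second pass over the same data reports none.

```python
def parity_check(parity, disks, correct=True):
    """Count positions where parity disagrees with the XOR of the data disks.

    When correct is True, the parity is rewritten in place to match the data.
    """
    errors = 0
    for i in range(len(parity)):
        expected = 0
        for disk in disks:
            expected ^= disk[i]
        if parity[i] != expected:
            errors += 1
            if correct:
                parity[i] = expected
    return errors

disks = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8])]
# Parity that is right at positions 0 and 2 but wrong at positions 1 and 3.
parity = bytearray([1 ^ 5, 0, 3 ^ 7, 0])

first = parity_check(parity, disks)
second = parity_check(parity, disks)
print(first, second)  # 2 0
```

If the second pass still reports errors on real hardware, something is still corrupting reads or writes, which is why the cable and controller advice in this thread matters.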
October 21, 2008
"so I didn't change the SATA cables at least not yet."
I can't tell you just how much trouble a questionable cable is. If there is any question about a drive and/or its cable, swap the cable out or at least reseat everything. In the past few weeks I've had drives with trouble and in the end it turned out to be the cable. That's 3 cables in the past 3 weeks (I've also been growing pretty rapidly, so that's why it's cropping up). Even a simple flash drive I bought the other day was working fine, but when I started to stress communication with it, it kept going offline with errors and bus resets. The more data you put on the cable, the more it is inclined to show errors if it is questionable. The flash drive is so damn fast that it was hit and miss until I got so frustrated I swapped the cable out, and it started working correctly.
October 21, 2008 (Author)
Hi everyone, That does make more sense now that you explained the terms, bjp999. I've replaced the SATA cable on the parity drive and still need to get one for the other data disk. I've had some problems with the molex power connectors and have some new ones on the shelf, so I think it is time to recable the system! I ran the smartctl utility on the three drives and have attached the results. Next I'm going to upgrade to 4.3.3 and then rerun the parity check. Does that sound reasonable? Hopefully I'm getting this straightened out with everyone's help!