April 13, 2013
Hi, I have just spent 10 days rebuilding a disk because of horrible speeds of 2-3 MB/s. Now that it's done I thought everything would be OK, but it's still horribly slow. I can't get transfer speeds above 10 MB/s, and the parity build is just as slow. I see some disk-related errors in the syslog, but I'm not sure whether one or more of my drives have an issue or whether it's config related. I have attached the syslog and hope some of you can see from the log file what the issue could be. Thank you very much! Running 5.0 beta 7. syslog-2013-04-13.txt
April 13, 2013
Looks to me like you're having problems with ata:7 and ata:8 on the SAS card. You could try powering down and reseating the SATA cables, then moving the cables to onboard SATA ports to see if the problem clears. A SMART test is in order to see if those drives have issues; I would do that after steps one and two are complete. I could be wrong, but it looks to me like you are in translation mode, where the parity drive is translating those two drives, and that is the reason for your slow performance.
April 13, 2013
I don't know if it's an actual mode, but that should have read "simulated". Thinking about it now, though, what I wrote doesn't seem correct, as the disks in question are still part of the array.
April 14, 2013 Author
Thanks burtjr. Is there an easy way to translate ata:7 and ata:8 into /dev/sd names? I have run the SMART check and I do get a lot of errors on one drive (the parity drive), but I'm not sure which ata number it is.
April 15, 2013
ATA:7 = sde

sas: DOING DISCOVERY on port 0, pid:841
Apr 13 10:12:58 tower kernel: drivers/scsi/mvsas/mv_sas.c 1388:found dev[0:5] is gone.
Apr 13 10:12:58 tower kernel: sas: sas_ata_phy_reset: Found ATA device.
Apr 13 10:12:58 tower kernel: ata7.00: ATA-8: WDC WD20EARS-00MVWB0, 51.0AB51, max UDMA/133
Apr 13 10:12:58 tower kernel: ata7.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 13 10:12:58 tower kernel: ata7.00: configured for UDMA/133
Apr 13 10:12:58 tower kernel: scsi 0:0:0:0: Direct-Access ATA WDC WD20EARS-00M 51.0 PQ: 0 ANSI: 5
Apr 13 10:12:58 tower kernel: sas: DONE DISCOVERY on port 0, pid:841, result:0
Apr 13 10:12:58 tower kernel: sd 0:0:0:0: [sde] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Apr 13 10:12:58 tower kernel: sd 0:0:0:0: [sde] Write Protect is off
Apr 13 10:12:58 tower kernel: sd 0:0:0:0: [sde] Mode Sense: 00 3a 00 00
Apr 13 10:12:58 tower kernel: sd 0:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

ata:8 = sdf

sas: DOING DISCOVERY on port 1, pid:841
Apr 13 10:12:58 tower kernel: drivers/scsi/mvsas/mv_sas.c 1388:found dev[1:5] is gone.
Apr 13 10:12:58 tower kernel: sas: sas_ata_phy_reset: Found ATA device.
Apr 13 10:12:58 tower kernel: ata8.00: ATA-8: WDC WD20EARS-00MVWB0, 51.0AB51, max UDMA/133
Apr 13 10:12:58 tower kernel: ata8.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 13 10:12:58 tower kernel: ata8.00: configured for UDMA/133
Apr 13 10:12:58 tower kernel: scsi 0:0:1:0: Direct-Access ATA WDC WD20EARS-00M 51.0 PQ: 0 ANSI: 5
Apr 13 10:12:58 tower kernel: sas: DONE DISCOVERY on port 1, pid:841, result:0
Apr 13 10:12:58 tower kernel: sd 0:0:1:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Apr 13 10:12:58 tower kernel: sd 0:0:1:0: [sdf] Write Protect is off
Apr 13 10:12:58 tower kernel: sd 0:0:1:0: [sdf] Mode Sense: 00 3a 00 00
Apr 13 10:12:58 tower kernel: sd 0:0:1:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

I changed the font color to red on the parts that identify the ata and drive id. Did you move your troubled drives to a MB port? Post your SMART report; the more data you provide, the better the support.
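For anyone who would rather not pick through the syslog by eye, the pairing above can be roughly automated. This is only a sketch: the function name `map_ata_to_sd` is made up for illustration, and it assumes the kernel logs each `ataN.00: ATA-8: ...` identification line before the matching `sd H:C:T:L: [sdX]` line, as in the excerpt above. If discovery of several ports is interleaved in the log, the pairing could be wrong.

```python
import re

# Matches the identification line, e.g. "ata7.00: ATA-8: WDC WD20EARS-..."
ATA_RE = re.compile(r"\b(ata\d+)\.\d+: ATA-\d+:")
# Matches the SCSI disk line, e.g. "sd 0:0:0:0: [sde] 3907029168 512-byte ..."
SD_RE = re.compile(r"\bsd \d+:\d+:\d+:\d+: \[(sd[a-z]+)\]")


def map_ata_to_sd(lines):
    """Pair each ataN with the first [sdX] that follows it in the log.

    Heuristic only: relies on log ordering, not on any kernel-provided link.
    """
    mapping = {}
    pending = None  # most recent ataN still waiting for its sdX
    for line in lines:
        m = ATA_RE.search(line)
        if m:
            pending = m.group(1)
            continue
        m = SD_RE.search(line)
        if m and pending is not None:
            mapping[pending] = m.group(1)
            pending = None
    return mapping


# A few lines taken from the syslog excerpt in this thread.
SAMPLE = """\
Apr 13 10:12:58 tower kernel: sas: sas_ata_phy_reset: Found ATA device.
Apr 13 10:12:58 tower kernel: ata7.00: ATA-8: WDC WD20EARS-00MVWB0, 51.0AB51, max UDMA/133
Apr 13 10:12:58 tower kernel: ata7.00: configured for UDMA/133
Apr 13 10:12:58 tower kernel: sd 0:0:0:0: [sde] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Apr 13 10:12:58 tower kernel: sd 0:0:0:0: [sde] Write Protect is off
Apr 13 10:12:58 tower kernel: ata8.00: ATA-8: WDC WD20EARS-00MVWB0, 51.0AB51, max UDMA/133
Apr 13 10:12:58 tower kernel: sd 0:0:1:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
"""

if __name__ == "__main__":
    print(map_ata_to_sd(SAMPLE.splitlines()))
```

To run it against a saved syslog, pass the file's lines instead of `SAMPLE`, and sanity-check the result against the drive model and serial numbers in the SMART reports.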
April 16, 2013 Author
OK, I have attached the output from smartctl for all drives to this post. As far as I can see there are some errors on /dev/sdb, which is the parity drive. Do these errors usually mean something is physically wrong with the drive? And do the other drives look OK? I have purchased a new 4 TB drive, as I plan to replace the parity drive to allow for larger drives in the future. Can I do this now, or do I run a big risk of losing data if anything is wrong with some of the other drives? They all appear green on the main page, so I'm a bit confused. Thank you very much for your assistance so far! smartctl.txt
April 16, 2013
Multiple drives are failing: they have reallocated sectors, current pending sectors, or both. You'll have to wait for one of the gurus to tell you what to do to get out of this situation; I can't help there.
April 17, 2013
I'd definitely wait for one of the Linux "gurus" to look this over -- there are clearly errors, but apparently not enough to cause any data loss (YET). If you have any un-backed-up data, NOW would be a really good time to start backing it up.

VERY slow reads CAN be caused by a failed drive, whereby the data has to be reconstructed by reading all of the other drives and parity -- that's the whole idea of a fault-tolerant system. But that doesn't appear to be the case here -- and it also wouldn't have any bearing on parity check speeds (which are also very slow).

I'd think the culprit here is either one or more bad SATA cables or a bad controller. But there may be better clues in the syslog, which needs to be interpreted by a Linux guru. Have patience ... I'm sure one of those guys will come along in a day or so ...
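To illustrate why reads from a failed drive are so slow, here is a toy sketch of single-parity reconstruction. It assumes plain XOR parity across the data drives; the three tiny "drives" and the helper name `xor_blocks` are invented for the example and have nothing to do with the actual server.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data drives, three bytes each.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]

# The parity drive holds the XOR of all data drives.
parity = xor_blocks(data)

# Pretend drive 1 has failed: every read from it now needs ALL the
# surviving data drives plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

The same arithmetic shows why a rebuild needs every other drive to be readable: one unreadable sector on any survivor leaves the corresponding sector of the rebuilt drive unknown.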
April 17, 2013
... one other thought: How many drives are in your system? If not too many, and you have another motherboard with enough SATA ports, you may want to try simply moving all of the drives to the other motherboard and booting with your USB drive there. Since you're running v5, it should automatically assign the correct drives without any need to do so manually (which v4.7 would have required).
April 17, 2013
I also just noticed that you're running a pretty old Beta -- not even an RC release !!

It's a bit risky to upgrade while you're having issues -- but one thing you COULD do is:
(a) copy the entire contents of your flash drive to a backup folder on your PC -- this will give you the option of easily restoring back to the current state if needed;
(b) upgrade the flash to the latest RC (v12a)

The instructions for upgrading from your current version are as follows (copied from the wiki):

All previous 5.0-beta and 5.0-rc versions including 5.0-rc8a

1. Prepare the flash: either shutdown your server and plug the flash into your PC or Stop the array and perform the following actions referencing the flash share on your network: Copy the files bzimage and bzroot from the zip file to the root of your flash device, overwriting the same-named files already there.
2. Reboot your server. Once boot-up has completed, you should see "Stopped. Configuration valid." array status with all disks assigned correctly.
3. Click on each disk link on the Main page and examine the Partition format field. If you see "MBR: error", or "MBR: unknown" for any disk, do not Start the array; instead post your finding in the Forum announcement thread for this release. If everything looks ok, click Start to bring the array on-line.
4. Go to Utils/New Permissions and execute that utility to change file ownership and permission settings. This is necessary for proper operation of the 5.0 security model.
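Steps (a) and (1) above are just file copies, so they can be scripted if you prefer. The following is a rough sketch, not an official procedure: the function name and all three paths are hypothetical, and it assumes the release zip has already been extracted to a folder containing `bzimage` and `bzroot`.

```python
import shutil
from pathlib import Path


def backup_and_upgrade(flash_root, backup_dir, release_dir):
    """Back up the whole flash drive, then overwrite bzimage and bzroot.

    flash_root  -- mounted flash drive or flash share (hypothetical path)
    backup_dir  -- new folder on the PC; must not exist yet
    release_dir -- folder with the extracted release files
    """
    flash_root, backup_dir, release_dir = (
        Path(flash_root), Path(backup_dir), Path(release_dir)
    )

    # (a) Full copy of the flash so the current state can be restored.
    # copytree raises FileExistsError rather than overwrite an old backup.
    shutil.copytree(flash_root, backup_dir)

    # (1) Replace only the two files named in the wiki instructions.
    for name in ("bzimage", "bzroot"):
        shutil.copy2(release_dir / name, flash_root / name)


# Example call (paths are placeholders, adjust before running):
# backup_and_upgrade("E:/", "C:/unraid-flash-backup", "C:/unraid-release")
```

The backup is taken before anything is overwritten, so if the new release misbehaves, copying the backup folder back onto the flash should return you to where you started.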
April 18, 2013 Author
dirtysanchez: I have a total of 6 data drives. It would be nice to know which drives are failing. I don't mind changing several drives; I just want to make sure I only change drives that have something physically wrong with them. Does that show from the log? It's not the end of the world if I lose the data (I have backed up all important documents and pics, so I will "only" lose TV series and movies).

garycase: I don't have a spare motherboard to try with, but I'm quite sure there's nothing wrong with the rest of the hardware, which had been running fine until I started getting those disk errors. I will try to update to the latest RC (I just waited forever for 5 final to come out, so I didn't bother upgrading before). Will post a status here about how it goes...
April 19, 2013
Don't go replacing drives just yet based on this, as I still believe an expert needs to weigh in here, because as garycase states it might be cables or controllers. That said, here are the drives with what jumps out to me as issues.

/dev/sdb
Device Model: WDC WD20EARS-00MVWB0
Serial Number: WD-WCAZA2933521
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
197 Current_Pending_Sector  0x0032 001 001 000 Old_age  Always  - 65530

/dev/sdc
Device Model: WDC WD20EARS-00MVWB0
Serial Number: WD-WCAZA2993907
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
197 Current_Pending_Sector  0x0032 200 200 000 Old_age  Always  - 4

/dev/sdf
Device Model: WDC WD20EARS-00MVWB0
Serial Number: WD-WCAZA3714126
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
197 Current_Pending_Sector  0x0032 196 196 000 Old_age  Always  - 1446
198 Offline_Uncorrectable   0x0030 200 199 000 Old_age  Offline - 44

/dev/sdh
Device Model: WDC WD15EADS-00R6B0
Serial Number: WD-WCAVY0708167
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
  5 Reallocated_Sector_Ct   0x0033 041 041 140 Pre-fail Always  FAILING_NOW 1265
197 Current_Pending_Sector  0x0032 196 195 000 Old_age  Always  - 1073
198 Offline_Uncorrectable   0x0030 191 191 000 Old_age  Offline - 2296

The last drive is in really bad shape and should probably be replaced and/or backed up immediately. Again, an expert needs to weigh in here, as I'm no expert. I will say that IMO I don't think it's a cable, as you don't have lots of UDMA CRC errors, which in my experience almost always point to the cable.
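The check applied to the reports above (look at attributes 5, 197 and 198 and see whether the raw value is non-zero) can be sketched in a few lines. This is an illustrative helper, not part of smartctl: the name `flag_attributes` is made up, and it assumes the usual ten-column attribute layout with the raw value in the last column, as in the lines quoted above.

```python
# Attribute IDs discussed in this thread, with their smartctl names.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def flag_attributes(report):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[fields[1]] = int(raw)
    return flagged


# The /dev/sdf lines quoted above.
SDF_REPORT = """\
Device Model: WDC WD20EARS-00MVWB0
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
197 Current_Pending_Sector  0x0032 196 196 000 Old_age  Always  - 1446
198 Offline_Uncorrectable   0x0030 200 199 000 Old_age  Offline - 44
"""

if __name__ == "__main__":
    print(flag_attributes(SDF_REPORT))
```

A non-zero result is only a prompt to look closer; it does not by itself distinguish a dying drive from a cable or controller problem.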
April 20, 2013
sdh needs to be replaced ASAP. sdc and sdf both need to be rebuilt. All three disks have unreadable sectors, meaning rebuilding any of them is not possible. Copy data off of those disks ASAP. This is why the array is slow, and copying the data off of these disks will be very time consuming.