dlmh Posted November 8, 2009
I have 4x Samsung F3 1.5 TB disks in my array with a parity drive (same model) and a Samsung F3 500 GB cache drive. Lately, copying files to shares located on disk4, or directly to disk4, shows sudden drops in throughput and sometimes halts completely. Streaming movies located on this disk with XBMC also sometimes results in playback failure and long buffer times. When I browse through the log I find these entries:
Nov 8 19:17:28 Prometheus kernel: md: disk4 read error
Nov 8 19:17:28 Prometheus kernel: handle_stripe read error: 2080706392/4, count: 1
Nov 8 19:17:28 Prometheus kernel: md: disk4 read error
Nov 8 19:17:28 Prometheus kernel: handle_stripe read error: 2080706400/4, count: 1
Nov 8 19:17:28 Prometheus kernel: md: disk4 read error
Nov 8 19:17:28 Prometheus kernel: handle_stripe read error: 2080706408/4, count: 1
Nov 8 19:17:28 Prometheus kernel: md: disk4 read error
Nov 8 19:17:28 Prometheus kernel: handle_stripe read error: 2080706416/4, count: 1
Nov 8 19:17:28 Prometheus kernel: md: disk4 read error
Nov 8 19:17:28 Prometheus kernel: handle_stripe read error: 2080706424/4, count: 1
Nov 8 19:17:28 Prometheus kernel: md: disk4 read error
Nov 8 19:17:28 Prometheus kernel: handle_stripe read error: 2080706432/4, count: 1
Nov 8 19:17:28 Prometheus kernel: md: disk4 read error
Nov 8 19:17:28 Prometheus kernel: handle_stripe read error: 2080706440/4, count: 1
Nov 8 19:17:28 Prometheus kernel: md: disk4 read error
Nov 8 19:17:28 Prometheus kernel: handle_stripe read error: 2080706448/4, count: 1
Nov 8 19:17:28 Prometheus kernel: md: disk4 read error
Nov 8 19:17:28 Prometheus kernel: handle_stripe read error: 2080706456/4, count: 1
and sometimes these occur too:
Nov 8 19:17:30 Prometheus shfs: duplicate object: /mnt/disk4/.AppleDouble/.Parent
Nov 8 19:17:31 Prometheus kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 8 19:17:31 Prometheus kernel: ata4.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable
Nov 8 19:17:31 Prometheus kernel: ata4.00: cmd 25/00:00:df:0f:05/00:04:7c:00:00/e0 tag 0 dma 524288 in
Nov 8 19:17:31 Prometheus kernel: res 51/40:00:12:13:05/40:00:7c:00:00/e0 Emask 0x9 (media error)
Nov 8 19:17:31 Prometheus kernel: ata4.00: status: { DRDY ERR }
Nov 8 19:17:31 Prometheus kernel: ata4.00: error: { UNC }
Nov 8 19:17:31 Prometheus kernel: ata4: hard resetting link
Nov 8 19:17:31 Prometheus kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 8 19:17:31 Prometheus kernel: ata4.00: configured for UDMA/133
Nov 8 19:17:31 Prometheus kernel: ata4: EH complete
Nov 8 19:17:34 Prometheus kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Could these be early warning signs of a pending failure of this disk? The other disks don't show these kinds of errors.
prostuff1 Posted November 8, 2009
I just recently had a drive start giving me the "read error" messages. It turned out that the drive WAS failing, and a clear sign of that was that the reallocated sector count kept increasing. You need to get a SMART report from the drive (you can use unMenu to do this) and then run a SMART long test on the drive (again, unMenu can do this). Once those are done, post the results so the community can take a look and advise further. You can also check the SATA cable and its connection to see whether it is good or bad. Replace the cable if you have an extra one and go from there. My drive was a 1TB Seagate that had some 300+ reallocated sectors and 30 pending. So yeah, get those SMART tests done and you will get an idea of what might be the problem.
RobJ Posted November 9, 2009
Can you also post the complete syslog? We need to see those errors in context, and especially what the very first errors were. The errors quoted in the first section are typical after a drive has been disabled or a read error has occurred. The second section indicates a media error (UNCorrectable), probably a bad sector. There is not enough info yet to draw any conclusions about whether the drive is failing, but it probably is not.
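To see the read errors in context, a saved syslog can be reduced to the failing sectors grouped into contiguous runs. The following is a minimal sketch, not an unRAID tool: the function name `summarize_read_errors` and the `gap` parameter are made up for illustration, and it assumes the number after the slash in a `handle_stripe read error` line is the disk slot, which matches disk4 in the excerpt above but is not confirmed in this thread.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   ... kernel: handle_stripe read error: 2080706392/4, count: 1
# Assumption: the first number is a sector, the second is the disk slot.
STRIPE_RE = re.compile(r"handle_stripe read error: (\d+)/(\d+), count: (\d+)")


def summarize_read_errors(lines, gap=8):
    """Return {slot: [(first_sector, last_sector), ...]} for the given log lines.

    Two sectors belong to the same run when they are at most `gap` apart.
    The default of 8 is the spacing seen in the excerpt (hypothetical choice).
    """
    sectors = defaultdict(set)
    for line in lines:
        match = STRIPE_RE.search(line)
        if match:
            sectors[int(match.group(2))].add(int(match.group(1)))

    runs = {}
    for slot, found in sectors.items():
        ordered = sorted(found)
        slot_runs = [[ordered[0], ordered[0]]]
        for sector in ordered[1:]:
            if sector - slot_runs[-1][1] <= gap:
                slot_runs[-1][1] = sector  # extend the current run
            else:
                slot_runs.append([sector, sector])  # start a new run
        runs[slot] = [tuple(run) for run in slot_runs]
    return runs
```

Calling `summarize_read_errors(open("syslog"))` on a copy of the log (the file name is a placeholder) gives one list of sector ranges per slot. A few long runs suggest damage in one area of the platter, while many short, scattered runs suggest a wider problem.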
dlmh Posted November 11, 2009 Author
Thanks for the replies. I just tried to stream a movie from XBMC and it became completely unresponsive. The same happened when opening the Web GUI. I ssh-ed to the unRAID machine and entered reboot, but it wouldn't reboot (although it gave me the message "sending HALT...."). So I had to press and hold the power button to shut down the machine and reboot. After this, the disks show as "Unformatted". I checked the syslog and copied it to Pastebin. I'm starting to feel there's definitely something wrong with that disk...
RobJ Posted November 12, 2009
It does look bad, but I would check that SMART report first to confirm. All of the errors are the same, "media errors" with error code UNC (UNCorrectable), initially in multiple large clusters, then randomly scattered across the drive. Check the SMART report for increasing Reallocated_Sector_Ct and Current_Pending_Sector, then do the SMART long test, then check a SMART report again and compare those same numbers plus the Offline_Uncorrectable.
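The before-and-after comparison can be done by eye, but a small script makes the change in each counter explicit. This is a minimal sketch under stated assumptions: the two inputs are the text of SMART attribute tables saved before and after the long test (for example from `smartctl -A`, or copied from the unMenu report), the raw value is the tenth whitespace-separated column as in the usual smartctl table layout, and the names `raw_values` and `compare_reports` are illustrative.

```python
# Attributes named in the post above.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def raw_values(report_text):
    """Extract the raw value of each watched attribute from a SMART report.

    Assumes the common table layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    values = {}
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            if fields[9].isdigit():
                values[fields[1]] = int(fields[9])
    return values


def compare_reports(before_text, after_text):
    """Return {attribute: after - before} for attributes present in both reports."""
    before = raw_values(before_text)
    after = raw_values(after_text)
    return {
        name: after[name] - before[name]
        for name in WATCHED
        if name in before and name in after
    }
```

A positive difference for Reallocated_Sector_Ct or Offline_Uncorrectable after the long test means the drive found more bad sectors during the test. Attributes missing from either report are skipped rather than guessed.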
dlmh Posted November 13, 2009 Author
Well... I preemptively ordered a replacement hard drive, just to be on the safe side. Should I back up the data first or let the array rebuild the data?
PhilH Posted November 13, 2009
If you have the room somewhere, I would back up the data just in case something goes wrong. It probably won't, but you never know. I had two disk failures within a month of each other and lost about 3 TB of data. The only stuff I didn't lose was the stuff I had backed up. I had been running unRAID for well over a year with zero problems. Then my power supply started to go out. I didn't know it, and it slowly fried 4 or 5 drives. Like I said, it's probably OK to just do a rebuild, but if you have the room I would back up.
Phil
dlmh Posted November 15, 2009 Author
This is the result from the SMART report in unMenu, on Pastebin. And this is from a healthy drive.
This topic is now archived and is closed to further replies.