napalmd Posted March 21, 2014

Hi,

I have a 6-disk unRAID server at a friend's place; he stores his photos from events on it. There are a lot of photos and a lot of folders. Recently he found corrupted files in some folders, about 1 in 10. It seems to affect only files created about 5 months ago, and he says those files were OK at the time. Since then many files have been written with no problems at all. unRAID reports no errors, and I do a parity check every month.

I think most of the corrupted files are on disk3, so I ran a file system check:

Comparing bitmaps..finished
Checking Semantic tree: finished
No corruptions found
There are on the filesystem:
        Leaves 179451
        Internal nodes 1136
        Directories 635
        Other files 70293
        Data block pointers 175011400 (1159060 of them are zero)
        Safe links 0
###########
reiserfsck finished at Fri Mar 21 17:29:24 2014
###########
/dev/md3 mounted on /mnt/disk3

Then I checked the SMART status:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time            0x0027 247   243   021    Pre-fail Always  -           9641
  4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           952
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e 200   200   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 079   079   000    Old_age  Always  -           15615
 10 Spin_Retry_Count        0x0032 100   100   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032 100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           61
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           33
193 Load_Cycle_Count        0x0032 105   105   000    Old_age  Always  -           287315
194 Temperature_Celsius     0x0022 123   104   000    Old_age  Always  -           29
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   190   000    Old_age  Always  -           50607
200 Multi_Zone_Error_Rate   0x0008 200   200   000    Old_age  Offline -           0

So as you can see, the UDMA_CRC_Error_Count is very high... I read about it and the problem could be the cables; I'll have to check them later...

For now I would want to recover the corrupt files. Is there a way to do it? By re-syncing the data from the parity drive or something like it?
itimpi Posted March 21, 2014

napalmd wrote: "For now I would want to recover the corrupt files. Is there a way to do it? By re-syncing the data from the parity drive or something like it?"

I am afraid parity does not work at the file level. The only way to recover files is to copy them back from your backups. If you don't have backups, there is no way to recover them.
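To illustrate why parity cannot restore individual files, here is a minimal sketch of single-disk XOR parity. It is a toy model, not unRAID's actual code: the three 4-byte "disks" and the helper name xor_blocks are made up for the example. It shows that parity can rebuild a disk that is known to be missing, while a silent change on one disk only shows up as a mismatch that does not say which disk (let alone which file) is wrong.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data "disks", each holding a single 4-byte block.
disks = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(disks)

# Case 1: disk 1 is KNOWN to be missing, so it can be rebuilt
# from the remaining disks plus parity.
rebuilt = xor_blocks([disks[0], disks[2], parity])

# Case 2: disk 1 silently returns one wrong byte. XOR-ing everything
# with parity is nonzero only at the altered position, but the result
# looks the same no matter which disk (or parity itself) is the bad one.
corrupted = [disks[0], b"\x01\xff\x03\x04", disks[2]]
mismatch = xor_blocks(corrupted + [parity])
```

Note also that if a file was already damaged before it reached the array, parity would have been computed over the damaged data, so a parity check would pass without errors.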
RobJ Posted March 23, 2014

napalmd wrote: "So as you can see, the UDMA_CRC_Error_Count is very high... I read about it and the problem could be the cables; I'll have to check them later..."

Replace the cable to this drive as soon as possible! Make sure it's a good-quality one. Because these are cheap but important, never skimp on SATA cables. There is a small chance that these CRC errors are caused by poor power, so make sure your power supply is a good one and sufficient for your system's needs.
napalmd Posted March 25, 2014

I found corrupted files on another disk, and that disk has no UDMA CRC errors... I don't know what to do...
napalmd Posted March 25, 2014

The system I have is:

PENTIUM G850 (2.9GHZ) SKT 1155
ASROCK B75 PRO3-M
KINGSTON KIT 4GB DDR3 1333MHZ
6 WD 2tb green drives connected onboard
pci-e gigabit card

syslog: http://pastebin.com/czgAuHyp

I don't have a cache drive. Would one prevent this kind of occurrence? It seems that no new files are being corrupted; only files from that period got corrupted, and we only discovered it today...
Fireball3 Posted March 25, 2014

Oh man, this is a horror scenario, IMHO. Even a backup (server) won't save you in this case, because the corruption will propagate when syncing... I really wonder whether the files were OK from the beginning.

The cache drive is irrelevant to this issue.
vca Posted March 25, 2014

You might also have bad RAM in either the unRAID box or the computer these were copied from. I would run the memory tester on all the machines that these files were copied through.

Regards,
Stephen
napalmd Posted March 25, 2014

I tested the RAM already: no problems with memtest86 on either the server or the PC.
SSD Posted March 25, 2014

I suggest you use TeraCopy with the verify option to copy files to the server. You may have some corruption happening on the network.
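For anyone who wants the same kind of check without a dedicated tool, below is a minimal sketch of a copy-then-verify step in Python. It is only in the spirit of a "verify" option and makes no claim about how TeraCopy works internally; the function names sha256_of and copy_verified are made up for the example. One caveat: reading the destination back right after writing may be served from a cache, so a matching hash is strong evidence, not proof, that the data on the server's disk is intact.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_verified(source, destination):
    """Copy source to destination, then compare hashes of both files.

    Raises OSError if the copy does not match the original.
    """
    source, destination = Path(source), Path(destination)
    expected = sha256_of(source)
    shutil.copy2(source, destination)  # copies data and timestamps
    actual = sha256_of(destination)
    if actual != expected:
        raise OSError(f"checksum mismatch copying {source} to {destination}")
    return actual
```

Storing the returned digest next to each photo (for example in a text file per folder) would also make it possible to tell later whether a file was already damaged when it arrived or went bad on the server.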
RobJ Posted March 25, 2014

The CRC errors are probably unrelated to the corruption too. They will hurt performance, because the data has to be resent (and resent again if necessary until it's good), but they should not cause a file to be corrupted.

Your syslog does not show any issues, apart from one anomaly. You did a parity check on Feb 28 which ran for 10 hours, then 2 more on Mar 7 and 21 which both ran for 7 hours, all without issue. It's rather strange that one parity check would take 3 hours longer with no issues to report, but perhaps you were using the system then?

File corruption issues can be hard to figure out, as they are usually very infrequent and there are generally no visible errors anywhere. When you checked your memory, did you let each test run at least 8 hours?