Chris Pollard Posted December 8, 2010
My testing indicates that you WILL see sync errors when checking parity if you have encountered this issue. Running a parity check should either ease your mind or confirm that you have corruption. Please see Joe's post below if you only have one F4!
Joe L. Posted December 8, 2010
> my testing indicates that you WILL see sync errors when checking parity if you have encountered this issue, I would suggest that running a parity check should either ease your mind or confirm that you have corruption.
If you have one of these drives I'd suggest a read-only parity check. You can do it from the command line:
/root/mdcmd check NOCORRECT
If it shows no errors you are fine. It will look exactly like the normal check, and the web interface will still say it is correcting the errors, but it is not.
If it does show parity errors AND you ONLY have one of the F4 drives in your array, you can:
1. Stop the array
2. Un-assign the F4 drive
3. Start the array with it un-assigned (this will cause the array to forget its serial number so it can be used as its own replacement)
4. Stop the array once more
5. Re-assign the F4 drive
6. Start the array again. It will re-construct the data onto the F4 drive.
It will have the correct data. This assumes the F4 drive is a data drive and not the parity drive. If the F4 drive is the parity drive, just do a normal parity "Check" and it will be corrected based on the actual data on the data disk.
You'll be without parity protection for the duration of the re-construction, but since you are actually only writing what is already there for most of the disk, you are no worse off than with the corruption. (You could force the F4 data disk to be valid if another were to fail during the process.)
Joe L.
DGalt Posted December 8, 2010
Is this assuming you've assigned a parity drive? Again...I haven't done that yet. Should I do that and then run the check?
Chris Pollard Posted December 8, 2010
> Is this assuming you've assigned a parity drive? Again...I haven't done that yet. Should I do that and then run the check?
If you don't have a parity drive at present, you will have to compare the source files with the files on unRAID using md5sum, as previously described. That is the only way to know in your scenario.
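For anyone without a parity drive who wants to script the md5sum comparison, here is a minimal sketch rather than a tested procedure. It assumes a shell with GNU find, sort, xargs, md5sum and mktemp available; the function names `checksum_tree` and `compare_trees` are made up for illustration and are not part of unRAID.

```shell
# Sketch: compare MD5 checksums of a source tree against its copy on the array.
# Function names are hypothetical helpers, not unRAID commands.

# Print "checksum  ./relative/path" for every file under directory $1,
# in a stable (sorted) order so two trees can be compared line by line.
checksum_tree() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum )
}

# Return 0 if both trees contain the same files with the same checksums,
# 1 if they differ (differences are printed by diff), 2 if a directory is missing.
compare_trees() {
    [ -d "$1" ] && [ -d "$2" ] || return 2
    src_list=$(mktemp) || return 2
    dst_list=$(mktemp) || return 2
    checksum_tree "$1" > "$src_list"
    checksum_tree "$2" > "$dst_list"
    diff "$src_list" "$dst_list"
    status=$?
    rm -f "$src_list" "$dst_list"
    return "$status"
}
```

Example use, with placeholder paths: `compare_trees /mnt/external/photos /mnt/disk1/photos`. Any file that is missing or whose checksum differs shows up in the diff output. Reading every file back takes a while on a large array, but it runs unattended.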
slaveunit Posted December 8, 2010
> If you have one of these drives I'd suggest a read only parity check. You can do it from the command line. /root/mdcmd check READONLY [...]
Is that the actual command? Is readonly supposed to be in all CAPS? If so I get this:
root@Tower:~# /root/mdcmd check READONLY
/root/mdcmd: line 11: echo: write error: Invalid argument
Joe L. Posted December 8, 2010
> Is that the actual command? Is readonly supposed to be in all CAPS? If so I get this:
> root@Tower:~# /root/mdcmd check READONLY
> /root/mdcmd: line 11: echo: write error: Invalid argument
Yes, it is supposed to be all caps... but I made a mistake in the command argument. It should be:
/root/mdcmd check NOCORRECT
Sorry.
prostuff1 Posted December 8, 2010
> Is that the actual command? Is readonly supposed to be in all CAPS? If so I get this:
> root@Tower:~# /root/mdcmd check READONLY
> /root/mdcmd: line 11: echo: write error: Invalid argument
No, if I remember correctly the command is:
/root/mdcmd check NOCORRECT
EDIT: beat by 6 seconds by JoeL
Joe L. Posted December 8, 2010
>> Is that the actual command? Is readonly supposed to be in all CAPS? If so I get this:
>> root@Tower:~# /root/mdcmd check READONLY
>> /root/mdcmd: line 11: echo: write error: Invalid argument
> No, if I remember correctly the command is: /root/mdcmd check NOCORRECT
<embarrassed> yes </embarrassed>
My photographic memory apparently had the wrong photograph.
slaveunit Posted December 8, 2010
Until a fw fix comes out, wouldn't we want to disable the write cache in the middle of your process, Joe? Thanks.
Joe L. Posted December 8, 2010
> Until a fw fix comes out wouldn't we want to disable the write cache in the middle of your process Joe? Thanks.
It would still require users to download and use the newer version. I suppose it would not hurt, but as mentioned earlier, many shell scripts that monitor the disks use hdparm and smartctl.
Joe L.
pras1011 Posted December 8, 2010
The new firmware should be released very soon. Hopefully before you lot wreck your servers!
DGalt Posted December 9, 2010
I didn't realize how labor intensive doing the checksum thing would be. Considering the number of files I would have to check, it's just not a feasible option (unless I'm missing something). So, two questions.
1. All I've done is basically move a bunch of files off of external hard drives onto the system. unMenu wasn't being used at the time (someone asked about this), nor any other add-on that I know of. What are the chances I'm going to run into a problem?
2. When the firmware is released, will it only solve the issue for newly added files, or will it fix any possible issues that arose before the firmware fix was applied?
If I have to I'll just re-preclear the drives. At least that's automated.
Edit: Also, if you're using one of these for your parity, is it still ok to turn off write caching?
Joe L. Posted December 9, 2010
> I didn't realize how labor intensive doing the checksum thing would be. Considering the number of files I would have to check it's just not a feasible option (unless I'm missing something). So, two questions.
> 1. All I've done is basically move a bunch of files off of external harddrives onto the system. unMenu wasn't being used at the time (someone asked about this) or any other addon that I know of. What are the chances I'm going to run into a problem?
Unless you were issuing hdparm or smartctl commands during the time you were copying the files (and you probably were not), you'll not have any problem at all.
> 2. When the firmware is released, will it only solve the issue for newly added files, or will it fix any possible issues that arose before the firmware fix was applied?
Only for files written to the drive in the future. It will have no effect on data that failed to be written before the fix was applied; that data will still be missing from the drive.
> If I have to I'll just re-preclear the drives. At least that's automated
I don't think you'll need to, from what you've said.
> Edit: Also, if you're using one of these for your parity, is it still ok to turn off write caching?
Yes, you'll want to turn off the write caching.
Joe L.
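The posts above do not spell out the command for turning off write caching. One common way on Linux is hdparm's -W flag: `hdparm -W0 <device>` disables the drive's write cache and `hdparm -W <device>` reports the current setting. The sketch below is only an illustration under that assumption. It prints the commands for the devices you name so they can be reviewed before anything is run; the function name and the device paths are hypothetical.

```shell
# Sketch: print (do not run) the hdparm commands that would disable the
# write cache on each named device. The device paths passed in are examples;
# check which device is actually the F4 drive before running anything.
write_cache_off_cmds() {
    for dev in "$@"; do
        printf 'hdparm -W0 %s\n' "$dev"
    done
}
```

For example, `write_cache_off_cmds /dev/sdb /dev/sdc` prints one `hdparm -W0` line per device. Since hdparm itself queries the drive, and querying during writes is what the thread identifies as the trigger, it seems prudent to run the printed commands only while nothing is being written to the array. The setting may not survive a power cycle on every drive, so it may need to be reapplied after a reboot.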
lionelhutz Posted December 9, 2010
> unMenu is installed, although I haven't really used it at all. I've been following the Configuration tutorial found here: http://lime-technology.com/wiki/index.php?title=Configuration_Tutorial However, the web page was not open during file transfer. I haven't opened it since it was originally called for in that tutorial.
You should be just fine then. You'd have to be doing something that was querying the drives (unMENU or command line stuff) while transferring data. You're saying you didn't do anything but boot unRAID and move files.
Peter
pras1011 Posted December 9, 2010
New firmware: http://www.samsung.com/global/business/hdd/faqView.do?b2b_bbs_msg_id=386
madpoet Posted December 9, 2010
Hrm... how the heck do you patch your drives in unRaid? Pull them and patch on another machine, then put them back?
pras1011 Posted December 9, 2010
You have to patch via DOS.
Joe L. Posted December 9, 2010
> New firmware: http://www.samsung.com/global/business/hdd/faqView.do?b2b_bbs_msg_id=386
The description on that "patch" seems to indicate the problem occurred if NCQ was enabled and an IDENTIFY command was issued. For most unRAID users, unless you changed the default settings, this might be good news, as NCQ is disabled by default.
Joe L.
tranm5 Posted December 9, 2010
Do you think that it is safe to re-enable write caching after the firmware upgrade?
slaveunit Posted December 9, 2010
> Hrm... how the heck do you patch your drives in unRaid? Pull them and patch on another machine, then put them back?
I think the easiest way to patch, if you do not want to remove your drive, is to unplug either the SATA or the power cable from all drives but the Samsung one. Then boot from a separate flash drive with the upgrade on it. Here is a simple tutorial for getting DOS boot files onto a flash drive:
http://thelostbrain.com/post/2008/01/Make-your-flash-drive-bootable!-%28Boot-into-DOS-with-access-to-full-flash-drive-space%29.aspx
Joe L. Posted December 9, 2010
> Do you think that it is safe to re-enable write caching after the firmware upgrade?
Personally, if I had one of those drives, I'd wait before applying the patched firmware until I had read reports of it working. Are you certain it corrects the problem, and that no others are introduced? I would wait until I learn the patch process has been tried by other, more anxious users on their disks... If there are wrinkles in their patch process, I'd want it to happen to somebody else first.
BRiT Posted December 9, 2010
> The description on that "patch" seems to indicate the problem occurred if NCQ was enabled and an IDENTIFY command issued. For most unRAID users, unless you changed the default settings, this might be good news, as NCQ is disabled by default.
From the bug description linked earlier, it did not matter whether NCQ was enabled or not; the investigating labs were able to duplicate the bug without NCQ. I would certainly wait until it has been confirmed by many others, including c't labs and the bug tracker, that this firmware indeed corrects the issue. I have a feeling it might only partially correct the problem.
http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks
> This may not always be true as the c't lab also reported problems with NCQ disabled.
To make matters even worse, Samsung's patch does not change the firmware version reported! How sloppy and irresponsible of them. Now there is no way of knowing if you need the patch or not. For shame!
http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks
> The patch did not change the firmware version number reported by IDENTIFY DEVICE:
> smartctl -i -q noserial /dev/sda
> smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (local build)
> Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
> === START OF INFORMATION SECTION ===
> Device Model: SAMSUNG HD204UI
> Firmware Version: 1AQ10001
> User Capacity: 2,000,398,934,016 bytes
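To check whether a given disk is one of the affected model at all, the `smartctl -i` output quoted above can be filtered. This is a rough sketch, assuming awk is available and that the output uses the "Device Model:" and "Firmware Version:" labels shown in the quote; the function name `report_hd204ui` is invented for this example. As the quote points out, the firmware string is the same before and after patching, so this identifies the model only, not whether the patch has been applied.

```shell
# Sketch: read `smartctl -i` output on stdin. If the Device Model is the
# SAMSUNG HD204UI, print "<model> <firmware>" and return 0; otherwise return 1.
# The field labels are taken from the smartctl output quoted in the thread.
report_hd204ui() {
    awk -F': *' '
        /^Device Model:/     { model = $2 }
        /^Firmware Version:/ { fw = $2 }
        END {
            if (model ~ /SAMSUNG HD204UI/) { print model, fw; exit 0 }
            exit 1
        }'
}
```

Usage, with the same command as in the quote: `smartctl -i -q noserial /dev/sda | report_hd204ui`. Because smartctl queries the drive, it is probably best run while nothing is being written to that disk.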
lionelhutz Posted December 9, 2010
> Personally if I had one of those drives, I'd wait before applying the patched firmware until you read of reports of it working. Are you certain it corrects the problem? and that no others are introduced? I would wait until I learn the patch process is tried by other more-anxious users of their disks... If there are wrinkles in their patch process, I'd want it to happen to somebody else first.
Always a good plan. I believe Zotac once released a BIOS that bricked a whole bunch of motherboards. But then, that could have been Seagate during their drive issues...
You either must be ready for the worst to happen (the drive failing badly in some manner) or just let others be the initial testers.
Peter
jtown Posted December 9, 2010
And I stupidly updated parity without checking the forums when I saw 14 errors. The fact that I did it before this thread started doesn't make it suck any less. On the plus side, I'm pretty sure everything I've stored on this drive is from bittorrent so I can just force a re-check and patch up the corrupt files. Now I have to decide if I want to Go For It and try the firmware update or grab an EARS drive on the way home and put the F4 in the time-out zone until the firmware fix has been tested and unRAID provides AF support. I think I'll go with the time-out. At least I know the EARS drive will give its all with the jumper installed and, by the time I finish filling it, it will probably be safe to bring the F4 back.