February 5, 2010 (Author)
No, I did not remove the one on the smaller disk. I will do that and post the results.
February 5, 2010 (Author)
It said it worked. I am rebooting again and then I will run a parity check. I assume I will see a lot of sync errors the first time.
February 5, 2010 (Author)
Well, the first parity check finished with over 1.4 million sync errors. When I started to run another parity check, I started to get sync errors again. Something is definitely wrong. I have attached a syslog. You will see one parity check complete to 100%; I cancelled the second one once I started to see errors again. Any ideas?
syslog.zip
February 6, 2010 (Author)
I ran hdparm -N on all of my drives, and the last 5 drives gave a weird message. These are all WD 1.5TB Green drives (WD15EADS). For each of the last 5 drives I get this message:
    root@Tower:~# hdparm -N /dev/sdb
    /dev/sdb:
     max sectors = 18446744072344861488/2930277168, HPA setting seems invalid
    root@Tower:~# hdparm -N /dev/sdd
    /dev/sdd:
     max sectors = 18446744072344861488/2930277168, HPA setting seems invalid
    root@Tower:~# hdparm -N /dev/sdc
    /dev/sdc:
     max sectors = 18446744072344861488/2930277168, HPA setting seems invalid
    root@Tower:~# hdparm -N /dev/sdg
    /dev/sdg:
     max sectors = 18446744072344861488/2930277168, HPA setting seems invalid
    root@Tower:~# hdparm -N /dev/sdf
    /dev/sdf:
     max sectors = 18446744072344861488/2930277168, HPA setting seems invalid
    root@Tower:~#
February 6, 2010
I did not see any evidence of the HPAs, so they are no longer confusing the issue. I'd say start with the basics. Run a memory test. My disks say the same thing with hdparm -N, so I would not worry about it too much.
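For what it's worth, the huge first number in that hdparm -N output looks like an arithmetic artifact rather than a real sector count. 2930277168 is larger than 2^31, so if it passed through a signed 32-bit integer and was then printed as an unsigned 64-bit one, it would come out as 2^64 - (2^32 - 2930277168) = 18446744072344861488, which is exactly the value shown. Whether hdparm really does this internally is an assumption, not something confirmed in this thread. The bash sketch below only reproduces the arithmetic; the function name sign_extend_32 is made up for illustration.

```shell
#!/usr/bin/env bash
# Reproduce the odd "max sectors" value from the hdparm -N output.
# Assumption (not confirmed in the thread): the sector count passed
# through a signed 32-bit integer before being printed as unsigned 64-bit.

# sign_extend_32 N: treat N as a 32-bit pattern, sign-extend it to 64 bits,
# and print the result as an unsigned 64-bit integer.
sign_extend_32() {
    local n=$1
    if (( n >= 2147483648 )); then   # bit 31 set: negative as signed 32-bit
        n=$(( n - 4294967296 ))
    fi
    printf '%u\n' "$n"               # bash prints negative values modulo 2^64
}

sign_extend_32 2930277168
```

If the printed value matches the first number in the output, that would fit the "would not worry about it" reading: the real sector count is the second number, 2930277168.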
February 6, 2010
> Sorry... how do I run a memory test?
When the server boots, it displays a menu (you need a display attached to see this):
    -- unRAID OS
    -- Memtest86+
Just select the second option there.
February 6, 2010 (Author)
Ah, OK. I have it running now. How long does it run for, and what am I looking for on the screen to determine if there is an error? Thx. Sorry for this, but I have never used it before.
BTW, this is the RAM I am using in the tower: http://www.newegg.ca/Product/Product.aspx?Item=N82E16820145184
I have not modified any BIOS settings for memory.
February 6, 2010
> How long does it run for...
It can run indefinitely. I would let it run overnight.
> ... and what am I looking for on the screen to determine if there is an error?
There's a column there labeled "Errors". That better stay at zero!
February 6, 2010 (Author)
OK, I am going out shortly and will let it run all night. I will report in the morning. Thx
February 6, 2010 (Author)
Alright, it has been 15 hours and there are no memory errors. These are the parts I am using in my tower:
Power supply Corsair 650W ----- http://www.newegg.ca/Product/Product.aspx?Item=N82E16817139005
CPU AMD ----- http://www.newegg.ca/Product/Product.aspx?Item=N82E16819103235
Motherboard Gigabyte ----- http://www.newegg.ca/Product/Product.aspx?Item=N82E16813128342
RAM Corsair 4GB ----- http://www.newegg.ca/Product/Product.aspx?Item=N82E16820145184
Parity drive Seagate 1.5TB ----- http://www.newegg.ca/Product/Product.aspx?Item=N82E16822148337
Add-on SATA cards:
Rosewill RC-218 PCI Express x4 ----- http://www.newegg.ca/Product/Product.aspx?Item=N82E16816132018
SYBA SD-SA2PEX-2IR PCI Express SATA II Controller Card ----- http://www.newegg.ca/Product/Product.aspx?Item=N82E16815124027
It looks like memory is fine. What are my next steps? Thx
February 6, 2010
> It looks like memory is fine. What are my next steps? Thx

OK... that is a good first step. Now to try to discover what in the heck is going on.

My best guess, now that you are not being distracted by any HPA issues, is that one of your disks is returning bad data when read. I've only seen this once before, where it was so random it only showed in parity calcs, but it is a first guess.

What I would propose you try is a process of elimination: try to isolate which component of your server is not acting as expected. To do that, I'd first get a SMART report on each of your disks, just in case it points immediately to some issue. Then run a "short" test on each, again to see if anything is obviously bad, followed by a "long" test on each. You will need to disable the spin-down when doing the long test, otherwise the test will abort when the disk spins down.

One way to test would be to create a large file on each drive, get an md5sum, and ensure the contents are as expected. This works for the data disks. A bad md5sum when read back would indicate a problem with a data drive. You can simply copy the same large ISO file to each disk in turn, then run
    md5sum /mnt/disk1/your_file.iso
    md5sum /mnt/disk2/your_file.iso
    md5sum /mnt/disk3/your_file.iso
    md5sum /mnt/disk4/your_file.iso
    etc...
on each in turn.

Another way to test is to use the parity calculations in a process of elimination. Since we can guess your parity is basically useless at this time, it would not bother me to overwrite it in a few tests.
Stop the array.
Copy the entire "config" folder to old_config (so you can get back to it easily).
Go to your "devices" page.
Un-assign half of your data drives.
Go back to the main page, press "restore" and let unRAID calculate a new set of parity on the remaining assigned drives.
Then, once done, press "check" to see if any parity errors are detected.
If yes, then the errors are in the assigned drives/controllers. If all clean, stop the array once more, assign half of the remaining drives and repeat. If errors, un-assign half of the drives and repeat.
Do this until you isolate the one drive that, when un-assigned, allows parity calcs to repeat with no errors. If the errors always occur, even with a single data drive and parity, it could be the parity drive itself.
When done, assign all the drives once more, press "restore" and re-calc parity in its entirety.

I'd try the first approach, testing md5sums of files, first. It is the easiest and quickest. It might turn out to be a single disk. I don't think it has anything to do with an HPA.

Joe L.
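The md5sum comparison described above could be scripted roughly as follows. This is only a sketch: check_copies is a made-up helper name, and it assumes the same file has already been copied to each disk. It computes the checksum of a reference copy and reports any other copy whose checksum differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): compare the md5sum of each
# copy against a reference copy of the same file.
# Usage: check_copies REFERENCE COPY [COPY...]
# Returns 0 if every copy matches, 1 if any differs, 2 if the reference
# cannot be read.
check_copies() {
    local ref=$1 copy want got status=0
    shift
    [ -r "$ref" ] || { echo "cannot read $ref" >&2; return 2; }
    want=$(md5sum < "$ref" | cut -d' ' -f1)
    for copy in "$@"; do
        # An unreadable copy yields an empty checksum and is reported as a mismatch.
        got=$(md5sum < "$copy" | cut -d' ' -f1)
        if [ "$got" = "$want" ]; then
            echo "OK       $copy"
        else
            echo "MISMATCH $copy"
            status=1
        fi
    done
    return "$status"
}
```

With the paths used above, a call might look like check_copies /mnt/disk1/your_file.iso /mnt/disk2/your_file.iso /mnt/disk3/your_file.iso. Since the errors in this thread appear to be intermittent, a single clean pass does not prove a disk is good; repeating the check a few times is cheap.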
February 6, 2010 (Author)
OK, I have unassigned the two drives that are on the newest 2-port SATA board, the one I installed last. I have done a restore and the parity is calculating. I will then run a parity check when it is done and see if there are any errors. If there are no errors, then it is the SATA board that I installed or one of the drives on it. If it still fails, then I will unassign the 4 drives that are on the other 4-port SATA board I installed and see how that goes. With both SATA boards out of the picture, it is just the 6 ports on the motherboard. I am trying one SATA board at a time. I will post my findings when done. Thx
February 6, 2010
While it is doing the parity calc/subsequent test, you can still do the md5sum tests I described. The ongoing parity calc/check would have no effect on you creating the new files and checking their md5sum.
February 6, 2010 (Author)
I checked md5sum on all drives and they all reported the proper value. I am thinking it might be the last add-on SATA card I installed, but I am not sure yet. I will know when I run a parity check after the parity build is done. I will post what I find.
February 6, 2010
> I checked md5sum on all drives and they all reported the proper value.
I'm assuming you are referring to the drives still assigned. You've not yet checked the two drives you unassigned on the 2-port controller you referred to in the prior post. I hope you used a multi-gigabyte file, to ensure the random errors were not overlooked.
> I am thinking it might be the last add-on SATA I installed but I am not sure yet. I will know when I run a parity check after the parity build is done. I will post what I find
It will be interesting... It is a shame you (and I) were initially misled into thinking the HPA was the cause... I would still send a nasty-gram to Gigabyte describing how much anguish they caused you, your family, your friends, the environment, global warming, world peace, etc. with their BIOS adding an HPA...
Joe L.
February 6, 2010 (Author)
No, I have not tested the two unassigned drives yet. I will see what happens with the parity build and check first. I used a 4GB ISO file when checking.
February 7, 2010 (Author)
OK, so parity built just fine and it is now running the parity check. So far the check is 26% done with no sync errors at all. This looks promising. I will know in the morning whether it is 100% fine or not.
Now for the next step. If the check is good in the morning, then either one of the 2 remaining drives has issues or the 2-port SATA board is no good. Is there a way to do an md5sum check on the other 2 drives when they are not assigned? This would help eliminate that. If the md5sum is fine on both, then the SATA board is no good.
The SATA board in question is this one: SYBA SD-SA2PEX-2IR PCI Express SATA II Controller Card http://www.newegg.ca/Product/Product.aspx?Item=N82E16815124027
I had read good things about them on this board, so I bought one. Is this a known problem? Is there another PCI Express board that is better? I only need one that has 2 ports on it. I am thinking it is the SATA board.
February 7, 2010
> OK so parity built just fine and it is now running the parity check and so far the parity check is 26% done and no sync errors at all. This looks promising. I will know in the morning whether it is 100% fine or not. Now for the next step. If the check is good in the morning then either one of the 2 remaining drives has issues or the 2 port SATA board is no good. Is there a way to do a md5sum check on the other 2 drives when they are not assigned. This would help eliminate that. If the md5sum is fine on both then the SATA board is no good.
Even though they are not assigned, they are still connected to the same controller, so a test on them while un-assigned exercises the same controller as it would with them assigned.
> The SATA board that is in question is this one: SYBA SD-SA2PEX-2IR PCI Express SATA II Controller Card http://www.newegg.ca/Product/Product.aspx?Item=N82E16815124027 I had read good things about them on this board so I bought one. Is this a known problem? Is there another board that is PCI express that is better? I only need one that has 2 ports on it. I am thinking it is the SATA board.
We will see...

You can mount the un-assigned drives and do the md5sum test on them.
    mkdir /tmp/d1
    mkdir /tmp/d2
    mount -t reiserfs /dev/???1 /tmp/d1
    mount -t reiserfs /dev/???1 /tmp/d2
    cp /mnt/disk1/big_file.iso /tmp/d1/big_file.iso
    md5sum /tmp/d1/big_file.iso
    cp /mnt/disk1/big_file.iso /tmp/d2/big_file.iso
    md5sum /tmp/d2/big_file.iso
    md5sum /mnt/disk1/big_file.iso
    cd /
    umount /tmp/d1
    umount /tmp/d2
Note the "mount" commands use the first partition. So if the drive is "sde", the first partition is /dev/sde1.

Writing to the disks while they are not assigned to the array would normally invalidate parity, but since the array does not include these disks at this point, it really does not matter and will not affect the parity you calculated on the assigned drives.

Of course, when you assign these two additional drives you will need to press "restore" once more, and a full parity calc will once more occur when you start the array.

Joe L.
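The mount-and-checksum steps for an un-assigned drive could be wrapped in a small script along these lines. Treat it as a sketch under stated assumptions: the function names and the DRY_RUN variable are invented for illustration, the partition path is a placeholder you must fill in yourself, and the reiserfs filesystem type is taken from the commands above. With DRY_RUN=1 it only prints what it would run, which is a safe way to check the device name before touching a disk.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the manual steps (names are illustrative).

# run CMD...: execute the command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# test_unassigned_disk PARTITION SOURCE_FILE MOUNT_POINT
# Mounts PARTITION, copies SOURCE_FILE onto it, prints the md5sum of the
# source and of the copy (they should match), then unmounts again.
test_unassigned_disk() {
    local part=$1 src=$2 mnt=$3
    local copy status=0
    copy="$mnt/$(basename "$src")"
    run mkdir -p "$mnt" || return 1
    run mount -t reiserfs "$part" "$mnt" || return 1
    # Copy the reference file over, then checksum both files.
    run cp "$src" "$copy" && run md5sum "$src" "$copy" || status=1
    # Always try to unmount, even if the copy or checksum failed.
    run umount "$mnt" || status=1
    return "$status"
}
```

A dry run such as DRY_RUN=1 followed by test_unassigned_disk /dev/sde1 /mnt/disk1/big_file.iso /tmp/d1 (using the "sde" example from the note above) lists the five commands without executing them. The real run needs root, and the two md5sum lines still have to be compared by eye.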
February 7, 2010 (Author)
Well, the parity check is finished and there are 0 sync errors. So right now everything in the tower is working 100%. Now to test the other 2 drives.
February 7, 2010 (Author)
The md5sum values were identical. There is no issue with the drives, so it is the SATA board. Now I need to know which other SATA board is the best to use to replace it. I need a board with 2 ports on it, either PCI Express like the one I am using or PCI.
February 7, 2010
> The md5sum values were identical. There is no issue with the drives. So it is the SATA board. Now I need to know what other SATA board is the best to use to replace it. I need a board with 2 ports on it and either PCI Express like the one I am using or a PCI board.
Did you try those disks on a different controller? If you left them on the existing SATA card (where they've always been connected), then I suppose it could be an interaction that occurs when both disks are accessed simultaneously but is not visible when they are read one at a time.
To confirm it is the card, I'd try assigning the disks back in the array (on that same card) and see if the errors return. I'd actually try just one at a time. Then, after trying each port on the card individually with a parity check/verify, try again with both.
If you have the flexibility, you might even try the card in a different slot on your MB.
Of course, a new PCI 2-port card can be purchased for under $20. Your sanity may be worth more than that. If you do get a new card, you can test the old one as a wheel-chock on your car. Place it behind the rear wheel and see if you can roll over it. (Repeated tests might be necessary, just in case it allows you to roll over it once or twice by accident.)
Joe L.
February 7, 2010 (Author)
I will try that. I am wondering if anything special needs to be done to this SATA board: SYBA SD-SA2PEX-2IR PCI Express SATA II Controller Card http://www.newegg.ca/Product/Product.aspx?Item=N82E16815124027
I read one post by RobJ saying that it needed to be flashed with a non-RAID BIOS. Of course, I have no idea how to do this. It could be as simple as the board not having the proper firmware on it. I did not do anything to it except install it in the tower.
February 7, 2010 (Author)
OK, I figured out how to flash the SATA board BIOS and it is now running a non-RAID BIOS, which is what RobJ did. Now I want to re-assign the disks to the array. I have done this, but now it says 2 new disks found. I assume I need to do a restore. Please verify. Thx
This topic is now archived and is closed to further replies.