November 20, 2006 I just set up a brand spankin' new Pro array. Right now I'm only using 4 drives and trying to get the feel of the product. I have a WD 500GB parity drive and 3 Seagate 320GB drives, all SATA and all hooked to a Promise SATA300 TX4. As I have been copying data over to the array I have been getting occasional errors - about 1-2 for every 50GB of data. I have attached a screenshot and my log below.

Specific questions: What does "errors" mean on the web page? Write errors? Read errors? The log shows two read errors. I read that read errors are reconstructed using the parity info - what happens with a write error? Does it retry? Fail? What does the read error below indicate?

If I have hardware or setup issues I really want to know now, before I move all my data over. Thanks in advance. One more thing: YES, I did use search...

Dan Stroot
------------------------------
root@Tower:~# tail -f /var/log/syslog
Nov 19 05:54:52 Tower kernel: I/O error: dev 08:31, sector 16239472
Nov 19 05:54:52 Tower kernel: md3: read error!
Nov 19 05:54:52 Tower kernel: end_read_request 16239472/3, count: 2, uptodate 0.
Nov 19 06:28:37 Tower kernel: ata4: status=0x50 { DriveReady SeekComplete }
Nov 19 06:28:37 Tower kernel: SCSI disk error : host 4 channel 0 id 0 lun 0 return code = 8000002
Nov 19 06:28:37 Tower kernel: Current sd08:41: sns = 70 0
Nov 19 06:28:37 Tower kernel: Raw sense data:0x70 0x00 0x00 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Nov 19 06:28:37 Tower kernel: I/O error: dev 08:41, sector 82644432
Nov 19 06:28:37 Tower kernel: md0: read error!
Nov 19 06:28:37 Tower kernel: end_read_request 82644432/0, count: 2, uptodate 0.
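For anyone who wants to tally messages like these rather than eyeball them, here is a minimal Python sketch that counts kernel I/O errors per device and md read/write errors per md device. The regular expressions are assumptions based only on the handful of lines quoted in this post, and `summarize` is a hypothetical helper, not something shipped with unRAID.

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpt in the post above.
SAMPLE_LOG = """\
Nov 19 05:54:52 Tower kernel: I/O error: dev 08:31, sector 16239472
Nov 19 05:54:52 Tower kernel: md3: read error!
Nov 19 06:28:37 Tower kernel: I/O error: dev 08:41, sector 82644432
Nov 19 06:28:37 Tower kernel: md0: read error!
"""

# Assumed message formats, inferred from the quoted lines only.
IO_ERROR = re.compile(r"I/O error: dev ([0-9a-fA-F]+:[0-9a-fA-F]+), sector (\d+)")
MD_ERROR = re.compile(r"(md\d+): (read|write) error")


def summarize(log_text):
    """Return (I/O errors per device, md errors per (device, kind))."""
    io_errors = Counter()
    md_errors = Counter()
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            io_errors[match.group(1)] += 1
            continue
        match = MD_ERROR.search(line)
        if match:
            md_errors[(match.group(1), match.group(2))] += 1
    return io_errors, md_errors


if __name__ == "__main__":
    io_errors, md_errors = summarize(SAMPLE_LOG)
    for dev, count in sorted(io_errors.items()):
        print(f"dev {dev}: {count} I/O error(s)")
    for (md_dev, kind), count in sorted(md_errors.items()):
        print(f"{md_dev}: {count} {kind} error(s)")
```

To run it against a real log, read the text of /var/log/syslog into a string and pass that to `summarize` instead of `SAMPLE_LOG`.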
November 20, 2006 The Errors column on the Main page counts the number of low-level disk read/write commands that failed with an error. In the case of a read error, the user data is reconstructed "on-the-fly" by reading all the other data disks plus parity. In the case of a write error, the disk is marked "disabled". A disabled disk will have a red 'led' indicator next to it. It appears that something is marginal in your system. Check cables, re-seat the PCI controller card, and make sure your power supply is good. Before transferring any more data, it might be a good idea to run a Parity-Check.
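To illustrate the on-the-fly reconstruction described above, here is a minimal Python sketch. It assumes the parity block is the byte-wise XOR of the corresponding blocks on every data disk, which is the usual single-parity scheme; the function names and sample bytes are hypothetical, and this is not the actual driver code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block = XOR of the same block on every data disk (assumed scheme)."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the one unreadable block from all other data blocks plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


if __name__ == "__main__":
    # Three made-up data blocks standing in for three data disks.
    disk1 = b"\x01\x02\x03\x04"
    disk2 = b"\x10\x20\x30\x40"
    disk3 = b"\xaa\xbb\xcc\xdd"
    parity = compute_parity([disk1, disk2, disk3])

    # Pretend the read of disk2 failed: rebuild it from the others plus parity.
    rebuilt = reconstruct([disk1, disk3], parity)
    print(rebuilt == disk2)
```

Under this assumption, a read error on one disk can be served as long as every other disk, including parity, returns good data for the same block, which is why a marginal cable or power supply affecting several drives at once is worth fixing early.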
January 2, 2007 I also have a mix of WD and Seagate drives. I just added my 5th drive today, brand new (a Western Digital 500GB SATA drive attached to a Promise controller) - this is my second such drive attached to the controller (the other 4 are on the motherboard). I copied 70GB of data today and got 5 read errors. The log looks almost identical to the one above. Did you find the problem? One thing Tom mentioned is to check the cable to the power supply. One concern I have is that, since I ran out of cables to connect this last drive to my power supply, I bought a Y power cable. I wonder if this could be causing the problem (low voltage?). Since this is a brand new drive, I'm thinking about taking it back. Any thoughts or suggestions?
January 3, 2007 Send me the system log before you return your drive. Also, a power Y-splitter should be no problem as long as everything is securely connected. Take a close look at the connections - sometimes a female pin can be pushed out of the connector, resulting in intermittent contact with the male pin.
January 3, 2007 Thanks. I've got a newbie question about "multi-platform" file moving... how can I get the /var/log/syslog file onto my PC? "My Network Places" only shows me my data drives and shares. Although I was pretty sure it wouldn't work, I tried to ftp to tower from a cmd window... nope.
January 3, 2007 Scratch that last question... I never realized you could COPY and PASTE within a command window. I'm still waiting for the parity check to finish. The only thing that seems suspect so far are a bunch of
Jan 3 15:39:12 Tower smbd[1339]: [2007/01/03 15:39:12, 0] lib/util_sock.c:get_peer_addr(1000)
Jan 3 15:39:12 Tower smbd[1339]: getpeername failed. Error was Transport endpoint is not connected
Jan 3 15:39:12 Tower smbd[1339]: [2007/01/03 15:39:12, 0] lib/util_sock.c:get_peer_addr(1000)
Jan 3 15:39:12 Tower smbd[1339]: getpeername failed. Error was Transport endpoint is not connected
Jan 3 15:39:12 Tower smbd[1339]: [2007/01/03 15:39:12, 0] lib/util_sock.c:write_socket_data(430)
Jan 3 15:39:12 Tower smbd[1339]: write_socket_data: write failure. Error = Connection reset by peer
Jan 3 15:39:12 Tower smbd[1339]: [2007/01/03 15:39:12, 0] lib/util_sock.c:write_socket(455)
Jan 3 15:39:12 Tower smbd[1339]: write_socket: Error writing 4 bytes to socket 5: ERRNO = Connection reset by peer
........(many more)
I'll send you the whole syslog when parity check completes.
January 3, 2007 Author I upgraded my power supply and re-seated the cables - no errors now for weeks.
January 4, 2007 In reply to the smbd "getpeername failed" and "write_socket" messages posted on January 3: Those errors are meaningless (really!) - they have to do with Windows networking. If you look under the Settings page, Identification section, what is the setting for "Smb ports"? There are two choices: "139" and "445 139". Change it to "139" and most of these errors will go away.
January 4, 2007 Thanks Tom, I will disregard these errors. dstroot, if you don't mind me asking, which P/S did you purchase? I'll buy another one and keep my current one for powering my next couple of drives.
January 4, 2007 Author I went big because I want to try to run only one power supply - currently running six drives off of it. The main thing for me was dual 12V rails @ 18 amps each, and I have always liked Seasonic quality and quietness. http://www.newegg.com/Product/Product.asp?Item=N82E16817151025
January 4, 2007 Actually I'd try to avoid multi-rail PSUs. A dual-rail PSU is going to split the available power between both rails. Typically one rail is dedicated to the motherboard (for the CPU), the other to I/O devices, i.e., hard drives. Hence a dual-rail PSU rated at, say, 24A is only going to provide 12A to the hard drives, where you need it most (in an unRAID server, that is).
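The rail arithmetic above can be sketched in a few lines of Python. Everything here beyond the 24A and 12A figures is an assumption for illustration: the 2.0A per-drive spin-up draw on the 12V rail is a placeholder, not a number from this thread, so check the drive's datasheet for the real value. `drives_supported` is a hypothetical helper.

```python
def drives_supported(rail_amps, spinup_amps_per_drive, reserved_amps=0.0):
    """Rough count of drives one 12V rail can spin up at the same time.

    rail_amps: rated current of the rail feeding the drives.
    spinup_amps_per_drive: assumed peak 12V draw per drive at spin-up.
    reserved_amps: current set aside for fans or other loads on that rail.
    """
    usable = rail_amps - reserved_amps
    if usable <= 0:
        return 0
    return int(usable // spinup_amps_per_drive)


if __name__ == "__main__":
    ASSUMED_SPINUP_AMPS = 2.0  # placeholder value, not from the thread

    # 24A total split evenly across two rails leaves 12A for the drives.
    print(drives_supported(12.0, ASSUMED_SPINUP_AMPS))

    # The same 24A on a single rail, with nothing else drawing from it.
    print(drives_supported(24.0, ASSUMED_SPINUP_AMPS))
```

With that placeholder figure, the split rail covers half as many simultaneous spin-ups as the single rail. A single rail also has to feed the motherboard and CPU, which this sketch only accounts for if you pass a `reserved_amps` value.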
January 4, 2007 Then can you recommend a good single-rail PSU? I want to buy a good PSU; however, I have the Stacker 810 with only one PSU bay. This will need to power 12 drives in the future.
January 4, 2007 This is the one we're using in the next soon-to-be-announced unRAID server product: ENERMAX EG651P-VE FM(24P)
January 5, 2007 I'm assuming this will run all 12 drives, the mobo, and the case fans? If so, since I haven't purchased my 2nd PSU yet for the Stacker, perhaps I should take my first PSU back and upgrade to this... alleviating the need for the 2nd supply, and eliminating my paranoia that SPARKLE is part of my current read error problems. Assuming it will fit in the Stacker.