September 20, 2009

Heart-stopping moment this morning when I realised that all 1.4 TB of my data appeared to be corrupted. On closer inspection, however, it looks like some kind of issue with SMB.

I have a single main share on my unRAID server, with subfolders holding all the data. The data is split between two 1 TB disks, with a third disk for parity.

If I navigate to \\unraid-server\share\photo\image0011.jpg, the file is corrupted when it loads. However, when I turned on the individual disk shares to see whether one or both disks were affected, I found that the files there aren't corrupt. So \\unraid-server\disk1\SHARE\Photos\image0011.jpg loads fine. I have been able to replicate this over and over.

Can anyone explain why, or give me advice on how best to proceed?

Thanks,
Tony
September 20, 2009

Step 1, before you do anything: post a syslog. Instructions are in the wiki.

Your issue could be almost anything. I suspect memory first, but since the data on the disks is probably fine (as evidenced by your ability to view the files on the disk shares), I'd reboot (after grabbing the syslog) and see if the issue goes away. If it does not, then it is time to isolate what causes files read through user-shares to look corrupted.

What version of unRAID are you using?
What do you see when you type
ifconfig eth0
Have you tried rebooting or restarting your "router", or the PC you are using to display the images?

Joe L.
September 20, 2009 (Author)

Hi Joe,

To answer your questions:

1) Yes, I've rebooted the reading machine.
2) I've not yet rebooted the unRAID server. I thought it better to seek advice first.
3) It's version 4.4.2.
4) Typing ifconfig eth0 gives:

==========>
eth0      Link encap:Ethernet  HWaddr 00:15:f2:7e:16:8a
          inet addr:192.168.30.39  Bcast:192.168.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:250171618 errors:0 dropped:0 overruns:0 frame:0
          TX packets:35606832 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1906102154 (1.7 GiB)  TX bytes:246643758 (235.2 MiB)
          Interrupt:17
<============

I'll post the syslog in the next post.
September 20, 2009 (Author)

OK. Memtest went through a complete test with no errors. I rebooted the server and I'm still getting the same issue: corruption when I look at the share, but the files are OK when I look at the individual disks. Any ideas?
September 21, 2009

You have a slightly unusual board: an nForce board that is not using the built-in nVidia networking chipset that most do, but a Yukon network chipset instead. That is not important in itself, but the fact that you have an nForce board makes me ask: what is your motherboard? Be advised that many nForce 2, 3, or 4 boards have a data-corrupting hardware flaw in them. If you want to follow up further, search for the word nforce on the Hardware Compatibility page.

You may want to test with a file compare tool: compare the same image file through both its disk path and its User Share path, and see whether multiple compares display exactly the same corruption or whether it is random. Or you can use a CRC or MD5 tool (or comparable), and see if repeated tests produce the same or random numbers.
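A minimal sketch of the file-compare test described above, assuming a POSIX shell with `cmp`, `head`, `wc`, and `awk` on the machine that holds both copies (for example under Cygwin on a Windows PC). The function name `diff_summary` and the file names in the usage note are hypothetical, not something from this thread.

```shell
# Summarise how two copies of the same file differ, byte by byte.
# Usage: diff_summary GOOD_COPY SUSPECT_COPY
diff_summary() {
    local a="$1" b="$2"
    local count first
    # cmp -l prints one line per differing byte: "offset octal_a octal_b".
    # cmp exits non-zero when the files differ; that is expected here.
    count=$(cmp -l "$a" "$b" 2>/dev/null | wc -l) || true
    first=$(cmp -l "$a" "$b" 2>/dev/null | head -n 1 | awk '{print $1}') || true
    # $((count)) strips any padding that wc may add on some systems.
    echo "differing_bytes=$((count)) first_offset=${first:-none}"
}
```

Pulling the suspect file from the User Share twice (say as `bad1.jpg` and `bad2.jpg`) and running `diff_summary good.jpg bad1.jpg` and then `diff_summary good.jpg bad2.jpg` gives two summaries to compare. Matching summaries hint that the corruption is repeatable; different ones hint that it is random. Running `cmp bad1.jpg bad2.jpg` directly would settle it.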
September 22, 2009 (Author)

Hi Rob,

I've got an Asus A8N-SLI Premium:

-----
?? 8 0 2 NVIDIA Gigabit MAX(Marvell PHY chip) & Marvell PCI Gigabit LAN AMD 939 nForce4 SLI nForce4 MCP no ATX Manufacturer, Newegg forum post 1
----------

This thread says it's a good board, and it had been working OK for me:
http://lime-technology.com/forum/index.php?topic=3412

I thought this issue was resolved today, but alas it seems to still be there. I'm in the middle of transferring an 11 GB zip file from my server to see whether it's corrupted or not. I'll report back shortly with more info.
October 3, 2009 (Author)

Hi,

It's been a busy couple of weeks and I've had my unRAID server mostly powered off. Today I did some more tests.

From two different machines I've pulled the same selection of files from both the disk share and the straight multi-disk SMB share. Running the two copies of what should be the same file through an MD5 checker, I get different file signatures.

Can anyone shed some light on why this might be happening, or how I can overcome this problem? It's completely shaken my confidence in my unRAID setup!
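A rough sketch of how that comparison could be automated for a whole folder, assuming both copies have been pulled to local directories and that a POSIX shell with `find`, `sort`, `md5sum`, and `awk` is available on the client (for example under Cygwin). The function name `compare_trees` and the directory names are hypothetical.

```shell
# List files whose MD5 differs between two copies of the same folder tree.
# Usage: compare_trees DISK_SHARE_COPY USER_SHARE_COPY
compare_trees() {
    local good="$1" suspect="$2" rel sum_a sum_b
    # Walk the reference tree and look up each file at the same relative path.
    (cd "$good" && find . -type f | sort) | while IFS= read -r rel; do
        if [ ! -f "$suspect/$rel" ]; then
            echo "MISSING  $rel"
            continue
        fi
        # Redirecting the file into md5sum keeps the loop's own stdin intact.
        sum_a=$(md5sum < "$good/$rel" | awk '{print $1}')
        sum_b=$(md5sum < "$suspect/$rel" | awk '{print $1}')
        [ "$sum_a" = "$sum_b" ] || echo "MISMATCH $rel"
    done
}
```

For example, `compare_trees from_disk_share from_user_share` prints one line per file that differs and nothing at all when the two trees match, which makes it easy to see whether every file or only some files are affected.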
October 5, 2009

If you are using the 1st LAN connection, it is the Marvell on the PCI bus. The 2nd LAN connection is the nVidia, which is on the PCI-e bus (which is what I use). Maybe there is an issue with the Marvell chipset?
October 5, 2009

Since you are facing data corruption, I would contact the developer directly. It's pretty serious. (Good luck with your problem, I mean.)
October 5, 2009 (Author)

I'd hoped this thread might have attracted his attention, but I guess not...
October 5, 2009

In every case I've seen so far, hardware has been to blame when files experience corruption. Some were motherboard issues, some were memory strip issues, and one was a hard disk issue. None (so far) have been an issue with a release of unRAID itself.

So the first step is to isolate when and where the errors are occurring.

Absolutely the first thing I would do is a memory test. Run it through several full cycles, or overnight. If memory is having problems, all bets are off for anything else.

Check the memory voltage, timing, and clock speed. Make sure they are appropriate for your specific model and brand of memory. This is especially important if you purchased premium memory, which will often need non-standard settings to perform at its best.

Joe L.
October 5, 2009 (Author)

OK Joe.

Firstly, I'm really not trying to have a go at unRAID here. I hope it doesn't come across that way in this thread. I love the product and really just want to get things up and running properly. There is no other product which does what unRAID does, and I desperately want it to work for me.

I'll leave it to do a full memory test tonight, if you think it's worth it. (It ran for 2 or 3 hours last time, IIRC, with no errors.)

But the strange thing is that I can copy the same file from the same physical hardware, with the only difference being the way I navigate to the file, and yet it produces different outputs:

\\unraid\share\folder\file1
\\unraid\disk1\share\folder\file1

Surely that can't be a hardware issue? If it were a memory bug, I would expect crashes and pretty much everything to be dished out as corrupt. The fact that I can repeatedly grab the same file from the two shares and get different file signatures seems very odd.

I'd love to get to the bottom of this. I've ruled out the client machine by using both my laptop and my desktop to copy files from the unRAID server. Both times, if I copy from the all-encompassing \\unraid\share\folder\file1, the file gets dished out corrupted; if I then copy the same file from \\unraid\disk1\share\folder\file1, it's not corrupted.

To me, that points to an SMB bug in software, not hardware. Any thoughts on narrowing it down?

Thanks again for your time!
Tony
October 6, 2009

10 hours so far of memtest, 6 full passes, and no errors. I'll rule out the memory then. That is a great first step.

Next is to determine where the bits are being dropped. Are they being dropped on a specific disk, on the MB, or during transmission through the network?

Since you said that a file "referenced" from a user-share is corrupted, but from the disk share is not, and we know they are the same file, just accessed in different ways, we'll need to isolate the network. Try running the following (multiple times) from a telnet session or a console log-in session:

md5sum /mnt/disk1/share/file1
and
md5sum /mnt/user/share/file1

to detect whether the checksums are the same when the file is read locally on the unRAID server. If the checksums are identical, the local read paths are fine, which leaves SMB/networking as the place to look.
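The repeated check above can be wrapped in a small helper. This is only a sketch, assuming a shell on the server with `md5sum`, `awk`, `sort`, and `wc`; the function name `check_paths` and the default of three runs are assumptions made here for illustration.

```shell
# Hash the same file through two mount paths several times and report
# whether every reading agrees.
# Usage: check_paths DISK_PATH USER_PATH [RUNS]
check_paths() {
    local disk_path="$1" user_path="$2" runs="${3:-3}"
    local i=0 sums="" distinct
    while [ "$i" -lt "$runs" ]; do
        sums="$sums $(md5sum < "$disk_path" | awk '{print $1}')"
        sums="$sums $(md5sum < "$user_path" | awk '{print $1}')"
        i=$((i + 1))
    done
    # Word-splitting on $sums is intentional: one checksum per output line.
    # A single distinct checksum means every read, via either path, matched.
    distinct=$(printf '%s\n' $sums | sort -u | wc -l)
    if [ "$distinct" -eq 1 ]; then
        echo "consistent"
    else
        echo "inconsistent"
    fi
}
```

With the paths from the post, the call would be `check_paths /mnt/disk1/share/file1 /mnt/user/share/file1 5`. One caveat: Linux caches file data in memory, so later reads of the same file may be served from cache rather than from the disk.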
October 6, 2009 (Author)

Thanks Joe,

Good plan - I like your logic. I'll try that when I get home.

Regards,
Tony
October 6, 2009 (Author)

Hi Joe,

OK, I've checked two different files over telnet, one from each HDD, each through both of its navigable paths. For each file, both paths came back with the same MD5 sum.

I then copied the second of these two files over the network to my desktop, once from the disk share and once from the user share. The disk share copy has the same MD5 sum as the one I got over telnet; the user share copy has a different checksum. This is the expected result based on my previous copy/corruption results.

======================>
root@unRAID-Server:/mnt/user/SHARE/movies/DivX# md5sum p-bm.avi
1d84ab311aa017e370e8777410e8f0c2  p-bm.avi
root@unRAID-Server:/mnt/disk2/SHARE/movies/DivX# md5sum p-bm.avi
1d84ab311aa017e370e8777410e8f0c2  p-bm.avi
root@unRAID-Server:/mnt/disk1/SHARE/_UnWatched# md5sum burn.notice.s03e07.hdtv.xvid-fqm.avi
c251de235ce81c77fc10f71a74f40ac5  burn.notice.s03e07.hdtv.xvid-fqm.avi
root@unRAID-Server:/mnt/user/SHARE/_UnWatched# md5sum burn.notice.s03e07.hdtv.xvid-fqm.avi
c251de235ce81c77fc10f71a74f40ac5  burn.notice.s03e07.hdtv.xvid-fqm.avi
root@unRAID-Server:/mnt/user/SHARE/_UnWatched#

User share copy, calculated on my desktop: MD5 = dd8bcbd94a63396aaf8cd2fa19215953
Disk share copy, calculated on my desktop: MD5 = c251de235ce81c77fc10f71a74f40ac5
<====================================

So... does this point to a problem with the SMB user share? What should I do next?

Thanks Joe,
Regards,
Tony
October 6, 2009

So far, it sure looks like it.

I think you said you are using unRAID 4.4.2. I know that 4.5beta6 has a newer version of SAMBA. Are you willing to try a quick upgrade to it as a test?

It would involve renaming the existing bzroot and bzimage to bzroot.442 and bzimage.442, then un-zipping the 4.5beta6 release on your PC and copying the bzroot and bzimage files from it to your flash drive. Then a simple reboot will have you up and running on 4.5b6, with a different SAMBA version and, hopefully, a consistent checksum.
If the results don't change, then you can either stay on 4.5b6, or rename the two files from bzroot.442 to bzroot, and bzimage.442 to bzimage, and reboot to get back to your existing version. If you get consistent checksums on the newer SAMBA version, then it is a strong argument to stay on the new release. It is not completely definitive, since the 4.5b6 version probably has updated drivers for the network cards too, in addition to a newer Linux kernel. Joe L.
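The rename-and-copy procedure above could be scripted roughly as follows. This is a sketch only: it assumes a POSIX shell on whichever machine has the flash drive mounted, and the function name `install_release`, its arguments, and the checksum-free `cmp` verification step are additions made here, not part of the instructions in the post.

```shell
# Back up the running release's boot files and install the new ones,
# verifying each copy against its source.
# Usage: install_release FLASH_DIR NEW_RELEASE_DIR BACKUP_SUFFIX
install_release() {
    local flash="$1" new="$2" suffix="$3" f
    for f in bzroot bzimage; do
        # Keep the old file so the change can be reverted by renaming it back.
        mv "$flash/$f" "$flash/$f.$suffix" || return 1
        cp "$new/$f" "$flash/$f" || return 1
        # A bad copy to the flash drive could leave the server unbootable.
        if ! cmp -s "$new/$f" "$flash/$f"; then
            echo "copy of $f does not match the source" >&2
            return 1
        fi
    done
    echo "installed; previous files saved with suffix .$suffix"
}
```

A call might look like `install_release /path/to/flash /path/to/unzipped-4.5beta6 442`, where both paths are placeholders. The comparison may be served from the operating system's cache rather than from the flash drive itself, so re-checking the files after safely removing and re-inserting the drive is the stronger test.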
October 6, 2009 (Author)

OK... I copied the bzimage and bzroot files onto the flash share, rebooted, and it crashed. I scratched my head a bit, pulled the USB stick out, and ran a checksum on the files on the USB stick and on the files I'd downloaded. They were different. So I recopied the new 4.5b6 files across, and it then booted.

I've pulled the same two .avi files across the network from the user share and from the disk share. Again, the disk share version has the correct checksum and the user share version has a random MD5.

Time for some more head scratching... any ideas, Joe?

(To double check: the web GUI definitely now says version 4.5-beta6, so I did boot the right version.)
October 6, 2009

I'm going to do a similar experiment, copying an ISO to my Windows box. I've never had an issue, but who knows.

I still figure hardware is your issue, but I don't know what to ask you to try next. It's nice that it is repeatable, but frustrating when there should be no difference. At least you know the files are on the unRAID box and can be read from the disk shares.

How much memory do you have installed?
How big a file are you moving/checking with?
Is the MD5 mismatch only on big files?

Joe L.
October 7, 2009 (Author)

I've not tried the MD5 check on small files, but I first noticed the problem when I navigated to a folder of pictures and they were all corrupt. When I navigated to the same folder via the disk share, they were not. I guess that's a good clue that even small files have the same problem, but I can double check when I get home.

Could it be some quirk in the way the shares are configured? I've got the OS USB stick at work today and I've been looking at all the files, but I don't know what they all mean.

I have 4 GB of memory installed in my unRAID server: 4 x 1 GB sticks.
October 7, 2009

What happens if you don't use Samba? Try an MD5 tool on the unRAID server, directly on the files in the User Share path, as well as in the actual disk path. Then try copying the file (not across the network) from the User Share path to a temp folder in a disk path, and test its MD5. Always test twice, to make sure you get the same MD5 each time.
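The local copy test suggested above might look like this. It is a sketch under assumptions: a shell on the server with `md5sum`, `cp`, `basename`, and `awk`; the function name `copy_and_verify` and the `.md5test` suffix for the temporary copy are made up for illustration.

```shell
# Copy a file locally (no network involved) and confirm that the source
# and the copy each hash identically on two consecutive reads.
# Usage: copy_and_verify SOURCE_FILE TEMP_DIR
copy_and_verify() {
    local src="$1" tmp_dir="$2" copy s1 s2 c1 c2
    copy="$tmp_dir/$(basename "$src").md5test"
    s1=$(md5sum < "$src" | awk '{print $1}')
    cp "$src" "$copy" || return 1
    s2=$(md5sum < "$src" | awk '{print $1}')
    c1=$(md5sum < "$copy" | awk '{print $1}')
    c2=$(md5sum < "$copy" | awk '{print $1}')
    rm -f "$copy"    # remove the temporary copy once it has been hashed
    if [ "$s1" = "$s2" ] && [ "$s1" = "$c1" ] && [ "$c1" = "$c2" ]; then
        echo "OK $s1"
    else
        echo "FAIL source=$s1,$s2 copy=$c1,$c2"
        return 1
    fi
}
```

Here the source would be a file under the User Share path and the temp directory a folder on one of the disk paths, so the read goes through the user-share layer while the write and re-read go straight to a disk.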
October 7, 2009

He did exactly that in this post:
http://lime-technology.com/forum/index.php?topic=4377.msg39947#msg39947

The MD5 checksums all looked good, from both the user-shares at /mnt/user and the disk shares at /mnt/diskX, when checked from the Linux command prompt.

It is apparently very repeatable. The corruption happens only when reading from user-shares through SMB over the LAN. It does not happen with disk-shares over SMB.

We don't yet know if it is a network card driver issue, an SMB issue, or a memory corruption issue of some kind when both are in use through the fuse file-system.

He did try upgrading to 4.5b6 from 4.4.2, and it made no difference. I know 4.5b6 uses a newer version of SAMBA, so it is less likely to be the culprit.

If he has an alternate network card, that would be an interesting test. Another interesting test would be to remove all but one memory strip. It might eliminate a possible memory corruption, as memory would be allocated differently.

Joe L.
October 7, 2009 (Author)

Joe,

To clarify (and this is a quick post from work, so I haven't gone back and checked your post from yesterday): my test last night was on version 4.5 beta 6, not beta 5. I picked the newest version I could see on the download site. I can try other versions if you think that is a valuable test, but I would guess not.

The desktop does have two NICs onboard. I've got one switched off in the BIOS, so switching to the other would be a quick test. I'll do that when I get home.

Pulling 3 sticks of RAM would also be quick, so I'll do that as the next test.

After that, the only thing left to try would be to put the disks in my other machine, but this would have to wait until the weekend, as it will require a lot of fiddling and messing around in cases which aren't easy to get to. Still, if we run out of ideas, there is always that as a backup plan.

Thanks,
Tony
This topic is now archived and is closed to further replies.