nick5429

Community Developer
  • Posts

    121

Everything posted by nick5429

  1. Can you point us to these cheap Norco non-hotswapping 5-in-3's, and ideally where to buy them? I've searched quite a bit and this is the only reference to non-backplane 5-in-3 I've been able to find :-/ I feel somewhat foolish paying $100 for a hotswapping 5-in-3 when you can get a non-hotswapping 4-in-3 for <$25. A cheap non-hotswapping 5-in-3 would be great.
  2. Disclaimer: very advanced topic. 99.9% of users will have no use for this. Intentionally posting code that doesn't compile lest someone try to do something they shouldn't.
So I've been poking around a bit reading the raw block devices (/dev/sdX) via some C code I wrote a) for intellectual curiosity and b) to help further diagnose some of the weirdness I've reported in other threads.
I have 4 data disks (sda, sdb, sdc, sdd) and 1 parity (sde).
Basic procedure (pseudocode mixed with actual C syscall names):

    //note: using lseek(2), open(2), etc rather than fseek(3), fopen(3), etc because they allow access to the entire block device with some special switches. Tried the latter with the same behavior.
    //simplified to just read 1 byte instead of a large buffer
    uint8_t diskBuf[5]; //one for each disk
    int parIdx=4; //parity lives at index 4 in the diskBuf array
    uint8_t calcParity;
    uint8_t expectedParity;
    int fd;
    off_t offset=511;
    devices = {sda, sdb, sdc, sdd, sde}
    for (d=0;d<5;d++){
        fd = open("/dev/${devices[$d]}", O_RDONLY); //open /dev/sdX
        lseek(fd, offset, SEEK_SET); //seek to offset, starting from beginning of device
        read(fd, &diskBuf[d], 1); //read 1 byte into this disk's slot
        printf("device %s reads 0x%02x", ${devices[$d]}, diskBuf[d]);
        close(fd);
    }
    calcParity = diskBuf[0] ^ diskBuf[1] ^ diskBuf[2] ^ diskBuf[3];
    expectedParity = diskBuf[parIdx];
    printf("Calculated parity: 0x%02x; parity read from sde: 0x%02x", calcParity, expectedParity);

Output:
Offset 511 is an easy example; even parity (which unraid uses) should be 0x00. This one could be hand-waved away as 'oh, well maybe it's actually using odd parity or something'. Thus:
I can't make any sense of this. I don't think this has anything to do with the parity problem I reported in my other thread; this is consistent and has always returned the same values, while the other is extremely flaky and inconsistent. I poked around in the unraid kernel driver code and verified that (as far as I can tell) the XOR calculation for parity is done the same way.
Could something in the unraid driver be interfering with my ability to directly read the parity device? I figured the unraid driver should only come into play if accessing the /dev/mdN devices, rather than the block devices. Has anyone else ever attempted something like this?
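For anyone who wants to sanity-check the XOR arithmetic itself without touching live array disks, here is a minimal shell sketch. It assumes, as the post does, that the parity byte is simply the XOR of the data bytes at the same offset. The helper names byte_at and xor_parity are made up for illustration, and the sketch is meant to be pointed at regular files or disk images rather than at /dev/sdX.

```shell
# Print the byte at OFFSET in FILE as a decimal number.
# Usage: byte_at FILE OFFSET
byte_at() (
    dd if="$1" bs=1 skip="$2" count=1 2>/dev/null | od -An -tu1 | tr -d ' \n'
)

# XOR the byte at OFFSET across all given files and print it as 0xNN.
# Under even (XOR) parity this should match the parity byte at OFFSET.
# Usage: xor_parity OFFSET FILE...
xor_parity() (
    offset=$1
    shift
    parity=0
    for f in "$@"; do
        b=$(byte_at "$f" "$offset")
        # An empty result means the offset is past the end of the file.
        [ -n "$b" ] || { echo "short read on $f" >&2; exit 1; }
        parity=$(( parity ^ b ))
    done
    printf '0x%02x\n' "$parity"
)
```

With images of the data disks (hypothetical names), something like `xor_parity 511 disk1.img disk2.img disk3.img disk4.img` can be compared by eye against `byte_at parity.img 511`, which prints the same byte in decimal.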
  3. Joe, Haha, thanks for the tips! I ran a brief memtest at the beginning of this, but you're right -- I do need to run a longer memtest check just to be sure. I'd been hoping to be able to reproduce the error more reliably before switching to and testing another controller lest I not prove anything to myself when I don't get an error. Though now that I'm typing this out, I realize that I do have a semi-reliable way to reproduce it: unRAID's built in parity check. That's what I get for trying to outsmart myself with all this fancy testing! Will report in a day or three after I've had a chance to run things through several cycles.
  4. Interesting. So I replaced all the SATA cables to no avail. I modified the script under 'how to troubleshoot recurring parity errors' from here: http://lime-technology.com/wiki/index.php/FAQ#Hard_Drives as follows, to allow greater flexibility / better logging.

    #!/bin/bash
    LOG_DIR=/root/hashes
    DEVICE=sda
    #COUNT=10000000   #5GB
    COUNT=2000000     #1GB (dd's default block size is 512 bytes)
    SKIP=0            #start
    MAX=2000000000    #end block -- make sure the drive is at least this big
    #MAX=1017926000   #start
    #COUNT=10000      #5MB
    TIMES=9           #for each stride, repeat this many times

    cd "$LOG_DIR" || exit 1

    if [ $# -ne 1 ]
    then
        echo "Need 1 param, got $#"
        exit 1
    else
        DEVICE=$1
        echo "Running hashes for device=$DEVICE skip=$SKIP count=$COUNT"
    fi

    INITIALRESULT=""
    RESULT=""

    while [ $SKIP -lt $MAX ]; do
        echo "Begin $DEVICE at block $SKIP size $COUNT."
        # reference read of this stride
        INITIALRESULT=$(dd if=/dev/$DEVICE skip=$SKIP count=$COUNT | md5sum -b | awk '{print $1}')
        echo "Block $SKIP: $INITIALRESULT initial" >> $DEVICE.log
        # re-read the same stride and compare each hash to the reference
        for i in $(seq 1 $TIMES)
        do
            RESULT=$(dd if=/dev/$DEVICE skip=$SKIP count=$COUNT | md5sum -b | awk '{print $1}')
            echo "Block $SKIP: $RESULT" >> $DEVICE.log
            if [ "$RESULT" != "$INITIALRESULT" ]; then
                echo "!!!!ERRORERRORERROR Block $SKIP md5 $RESULT did not match expected $INITIALRESULT"
                echo "!!!!ERRORERRORERROR Block $SKIP md5 $RESULT did not match expected $INITIALRESULT" >> $DEVICE.log
            fi
        done
        SKIP=$((SKIP + COUNT))
    done
    exit

The script uses "dd" to read the raw contents of the same section of the disk 10 times (one initial read plus 9 repeats), computes the md5 sum of each read, and compares it to the md5 sum of the initial read. An md5 sum will be the same for the same input data. I ran this over the first 1.5TB of each of my disks in parallel 2-3 times, resulting in ~45TB of reads from each disk (plus a few subset runs). In all that, I found two data miscompares. Even reading these same block addresses with a higher count, I've not been able to reproduce this through anything but "luck". Both miscompares happened on sda, but I can't reliably determine anything from a sample size of 2.
In both cases, you see that it returns to reading the 'correct' data after having read the error, so it's not like this was caused by a write to the disk in the middle of the process. Aside: I'm not entirely confident that the "block number" reported by the unraid parity checker correlates with the skip/count parameters to dd. Whatever read pattern the unraid parity checker does seems to hit this more consistently; I've done a number of noncorrecting parity checks in the course of testing this, and typically end up with 1-4 parity mismatches. In contrast, a full run of my script runs 10x as long, but only found 2 errors in 3 full runs. The errors above are both in the 430000000 area, though I've noticed no such pattern in the parity checks. Note, not all of these ran to completion. All drives have completed an extended offline smart self-test without error. Currently attempting to reproduce the issue again. Any other thoughts? Ideas?
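One possible way to narrow down a miscompare like the ones described above, sketched in shell: save a "good" and a "bad" read of the same chunk to files, then list which 512-byte sectors differ between them. This is a sketch under assumptions and not part of the script in the post: differing_sectors is a hypothetical helper, both copies have to fit in scratch space, and the sector numbers it prints are relative to the start of the chunk, so the dd skip value has to be added to get a position on the disk.

```shell
# List the 0-based 512-byte sector numbers in which two files differ.
# cmp -l prints one line per differing byte, with a 1-based byte number
# in the first column; awk maps that to a sector, sort removes repeats.
# Usage: differing_sectors FILE_A FILE_B
differing_sectors() (
    cmp -l "$1" "$2" 2>/dev/null |
        awk '{ print int(($1 - 1) / 512) }' |
        sort -un
)
```

If the output is empty, the two reads were identical. A single flipped sector would point at a transient read problem, while a long run of differing sectors would suggest something larger than one bad transfer.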
  5. Ah, thanks. I was having trouble sorting through the quasi-parallel lines of discussion
  6. I'm able to reproduce this on unraid 4.7, and have noticed it for a long time -- just sort of accepted it as the price of doing business, though it is quite annoying. This thread finally inspired me to do a little deeper digging. Please excuse the verbosity, just wanted to make sure I fully and accurately captured the causes/symptoms. Environment (thoroughly non-standard): unraid 4.7 installed on a full slackware 13.1 installation (running from an IDE drive). 5 modern SATA drives (1.5 - 2TB each) in unRAID array connected directly to motherboard SATA headers. AMD Phenom II X2. No cache drive. All disks manually spun up prior to the test. For the tests below, I made a point of playing a different mp3 each time (in VLC, per the OP), to minimize disk caching effects. I've noticed this with video playback at times in the past as well. The "skipping" causes no syslog entries, nor any noticeable spike when monitoring 'top'. I'm using Teracopy to copy the files over the network. My writes are typically sustained around 25-30MB/sec (while mp3 is concurrently playing back and not skipping). When the mp3 playback skipping occurs, the transfer rate reported by Teracopy drops to 5-7MB/sec. The skipping typically occurs several hundred MB into the transfer, and also sometimes occurs immediately after hitting 'cancel' on the Teracopy transfer (presumably the delete of the half-copied file?). Overall operation modes seem to be either: skipping (1-5 second pause, accompanied by low write speed for the same duration), and running fine at full speed (25+MB/sec transfer, no playback issues). Haven't noticed an 'in-between' ground. The MP3s being played are in the 3-10MB range at 128-192kbps (an exceptionally low data transfer/processing rate for modern hardware). Network is wired gigabit lan; ifconfig reports no errors/drops/collisions. 
Finally, I manually created a share on my IDE drive (non-array drive) to test (if you're copying files to set up this test, don't forget to do something like "sync; echo 3 > /proc/sys/vm/drop_caches" to clear the cache). I have only 1 IDE drive; hdparm unbuffered read test on it returns around 55MB/sec Based on the above, I am inclined to suspect the network infrastructure/drivers/etc or Samba as the culprit (either Samba itself, or the way in which unRAID sets up the samba configuration), since it never happened unless both operations were happening across the network, and the issue seems agnostic to the underlying storage. It's entirely possible my Slackware installation has a different Samba version than standard unraid 4.7 which may have introduced this issue. "/usr/sbin/smbd --version" returns "Version 3.5.2" for me.
  7. The post above shows that the AMD version of these servers does not support ESXi. Anyone know about the Intel Xeon servers that were for sale? What motherboard did it come with / does it support hardware passthrough?
  8. I seem to have developed intermittent parity errors (i.e., iterative non-correcting parity checks don't show all the same errors). I had an unclean shutdown a while back and it's possible I didn't let the parity recalc finish. That was stupid, but is somewhat separate from my issue.
Configuration: unRAID 4.7 on a full Slackware (13.1?) installation. All drives are SATA and connected directly to motherboard headers. I have 3 1.5TB drives and 2 2TB drives:

Status  Disk      Mounted     Device    Model/Serial                    Temp  Reads     Writes   Errors  Size   Used     %Used  Free
OK      parity                /dev/sde  9VT1_5YD517KW                   31°C  38725618  2774174
OK      /dev/md1  /mnt/disk1  /dev/sda  SAMSUNG_HD154UI_S1Y6J1KS744713  *     33225543  415589           1.50T  1.50T    100%   1.38M
OK      /dev/md2  /mnt/disk2  /dev/sdc  00Z_WD-WMAVU3394155             *     24331755  242123           1.50T  1.47T    99%    26.74G
OK      /dev/md3  /mnt/disk3  /dev/sdb  SAMSUNG_HD154UI_S1Y6J1KS744712  *     30915273  381624           1.50T  915.74G  62%    584.52G
OK      /dev/md4  /mnt/disk4  /dev/sdd  00P_WD-WCAZAD107336             31°C  46038453  1765660          2.00T  488.22G  25%    1.51T
Total: Size 6.50T, Used 4.38T, %Used 67%, Free 2.12T

Apparently my system configuration has some sort of log rotation turned on, so my syslog [attached] doesn't show my last boot (~2 months ago). Since a large portion of the identified "bad" blocks are >1500000000, my inclination is to think the issue lies with one of the 2TB drives (or hopefully the SATA cables attaching to them). SMART reports attached.
Aside: has anyone figured out a good way to determine which file a block maps to with reiserFS yet? ext2/3/4 has the 'debugfs' tool that can do it...
Any thoughts other than 'replace the SATA cables on the 2 2TB drives and try another non-correcting check'?
smart.txt syslog.txt
  9. Just finished 2 (separate) rounds of preclear on a drive that will replace my parity, and just wanted some eyes more familiar with SMART stats to confirm that these deltas are nothing to be concerned about: 1st preclear 2nd preclear The only ones that seem intuitively concerning to me are the Raw_Read_Error_Rate and Hardware_ECC_Recovered changes. Though this page seems to indicate this may be fine. Thoughts?
  10. Is it possible to have my unraid samba shares visible ONLY to valid users? For instance, I have a user 'nick' and a user 'streaming'. 'Nick' is a valid user on all shares (movies, photos, backup, personal, etc) and has read/write access to everything. 'Streaming' is a valid user only on 'tv' and 'movies', and only has read access to those two shares. If 'streaming' tries to access any other shares, access is (correctly) denied. I want those shares to not show up as browseable for 'streaming' I set up my tv-connected streaming device to log in as 'streaming', and I want ONLY the 'tv' and 'movies' shares to be visible to that user so that I don't have to scroll through a dozen different shares just to get to the two that are relevant to the device. When 'nick' logs in, I want all shares to be visible and browseable. I've spent quite a bit of time over the past day trying to figure this out. There are a few relevant samba config options, but none does quite what I'm looking for. I've found about a dozen threads (on other forums/mailing lists) of people asking the same question, with no valid solutions offered. Setting "browseable=no" on the non-streaming shares would sort of accomplish this, but then 'nick' couldn't browse all the shares except by typing the name of the share or auto-mounting the shares on startup or something. I don't want to do this. Setting "hide unreadable=yes" sounds like it does exactly what I want. However, this only works on files/directories within a given share. It does not hide the top-level share itself. Any ideas??
  11. I'd forgotten that Makefiles can be extremely picky about using the correct whitespace (you must use tabs, rather than spaces, at the beginning of lines). Tab characters don't seem to have been preserved in Pastebin when I copy/pasted. Try downloading those two files using wget from here instead. edit: also, the preclear script needs updating for Slackware 13.1. See my post here for details. Joe said he'd update the main script, but in the meantime I've made my locally edited version of the preclear script available here. I make no promises whatsoever about the script, other than it worked for me
  12. I'd strongly recommend getting this to work first, before trying to mess with the dm-mod tweak. I think you've answered your own question here If your boot drive is on the raid controller, then you will not be able to boot unless you have the raid controller's drivers built into the kernel. A module won't work. Think about it: the module is stored on the hard drive that you're trying to access in order to boot from. But you can't access that hard drive until the module has been loaded! Another option is to create an initrd (initial ramdisk) which contains the relevant modules. If there's some legitimate reason that the mvsas driver should be built as a module rather than built into the kernel, then this is what you should do. This isn't something I've ever had great success with (potentially just from lack of really trying) and I've generally avoided the process. There should be a plethora of guides on how to do it available on Google. Here's a site that explains what these are and why you might need them. The easiest way to work around all of this, though, (and what I chose to do) is to just put your boot drive somewhere that is easily and directly accessible without going through an external controller. Directly connected to your motherboard (with the proper kernel drivers built-in), for instance.
  13. Strange. I just double-checked, and it still works fine for me. Have you successfully compiled and booted into the kernel (and tested that the unRAID interface works) without this tweak for dm-mod? You're using kernel 2.6.32.9, right? Take a look at the bottom of that Kconfig file. Make sure it includes the last two lines; it's easy to accidentally miss a line at the bottom of a large copy/paste: endmenu endif If that doesn't work, just replace them both with the originals that you copied over from the /unraid/ directory (from the wiki's instructions). dm-mod might not be absolutely critical...
  14. Strange. I just double-checked, and it still works fine for me. Have you successfully compiled and booted into the kernel without this tweak for dm-mod? You're using kernel 2.6.32.9, right? Take a look at the bottom of that Kconfig file. Make sure it includes the last two lines; it's easy to accidentally miss a line at the bottom of a large copy/paste: endmenu endif If that doesn't work, just replace them both with the originals that you copied over from the /unraid/ directory (from the wiki's instructions). dm-mod might not be absolutely critical...
  15. Check in your BIOS for settings related to IDE for the SATA chipset. you want to set the "mode" on the chipset to AHCI for best and native SATA performance. The only potentially-relevant setting I could find in my BIOS was for a SATA RAID mode, which some sites say might implicitly enable AHCI; I left it off. However, hdparm -I says NCQ is supported/enabled for these drives, which implies to me that the drives aren't running in legacy IDE mode.
  16. I don't actually have any of that hardware, except the nVidia motherboard. Looks like I have a few extra modules loading / kernel drivers compiled in that I don't need.
  17. After reading all the talk here on the board about 'parity errors', they were on my mind and I simply misspoke; thanks for being extra clear, though. I haven't been able to reproduce these ICRC errors in standalone testing yet as I'm not sure where on the disk they occurred, but...
This "MBR preclear error" seems to stem simply from a different implementation of "echo" in my environment. My version of echo wants "\0" preceding octal numbers, and has no idea what I'm talking about when given, for instance, "\252" in the script:

    root@nickserver:/usr/src/linux# echo -ne "\252"
    \252root@nickserver:/usr/src/linux#

"Step 6":

    # set MBR signature in last two bytes in MBR
    # two byte MBR signature
    echo -ne "\252" | dd bs=1 count=1 seek=511 of=$theDisk
    echo -ne "\125" | dd bs=1 count=1 seek=510 of=$theDisk

The script is expecting out4 = 00170 and out5 = 00085

    echo -ne "\252" | dd bs=1 count=1 seek=511 of=/dev/sdc >& /dev/null
    echo -ne "\125" | dd bs=1 count=1 seek=510 of=/dev/sdc >& /dev/null
    root@nickserver:~# dd bs=1 count=1 skip=511 if=/dev/sdc 2>/dev/null |sum|awk '{print $1}'
    00092
    root@nickserver:~# dd bs=1 count=1 skip=510 if=/dev/sdc 2>/dev/null |sum|awk '{print $1}'
    00092

    echo -ne "\0252" | dd bs=1 count=1 seek=511 of=/dev/sdc >& /dev/null
    echo -ne "\0125" | dd bs=1 count=1 seek=510 of=/dev/sdc >& /dev/null
    root@nickserver:~# dd bs=1 count=1 skip=511 if=/dev/sdc 2>/dev/null |sum|awk '{print $1}' #out4
    00170
    root@nickserver:~# dd bs=1 count=1 skip=510 if=/dev/sdc 2>/dev/null |sum|awk '{print $1}' #out5
    00085

I'd like to think that's an accurate statement. On the other hand, it's possible that I know just enough to be a danger to myself ;-) Really though, thanks for the prod to 'go figure it out yourself'! This wasn't an issue that could have reasonably been figured out by anyone without access to my system.
These are SATA drives. How would I check to make sure I'm not in IDE emulation mode? A quick bit of googling wasn't conclusive.
The preclear script had a single CRC error that I haven't been able to repeat. I think I'm going to go ahead and power cycle and run it again, to see what happens. Though if anyone has other ideas (particularly to try to reproduce the CRC error) I'd be open to trying it, as a 10 hour test cycle is going to be a little frustrating if it keeps failing at the end Thanks again for your help, Joe.
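A note on the 00092 values above: 92 is the ASCII code for a backslash, which is consistent with echo emitting the literal text \252 and dd (count=1) keeping only its first byte. One way to sidestep the differences between echo implementations is printf, whose format string is specified by POSIX to accept \ddd octal escapes. The sketch below is an illustration only, not the preclear script's actual fix; write_mbr_signature and read_byte are hypothetical helper names, and it is written to be tried on a scratch image file rather than a real disk.

```shell
# Write the two MBR signature bytes (0x55 at offset 510, 0xAA at
# offset 511) into FILE without truncating it.
# printf interprets \125 and \252 as octal byte values itself,
# so the result does not depend on which echo is installed.
# Usage: write_mbr_signature FILE
write_mbr_signature() (
    printf '\125' | dd of="$1" bs=1 seek=510 count=1 conv=notrunc 2>/dev/null
    printf '\252' | dd of="$1" bs=1 seek=511 count=1 conv=notrunc 2>/dev/null
)

# Print the byte at OFFSET in FILE as a decimal number.
# Usage: read_byte FILE OFFSET
read_byte() (
    dd if="$1" bs=1 skip="$2" count=1 2>/dev/null | od -An -tu1 | tr -d ' \n'
)
```

After running write_mbr_signature on a 512-byte scratch file, read_byte at offsets 510 and 511 should report 85 and 170, the same values the script checks for as out5 and out4.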
  18. I'm setting up my unRAID (Pro) server for the first time (and running on a full Slackware 13.1 installation). Both of my SATA Samsung 1.5TB 154UI drives gave me results similar to this after 10.5 hours:

===========================================================================
=                unRAID server Pre-Clear disk /dev/sdb
=                       cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Temperature: 32C, Elapsed Time: 10:32:36
============================================================================
==
== SORRY: Disk /dev/sdb MBR could NOT be precleared
==
== out4= 00092
== out5= 00092
============================================================================
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000245285 s, 2.1 MB/s
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000700 0000 0000 0000 003f 0000 7af1 aea8 0000
0000720 0000 0000 0000 0000 0000 0000 0000 0000
*
0000760 0000 0000 0000 0000 0000 0000 0000 5c5c
0001000

Each item is "DONE", but it fails with no indication of what the problem is or why, just "could NOT be precleared"....
I see this in the syslog, but it seems odd that the parity errors would occur on both SATA drives at the exact same time

Dec 14 01:55:48 nickserver kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x580000 action 0x6
Dec 14 01:55:48 nickserver kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x1980000 action 0x6
Dec 14 01:55:48 nickserver kernel: ata4.00: BMDMA stat 0x25
Dec 14 01:55:48 nickserver kernel: ata4: SError: { 10B8B Dispar LinkSeq TrStaTrns }
Dec 14 01:55:48 nickserver kernel: ata3.00: BMDMA stat 0x25
Dec 14 01:55:48 nickserver kernel: ata4.00: failed command: WRITE DMA EXT
Dec 14 01:55:48 nickserver kernel: ata4.00: cmd 35/00:00:68:53:f8/00:04:10:00:00/e0 tag 0 dma 524288 out
Dec 14 01:55:48 nickserver kernel: res 51/84:b3:b5:54:f8/84:02:10:00:00/e0 Emask 0x10 (ATA bus error)
Dec 14 01:55:48 nickserver kernel: ata4.00: status: { DRDY ERR }
Dec 14 01:55:48 nickserver kernel: ata4.00: error: { ICRC ABRT }
Dec 14 01:55:48 nickserver kernel: ata3: SError: { 10B8B Dispar Handshk }
Dec 14 01:55:48 nickserver kernel: ata3.00: failed command: WRITE DMA EXT
Dec 14 01:55:48 nickserver kernel: ata3.00: cmd 35/00:00:98:df:d2/00:04:0e:00:00/e0 tag 0 dma 524288 out
Dec 14 01:55:48 nickserver kernel: res 51/84:61:37:e0:d2/84:03:0e:00:00/e0 Emask 0x10 (ATA bus error)
Dec 14 01:55:48 nickserver kernel: ata3.00: status: { DRDY ERR }
Dec 14 01:55:48 nickserver kernel: ata3.00: error: { ICRC ABRT }

Thoughts? If I saw this in someone else's log, I might think it was due to an insufficient PSU. I don't think that's my issue, though; I've got a 480W Antec power supply running 3 HDDs, a CD/DVD drive, a graphics card, and the motherboard/CPU -- that's it.
syslog.txt
  19. I just installed unRAID 4.6 on a fresh installation of 32-bit Slackware 13.1, loosely following the directions at http://www.lime-technology.com/wiki/index.php?title=Installing_unRAID_on_a_full_Slackware_distro
I didn't have any "legacy" info in my unRAID configuration, so I chose kernel options to conform to the Slackware defaults (for instance, using the 'experimental' PATA drivers in libata instead of the older ATA/IDE/etc option for PATA drives; this causes PATA drives to show up as /dev/sd* instead of /dev/hd*).
Notable changes from the wiki instructions:
  • kernel version is linux-2.6.32.9
  • You additionally need to copy over the following files from the unRAID distribution into your full Slackware distribution: /lib/libvolume_id.so.1.1.0 (and create a symlink to it from /lib/libvolume_id.so.1), /etc/exports-, /var/spool/cron/crontabs/root-
  • The names of a variety of kernel options have changed; hopefully you can figure it out. I chose to disable "Device Drivers > ATA/ATAPI/MFM/RLL support" entirely, and enable the PATA drivers in "Serial ATA (prod) and Parallel ATA (experimental) drivers" instead, as explained above.
Here's my kernel config: http://pastebin.com/HTnU8nYLp. Note that this is specific to my hardware, and is unlikely to work for you without modifications.
Various things (lilo comes to mind) kept complaining that module 'dm-mod' did not exist. The unRAID kernel config files in the drivers/md directory disabled the option to create it. To work around this:
1) build your kernel as specified in the wiki and make sure your system boots successfully;
2) replace /usr/src/linux/drivers/md/Makefile with this and /usr/src/linux/drivers/md/Kconfig with this;
3) 'make oldconfig' and enable dm-mod as a module, then 'make modules && make modules_install';
4) add '/sbin/modprobe dm-mod' to /etc/rc.d/rc.modules.
edit: it's probably better to download these two files (using wget) from my personal host here: http://www.nickmerryman.com/unraid/dmmod_2.6.32.9/ or from the attachments to this post (you'll have to rename the files) Follow the instructions in this thread to work around an issue in the UI where emhttp tries to call modprobe with an unsupported flag. I think the instructions there are slightly incorrect, or at least a bit unclear; you should replace "modprobe -rw" with "rmmod -w " (that's "rmmod<space><space><space><space>-w<space>", such that there are now two spaces between "-w" and "md-mod" and four spaces between "rmmod" and "-w"). Editing the binary in 'vim' worked fine for me. Also, the preclear script needs updating for Slackware 13.1. See my post here for details. Joe said he'd update the main script, but in the meantime I've made my locally edited version of the preclear script available here. I make no promises whatsoever about the script, other than it seems to have worked for me I think that's everything different from the wiki that I had to do. My drives are currently 'preclearing', and the only annoyance I currently have is that the "flash" samba share that unRAID creates shares my /boot partition, rather than actually sharing the /flash directory as its name would imply. And I have something set up wrong such that I don't get a pretty framebuffer on boot, but that's a) my own problem and b) not really a big deal. edit 12/13/2010: added preclear script details Kconfig.txt Makefile.txt
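The in-place edit of the emhttp binary described above only works because the replacement string is exactly as long as the original (12 bytes each), so nothing else in the file shifts. Here is a hedged sketch of doing the same kind of edit from the shell instead of vim. replace_same_length is a hypothetical helper; it relies on grep's -a, -b and -o options (present in GNU grep, not guaranteed elsewhere), it only patches the first occurrence, and it should be tried on a backup copy before going anywhere near the real binary.

```shell
# Overwrite the first occurrence of OLD in FILE with NEW, in place.
# Refuses to run unless OLD and NEW have the same length, so the
# file size and every other offset in the file stay unchanged.
# Usage: replace_same_length FILE OLD NEW
replace_same_length() (
    file=$1
    old=$2
    new=$3
    [ "${#old}" -eq "${#new}" ] || { echo "length mismatch" >&2; exit 1; }
    # grep -abo prints "byteoffset:match" for each match; keep the first offset.
    offset=$(LC_ALL=C grep -abo -F -- "$old" "$file" | head -n 1 | cut -d: -f1)
    [ -n "$offset" ] || { echo "string not found" >&2; exit 1; }
    printf '%s' "$new" | dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null
)
```

On a copy of the binary (the file name here is only an example), the edit from the post would look like `replace_same_length ./emhttp.copy 'modprobe -rw' 'rmmod    -w '`, with four spaces after rmmod and one after -w.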