burnaby_boy

Everything posted by burnaby_boy

  1. I just checked the status of the array and noticed that one disk has a red dot beside it and has 80 errors. I checked the syslog and it is very long, so have attached the portion where the errors begin. It looks like the errors started a few hours ago. I'd appreciate any insight into what is going on and if the drive should be replaced. I have a 2 GB cache drive which could easily replace the problem drive. Thanks syslog.txt
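For reference, a minimal sketch of pulling the full SMART report for the suspect disk before deciding on a replacement (sdX is a placeholder for whatever device letter the red-balled drive has; the -d ata switch is only needed on older smartmontools builds):
    # Full SMART report for the drive showing the errors
    smartctl -a -d ata /dev/sdX
    # Attributes 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector) are the
    # ones that usually decide whether the drive should come out of the array.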
  2. Thanks so much for your help with this issue, Joe. Very much appreciated!
  3. Hi Joe, I have run 3 successive NOCORRECT parity checks with no errors. Is there any need to run a regular parity check before resuming use of the array and adding new, precleared drives?
  4. Thanks, Joe. I've started another NOCORRECT parity check. Cheers
  5. Hi Joe, In nocorrect mode, the parity check completed with no sync errors. Does that mean that the problem is solved, or do I need to do further tests? Cheers
  6. Thanks, Joe. Yes I found it, and it's about halfway through the check. So far no errors. If the check completes without errors, is everything then OK with the array? Cheers
  7. I ran SMART reports on all the disks, and all reported no errors except for disk 9, which had a UDMA_CRC_ERROR_COUNT of 10. Does that have to do with cable issues? smart_sdd.txt
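A minimal sketch of isolating that attribute, assuming sdd is the device behind disk 9 (as the attachment name suggests):
    # Show only the CRC-related attribute from the SMART table
    smartctl -A -d ata /dev/sdd | grep -i crc
    # Attribute 199 (UDMA_CRC_Error_Count) counts interface CRC errors, which usually
    # point to the SATA cable or connector rather than the platters; the raw value
    # never resets, so the thing to watch is whether it keeps climbing.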
  8. How do I run the check in nocorrect mode? I'm using the default interface and I don't see where to set that. Thanks
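For anyone finding this later: the non-correcting check is normally started from a telnet session rather than the web page. A minimal sketch, assuming mdcmd sits at /root/mdcmd as it did on the 4.x releases:
    /root/mdcmd check NOCORRECT         # start a read-only parity check; sync errors are counted, not written
    /root/mdcmd status | grep -i sync   # rough way to watch progress and the error count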
  9. Hi Joe, Thanks for your input. The only thing that has changed in the array recently is the addition of a second SATA card, an Adaptec 1430SA, that I installed yesterday. I put it in the second PCIe x16 slot. In the first slot I have a Supermicro AOC-SASLP-MV8 that has been in the server for the past year. I'll remove the Adaptec and run another check. Could that be the cause? Prior to that, I had few issues with the server since I built it last year. Thanks again.
  10. Greetings, I ran a parity check last night and it finished with 20 errors. I started another check, assuming that the errors would have been fixed during the first check. However, I'm halfway through the second parity check and 20 errors have popped up again. I am running unRAID 4.7 and my syslog is attached. I wondered if it might have been caused by the automatic movement of 15 files from the cache drive to the array in the middle of the check. Any ideas as to what might be going on? Cheers syslog.txt
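As an aside, the sectors flagged during a check can be pulled out of the syslog for comparison between the two runs; a minimal sketch (the exact message wording differs between releases, so the pattern may need adjusting):
    # List the sectors the md driver flagged during the check
    grep -i "parity incorrect" /var/log/syslog
    # If the same sectors show up in both runs, the mover is an unlikely culprit,
    # since normal writes to the array keep parity up to date as they go.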
  11. I have the latest BIOS installed for the board - a GA-P35-DS3P (updated prior to installing unRAID). In later BIOS releases Gigabyte must have wised up to the problems HPA was creating, because it is now turned off by default. The affected drives were installed in other systems prior to being included in the unRAID array, so no doubt the HPA was added to the drives at an earlier point. Cheers
  12. After removing the HPA from both drives, a parity check showed 2 errors. I then ran a second parity check, which just completed showing no errors. I have not yet changed the Default partition format to MBR: 4k-aligned - once I do that, is there any need to run another parity check? Thanks again for your help, Joe. Cheers
  13. What do you mean when you say reconstruct the drive? Is that the hdparm command? After downgrading to 4.6, I used telnet to run the hdparm command on each drive separately. I received the same results as peter_sm. I then upgraded to 4.7 and rebooted, and followed the instructions for "trust the parity". The server is currently running a parity check - about 8 hours to go. Thanks so much for your help with this! Cheers
  14. Thanks for the prompt reply, Joe. After looking at the syslog, it turns out the size is identical to the previous example - both disks are SAMSUNG HD103UJ (current 1953523055, native 1953525168). I'll do as you suggest, downgrade to 4.6 and get rid of HPA there, upgrade to 4.7 and then use the "trust my parity" procedure. I guess this is it here - http://lime-technology.com/wiki/index.php?title=Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily Cheers
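For what it's worth, the gap between the native and current sizes works out to roughly 1 MB, which is consistent with the small slice a Gigabyte BIOS reserves when it creates an HPA; a quick check from the shell, using the figures quoted above:
    echo $(( 1953525168 - 1953523055 ))   # 2113 sectors hidden by the HPA
    echo $(( 2113 * 512 ))                # 1081856 bytes, roughly 1 MB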
  15. Greetings, I upgraded to 4.7 today and had 2 disks show up as the wrong size. I searched the syslog and discovered that both have HPA detected on them. I am using a Gigabyte board, but HPA has been off since before installing unRAID. I was about to run the hdparm command that Joe L. recommended, but then wondered if there is any greater danger when 2 disks are affected. Just thought I'd check before going ahead. Cheers
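In case it helps anyone searching later, this is the shape of the hdparm command in question; a minimal sketch only, with sdX as a placeholder, to be repeated for each affected drive after double-checking its device letter and the native sector count reported in the syslog:
    hdparm -N /dev/sdX                  # read-only: shows current vs. native max sectors
    hdparm -N p1953525168 /dev/sdX      # set the native value permanently (leading p) to remove the HPA
    # 1953525168 is the native count the syslog reports for these HD103UJ drives;
    # use whatever native figure is shown for the drive being fixed.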
  16. Thankfully, the parity check completed with no sync errors.
  17. OK, I'll run another parity check, Joe. Thanks again for your assistance with this - very much appreciated.
  18. Thanks for the info - so I guess the parity errors are unlikely to be from the superblock issue? Should I just run another parity check? Cheers
  19. I've just replaced an Adaptec 1430SA with a Supermicro AOC-SASLP-MV8. All the drives were recognized and I was able to assign them to their proper positions in the array. I also added a new drive - a recertified WD 1.5TB EARS (jumpered and precleared) - as a replacement for a faulty drive. I ran a parity check last night and it found 3 errors. I looked through the syslog and discovered the following reference to a bad superblock on the recertified drive:
    Nov 29 20:58:54 Media logger: mount: wrong fs type, bad option, bad superblock on /dev/md12,
    Nov 29 20:58:54 Media logger: missing codepage or helper program, or other error
    Nov 29 20:58:54 Media logger: In some cases useful info is found in syslog - try
    Nov 29 20:58:54 Media logger: dmesg | tail or so
    Nov 29 20:58:54 Media logger:
    Nov 29 20:58:54 Media emhttp: _shcmd: shcmd (108): exit status: 32
    Nov 29 20:58:54 Media emhttp: disk12 mount error: 32
    Nov 29 20:58:54 Media emhttp: shcmd (109): rmdir /mnt/disk12
    Nov 29 20:58:54 Media kernel: REISERFS warning (device md12): sh-2021 reiserfs_fill_super: can not find reiserfs on md12
and then further down it looks like the issue is addressed:
    Nov 29 21:01:58 Media emhttp: shcmd (128): mkdir /mnt/disk12
    Nov 29 21:01:58 Media emhttp: shcmd (129): set -o pipefail ; mount -t reiserfs -o noacl,nouser_xattr,noatime,nodiratime /dev/md12 /mnt/disk12 2>&1 | logger
    Nov 29 21:01:58 Media kernel: REISERFS (device md12): found reiserfs format "3.6" with standard journal
    Nov 29 21:01:58 Media kernel: REISERFS (device md12): using ordered data mode
    Nov 29 21:01:58 Media kernel: REISERFS (device md12): journal params: device md12, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
    Nov 29 21:01:58 Media kernel: REISERFS (device md12): checking transaction log (md12)
    Nov 29 21:02:00 Media kernel: REISERFS (device md12): Using r5 hash to sort names
    Nov 29 21:02:00 Media kernel: REISERFS (device md12): Created .reiserfs_priv - reserved for xattr storage.
    Nov 29 21:02:00 Media emhttp: resized: /mnt/disk12
    Nov 29 21:02:00 Media emhttp: shcmd (130): chmod 700 '/mnt/disk12'
    Nov 29 21:02:00 Media emhttp: shcmd (131): rm /etc/samba/smb-shares.conf >/dev/null 2>&1
    Nov 29 21:02:00 Media emhttp: shcmd (132): cp /etc/exports- /etc/exports
    Nov 29 21:02:00 Media emhttp: shcmd (133): killall -HUP smbd
    Nov 29 21:02:00 Media emhttp: shcmd (134): /etc/rc.d/rc.nfsd restart | logger
There are also a whole lot of lines in the syslog referring to port 7 (the syslog, which starts around 8 pm last night, was close to 300 KB in size and too large to attach here):
    Nov 29 20:48:48 Media kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 20usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 7 ctrl sts=0x199800.usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 7 ctrl sts=0x199800.
    Nov 29 20:48:48 Media kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 20usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 7 ctrl sts=0x19usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 7 ctrl sts=0x199800.
    Nov 29 20:48:48 Media kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 207usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 7 ctrl sts=0x199800.usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 7 ctrl sts=0x199800.
I'm wondering whether I should run another parity check now or check the filesystem on disk 12. Some wise counsel would be much appreciated. This is the first time since building the unRAID server 10 months ago that I've had parity errors.
I have attached the bottom portion of the syslog, including the bad superblock. Cheers, Ross syslog.txt
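If it comes to checking the filesystem rather than (or before) another parity check, a minimal sketch, assuming the array is brought up so that /dev/md12 exists but disk 12 is not mounted:
    reiserfsck --check /dev/md12
    # Read-only pass: nothing is written in this mode, and the output says whether a
    # --fix-fixable or --rebuild-tree run is actually needed.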
  20. The PSUs in both computers that this HDD has been connected to seem to be sound. In the main unRAID unit (using a Corsair TX650W PSU) there are 11 other HDDs that seem to be free of the issues this one has. In terms of vibration, when it was part of the main array it was secured using silicone grommets. As you suggest, I will RMA it, and include a note mentioning the sectors that are continually pending re-allocation. Thanks again for the assistance. Cheers
  21. I ran preclear 2 more times - on the first, the current pending sectors dropped by one, but then this is what I got on the second:
    ============================================================================
    S.M.A.R.T. error count differences detected after pre-clear
    note, some 'raw' values may change, but not be an indication of a problem
    54c54
    1 Raw_Read_Error_Rate      0x002f   200   200   051   Pre-fail   Always   -   96
    ---
    1 Raw_Read_Error_Rate      0x002f   200   200   051   Pre-fail   Always   -   132
    63c63
    193 Load_Cycle_Count       0x0032   199   199   000   Old_age    Always   -   5682
    ---
    193 Load_Cycle_Count       0x0032   199   199   000   Old_age    Always   -   5683
    65c65
    197 Current_Pending_Sector 0x0032   200   197   000   Old_age    Always   -   9
    ---
    197 Current_Pending_Sector 0x0032   200   197   000   Old_age    Always   -   46
    ============================================================================
It seems to be getting worse. Do I just keep running preclear until the drive fails - and then RMA it? Cheers
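For the record, repeated passes can be chained with Joe L.'s preclear script; a minimal sketch, assuming the script sits in the root of the flash drive and sdd is the drive under test (on the copies of the script I have seen, -c sets the number of cycles):
    ./preclear_disk.sh -c 2 /dev/sdd
    # Runs two full pre-read / zero / post-read cycles back to back and prints the
    # SMART differences at the end of each cycle, as in the output above.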
  22. OK, well I will try a couple more rounds with preclear and see what the results are. Thanks again for your analysis.
  23. Thanks for the feedback, Joe. It's a relatively new WD 1.5TB EARS (jumpered). Sounds like I should start an RMA. I certainly wouldn't feel comfortable putting it back in the array.
  24. Greetings, Last week I started to get errors on one drive in the array. There were no errors in the parity check, but the drive itself was showing errors - 67 and then 98. I replaced the drive with a new one and the data was rebuilt successfully. I put the questionable drive in an test unRaid server and did a preclear. These are the results: ============================================================================ 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 42 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 78 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0 193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 5673 193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 5674 197 Current_Pending_Sector 0x0032 200 197 000 Old_age Always - 10 197 Current_Pending_Sector 0x0032 200 197 000 Old_age Always - 1 ============================================================================ Do I have any reason to be concerned? Cheers