fitbrit

Posts posted by fitbrit

  1. Hi all

     

    I recently upgraded the guts of my server with a new mobo, CPU and RAM. I went from a socket 775 to a 4th gen i7 for my NORCO 4224. I have 8 drives connected via SATA 3 on the motherboard: parity 1, parity 2 and disks 1-6. The other 16 drives are connected via Marvell controllers, which I know are no longer recommended. They are not giving me any issues right now.

    What I am having trouble with is parity 2 and disks 2, 3 and 5, all of which are connected to the motherboard SATA ports. A spate of UDMA CRC errors has occurred recently, about 100 over the past 6 days. They started soon after the hardware change.
    What I would like to know is how serious they are. I have invested in new reverse breakout cables, and these will be installed when they arrive. If the new cables do not fix the issue, I am wondering whether the motherboard SATA ports could be bad. Since the errors began, I have installed a new parity disk, added a second parity disk, and am currently increasing the size of one of the disks. Should I be concerned that my array now has erroneous data on it, or are these errors self-correcting?

    Thanks for any light that can be shed.

     

    Model: Custom
    M/B: MSI - Z97 GAMING 7 (MS-7916)
    CPU: Intel® Core™ i7-4770K CPU @ 3.50GHz
    HVM: Enabled
    IOMMU: Disabled
    Cache: 256 kB, 1024 kB, 8192 kB
    Memory: 32 GB (max. installable capacity 32 GB)
    Network: eth0: 1000 Mb/s, full duplex, mtu 1500
    Kernel: Linux 4.18.8-unRAID x86_64
    OpenSSL: 1.1.0i
    Uptime: 
    
    Sep 26 12:09:00 MEDIASERVER root: Fix Common Problems Version 2018.09.08
    Sep 26 12:09:05 MEDIASERVER root: Fix Common Problems: Warning: Marvel Hard Drive Controller Installed ** Ignored
    Sep 26 12:09:08 MEDIASERVER ntpd[1864]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
    Sep 26 12:34:31 MEDIASERVER kernel: ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
    Sep 26 12:34:31 MEDIASERVER kernel: ata6.00: irq_stat 0x08000000, interface fatal error
    Sep 26 12:34:31 MEDIASERVER kernel: ata6: SError: { UnrecovData 10B8B BadCRC }
    Sep 26 12:34:31 MEDIASERVER kernel: ata6.00: failed command: READ DMA EXT
    Sep 26 12:34:31 MEDIASERVER kernel: ata6.00: cmd 25/00:00:a8:5c:0a/00:02:12:00:00/e0 tag 29 dma 262144 in
    Sep 26 12:34:31 MEDIASERVER kernel: res 50/00:00:a7:5c:0a/00:00:12:00:00/40 Emask 0x10 (ATA bus error)
    Sep 26 12:34:31 MEDIASERVER kernel: ata6.00: status: { DRDY }
    Sep 26 12:34:31 MEDIASERVER kernel: ata6: hard resetting link
    Sep 26 12:34:31 MEDIASERVER kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    Sep 26 12:34:31 MEDIASERVER kernel: ata6.00: configured for UDMA/133
    Sep 26 12:34:31 MEDIASERVER kernel: ata6: EH complete
    Sep 26 13:15:40 MEDIASERVER kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
    Sep 26 13:15:40 MEDIASERVER kernel: ata2.00: irq_stat 0x08000000, interface fatal error
    Sep 26 13:15:40 MEDIASERVER kernel: ata2: SError: { UnrecovData 10B8B BadCRC }
    Sep 26 13:15:40 MEDIASERVER kernel: ata2.00: failed command: READ DMA EXT
    Sep 26 13:15:40 MEDIASERVER kernel: ata2.00: cmd 25/00:08:90:d3:d8/00:02:27:00:00/e0 tag 3 dma 266240 in
    Sep 26 13:15:40 MEDIASERVER kernel: res 50/00:00:8f:d3:d8/00:00:27:00:00/e0 Emask 0x10 (ATA bus error)
    Sep 26 13:15:40 MEDIASERVER kernel: ata2.00: status: { DRDY }
    Sep 26 13:15:40 MEDIASERVER kernel: ata2: hard resetting link
    Sep 26 13:15:40 MEDIASERVER kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    Sep 26 13:15:40 MEDIASERVER kernel: ata2.00: configured for UDMA/133
    Sep 26 13:15:40 MEDIASERVER kernel: ata2: EH complete
    Sep 26 13:20:30 MEDIASERVER kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
    Sep 26 13:20:30 MEDIASERVER kernel: ata2.00: irq_stat 0x08000000, interface fatal error
    Sep 26 13:20:30 MEDIASERVER kernel: ata2: SError: { UnrecovData 10B8B BadCRC }
    Sep 26 13:20:30 MEDIASERVER kernel: ata2.00: failed command: READ DMA EXT
    Sep 26 13:20:30 MEDIASERVER kernel: ata2.00: cmd 25/00:08:28:59:66/00:02:2a:00:00/e0 tag 2 dma 266240 in
    Sep 26 13:20:30 MEDIASERVER kernel: res 50/00:00:27:59:66/00:00:2a:00:00/e0 Emask 0x10 (ATA bus error)
    Sep 26 13:20:30 MEDIASERVER kernel: ata2.00: status: { DRDY }
    Sep 26 13:20:30 MEDIASERVER kernel: ata2: hard resetting link
    Sep 26 13:20:31 MEDIASERVER kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    Sep 26 13:20:31 MEDIASERVER kernel: ata2.00: configured for UDMA/133
    Sep 26 13:20:31 MEDIASERVER kernel: ata2: EH complete
    Sep 26 13:20:48 MEDIASERVER nginx: 2018/09/26 13:20:48 [error] 7533#7533: *26027 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.2.16, server: , request: "POST /webGui/include/DeviceList.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.2.10", referrer: "http://192.168.2.10/Main"
    Sep 26 13:20:48 MEDIASERVER php-fpm[7479]: [WARNING] [pool www] child 2303 exited on signal 7 (SIGBUS) after 84.000580 seconds from start
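To see which ports are throwing the CRC errors and how often, the syslog can be tallied per ATA port. Below is a minimal Python sketch, not an unRAID tool: the names `BAD_CRC`, `count_bad_crc` and `SAMPLE` are made up for illustration, and the sample lines are copied from the log excerpt above. It only counts `BadCRC` SError lines. Mapping an `ataN` port back to a specific array disk is not something this excerpt shows, so that step is left out.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: ata6: SError: { UnrecovData 10B8B BadCRC }"
BAD_CRC = re.compile(r"kernel: (ata\d+): SError: \{[^}]*\bBadCRC\b[^}]*\}")


def count_bad_crc(lines):
    """Return a Counter mapping ATA port name (e.g. 'ata2') to BadCRC events."""
    counts = Counter()
    for line in lines:
        match = BAD_CRC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines taken from the log excerpt in this post.
SAMPLE = [
    "Sep 26 12:34:31 MEDIASERVER kernel: ata6: SError: { UnrecovData 10B8B BadCRC }",
    "Sep 26 13:15:40 MEDIASERVER kernel: ata2: SError: { UnrecovData 10B8B BadCRC }",
    "Sep 26 13:20:30 MEDIASERVER kernel: ata2: SError: { UnrecovData 10B8B BadCRC }",
    "Sep 26 13:20:31 MEDIASERVER kernel: ata2: EH complete",
]

if __name__ == "__main__":
    for port, n in sorted(count_bad_crc(SAMPLE).items()):
        print(port, n)
```

To run it against a full log, pass an open file object instead of `SAMPLE`, since the function accepts any iterable of lines.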

     

  2. But it doesn't do that. Right now, I can see that it's reading parity and ALL my other disks to write to the new disk. It is writing all zeros if my parity was correct, but it's making that decision on the fly. If it knew to just write all zeros, regardless of what's on the other disks, it would only be writing to the one drive and NOT reading from all the others. Hence lower power, heat and duration. I know my system writes zeros at about 100 MB/s, but the fastest my data is rebuilt is 50-60 MB/s, between 4 and 6 TB. It starts off at ~35 MB/s until 3 TB, and then goes up to ~42 MB/s until 4 TB.
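To put rough numbers on the duration difference, here is a small back-of-the-envelope sketch in Python. It covers only the 4-6 TB stretch, using the 50-60 MB/s rebuild rate and the ~100 MB/s zeroing rate quoted above. The function name `hours_to_write` is made up for illustration. The sketch assumes decimal units (1 TB = 1,000,000 MB) and that each rate holds steady across the whole stretch, which real drives may not do.

```python
def hours_to_write(terabytes, mb_per_s):
    """Hours needed to write `terabytes` (decimal TB) at `mb_per_s` (decimal MB/s)."""
    total_mb = terabytes * 1_000_000  # assumption: 1 TB = 1,000,000 MB
    return total_mb / mb_per_s / 3600


REGION_TB = 2  # the 4-6 TB stretch mentioned in the post

if __name__ == "__main__":
    for label, rate in [("rebuild, low", 50), ("rebuild, high", 60), ("zeroing", 100)]:
        hours = hours_to_write(REGION_TB, rate)
        print(f"{label:14s} {rate:3d} MB/s -> {hours:.1f} h")
```

Under those assumptions the 2 TB stretch takes roughly 9 to 11 hours at the rebuild rate versus about 5.6 hours at the zeroing rate.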

  3. Let's say parity is 6TB, and you're replacing a 3TB array disk with a 6 TB drive. I am currently in the process of doing this for many 3TB drives.

     

    In theory, during the data rebuild, couldn't the last 3 TB on the 6 TB drive just be written with zeros, after the first 3 TB of data has been rebuilt? That would certainly use less power, cause less stress on the drives and be faster, no?
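The reasoning behind the question can be illustrated with a toy model. The Python sketch below assumes plain single XOR parity and is not unRAID's actual code. A second parity disk uses a different calculation that is not modelled here. Each byte stands in for 1 TB, and the helper names (`xor_blocks`, `padded`) and all the data values are made up. It shows that when the replaced disk was smaller than parity and parity is valid, the region past the old disk's size rebuilds to zeros.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def padded(data, length):
    """Treat a smaller disk as zero-filled up to the parity disk's size."""
    return data + bytes(length - len(data))


STRIPE = 6  # toy "6 TB" parity, one byte per "TB"

old_disk = padded(b"\x11\x22\x33", STRIPE)             # the "3 TB" disk being replaced
other_a = padded(b"\xa1\xa2\xa3\xa4\xa5\xa6", STRIPE)  # a "6 TB" data disk
other_b = padded(b"\x0b\x1b\x2b", STRIPE)              # another "3 TB" data disk

# Parity as it stood before the swap.
parity = xor_blocks([old_disk, other_a, other_b])

# Rebuild onto the new "6 TB" disk from parity plus the surviving disks.
rebuilt = xor_blocks([parity, other_a, other_b])

if __name__ == "__main__":
    print(rebuilt[:3] == b"\x11\x22\x33")  # original data recovered
    print(rebuilt[3:] == bytes(3))         # tail beyond the old size is zeros
```

The zero tail here depends on parity being correct. If a parity byte in that region were wrong, the XOR would produce a non-zero byte there instead.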

  4. Interestingly, I have never had this issue -- and I have 14 VERY full Reiser disks on my media server [2-4TB disks, less than 1GB of free space on each]. I don't copy files as large as 35GB, however ... none of my media files exceed about 8.5GB in size.

     

    If you wanted to make sure you encountered this problem, what you would do is free up 10-20% on a drive, fill it up again, and repeat several times. I never had a problem simply filling a reiserfs drive; the problems appeared once a disk had been partially emptied and refilled. It could be a month or a year until I saw them, and on many disks I never did. I have even seen the problems on disks that were never filled beyond 98% (40GB free); it was just a matter of time and use.

     

    I too migrated to XFS and haven't had problems since.

     

    Yes, that would describe my situation very well.

  5. Interestingly, I have never had this issue -- and I have 14 VERY full Reiser disks on my media server [2-4TB disks, less than 1GB of free space on each]. I don't copy files as large as 35GB, however ... none of my media files exceed about 8.5GB in size.

     

    Yeah, your disks are more full than mine, but I have 23 with parity. Maybe I should try replacing my gigabit switches to help even further; there are three between the Windows HTPC and the unRAID server.