Squnkraid


  1. Parity check finished with normal results: ~14 hours at ~100 MB/s, just like before. Server is back online and Docker started. Radarr scanned all files and indeed, "Movie name" was no longer there. I changed the file extension of the large file in "lost & found" to .mkv and played it. It plays just fine... so all the moved files seem to be okay and not corrupted? That's something I don't understand; corruption means loss of data and thus being unable to open/play a file, right? Figured it'd be best to just re-download it and delete the files in the "lost & found" share.

In the end it probably all came down to a lack of power: too many drives linked together and too long a cable, causing a drop in power. That would explain the sound I heard (and have heard on a couple of occasions), the bad speeds and ultimately the read errors. So far all seems well again.
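(Follow-up note, in case it helps someone: once the re-download is in place, the leftover share can be removed from the console. A minimal sketch; the path assumes the lost+found landed on Disk 3, so verify it before deleting anything:)

      # double-check what is in there first
      ls -lh /mnt/disk3/lost+found
      # then remove the recovered leftovers
      rm -r /mnt/disk3/lost+found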
  2. I did not actually! Thank you for mentioning this! I didn't read the wiki entirely, my bad. Just opened the share and I see this:

139  11.7 GB  2014-09-19 13:37  Disk 3  = not yet downloaded (parity running), but guessing it's the .mkv = "movie name"
134  500 KB   2018-09-01 22:38  Disk 3  = .png = movie poster
136  500 KB   2018-09-01 22:38  Disk 3  = .png = movie poster (duplicate of 134)
137  500 KB   2018-09-01 22:38  Disk 3  = .png = movie poster (duplicate of 134)
133  373 KB   2018-09-01 22:38  Disk 3  = .png = background poster
138  163 KB   2017-09-05 07:20  Disk 3  = .srt = subtitle "movie name"
141  126 KB   2017-09-05 07:20  Disk 3  = .srt = subtitle "movie name"
140  8.08 KB  2021-06-08 03:20  Disk 3  = .nfo = "movie name" info & cast <movie>
142  4.42 KB  2017-09-05 06:57  Disk 3  = .nfo = release group info etc.
135  888 B    2018-09-15 09:38  Disk 3  = .nfo = "movie name" info <movie>
143  609 B    2018-09-15 09:38  Disk 3  = .nfo = "movie name" info <video>

Repair only mentioned one folder, called "movie name". That folder is not showing up when I browse Disk 3, so my guess is that all content in that folder was somehow corrupted: the movie file, subtitles and thumbnails. Looking at the file sizes, that seems logical to me.

EDIT: After downloading all the files to my PC, except the big one, I can confirm that my hunch was correct. After adding a file extension to each of them I was able to see what they were. All the files seem to be undamaged? So what should I do with them? Just re-download, delete the "lost & found" folder and be done with it? Also, for my understanding, can you say what exactly happened here, or give a best guess at what caused the file corruption in this situation? Because the rebuild was successful, without errors, and the folder "movie name" is a really old folder that hasn't been written to in ages. Bit scary to see the limits of parity after the array needs rebuilding. Straight-up wake-up call to check all my backups!
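(Tip for anyone else sorting out a lost+found: instead of renaming each file to guess its type, the "file" command identifies them by content. A minimal sketch, assuming the files are still on Disk 3; adjust the path to where yours ended up:)

      # identify the recovered files by their magic bytes, no renaming needed
      file /mnt/disk3/lost+found/*
      # prints something like "133: PNG image data, ..." / "139: Matroska data"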
  3. Think you were correct, although I still can't explain it all. But it seems to have been a power issue after all. I have a Be Quiet! Pure Power 400 CM. Across two drive power cables I had connected 13x 2.5" 5TB drives and 2x SSDs. After I expanded a few months ago, it seems I had (unknowingly) connected a total of 11 drives to one drive cable, all of them Seagate 5TB drives. That should be more than fine in terms of total power draw, but possibly the length of the cable was too much? I had 2 cables, each with 5 SATA power connectors, linked together. It never caused any issue, so I'm not sure why it did now, but rearranging the power (and the SATA cables, now all connected to the HBA again) seems to have done the job. No errors on boot, no errors in syslog, and the parity check started at full speed _and_ isn't filling the syslog with the errors I had before. I'm going to get a third drive power cable for the PSU, so the drives are spread out even more evenly. Hopefully that will be the end of it. Still strange that the parity check worked before. I'll keep you posted.
  4. It looked like everything was normal again. Nothing abnormal in syslog. So, went ahead and changed out the SAS SATA cable (only one drive, Disk 2, was still connected to the old SAS SATA cable). Aaaaaand next problem haha. As soon as I started a non-correcting parity check I could hear a drive making abnormal sounds. Speeds of the check were terrible. As soon as I looked in syslog:

Jun 21 13:33:53 Server-UR kernel: mdcmd (84): check nocorrect
Jun 21 13:33:53 Server-UR kernel: md: recovery thread: check P Q ...
Jun 21 13:34:35 Server-UR kernel: mpt2sas_cm1: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Jun 21 13:34:35 Server-UR kernel: mpt2sas_cm1: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Jun 21 13:34:35 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
Jun 21 13:34:36 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
Jun 21 13:34:36 Server-UR rc.diskinfo[11766]: SIGHUP received, forcing refresh of disks info.
Jun 21 13:34:40 Server-UR kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Jun 21 13:34:40 Server-UR kernel: sd 2:0:2:0: Power-on or device reset occurred
Jun 21 13:34:41 Server-UR kernel: sd 2:0:2:0: Power-on or device reset occurred
Jun 21 13:34:41 Server-UR rc.diskinfo[11766]: SIGHUP received, forcing refresh of disks info.
Jun 21 13:34:45 Server-UR kernel: mpt2sas_cm1: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Jun 21 13:34:45 Server-UR kernel: mpt2sas_cm1: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Jun 21 13:34:45 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
Jun 21 13:34:45 Server-UR rc.diskinfo[11766]: SIGHUP received, forcing refresh of disks info.
Jun 21 13:34:45 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
Jun 21 13:34:49 Server-UR kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Jun 21 13:34:49 Server-UR kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Jun 21 13:34:50 Server-UR kernel: sd 2:0:2:0: Power-on or device reset occurred
Jun 21 13:34:50 Server-UR rc.diskinfo[11766]: SIGHUP received, forcing refresh of disks info.
Jun 21 13:34:50 Server-UR kernel: sd 2:0:2:0: Power-on or device reset occurred
Jun 21 13:34:51 Server-UR kernel: ata2.00: exception Emask 0x50 SAct 0x300000 SErr 0x4890800 action 0xe frozen
Jun 21 13:34:51 Server-UR kernel: ata2.00: irq_stat 0x0c400040, interface fatal error, connection status changed
Jun 21 13:34:51 Server-UR kernel: ata2: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }
Jun 21 13:34:51 Server-UR kernel: ata2.00: failed command: READ FPDMA QUEUED
Jun 21 13:34:51 Server-UR kernel: ata2.00: cmd 60/00:a0:40:f8:a2/04:00:00:00:00/40 tag 20 ncq dma 524288 in
Jun 21 13:34:51 Server-UR kernel: res 40/00:a0:40:f8:a2/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Jun 21 13:34:51 Server-UR kernel: ata2.00: status: { DRDY }
Jun 21 13:34:51 Server-UR kernel: ata2.00: failed command: READ FPDMA QUEUED
Jun 21 13:34:51 Server-UR kernel: ata2.00: cmd 60/08:a8:40:fc:a2/00:00:00:00:00/40 tag 21 ncq dma 4096 in
Jun 21 13:34:51 Server-UR kernel: res 40/00:a0:40:f8:a2/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Jun 21 13:34:51 Server-UR kernel: ata2.00: status: { DRDY }
Jun 21 13:34:51 Server-UR kernel: ata2: hard resetting link
Jun 21 13:34:54 Server-UR kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jun 21 13:34:54 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT1._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Jun 21 13:34:54 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._GTF, AE_NOT_FOUND (20180810/psparse-514)
Jun 21 13:34:54 Server-UR kernel: ata2.00: supports DRM functions and may not be fully accessible
Jun 21 13:34:54 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT1._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Jun 21 13:34:54 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._GTF, AE_NOT_FOUND (20180810/psparse-514)
Jun 21 13:34:54 Server-UR kernel: ata2.00: supports DRM functions and may not be fully accessible
Jun 21 13:34:54 Server-UR kernel: ata2.00: configured for UDMA/133
Jun 21 13:34:54 Server-UR kernel: ata2: EH complete

Reconnected the SAS SATA cable again, at both ends, but same results when I start a parity check. Just pulled the SAS SATA cable altogether and connected Disk 2 with a SATA cable to the MB.

Jun 21 15:28:14 Server-UR kernel: mdcmd (83): check
Jun 21 15:28:14 Server-UR kernel: md: recovery thread: check P Q ...
Jun 21 15:28:15 Server-UR kernel: ata3.00: exception Emask 0x50 SAct 0x40 SErr 0x4890800 action 0xe frozen
Jun 21 15:28:15 Server-UR kernel: ata3.00: irq_stat 0x0c400040, interface fatal error, connection status changed
Jun 21 15:28:15 Server-UR kernel: ata3: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }
Jun 21 15:28:15 Server-UR kernel: ata3.00: failed command: READ FPDMA QUEUED
Jun 21 15:28:15 Server-UR kernel: ata3.00: cmd 60/f8:30:48:0c:01/03:00:00:00:00/40 tag 6 ncq dma 520192 in
Jun 21 15:28:15 Server-UR kernel: res 40/00:30:48:0c:01/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Jun 21 15:28:15 Server-UR kernel: ata3.00: status: { DRDY }
Jun 21 15:28:15 Server-UR kernel: ata3: hard resetting link
Jun 21 15:28:18 Server-UR kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jun 21 15:28:18 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Jun 21 15:28:18 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT2._GTF, AE_NOT_FOUND (20180810/psparse-514)
Jun 21 15:28:18 Server-UR kernel: ata3.00: supports DRM functions and may not be fully accessible
Jun 21 15:28:18 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Jun 21 15:28:18 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT2._GTF, AE_NOT_FOUND (20180810/psparse-514)
Jun 21 15:28:18 Server-UR kernel: ata3.00: supports DRM functions and may not be fully accessible
Jun 21 15:28:18 Server-UR kernel: ata3.00: configured for UDMA/133
Jun 21 15:28:18 Server-UR kernel: ata3: EH complete
Jun 21 15:28:22 Server-UR kernel: mpt2sas_cm1: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Jun 21 15:28:22 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
Jun 21 15:28:23 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
Jun 21 15:28:23 Server-UR rc.diskinfo[11753]: SIGHUP received, forcing refresh of disks info.
Jun 21 15:28:23 Server-UR kernel: ata3.00: exception Emask 0x50 SAct 0x2 SErr 0x4890800 action 0xe frozen
Jun 21 15:28:23 Server-UR kernel: ata3.00: irq_stat 0x0c400040, interface fatal error, connection status changed
Jun 21 15:28:23 Server-UR kernel: ata3: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }
Jun 21 15:28:23 Server-UR kernel: ata3.00: failed command: READ FPDMA QUEUED
Jun 21 15:28:23 Server-UR kernel: ata3.00: cmd 60/00:08:c8:6e:03/04:00:00:00:00/40 tag 1 ncq dma 524288 in
Jun 21 15:28:23 Server-UR kernel: res 40/00:08:c8:6e:03/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Jun 21 15:28:23 Server-UR kernel: ata3.00: status: { DRDY }
Jun 21 15:28:23 Server-UR kernel: ata3: hard resetting link
Jun 21 15:28:27 Server-UR kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jun 21 15:28:27 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Jun 21 15:28:27 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT2._GTF, AE_NOT_FOUND (20180810/psparse-514)
Jun 21 15:28:27 Server-UR kernel: ata3.00: supports DRM functions and may not be fully accessible
Jun 21 15:28:27 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Jun 21 15:28:27 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT2._GTF, AE_NOT_FOUND (20180810/psparse-514)
Jun 21 15:28:27 Server-UR kernel: ata3.00: supports DRM functions and may not be fully accessible
Jun 21 15:28:27 Server-UR kernel: ata3.00: configured for UDMA/133
Jun 21 15:28:27 Server-UR kernel: ata3: EH complete

And this just repeats. Speeds are terrible. First it was ata2.00 and now ata3.00. The array starts without problems, and Unraid doesn't report any issues. Disks 2, 3 and 4 all complete a SMART short self-test without errors. Only when I start a parity check do problems emerge... and I can't figure out what I should change out next. Any ideas?
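(For anyone debugging something similar: the resets can be watched live from a second terminal while swapping hardware, which makes it quicker to tell whether a change helped. A quick sketch using standard tools on the Unraid console:)

      # follow the syslog and show only link resets / HBA noise while a check runs
      tail -f /var/log/syslog | grep -E 'ata[0-9]+|mpt2sas|reset'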
  5. Ugh, yeah, too tired it was haha. Was expecting "-nv" to give me some sort of answer of what to do next. Ran Check:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 132
Bad flags2 set in inode 132
bad CRC for inode 132, will rewrite
Bad flags2 set in inode 132
fixing bad flags2.
directory inode 132 has bad size 8079572436068976644
cleared inode 132
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 3
        - agno = 2
        - agno = 1
        - agno = 0
entry "Movie name" at block 0 offset 128 in directory inode 2147483776 references free inode 132
clearing inode number in entry at offset 128...
        - agno = 4
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
bad hash table for directory inode 2147483776 (no data entry): rebuilding
rebuilding directory inode 2147483776
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 133, moving to lost+found
disconnected inode 134, moving to lost+found
disconnected inode 135, moving to lost+found
disconnected inode 136, moving to lost+found
disconnected inode 137, moving to lost+found
disconnected inode 138, moving to lost+found
disconnected inode 139, moving to lost+found
disconnected inode 140, moving to lost+found
disconnected inode 141, moving to lost+found
disconnected inode 142, moving to lost+found
disconnected inode 143, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 2147483776 nlinks from 284 to 283
done

Nothing left to do? Ran -nv again to see what that would give:

Phase 1 - find and verify superblock...
        - block cache size set to 1499720 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 221146 tail block 221146
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 4
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

XFS_REPAIR Summary    Mon Jun 21 12:00:41 2021

Phase           Start           End             Duration
Phase 1:        06/21 12:00:35  06/21 12:00:35
Phase 2:        06/21 12:00:35  06/21 12:00:35
Phase 3:        06/21 12:00:35  06/21 12:00:38  3 seconds
Phase 4:        06/21 12:00:38  06/21 12:00:38
Phase 5:        Skipped
Phase 6:        06/21 12:00:38  06/21 12:00:41  3 seconds
Phase 7:        06/21 12:00:41  06/21 12:00:41

Total run time: 6 seconds

So everything should be fine now I guess, and I can boot up the array again?
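(For completeness, what I'd look at after a repair like this, from the console. The paths assume Disk 3; a sketch, not gospel:)

      # after starting the array, confirm the disk mounted and inspect lost+found
      ls -lh /mnt/disk3/lost+found/
      # and watch for any fresh XFS complaints after mounting
      grep -i 'xfs' /var/log/syslog | tail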
  6. After 14 hours, a successful rebuild. Next problem haha. Rebooted the server and checked the log to be sure; glad I did. Came across this repeating error:

kernel: XFS (dm-2): Metadata corruption detected at xfs_dinode_verify+0xa5/0x52e [xfs], inode 0x84 dinode
kernel: XFS (dm-2): Unmount and run xfs_repair
kernel: XFS (dm-2): First 128 bytes of corrupted metadata buffer:
kernel: 000000002169b579: 49 4e 41 ff 03 02 00 00 00 00 00 63 00 00 00 64  INA........c...d
kernel: 0000000019da1c05: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00  ................
kernel: 0000000072ff9089: 5e c5 9d 52 1f ea ff 25 5e c5 9e b0 07 59 9d 1c  ^..R...%^....Y..
kernel: 00000000be2b61f7: 85 68 fc fa 01 2f 5d 03 70 20 68 52 bc ed c8 04  .h.../].p hR....
kernel: 0000000045f635c4: 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01  ................
kernel: 0000000076bc70c3: 00 00 24 01 00 00 00 00 00 00 00 00 8f ea 0e ac  ..$.............
kernel: 000000003302a279: 69 f3 65 cb 49 98 8f 0f b1 c5 1a 58 1b a5 87 06  i.e.I......X....
kernel: 0000000024cd4b84: 2b 4d 62 2a 25 47 71 7f 30 ad a3 88 d0 ad e3 60  +Mb*%Gq.0......`

dm-2 is Disk 3 according to the log. Checking the wiki points to an XFS repair: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui So went ahead with Check -nv as suggested. The outcome is this:

Phase 1 - find and verify superblock...
        - block cache size set to 1499720 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 221146 tail block 221146
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 132
Bad flags2 set in inode 132
bad CRC for inode 132, would rewrite
Bad flags2 set in inode 132
would fix bad flags2.
directory inode 132 has bad size 8079572436068976644
would have cleared inode 132
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
bad CRC for inode 132, would rewrite
Bad flags2 set in inode 132
would fix bad flags2.
Would clear next_unlinked in inode 132
directory inode 132 has bad size 8079572436068976644
entry "Movie name" at block 0 offset 128 in directory inode 2147483776 references free inode 132
would clear inode number in entry at offset 128...
would have cleared inode 132
        - agno = 3
        - agno = 4
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
entry "Movie name" in directory inode 2147483776 points to free inode 132, would junk entry
bad hash table for directory inode 2147483776 (no data entry): would rebuild
        - agno = 2
        - agno = 3
        - agno = 4
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 133, would move to lost+found
disconnected inode 134, would move to lost+found
disconnected inode 135, would move to lost+found
disconnected inode 136, would move to lost+found
disconnected inode 137, would move to lost+found
disconnected inode 138, would move to lost+found
disconnected inode 139, would move to lost+found
disconnected inode 140, would move to lost+found
disconnected inode 141, would move to lost+found
disconnected inode 142, would move to lost+found
disconnected inode 143, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 2147483776 nlinks from 284 to 283
No modify flag set, skipping filesystem flush and exiting.

XFS_REPAIR Summary    Mon Jun 21 03:02:29 2021

Phase           Start           End             Duration
Phase 1:        06/21 03:02:23  06/21 03:02:23
Phase 2:        06/21 03:02:23  06/21 03:02:24  1 second
Phase 3:        06/21 03:02:24  06/21 03:02:27  3 seconds
Phase 4:        06/21 03:02:27  06/21 03:02:27
Phase 5:        Skipped
Phase 6:        06/21 03:02:27  06/21 03:02:29  2 seconds
Phase 7:        06/21 03:02:29  06/21 03:02:29

Total run time: 6 seconds

Now I'm not sure what to do. The wiki explains what to look for, but I'm not sure how to interpret this outcome; I'm not seeing a clear "do this" message, or maybe I'm just blind/tired. I tried googling this issue and found an answer saying "run Check -L", but I don't really understand what that does, so I thought I'd clear it with you guys first. Run "Check -L" or something else?
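(Context for the flags, in case it helps: this is roughly how the webGui Check maps to the command line. A sketch, assuming Disk 3 is /dev/md3 and the array is started in maintenance mode; check your device names first:)

      xfs_repair -n /dev/md3    # "no modify": only report what would be fixed
      xfs_repair /dev/md3       # actually repair the filesystem
      xfs_repair -L /dev/md3    # additionally zero a dirty metadata log; a last
                                # resort, only when xfs_repair refuses to run
                                # because the log is dirty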
  7. Used a SATA cable on Disk 4. Changing out one thing at a time, just to be sure. Sync started, so I'll report back tomorrow if all goes well.
  8. Well, started a non-correcting parity check, with immediate results: Disk 4 emulated, 1024 errors. Great... haha. Disk 4 is on the connector that used to be on Disk 3, so the cable is probably the one at fault? I have a spare SAS SATA cable. Should I just replace the SAS SATA cable altogether, or first disconnect Disk 4 and connect it to the motherboard with a normal SATA cable? My 2 cents:
- First use a (new) SATA cable to the motherboard
- Rebuild Disk 4
- Parity check (non-correcting)
- Replace the SAS SATA cables, but leave out Disk 4
- Parity check (non-correcting)
- If it passes, connect Disk 3 too
- Parity check (non-correcting)
- If it passes: cable was bad, HBA is good, all is well again
  9. Thank you. Always forget there is an entire wiki... sorry about that. Well, success after ~14 hours! But with a small hiccup. I had swapped the SAS SATA cable connectors between Disk 3 & 4. After that I started the rebuild procedure. I noticed that the speeds were pretty low to begin with, around ~80 MB/s. Next day I woke up to an error: it aborted after 2.5 hours at 15%. Disk 3 again had 1024 errors, but SMART was still fine. I re-ordered the power cables (just for the hell of it), unplugged the SAS SATA cable from Disk 3 and connected that disk with a normal (new) SATA cable directly to the motherboard. Booted the server and started the rebuild procedure again. The difference was huge: the starting speed was ~135 MB/s, steadily declining during the rebuild, and after ~14 hours it completed with 0 errors at an average speed of 101 MB/s.

Conclusions:
- Because I switched the SAS SATA cable connectors between Disk 3 & 4, I guess I can rule out a faulty cable.
- I don't see power being an issue, because it's all well within spec (and has worked for a long time without problems).
- So that leaves the HBA card? Any idea how to find out if it's bad? It's like it couldn't handle the traffic to Disk 3. Maybe something to do with SMR/cache?

Final question: can I resume normal operations again, or should I first run another parity check (with/without write corrections)?
  10. Changed the cables and booted the server again. Trying to stop the array gave me a "retry unmounting disk share(s)" loop. Syslog showed a never-ending list of the same messages:

Jun 18 21:45:11 Server-UR emhttpd: shcmd (331): exit status: 32
Jun 18 21:45:11 Server-UR emhttpd: Retry unmounting disk share(s)...
Jun 18 21:45:16 Server-UR emhttpd: Unmounting disks...
Jun 18 21:45:16 Server-UR emhttpd: shcmd (332): umount /mnt/cache
Jun 18 21:45:16 Server-UR root: umount: /mnt/cache: target is busy.
Jun 18 21:45:16 Server-UR emhttpd: shcmd (332): exit status: 32

Some googling later, I finally disabled Docker and rebooted. Now I can stop/start the array again. But how do I enable Disk 3 again? It's still red and I don't see an option to rebuild. Do I just remove Disk 3 from the array, start/stop the array again and add Disk 3 back? Just making sure I'm doing this correctly.
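(For anyone hitting the same "target is busy" loop: before disabling Docker outright, the process holding the mount can usually be identified from the console. A quick sketch with standard Linux tools, assuming they're present on your build:)

      # list processes with files open on the cache mount
      fuser -vm /mnt/cache
      # or list the open files themselves
      lsof /mnt/cache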
  11. OK, thank you so much. I know what to do now, and I'll be sure to write down any changes I make. I'll update this thread when I know more. And I totally forgot: about 1.5 months ago Disk 1 started getting UDMA CRC error count errors. Re-seating the cable didn't help, so I went ahead and used a SATA cable directly to the motherboard. That stopped all the errors, but apparently I didn't look hard enough and this cable is bad as well... just my luck haha. Ah okay, good to know. And yeah, performance isn't great, especially when running mover. But overall they are great little drives, perfect for long-term media storage + Unraid.
  12. Really appreciate the quick reply! So the next steps are: shutdown, inspect & replace/reseat cable*, boot, rebuild.

* Thanks for noticing Disk 1. I will double-check the power cables and see which disks are connected to which SAS cable. Perhaps Disk 1 & 3 are running on the same cable. I have a spare SAS cable, so if that's the case I'll be sure to replace it. If not, what would you suggest is best: just connect them with a SATA cable to the motherboard, or switch connectors with other drives and keep using the current SAS cable?

BTW, is there a method to look up the firmware version in Unraid? I started using the HBA cards a couple of months ago and flashed them myself, using the links in https://forums.serverbuilds.net/t/guide-updating-your-lsi-sas-controller-with-a-uefi-motherboard/131 As far as I can tell they link to 20.00.07.00. I'll double-check on reboot, but I'm pretty sure they are running the latest firmware version. I remember that being one of my concerns when flashing.
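(Partly answering my own question: the mpt2sas driver prints the HBA firmware version during boot, so it should already be in the log. A sketch; the version in the comment is just what I expect to see, not confirmed output:)

      # the driver logs the firmware version at boot
      dmesg | grep -i fwversion
      # expected output, one line per M1015, something like:
      # mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), ...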
  13. Hi, seeking advice on what to do next. I ran mover today; the cache disk was filled ~92%. It looks like this has taken its toll on Disk 3. Apparently mover finished, but after that I got an error saying "Disk 3 is disabled, contents are emulated". I have attached the syslog and diagnostics for you to look at.

Running:
- 13x 2.5" Seagate 5TB array (dual parity)
- 1x 2.5" Samsung 500GB cache
- 2x HBA M1015 cards

What do you recommend I do next? From what I've read online I have only one option: reboot & rebuild (by adding Disk 3 again). I ran a SMART short self-test on Disk 3 that passed without any errors, so I don't see why I should not add Disk 3 again; I feel it's not the drive itself. My best guesses are:
1. Cable is bad
2. HBA card is bad
3. Too much I/O at once due to mover (and these being 2.5" SMR drives?), which resulted in a temporary glitch. (Gut feeling)

Your help and advice is much appreciated.
server-ur-syslog-20210618-1650.zip
server-ur-diagnostics-20210618-1903.zip
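(For reference, the short self-test can also be run from the console instead of the webGui. A sketch; /dev/sdX is a placeholder, check the real device assignment on the Main page first:)

      # kick off a short SMART self-test, then read the results when it finishes
      smartctl -t short /dev/sdX
      smartctl -a /dev/sdX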
  14. I fixed it! To clarify, and hopefully help someone else, what I ended up doing after updating DelugeVPN to :latest is this:
- Edit the DelugeVPN container and add all container ports that are running through DelugeVPN under "VPN_INPUT_PORTS".
- To let those containers communicate with each other again, change the local IPs in the respective settings of those containers (WebUI) to "localhost".

That's pretty straightforward. The WebUI is accessible and the containers can communicate with each other again. The problem was, for me at least, that SABnzbd is NOT running through DelugeVPN, so Sonarr etc. were unable to connect to SAB. I have SABnzbd's WebUI set to port 8282 (Unifi already claimed 8080). Before the DelugeVPN update, all other containers were connecting to SAB using "local LAN IP:8282". That was no longer working. Q&A 27 in https://github.com/binhex/documentation/blob/master/docker/faq/vpn.md

So I edited DelugeVPN again and added port 8282 under "VPN_OUTPUT_PORTS". In Sonarr I tested it again, but I still couldn't connect to SAB. When using the local LAN IP it took some processing time but eventually returned an error. Also tried localhost, but no luck. SABnzbd also uses port 9090, so I thought maybe that would work; I added 9090 to "VPN_OUTPUT_PORTS", but still no luck inside Sonarr. Nothing worked. But then I noticed that SAB uses port 8080 (as opposed to 8282) when using the container IP. So I added 8080 to "VPN_OUTPUT_PORTS" in DelugeVPN and used "container IP:8080" inside Sonarr, and BINGO! Container IP AND port 8080 gave a successful connection. Already downloaded an NZB and it works like a charm again.

@binhex Maybe I'm misreading Q&A 27, but as far as I can tell the container IP is not mentioned. Perhaps you could add that? Because it looks like that is the only way to get this to work. And thanks for the hard work!
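(To summarize, these are the two variables as they ended up on my DelugeVPN container. The port numbers come from my setup and the apps' defaults, so adjust them to your own:)

      # extra environment variables on the binhex-delugevpn container
      -e VPN_INPUT_PORTS=8989,7878,9117,5076   # web UIs of containers routed
                                               # through the VPN (Sonarr, Radarr,
                                               # Jackett, NZBHydra)
      -e VPN_OUTPUT_PORTS=8080                 # SABnzbd, reached from inside the
                                               # VPN network via container IP:8080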
  15. The following containers are routed through DelugeVPN (set this all up following Spaceinvader One's videos):
- Sonarr
- Radarr
- Jackett
- NZBHydra

Their ports are listed under VPN_INPUT_PORTS. I changed the server IPs to 'localhost' in their settings, and they can all communicate with each other and are accessible. No problem there. SABnzbd is outside the VPN connection. It uses port 8282, and I have that port listed in DelugeVPN under VPN_OUTPUT_PORTS. But when I try to connect to it from within Sonarr etc., it doesn't work. I have the local server IP filled in, but I also tried 'localhost' and the 'container IP'; it all comes back with an error. So I'm not sure what I'm doing wrong here. I checked the Q&A section and everything is set up like it should be, I think. I'm probably overlooking something, but I don't know what.