Squnkraid

Everything posted by Squnkraid

  1. Parity check finished with normal results: ~14 hours at ~100 MB/s, just like before. Server is back online and Docker started. Radarr scanned all files and indeed, "Movie name" was no longer there. I changed the file extension on the large file in "lost & found" to .mkv and played it. It plays just fine... So all the moved files seem to be okay and not corrupted? That's something I don't understand. Corruption means loss of data, and thus being unable to open/play a file, right? I figured it'd be best to just re-download it and delete the files in the "lost & found" share. In the end it probably all came down to a lack of power: too many drives linked together and too long a cable, causing a drop in power. That would explain the sound I heard (and have heard on a couple of occasions), the bad speeds and ultimately the read errors. So far all seems well again.
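     As a sanity check on those numbers: a parity check has to read every sector of the largest disk once, so the expected duration is roughly capacity divided by average speed. A quick sketch with the figures from this thread:

```python
# Rough parity-check duration: every sector of the largest disk is read once,
# so time ~= capacity / average speed. Figures are the ones from this thread.
disk_bytes = 5 * 10**12   # 5 TB drives (decimal terabytes, as drive vendors count)
speed_bps = 100 * 10**6   # ~100 MB/s average

hours = disk_bytes / speed_bps / 3600
print(f"{hours:.1f} hours")  # ~13.9 hours, matching the ~14 h observed
```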
  2. I did not actually! Thank you for mentioning this! I didn't read the wiki entirely, my bad. Just opened the share and I see this:

     139  11.7 GB  2014-09-19 13:37  Disk 3 = not yet downloaded due to Parity running, but guessing it's .mkv = "movie name"
     134  500 KB   2018-09-01 22:38  Disk 3 = .png = movie poster
     136  500 KB   2018-09-01 22:38  Disk 3 = .png = movie poster (duplicate of 134)
     137  500 KB   2018-09-01 22:38  Disk 3 = .png = movie poster (duplicate of 134)
     133  373 KB   2018-09-01 22:38  Disk 3 = .png = background poster
     138  163 KB   2017-09-05 07:20  Disk 3 = .srt = subtitle "movie name"
     141  126 KB   2017-09-05 07:20  Disk 3 = .srt = subtitle "movie name"
     140  8.08 KB  2021-06-08 03:20  Disk 3 = .nfo = "movie name" info & cast <movie>
     142  4.42 KB  2017-09-05 06:57  Disk 3 = .nfo = releasegroup info etc.
     135  888 B    2018-09-15 09:38  Disk 3 = .nfo = "movie name" info <movie>
     143  609 B    2018-09-15 09:38  Disk 3 = .nfo = "movie name" info <video>

     Repair only mentioned one folder, called "movie name". That folder is not showing up when I browse Disk 3, so my guess is that all content in that folder was somehow corrupted: the movie file, subtitles and thumbnails. Looking at the file sizes, that seems logical to me.

     EDIT: After downloading all the files to my PC except the big one, I can confirm that my hunch was correct. After adding a file extension to each file I was able to see what they were. All the files seem to be undamaged? So what should I do with them? Just re-download, delete the lost & found folder, and be done with it? Also, for my understanding, can you say what happened here exactly? Or give a best guess of what caused the file corruption in this situation? Because the rebuild was successful, without errors, and the folder "movie name" is a really old folder that hasn't been written to in ages. Bit scary to see the limits of parity after the array needs rebuilding. A straight-up wake-up call to check all my backups!
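     A small aside: rather than renaming each recovered file to guess an extension, the standard `file` utility identifies content from its magic bytes. A sketch (the lost+found path is assumed from the post; the demo runs on a throwaway file):

```shell
# 'file' reads magic bytes, so no extension is needed to identify content.
# Demo on a throwaway file:
printf '#!/bin/sh\necho demo\n' > /tmp/lostfound_demo
file /tmp/lostfound_demo   # identifies it as a shell script despite no extension
rm /tmp/lostfound_demo

# On the array itself (path assumed, adjust to your disk):
# file /mnt/disk3/lost+found/*
```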
  3. I think you were correct, although I still can't explain it all. It seems to have been a power issue after all. I have a Be Quiet! Pure Power 400 CM. Across 2 drive power cables I had connected 13x 2.5" 5TB drives and 2x SSDs. After I expanded a few months ago, it seems I had (unknowingly) connected a total of 11 drives to one drive cable, all of them Seagate 5TB drives. That should be more than fine when it comes to the total amount of power, but possibly the length of the cable was too much? I had 2 cables, each with 5x SATA power connectors, linked together. It never caused any issue, so I'm not sure why it did now, but rearranging power (and SATA cables, now all connected again to the HBA) seems to have done the job. No errors on boot, no errors in syslog, and the Parity Check started at full speed _and_ isn't filling syslog with the errors I had before. I'm going to get a third drive power cable for the PSU, so the drives are spread out even more evenly. Hopefully that will be the end of it. Still strange that the parity check worked before. I'll keep you posted.
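     For anyone curious about the 5 V budget on a single daisy-chained cable, a back-of-envelope sketch (the per-drive wattage below is my assumption for 2.5" drives, not a datasheet value; the point is that total wattage can be within spec while the current through one long cable still causes voltage drop at the far connectors):

```python
# Back-of-envelope 5 V rail budget for 11 x 2.5" drives on one power cable.
# peak_w_per_drive is an assumed worst case (spin-up / heavy seek), not a
# datasheet figure -- check the drive's spec sheet for real numbers.
peak_w_per_drive = 5.0
drives_on_cable = 11

peak_w = peak_w_per_drive * drives_on_cable
peak_a = peak_w / 5.0  # 2.5" drives are powered from the 5 V rail only
print(f"{peak_w:.0f} W peak, {peak_a:.0f} A through the 5 V wires")
```

     On the order of 11 A flowing through two daisy-chained cables is enough for a noticeable resistive drop at the last connectors, which would fit the symptoms (noises, slow speeds, read errors) even though the PSU's rated 5 V capacity is never exceeded.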
  4. It looked like everything was normal again, nothing abnormal in syslog. So I went ahead and changed out the SAS SATA cable (only 1 drive, Disk 2, was still connected to the old SAS SATA cable). Aaaaaand next problem haha. As soon as I started a non-correcting Parity Check I could hear a drive making abnormal sounds. Speeds of the check were terrible. As soon as I looked in syslog:

     Jun 21 13:33:53 Server-UR kernel: mdcmd (84): check nocorrect
     Jun 21 13:33:53 Server-UR kernel: md: recovery thread: check P Q ...
     Jun 21 13:34:35 Server-UR kernel: mpt2sas_cm1: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
     Jun 21 13:34:35 Server-UR kernel: mpt2sas_cm1: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
     Jun 21 13:34:35 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
     Jun 21 13:34:36 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
     Jun 21 13:34:36 Server-UR rc.diskinfo[11766]: SIGHUP received, forcing refresh of disks info.
     Jun 21 13:34:40 Server-UR kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
     Jun 21 13:34:40 Server-UR kernel: sd 2:0:2:0: Power-on or device reset occurred
     Jun 21 13:34:41 Server-UR kernel: sd 2:0:2:0: Power-on or device reset occurred
     Jun 21 13:34:41 Server-UR rc.diskinfo[11766]: SIGHUP received, forcing refresh of disks info.
     Jun 21 13:34:45 Server-UR kernel: mpt2sas_cm1: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
     Jun 21 13:34:45 Server-UR kernel: mpt2sas_cm1: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
     Jun 21 13:34:45 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
     Jun 21 13:34:45 Server-UR rc.diskinfo[11766]: SIGHUP received, forcing refresh of disks info.
     Jun 21 13:34:45 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
     Jun 21 13:34:49 Server-UR kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
     Jun 21 13:34:49 Server-UR kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
     Jun 21 13:34:50 Server-UR kernel: sd 2:0:2:0: Power-on or device reset occurred
     Jun 21 13:34:50 Server-UR rc.diskinfo[11766]: SIGHUP received, forcing refresh of disks info.
     Jun 21 13:34:50 Server-UR kernel: sd 2:0:2:0: Power-on or device reset occurred
     Jun 21 13:34:51 Server-UR kernel: ata2.00: exception Emask 0x50 SAct 0x300000 SErr 0x4890800 action 0xe frozen
     Jun 21 13:34:51 Server-UR kernel: ata2.00: irq_stat 0x0c400040, interface fatal error, connection status changed
     Jun 21 13:34:51 Server-UR kernel: ata2: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }
     Jun 21 13:34:51 Server-UR kernel: ata2.00: failed command: READ FPDMA QUEUED
     Jun 21 13:34:51 Server-UR kernel: ata2.00: cmd 60/00:a0:40:f8:a2/04:00:00:00:00/40 tag 20 ncq dma 524288 in
     Jun 21 13:34:51 Server-UR kernel: res 40/00:a0:40:f8:a2/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
     Jun 21 13:34:51 Server-UR kernel: ata2.00: status: { DRDY }
     Jun 21 13:34:51 Server-UR kernel: ata2.00: failed command: READ FPDMA QUEUED
     Jun 21 13:34:51 Server-UR kernel: ata2.00: cmd 60/08:a8:40:fc:a2/00:00:00:00:00/40 tag 21 ncq dma 4096 in
     Jun 21 13:34:51 Server-UR kernel: res 40/00:a0:40:f8:a2/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
     Jun 21 13:34:51 Server-UR kernel: ata2.00: status: { DRDY }
     Jun 21 13:34:51 Server-UR kernel: ata2: hard resetting link
     Jun 21 13:34:54 Server-UR kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
     Jun 21 13:34:54 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT1._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
     Jun 21 13:34:54 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._GTF, AE_NOT_FOUND (20180810/psparse-514)
     Jun 21 13:34:54 Server-UR kernel: ata2.00: supports DRM functions and may not be fully accessible
     Jun 21 13:34:54 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT1._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
     Jun 21 13:34:54 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._GTF, AE_NOT_FOUND (20180810/psparse-514)
     Jun 21 13:34:54 Server-UR kernel: ata2.00: supports DRM functions and may not be fully accessible
     Jun 21 13:34:54 Server-UR kernel: ata2.00: configured for UDMA/133
     Jun 21 13:34:54 Server-UR kernel: ata2: EH complete

     Reconnected the SAS SATA cable again, at both ends, but got the same results when I started a Parity check. So I pulled the SAS SATA cable altogether and connected Disk 2 with a SATA cable to the motherboard.

     Jun 21 15:28:14 Server-UR kernel: mdcmd (83): check
     Jun 21 15:28:14 Server-UR kernel: md: recovery thread: check P Q ...
     Jun 21 15:28:15 Server-UR kernel: ata3.00: exception Emask 0x50 SAct 0x40 SErr 0x4890800 action 0xe frozen
     Jun 21 15:28:15 Server-UR kernel: ata3.00: irq_stat 0x0c400040, interface fatal error, connection status changed
     Jun 21 15:28:15 Server-UR kernel: ata3: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }
     Jun 21 15:28:15 Server-UR kernel: ata3.00: failed command: READ FPDMA QUEUED
     Jun 21 15:28:15 Server-UR kernel: ata3.00: cmd 60/f8:30:48:0c:01/03:00:00:00:00/40 tag 6 ncq dma 520192 in
     Jun 21 15:28:15 Server-UR kernel: res 40/00:30:48:0c:01/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
     Jun 21 15:28:15 Server-UR kernel: ata3.00: status: { DRDY }
     Jun 21 15:28:15 Server-UR kernel: ata3: hard resetting link
     Jun 21 15:28:18 Server-UR kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
     Jun 21 15:28:18 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
     Jun 21 15:28:18 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT2._GTF, AE_NOT_FOUND (20180810/psparse-514)
     Jun 21 15:28:18 Server-UR kernel: ata3.00: supports DRM functions and may not be fully accessible
     Jun 21 15:28:18 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
     Jun 21 15:28:18 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT2._GTF, AE_NOT_FOUND (20180810/psparse-514)
     Jun 21 15:28:18 Server-UR kernel: ata3.00: supports DRM functions and may not be fully accessible
     Jun 21 15:28:18 Server-UR kernel: ata3.00: configured for UDMA/133
     Jun 21 15:28:18 Server-UR kernel: ata3: EH complete
     Jun 21 15:28:22 Server-UR kernel: mpt2sas_cm1: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
     Jun 21 15:28:22 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
     Jun 21 15:28:23 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
     Jun 21 15:28:23 Server-UR rc.diskinfo[11753]: SIGHUP received, forcing refresh of disks info.
     Jun 21 15:28:23 Server-UR kernel: ata3.00: exception Emask 0x50 SAct 0x2 SErr 0x4890800 action 0xe frozen
     Jun 21 15:28:23 Server-UR kernel: ata3.00: irq_stat 0x0c400040, interface fatal error, connection status changed
     Jun 21 15:28:23 Server-UR kernel: ata3: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }
     Jun 21 15:28:23 Server-UR kernel: ata3.00: failed command: READ FPDMA QUEUED
     Jun 21 15:28:23 Server-UR kernel: ata3.00: cmd 60/00:08:c8:6e:03/04:00:00:00:00/40 tag 1 ncq dma 524288 in
     Jun 21 15:28:23 Server-UR kernel: res 40/00:08:c8:6e:03/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
     Jun 21 15:28:23 Server-UR kernel: ata3.00: status: { DRDY }
     Jun 21 15:28:23 Server-UR kernel: ata3: hard resetting link
     Jun 21 15:28:27 Server-UR kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
     Jun 21 15:28:27 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
     Jun 21 15:28:27 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT2._GTF, AE_NOT_FOUND (20180810/psparse-514)
     Jun 21 15:28:27 Server-UR kernel: ata3.00: supports DRM functions and may not be fully accessible
     Jun 21 15:28:27 Server-UR kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
     Jun 21 15:28:27 Server-UR kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT2._GTF, AE_NOT_FOUND (20180810/psparse-514)
     Jun 21 15:28:27 Server-UR kernel: ata3.00: supports DRM functions and may not be fully accessible
     Jun 21 15:28:27 Server-UR kernel: ata3.00: configured for UDMA/133
     Jun 21 15:28:27 Server-UR kernel: ata3: EH complete

     And this just repeats. Speeds are terrible. First it was ata2.00 and now ata3.00. The array starts without problems, Unraid doesn't report any issues, and Disks 2, 3 and 4 all complete a SMART short self-test without errors. Only when I start a parity check do problems emerge... And I can't figure out what I should change out next. Any ideas?
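     When a syslog fills up like this, a quick tally of which devices are resetting can help localize the fault (one drive vs something shared like a cable, HBA or power). A rough triage sketch of my own, not an Unraid tool:

```python
import re
from collections import Counter

def count_resets(syslog_text: str) -> Counter:
    """Tally 'Power-on or device reset occurred' events per SCSI address."""
    counts = Counter()
    for line in syslog_text.splitlines():
        m = re.search(r"sd (\d+:\d+:\d+:\d+): Power-on or device reset", line)
        if m:
            counts[m.group(1)] += 1
    return counts

# A few lines taken from the syslog excerpt above:
sample = """\
Jun 21 13:34:35 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
Jun 21 13:34:40 Server-UR kernel: sd 2:0:2:0: Power-on or device reset occurred
Jun 21 13:34:45 Server-UR kernel: sd 8:0:6:0: Power-on or device reset occurred
"""
print(count_resets(sample))  # Counter({'8:0:6:0': 2, '2:0:2:0': 1})
```

     Resets hitting devices on two different controllers (mpt2sas_cm0 and cm1 in the log) point away from any single cable and toward something shared.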
  5. Ugh, yeah, too tired I was haha. I was expecting "-nv" to give me some sort of answer about what to do next. Ran Check:

     Phase 1 - find and verify superblock...
     Phase 2 - using internal log
     - zero log...
     - scan filesystem freespace and inode maps...
     - found root inode chunk
     Phase 3 - for each AG...
     - scan and clear agi unlinked lists...
     - process known inodes and perform inode discovery...
     - agno = 0
     bad CRC for inode 132
     Bad flags2 set in inode 132
     bad CRC for inode 132, will rewrite
     Bad flags2 set in inode 132
     fixing bad flags2.
     directory inode 132 has bad size 8079572436068976644
     cleared inode 132
     - agno = 1
     - agno = 2
     - agno = 3
     - agno = 4
     - process newly discovered inodes...
     Phase 4 - check for duplicate blocks...
     - setting up duplicate extent list...
     - check for inodes claiming duplicate blocks...
     - agno = 3
     - agno = 2
     - agno = 1
     - agno = 0
     entry "Movie name" at block 0 offset 128 in directory inode 2147483776 references free inode 132
     clearing inode number in entry at offset 128...
     - agno = 4
     Phase 5 - rebuild AG headers and trees...
     - reset superblock...
     Phase 6 - check inode connectivity...
     - resetting contents of realtime bitmap and summary inodes
     - traversing filesystem ...
     bad hash table for directory inode 2147483776 (no data entry): rebuilding
     rebuilding directory inode 2147483776
     - traversal finished ...
     - moving disconnected inodes to lost+found ...
     disconnected inode 133, moving to lost+found
     disconnected inode 134, moving to lost+found
     disconnected inode 135, moving to lost+found
     disconnected inode 136, moving to lost+found
     disconnected inode 137, moving to lost+found
     disconnected inode 138, moving to lost+found
     disconnected inode 139, moving to lost+found
     disconnected inode 140, moving to lost+found
     disconnected inode 141, moving to lost+found
     disconnected inode 142, moving to lost+found
     disconnected inode 143, moving to lost+found
     Phase 7 - verify and correct link counts...
     resetting inode 2147483776 nlinks from 284 to 283
     done

     Nothing left to do? Ran -nv again to see what that would give:

     Phase 1 - find and verify superblock...
     - block cache size set to 1499720 entries
     Phase 2 - using internal log
     - zero log...
     zero_log: head block 221146 tail block 221146
     - scan filesystem freespace and inode maps...
     - found root inode chunk
     Phase 3 - for each AG...
     - scan (but don't clear) agi unlinked lists...
     - process known inodes and perform inode discovery...
     - agno = 0
     - agno = 1
     - agno = 2
     - agno = 3
     - agno = 4
     - process newly discovered inodes...
     Phase 4 - check for duplicate blocks...
     - setting up duplicate extent list...
     - check for inodes claiming duplicate blocks...
     - agno = 2
     - agno = 0
     - agno = 1
     - agno = 3
     - agno = 4
     No modify flag set, skipping phase 5
     Phase 6 - check inode connectivity...
     - traversing filesystem ...
     - agno = 0
     - agno = 1
     - agno = 2
     - agno = 3
     - agno = 4
     - traversal finished ...
     - moving disconnected inodes to lost+found ...
     Phase 7 - verify link counts...
     No modify flag set, skipping filesystem flush and exiting.

     XFS_REPAIR Summary    Mon Jun 21 12:00:41 2021
     Phase      Start           End             Duration
     Phase 1:   06/21 12:00:35  06/21 12:00:35
     Phase 2:   06/21 12:00:35  06/21 12:00:35
     Phase 3:   06/21 12:00:35  06/21 12:00:38  3 seconds
     Phase 4:   06/21 12:00:38  06/21 12:00:38
     Phase 5:   Skipped
     Phase 6:   06/21 12:00:38  06/21 12:00:41  3 seconds
     Phase 7:   06/21 12:00:41  06/21 12:00:41
     Total run time: 6 seconds

     So everything should be fine now I guess, and I can boot up the array again?
  6. After 14 hours, a successful rebuild. Next problem haha. Rebooted the server and checked the log to be sure; glad I did. Came across this repeating error:

     kernel: XFS (dm-2): Metadata corruption detected at xfs_dinode_verify+0xa5/0x52e [xfs], inode 0x84 dinode
     kernel: XFS (dm-2): Unmount and run xfs_repair
     kernel: XFS (dm-2): First 128 bytes of corrupted metadata buffer:
     kernel: 000000002169b579: 49 4e 41 ff 03 02 00 00 00 00 00 63 00 00 00 64  INA........c...d
     kernel: 0000000019da1c05: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00  ................
     kernel: 0000000072ff9089: 5e c5 9d 52 1f ea ff 25 5e c5 9e b0 07 59 9d 1c  ^..R...%^....Y..
     kernel: 00000000be2b61f7: 85 68 fc fa 01 2f 5d 03 70 20 68 52 bc ed c8 04  .h.../].p hR....
     kernel: 0000000045f635c4: 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01  ................
     kernel: 0000000076bc70c3: 00 00 24 01 00 00 00 00 00 00 00 00 8f ea 0e ac  ..$.............
     kernel: 000000003302a279: 69 f3 65 cb 49 98 8f 0f b1 c5 1a 58 1b a5 87 06  i.e.I......X....
     kernel: 0000000024cd4b84: 2b 4d 62 2a 25 47 71 7f 30 ad a3 88 d0 ad e3 60  +Mb*%Gq.0......`

     dm-2 is Disk 3 according to the log. Checking the wiki points to an XFS repair: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui So I went ahead with Check -nV as suggested. The outcome is this:

     Phase 1 - find and verify superblock...
     - block cache size set to 1499720 entries
     Phase 2 - using internal log
     - zero log...
     zero_log: head block 221146 tail block 221146
     - scan filesystem freespace and inode maps...
     - found root inode chunk
     Phase 3 - for each AG...
     - scan (but don't clear) agi unlinked lists...
     - process known inodes and perform inode discovery...
     - agno = 0
     bad CRC for inode 132
     Bad flags2 set in inode 132
     bad CRC for inode 132, would rewrite
     Bad flags2 set in inode 132
     would fix bad flags2.
     directory inode 132 has bad size 8079572436068976644
     would have cleared inode 132
     - agno = 1
     - agno = 2
     - agno = 3
     - agno = 4
     - process newly discovered inodes...
     Phase 4 - check for duplicate blocks...
     - setting up duplicate extent list...
     - check for inodes claiming duplicate blocks...
     - agno = 1
     - agno = 0
     - agno = 2
     bad CRC for inode 132, would rewrite
     Bad flags2 set in inode 132
     would fix bad flags2.
     Would clear next_unlinked in inode 132
     directory inode 132 has bad size 8079572436068976644
     entry "Movie name" at block 0 offset 128 in directory inode 2147483776 references free inode 132
     would clear inode number in entry at offset 128...
     would have cleared inode 132
     - agno = 3
     - agno = 4
     No modify flag set, skipping phase 5
     Phase 6 - check inode connectivity...
     - traversing filesystem ...
     - agno = 0
     - agno = 1
     entry "Movie name" in directory inode 2147483776 points to free inode 132, would junk entry
     bad hash table for directory inode 2147483776 (no data entry): would rebuild
     - agno = 2
     - agno = 3
     - agno = 4
     - traversal finished ...
     - moving disconnected inodes to lost+found ...
     disconnected inode 133, would move to lost+found
     disconnected inode 134, would move to lost+found
     disconnected inode 135, would move to lost+found
     disconnected inode 136, would move to lost+found
     disconnected inode 137, would move to lost+found
     disconnected inode 138, would move to lost+found
     disconnected inode 139, would move to lost+found
     disconnected inode 140, would move to lost+found
     disconnected inode 141, would move to lost+found
     disconnected inode 142, would move to lost+found
     disconnected inode 143, would move to lost+found
     Phase 7 - verify link counts...
     would have reset inode 2147483776 nlinks from 284 to 283
     No modify flag set, skipping filesystem flush and exiting.

     XFS_REPAIR Summary    Mon Jun 21 03:02:29 2021
     Phase      Start           End             Duration
     Phase 1:   06/21 03:02:23  06/21 03:02:23
     Phase 2:   06/21 03:02:23  06/21 03:02:24  1 second
     Phase 3:   06/21 03:02:24  06/21 03:02:27  3 seconds
     Phase 4:   06/21 03:02:27  06/21 03:02:27
     Phase 5:   Skipped
     Phase 6:   06/21 03:02:27  06/21 03:02:29  2 seconds
     Phase 7:   06/21 03:02:29  06/21 03:02:29
     Total run time: 6 seconds

     Now I'm not sure what to do. The wiki says this: But I'm not sure how to interpret the outcome. I'm not seeing a clear "do this" message, or I'm just blind/tired. I tried googling this issue and found an answer saying "run Check -L", but I don't really understand what that does, so I thought I'd clear it first with you guys. Run "Check -L" or something else?
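     For later readers, the sequence the wiki describes boils down to the following (my summary, not verbatim from the wiki; the device path is an assumption — with the array started in Maintenance mode, disk 3 is /dev/md3, or /dev/mapper/md3 on an encrypted array, which the dm-2 in the log hints at):

```shell
# Array in Maintenance mode first. Check (-n = no modify), then repair.
xfs_repair -nv /dev/md3    # dry run: only reports what it would fix
xfs_repair -v /dev/md3     # actual repair; orphans end up in lost+found
# Only if xfs_repair refuses to start because of a dirty journal:
# xfs_repair -L /dev/md3   # zeroes the log -- can lose the last in-flight writes
```

     The -n run above is exactly why the output is full of "would rewrite" / "would move": nothing was modified; running without -n performs those actions.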
  7. Used a SATA cable on Disk 4. Changing out one thing at a time, just to be sure. Sync started, so I'll report back tomorrow if all goes well.
  8. Well, I started a non-correcting parity check, with immediate results: Disk 4 emulated, 1024 errors. Great... haha. Disk 4 is on the connector that used to be on Disk 3, so probably the cable is the one at fault? I have a spare SAS SATA cable. Should I replace the SAS SATA cable altogether, or first just disconnect Disk 4 and connect it to the motherboard with a normal SATA cable? My 2 cents:
     - First use a (new) SATA cable to the motherboard
     - Rebuild Disk 4
     - Parity check (non-correcting)
     - Replace the SAS SATA cables, but leave out Disk 4
     - Parity check (non-correcting)
     - If pass, connect Disk 3 too
     - Parity check (non-correcting)
     - If pass: cable was bad, HBA is good, all is well again
  9. Thank you. I always forget there is an entire wiki... sorry about that. Well, success after ~14 hours! But with a small hiccup. I had swapped the SAS SATA cable connectors between Disk 3 & 4. After that I started the rebuild procedure. I noticed that the speeds were pretty low to begin with, around ~80 MB/s. The next day I woke up to an error: it had aborted after 2.5 hours at 15%. Disk 3 again had 1024 errors, but SMART was still fine. I re-ordered the power cables (just for the hell of it), unplugged the SAS SATA cable from Disk 3 and connected that disk with a normal (new) SATA cable directly to the motherboard. Booted the server and started the rebuild procedure again. The difference was huge: the starting speed was ~135 MB/s, steadily declining during the rebuild, and after ~14 hours it completed with 0 errors at an average speed of 101 MB/s. Conclusions:
     - Because I switched the SAS SATA cable connectors between Disk 3 & 4, I guess I can rule out a faulty cable.
     - I don't see power being an issue, because it's all well within specs (and has worked for a long time without problems).
     - So that leaves the HBA card? Any idea how to find out if it's bad? It's like it couldn't handle the traffic to Disk 3. Maybe something to do with SMR/cache?
     Final question: can I resume normal operations again, or should I first run another parity check (with/without write corrections)?
  10. Changed the cables and booted the server again. Trying to stop the array gave me a "retry unmounting disk share(s)" loop. Syslog showed a never-ending list of the same messages:

     Jun 18 21:45:11 Server-UR emhttpd: shcmd (331): exit status: 32
     Jun 18 21:45:11 Server-UR emhttpd: Retry unmounting disk share(s)...
     Jun 18 21:45:16 Server-UR emhttpd: Unmounting disks...
     Jun 18 21:45:16 Server-UR emhttpd: shcmd (332): umount /mnt/cache
     Jun 18 21:45:16 Server-UR root: umount: /mnt/cache: target is busy.
     Jun 18 21:45:16 Server-UR emhttpd: shcmd (332): exit status: 32

     Some googling later, I finally disabled Docker and rebooted. Now I can stop/start the array again. But how do I enable Disk 3 again? It's still red and I don't see an option to rebuild. Do I just remove Disk 3 from the array, start/stop the array again, and add Disk 3 back? Just making sure I'm doing this correctly.
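     For the "target is busy" loop, the holder can usually be identified with standard Linux tools before disabling services wholesale (a sketch):

```shell
# Show which processes still hold files open under the mount point:
fuser -vm /mnt/cache
# Alternative with lsof:
# lsof +f -- /mnt/cache
# A running container (or a shell sitting in /mnt/cache) is enough to make
# umount fail with "target is busy" / exit status 32, as seen in the syslog.
```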
  11. OK, thank you so much. I know what to do now, and I'll be sure to write down any changes I make. I'll update this thread when I know more. And I totally forgot: about 1.5 months ago Disk 1 started getting UDMA CRC error count errors. Re-seating the cable didn't help, so I went ahead and connected it with a SATA cable directly to the motherboard. That stopped all errors, but apparently I didn't look hard enough and this cable is bad as well... Just my luck haha. Ah okay, good to know. And yeah, performance isn't great, especially when running mover. But overall they are great little drives, perfect for long-term media storage + Unraid.
  12. Really appreciate the quick reply! So the next steps are: shutdown, inspect & replace/reseat cable*, boot, rebuild.
     * Thanks for noticing Disk 1. I will double-check the power cables and see which disks are connected to which SAS cable. Perhaps Disk 1 & 3 are running on the same cable. I have a spare SAS cable, so if that's the case I'll be sure to replace it. If not, what would you suggest is best? Just connect them with a SATA cable to the motherboard, or switch connectors with other drives and keep using the current SAS cable? BTW, is there a method to look up the firmware version in Unraid? I started using the HBA cards a couple of months ago and flashed them myself. I used the links in https://forums.serverbuilds.net/t/guide-updating-your-lsi-sas-controller-with-a-uefi-motherboard/131 As far as I can tell they link to 20.00.07.00. I'll double-check on reboot, but I'm pretty sure they are running the latest firmware version. I remember that being one of my concerns when flashing.
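     On looking up the HBA firmware from a running system: the mpt2sas driver logs the version at boot, and LSI's sas2flash can query it directly if the utility is available (the utility's presence and exact log wording are assumptions on my part):

```shell
# The driver prints the firmware version during boot:
grep -i fwversion /var/log/syslog
# e.g. "mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ..."
# Or query the adapters directly, if sas2flash is on the system:
sas2flash -listall
```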
  13. Hi, seeking advice on what to do next. I ran mover today with the cache disk filled ~92%, and it looks like this has taken its toll on Disk 3. Apparently mover finished, but after that I got an error saying "Disk 3 is disabled, contents are emulated". I have attached the syslog and diagnostics for you to look at. Running:
     - 13x 2.5" Seagate 5TB array (dual parity)
     - 1x 2.5" Samsung 500GB cache
     - 2x HBA M1015 cards
     What do you recommend I do next? From what I've read online I have only one option: reboot & rebuild (by adding Disk 3 again). I have run a SMART short self-test on Disk 3 that passed without any errors, so I don't see why I should not add Disk 3 again. I feel it's not the drive itself. My best guesses are:
     1. Cable is bad
     2. HBA card is bad
     3. Too much I/O at once due to mover (and these being 2.5" SMR drives?), which resulted in a temporary glitch. (Gut feeling)
     Your help and advice is much appreciated.
     server-ur-syslog-20210618-1650.zip server-ur-diagnostics-20210618-1903.zip
  14. I fixed it! To clarify, and hopefully help someone else, what I ended up doing after updating DelugeVPN to :latest is this:
     - Edit the DelugeVPN container and add all container ports that are running through DelugeVPN under "VPN_INPUT_PORTS"
     - To let those containers communicate with each other again, change the local IPs in the respective settings of those containers (WebUI) to "localhost"
     That's pretty straightforward. The WebUI is accessible and the containers can communicate with each other again. The problem was, for me at least, that SABnzbd is NOT running through DelugeVPN, so Sonarr etc. were unable to connect to SAB. I have SABnzbd's WebUI set to port 8282 (Unifi already claimed 8080). Before the DelugeVPN update all other containers were connecting to SAB using "local LAN IP:8282". That was no longer working. Q&A 27 in https://github.com/binhex/documentation/blob/master/docker/faq/vpn.md So I edited DelugeVPN again and added port 8282 under "VPN_OUTPUT_PORTS". In Sonarr I tested it again, but I still couldn't connect to SAB. When using the local LAN IP it took some processing time, but eventually returned an error. Also tried localhost, but no luck. SABnzbd also uses port 9090, so I thought maybe that would work. I added 9090 to "VPN_OUTPUT_PORTS", but still no luck inside Sonarr. Nothing worked. But then I noticed that SAB uses port 8080 (as opposed to 8282) when using the container IP. So I added 8080 to "VPN_OUTPUT_PORTS" in DelugeVPN and used "container IP:8080" inside Sonarr and BINGO! Container IP AND port 8080 gave a successful connection! Already tried to download an NZB and it works like a charm again. @binhex Maybe I'm misreading Q&A 27, but as far as I can tell the container IP is not mentioned. Perhaps you could add that? Because it looks like that is the only way to get this to work. And thanks for the hard work!
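     To make the container-IP step reproducible, the IP can be read with docker inspect rather than guessed (the container names and the example IP are placeholders, not from the original post):

```shell
# Read SABnzbd's container IP (container name is an example):
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' binhex-sabnzbd
# Then test from inside the VPN container against the *container* port
# (8080 here), not the host-mapped one (8282) -- matching the finding above:
docker exec binhex-delugevpn curl -sI http://172.17.0.5:8080  # IP from the first command
```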
  15. The following containers are routed through DelugeVPN (set this all up following Spaceinvader One's videos):
     - Sonarr
     - Radarr
     - Jackett
     - NZBHydra
     Their ports are listed under VPN_INPUT_PORTS. I changed the server IPs to 'localhost' in their settings and they can all communicate with each other and are accessible. No problem there. SABnzbd is outside the VPN connection. It uses port 8282, and I have that port listed in DelugeVPN under VPN_OUTPUT_PORTS. But when I try to connect to it from within Sonarr etc. it doesn't work. I have the local server IP filled in, but I also tried 'localhost' and 'container IP'; it all comes back with an error. So I'm not sure what I'm doing wrong here. I checked the Q&A section and everything is set up like it should be, I think. I'm probably overlooking something, but I don't know what.
  16. Really appreciate it. The :latest image contains both VPN_INPUT_PORTS and VPN_OUTPUT_PORTS, so I used those. I added all the containers using DelugeVPN to VPN_INPUT_PORTS and they all became accessible again! And they can communicate with each other too. But SABnzbd is running outside the DelugeVPN container (no need for VPN) and I can't get Sonarr, Radarr and NZBHydra to communicate with SABnzbd. I re-ran every step and the only thing I'm not sure about is the proxy settings. Sonarr > Settings > General > Proxy Settings:
     - Use Proxy = Yes
     - Proxy Type = HTTP(S)
     - Hostname = IP of server / localhost (tried both)
     - Port = 8118
     - Username = "empty"
     - Password = "empty"
     - Addresses for the proxy to ignore = IP of server / 192.168.*.* (wildcard for LAN) (tried both)
     This doesn't work? Privoxy is on in the DelugeVPN container. After some searching I found this: https://github.com/binhex/arch-delugevpn/issues/257 and I'm guessing that's the problem? Looking at my DelugeVPN container logs I seem to be running into the same issue (again, running ":latest"):

     2021-03-06 00:55:11,052 DEBG 'watchdog-script' stdout output: [info] Privoxy not running
     2021-03-06 00:55:11,053 DEBG 'watchdog-script' stdout output: [info] Attempting to start Privoxy...
     2021-03-06 00:55:12,057 DEBG 'watchdog-script' stdout output: [info] Privoxy process started
     [info] Waiting for Privoxy process to start listening on port 8118...
     2021-03-06 00:55:12,061 DEBG 'watchdog-script' stdout output: [info] Privoxy process listening on port 8118
     2021-03-06 00:55:42,099 DEBG 'watchdog-script' stdout output: [info] Privoxy not running
     2021-03-06 00:55:42,100 DEBG 'watchdog-script' stdout output: [info] Attempting to start Privoxy...
     2021-03-06 00:55:42,099 DEBG 'watchdog-script' stdout output: [info] Privoxy not running
     2021-03-06 00:55:42,100 DEBG 'watchdog-script' stdout output: [info] Attempting to start Privoxy...
     2021-03-06 00:55:43,106 DEBG 'watchdog-script' stdout output: [info] Privoxy process started
     [info] Waiting for Privoxy process to start listening on port 8118...
     2021-03-06 00:55:43,116 DEBG 'watchdog-script' stdout output: [info] Privoxy process listening on port 8118

     It looks to be crashing and restarting every 30 seconds. Am I missing something in the setup, or is this indeed a bug? If so, please let me know if you need me to follow these steps to give you the info you need: https://github.com/binhex/documentation/blob/master/docker/faq/help.md EDIT: I tried removing the Privoxy folder in appdata and restarting the container, but that didn't help.
  17. I'm on :latest, but I don't have an 'ADDITIONAL_PORTS' option? What gives? I mean, Q24 says: "Edit 'ADDITIONAL_PORTS' env var and put applications Web UI port number in the 'value', if multiple ports required then use a comma to separate." On a side note, this is kind of hell for a newb Unraid user, tbh. I'm new to Unraid and I'm in hour 4 of trying to get this fixed, but I'm just going deeper down the rabbit hole. After finding this topic I changed the localhost and additional ports etc., but Sonarr (connecting to DelugeVPN) couldn't connect to SABnzbd (outside the VPN network). Nothing was working. I finally decided to roll back, but accidentally put a "~" in front of the version to pull. This resulted in a deleted DelugeVPN container... First time for everything, right... Re-installing it and restoring appdata gave me back nothing of my previous setup. Appdata was intact, but the container was just not showing any of my old settings. No idea which search terms I needed to get this working again, so I decided for now to set up DelugeVPN from the ground up (rewatching Spaceinvader's video) and find out later what I did wrong when restoring the backup. I followed the steps in the Q&A section, but like I said, there is no "additional_ports" option anymore. Kind of frustrating. Bouncing off the walls while falling to the bottom haha.
  18. Didn't know that! After thinking about it, this makes more sense and explains why both checks found 1124 errors. Rebooted the server and everything is running great again. Thank you again for being so helpful! Greatly appreciate it!
  19. The very first check after the power loss was 'correcting', right? Does that matter? Unraid started that parity check by itself, and by default "correcting" is enabled, right?

Anyway, I ran another parity check (non-correcting) as you suggested and it passed:

Parity check finished (0 errors)
Duration: 14 hours, 44 minutes, 35 seconds. Average speed: 94.2 MB/s

Do I need to do anything else before I bring my dockers online again? The 3 drives that were affected still show an "error" in the dashboard array window. I assume I just need to "acknowledge" the errors and get on with it? I already ran a short and an extended SMART test on all 3 drives, and they all passed.
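For reference, the raw counts behind those dashboard warnings can be read with smartctl (attribute 187, "Reported_Uncorrect"), which makes it easy to confirm the numbers are no longer climbing. A sketch - the SMARTCTL_CMD override exists only so the loop can be dry-run, and the /dev/sdX names are placeholders:

```shell
# Sketch: print the raw "Reported_Uncorrect" (SMART attribute 187) count
# for each given drive; run it again later to confirm it is not climbing.
# SMARTCTL_CMD is only there so the loop can be dry-run/tested.
reported_uncorrect() {
  local sc="${SMARTCTL_CMD:-smartctl}" dev
  for dev in "$@"; do
    printf '%s: ' "$dev"
    "$sc" -A "$dev" | awk '/Reported_Uncorrect/ {print $NF}'
  done
}
# reported_uncorrect /dev/sdX /dev/sdY /dev/sdZ
```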
  20. I just double-checked the Parity/Read-Check history and I wrote it down wrong; both were 1124.

2021-01-16, 12:52:45 - 14 hr, 59 min, 32 sec - 92.7 MB/s - Status OK - Errors 0
Reboot
2021-01-15, 07:33:25 - 15 hr, 53 min, 47 sec - 87.4 MB/s - Status OK - Errors 1124
2021-01-14, 00:34:02 - 15 hr, 28 min, 25 sec - 89.8 MB/s - Status OK - Errors 1124
Startup after power loss
  21. TL;DR:
Power loss on server with 12 disks
2 parity checks with 1024/1124 errors
3 disks with a few "Reported Uncorrect" errors
3 disks passed the extended SMART test
Reboot; parity check (without correction): 0 errors
3 disks passed the short SMART test
Another test? Or acknowledge the errors and get on with it?

Story: I had an accidental power loss a few days ago. I used a Blitzwolff plug to measure the power of the server (before the UPS), but when I opened the app to watch the power consumption, a ghost touch on the screen resulted in a power off... So, never doing that again. I searched online about what to do next. These are the steps I followed:

After the reboot a parity check ran. Result: 1024 errors - parity valid.
Three disks (array of 12 disks) showed an orange thumbs down:
parity drive: Reported Uncorrect 10
parity2 drive: Reported Uncorrect 26
disk 10: Reported Uncorrect 6
Ran an extended SMART test on all three drives: passed.
Ran another parity check. Result: 1124 errors - parity valid.
That concerned me a bit, but after reading here I did the following:
Reboot.
Reran the parity check (without error correction). Result: 0 errors - parity valid.
Ran another short SMART test on all three drives: passed.

So am I good to go? Just acknowledge the errors on the three drives and try to forget this ever happened? Or should I run another extended SMART test or parity check (with/without error correction)? Your advice is greatly appreciated!
  22. Changed the script to "hdparm -B 255 /dev/sdX" and it seems to work! Since running the script at boot this morning, the LCC numbers have stayed the same for all drives! Wish I had known about this from the start. 2020 has been a bad year for my drives as well, haha. Thanks for the help @JorgeB
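For anyone checking the same thing: the LCC figure lives in SMART attribute 193 ("Load_Cycle_Count"), so one way to confirm it has really stopped climbing is to snapshot the raw value per drive and compare over time. A sketch - the SMARTCTL_CMD override exists only for dry-running, and the log path is just an example:

```shell
# Sketch: append one "date device count" line per drive, where count is
# the raw Load_Cycle_Count (SMART attribute 193). If APM is really off,
# the count should stay flat between snapshots.
lcc_snapshot() {
  local sc="${SMARTCTL_CMD:-smartctl}" dev
  for dev in "$@"; do
    printf '%s %s %s\n' "$(date +%F_%T)" "$dev" \
      "$("$sc" -A "$dev" | awk '/Load_Cycle_Count/ {print $NF}')"
  done
}
# lcc_snapshot /dev/sd[b-o] >> /boot/lcc.log   # compare counts over time
```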
  23. @JorgeB So I ran the following script:

#!/bin/bash
hdparm -Z /dev/sdo
hdparm -Z /dev/sdm
hdparm -Z /dev/sdd
hdparm -Z /dev/sde
hdparm -Z /dev/sdc
hdparm -Z /dev/sdb
hdparm -Z /dev/sdn
hdparm -Z /dev/sdl
hdparm -Z /dev/sdf
hdparm -Z /dev/sdh
hdparm -Z /dev/sdj
hdparm -Z /dev/sdk

All drives, except sdk for some reason, return the following:

disabling Seagate auto powersaving mode
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 04 53 40 00 21 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Drive sdk only reports "disabling Seagate auto powersaving mode". I'm not sure what this means. I found the following post regarding the last message: https://askubuntu.com/questions/768373/hard-drive-error-bad-missing-sense-data but can't really make heads or tails of it, unfortunately. One answer mentions "sdparm" instead of "hdparm". Any thoughts on this?

Edit: it seems sdparm is for SAS devices, so it's not that.
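One extra data point I can gather in the meantime: "hdparm -I" shows whether a drive advertises the APM feature at all (that's what "-B" talks to), which might explain why drives react differently. A sketch - the HDPARM_CMD override exists only so the loop can be dry-run, and the device names are from my system:

```shell
# Sketch: report which drives advertise Advanced Power Management (the
# feature "-B" controls). HDPARM_CMD is only there for dry-run/testing.
apm_support() {
  local hp="${HDPARM_CMD:-hdparm}" dev
  for dev in "$@"; do
    printf '%s:%s\n' "$dev" \
      "$("$hp" -I "$dev" | grep -i 'advanced power management' \
         || echo ' APM not reported')"
  done
}
# apm_support /dev/sd[b-o]
```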
  24. Is this the same as "hdparm -B 255 /dev/sdX"? I've read that this disables it too, but I haven't seen "-B 0" mentioned anywhere. And do you agree with the following order: first try "hdparm -Z /dev/sdX" before trying "-B 255"/"-B 0" and "-B 254"?
  25. Hi @JorgeB Thanks for your reply. I haven't dealt with "hdparm" before; I thought setting the spindown time to 'never' would also prevent the head parking. Guess I was wrong. I've tried Google and this forum, but I can't seem to find all the information needed to apply this, just bits and pieces. These are the options that seem to be relevant:

1. hdparm -B 254
2. hdparm -B 255
3. hdparm -Z

Found this post in which all the options are listed: Not sure if -Z will do anything, because here it didn't do anything: But these are all posts from 2012... so I'm not sure if this information still applies today? And I think the order in which to try this is 3, 2 and, as a last resort, 1?

Not sure how to apply this in a script. All I can find is that the command needs to be something like:

hdparm -B 255 /dev/xxx

in which 'xxx' stands for the particular drive, I assume? So in my case I need to add this line 12 times, once for each drive?
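Rather than pasting the line 12 times, the drives can be looped over in one User Scripts boot script. A sketch of what I have in mind - the device names are the ones from my system, and the HDPARM_CMD override is only there so it can be dry-run first with HDPARM_CMD=echo:

```shell
#!/bin/bash
# Sketch of an "at array start" User Scripts script: apply -B 255
# (APM off) to every array drive. Device names are from my system -
# adjust yours. Dry-run first with: HDPARM_CMD=echo ./script
set_apm_off() {
  local hp="${HDPARM_CMD:-hdparm}" dev
  for dev in sdo sdm sdd sde sdc sdb sdn sdl sdf sdh sdj sdk; do
    "$hp" -B 255 "/dev/$dev"   # 255 = disable APM entirely
  done
}
# set_apm_off   # uncomment (or inline the loop) in the real script
```

One caveat with this approach: /dev/sdX letters can shuffle across reboots, so double-check the list still matches your drives after a reboot (or key the loop on /dev/disk/by-id links instead).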