deano_southafrican

Members
  • Posts

    34

deano_southafrican's Achievements

Noob (1/14)

4

Reputation

1

Community Answers

  1. Hey, if anyone is willing to chip in here, I'm looking for some opinions on what to do next. I'm pretty sure it started after a couple of unclean powerdowns (we have terrible power and I've had a UPS and an inverter die): I have been having issues with a single disk constantly throwing errors. I haven't been able to complete a parity sync for ages now: one, because it's incredibly slow, and two, because I was previously concerned about ruining the parity (this has gone out the window now). I was getting this in my logs when parity would run:

     Apr 11 08:25:04 Tower kernel: mdcmd (37): nocheck cancel
     Apr 11 08:25:05 Tower sSMTP[15964]: Creating SSL connection to host
     Apr 11 08:25:06 Tower sSMTP[15964]: SSL connection using TLS_AES_256_GCM_SHA384
     Apr 11 08:25:10 Tower sSMTP[15964]: Sent mail for [email protected] (221 2.0.0 closing connection d16-20020adff2d0000000b003418364032asm968558wrp.112 - gsmtp) uid=0 username=root outbytes=775
     Apr 11 08:25:26 Tower kernel: ata9.00: failed to read SCR 1 (Emask=0x40)
     Apr 11 08:25:26 Tower kernel: ata9.01: failed to read SCR 1 (Emask=0x40)
     Apr 11 08:25:26 Tower kernel: ata9.02: failed to read SCR 1 (Emask=0x40)
     Apr 11 08:25:26 Tower kernel: ata9.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
     Apr 11 08:25:26 Tower kernel: ata9.02: failed command: READ DMA
     Apr 11 08:25:26 Tower kernel: ata9.02: cmd c8/00:00:18:4a:00/00:00:00:00:00/e0 tag 17 dma 131072 in
     Apr 11 08:25:26 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
     Apr 11 08:25:26 Tower kernel: ata9.02: status: { DRDY }
     Apr 11 08:25:27 Tower kernel: ata9.15: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Apr 11 08:25:27 Tower kernel: ata9.01: limiting SATA link speed to 1.5 Gbps
     Apr 11 08:25:28 Tower kernel: ata9.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Apr 11 08:25:28 Tower kernel: ata9.01: SATA link down (SStatus 0 SControl 310)
     Apr 11 08:25:28 Tower kernel: ata9.02: hard resetting link
     Apr 11 08:25:28 Tower kernel: ata9.02: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Apr 11 08:25:28 Tower kernel: ata9.00: configured for UDMA/133
     Apr 11 08:25:29 Tower kernel: ata9.02: configured for UDMA/133
     Apr 11 08:25:29 Tower kernel: ata9.02: device reported invalid CHS sector 0
     Apr 11 08:25:29 Tower kernel: ata9: EH complete
     Apr 11 08:25:29 Tower kernel: md: recovery thread: exit status: -4
     Apr 11 08:25:37 Tower kernel: ata9.00: failed to read SCR 1 (Emask=0x40)
     Apr 11 08:25:37 Tower kernel: ata9.01: failed to read SCR 1 (Emask=0x40)
     Apr 11 08:25:37 Tower kernel: ata9.02: failed to read SCR 1 (Emask=0x40)
     Apr 11 08:25:37 Tower kernel: ata9.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
     Apr 11 08:25:37 Tower kernel: ata9.02: failed command: READ DMA
     Apr 11 08:25:37 Tower kernel: ata9.02: cmd c8/00:20:40:5f:6c/00:00:00:00:00/e9 tag 20 dma 16384 in
     Apr 11 08:25:37 Tower kernel: res 50/00:00:37:5e:6c/00:00:00:00:00/ed Emask 0x4 (timeout)
     Apr 11 08:25:37 Tower kernel: ata9.02: status: { DRDY }
     Apr 11 08:25:38 Tower kernel: ata9.15: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Apr 11 08:25:38 Tower kernel: ata9.01: limiting SATA link speed to 1.5 Gbps
     Apr 11 08:25:38 Tower flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
     Apr 11 08:25:38 Tower kernel: ata9.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Apr 11 08:25:39 Tower kernel: ata9.01: SATA link down (SStatus 0 SControl 310)
     Apr 11 08:25:39 Tower kernel: ata9.02: hard resetting link
     Apr 11 08:25:39 Tower kernel: ata9.02: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Apr 11 08:25:39 Tower kernel: ata9.00: configured for UDMA/133
     Apr 11 08:25:39 Tower kernel: ata9.02: configured for UDMA/133
     Apr 11 08:25:39 Tower kernel: ata9: EH complete

     This led me down the rabbit hole. First a few quick SMART tests - all fine. Then extended SMART tests - also fine. Once, I had to run xfs_repair to fix an issue where the drive wasn't mounting. I have swapped the SATA cable with a previously used one and with a brand new one. I also moved the drive to a SATA port on a PCIe-SATA adapter to test whether the problem was the port on the motherboard (which I thought would be fine since the other 3 don't have issues). So now I'm pretty convinced there is actually an issue with the drive.

     All the data (as far as I can tell) seems to be intact and I have cloud backups of the entire NAS stored per disk (i.e. /mnt/disk1 rsyncs to cloud/disk1 and so on for each of the 3 data disks in my array). When writing to or reading from the drive I don't see any performance issues.

     EDIT: As I'm posting this it seems the drive is finally dead. It is unmountable and I can't get anything to work. I'm going to have to replace it, so any advice on how to do that safely is appreciated.

     So I have a few actual questions. Is there something I haven't considered or may have overlooked? I believe I need to replace the disk. If I do replace the disk, should I let it sync with parity first and then try to restore the data from the cloud backups, or should I prevent parity from running, restore from backups and then sync? Would it be silly to try to copy this data straight to the new disk (considering there may well be data corruption issues)? Any other ideas/input would be greatly appreciated!

     I have attached diagnostics just in case, but I have rebooted a few times while changing cables etc., so I'm not sure what you might get from them. Thanks a million, appreciate you for reading through this! tower-diagnostics-20240411-0931.zip
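For anyone reading a similar syslog, a quick tally of which ATA device is actually failing can help separate a bad drive from a bad port. The following is only a minimal Python sketch, assuming kernel lines shaped like the ones quoted in the post above; the function name `summarize_ata_events` and the sample list are hypothetical and not part of Unraid or any diagnostics tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "Apr 11 08:25:26 Tower kernel: ata9.02: failed command: READ DMA"
# Group 1 is the ATA device (e.g. "ata9.02"), group 2 is the message text.
ATA_LINE = re.compile(r"kernel: (ata\d+(?:\.\d+)?): (.*)$")


def summarize_ata_events(lines):
    """Count failed commands and hard link resets per ATA device.

    Hypothetical helper for illustration; it only looks at the two
    message types that appear in the log excerpt from the post.
    """
    failed = Counter()
    resets = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue  # not an "ataN: ..." kernel line (e.g. the "res ..." lines)
        device, message = match.groups()
        if message.startswith("failed command"):
            failed[device] += 1
        elif "hard resetting link" in message:
            resets[device] += 1
    return failed, resets


# A few lines copied from the log in the post.
sample = [
    "Apr 11 08:25:26 Tower kernel: ata9.02: failed command: READ DMA",
    "Apr 11 08:25:28 Tower kernel: ata9.02: hard resetting link",
    "Apr 11 08:25:29 Tower kernel: ata9: EH complete",
    "Apr 11 08:25:37 Tower kernel: ata9.02: failed command: READ DMA",
    "Apr 11 08:25:39 Tower kernel: ata9.02: hard resetting link",
]

failed, resets = summarize_ata_events(sample)
print("failed commands:", dict(failed))
print("link resets:", dict(resets))
```

If the same device keeps showing up after the cable and port have been swapped, as described in the post, that is consistent with the drive itself being the problem, though the counts alone cannot prove it.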
  2. Reddit is going wild, but it seems far more sensible over here. I have a Pro license and in all honesty I'm unlikely to ever need another license, and if I did I'd be willing to purchase a year-long license as outlined in the post. We can all be brutally honest and say the amount of value we've gotten out of unRAID for the price has been incredible. That being said, it'd be annoying to have to worry about a subscription model or annual license charge. A lifetime option should exist, and if you need to increase the price then so be it. Everyone needs to put down the pitchforks. In future though, please make your announcement before releasing the update for everyone to find and get riled up about.
  3. Just for interest's sake, it seems there was a corruption issue with the flash drive. Time will tell, but my server gave up and was offline for a bit. I ran Memtest and it passed. I tried booting and there was an issue with the bz boot loader or something; it refused to boot. Anyway, I have restored from backup, moved over to a new flash drive and have my server up and running. We'll have to wait to see if everything works, but at least for now it seems to be alright.
  4. Thanks @JorgeB, appreciate you taking a look. I'm going to run memtest; I had a hunch a while back that either one of my RAM modules is bad or my mobo is dying. Will look into corruption on cache as well. Please point me in the right direction for resources or some way I can learn to look through my diagnostics to find corruption issues. I really just want to make sure my flash drive is fine, as I don't want to keep making backups of a bad configuration. I have attached the complete diagnostics. tower-diagnostics-20231202-1021.zip
  5. Hey guys, woke up this morning to my docker service offline. My VMs were all fine, everything seemed to be working, even Plex was working well! But on the docker page it said "Docker service unable to start". I tried disabling Docker and then re-enabling it but nothing worked. Eventually I hit reboot and waited. The first thing I saw was a message that said something along the lines of "Flash drive is possibly corrupt", followed by some weird issues on the dashboard. After about 10 minutes I refreshed and everything seems fine; all my docker services are up and running. I have backups of my flash drive so I'm not overwhelmingly stressed out, but I also don't really want to make another backup now since it'd possibly be corrupt. I have attached the logs in case someone more experienced than me is able to spot anything. Should I restore my most recent backup to a new flash drive or should I leave it as is? Appreciate your time and opinion. tower-syslog-20231201-0616.zip
  6. Here's the Diags just in case the above isn't enough. tower-diagnostics-20230803-2351.zip
  7. If anyone better at this kinda stuff could tell me what they think, I'd be very appreciative. After a reboot or even a safe powerdown, the system immediately starts a parity check when it comes back up, sometimes finding sync errors and sometimes not. I don't want to keep running parity checks, so I had a look at the logs and it seems my disk 2 has XFS errors with corrupted metadata...

     ```
     Aug 3 23:07:13 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
     Aug 3 23:07:13 Tower kernel: ata2.00: cmd 60/40:f8:c0:75:1e/05:00:00:00:00/40 tag 31 ncq dma 688128 in
     Aug 3 23:07:13 Tower kernel: res 40/00:f0:00:71:1e/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
     Aug 3 23:07:13 Tower kernel: ata2.00: status: { DRDY }
     Aug 3 23:07:13 Tower kernel: ata2: hard resetting link
     Aug 3 23:07:17 Tower kernel: mdcmd (37): nocheck cancel
     Aug 3 23:07:19 Tower kernel: ata2: found unknown device (class 0)
     Aug 3 23:07:20 Tower kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Aug 3 23:07:20 Tower kernel: ata2.00: configured for UDMA/100
     Aug 3 23:07:20 Tower kernel: ata2: EH complete
     Aug 3 23:07:22 Tower kernel: md: recovery thread: exit status: -4
     Aug 3 23:07:34 Tower kernel: XFS (md2p1): Metadata corruption detected at xfs_dir3_data_reada_verify+0x53/0x64 [xfs], xfs_dir3_data_reada block 0x3d02fa30
     Aug 3 23:07:34 Tower kernel: XFS (md2p1): Unmount and run xfs_repair
     Aug 3 23:07:34 Tower kernel: XFS (md2p1): First 128 bytes of corrupted metadata buffer:
     Aug 3 23:07:34 Tower kernel: 00000000: 22 aa 78 d4 c1 33 56 90 3d dd 4d 64 24 52 56 2e ".x..3V.=.Md$RV.
     Aug 3 23:07:34 Tower kernel: 00000010: 5c 4d af 56 e8 16 83 e2 c2 a2 7b 8d 6d 48 45 99 \M.V......{.mHE.
     Aug 3 23:07:34 Tower kernel: 00000020: 6f ba fa 58 d9 54 aa 75 6c af d4 c7 1e 1c 6e 8d o..X.T.ul.....n.
     Aug 3 23:07:34 Tower kernel: 00000030: 42 a7 62 2a 3c ee 4a 31 d4 ab 58 a8 5d 81 ea a3 B.b*<.J1..X.]...
     Aug 3 23:07:34 Tower kernel: 00000040: 9a be b5 30 d4 47 bf 4f 16 cd a8 3b e5 93 02 94 ...0.G.O...;....
     Aug 3 23:07:34 Tower kernel: 00000050: f9 6f 47 83 de 9f 9d 95 0f f5 65 f5 1f 07 13 6b .oG.......e....k
     Aug 3 23:07:34 Tower kernel: 00000060: 05 71 fc 6a 93 fc f2 61 b5 c3 78 c2 36 18 0c e2 .q.j...a..x.6...
     Aug 3 23:07:34 Tower kernel: 00000070: e2 27 e2 c7 28 a4 58 13 91 e7 da 5e 61 7a fb 29 .'..(.X....^az.)
     Aug 3 23:07:34 Tower kernel: XFS (md2p1): Metadata CRC error detected at xfs_dir3_block_read_verify+0x7c/0xf1 [xfs], xfs_dir3_block block 0x3d02fa30
     Aug 3 23:07:34 Tower kernel: XFS (md2p1): Unmount and run xfs_repair
     Aug 3 23:07:34 Tower kernel: XFS (md2p1): First 128 bytes of corrupted metadata buffer:
     Aug 3 23:07:34 Tower kernel: 00000000: 22 aa 78 d4 c1 33 56 90 3d dd 4d 64 24 52 56 2e ".x..3V.=.Md$RV.
     Aug 3 23:07:34 Tower kernel: 00000010: 5c 4d af 56 e8 16 83 e2 c2 a2 7b 8d 6d 48 45 99 \M.V......{.mHE.
     Aug 3 23:07:34 Tower kernel: 00000020: 6f ba fa 58 d9 54 aa 75 6c af d4 c7 1e 1c 6e 8d o..X.T.ul.....n.
     Aug 3 23:07:34 Tower kernel: 00000030: 42 a7 62 2a 3c ee 4a 31 d4 ab 58 a8 5d 81 ea a3 B.b*<.J1..X.]...
     Aug 3 23:07:34 Tower kernel: 00000040: 9a be b5 30 d4 47 bf 4f 16 cd a8 3b e5 93 02 94 ...0.G.O...;....
     Aug 3 23:07:34 Tower kernel: 00000050: f9 6f 47 83 de 9f 9d 95 0f f5 65 f5 1f 07 13 6b .oG.......e....k
     Aug 3 23:07:34 Tower kernel: 00000060: 05 71 fc 6a 93 fc f2 61 b5 c3 78 c2 36 18 0c e2 .q.j...a..x.6...
     Aug 3 23:07:34 Tower kernel: 00000070: e2 27 e2 c7 28 a4 58 13 91 e7 da 5e 61 7a fb 29 .'..(.X....^az.)
     Aug 3 23:07:34 Tower kernel: XFS (md2p1): metadata I/O error in "xfs_da_read_buf+0x9a/0xff [xfs]" at daddr 0x3d02fa30 len 8 error 74
     ```

     It says to unmount the disk and run xfs_repair. Is there anything I should know before doing this? I have my data backed up so I'm not stressed, but I'd like to not cause any further damage or complications.

     Also, what would have caused this? The system is on a UPS and inverter for backup power... Could it have been the docker issues with 6.12.0? (I upgraded when it was released but now I'm on 6.12.3.) I don't think it's a faulty cable, as I had replaced the ones for my main array fairly recently, although it's still possible I guess... Thanks in advance!
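When a syslog is long, it can be useful to pull out just the XFS corruption reports and see which device and block they point at before deciding what to repair. Below is a rough Python sketch, assuming the message format shown in the log above; `find_xfs_corruption` and the regular expression are illustrative assumptions, not an Unraid or xfsprogs interface.

```python
import re

# Matches kernel lines such as:
#   "XFS (md2p1): Metadata corruption detected at <func> [xfs], <type> block 0x3d02fa30"
# and the "Metadata CRC error detected at ..." variant seen in the same log.
XFS_LINE = re.compile(
    r"XFS \((?P<dev>[^)]+)\): "
    r"(?P<kind>Metadata corruption|Metadata CRC error) detected at "
    r"(?P<func>\S+).*?block (?P<block>0x[0-9a-f]+)"
)


def find_xfs_corruption(lines):
    """Return sorted, de-duplicated (device, kind, block) tuples.

    Hypothetical helper for illustration; it only recognises the two
    message types quoted in the post.
    """
    hits = set()
    for line in lines:
        match = XFS_LINE.search(line)
        if match:
            hits.add((match["dev"], match["kind"], match["block"]))
    return sorted(hits)


# Two lines copied from the log in the post.
sample = [
    "Aug 3 23:07:34 Tower kernel: XFS (md2p1): Metadata corruption detected at "
    "xfs_dir3_data_reada_verify+0x53/0x64 [xfs], xfs_dir3_data_reada block 0x3d02fa30",
    "Aug 3 23:07:34 Tower kernel: XFS (md2p1): Metadata CRC error detected at "
    "xfs_dir3_block_read_verify+0x7c/0xf1 [xfs], xfs_dir3_block block 0x3d02fa30",
]

for device, kind, block in find_xfs_corruption(sample):
    print(f"{device}: {kind} at block {block}")
```

In the excerpt from the post, both reports refer to the same device (md2p1) and the same block, which suggests a single damaged directory block rather than widespread corruption, although only a filesystem check can confirm that.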
  8. Looks great! Giving it a test now. Small fix needed. When the WebUI port is changed, the WebUI button from the docker page still directs you to http://<IP>:8088 so in my case to qBittorrent. Have to manually navigate to http://<IP>:<port>.
  9. Was totally using the wrong auth option. Solved. This is great!
  10. This is actually fantastic. I've been wanting to set something up locally for my Proxmox server, but the best I could do was a remote share via rsync. Thanks for the effort! I can't seem to log in though after initial creation. Username: admin & Password: pbspbs don't work for me. Any suggestions?
  11. Is there any way at all to allow multiple words per day? Or an alternative as nicely put together as this?
  12. Haha! You're a legend. When I added the drive to the new pool I selected the erase option which I assumed would format all drives in that pool... You were spot on, thank you!
  13. Don't know why I didn't include them in the first post... *face palm* tower-diagnostics-20220818-1822.zip
  14. TLDR: Needed to click the "format unmountable disk" button under the start/stop array option... #-------------------------------# This SSD came from my old laptop and I've decided to use it as a cache pool for my docker image etc. I know there are 3 partitions on this drive. When I added it to the new pool and erased it, I stupidly thought it'd be fine as 2 of the 3 partitions were only a few MBs in size. It's now flagged as Unmountable due to the partition layout. So, how would you recommend going about removing the other partitions (or all?) so unRAID will be happy? TIA
  15. Thank you! That worked for me, got it to run!