dyker

Members · 43 posts
  1. Is it possible to have the "FixCommonProblems" plug-in scan for any dmesg errors or warnings and, if any are found, surface them? See this thread for reference. Or, if there is a better plug-in for that, feel free to enlighten me. Thanks! (A rough do-it-yourself scanning sketch is included after this list.)
  2. Thanks for your help. Sorry I edited the post when you were replying. Is there a way to surface those communication errors? I'm guessing "no" unless I scan the logs myself? I'll do that for a few weeks.
  3. I do have system notifications enabled. I did get notifications about the parity errors end of June, but never any message about the SATA cable communication issues from early June as indicated a few posts above in the log. Is there a different setting to get those surfaced proactively?
  4. Thank you for the reply! I found that ata7 is my parity drive (a sysfs-based lookup sketch is included after this list):

     disk:0
        description: ATA Disk
        product: WDC WD60EFAX-68J
        vendor: Western Digital
        physical id: 0
        bus info: scsi@7:0.0.0   <<< THIS IS ATA 7, SCSI @ 7
        logical name: /dev/sde
        version: 0A82
        serial#: MATCHES PARITY DRIVE
        size: 5589GiB (6001GB)

     I replaced the cable. I also replaced the SATA card (it was on a daughter card, and part of my planned upgrade was swapping a 2-port SATA card for a 4-port SATA card, so I just went ahead and did it). Now what? Run a parity check again, with "write corrections" enabled? Also, is there a reason Unraid didn't tell me about all these errors? I guess it did in the log, but it seems like that should have been raised at a higher level to make it obvious to me somehow. Is there a setting somewhere that would make errors more obvious, or should I manually scan the logs for a few weeks and hope not to see any?
  5. I also see these errors in my logs and am not sure how to interpret them, or whether they are related, but there are tons of "failed" errors on ata7. I'm not sure which disk is ata7, whether I should be concerned about this, and, if it is a problem, how I would get Unraid to tell me about it earlier (a per-port tally sketch is included after this list). In my log (see first post) these ata7 errors go all the way back to June 1, when the last parity check happened, so some of these ata7 errors predate the beginning of the parity errors.

     Jul 1 09:09:33 VDUnraid kernel: ata7.00: cmd 60/40:f8:d0:a9:bc/05:00:c6:01:00/40 tag 31 ncq dma 688128 in
     Jul 1 09:09:33 VDUnraid kernel: res 40/00:f8:d0:a9:bc/00:00:c6:01:00/40 Emask 0x10 (ATA bus error)
     Jul 1 09:09:33 VDUnraid kernel: ata7.00: status: { DRDY }
     Jul 1 09:09:33 VDUnraid kernel: ata7: hard resetting link
     Jul 1 09:09:40 VDUnraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Jul 1 09:09:40 VDUnraid kernel: ata7.00: configured for UDMA/33
     Jul 1 09:09:40 VDUnraid kernel: ata7: EH complete
     Jul 1 09:10:05 VDUnraid kernel: ata7.00: exception Emask 0x10 SAct 0x1 SErr 0x10002 action 0xe frozen
     Jul 1 09:10:05 VDUnraid kernel: ata7.00: irq_stat 0x00400000, PHY RDY changed
     Jul 1 09:10:05 VDUnraid kernel: ata7: SError: { RecovComm PHYRdyChg }
     Jul 1 09:10:05 VDUnraid kernel: ata7.00: failed command: READ FPDMA QUEUED
     Jul 1 09:10:05 VDUnraid kernel: ata7.00: cmd 60/58:00:68:b1:07/01:00:c7:01:00/40 tag 0 ncq dma 176128 in
     Jul 1 09:10:05 VDUnraid kernel: res 40/00:00:68:b1:07/00:00:c7:01:00/40 Emask 0x10 (ATA bus error)
     Jul 1 09:10:05 VDUnraid kernel: ata7.00: status: { DRDY }
     Jul 1 09:10:05 VDUnraid kernel: ata7: hard resetting link
     Jul 1 09:10:12 VDUnraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Jul 1 09:10:12 VDUnraid kernel: ata7.00: configured for UDMA/33
     Jul 1 09:10:12 VDUnraid kernel: ata7: EH complete
     Jul 1 09:11:11 VDUnraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Jul 1 09:11:11 VDUnraid kernel: ata7.00: configured for UDMA/33
     Jul 1 09:11:42 VDUnraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Jul 1 09:11:42 VDUnraid kernel: ata7.00: configured for UDMA/33
     Jul 1 09:25:15 VDUnraid kernel: ata7.00: exception Emask 0x10 SAct 0x300 SErr 0x10002 action 0xe frozen
     Jul 1 09:25:15 VDUnraid kernel: ata7.00: irq_stat 0x00400000, PHY RDY changed
     Jul 1 09:25:15 VDUnraid kernel: ata7: SError: { RecovComm PHYRdyChg }
     Jul 1 09:25:15 VDUnraid kernel: ata7.00: failed command: READ FPDMA QUEUED
     Jul 1 09:25:15 VDUnraid kernel: ata7.00: cmd 60/b8:40:e8:d0:e6/03:00:d0:01:00/40 tag 8 ncq dma 487424 in
     Jul 1 09:25:15 VDUnraid kernel: res 40/00:40:e8:d0:e6/00:00:d0:01:00/40 Emask 0x10 (ATA bus error)
     Jul 1 09:25:15 VDUnraid kernel: ata7.00: status: { DRDY }
     Jul 1 09:25:15 VDUnraid kernel: ata7.00: failed command: READ FPDMA QUEUED
     Jul 1 09:25:15 VDUnraid kernel: ata7.00: cmd 60/88:48:a0:d4:e6/01:00:d0:01:00/40 tag 9 ncq dma 200704 in
     Jul 1 09:25:15 VDUnraid kernel: res 40/00:40:e8:d0:e6/00:00:d0:01:00/40 Emask 0x10 (ATA bus error)
     Jul 1 09:25:15 VDUnraid kernel: ata7.00: status: { DRDY }
     Jul 1 09:25:15 VDUnraid kernel: ata7: hard resetting link
     Jul 1 09:25:21 VDUnraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Jul 1 09:25:21 VDUnraid kernel: ata7.00: configured for UDMA/33
     Jul 1 09:25:21 VDUnraid kernel: ata7: EH complete

     Here are a few more from June 28, before the parity check:

     Jun 28 15:15:48 VDUnraid kernel: ata7.00: status: { DRDY }
     Jun 28 15:15:48 VDUnraid kernel: ata7.00: failed command: READ FPDMA QUEUED
     Jun 28 15:15:48 VDUnraid kernel: ata7.00: cmd 60/40:e8:90:ed:75/05:00:1a:00:00/40 tag 29 ncq dma 688128 in
     Jun 28 15:15:48 VDUnraid kernel: res 40/00:00:18:f5:75/00:00:1a:00:00/40 Emask 0x10 (ATA bus error)
     Jun 28 15:15:48 VDUnraid kernel: ata7.00: status: { DRDY }
     Jun 28 15:15:48 VDUnraid kernel: ata7.00: failed command: READ FPDMA QUEUED
     Jun 28 15:15:48 VDUnraid kernel: ata7.00: cmd 60/48:f0:d0:f2:75/02:00:1a:00:00/40 tag 30 ncq dma 299008 in
     Jun 28 15:15:48 VDUnraid kernel: res 40/00:00:18:f5:75/00:00:1a:00:00/40 Emask 0x10 (ATA bus error)
     Jun 28 15:15:48 VDUnraid kernel: ata7.00: status: { DRDY }
     Jun 28 15:15:48 VDUnraid kernel: ata7: hard resetting link
     Jun 28 15:15:53 VDUnraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Jun 28 15:15:54 VDUnraid kernel: ata7.00: configured for UDMA/33
     Jun 28 15:15:54 VDUnraid kernel: ata7: EH complete
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: exception Emask 0x10 SAct 0xff00 SErr 0x10002 action 0xe frozen
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: irq_stat 0x00400000, PHY RDY changed
     Jun 28 15:16:49 VDUnraid kernel: ata7: SError: { RecovComm PHYRdyChg }
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: failed command: READ FPDMA QUEUED
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: cmd 60/40:40:f0:8a:5f/05:00:1b:00:00/40 tag 8 ncq dma 688128 in
     Jun 28 15:16:49 VDUnraid kernel: res 40/00:48:30:90:5f/00:00:1b:00:00/40 Emask 0x10 (ATA bus error)
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: status: { DRDY }
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: failed command: READ FPDMA QUEUED
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: cmd 60/38:48:30:90:5f/02:00:1b:00:00/40 tag 9 ncq dma 290816 in
     Jun 28 15:16:49 VDUnraid kernel: res 40/00:48:30:90:5f/00:00:1b:00:00/40 Emask 0x10 (ATA bus error)
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: status: { DRDY }
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: failed command: READ FPDMA QUEUED
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: cmd 60/b8:50:68:92:5f/01:00:1b:00:00/40 tag 10 ncq dma 225280 in
     Jun 28 15:16:49 VDUnraid kernel: res 40/00:48:30:90:5f/00:00:1b:00:00/40 Emask 0x10 (ATA bus error)
     Jun 28 15:16:49 VDUnraid kernel: ata7.00: status: { DRDY }
  6. Version: 6.11.5. I've had Unraid since 2017 and have run monthly parity checks without problems. I have a 3-drive array. I've replaced drives to upgrade once or twice, so the drives aren't too old.

     On June 28 I popped the case off because I wanted to replace a drive with a larger one. First I ran a parity check. I left the case off because I've not had a problem running with the case off before, but apparently I also hadn't run a parity check with the case off. The drives warmed up to 49-50C during the parity check. I didn't bother looking at the results when the check finished because I got busy with the weekend, but apparently there were 3 errors. Then, on July 1 (a few days later), my system ran its monthly parity check, case still off; yep, I just ran everything with the case off and forgot about it while I enjoyed the weekend. Apparently 2 more errors.

     These are literally the first parity errors I've had since going with Unraid; the dates below are when the checks completed:

     Parity-Check 2023-07-01, 15:06:21 (Saturday)  6 TB  15 hr, 6 min, 19 sec   110.4 MB/s  OK  2 ERRORS
     Parity-Check 2023-06-29, 05:10:01 (Thursday)  6 TB  14 hr, 23 min, 24 sec  115.8 MB/s  OK  3 ERRORS
     Parity-Check 2023-06-01, 14:00:56 (Thursday)  6 TB  14 hr, 55 sec          118.9 MB/s  OK  0 ERRORS
     (All prior checks, all the way back to 2017, had zero errors.)

     During the parity check I also got emails saying my drives were hot:

     Event: Unraid Disk 2 temperature / Subject: Warning [VDUNRAID] - Disk 2 is hot (47 C)
     Event: Unraid Parity disk temperature / Subject: Warning [VDUNRAID] - Parity disk is hot (46 C)

     I later received emails that the drives returned to normal temperatures. I didn't see any of the emails until today because, like I said, I got busy this weekend and didn't look at the server or any email.

     Well, do I have a problem now? Was the heat to blame for the parity errors? What should I do now? All drives show healthy SMART. (A small temperature-check sketch is included after this list.) I'm actually building a second Unraid server and was going to put a new SATA controller in this box and move its current SATA controller to the one I'm building, so I'm glad to see the problems now before I started all the changes, but I really need advice. I haven't rebooted in 6 months. I just started a new parity check, but unchecked the "fix errors" box to see if I get a clean parity check... and with the case on this time, so I should get good air flow.

     If anyone can provide advice or insight, I've attached the log and would be grateful. Should I just say "OK, the parity errors are fixed" and watch things? Thank you in advance! vdunraid-diagnostics-20230702-1704.zip
  7. Maybe I'm not understanding things quite right, but I don't see Java 17, and I'm on the latest version according to Unraid. I'm trying to create and run a 1.18 world. (A small version-check sketch is included after this list.)
  8. @itimpi Thank you for pointing out the obvious. You are correct: I did not press the Balance button. I thought it was more of an information bubble and that "Done" would apply all my changes. I've just tried tapping that button, let it run, and success. Thank you again.
  9. Sorry to be dense, but I'm not sure if I'm doing this right, and it didn't work for me. I go into the first pool drive, named "cache", and it looks like this: [screenshot]. A little further down there is the "Balance" section, and I selected it. Then I go to the bottom of the page, select Done, and nothing changes. Below is an image of the second cache drive; note it doesn't say btrfs. Not sure if that means anything or not. The full diagnostics are linked in the top post. Not sure how to proceed. Please help. Thank you!
  10. A few days ago I shut down my Unraid server to remove a failing 3TB drive from my array (the cache drives were not failing; they were fine). I had to unplug a lot of things to remove the failed drive, and when booting Unraid back up I forgot to plug in one of my two 240GB SSDs, which are cache drives in a RAID1 through the Unraid software. As soon as I brought the server up I had the "oh shoot!" moment, so I shut down again and plugged in the SSD I had missed. When I did that I received an error:

      Cache Drive error - Unmountable: Too many missing/misplaced devices

      Along with this, none of my dockers were present. "Oh shoot!" again. So I posted asking for help on this thread and waited, knowing I'd get good advice, and I did. Following that thread, my cache was once again up and running and all my dockers (and their data) were back! Hooray, I thought. However, today I noticed my two cache drives are not in any RAID; instead they are now pooled. Everything seems to be working today, but I'd like the cache drives protected with RAID. If I click the "view" button on the cache I see this: [screenshot]

      So here is my question: I want to re-RAID these two cache drives without losing any data on the cache. I obviously broke the RAID configuration when I booted with one drive unplugged, and when I put them both back into the pool they are now "pooling" and not in a RAID. What is the safest way for me to set RAID back up without losing data on my cache? (See the btrfs balance sketch after this list.) I've attached my diagnostics. (Note: unrelated to the cache issue, I'm currently rebuilding the bad drive that took me down this path onto a 4TB, which will probably show up in the diagnostics.) vdunraid-diagnostics-20210528-2311.zip
  11. Thank you so much for responding so quickly. I did exactly as you said, and all has returned to normal!! I don't see any "buy me a beer" link but if you reply with one I'd be so happy to!!
  12. Please help if you can. I had two cache drives (PNY 240GB SSDs) in a cache pool and needed to install another 4TB platter drive. I forgot to plug one of the cache drives back in after adding the physical 4TB drive, and as soon as I booted up, Unraid gave me an error about the missing cache drive. So I shut down Unraid and plugged it in. Now I've booted and I get a single cache drive in the pool with the error "Unmountable: Too many missing/misplaced devices". All my dockers are missing, along with the docker data, specifically some Minecraft worlds my kids had worked on for a long time. Unraid also started a parity check which won't complete for 6 hours.

      Unmountable disk present: Cache • PNY_CS1311_240GB_SSD_PNY52162201520201586 (sdd)

      The other cache drive is sitting down in Unassigned Devices and I could mount it, but I'm afraid to do anything at the moment since I don't want to lose those Minecraft files. Any advice would be appreciated. I want to (1) get the Minecraft files back and (2) if possible have it boot up with both drives in the pool like it was before (not possible?). Maybe I plugged the cache drives back into the wrong SATA ports? Is that why it says "misplaced devices"? I don't know how to proceed.
  13. TL;DR: Check your Windows 10 driver to ensure you have the up-to-date one from the manufacturer and NOT the random Microsoft NIC driver. I found this thread through a web search because I was getting the same error code when transferring a 100GB file. I tried all the things, then finally looked at the driver on my Windows 10 Dell system. It's a Realtek PCIe GbE Family controller, and guess what: the driver loaded on my Win 10 rig was provided by Microsoft and dated 2015. I run Dell's SupportAssist monthly, and this software is SUPPOSED to keep my drivers fresh (and often does), but apparently it ignores the NIC driver, because when I visited https://support.dell.com for my system I found a driver provided by Realtek on the support page (dated 8/11/2017). After loading this driver the error went away, and the transfer rate stays solid instead of fizzling to nothing and looking like an occasional heartbeat. This kind of issue can have many causes, but in my case it was definitely a problem on the Windows 10 side of the house, not the Unraid side. (A quick driver-provider check is sketched after this list.)
  14. OK, thanks again for your help. I first suspected a thunderstorm last night, but here's what actually happened. Even though I did this to myself, I thought I'd post for the future person with this error: I was running a Minecraft server in a docker, and every day it writes a new 1.25GB backup file to my cache drive. My cache drives are 240GB, I forgot about this, and the cache drive filled up. Stupid me. Thanks so much for your help. I deleted a bunch of wasted backups and I'm up and running again! I don't know how to point these backup files to my array, where I have 4TB, because the docker runs from the cache drive for speed. (A small pruning sketch is included after this list.)
  15. OK, thanks! I am able to boot into safe mode (I picked no GUI), and I can bring up the web GUI on my laptop over the network. It says "Array Stopped - autostart disabled (safe mode)". So I restarted the array from safe mode and it started OK. The dockers aren't working ("service failed to start"), but maybe that is because I'm in safe mode. I ran diagnostics; does anyone see anything? I haven't tried starting the array outside of safe mode yet. I'd appreciate further advice.
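
The sketches referenced in the posts above follow. For the dmesg/syslog-scanning question in post 1: a minimal sketch, assuming the standard Unraid syslog location (/var/log/syslog) and the stock notification helper at /usr/local/emhttp/webGui/scripts/notify; both the path and the notify flags are assumptions to verify on your own system before relying on it.

```python
#!/usr/bin/env python3
"""Sketch: scan the syslog for kernel lines that usually mean drive or cable
trouble and raise an Unraid notification if any are found. The syslog path,
the notify script path, and its flags are assumptions to verify."""

import re
import subprocess

SYSLOG = "/var/log/syslog"                           # assumed syslog location
NOTIFY = "/usr/local/emhttp/webGui/scripts/notify"   # assumed Unraid notify helper

# Substrings that typically indicate SATA/communication problems.
PATTERNS = re.compile(r"ATA bus error|hard resetting link|I/O error", re.I)

hits = []
with open(SYSLOG, errors="replace") as f:
    for line in f:
        if "kernel:" in line and PATTERNS.search(line):
            hits.append(line.rstrip())

if hits:
    # Surface a summary plus the first few matching lines as a warning.
    subprocess.run([
        NOTIFY,
        "-e", "Syslog scan",
        "-s", f"{len(hits)} suspicious kernel log lines found",
        "-d", "\n".join(hits[:5]),
        "-i", "warning",
    ], check=False)
else:
    print("No suspicious kernel log lines found.")
```

Scheduled via cron or the User Scripts plugin, something along these lines would flag the kind of ata7 bus errors shown in post 5 weeks before a parity check stumbles over them.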
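For post 4's question of which disk is ata7: on a typical Linux sysfs layout the resolved path of /sys/block/sdX includes the ataN port the disk hangs off, so the mapping can be printed directly. A sketch under that assumption; confirm the layout on your own hardware.

```python
#!/usr/bin/env python3
"""Sketch: map SATA ports (ataN) to block devices by inspecting sysfs.
Assumes the usual Linux layout where the resolved path of /sys/block/sdX
contains the ataN port name; check this on your own system."""

import glob
import os
import re

def ata_port_for(sys_block_path):
    """Return e.g. 'ata7' for '/sys/block/sde', or None if not a SATA device."""
    real = os.path.realpath(sys_block_path)
    match = re.search(r"/(ata\d+)/", real)
    return match.group(1) if match else None

for blockdev in sorted(glob.glob("/sys/block/sd*")):
    name = os.path.basename(blockdev)
    port = ata_port_for(blockdev)
    print(f"/dev/{name}  <-  {port or 'no ata port found'}")
```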
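For post 5, a quick per-port tally of failed commands and link resets makes it obvious which SATA port is misbehaving and how often; again this assumes the syslog lives at /var/log/syslog.

```python
#!/usr/bin/env python3
"""Sketch: tally failed commands and link resets per ata port from the
syslog. The syslog path is an assumption (standard Unraid/Linux location)."""

import re
from collections import Counter

SYSLOG = "/var/log/syslog"  # assumed location

failed = Counter()   # "ataN.00: failed command" occurrences
resets = Counter()   # "ataN: hard resetting link" occurrences

with open(SYSLOG, errors="replace") as f:
    for line in f:
        m = re.search(r"kernel: (ata\d+)", line)
        if not m:
            continue
        port = m.group(1)
        if "failed command" in line:
            failed[port] += 1
        elif "hard resetting link" in line:
            resets[port] += 1

for port in sorted(set(failed) | set(resets), key=lambda p: int(p[3:])):
    print(f"{port}: {failed[port]} failed commands, {resets[port]} link resets")
```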
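Post 6's parity errors coincided with drives reaching 49-50C, so a periodic temperature check is worth having. A sketch using smartctl from smartmontools; the device list and the 45C threshold are placeholder assumptions, adjust them to the actual array.

```python
#!/usr/bin/env python3
"""Sketch: read drive temperatures with smartctl (smartmontools) and flag
anything over a threshold. The device list and threshold are placeholder
assumptions; adjust them to your own array. Needs root."""

import subprocess

DEVICES = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]  # example devices
THRESHOLD_C = 45  # warn above this temperature

for dev in DEVICES:
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    temp = None
    for line in out.splitlines():
        # Attribute 194 (Temperature_Celsius) carries the current temperature
        # in the RAW_VALUE column (10th field of the attribute table).
        if "Temperature_Celsius" in line:
            fields = line.split()
            if len(fields) >= 10 and fields[9].isdigit():
                temp = int(fields[9])
            break
    if temp is None:
        print(f"{dev}: no temperature attribute found")
    else:
        status = "HOT" if temp >= THRESHOLD_C else "ok"
        print(f"{dev}: {temp} C ({status})")
```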
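For post 7: Minecraft 1.18 requires Java 17 or newer, and a quick way to see what runtime is actually available is to parse `java -version` wherever the server runs (for example inside the Docker container). A small sketch:

```python
#!/usr/bin/env python3
"""Sketch: check whether the available Java runtime is new enough for
Minecraft 1.18 (which needs Java 17+). Run it wherever the server runs,
e.g. inside the Docker container."""

import re
import subprocess

# `java -version` prints its banner to stderr on most JDKs.
out = subprocess.run(["java", "-version"],
                     capture_output=True, text=True).stderr

m = re.search(r'version "(\d+)', out)
if not m:
    print("Could not determine the Java version:\n" + out)
elif int(m.group(1)) >= 17:
    print(f"Java {m.group(1)} found: OK for Minecraft 1.18")
else:
    print(f"Java {m.group(1)} found: too old, 1.18 needs Java 17+")
```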
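For post 10's question about getting the cache pool back to RAID1 without losing data: posts 8 and 9 show the GUI Balance button eventually did it, and the command-line equivalent on a btrfs pool is a balance with conversion filters. A sketch assuming the pool is mounted at Unraid's default /mnt/cache and btrfs-progs is available:

```python
#!/usr/bin/env python3
"""Sketch: convert a two-device btrfs pool to RAID1 for both data and
metadata. The mount point is an assumption (Unraid's default cache path);
the Unraid GUI Balance button is the supported way to do the same thing."""

import subprocess

POOL = "/mnt/cache"  # assumed mount point of the cache pool

# Show the current data/metadata profiles (single, raid1, ...).
subprocess.run(["btrfs", "filesystem", "df", POOL], check=True)

# Rebalance, converting data and metadata to RAID1 across the pool devices.
subprocess.run(
    ["btrfs", "balance", "start", "-dconvert=raid1", "-mconvert=raid1", POOL],
    check=True,
)

# Confirm the new profiles afterwards.
subprocess.run(["btrfs", "filesystem", "df", POOL], check=True)
```

The pool stays mounted and usable while the balance runs; it just takes a while on larger devices.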
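For post 13: one way to check which vendor and date a Windows NIC driver comes from, without clicking through Device Manager, is to query the Win32_PnPSignedDriver WMI class. A sketch to run on the Windows client; the "Realtek" name filter is just an example.

```python
#!/usr/bin/env python3
"""Sketch: list NIC driver provider/version/date on a Windows client by
querying Win32_PnPSignedDriver through PowerShell. The device-name filter
is an example; adjust it to your adapter."""

import subprocess

ps_command = (
    "Get-CimInstance Win32_PnPSignedDriver | "
    "Where-Object { $_.DeviceName -like '*Realtek*' } | "
    "Select-Object DeviceName, DriverProviderName, DriverVersion, DriverDate | "
    "Format-List"
)

result = subprocess.run(
    ["powershell", "-NoProfile", "-Command", ps_command],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)
```

If the provider comes back as "Microsoft" with an old date, the vendor's own driver from the PC maker's support site is the likely fix, as described in the post.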
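For post 14's cache filling up with daily 1.25GB backups: a small pruning pass that keeps only the newest few backups would stop the pool from filling again. The backup directory, file pattern, and keep count below are hypothetical placeholders; point them at wherever the docker actually writes its backups.

```python
#!/usr/bin/env python3
"""Sketch: keep only the newest N Minecraft backups so they cannot fill the
cache pool. BACKUP_DIR, the glob pattern, and KEEP are placeholders."""

from pathlib import Path

BACKUP_DIR = Path("/mnt/cache/appdata/minecraft/backups")  # hypothetical path
PATTERN = "*.zip"   # hypothetical backup file pattern
KEEP = 10           # number of newest backups to keep

backups = sorted(BACKUP_DIR.glob(PATTERN),
                 key=lambda p: p.stat().st_mtime,
                 reverse=True)

for old in backups[KEEP:]:
    print(f"Deleting old backup: {old}")
    old.unlink()
```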