firrae

Members
  • Content Count

    31
  • Joined

  • Last visited

Community Reputation

0 Neutral

About firrae

  • Rank
    Advanced Member

  1. Well, I can't thank you enough @johnnie.black! After updating the firmware I see 0 read errors and the CRC errors have completely stopped increasing on all drives. Thanks a bunch for pointing this out, I would NEVER have thought of this being an issue.
  2. OK, will try that. Thanks for pointing me in a direction @johnnie.black!
  3. What do you think the path forward would be, then? I'm not sure what I should do. I have multiple disks reporting read errors, but none show issues other than the CRC errors in SMART. Should I stop the rebuild, flash the firmware, and then... what? Rebuild if a parity check goes well?
  4. Interesting. Otherwise, if you don't mind my asking, does the system look fine at a cursory glance? I do still have UNRAID reporting high read errors as well. Other than the glaring "it's rebuilding a drive" thing, of course.
  5. @johnnie.black is that a newer issue or has it been that way since before v6? I only got into UNRAID as v6 was launching and don't remember there being an issue until recently?
  6. Quick update. After I finished writing this I noticed that my drives on plain SATA cables were also getting these errors, but not all of them. The parity drive is reporting 0 over its entire life, but the drive nearest it is showing an increasing number of CRC errors, just climbing more slowly than on the drives on the SAS to SATA breakouts. Maybe this points to a combination of the cables and the cages? I really don't know at this point.
  7. Hi there, After digging around on Google and the forums I believe the issues with my array come down to UDMA CRC errors on a number of my drives, but honestly I'm not sure where to begin looking for the cause. In my eyes, and from reading, I believe it could be one or a combination of 3 things:
     My SAS to SATA cables (maybe they are cross-talking, and the likely candidate?) - I've tried 2 different brands but still get the issue, though both brands' cables looked the same, just slightly different colours. - https://www.amazon.ca/gp/product/B0736J45V2/
     My drive cages: I have a Rosewill RSV-L4412 which came with 3 drive cages (can't remember the part number for them) - https://www.rosewill.com/product/rosewill-rsv-l4412-4u-rackmount-server-case-or-chassis-12-sata-sas-hot-swap-drives-5-cooling-fans-included/
     My SAS controller, which is a Fujitsu (?) card flashed to be an LSI 9211-8i in "IT" mode
     At this point I believe it's the cables, but I'd be interested in hearing what others think. 8 of my disks connect through these breakout cables; the other 4 go directly to the motherboard SATA ports. What I find interesting is that the drives on the breakout cables seem to have the issue much worse, though this is only a short-term observation since I read about it, and the cage that's wired directly currently only has 3 drives in it while the rest are fully loaded with 4. I'm curious which of the potential options people think would serve me best: get different breakout cables, get new drive cages, or change out the controller. In any case I'd be interested in seeing the recommendations people have on this.
     This all comes from seeing what I think are VERY high read error counts as I'm rebuilding my array after changing out a drive. Attached is my diagnostics file from the server. It's in the middle of rebuilding that drive as I mentioned, so whatever decision I make I'm a couple of days away from actually implementing it, at least assuming I can even get the parts at this point. I'm interested to see what people think. Thanks! (A small script for tracking the per-drive CRC counters is sketched after this post list.) tower-diagnostics-20200318-1415.zip
  8. You may have found it. I thought I had updated it, and while I need to go into the BIOS to be sure, this looks like one of the features listed in the second-to-last update they released. I have VT enabled. I will try these BIOS updates if the server crashes; otherwise I'll try them tomorrow night by gracefully shutting it down.
  9. 1) It seems the BIOS is fully updated, but I'll check again.
     2) Brand new PSU; it replaced the original PSU. I tested it on my main PC and it ran it fine for 3 days.
     3) All the RAM is identical and was purchased at the same time.
     4) The RAM meets the mobo's requirements and is within its spec.
     5) This is the one I can't decide about. I have 4 fans in the case: one over the HDDs that's an intake, 1 more intake on the side, and 2 exhaust (back and top). I've had this issue happen with the fans in place and the case closed, and with the side fan removed and the case fully open.
     6) I cleaned pretty much everything before I put it in there. I moved the old PC into a new case and took that time to basically clean everything with compressed air before putting it back in.
     I'll follow up on the BIOS, though. As for heat, I figured there'd be some sort of warning or error log somewhere, but I can't find anything that indicates that. (A quick temperature-dump sketch is included after this post list.)
  10. I have a monitor hooked up to it and was looking at it once when it happened. There was no shutdown procedure, so it was a hard power off; the screen just went black and then showed the BIOS boot screen. That's where my thought of it being the PSU came from originally.
  11. Hi there, I've been trying to solve an issue with my server that seems to baffle everyone I speak to, and at this point I'm running out of possibilities. As the title says, my server randomly reboots and there's seemingly no reason for it. I've captured logs via telnet and looked at them after the crash and found absolutely nothing; the log just stops as if I had pulled the power plug. I checked my docker containers and each time there's no consistent action happening that I can point to. Below is a synopsis of my setup:
      Intel i7-920
      EVGA X58 motherboard
      12GB (6 2GB sticks) of DDR3 RAM
      4 HDDs (2 4TB and 2 2TB)
      1 SSD (an older 128GB Sandisk)
      600W EVGA 80+ Bronze PSU (brand new, as I thought this might be the issue originally)
      I do live in an apartment, so I have a 1500VA APC UPS between the server and the wall. As I said, I can't find any clear thing that causes the reboot, and the only reason I know it happened is that PLEX is no longer available or I hear the beep from the POST succeeding. I have found some potential contributing factors though:
      1) When I'm running no docker containers the server seems fairly stable and stayed on for 24 straight hours, whereas the reboot usually happens after 6 hours (rarely less, but it has happened after only 2 hours before).
      2) At about 6 containers it seems to lag the web UI and then crash shortly after.
      3) SabNZB seems to cause it fairly quickly when it's the only container running.
      During all this I am watching the system stats on the dashboard; only a few cores ever spike to 100% and memory never passed 40%, but there's seemingly no consistency between CPU and RAM usage and the rebooting. (A small sketch for logging per-container usage to the array is included after this post list.) Finally, I have run memtest86 on it and after leaving the test to run for 2 straight days it never found an error, so I've basically ruled out memory. I now have an error with Community Applications, likely corruption (I think this happened when it rebooted in the middle of trying to create a container), but even at the start, when CA was working, I was having this issue. Any help is appreciated as this is basically making it unusable. Edit: I have a telnet session into the server now to try and capture if/when the server reboots; this could take a while though.
  12. @Squid, OK, I'll try that this weekend and maybe move it to a new USB. I hope this might end up solving my stability issues, but from @trurl's comments, I'm not overly hopeful about that part...
  13. Yes, it never registered any power issues, and my other PC that is also connected to it was perfectly fine. To note, it is an APC 1500 VA, so it should be more than enough. At this point, if it's not the USB, I'm at a loss. I've changed out the PSU, run memory checks, and benchmarked the CPU, and all the drives are reporting good health...
  14. Could this also be causing my issue where the server randomly restarts? Also, is there a way to redo the flash drive but keep all my current settings and data? (A small config-backup sketch is included after this post list.) EDIT: Also yes, the same error.
  15. I don't see any "bread" or any other types of errors after the system has come back up. Just tried to use the apps install again and it's the same issue.
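
The CRC troubleshooting in the posts above comes down to watching SMART attribute 199 (the interface CRC error counter) on every drive and seeing which ones keep climbing. Below is a minimal sketch of how that check could be scripted; it assumes smartmontools is available (Unraid ships smartctl) and that the drives show up as /dev/sd? with the usual "UDMA_CRC_Error_Count" attribute name, which is common for SATA drives but not guaranteed.

    #!/usr/bin/env python3
    """Print the UDMA CRC error count (SMART attribute 199) for every /dev/sd? device.

    A minimal sketch, assuming smartmontools is installed and the drives expose
    attribute 199 under the usual "UDMA_CRC_Error_Count" name; adjust as needed.
    """
    import glob
    import subprocess

    for dev in sorted(glob.glob("/dev/sd?")):
        try:
            # -A prints the SMART attribute table for the device.
            out = subprocess.run(
                ["smartctl", "-A", dev],
                capture_output=True, text=True, check=False,
            ).stdout
        except FileNotFoundError:
            print("smartctl not found; install smartmontools first")
            break

        count = "n/a"
        for line in out.splitlines():
            # Attribute rows end with the raw value; 199 is the CRC counter.
            if "UDMA_CRC_Error_Count" in line or line.strip().startswith("199 "):
                count = line.split()[-1]
                break
        print(f"{dev}: UDMA CRC errors = {count}")

Running it before and after a rebuild or parity check makes it obvious which ports or cables the new errors are accumulating on; attribute 199 never resets on its own, so only the change matters.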
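
Post 9 wonders about heat but notes there is no warning or error log to point at. One low-effort check is to read the kernel's hwmon sensors directly; the sketch below only walks the standard /sys/class/hwmon layout, and which temperatures actually appear depends on the sensor drivers loaded (coretemp for the CPU, drivetemp for disks on newer kernels, and so on).

    #!/usr/bin/env python3
    """Dump every temperature the kernel's hwmon subsystem currently exposes.

    A minimal sketch; values are reported by the kernel in millidegrees Celsius.
    """
    from pathlib import Path

    for hwmon in sorted(Path("/sys/class/hwmon").glob("hwmon*")):
        chip = (hwmon / "name").read_text().strip()
        for temp_input in sorted(hwmon.glob("temp*_input")):
            label_file = temp_input.with_name(temp_input.name.replace("_input", "_label"))
            label = label_file.read_text().strip() if label_file.exists() else temp_input.stem
            try:
                millideg = int(temp_input.read_text().strip())
            except OSError:
                continue  # faulted or sleeping sensor; skip it
            print(f"{chip:12s} {label:16s} {millideg / 1000:.1f} C")

Running it periodically (for example from cron or the User Scripts plugin) and logging the output gives a temperature history to line up against the crash times.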
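
For the random reboots described in post 11, it can help to log per-container CPU and memory use to a file on the array so the last samples before a crash survive the reboot. This is only a sketch: it assumes the Docker CLI is present (it is on Unraid when Docker is enabled), and the log path /mnt/user/appdata/container_stats.csv is a hypothetical example to adjust.

    #!/usr/bin/env python3
    """Append per-container CPU/memory usage to a CSV every 30 seconds.

    A rough sketch; LOG_PATH should point somewhere on the array so the data
    survives a crash or reboot.
    """
    import csv
    import subprocess
    import time
    from datetime import datetime

    LOG_PATH = "/mnt/user/appdata/container_stats.csv"  # hypothetical location
    FORMAT = "{{.Name}},{{.CPUPerc}},{{.MemUsage}}"

    while True:
        snapshot = subprocess.run(
            ["docker", "stats", "--no-stream", "--format", FORMAT],
            capture_output=True, text=True, check=False,
        ).stdout
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(LOG_PATH, "a", newline="") as f:
            writer = csv.writer(f)
            for line in snapshot.strip().splitlines():
                name, cpu, mem = line.split(",", 2)
                writer.writerow([stamp, name, cpu, mem])
        time.sleep(30)

If the rows just before each reboot always show the same container spiking, that narrows the suspects; if everything looks normal right up to the cut, power or hardware becomes more likely.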
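
Post 14 asks whether the flash drive can be redone without losing settings. On Unraid the flash is mounted at /boot and the settings (disk assignments, shares, licence key, plugin config) live under /boot/config, so copying that folder off before recreating the stick and copying it back afterwards preserves them. The destination path below is a hypothetical example.

    #!/usr/bin/env python3
    """Copy the Unraid flash config folder to a dated backup before redoing the stick.

    A minimal sketch; restore the folder onto the freshly created flash drive.
    """
    import shutil
    from datetime import date
    from pathlib import Path

    SRC = Path("/boot/config")
    DEST = Path(f"/mnt/user/backups/flash-config-{date.today()}")  # hypothetical destination

    if not SRC.is_dir():
        raise SystemExit("Expected /boot/config on an Unraid flash drive; not found.")

    DEST.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(SRC, DEST)
    print(f"Copied {SRC} -> {DEST}")

Note that the licence key is tied to the flash drive's GUID, so moving to a brand-new USB stick also involves requesting a key replacement rather than just copying the key file across.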