geofbennett

Everything posted by geofbennett

  1. Just as a follow-up, and to close this out in case anybody has the same or a similar problem in the future... I replaced the parity drive, restarted, and rebuilt parity over 30 days ago. Every weekly parity check since has come back with zero errors. I also reset the data drive that had reported errors, and it has not reported any problems since (though I have a feeling it will before long). The attached diagnostics no longer show any of the ATA links as slow to respond. Thanks again to everybody for their help. the-dark-tower-diagnostics-20220922-1035.zip
  2. Thanks again. After resetting everything back the way it was before (including the original cables), I see that ATA6 is showing the "slow to respond" message, as well as some other messages that are not mentioned for any of the other ports. OK, the new drive should be here Friday. Once it is installed, should I check the diagnostics before or after rebuilding parity? Or both? the-dark-tower-diagnostics-20220817-1704.zip
  3. Swapped Disk 3 and Parity. I used the new cables, but I couldn't switch the cables at the motherboard; I had to switch them at the drives, because one of the plugs has a 90° bend and the other port on the board is obstructed. For my own edification, which log and which details are we looking at for the ATA errors? (Only if you can explain without too much effort, that is. I'm so grateful for the help, and I want to learn more, but I don't want to put you guys out any more than I have to.) the-dark-tower-diagnostics-20220817-1409.zip
  4. Replaced both with the last 2 cables I had. Disk 3 is in an ICY DOCK FatCage MB155SP-B. Parity is connected directly to the motherboard. Would it help to move Disk 3 from its current slot in the cage into a different slot, to see whether the ATA errors follow the disk or stay with that slot?
  5. I'm glad I asked. Diagnostics attached the-dark-tower-diagnostics-20220817-1108.zip
  6. Thanks. Just so I'm understanding correctly, the steps are:
     1. Replace cables
     2. Restart and run a parity check (with corrections?)
     3. Post diagnostics
  7. Which is why I'm kinda cool with only 1 drive having errors, but I have 2 drives with errors (one of them the Parity drive), which makes me nervous. I understand my data is not on the parity drive, but it is not the only drive showing errors.
  8. It was my understanding that the Parity drive is what enables you to rebuild a Data drive if it should fail, but if you only have a single parity drive and multiple drives fail at the same time then you will lose data. Is that not true?
  9. Sorry about that... Yes, Disk 3 is the one that reported errors last month. Any ideas about the error report for the parity drive? Or should I just not worry about it? That's what's kind of concerning to me. I'm cool with waiting a bit to see if more errors appear if it is only one disk, but since there are 2 and one of them is Parity, I'm getting a little nervous.
  10. /Settings/DiskSettings - Default spin down delay is already set to "Never". Is there another way to disable spin-down? I've set the parity check scheduler's "Write corrections to parity disk:" option to "No", but the check mark next to "Write corrections to parity" on Main does not go away. I suspect that check mark only applies when I start a parity check manually, outside the schedule. Is that correct? Diagnostics attached the-dark-tower-diagnostics-20220816-1520.zip
  11. About a month ago my server warned 4 times over 3 hours of “[5] reallocated sector ct” and once of “[187] reported uncorrect”. At the same time it was running its parity check, at the end of which it also reported parity errors. (I have it run a parity check with corrections weekly, and this was the first time it EVER reported parity errors.) Understanding that the Data drive is probably on its way out the door, I acknowledged the errors and restarted the server to clear the warning and see how long it would take to report more errors. That disk has not reported any more errors since then. However, the parity check has reported errors each week since. Then yesterday, before the parity check was finished, it sent me the following: “Event: Unraid array errors Subject: Warning [THE-DARK-TOWER] - array has errors Description: Array has 1 disk with read errors Importance: warning Parity disk - ST4000VN008-2DR166_ZDHAF8R2 (sdg) (errors 134)” Running extended SMART tests on all Data disks (including the one that reported errors) returns “Completed without error”. However, the SMART test on the Parity drive ends with “Interrupted (host reset)”. All disks are 4TB Seagate IronWolf NAS drives. The drive that gave the initial errors was put into service in May 2018 (it currently shows 37245 power-on hours). The Parity drive was only installed last November (it currently shows 6715 hours and is thankfully still under warranty). (Also note that the Parity drive has a green thumbs-up and says “Healthy” on the dashboard.) Am I looking at replacing both drives? Any opinions on whether errors on the Data drive or problems with the Parity drive have been causing the parity errors? Any suggestions on further tests of the Parity drive to aid in a warranty claim? Thanks for your help.
  12. For the record, replacing the bz files as described got my system back online. Thanks.
  13. I am having this same problem. I'd like to know exactly which suggestion solved it. Was it overwriting the bz files? Also, how did you do the power cycle? Clicking the "Reboot" button in the webGui does nothing for me, so is there another way, or do I just pull the plug? Thanks.
  14. Tried to download diagnostics and got this message
  15. Just tried again and was able to get to the Dashboard and Home screens.
  16. Sorry, I didn't think to take a screenshot before I closed the web browser. Like I said, I don't know scripting languages. With every new thing I try I learn a little more, but removing the backslash was not mentioned on that page. I'll remember that if I try this again, but I have a feeling that once I've fixed this, I will just delete old stuff manually from now on.
  17. Let me begin by saying I know practically nothing about scripting languages. I’ve been pretty lucky in figuring out how to make my unRAID do what I want, but this problem is blowing my mind. I have the User Scripts plugin installed on my server, running one script every night that deletes any video files over 30 days old from my security camera share. It has worked great for a long time. Now I’ve been trying to add a line to the script to do the same thing for files in my PLEX DVR recorder share. This is the line I was working on:
      find /mnt/user/Media\ NAS/DVR\ Library/Judge\ Judy\ (1996)/* -type d -ctime +30 -exec rm -rf {} \;
I have figured out that the line is failing because of the parentheses in the folder name, and I was trying to figure out how to fix it. I found a post that suggested putting quotes around the path, so I tried that:
      find ‘/mnt/user/Media\ NAS/DVR\ Library/Judge\ Judy\ (1996) /’ * -type d -ctime +30 -exec rm -rf {} \;
After I ran the script, the page changed, showing a series of 3-4 warnings at the top, and I could no longer edit the script. When I went to the Dashboard or Home, the pages were displaying warnings and everything seemed out of order. I tried clicking “Reboot”, but nothing happened. I closed the web browser and tried to log back in, but after entering my user and password I get a webpage with only two lines on it:
      Warning: exec(): Unable to fork [logger -t webGUI 'Successful login user root from 10.0.1.28'] in /usr/local/emhttp/login.php on line 97
      Warning: Cannot modify header information - headers already sent by (output started at /usr/local/emhttp/login.php:97) in /usr/local/emhttp/login.php on line 98
Fortunately, the server otherwise seems to be working OK. I can access the shares, and the dockers I use, like Krusader, Plex, and Deluge, are still working and accessible. It seems to be just the WebGUI that has been affected. Any help would be very useful.
  18. The original MB was only $60 4 years ago. In fact, the whole build 4 years ago was only around $300. Buying an older MB now just adds to that cost and leaves me little room to start exploring all the cool new options available to me. I'm just gonna put together a modern bargain system that I can start working with VMs and various applications on.
  19. Thank you all for your input. I must admit that I too am now leaning towards the MB being the most likely problem. Still frustrated because a total rebuild was not exactly in the budget at the moment, but at least now I get to build the system that will take me into the next 4-5 years.
  20. Installed the Dynamix System Temperature plugin. Temp in the closet is still 70F. CPU temp is 32C (89.6F). MB temp is 33C (91.4F). Drives are all reporting 27-29C (80.6-84.2F). The system is oriented so the disks are on the bottom and the MB and PS are on top. The case fan points directly upward, so airflow is from bottom to top.
  21. The closet is ventilated and temperature controlled. It is currently 70 deg F in the closet. During the Memtest the CPU never exceeded 41 deg C (105.8 deg F).
  22. Yep, now it's at least trying to restart itself. Moments ago I heard my desktop computer make an alert noise. I rushed to check it and found a warning from my security camera system that it couldn't find the designated storage location. I looked at the telnet log, and nothing had been registered for about 10 minutes. I checked the web GUI and it showed an uptime of only one minute. I clicked on Plugins, then FCP, and troubleshooting mode was now off. I clicked on Dashboard, and this time I got a notice that the webpage could not be found. I went to check the system and it was shut down. I pulled the boot drive and found the attached FCP log. FCPsyslog_tail 22 Feb 1643.txt
  23. Yep, it's purpose-built to be very basic; it only serves as central storage for my media files, backups, and security camera files. The PS was the first thing I replaced. The new one has only been plugged in for a little over a week, and it made no difference. Air circulation and cleanliness: I dusted everything off pretty well while I was replacing the PS. It was not exceptionally dusty, probably less dusty than the last time I replaced a hard drive. I gave everything a good look-over when I put it back together, and the CPU fan, PS fan and case fan were all spinning freely. Now let me tell you about my experience with Memtest... The first time I ran MEMTEST, it finished 1 pass and showed no errors in the memory, but the system rebooted on its own at about the 15-minute mark. Since I had 2 memory chips and 4 slots, I tried running MEMTEST with the chips in every possible configuration to isolate which one might be causing the problem, but no matter how I arranged the chips, MEMTEST always rebooted the system after anywhere from 15 to 30 minutes (never showing any errors in the chips, however). So I ordered some new chips. While waiting for the new memory to arrive, I tried to bring the system back online and discovered that it was now not recognizing the built-in eth0 on the motherboard. The port had blinking green lights and my router recognized that it was there, but the server would not communicate. I found a post in the forums that suggested installing a virgin copy of unRAID on a different drive to see if it boots properly, so I gave that a try. Lo and behold, the new install of unRAID on a 10-year-old 2GB thumb drive booted right up, no problems. I ran MEMTEST on my original memory chips, in their original configuration, for about 2 hours without any problems.
When the new chips arrived, I installed them in the remaining 2 slots (now I have 4 chips installed) and started MEMTEST, which this time continued for over 12 hours and ran 6 passes on all 4 chips, finding 0 errors. I brought the system online and it ran for about another 39 hours without a problem, so I shut it down, moved it from the workbench back into its closet and brought it back online. That was yesterday. It was up for 24 hours before it shut itself down again. I've had at least 2 shutdowns today, and I noticed that on 1 of them it restarted itself.
  24. Prior to the problems, I had only Community Applications installed, Docker was turned on, and I had ownCloud installed but not turned on. Since I created a virgin unRAID install, I have only Community Applications and FCP. Docker is turned off and I have not reinstalled ownCloud. No VMs before or after.
  25. My system has been suffering from random shutdowns for about the last 4 weeks. It's been happening on average once a day, but I can go 2 or 3 days without one only to have 3 or 4 shutdowns in one day. I've got about 2 pages of notes and over a dozen logs that I've recorded. Most of the time nothing shows up in the tail of the logs, but sometimes it does show what appears to be a controlled shutdown. Summary of what I've done for troubleshooting:
      • Disconnected the UPS and plugged right into the wall
      • Replaced the power supply
      • Tested, then replaced, then added more memory
      • Created a virgin copy of unRAID on a new thumb drive and restored my configs
      • Updated the motherboard BIOS
      • Started FCP Troubleshooting mode after (almost) every restart
      • Telnetted in and started "tail -f /var/log/syslog" after (almost) every restart
      • Been over all the settings, up and down and all around, looking for something out of the ordinary
At the time the problem started, I had recently been trying to set up a share to use as a Time Machine destination for my Mac systems, but couldn't get it to work right. I used MC to go in and delete all the files from that share and went back to using it for a Carbon Copy Cloner backup instead.
The system currently consists of the following:
      • unRAID 6.4.1 (was on a 6-year-old PNY 8GB thumb drive, now on a 10+-year-old 2GB thumb drive of unknown make)
      • ASUS M5A78L-M/USB3 motherboard
      • AMD Sempron single-core 2.8 GHz Socket AM3 CPU
      • Crucial 2GB (2 x 1GB) 240-pin DDR3 SDRAM DDR3 1333 (PC3 10600)
      • A-Tech 2GB kit (1GB x 2) DDR3 PC3-10600 desktop memory modules (240-pin DIMM, 1333MHz) (brand new, purchased as part of troubleshooting)
      • Thermaltake Smart 500W 80+ White Continuous Power ATX 12V V2.3/EPS 12V Active PFC Power Supply PS-SPD-0500NPCWUS-W (the original power supply was a LEPA N Series N500-SA 500W ATX12V Power Supply)
      • 5x Seagate drives (a variety, since I have added more drives and replaced 2 failed drives since building it)
This setup had worked flawlessly since I built it almost 4 years ago. The next step I am thinking about taking is disconnecting the 2 oldest Seagate drives to see if they might be causing serious power drains. (One of the drives is over 10 years old and the other is about 8 years old. The 10-year-old one is always reporting some sort of problem. Neither of them is currently storing any files on the array.) I sure could use some thoughts or suggestions; otherwise, pretty soon I will have just built a brand new system. Thanks.
FCPsyslog_tail 18 feb 1430.txt
FCPsyslog_tail 22 Feb 1100.txt
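For the find line in post 17 above (the Judge Judy folder), here is a minimal sketch of one way to write it so the spaces and parentheses are handled by quoting instead of backslashes. It assumes the script runs under bash. Once the path is in quotes, the backslashes are not needed (inside single quotes they are even taken literally, so the path no longer matches anything), and a "*" left outside the quotes is expanded by the shell against whatever directory the script happens to be running in, not the DVR folder. The helper name prune_old_dirs and the -mindepth 1, -prune and {} + choices are assumptions for illustration, not part of the original script or of unRAID.

```shell
#!/bin/bash
# Sketch only: delete subfolders older than N days (default 30) from a folder
# whose name contains spaces and parentheses.
# prune_old_dirs is a hypothetical helper name, not an unRAID or plugin command.
prune_old_dirs() {
  local target="$1"
  local days="${2:-30}"
  # Double quotes keep the spaces and "(1996)" literal; no backslashes needed.
  # -mindepth 1 leaves the show folder itself alone, -prune stops find from
  # descending into a folder that is about to be removed, and {} + hands many
  # paths to a single rm call instead of running rm once per folder.
  find "$target" -mindepth 1 -type d -ctime +"$days" -prune -exec rm -rf {} +
}

dvr_show="/mnt/user/Media NAS/DVR Library/Judge Judy (1996)"

# Only run if the share path actually exists on this machine.
if [ -d "$dvr_show" ]; then
  prune_old_dirs "$dvr_show" 30
fi
```

To preview what would be deleted, swap "-exec rm -rf {} +" for "-print" and run it once by hand before adding it to the nightly script.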