akshunj

Everything posted by akshunj

  1. You'll need to attach logs so we can see what's happening: Tools --> Diagnostics --> Download
  2. New ssd is installed. Picked up an ADATA SU720. Let it sit for a few hours, then transferred my dockers and vms. Zero ata errors. Not sure how to mark this thread solved, but it is (knock on wood). The moral of the story, I think, is not to buy refurbished drives on ebay. Thanks to everyone who pitched in to assist. I really appreciate it.
  3. Issue is definitely related to SSD and controller interaction. 72 hours running without the ssd and no ata errors. All dockers running as before. Will update when my new ssd arrives in a couple of days.
  4. Removed both ssd cache drives. Rebooted and restarted the array. Logs are clean. No mention of the ata kernel error. Yet. Going to recreate my docker image and bring the dockers back up this eve. New ssd arrives next week. Will install it as cache when it arrives and give it another go.

     It looks like the controller passed incorrect state info about multiple drives to the kernel, although it was instigated by the ssd drive. I am going to assume @JorgeB is correct that certain drive models are no good for certain controllers (at least the ones in my signature). I find this odd, as one controller is for a server and the other is for a desktop. It's also weird that the drives worked in my dock, but not attached via internal sata.

     I looked up my purchase on ebay. These were *refurbished* drives. This was my mistake; I could have sworn they were new. I am hoping this is the root of my ACTUAL problem. Regardless, I will be using the warranty. The drive I ordered is new, so maybe that plus the new make/model will work out. Still going to monitor the array for ata errors (see the syslog-monitoring sketch after this list). The crash a few hours ago was *rough*.
  5. Upgraded to the beta and was immediately greeted by this:

         Oct 1 13:59:50 TChalla kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
         Oct 1 13:59:50 TChalla kernel: sas: ata7: end_device-7:0: cmd error handler
         Oct 1 13:59:50 TChalla kernel: sas: ata7: end_device-7:0: dev error handler
         Oct 1 13:59:50 TChalla kernel: sas: ata8: end_device-7:1: dev error handler
         Oct 1 13:59:50 TChalla kernel: sas: ata9: end_device-7:2: dev error handler
         Oct 1 13:59:50 TChalla kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1

     I am going to remove the cache drive, and restore my appdata and dockers onto the array. This issue has never taken out my data drives before. Just need to reassure myself this is still an issue with the cache drive.
  6. This system was my backup server for a month with zero issues. When I swapped in the drives from my main server, this began.
  7. Going to try that now. Server just crashed in spectacular fashion. First the ssd, then two more data drives. It was pretty unbelievable. Attached diagnostics from the latest fiasco: tchalla-diagnostics-20201001-1323.zip
  8. So, after migrating my unraid install to the Xeon server and plugging the ssd cache drives into the sata bays, the same error has returned with a vengeance. I have attached my logs in case anyone wants to take a peek. The fun happens on 9/30 around 1530 or so, if memory serves. I realize this is not particularly an unraid issue, as it really has nothing to do with the array, the software or anything other than how the kernel interacts with the controller and the device.

     I took the array offline and ran an extended smart test on the drive that got kicked off (it passed; see the smartctl sketch after this list). I pulled the drive last night to see if the remaining ssd would generate similar kernel errors on its own. Shortly after I restarted the array, I got this:

         Sep 30 21:31:36 TChalla kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
         Sep 30 21:31:36 TChalla kernel: sas: ata16: end_device-1:0: cmd error handler
         Sep 30 21:31:36 TChalla kernel: sas: ata16: end_device-1:0: dev error handler
         Sep 30 21:31:36 TChalla kernel: sas: ata9: end_device-1:2: dev error handler
         Sep 30 21:31:36 TChalla kernel: sas: ata10: end_device-1:3: dev error handler
         Sep 30 21:31:36 TChalla kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1

     The drive remained alive and connected, and it's fine right now. Here's how I am thinking about this issue:

     1. I am going to continue using the single ssd for now; maybe the drive that got kicked really was bad. However, if it was, I wonder why it worked fine in the sata dock.
     2. These were twin drives... same make and model. If there's a kernel issue with how this drive talks to the controller, maybe a new drive of a different make & model will fix the issue. Already ordered one.
     3. If steps 1 & 2 don't work, I will upgrade to the latest unraid beta and try the new kernel.

     That's every option I can think of at this point. I have replaced cables (power & sata), moved them from one sata controller to another, and the only way I got them to behave was to use a sata to usb3 dock. I am open to any other diagnostic suggestions. I am going to hit the slackware subreddit and see if there may be some insights from those guys. Thanks all.

     tchalla-diagnostics-20200930-1535.zip
     tchalla-smart-20200930-1644.zip
  9. I'm relatively new to unraid, but I do have some linux knowledge. Are you on stable (just curious)? From your syslog, it looks like your cache filesystem has become corrupt (maybe from your first dirty reboot?) and the kernel is squawking about not being able to mount it. I don't think the cache filesystem is mounted in safe-mode, but I could be wrong (again, I am an unraid noob). I think there's a way to repair the file system from safe mode using "maintenance mode" on the webgui (see the filesystem-check sketch after this list), but someone else would have to give you more guidance. I'm also not sure if this is your only problem.
  10. Update: getting the ssd drives off sata and onto usb3 has fixed the issue. USB3 is NOT a great alternative at all, but everything is stable and cache performance is... fine. I just bought a used xeon server and intend to migrate my drives over to this and upgrade to the unraid beta. I am curious to see if my SSD's will fare better on the new mobo and upgraded kernel. Sidenote: Digging into the unraid logs, I noticed a few Slackware references. I learned linux on Slackware 20 years ago. I think it's cool as hell that I'm using it today via unraid.
  11. Hmmm, I think I will check it out. How stable are the unraid beta releases?
  12. So, after changing power cables, sata cables, sata ports, power supply, UPS, and testing ram, I could only conclude that this linux kernel + these ssd drives (Crucial sata) do not play well together. I have switched the ssd drives to a usb3 sata dock and no issues so far. At some point, I will upgrade the ssd drives and try again via sata, but that time is not now. Thinking out loud, I know unraid uses the older LTS linux kernel. But I have no idea how bug fixes and security updates to that kernel are handled. I hooked the ssd's up to my home theater plex box via sata, and not a single error (different mobo, so obviously not apples to apples). It runs the latest version of linux mint with kernel 5.4. I wonder if some of the ssd issues that many experience on this forum are related to the older kernel version used by unraid.
  13. I hear that. I am going to backup appdata regularly and give it a spin. I'm not wild about buying new SSD's
  14. Thanks so much. The more I investigate, the more I think this is a hardware issue or kernel bug. That ATA error is from the kernel. I know my way around linux, but I'm still an unraid noob. After some googling, it appears this particular kernel error seems related to SSD's specifically. It's driving me nuts. Memtest was fine. Cables have been switched out. I had them on a pci-e sata card before this, and I removed the card and attached them to the sata mainboard thinking that the pci-e card was the issue. The hard drives seem fine, so I am left with two new-ish SSD's that are throwing out this kernel error at random times after boot. And both drives pass SMART tests. I am flummoxed. I am going to move everything back to the array, pull the SSD's from the sata connection, and try to use them as cache from a usb sata dock. I know throughput will be choked, but I am not sure how much that will translate to a real-world performance hit
  15. I am going to do a memtest to check the RAM first, then rebuild the docker image. There are so many posts about this type of issue. Not sure how to proceed
  16. So, same error just happened. This time affecting both cache drives in the pool. I did extended SMART tests on both drives and they passed. Moving on to the docker rebuild. New diagnostics attached, in case anyone wants to take a stab... tower-diagnostics-20200820-1649.zip
  17. changed out sata and power cables. Will rebuild docker image next, if issues continue. Both the SSD's in the cache pool are new-ish, so I am hoping it's not a drive issue.
  18. Thank you. I will give this a shot first and reply back here. Should I try rebuilding the docker image also, as it seems susceptible to corruption?
  19. Hello all. Every few days, my dockers have dropped out and I am getting cache drive errors (I have rebooted to recover). After reading on this forum, I see that it could be a docker image corruption (I had a few dirty shutdowns in the past), a bad cable, or a bad cache drive. Wondering if I can get an assist in diagnosing? Logs are attached tower-diagnostics-20200819-0807.zip
  20. Thanks for the replies. I could not preclear the drive from my dock. The script would not run. But your replies helped me to educate myself a little more about the preclear script and what it's for. Thanks again
  21. Thanks for the reply! Is it relevant that I can get SMART data off the disk via the dock? Anywho, I will give it a spin. Worst case is that I need to preclear it again. Thanks again...
  22. Hello all. I have a usb3 Sata dock (https://amzn.to/3fokjTd) that I use to test out/explore old or used hard drives. I recently found a usable 4tb drive I want to incorporate into my array. Can I preclear it while it's in the dock (after formatting it to xfs)? Well I know I can preclear it, but will the preclear be persistent once I remove it from the dock and add it to my array via my internal sata bay? Or should I connect it to my internal sata bay first, and then preclear it? Hope that makes sense. Thanks
  23. I have been trying to adapt the Whipper docker for unRAID, and have failed miserably. @rix if you could add it to the Ripper docker, that would be a huge win. Thanks
  24. Running the unRAID terminal command docker run whipperteam/whipper (args) works as expected, except it spawns those additional containers with the wacky auto-generated names (see the sketch below). Starting the docker from the unRAID UI does not generate any error messages. It simply refreshes to the same screen, now showing the docker is stopped. And that makes sense, as whipper is a really simple command-line app. There's nothing to start. You would only invoke it when you're ready to rip a CD. There's no daemon waiting for additional input.
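
A minimal sketch of the kind of invocation discussed in the last post, not a confirmed recipe: it assumes the optical drive appears at /dev/cdrom, uses a hypothetical output share at /mnt/user/music, and assumes the whipperteam/whipper image's entrypoint is the whipper binary itself. The --rm flag removes the container when the rip finishes, which avoids the pile of stopped containers with auto-generated names.

     # Assumptions (not from the original post): drive at /dev/cdrom,
     # output share at /mnt/user/music, image entrypoint = whipper.
     docker run --rm -it \
       --device=/dev/cdrom \
       --mount type=bind,source=/mnt/user/music,target=/output \
       whipperteam/whipper cd rip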
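
For the "monitor the array for ata errors" step mentioned in post 4, a minimal sketch, assuming a stock Unraid install where the live log is written to /var/log/syslog:

     # Follow the live syslog and surface the sas/ata error-handler lines quoted above.
     tail -f /var/log/syslog | grep -Ei 'sas:|ata[0-9]+:'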
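
The extended SMART tests mentioned in posts 8 and 16 can also be run from the console; a sketch, with /dev/sdX as a placeholder for the drive being tested:

     smartctl -t long /dev/sdX   # start the extended (long) self-test; it runs inside the drive
     smartctl -a /dev/sdX        # check back later for the test result and full attribute table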
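
On the cache-filesystem repair mentioned in post 9: Unraid cache pools are commonly btrfs, and assuming that's the case, a read-only check from Maintenance mode or the console would look roughly like this, with /dev/sdX1 as a placeholder for the cache device's partition:

     # Read-only check: reports problems without modifying the filesystem.
     btrfs check --readonly /dev/sdX1
     # A repair pass (btrfs check --repair) is riskier and is best attempted only
     # after backing up and getting guidance, as that reply suggests.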