trurl Posted January 20

Looks like you're having connection problems on parity2.

2 hours ago, Decay said: Memtest, but tried it on some different hardware and got many failures.

So that was probably one thing that caused the problems. You must never run a computer unless memory is working perfectly. Everything goes through RAM: the OS and other executable code, your data, everything. The CPU can't do anything with data until it is loaded into RAM. Have you done memtest with your new hardware?
Decay Posted January 20

I'll start with your second question: no, I haven't done it yet, but I've put in the USB stick just now and started the test. As for the connection problem on parity 2, I'll check whether the cables are connected properly. Or should I just swap the cables of the parity drive directly?
trurl Posted January 20

For now, just make sure the SATA and power connections for the disk are seated well. Are there power splitters involved? Be careful you don't disturb other connections.
Decay Posted January 20

Yes, I checked that everything is seated well, and it looks like it is. And yes, power splitters are involved.
Decay Posted January 20 (edited)

I have done that. Both parity drives are connected to the mainboard, as are the cache drive and one drive from the array. The rest are connected to a SATA PCIe card. The SATA card has 4 ports free; I could try plugging the parity drive into that controller.

unraid-diagnostics-20240120-1816.zip

Edited January 20 by Decay
Decay Posted January 20 (edited)

I just checked the diagnostics file, and the problem also shows up on parity drive 1. I haven't seen it on any other drive in the diagnostics file.

Edit: Disk 2 has the same problem. I'll check whether they are all connected to the same power cable.

Edited January 20 by Decay
Decay Posted January 20 (edited)

Disk 2 is on a different power cable than the parity drives, and no power splitter is involved for these drives. Could the power supply be the problem, or would every disk then have the same problem? Could the hard disks be defective?

Edit: If I'm right, drive 1 now has that problem too, or I hadn't noticed it before. I just checked old diagnostics files, and drive 1 already had that problem there as well.

unraid-diagnostics-20240120-1952.zip

Edited January 20 by Decay
Decay Posted January 20 (edited)

Would it be a smart idea to replace the disks showing that failure? If I'm not wrong, I have 4 drives with the problem: both parity drives, plus drive 1 and drive 2 from the array (really old drives, which I wanted to replace anyway). I bought three new drives and could replace both parity drives with new ones. Once parity is rebuilt, I could replace drive 1 of the array and let it rebuild. After that I could copy all data from array drive 2 to the new array drive 1. When that is done, I would also replace the remaining drives with new ones, or try an old parity drive as an array drive.

Why I came to this idea: when I googled the error, "occurred at disk power-on lifetime", I found some reports saying it could be a power problem, but mostly it ended in drive failures. Perhaps it is a really bad idea to replace them.

Edit: Memtest passed without any errors. Shall I post the log file?

Edited January 21 by Decay
JorgeB Posted January 21

I don't see any disk issues in the last two diags posted; where are you seeing errors?
Decay Posted January 21 (edited)

I did not see any errors directly. I did some searching on the internet to narrow down the following error, and I saw the same error messages as these:

Error 127 [6] occurred at disk power-on lifetime: 54389 hours (2266 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.

I therefore assumed it might make sense to replace the disks. I replaced the cables and checked all the plugs again, but couldn't find anything wrong. Some time ago I replaced the power supply unit with a larger one; could it be that the error occurred before that and is now negligible? I have attached the log file from the memtest here.

MemTest86-20240121-013132.log

Edited January 21 by Decay
JorgeB Posted January 21

24 minutes ago, Decay said: saw the same error messages as these:

It depends on the complete error; usually only UNC @ LBA errors are a reason for concern. But I suggest running an extended SMART test on all disks and then posting new diags.
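For anyone who prefers the console over the GUI for this, a minimal sketch with smartctl (the device names below are placeholders; substitute your actual disks, e.g. from the output of `ls /dev/sd?`):

```shell
# Kick off a long (extended) self-test on each disk. The test runs inside
# the drive firmware, so smartctl returns immediately and the drive keeps
# testing in the background.
for dev in /dev/sdb /dev/sdc /dev/sdd; do   # placeholder device names
    smartctl -t long "$dev"
done

# Later, check progress and results of the self-tests:
smartctl -l selftest /dev/sdb

# And inspect the drive's error log, where entries like
# "Error NNN occurred at disk power-on lifetime" come from:
smartctl -l error /dev/sdb
```

These commands only read from or talk to the drive firmware; they don't touch the data on the disks.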
Decay Posted January 21

Okay, I'll do that. I hope the test will finish today, as I have to leave for my job again tomorrow. I'll post results as soon as possible.
Decay Posted January 26

Finally back home. The first parity disk is still running the SMART self-test; it has been running since Monday morning. I knew it could take some time, but I think this is a little too long, isn't it? New diagnostics attached.

unraid-diagnostics-20240126-1151.zip
JorgeB Posted January 26

Disk 2 passed the latest test, but SMART is showing some issues and it failed previous tests, so it may well develop errors in the near future. The test on parity was aborted; try again.
Decay Posted January 26

Just started the SMART test again for both parity drives. Would it make sense to replace the parity drives with new ones? I have three new drives here, and I'll order some for the array as well. I want to shrink the array a bit and also replace the old drives.
JorgeB Posted January 26

Parity 2 finished; you only need to do parity 1. Both parity disks look healthy to me, though they are SMR, and that by itself may be a good reason to replace them.
Decay Posted January 26 (edited)

I accidentally stopped the check on parity 1 and started it again last Monday morning. Today it was still doing the SMART check; it has been stuck at 90% for days. I'm going to stop the check on parity 2 and hope that parity 1 will run through this time.

Edit: I checked the three new drives I got. They are Seagate drives; they should be CMR. Just for my understanding: should I avoid SMR completely, or just for parity drives? Most or all of the used drives are from WD, and they are probably all SMR drives. Should I stop the test, replace the drives, and build parity from scratch? After that I would replace all the other drives. To me that would make sense.

Edited January 26 by Decay
JorgeB Posted January 26

You can use SMR drives, but an SMR drive used as parity will degrade write performance for the whole array, including writes to non-SMR drives.

40 minutes ago, Decay said: Should I stop the test and replace the drives and build the parity from scratch?

If you plan to replace them, you might as well.
Decay Posted January 26

I'm the only one using the server, so the write performance was okay for me, but I think I'll replace the parity drives with non-SMR drives and rebuild parity. It will take some time; I'll report the progress when it is done. The third new drive will replace drive 1 of the array, and then I'll copy all data from drive 2 to drive 1. I just ordered three more drives, which will replace the rest. It is hard to get new ones; most shops in my area are sold out.

@trurl Have you seen the memtest log? It looks good to me, or did I miss something?

MemTest86-20240121-013132.log
Decay Posted January 27

I think parity looks better now; only a short SMART test done so far. I'm wondering whether I should give the Docker containers a try again, or do you see any problems with that?

unraid-diagnostics-20240127-1040.zip
JorgeB Posted January 27

Looks OK to me so far; try using Docker.
Decay Posted January 27 (edited)

Docker failed to start. Could it be that the docker.img, which was recreated a few days ago, is corrupted? It was recreated on the old hardware, which had the RAM problems and then suddenly stopped working completely. To be honest, I don't understand the Docker log file in the diagnostics.

Edit: The Fix Common Problems plugin shows: "Unable to write to cache - Drive mounted read-only or completely full." The cache drive isn't full, and it doesn't show as read-only.

unraid-diagnostics-20240127-1229.zip

Edited January 27 by Decay
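One quick way to see whether a pool really is mounted read-only, regardless of what the GUI shows, is to read its current options straight from /proc/mounts; btrfs can flip a mount to read-only at runtime after errors, and /proc/mounts reflects the live state. A small sketch (/mnt/cache is Unraid's usual cache mount point; adjust if yours differs):

```shell
#!/bin/sh
# Print whether a given mount point is read-only, read-write, or not
# mounted, according to the mount options field in /proc/mounts.
mount_state() {
    opts=$(awk -v mp="$1" '$2 == mp {print $4; exit}' /proc/mounts)
    case "$opts" in
        ro|ro,*) echo "read-only" ;;
        rw|rw,*) echo "read-write" ;;
        *)       echo "not mounted" ;;
    esac
}

mount_state /mnt/cache
```

If this prints "read-only" while the GUI still shows the pool as writable, trust /proc/mounts.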
JorgeB Posted January 27

The cache pool is now showing filesystem issues:

Jan 27 12:19:01 unraid kernel: BTRFS critical (device sdb1): unable to find logical 9836321250464530432 length 4096
Jan 27 12:19:01 unraid kernel: BTRFS critical (device sdb1): unable to find logical 9836321250464530432 length 16384

Because of that the pool went read-only, and since the docker image lives there, Docker won't work either. With btrfs I recommend backing up what you can and reformatting the pool. btrfs was also detecting data corruption on the pool, so keep an eye on it to see if more corruption is detected after reformatting; if yes, there could be an underlying hardware issue.
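For the "keep an eye on it" part after reformatting, btrfs ships tooling for exactly this; a minimal sketch from the console (assuming the pool is mounted at /mnt/cache):

```shell
# Scrub the pool: reads all data and metadata and verifies checksums,
# reporting (and where possible repairing) any corruption found.
btrfs scrub start /mnt/cache

# Check scrub progress and the count of checksum errors so far:
btrfs scrub status /mnt/cache

# Per-device lifetime error counters (write/read/flush/corruption/
# generation). Rising corruption_errs after a fresh reformat would
# point at an underlying hardware problem.
btrfs device stats /mnt/cache
```

A scrub on a healthy, freshly reformatted pool should finish with zero errors; running one periodically catches silent corruption early.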
Decay Posted January 27

I also got a new cache drive, so I'll probably do a backup and change the drive. Do you recommend a manual backup, or just setting all shares back to the array?