dmoney517 Posted March 9 I am in the process of upgrading my build. New hardware: Asus PRIME Z790-V WIFI, Intel i9 12900K, 32GB GSkill Ripjaws DDR5, 2x LSI 9201-8i 6Gbps SAS PCI-E HBA (P20, IT mode), ZFS. Once I got everything hooked up, I couldn't boot from my flash drive, so after some digging I saw that I needed to change my flash drive from EFI- to EFI. I saved, and it finally booted. During the boot I noticed a lot of "Power-on or device reset occurred" errors in the startup logs. Sure enough, once the array came up, a disk was missing. I had two disks unassigned on standby, so I decided to just run a rebuild and sort out the missing disk another day (it's late, I want my array up, and I am tired). After about 5 minutes of waiting on "Array Starting, Mounting in Progress..." it came up, but now one of my parity disks is also in a disabled state. Both disks are on the same HBA, so I think there is an issue with the HBA. But I am also seeing an incredibly slow parity build, around 1 MB/s. I know the HBAs I have aren't top of the line, but based on my math and research I thought they would be faster than that. What options do I have? I know I need to check the HBA, but with potentially a missing parity disk and a missing data disk, I am afraid to do anything. I also don't like the idea of a parity build running for 30 days! I would love some guidance. Diagnostics attached. Please let me know if there are any questions or if more info is needed. nas-diagnostics-20240309-1846.zip
dmoney517 Posted March 10 Update - my "good" parity disk is now failing with seek error rates. Am I going to lose data?
JorgeB Posted March 10 Looks more like an HBA problem, it keeps resetting:
Mar 9 18:40:08 NAS kernel: mpt2sas_cm0 fault info from func: _base_make_ioc_ready
Make sure the HBA is well seated and sufficiently cooled. You can also try a different PCIe slot, then post new diags after array start.
dmoney517 Posted March 10 Update again - the P2 seek error rates have disappeared. If I check the drive details, it is no longer in a failing state, and the seek error raw value is at 0 (it had climbed to around 31457). I don't know what that means. Rebuild speed climbed up to 150 MB/s, which seems about normal. I went to sleep, but it looks like my parity tuning paused the rebuild overnight. I resumed this morning, and it is running at 90 MB/s, due to complete in 4 hours. My plan is to let the rebuild finish, shut down the array, reseat the HBA and check the cables, then start back up and rebuild P1. I'm going to buy a few new drives to replace the P2 that is acting strangely.
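The speed and time figures quoted in this thread (1 MB/s for 30 days, 90 MB/s for 4 hours) can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes "MB/s" means 10^6 bytes per second, which the thread does not state.

```python
MB = 1_000_000  # assumption: decimal megabytes, not MiB


def bytes_covered(speed_bytes_per_s, seconds):
    """Amount of data processed at a constant speed over a given time."""
    return speed_bytes_per_s * seconds


def eta_hours(remaining_bytes, speed_bytes_per_s):
    """Hours left to process remaining_bytes at a constant speed."""
    if speed_bytes_per_s <= 0:
        raise ValueError("speed must be positive")
    return remaining_bytes / speed_bytes_per_s / 3600


# 30 days at 1 MB/s corresponds to roughly 2.6 TB of data.
thirty_days = bytes_covered(1 * MB, 30 * 24 * 3600)

# 4 hours at 90 MB/s corresponds to roughly 1.3 TB still to rebuild.
remaining = bytes_covered(90 * MB, 4 * 3600)

print(thirty_days / 1e12, "TB in 30 days at 1 MB/s")
print(eta_hours(remaining, 90 * MB), "hours at 90 MB/s")
print(eta_hours(remaining, 1 * MB) / 24, "days at 1 MB/s")
```

Real rebuild speed is not constant (it usually drops toward the inner tracks of a disk), so treat any such estimate as rough.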
dmoney517 Posted March 10 3 minutes ago, JorgeB said: "Looks more like a HBA problem, it keeps resetting: Mar 9 18:40:08 NAS kernel: mpt2sas_cm0 fault info from func: _base_make_ioc_ready Make sure the HBA is well seated and sufficiently cooled, you can also try a different PCIe slot, then post new diags after array start." Thanks, I will do that!
dmoney517 Posted March 10 Rebuild finished. After messing around with the cables and the seating of the HBA, I was able to get everything into a usable state, with all drives recognized. However, tailing the log I am still seeing the power-on error repeatedly for drive sdm. This drive is currently part of unassigned devices. I am ready to rebuild the disabled parity disk. Should I shut down and pull sdm before I start the parity build? Latest diagnostics attached. nas-diagnostics-20240310-1347.zip
dmoney517 Posted March 10 Screenshots of the error on the console and the sdm SMART report attached.
dmoney517 Posted March 10 6 hours ago, JorgeB said: "Looks more like a HBA problem, it keeps resetting: Mar 9 18:40:08 NAS kernel: mpt2sas_cm0 fault info from func: _base_make_ioc_ready Make sure the HBA is well seated and sufficiently cooled, you can also try a different PCIe slot, then post new diags after array start." I initiated the parity build of my P1. I am still seeing a lot of odd stuff in the logs and am not sure what it means. Latest diagnostics here. nas-diagnostics-20240310-1424.zip
JorgeB Posted March 11 Looks more like power/connection issues, can also be a weak PSU.
dmoney517 Posted March 11 I'm using the same PSU I was using on my previous build. I went from a dual Xeon system on an E-ATX board to an i9 on micro itx. I would think the existing PSU would work.
dmoney517 Posted March 11 3 hours ago, JorgeB said: "Looks more like power/connection issues, can also be a weak PSU." Things went from bad to worse. I switched to brand new cables and pulled the disks that were throwing errors. I restarted and tried to rebuild using a new disk from unassigned. Now both my parity disks are in a bad state, and I am still seeing very slow rebuild times. I do not know how to recover from here at this point... Both my parity disks were plugged directly into the motherboard. nas-diagnostics-20240311-0921.zip
Solution JorgeB Posted March 11
Mar 11 09:22:21 NAS kernel: sd 9:0:0:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:6:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:1:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:2:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:5:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:3:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:4:0: Power-on or device reset occurred
Still looks like a power/connection issue.
2 minutes ago, dmoney517 said: "I do not know how to recover from here at this point"
Unassign both parity drives, start the array, stop the array, reassign both parity drives, then start the array. Ideally do this after trying to fix the underlying issue, or it will likely happen again.
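The log excerpt above shows seven different devices resetting in the same second, all on SCSI host 9. A short script can tally such lines from a syslog to show whether resets cluster on one controller (pointing at the HBA, cabling, or power) or on a single disk. This is a minimal sketch, not an Unraid tool: the function name and the sample are illustrative, and it assumes the kernel message format exactly as quoted in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   Mar 11 09:22:21 NAS kernel: sd 9:0:0:0: Power-on or device reset occurred
# The four numbers are the SCSI address host:channel:target:lun.
RESET_RE = re.compile(
    r"sd (\d+):(\d+):(\d+):(\d+): Power-on or device reset occurred"
)


def count_resets(lines):
    """Return (resets per SCSI host, resets per full SCSI address)."""
    per_host = Counter()
    per_device = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match is None:
            continue  # not a reset message
        host, channel, target, lun = match.groups()
        per_host[int(host)] += 1
        per_device[f"{host}:{channel}:{target}:{lun}"] += 1
    return per_host, per_device


# Sample copied from the post above.
SAMPLE = """\
Mar 11 09:22:21 NAS kernel: sd 9:0:0:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:6:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:1:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:2:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:5:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:3:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:4:0: Power-on or device reset occurred
"""

per_host, per_device = count_resets(SAMPLE.splitlines())
for host, count in per_host.items():
    print(f"host {host}: {count} resets across {len(per_device)} devices")
```

Many devices resetting at once on one host is more consistent with a shared cause than with several disks failing independently, which matches the power/connection diagnosis given in the post.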
dmoney517 Posted March 11 1 minute ago, JorgeB said:
Mar 11 09:22:21 NAS kernel: sd 9:0:0:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:6:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:1:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:2:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:5:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:3:0: Power-on or device reset occurred
Mar 11 09:22:21 NAS kernel: sd 9:0:4:0: Power-on or device reset occurred
Still looks like a power/connection issue. Unassign both parity drives, start array, stop array, reassign both parity drive, start array, ideally after trying to fix the issue, or it will likely happen again.
At this point I am not sure what the issue is... I have reseated the HBAs and switched to all new cables. If the HBAs are bad, I won't get new ones until Wednesday; there is nowhere near me where I can pick them up. I am going to get a new PSU as well. It won't be here until tomorrow. I am currently on a 500W unit. I assumed it would be enough, as it was powering my previous system with no issues. I am getting a 750W. The board is brand new. If it is a bad board, I can get it replaced on Thursday, after I test the new HBAs and PSU. I am also going to get new drives in case they are the problem. My simple, straightforward upgrade has become very expensive and a huge headache. I am tempted to put my old board back and return everything.
dmoney517 Posted March 11 (edited) Both disks crashed during the parity build... One of them was plugged directly into a SATA port on the motherboard. The other is plugged into an HBA. I feel like the disks are bad. Could that be the issue? Or is it more likely power supply or cable related? I'm afraid to do anything now. All my dockers are off and I am not writing anything to the array. If I power down and a drive comes up missing, I'm no longer protected by parity, correct? What is the safest way forward?
JorgeB Posted March 11 Seems unlikely to me that the problem is the disks, but reboot or power cycle to see if they come back online, then post new diags so that we can check SMART for both.
dmoney517 Posted March 11 13 minutes ago, JorgeB said: "Seems unlikely to me that the problem are the disks, but reboot or power cycle to see if they come back online, then post new diags, so that we can check SMART for both." Restarted the server. I did not touch any cables or cards. Did not start the array. Diagnostics attached. I really appreciate your help! I am in a bit of a panic right now. nas-diagnostics-20240311-1107.zip
JorgeB Posted March 11 SMART for parity looks OK, parity2 is not online.
dmoney517 Posted March 11 So all the error counts for read error rate and seek error rate don't matter? I just ran to Best Buy and bought a new PSU. I am currently using the 500W that powered my old dual Xeon 5650 setup. Just grabbed a 750W.
JorgeB Posted March 11 https://forums.unraid.net/topic/86337-are-my-smart-reports-bad/?do=findComment&comment=800888
dmoney517 Posted March 11 2 hours ago, JorgeB said: "SMART for parity looks OK, parity2 is not online" New PSU (750W) hooked up. I re-cabled and reseated everything again while attaching the new PSU. All drives are online again. I have not started the array. Diagnostics attached. Should I reassign parity and start the array? nas-diagnostics-20240311-1458.zip
JorgeB Posted March 11 Yes, try again, SMART for parity2 also looks OK.
dmoney517 Posted March 11 1 minute ago, JorgeB said: "Yes, try again, SMART for parity2 also looks OK." Should I leave P1 in dsbl and add a new disk to P2?
JorgeB Posted March 11 You can try to sync both at the same time.
dmoney517 Posted March 11 Holy moly - no errors in the syslog like the ones that showed up on previous tries. The rebuild is cruising at 160 MB/s. Attached one last set of diags... but I am hopeful that the new PSU is fixing this problem! Note - I did see some power-on failures during startup when I booted the system. nas-diagnostics-20240311-1532.zip
Bizquick Posted March 11 A 500 watt PSU does seem a bit low with that many drives and HBA cards. Older spinning drives of 4TB and lower do take some power. When I ran six 4TB WD Reds, I made sure my PSU was 650W. I'm kind of surprised your dual Xeon box didn't have power issues, unless the CPUs just constantly ran in low power mode, which can happen if your BIOS is not set to turn off C-states and a few other things, all depending on what the BIOS lets you do. Glad to hear it sounds like you're up and running now. I would double check your HBA firmware to make sure it's P20. I've gotten a few of those old 9201 cards that I had to reflash because whoever sold them didn't do it right or just claimed they did. But your log shows that's good. I don't know how to read everything in the logs yet; still learning.