Yekul Posted March 18, 2021 (edited)

So I'm looking to jump into Unraid, as I've found myself with a good opportunity to 'start fresh'. Downloaded it and opted in for the trial to make sure it fits my purpose. One of my drives (the only SMR drive) reported a slew of SMART errors, so I thought I'd run a more intensive scan to see what's going on. Went to bed, woke up to check it via the web GUI, and it couldn't connect anymore. OK, weird, I thought, it must have crashed. It was completely unresponsive and would just time out the way you'd expect with no connectivity. So I force restarted. Everything came up fine, of course; there were no pools or anything yet anyway. But the disk profile page lists the SMART scan as interrupted.

So I checked some other drives, added my first one to the pool, and started the format. Again, it became completely unresponsive. No other pages will load at all.

So I guess I'm just curious: is this normal? It doesn't give a progress bar or anything. All I have is 'started, formatting', and then I have to keep coming in and hitting refresh to see if it finishes?

And if this is normal and it locks you out of the UI intentionally, a follow-up question: does this mean the shares themselves go offline every time something has to be formatted to be added to the pool etc.? 8TB is the smallest drive I have, so that's a lot of downtime if so, particularly during the setup process, where I'm likely to add/remove a few drives over the next few weeks.

Edited March 21, 2021 by Yekul
JonathanM Posted March 18, 2021

Not normal at all. I suspect a bad HBA, cable, connection, or drive is causing the system to hang while it waits for a valid response from the drive.
Yekul Posted March 19, 2021 (Author)

Yeah, I thought surely this can't be right. Thing is, it has happened with two different drives. I'm a bit reluctant to power down right now as it's mid-format, and I can hear it actually access the drive every so often when it's quiet. What's the best course of action here?

To clarify: this is with two different cables, two different drives, and two different SATA ports. I could try swapping motherboards, but this board was working fine previously and it has no problems actually accessing the drives. I can run the short SMART tests without issue, for example, and track the progress from 10% through to 100%. With the long test, I walked away after it started, and I couldn't connect to the Unraid GUI over the local network when I powered on my main PC this morning, so I can't say whether it had progressed or was still stuck at 10%.
itimpi Posted March 19, 2021

A format should only take a few minutes, so it sounds as if you have some underlying problem.
Yekul Posted March 19, 2021 (Author)

Hrm, yeah, I wasn't sure whether the format was meant to be doing a preclear-type operation where it zeros the drive. Any ideas where to look for issues? As I said, I'm brand new to Unraid and just trialing it to make sure it's fit for purpose before jumping over. I haven't updated the BIOS in a very long time, but it's an old board (AB350M), so I doubt there'd be anything too pressing. It's not like an X570 that is probably still receiving a lot of updates that may affect one thing or another.
Vr2Io Posted March 19, 2021 (edited)

5 hours ago, Yekul said: does this mean the shares themselves go offline every time something has to be formatted to be added to the pool etc?

No. Formatting puts very little load on the system.

1 hour ago, Yekul said: The long test I walked away after it started

You should follow up on it. The long test (extended test) is a self-test that runs on the drive itself and does not involve the rest of the system. Check back for anything abnormal in the SMART data and in the test progress.

5 hours ago, Yekul said: Was completely unresponsive

A disk problem can cause this behavior.

Edited March 19, 2021 by Vr2Io
Yekul Posted March 19, 2021 (Author)

Restarted (had to hard reset with the power button), and the drive is available and the array loads up fine with the one disk. Not really sure what's going on, tbh...
Vr2Io Posted March 19, 2021 (edited)

If the long test fails or won't complete, that is a disk problem and the disk needs to be replaced or RMA'd.

To avoid the system (GUI) becoming unresponsive again (I suppose you can still get in through telnet / SSH), you can try stopping the array and then running the long test.

Some people run a long test or a parity check monthly to make sure their disks are healthy.

** Please disable automatic spin-down in 6.9.x, otherwise it will cause the test to abort **

Edited March 19, 2021 by Vr2Io
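For anyone who wants to follow the extended test from a telnet / SSH session rather than the GUI, here is a minimal Python sketch. It assumes an ATA drive whose `smartctl -c` output contains the usual "NN% of test remaining" line while a self-test is running; the helper names `percent_remaining` and `poll_drive` are made up for illustration, and the device path you pass in is whatever your system assigns.

```python
import re
import subprocess
from typing import Optional

# Matches the progress line smartctl prints while a self-test is running,
# e.g. "90% of test remaining." (assumed ATA output format).
REMAINING_RE = re.compile(r"(\d+)% of test remaining")


def percent_remaining(smartctl_output: str) -> Optional[int]:
    """Return the percentage of the self-test still to run, or None if no progress line is present."""
    match = REMAINING_RE.search(smartctl_output)
    return int(match.group(1)) if match else None


def poll_drive(device: str) -> Optional[int]:
    """Run `smartctl -c <device>` (needs root) and parse the self-test progress from its output."""
    result = subprocess.run(
        ["smartctl", "-c", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes as status bits, so do not raise on them
    )
    return percent_remaining(result.stdout)
```

Calling `poll_drive` every few minutes and appending the result with a timestamp to a file on persistent storage would show roughly how far the test got before a hang, which the GUI cannot tell you after a forced reboot.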
Yekul Posted March 19, 2021 (Author)

So everything was going fine, and I started the extended offline test again. I could navigate to different pages, with nothing unresponsive at all. Then, around 30% of the way through the test, the same thing happened as before. The system locks up and won't let me access anything. I can't SSH to the server either. It happens with both drives tested, and one of them is brand new and passed testing with no problems. Obviously I'm not saying it's impossible for both to be faulty, but it would be a hell of a coincidence. I just can't figure out what is going on, because Unraid doesn't appear to power down or restart when this happens.
Vr2Io Posted March 19, 2021 (edited)

What does the self-test history show? Please post it for each individual disk, as in the pictures above.

Edited March 19, 2021 by Vr2Io
itimpi Posted March 19, 2021

1 hour ago, Yekul said: So everything was going fine, and I started the extended offline test again. I could navigate to different pages, with nothing unresponsive at all. Then, around 30% of the way through the test, the same thing happened as before. The system locks up and won't let me access anything. I can't SSH to the server either. It happens with both drives tested, and one of them is brand new and passed testing with no problems. Obviously I'm not saying it's impossible for both to be faulty, but it would be a hell of a coincidence. I just can't figure out what is going on, because Unraid doesn't appear to power down or restart when this happens.

You are likely to get better-informed feedback if you attach your system's diagnostics zip file (obtained via Tools -> Diagnostics) to your NEXT post. Ideally, you want this taken with the array started and after encountering your problem.
Yekul Posted March 19, 2021 (Author)

5 minutes ago, itimpi said: You are likely to get better informed feedback if you attach your systems diagnostics zip file (obtained via Tools->Diagnostics) to your NEXT post. Ideally you want this with the array started and after encountering your problem

How do I do this when the system becomes unresponsive? I thought the log files were cleared on reboot, and a forced reboot is the only way I can get back in.

I installed the Fix Common Problems plugin, but couldn't find the Troubleshooting mode mentioned in older threads (I guess this constantly wrote the logs somewhere else, so they'd be stored on a physical drive and not in RAM?).
itimpi Posted March 19, 2021

You can enable the syslog server (under Settings -> Syslog Server) to gather logs that can survive a reboot.
Yekul Posted March 20, 2021 (Author)

OK, so I tried to do this, but it doesn't seem to be persisting through restarts. I have it set as follows under Settings -> Syslog Server:

local syslog server: enabled
local syslog folder: app data
mirror syslog to flash: yes (temporarily, to troubleshoot; not like it writes much anyway with my current setup)

However, the diagnostics file still clears. I hooked up a monitor and noticed that when I hit the power button to start the 'graceful shutdown', it begins fine, then gets to 'Starting diagnostic collection...' and just seems to hang. So I suspect that's why I'm not actually getting any useful log files? The syslog.txt file inside the diagnostics folder only shows information since the restart, nothing before.

Attached logs anyway, fwiw, but they really just show the recent restart and one of the failing drives (the same thing happens without it attached, btw, as I only just plugged it in).

picklerick-diagnostics-20210321-1021.zip
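One possible way to look at what was logged just before a hang, sketched in Python under some assumptions: that the mirrored or syslog-server log ends up as a single plain-text file holding more than one boot, and that each boot can be spotted by the kernel's "Linux version" banner. The function name `tail_before_last_boot` is hypothetical, and the file location is not confirmed anywhere in this thread.

```python
from typing import Iterable, List

# The Linux kernel logs a "Linux version ..." banner early in every boot,
# so it is used here as a marker for where a new boot starts in the log.
BOOT_MARKER = "Linux version"


def tail_before_last_boot(
    lines: Iterable[str], count: int = 50, marker: str = BOOT_MARKER
) -> List[str]:
    """Return up to `count` lines logged immediately before the most recent boot banner."""
    all_lines = [line.rstrip("\n") for line in lines]
    boot_indexes = [i for i, line in enumerate(all_lines) if marker in line]
    if not boot_indexes:
        return []  # no boot banner found, so there is nothing to anchor on
    last_boot = boot_indexes[-1]
    return all_lines[max(0, last_boot - count):last_boot]


# Usage sketch; the path below is an assumption, so adjust it to wherever your log is written:
# with open("/boot/logs/syslog", errors="replace") as log_file:
#     for entry in tail_before_last_boot(log_file, count=100):
#         print(entry)
```

The lines this returns are the last things the system managed to write before the forced restart, which is the part that the syslog.txt in a fresh diagnostics zip does not contain.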
Yekul Posted March 20, 2021 (Author)

In addition to the above, I just tested the power button 'graceful' shutdown while the system was still responsive, and it works completely fine. It shuts down correctly with no problems at all.
Yekul Posted March 21, 2021 (Author)

Tentatively solved.

Downgraded to confirm it wasn't a 6.9.1 issue. Still present. Then I found the C1 state enabled in the BIOS; since disabling it, the system appears to be holding steady.

It likely would have stopped being an issue when I swapped to a new Intel setup in a few days' time anyway, but I'm posting a reply in case others stumble on this using older hardware like mine (Ryzen 1700, AB350M).