Spyderturbo007 Posted November 19, 2020
I woke up to this email from my server about my 6TB parity drive.
Event: Unraid Parity Disk Error
Subject: Alert [TOWER] - Parity disk in error state (disk dsbl)
The GUI shows the disk as having:
22,756,061,247,961 reads
18,446,744,073,704,421,376 writes
808 errors
My assumption is that the drive is toast, so I'm going to order another drive, but I have a few questions.
1. Is the safest thing to stop the array until the new drive is delivered and installed? I only have one parity drive, so another drive failing would mean data loss.
2. Does the new drive need to go through pre-clear?
Thanks!
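The write count above is odd on its own: it sits just below 2^64, the range of an unsigned 64-bit counter. One plausible reading (an assumption, not something confirmed in this thread) is that a counter dipped slightly below zero and is being displayed as an unsigned value. The short sketch below only does the arithmetic; it says nothing about why the counter would behave that way.

```python
# Hypothetical check: does the reported write count look like an unsigned
# 64-bit counter that wrapped below zero? (Assumption, not confirmed in the thread.)
REPORTED_WRITES = 18_446_744_073_704_421_376
U64_RANGE = 2 ** 64  # number of distinct values an unsigned 64-bit counter can hold


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed value."""
    return value - U64_RANGE if value >= 2 ** 63 else value


signed = as_signed_64(REPORTED_WRITES)
print(f"gap below 2**64: {U64_RANGE - REPORTED_WRITES:,}")
print(f"as signed 64-bit: {signed:,}")
```

If the value were a genuine write total it would be far beyond anything a 6TB drive could see, which is why the wraparound reading is worth considering.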
itimpi Posted November 19, 2020
It could just be a case of the drive dropping offline for some reason while the drive itself is actually fine. If you post your system's diagnostics zip file (obtained via Tools -> Diagnostics) in its current state, we should be able to determine whether this is the case. If it has dropped offline, then diagnostics taken after power cycling the server should give a better idea of whether the drive really has problems.
Spyderturbo007 Posted November 19, 2020
Thanks, itimpi. I'll work on getting that later today. I got paranoid and stopped the array, and since I run Pi-hole in a Docker container, my Internet is down at the house. I didn't have time to edit DNS before I had to leave for work.
trurl Posted November 19, 2020
Quoting Spyderturbo007 (1 hour earlier): "My assumption is that the drive is toast"
Based on what we have seen, that is wrong more often than not.
Spyderturbo007 Posted November 19, 2020
I was able to get the diagnostics as requested. The only thing I did this morning was stop the array after receiving the message. I've never had this happen before, so it's a little odd. Thanks so much for taking the time to help me with this problem.
Attachment: tower-diagnostics-20201119-1323.zip
trurl Posted November 19, 2020
SMART for parity looks OK, but it looks like the disk was disconnected as sdk and reconnected as sdo. Your syslog goes back a few months and everything was fine until now. Can you think of anything that might have disturbed the connections? It doesn't look like any SMART tests have been done on that disk, so you might try an extended SMART test on it; if it passes, you can rebuild to that same disk.
trurl Posted November 19, 2020
Also, many of your disks are still ReiserFS.
Spyderturbo007 Posted November 19, 2020
No changes other than normal updates. It's a SuperMicro rack-mount chassis with a backplane, so I can't see how a connection issue would affect a single drive. Don't these point to a drive issue, though? You know more than me, but I thought I'd ask so I can understand how it works.
Nov 19 02:15:24 Tower kernel: md: disk0 read error, sector=4971446528
Nov 19 02:15:24 Tower kernel: md: disk0 read error, sector=4971446536
Nov 19 02:15:24 Tower kernel: md: disk0 read error, sector=4971446544
Nov 19 02:15:24 Tower kernel: md: disk0 read error, sector=4971446552
Nov 19 02:15:24 Tower kernel: md: disk0 read error, sector=4971446560
Nov 19 02:16:51 Tower kernel: md: disk0 write error, sector=4971446528
Nov 19 02:16:51 Tower kernel: md: disk0 write error, sector=4971446536
Nov 19 02:16:51 Tower kernel: md: disk0 write error, sector=4971446544
Nov 19 02:16:51 Tower kernel: md: disk0 write error, sector=4971446552
Nov 19 02:16:51 Tower kernel: md: disk0 write error, sector=4971446560
I'm not sure what to do about the ReiserFS.
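One way to see what these lines do (and do not) say is to summarise them mechanically. The sketch below is a hypothetical helper, not an Unraid tool: it assumes the exact "md: diskN read|write error, sector=N" wording shown above and groups the failed sectors by disk and operation. For this excerpt it shows the same five sector addresses failing on read and then on write, spaced 8 apart (4 KiB if the sectors are 512 bytes, which is also an assumption). Nothing in the lines records a cause.

```python
import re
from collections import defaultdict

# Matches lines such as "md: disk0 read error, sector=4971446528".
# Assumes the exact wording shown in the syslog excerpt above.
MD_ERROR = re.compile(r"md: (disk\d+) (read|write) error, sector=(\d+)")

SAMPLE = """\
Nov 19 02:15:24 Tower kernel: md: disk0 read error, sector=4971446528
Nov 19 02:15:24 Tower kernel: md: disk0 read error, sector=4971446536
Nov 19 02:15:24 Tower kernel: md: disk0 read error, sector=4971446544
Nov 19 02:15:24 Tower kernel: md: disk0 read error, sector=4971446552
Nov 19 02:15:24 Tower kernel: md: disk0 read error, sector=4971446560
Nov 19 02:16:51 Tower kernel: md: disk0 write error, sector=4971446528
Nov 19 02:16:51 Tower kernel: md: disk0 write error, sector=4971446536
Nov 19 02:16:51 Tower kernel: md: disk0 write error, sector=4971446544
Nov 19 02:16:51 Tower kernel: md: disk0 write error, sector=4971446552
Nov 19 02:16:51 Tower kernel: md: disk0 write error, sector=4971446560
"""


def summarise_md_errors(log_text: str) -> dict:
    """Group failed sectors by (disk, operation): first, last, count and spacing."""
    groups = defaultdict(list)
    for match in MD_ERROR.finditer(log_text):
        disk, op, sector = match.group(1), match.group(2), int(match.group(3))
        groups[(disk, op)].append(sector)
    summary = {}
    for key, sectors in groups.items():
        sectors.sort()
        steps = {b - a for a, b in zip(sectors, sectors[1:])}  # gaps between entries
        summary[key] = {"first": sectors[0], "last": sectors[-1],
                        "count": len(sectors), "steps": steps}
    return summary


for key, info in summarise_md_errors(SAMPLE).items():
    print(key, info)
```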
trurl Posted November 19, 2020
Those just show which sectors it failed to access; they don't indicate why.
There is a long thread (still pinned near the top of this subforum) about converting from RFS to XFS. TL;DR: to change the filesystem of a disk you must reformat it, so you have to put its data somewhere else first.
Spyderturbo007 Posted November 19, 2020 (edited)
The extended SMART test is in progress. It's been at 10% for about 45 minutes. Is that normal? Does the array need to be started for the test to run?
I was actually just reading the wiki article on it. I read this part and thought I might just convert as my drives begin to need replacing; some are quite old.
"At this point, there is NO general recommendation as to converting existing Reiser drives, UNLESS you are having a known Reiser-related issue. Some feel it is a good idea to begin converting existing drives to XFS, but others do not think it is necessary, and may be an over-reaction to the previous now-fixed issues. At any rate, it does seem wise to consider a slow migration strategy, as drives are added."
Edited November 19, 2020 by Spyderturbo007
trurl Posted November 19, 2020
Quoting Spyderturbo007 (10 minutes earlier): "The extended SMART test is in progress. It's been on 10% for about 45 minutes. I'm not sure if that is normal or not? Does the array need to be started for it to run the test?"
It will take several hours, depending on size.
Spyderturbo007 Posted November 19, 2020
Thanks. I wasn't sure what to expect. It's 6TB, so I'll leave the array offline and check back later tonight. I really appreciate the help, Constructor.
itimpi Posted November 19, 2020
Quoting Spyderturbo007 (46 minutes earlier): "Thanks. I wasn't sure what to expect. It's 6TB, so I'll leave the array offline and check back later tonight. I really appreciate the help Constructor."
The extended test reads every sector on the drive, so I would expect something like 1 to 2 hours per TB.
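Applying that rule of thumb to this 6TB drive, and comparing it with the progress reported earlier (10% after about 45 minutes), gives a rough expectation. This is a minimal sketch assuming the 1 to 2 hours per TB figure and linear progress; neither is guaranteed, and the function names are illustrative only.

```python
from __future__ import annotations


def extended_test_range_hours(size_tb: float, low_per_tb: float = 1.0,
                              high_per_tb: float = 2.0) -> tuple[float, float]:
    """Rough duration range for an extended SMART test (rule of thumb: 1-2 h per TB)."""
    return size_tb * low_per_tb, size_tb * high_per_tb


def linear_eta_hours(elapsed_minutes: float, percent_done: float) -> float:
    """Total runtime implied by current progress, assuming the test advances linearly."""
    return elapsed_minutes * 100 / percent_done / 60


low, high = extended_test_range_hours(6)
print(f"Rule of thumb for 6 TB: {low:g} to {high:g} hours")
print(f"Implied by 10% after 45 min: {linear_eta_hours(45, 10):g} hours")
```

Both estimates land in the "several hours" range, so a progress display that sits at 10% for 45 minutes is not by itself a sign of trouble.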
Spyderturbo007 Posted November 20, 2020 (edited)
Morning all. It says "Completed without error". I'm attaching the SMART report and new diagnostics. Thoughts on what to do next?
One weird thing: if I click Show next to SMART self-test history, it says "No self-tests have been logged. (To run self-tests, use: smartctl -t)".
Attachments: WDC_WD60EFRX-68MYMN1_WD-WX51D6422029-20201119-1515.txt, tower-diagnostics-20201120-0805.zip
Edited November 20, 2020 by Spyderturbo007
trurl Posted November 20, 2020
Might as well try rebuilding parity to the same disk. It doesn't actually hold any of your files, after all.
Spyderturbo007 Posted November 20, 2020
How would I go about rebuilding onto the same disk? I'm terrified of losing anything.
trurl Posted November 20, 2020
Rebuilding to the same disk works the same way whether it is parity or data:
1. Stop the array.
2. Unassign the disk to be rebuilt.
3. Start the array with the disk unassigned.
4. Stop the array.
5. Reassign the disk to be rebuilt.
6. Start the array to begin the rebuild.
Spyderturbo007 Posted November 20, 2020
Thanks for the help. The parity rebuild is in progress and is estimated to take 1 day 11 hours. I'll report back when it's finished. I assume I should refrain from using the array while the parity rebuild is in progress? I don't want to lose any data if another disk fails.
I'm also thinking a second parity drive would be a good idea for situations like this in the future. What are your opinions?
Spyderturbo007 Posted November 21, 2020
I received a message when I logged in this morning saying that parity was valid. I'm attaching log files taken after the rebuild completed. Can someone take a look at them and let me know what I should do next? Thanks!
Attachment: tower-diagnostics-20201121-1330.zip
trurl Posted November 21, 2020
After a rebuild I always do a non-correcting parity check, just to verify.
Spyderturbo007 Posted November 25, 2020
Parity check finished with 0 errors. Duration: 16h 9m, average speed: 103.2 MB/s. Should I just chalk this up as an unusual glitch and move on? Thanks!
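The reported duration and average speed are consistent with a single full pass over the 6TB parity disk. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that the check covered the whole 6TB; the helper name is illustrative only.

```python
from __future__ import annotations


def pass_duration(size_bytes: int, avg_bytes_per_s: float) -> tuple[int, int]:
    """Hours and minutes needed to read size_bytes at a constant average speed."""
    total_minutes = round(size_bytes / avg_bytes_per_s / 60)
    hours, minutes = divmod(total_minutes, 60)
    return hours, minutes


hours, minutes = pass_duration(6 * 10**12, 103.2 * 10**6)
print(f"Expected duration: {hours}h {minutes}m")
```

Rounding to whole minutes before splitting into hours avoids ever reporting "60m"; the result matches the 16h 9m shown in the parity check summary.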
trurl Posted November 25, 2020
Make sure you have Notifications set up to alert you immediately, by email or another agent, as soon as a problem is detected.