trurl Posted April 21, 2019: Are you supposed to have a disk4? It doesn't look like one was assigned. Disabled disk6 isn't showing in the SMART reports, so it must have disconnected. There are read errors on disk14, possibly related to the controller issues also showing in the syslog.
gideva Posted April 22, 2019 (Author): Disk4 was just removed from the array for the same reason (device disabled) and I was supposed to replace it today. Disk6 was not in the SMART reports because it was undergoing a read check when I took the diagnostics (I attach the latest diagnostics, taken after the check completed OK). What I cannot understand is that in the last few weeks I had 4 disk failures in the same way... One after the other, I had disks failing with no apparent correlation... I checked the connections and they are OK, and the PSU is working fine, so now I am starting to think it must be something else... Any idea? I am really worried I will lose all the disks one after the other within days now! Attachment: monstruo-diagnostics-20190422-0516.zip
JorgeB Posted April 22, 2019: If the disks you've been having problems with don't share something in common, like a miniSAS cable, controller, etc., try a different power supply. P.S. The disk6 rebuild from a week ago will have some corrupt data, since there were read errors on another disk during it.
gideva Posted April 22, 2019 (Author): OK... Now I have another issue! I decided to rebuild disk6 again and replace it with a new one (a brand new WD Red 8TB). I stopped the array and tried to restart it, with lots of issues (the system was stuck). Finally I was able to start it again, and now I have 2 disks saying "Unmountable: No file system". One is the faulty one and the second is one that never gave problems before... What is happening? Attachment: monstruo-diagnostics-20190422-1251.zip
gideva Posted April 22, 2019 (Author): Little update... I just changed the disk slots and disk13 is back to work with no issues... and disk6 is rebuilding once again. There is no correlation between slots, cables or anything else, so I am inclined to think it is the PSU. I will try to change it in a couple of days and we will see. Tomorrow I am gone for a couple of days, so I do not know if I will be able to monitor whether the rebuild goes well (right now it says 18 hours to go).
trurl Posted April 22, 2019: 1 hour ago, gideva said: "I do not know if I will be able to monitor" Do you have Notifications set up to alert you by email or another agent? You should.
gideva Posted April 27, 2019 (Author): This thing is driving me crazy!!!! All of a sudden another disk is gone (it says "UNMOUNTABLE: NO FILE SYSTEM") and, on top of that, the Parity-Sync/Data-Rebuild in progress says that there are 300 days to go!!! What the hell is this all about? It seems to be a cascade effect: the disks are failing one after the other, and replacing the failing disk just moves the problem to the next one... (it is the third one I have lost in a week)
trurl Posted April 27, 2019: Did you already do this? On 4/22/2019 at 10:25 AM, gideva said: "I am inclined to think it is the PSU. I will try to change it in a couple of days" Post new diagnostics.
gideva Posted April 29, 2019 (Author): Yes, and no change.... I replaced the faulty disk and guess what... another disk failed ("Unmountable: No file system")!!! I noticed one thing... It always happens to the next disk in the array (it is a real cascade effect...). Any clue?
trurl Posted April 29, 2019: 7 hours ago, gideva said: "Any clue?" On 4/27/2019 at 3:36 PM, trurl said: "Post new diagnostics."
gideva Posted April 29, 2019 (Author): Here are the latest diagnostics. The situation is like this... I tried to replace the faulty disk with a new one: I precleared it and then proceeded with the replacement. Something went wrong, because when the rebuild reached 49% the remaining time was estimated at around 300 days. For this reason I started all over again from scratch, going through the preclear procedure (apparently something happened during the first preclear) and a write check. Then I started the rebuild again, and that is when the following disk failed, and anyway the result was the same... 49.7% and 500 days to go!!! I understand that this is related to a mistake I made, but now I need a solution (if there is one) for both problems: 1) Try to recover the faulty disk (if that is not possible, no big deal, since I am not going to lose sensitive/important files). 2) Try to find a final solution to this cascade effect. PS: I tried to be as clear as possible, hoping this will help you in any way to pull me out of this nightmare. Attachment: monstruo-diagnostics-20190429-2003.zip
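For a sense of scale, the estimate quoted above can be turned into an implied transfer rate. This is only a rough sketch: the function name is made up for illustration, and the 8 TB size is an assumption taken from the replacement WD Red mentioned earlier in the thread, not from the diagnostics.

```python
def implied_rebuild_speed(disk_bytes: float, percent_done: float, days_left: float) -> float:
    """Return the average bytes/second implied by a rebuild time estimate."""
    # Data that still has to be written to the disk being rebuilt.
    remaining_bytes = disk_bytes * (1 - percent_done / 100)
    seconds_left = days_left * 24 * 60 * 60
    return remaining_bytes / seconds_left


# Assumption: 8 TB (8e12 bytes) disk; 49.7% done and 500 days left as reported.
speed = implied_rebuild_speed(8e12, 49.7, 500)
print(f"{speed / 1000:.0f} kB/s")
```

Under those assumptions the rebuild is crawling along at well under 1 MB/s, which fits something stalling the rebuild rather than just a slow disk.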
trurl Posted April 30, 2019: What is the exact model of the new PSU you installed?
JorgeB Posted April 30, 2019: Disk7 has a corrupt filesystem and is flooding the log. Fix the filesystem first, then try rebuilding disk6 again, and grab new diags if/when it fails.
witalit Posted April 30, 2019: Damn, the issues seem to be mounting. Following the thread for updates.
gideva Posted April 30, 2019 (Author): 50 minutes ago, johnnie.black said: "Disk7 has a corrupt filesystem and is flooding the log. Fix the filesystem first, then try rebuilding disk6 again, and grab new diags if/when it fails." Just to be sure... (I really do not want to f..k up again). 1) Stop the array, then restart it in maintenance mode. 2) Open the GUI, click on the drive in the Main tab to get to the relevant dialog, and start the repair. Is that correct?
gideva Posted April 30, 2019 (Author): BTW, I will be away for work for the entire month of May, so I am not sure what I can do... I have a PC at home connected to the server with TeamViewer installed, so I might be able to access it remotely. If physical access is not necessary I might be able to work on it; otherwise everything is postponed to next month. I will keep you posted, and once again thank you for your help and suggestions. Really imprudent to be more 4 since I am quite lost...
gideva Posted April 30, 2019 (Author): 1 hour ago, johnnie.black said: "Yes" Thnx
gideva Posted April 30, 2019 (Author): 1 hour ago, johnnie.black said: "Yes" I followed the instructions to repair the filesystem... In the Check Filesystem Status section the options field shows -n next to the CHECK button. Shall I proceed like this? Sorry for the question, but it is the first time for me.
gideva Posted April 30, 2019 (Author): 1 minute ago, johnnie.black said: "remove -n" Done... keeping my fingers crossed
gideva Posted April 30, 2019 (Author): 11 minutes ago, gideva said: "Done... keeping my fingers crossed" This is what I got:
    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
    ERROR: The filesystem has valuable metadata changes in a log which needs to
    be replayed. Mount the filesystem to replay the log, and unmount it before
    re-running xfs_repair. If you are unable to mount the filesystem, then use
    the -L option to destroy the log and attempt a repair.
    Note that destroying the log may cause corruption -- please attempt a mount
    of the filesystem before doing this.
I need to be guided... I really want to solve the problem once and for all and I do not want to make any mistake (hope you understand)...
gideva Posted April 30, 2019 (Author): 15 minutes ago, johnnie.black said: "Use -L" Done... What next? Shall I retry the CHECK without -n?
JorgeB Posted April 30, 2019: Start the array, disk should mount now.
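To recap the repair exchange above: the check was first run with -n (read-only), then without it, and -L was only used after xfs_repair reported a log that needed replaying. A minimal sketch of that last decision follows, assuming you have the xfs_repair output as text. The function name and the returned labels are hypothetical, not part of Unraid or xfsprogs; only the error wording is taken from the output posted in this thread.

```python
# Wording taken from the xfs_repair error quoted in the thread.
LOG_REPLAY_MARKER = "valuable metadata changes in a log which needs to be replayed"


def next_repair_step(repair_output: str) -> str:
    """Classify xfs_repair output (hypothetical helper, labels are illustrative)."""
    # Collapse whitespace so the marker matches even when the message wraps.
    normalized = " ".join(repair_output.split())
    if LOG_REPLAY_MARKER in normalized:
        # The journal is dirty: xfs_repair asks for a mount to replay it, and
        # says to use -L (which destroys the log) only if the disk will not mount.
        return "log-needs-replay"
    return "no-log-error"


sample = (
    "ERROR: The filesystem has valuable metadata changes in a log which needs to\n"
    "be replayed.  Mount the filesystem to replay the log, and unmount it before\n"
    "re-running xfs_repair."
)
print(next_repair_step(sample))
```

In this thread the disk was already unmountable, which is why -L was used; as the error text itself warns, destroying the log may cause corruption.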