Two SSDs disappeared and are now "Not installed"

Keint · August 18, 2022

Hello

I recently had problems with two SSDs. I lost the first one a few days ago, but I had time to copy my files to another disk. After that, I removed the SSD from my array and made a new config without it.

Then another SSD crashed. I couldn't open its folders; instead I got: NO LISTING: TOO MANY FILES.

So I tried to repair the disk using the "Checking and fixing drives in the webGui" method, but unfortunately without success. I couldn't run -nv or any other options; nothing happened after clicking "Check".

So I switched from maintenance mode to normal mode, and since then disk 9 is shown as "Not installed"...

I'm a bit lost 😅 and of course I'm afraid of losing all the data on disk 9...

Please find my diagnostics attached.

Thanks for your time

tower-diagnostics-20220819-0100.zip

trurl · August 19, 2022

SSDs in the array can only be written at parity speed, and cannot be trimmed.

No SMART report for disk9. Reseat the controller and check the connections at both ends, SATA and power, including splitters.

Reboot and post new diagnostics.

Keint · August 29, 2022

Thanks for your answer. I'm just back home and finally have access to my Unraid server.

I disconnected and reconnected the SSD, and now I've got:

Unmountable: not mounted

Unraid offers to format the drive...

Is it safe to format it? My two parity disks look good.

Please find the diagnostics attached.

Thank You !

tower-diagnostics-20220829-1048.zip

JorgeB · August 29, 2022

Diags only show logged info until August 14th. Reboot and post new diags after starting the array.

Keint · August 29, 2022

Thanks for your fast answer !
Here we go: I rebooted and took diagnostics again.

tower-diagnostics-20220829-1150.zip

JorgeB · August 29, 2022

You should not have started a rebuild. Anyway, the SSD assigned as disk9 dropped offline: stop the array, unassign disk9, and check the filesystem on the emulated disk9.

trurl · August 29, 2022

4 hours ago, Keint said:

Unraid offers to format the drive...

Is it safe to format it?

Just thought I would answer this question directly.

NEVER format a disk that has data you want to keep. When you format a disk in the array, parity is updated, so a rebuild can only result in a formatted disk. The correct way to deal with "Unmountable" is to check the filesystem, as already suggested.
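To make the parity point concrete, here is a toy sketch assuming a single XOR parity disk (Unraid's first parity works this way; the second parity uses a different calculation that this model ignores). It shows both halves of the advice in this thread: a missing disk can be emulated from parity plus the other disks, but a format is just another write, so parity follows it and a rebuild returns the formatted contents. The helper names and the zero-filled "formatted" disk are hypothetical stand-ins, not Unraid code.

```python
from functools import reduce


def xor_bytes(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (toy single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_disk(disks: list, index: int, new_data: bytes) -> bytes:
    """Write to one data disk and return the parity the array keeps in sync."""
    disks[index] = new_data
    return xor_bytes(*disks)


def rebuild(disks: list, parity: bytes, missing: int) -> bytes:
    """Reconstruct (emulate) one disk from parity plus all the other disks."""
    others = [d for i, d in enumerate(disks) if i != missing]
    return xor_bytes(parity, *others)


if __name__ == "__main__":
    disks = [b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]  # disk 2 holds data we want
    parity = xor_bytes(*disks)
    print(rebuild(disks, parity, 2).hex())  # aa55: the data can be emulated

    # Hypothetical "format": new contents are written and parity is updated.
    parity = write_disk(disks, 2, b"\x00\x00")
    print(rebuild(disks, parity, 2).hex())  # 0000: the rebuild reproduces the format
```

Because `rebuild` never reads the missing disk itself, the first result comes purely from parity; after the simulated format there is no older copy left for parity to recover.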

Keint · September 9, 2022

Hello,

I'm just back in town and checked the SSDs again. Another disk has crashed, so 4 disks have crashed in total. I don't know what to do; I've lost 4 TB of data.

I tried to repair the disks using the "Checking and fixing drives in the webGui" method. Nothing happened; the drives are still "Unmountable: not mounted".

I'm starting to get desperate!

I can't access the data on the failing drives... Is there any way to see what is on a disk and copy it off, like an external hard drive on a Mac?

I switched the data cable; nothing changed... Maybe it's the SATA PCI card? In DiskSpeed I got this error:

DiskSpeed - Disk Diagnostics & Reporting tool
Version: 2.9.4

Scanning Hardware
09:16:27 Spinning up hard drives
09:16:27 Scanning system storage
09:16:28 Scanning USB Bus
09:16:37 Scanning hard drives
09:16:40 Scanning storage controllers
09:16:41 Scanning USB hubs & devices
09:16:42 Scanning motherboard resources
09:16:42 Fetching known drive vendors from the Hard Drive Database
09:16:43 Found controller SAS2308 PCI-Express Fusion-MPT SAS-2
09:16:43 Found drive Micron Micron_5210_MTFDDAK7T6QDE Rev: D2MU805 Serial: 20212847847F (sdh), 1 partition
09:16:43 Found drive Micron Micron_5300_MTFDDAK1T9TDT Rev: D3MU001 Serial: 20292935A463 (sdi), 1 partition

Lucee 5.2.9.31 Error (expression)

Message: invalid call of the function listGetAt, second Argument (posNumber) is invalid, invalid string list index [2]

Pattern: listgetat(list:string, position:number, [delimiters:string, [includeEmptyFields:boolean]]):string

Stacktrace: The Error Occurred in
/var/www/ScanControllers.cfm: line 1733

1731: <CFSET NR=i-2>
1732: <CFSET Part.Partitions[NR].PartNo=ListGetAt(CurrLine,1,":",true)>
1733: <CFSET Part.Partitions[NR].Start=Val(ListGetAt(CurrLine,2,":",true))>
1734: <CFSET Part.Partitions[NR].End=Val(ListGetAt(CurrLine,3,":",true))>
1735: <CFSET Part.Partitions[NR].Size=Val(ListGetAt(CurrLine,4,":",true))>

called from /var/www/ScanControllers.cfm: line 1643

1641: </CFIF>
1642: </CFLOOP>
1643: </CFLOOP>
1644:
1645:

Java Stacktrace: lucee.runtime.exp.FunctionException: invalid call of the function listGetAt, second Argument (posNumber) is invalid, invalid string list index [2]
  at lucee.runtime.functions.list.ListGetAt.call(ListGetAt.java:46)
  at scancontrollers_cfm$cf.call_000163(/ScanControllers.cfm:1733)
  at scancontrollers_cfm$cf.call(/ScanControllers.cfm:1643)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:933)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:823)
  at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:66)
  at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:45)
  at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2464)
  at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2454)
  at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2427)
  at lucee.runtime.engine.Request.exe(Request.java:44)
  at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1090)
  at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1038)
  at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
  at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
  at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:684)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1152)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
  at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2527)
  at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2516)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Thread.java:748)

Timestamp: 9/9/22 9:16:43 AM CEST

Thanks!

Cheers!

tower-diagnostics-20220909-0907.zip

Edited September 9, 2022 by Keint

JorgeB · September 9, 2022

The diags were taken after rebooting, so we cannot see what happened. For now, check the filesystem on disks 8 and 10, and don't format anything.

trurl · September 9, 2022

2 hours ago, JorgeB said:

check filesystem on disks 8 and 10

Be sure to capture the output so you can post it.
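One way to follow that advice outside the webGUI is a small script that runs the read-only check and saves everything it prints, then picks out the lines most worth posting. This is a minimal sketch assuming the `xfs_repair` binary is on the PATH and that you pass the array device for the disk being checked; the webGUI's check remains the route suggested in this thread. The summary keys are hypothetical, and as the ALERT in such output says, mismatches reported with -n may be spurious until the log is replayed by mounting.

```python
import subprocess
import sys


def run_check(device: str) -> str:
    """Run xfs_repair in no-modify mode (-n) and return all of its output."""
    result = subprocess.run(
        ["xfs_repair", "-n", device], capture_output=True, text=True
    )
    # Combine both streams so nothing is lost when posting the log.
    return result.stdout + result.stderr


def summarize(output: str) -> dict:
    """Pick out the lines from a captured check that are worth highlighting."""
    lines = [line.strip() for line in output.splitlines()]
    return {
        # The log-replay ALERT means later inconsistencies may be spurious.
        "dirty_log": any(line.startswith("ALERT:") for line in lines),
        # Changes a real repair run would make ("would reset ...").
        "would_change": [line for line in lines if line.startswith("would ")],
        # Superblock counters reported as differing from what was counted.
        "counter_mismatches": [
            line for line in lines if line.startswith("sb_") and ", counted " in line
        ],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    log = run_check(sys.argv[1])
    with open("xfs_check.log", "w") as f:  # full output, to attach to a post
        f.write(log)
    print(summarize(log))
```

Usage would look like `python check_summary.py /dev/md8`, where both the script name and the device path are placeholders; the correct md device name depends on the Unraid version.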

Keint · September 10, 2022

Thanks for your help

disk 8

Phase 1 - find and verify superblock...
        - block cache size set to 3057592 entries
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
would reset superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
would reset superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
zero_log: head block 38204 tail block 38200
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 6016
sb_ifree 0, counted 180
sb_fdblocks 976277431, counted 156280199
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Sat Sep 10 19:10:43 2022

Phase           Start           End             Duration
Phase 1:        09/10 19:10:43  09/10 19:10:43
Phase 2:        09/10 19:10:43  09/10 19:10:43
Phase 3:        09/10 19:10:43  09/10 19:10:43
Phase 4:        09/10 19:10:43  09/10 19:10:43
Phase 5:        Skipped
Phase 6:        09/10 19:10:43  09/10 19:10:43
Phase 7:        09/10 19:10:43  09/10 19:10:43

Total run time:

disk 10

Phase 1 - find and verify superblock...
        - block cache size set to 3073088 entries
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
would reset superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
would reset superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
zero_log: head block 20168 tail block 20164
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 64
sb_ifree 0, counted 59
sb_fdblocks 468614399, counted 468614391
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Sat Sep 10 19:13:43 2022

Phase           Start           End             Duration
Phase 1:        09/10 19:13:43  09/10 19:13:43
Phase 2:        09/10 19:13:43  09/10 19:13:43
Phase 3:        09/10 19:13:43  09/10 19:13:43
Phase 4:        09/10 19:13:43  09/10 19:13:43
Phase 5:        Skipped
Phase 6:        09/10 19:13:43  09/10 19:13:43
Phase 7:        09/10 19:13:43  09/10 19:13:43

Total run time:

A data rebuild is running on the disks.

I can now see the data on disks 8 and 10, and it looks OK to copy to another disk 😅

The first diagnostics were taken in maintenance mode.

The second diagnostics were taken after switching to normal mode.

tower-diagnostics-20220910-1911-2.zip tower-diagnostics-20220910-1916.zip

JorgeB · September 11, 2022

Looking OK.

Keint · September 13, 2022

On 9/11/2022 at 10:57 AM, JorgeB said:

Looking OK.

Yep, everything is back to normal! The SSDs I was using (Samsung PM863a) had a problem ("ERRORNOD"). I removed them, rebuilt parity, and now everything is working well!

Thanks again for all your precious help !

Cheers !
