Chanchalanch Posted May 14

Hello all! I have had Unraid for a couple of years now and finally migrated to new hardware. I used to use a Dell C2100 with 12 hard drive bays and 2 internal SSD bays. I moved to a SuperMicro X8DT6 with 24 bays and 4 internal SSD connections. I watched a few @SpaceInvaderOne videos that have saved me more than once! I didn't, however, do my migration perfectly... I moved my drives over into the new server (with an additional 12 drives), plugged my USB stick in and let Unraid do the rest. Ever since then I've had issues with speed or just general weirdness.

I am attaching my diagnostics in the hope that someone can take a look and spot some smoking-gun items... Please be gentle with my feeble mind. I am sure there are plenty of issues that someone else would have seen and fixed, or never had in the first place...

The two biggest issues I have are:
1. Parity checks and data rebuilds after replacing a disk are slow. I'm seeing speeds anywhere between 15MB/s and 28MB/s. I used to get an average of ~70MB/s.
2. If I go to /mnt/user/Media/TV and try to see the contents of the directory, there's nothing... When I SSH into the server, navigate to that folder and try to list the directory, I see:
/bin/ls: reading directory '.': Structure needs cleaning

I have done some research and came across another post that suggested using the xfs_repair command. I wasn't sure which drive the issue was originating from, so I ran the following command:
for disk in /mnt/disk*; do echo "Checking $disk"; find "$disk/Media/TV" -type d -exec ls -ld {} \; ; done |grep clean

I got several returns showing:
find: ‘/mnt/disk11/Media/TV/Star Trek The Next Generation’: Structure needs cleaning
find: ‘/mnt/disk11/Media/TV/Seinfeld/Season 3’: Structure needs cleaning
find: ‘/mnt/disk11/Media/TV/Modern.Family.S01.1080p.BluRay.x264-TENEIGHTY’: Structure needs cleaning
find: ‘/mnt/disk11/Media/TV/Lost/Lost.S03’: Structure needs cleaning
find: ‘/mnt/disk11/Media/TV/The Office (US)/Featurettes’: Structure needs cleaning
find: ‘/mnt/disk11/Media/TV/The Office (US)/Season 6’: Structure needs cleaning

I figured disk11 was the culprit. I tried to run xfs_repair on disk 11 and I got:
disk11: No such file or directory
disk11: No such file or directory
fatal error -- couldn't initialize XFS library

I saw that Disk11 was showing as failed in the Unraid GUI, so I removed it and am currently rebuilding. (I have also added the SMART report from my old Disk11 in case it's useful in some way.)

Any help provided would make me beyond grateful!

chantry-server2-diagnostics-20240514-1722.zip
chantry-server2-smart-Old Disk11.zip

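A note on the one-liner in the post above: find writes "Structure needs cleaning" to stderr, so the trailing grep never actually filters those messages; they appear only because stderr goes straight to the terminal. A minimal sketch of one way to send the errors through the pipe and boil them down to the affected disk names follows. The helper name disks_with_errors is hypothetical, and the /mnt/diskN layout is taken from the post.

```shell
#!/bin/sh
# Hypothetical helper: read find/ls error lines on stdin and print the unique
# diskN names that reported "Structure needs cleaning".
disks_with_errors() {
    grep 'Structure needs cleaning' |
        grep -o '/mnt/disk[0-9][0-9]*' |
        sed 's#^/mnt/##' |
        sort -u
}

# Usage on a live server (read-only). "2>&1 >/dev/null" routes find's error
# messages into the pipe and discards the normal listing:
#   for disk in /mnt/disk*; do
#       find "$disk/Media/TV" -type d 2>&1 >/dev/null
#   done | disks_with_errors
```

This only reads directory metadata, so it is safe to run while a rebuild is in progress, although it will add some disk activity.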
JorgeB Posted May 15 (Solution)

Check the filesystem on disk11 and run it without -n, then reboot to clear the logs and post new diags taken during a parity check.

Chanchalanch Posted May 16 (Author)

On 5/15/2024 at 2:16 AM, JorgeB said:
Check filesystem on disk11, run it without -n, then reboot to clear the logs and post new diags during a parity check.

Thank you for your response! Working on that now. Had to wait for the data rebuild to finish. Just to be clear, I am running "xfs_repair sdk", correct? (sdk is disk 11)

The command seems to be doing SOMETHING, as I am now seeing the following (attached video):
xfs_repair.mp4

Edited May 16 by Chanchalanch

itimpi Posted May 16

7 minutes ago, Chanchalanch said:
Just to be clear, I am running "xfs_repair sdk" correct? (sdk is disk 11)

No, that is the wrong device name. Even if it were the correct device name, running it that way would invalidate parity. Instead, run it via the GUI (by clicking on the device on the Main tab); Unraid will automatically use the correct device name and parity will be maintained.

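For anyone who still wants to see what the command line looks like, here is a minimal sketch that only prints a read-only check command and never runs it. It rests on an assumption that is not stated in this thread: that the parity-protected device for array disk N is /dev/mdNp1 on newer Unraid releases and /dev/mdN on older ones. The helper name xfs_check_cmd is hypothetical. Writing through the md device is what keeps parity in sync; a raw sdX device bypasses it. The GUI check remains the safer route.

```shell
#!/bin/sh
# Hypothetical helper: print (never execute) a read-only check command for
# array disk N. Assumption: the parity-protected device is /dev/mdNp1
# (newer Unraid releases) or /dev/mdN (older ones); confirm before use.
xfs_check_cmd() {
    slot=$1
    suffix=${2-p1}   # pass "" as the second argument for the older /dev/mdN naming
    case $slot in
        ''|*[!0-9]*)
            echo "usage: xfs_check_cmd <disk-number> [suffix]" >&2
            return 1
            ;;
    esac
    # -n is xfs_repair's no-modify mode: it reports problems without writing
    printf 'xfs_repair -n /dev/md%s%s\n' "$slot" "$suffix"
}

xfs_check_cmd 11      # prints: xfs_repair -n /dev/md11p1
```

Rejecting anything that is not a plain disk number is deliberate: it stops a name like sdk from being turned into a command at all.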
Chanchalanch Posted May 16 (Author)

@itimpi Oops. I ran that command... How screwed am I? The output started with:
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
unable to verify superblock, continuing...
.found candidate secondary superblock...
unable to verify superblock, continuing...

Then it ended after hours with:
Sorry, could not find valid secondary superblock

@JorgeB I did run the command through the GUI AFTER I made my mistake, and the output is attached as a txt file. I have also attached diagnostics I got after a restart, while running a parity check.

Thanks for all your help!

zfs_repair_GUI_output
chantry-server2-diagnostics-20240516-1605.zip

itimpi Posted May 17

Running with the wrong device name would almost certainly have done nothing, because if no valid superblock can be found the repair process will not go any further.

The results of running from the GUI look like it found a significant number of errors and corrected them. Any files/folders for which the correct name could not be determined will have been put into a lost+found folder on the drive with cryptic numeric names. Sorting these out is a manual process, although the Linux 'file' command can be used to at least determine the likely content type of each file. However, it is frequently much easier to recover the files from backups (if you have them).

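As a rough illustration of the 'file' suggestion above, the sketch below lists every regular file under a lost+found folder next to the MIME type that file(1) guesses from its content. The helper name classify_lost_found and the default path are assumptions; the path is simply where the folder would sit if the repaired drive is disk11.

```shell
#!/bin/sh
# Hypothetical helper: print "path: mime/type" for every regular file under a
# lost+found directory, using file(1) to guess the type from the content.
classify_lost_found() {
    dir=${1:-/mnt/disk11/lost+found}   # default path is an assumption
    find "$dir" -type f -exec file --mime-type {} +
}

# Usage:
#   classify_lost_found /mnt/disk11/lost+found | sort -t: -k2
```

Sorting on the second field groups the numerically named files by type, which makes it easier to pick out the video files worth renaming by hand.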
JorgeB Posted May 17

See how it looks now, and also look for a lost+found folder. Sometimes the repair output looks bad but the filesystem is still mostly OK.

Chanchalanch Posted May 17 (Author)

Thank you both! I appreciate the help! It looks as if I am able to see my files again!

One last thing: any thoughts about why the parity check or data rebuild is running so slowly?

Thanks again!

JorgeB Posted May 17

There were some writes going on to a disk, but they looked small. Post new diags.

Chanchalanch Posted May 19 (Author)

@JorgeB Sorry, was away for the weekend... Here are the new diagnostics.

chantry-server2-diagnostics-20240519-1127.zip

JorgeB Posted May 20

Post the controller test results of the diskspeed docker test.

Chanchalanch Posted May 20 (Author)

@JorgeB Not sure if this is what you were looking for, but I opened the WebUI for DiskSpeed and here's what was output:

Scanning Hardware
08:58:19 Spinning up hard drives
08:58:19 Scanning system storage
08:58:21 Scanning USB Bus
08:58:21 Scanning hard drives
08:58:28 Scanning storage controllers
08:58:29 Scanning USB hubs & devices
08:58:31 Scanning motherboard resources
08:58:32 Fetching known drive vendors from the Hard Drive Database

Lucee 6.0.1.83 Error (expression)
Message: Can't cast Complex Object Type Struct to String
Detail: Use Built-In-Function "serialize(Struct):String" to create a String from Struct
Stacktrace: The Error Occurred in /var/www/ScanControllers.cfm: line 1851
1849: } else {
1850: if (ListFindNoCase("ADATA,Apacer,Corsair,KINGSTON,Netac,Patriot,PNY,Reletech,Sabrent,Seagate,XPG,ZHITAI",ListFirst(DriveData[i].Model," "))) DriveData.Vendor=ListFirst(DriveData[i].Model," ");
1851: if (ListFindNoCase("Western Digital",ListFirst(DriveData[i].Model," ") & " " & ListGetAt(DriveData,2," "))) DriveData.Vendor=ListFirst(DriveData[i].Model," ") & " " & ListGetAt(DriveData,2," ");
1852: if (ListFirst(DriveData[i].Model," ") EQ "WD") DriveData[i].Vendor="Western Digital";
1853: if (ListFirst(DriveData[i].Model," ") EQ "TEAM") DriveData[i].Vendor="Team Group";
called from /var/www/ScanControllers.cfm: line 1819
1817: if (Left(DriveData[i].Model,9) EQ "GIGABYTE ") {DriveData[i].Vendor="GIGABYTE";DriveData[i].Model=Mid(DriveData[i].Model,10,999);}
1818: }
1819: if (Left(DriveData[i].Vendor,9) EQ "GIGABYTE ") {DriveData[i].Vendor="GIGABYTE";DriveData[i].Model=Mid(DriveData[i].Model,10,999);}
1820: if (Left(DriveData[i].Vendor,8) EQ "Sabrent ") {DriveData[i].Vendor="Sabrent";DriveData[i].Model=Mid(DriveData[i].Model,9,999);}
1821:
called from /var/www/ScanControllers.cfm: line 1633
1631: <cfdump var=#cfhttp#>
1632: </CFIF>
1633: </CFIF>
1634: <!--- Match the vendor case to what we have in the database --->
1635: <CFLOOP index="Key" list="#StructKeyList(HW)#">

Java Stacktrace
lucee.runtime.exp.ExpressionException: Can't cast Complex Object Type Struct to String
at lucee.runtime.type.util.StructSupport.castToString(StructSupport.java:204)
at lucee.runtime.op.Caster.toString(Caster.java:2105)
at scancontrollers_cfm$cf.call_000152_000153(/ScanControllers.cfm:1851)
at scancontrollers_cfm$cf.call_000152(/ScanControllers.cfm:1819)
at scancontrollers_cfm$cf.call(/ScanControllers.cfm:1633)
at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:1059)
at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:951)
at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:65)
at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:45)
at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2715)
at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2701)
at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2672)
at lucee.runtime.engine.Request.exe(Request.java:45)
at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1259)
at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1205)
at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:97)
at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:623)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:200)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:169)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:168)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:481)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:130)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:670)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)
at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:761)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:346)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:390)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:928)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1786)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)
at java.base/java.lang.Thread.run(Unknown Source)
Timestamp 5/20/24 8:58:32 AM MDT

JorgeB Posted May 20

Post a screenshot of the results; it will show a graph with the speeds, like this:

Chanchalanch Posted May 20 (Author)

It looks like, for whatever reason (not sure yet), DiskSpeed is having issues on my system. When I go to the WebUI I see what I posted before, which contains a stacktrace. I looked in the logs and found:

20-May-2024 09:52:04.841 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in [4433] milliseconds
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by lucee.commons.lang.ClassUtil (jar:/opt/lucee/server/lucee-server/patches/6.0.1.83.lco) to constructor com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl()
WARNING: Please consider reporting this to the maintainers of lucee.commons.lang.ClassUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

I've completely removed the docker and used the "cleanup Appdata" plugin to remove the config files, then reinstalled, and I am still getting the same results. I can look further into this and see what's going on, but if you DO have any ideas... 😄

Thanks again!

JorgeB Posted May 20

Probably best to ask in the existing container support thread:

Chanchalanch Posted May 21 (Author)

On 5/20/2024 at 9:16 AM, JorgeB said:
Post of screenshot of the results, it will show a graph with the speeds, like this:

Well, I got it running with some help from @jbartlett. I ran the benchmark and ended up with the following. It looks like Disk 15 (sdi) was the issue. I replaced it with another drive I had and the data rebuild is currently running at ~90MB/sec!

Thanks for your help!

jbartlett Posted May 22

I'd keep an eye on that green one with the dip at the end, especially if it's the same model as the others. If another benchmark on that drive shows the same speeds, you will be limited by that drive like you were by sdi. Parity will follow the speed of the slowest drive.

Keep in mind that you'll also want to do a controller benchmark (click the i icon to the right of the controller label). It doesn't matter how fast your drives are: if they can serve up data faster than your controller can handle, it won't be able to keep up with all drives active. The screenshot by JorgeB is an example of this. Note that this is really only a hindrance during parity or other highly data-intensive operations; you'll likely never run into this issue otherwise.

Edited May 22 by jbartlett

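The two limits described above can be put into numbers with a small sketch. It assumes that a parity check or rebuild reads all array disks at the same time, so the slowest disk sets the pace, and that a controller's bandwidth is split evenly between active disks. The helper names and every figure in the demo calls are hypothetical, not measurements from this thread.

```shell
#!/bin/sh
# All values are whole MB/s numbers. The figures in the demo calls at the
# bottom are hypothetical, not measurements from this thread.

# With every disk read in parallel, the operation runs at the speed of the
# slowest disk: print the minimum of the arguments.
slowest_disk_speed() {
    min=$1
    for speed in "$@"; do
        if [ "$speed" -lt "$min" ]; then
            min=$speed
        fi
    done
    echo "$min"
}

# With all disks active, the controller's usable bandwidth is shared between
# them: print the per-disk share (integer division, rounded down).
per_disk_share() {
    controller_mbps=$1
    disk_count=$2
    echo $(( $controller_mbps / $disk_count ))
}

slowest_disk_speed 180 175 25   # prints 25
per_disk_share 2000 24          # prints 83
```

Whichever of the two results is lower is the ceiling to expect, which is why one weak drive or one saturated controller can drag down an otherwise healthy array.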