jbartlett (Author), posted September 9, 2019

21 hours ago, electron286 said:
    12:44:32 Scanning hard drives
    Lucee 5.2.9.31 Error (application)
    Message: Error invoking external process
    Detail: /usr/bin/lspci: option requires an argument -- 's'

Can you create a debug file for me? Launch the app again and, when you get this error, replace the part of the URL after the port with the following:

    http://[IP]:18888/isolated/CreateDebugInfo.cfm

Select "Create debug file with controller info". This will create a zip file on your appdata share that contains log files from the app and a copy of the sys file tree of devices. Email it to [email protected], or upload it to a file share and send me the link.
jbartlett (Author), posted September 11, 2019

On 9/8/2019 at 12:26 PM, electron286 said:
    (Note using as the SMART controller type: 3Ware 2 /dev/twa1)
    Model family: SAMSUNG SpinPoint F3
    Device model: SAMSUNG HD502HJ
    Serial number: S27FJ9FZ404491

Please execute the following command on the server. Does it return the serial number?

    smartctl -i /dev/twa1
electron286, posted September 16, 2019

On 9/10/2019 at 8:02 PM, jbartlett said:
    Please execute the following command on the server. Does it return the serial number?
    smartctl -i /dev/twa1

No, it gives an error, so I played around until I got the flags set properly for my controller, following the prompts in the error message. The following commands do properly return the respective drive serial numbers:

    smartctl -i /dev/twa1 -d 3ware,1
    smartctl -i /dev/twa1 -d 3ware,0
    smartctl -i /dev/twa1 -d 3ware,2
    smartctl -i /dev/twa0 -d 3ware,0
    smartctl -i /dev/twa0 -d 3ware,1
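The five commands above all follow one pattern: a controller device node plus a `-d 3ware,N` port selector. As a rough illustration only (the function name and the device-to-port layout below are assumptions taken from this post, not anything DiskSpeed ships), the pattern could be generated like this in Python:

```python
def smartctl_3ware_commands(ports_by_device):
    """Build one `smartctl -i` argument list per (controller device, port) pair."""
    commands = []
    for device, ports in ports_by_device.items():
        for port in ports:
            # -d 3ware,N selects port N behind the 3ware controller device.
            commands.append(["smartctl", "-i", device, "-d", f"3ware,{port}"])
    return commands


# Layout assumed from the commands electron286 listed above.
layout = {"/dev/twa0": [0, 1], "/dev/twa1": [0, 1, 2]}

for command in smartctl_3ware_commands(layout):
    print(" ".join(command))
```

Each argument list could be handed to `subprocess.run` on the server itself; here they are only printed, so the sketch is safe to run anywhere.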
electron286, posted September 17, 2019

On 9/9/2019 at 11:18 AM, jbartlett said:
    Can you create a debug file for me? Launch the app again and, when you get this error, replace the part of the URL after the port with the following:
    http://[IP]:18888/isolated/CreateDebugInfo.cfm
    Select "Create debug file with controller info". This will create a zip file on your appdata share that contains log files from the app and a copy of the sys file tree of devices. Email it to [email protected], or upload it to a file share and send me the link.

Thanks. I just sent you two emails, one for each server, since they have different controllers. I included the debug files for each server.
jbartlett (Author), posted September 17, 2019

19 hours ago, electron286 said:
    I just sent you two emails, one for each server, since they have different controllers. I included the debug files for each server.

This looks like something I can't easily add support for without access to the card, so I ordered an SAS 9207-8i - same as yours but with internal ports. It'll be delivered tomorrow, and I'll be able to plug it into my backup server sometime later this week.
jbartlett (Author), posted September 19, 2019

On 9/16/2019 at 4:11 PM, electron286 said:
    No, it gives an error, so I played around until I got the flags set properly for my controller, following the prompts in the error message. The following commands do properly return the respective drive serial numbers:
    smartctl -i /dev/twa1 -d 3ware,1
    smartctl -i /dev/twa1 -d 3ware,0
    smartctl -i /dev/twa1 -d 3ware,2
    smartctl -i /dev/twa0 -d 3ware,0
    smartctl -i /dev/twa0 -d 3ware,1

For this system, can you tell me exactly how everything is set up? Card make & model, where the cables plug in, backplanes, etc.
wgstarks, posted September 25, 2019

This is a continuation of a discussion in the tunables tester thread.

I ran a speed test with the results shown below. Disk 1 is failing (soon to be replaced) and I couldn't get the test to run past 40% (this was the third try), but it doesn't look good to me. Disk 8 is the same model, so I would expect it to have similar characteristics, but it doesn't. There is a difference of almost 25% at the start of the test, and it increases rapidly up to the 40% mark, where the test stalled. Opinions?
BRiT, posted September 25, 2019

Does swapping the cables between Disk 1 and Disk 8 have any impact on the speed test of the good disk, Disk 8?
jbartlett (Author), posted September 25, 2019

When you click on the "Benchmark all drives" button, uncheck "Check all drives" and check only "Disk 1". Set the checkbox for "Disable SpeedGap detection".

SpeedGap is logic that checks the high & low speeds of each test iteration. If the gap between them is over a given threshold (starting at 45MB), it assumes something else accessed the drive during the test and retries with a slightly larger threshold - and it will repeat ad nauseam.
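A minimal sketch of the SpeedGap retry loop as described in the post above, written in Python. This is not DiskSpeed's actual code: the function name, the 5 MB step used to widen the threshold, and the `max_retries` safety cap are all assumptions added for illustration. Only the starting threshold of 45 comes from the post.

```python
def run_with_speedgap(sample_iteration, threshold_mb=45.0, step_mb=5.0, max_retries=1000):
    """Repeat one test iteration until its high/low speed gap fits the threshold.

    sample_iteration: callable returning a list of speed samples (MB/s) for one pass.
    Returns (average speed, threshold that was finally accepted, retries needed).
    """
    retries = 0
    while True:
        speeds = sample_iteration()
        gap = max(speeds) - min(speeds)
        if gap <= threshold_mb:
            return sum(speeds) / len(speeds), threshold_mb, retries
        if retries >= max_retries:
            # Safety cap added for this sketch; the post describes retrying indefinitely.
            raise RuntimeError("speed gap never settled below the threshold")
        # Assume the drive was busy: widen the threshold slightly and try again.
        threshold_mb += step_mb
        retries += 1


def noisy_drive():
    """Hypothetical drive whose readings always swing between 100 and 160 MB/s."""
    return [100.0, 160.0]


average, accepted_threshold, retries = run_with_speedgap(noisy_drive)
print(average, accepted_threshold, retries)
```

On a drive whose speeds genuinely swing by a large amount, a loop like this keeps retrying for a long time, which is why disabling SpeedGap detection is a reasonable step when benchmarking a drive that is suspected to be failing.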
wgstarks, posted September 25, 2019

4 hours ago, jbartlett said:
    When you click on the "Benchmark all drives" button, uncheck "Check all drives" and check only "Disk 1". Set the checkbox for "Disable SpeedGap detection".
    SpeedGap is logic that checks the high & low speeds of each test iteration. If the gap between them is over a given threshold (starting at 45MB), it assumes something else accessed the drive during the test and retries with a slightly larger threshold - and it will repeat ad nauseam.

This is as far as the test gets.

I'm not sure what the number that I circled represents. The trailing number "8" keeps counting upwards. It got up to "255" on the last test after running all night.
jbartlett (Author), posted September 25, 2019

6 hours ago, wgstarks said:
    I'm not sure what the number that I circled represents. The trailing number "8" keeps counting upwards. It got up to "255" on the last test after running all night.

I'll need to dig into the code, but I suspect it's just a formatting error.

Telnet into your server, verify that Drive 1 is still sdj, and enter the following command:

    dd if=/dev/sdj of=/dev/null bs=1M skip=3000000 iflag=direct status=progress

This will start a read of your drive at around the 3TB mark, where you got a successful read, and try to read to the end of the drive. It prints progress updates roughly every second, showing how much data has been read and the transfer speed.

You have some kind of media error in the middle of the drive, where the read speeds are likely extremely variable. If you see this reproduced, run a long SMART test on it. I would avoid putting any data on this drive.
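For anyone wondering where "around the 3TB mark" comes from: `dd` counts `skip` in blocks of `bs`, and `bs=1M` means 1 MiB (1,048,576 bytes). A small Python sketch of that arithmetic follows. The helper names are made up for this example, and the 8 TB drive size in the second call is purely hypothetical, since the actual drive size is not stated in the thread.

```python
MIB = 1024 * 1024  # dd's "1M" block size is 1 MiB


def skip_offset_bytes(skip_blocks, block_size=MIB):
    """Byte offset at which dd starts reading for a given skip= value."""
    return skip_blocks * block_size


def skip_for_percent(drive_bytes, percent, block_size=MIB):
    """skip= value (in whole blocks) that starts a read `percent` of the way into a drive."""
    return drive_bytes * percent // 100 // block_size


offset = skip_offset_bytes(3_000_000)
print(f"skip=3000000 starts at byte {offset:,}")
print(f"which is {offset / 1e12:.3f} TB, or {offset / 2**40:.3f} TiB")

# Hypothetical example: where the 40% mark would fall on an 8 TB drive.
print("skip for 40% of a hypothetical 8 TB drive:", skip_for_percent(8_000_000_000_000, 40))
```

The same helper can be used to restart a read just before any percentage at which a benchmark stalled.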
wgstarks, posted September 25, 2019

1 hour ago, jbartlett said:
    You have some kind of media error in the middle of the drive, where the read speeds are likely extremely variable. If you see this reproduced, run a long SMART test on it. I would avoid putting any data on this drive.

Could this have caused the slow speeds on the most recent data rebuild? I already know the drive is failing. The replacement is scheduled for delivery on Saturday, then after a week or so of preclears I'll swap them out. I'm just trying to figure out whether I need to reset my tunables to defaults before that to avoid another extended rebuild time.

I will try the terminal command and see what I get.
jbartlett (Author), posted September 25, 2019

2 hours ago, wgstarks said:
    Could this have caused the slow speeds on the most recent data rebuild? I already know the drive is failing. The replacement is scheduled for delivery on Saturday, then after a week or so of preclears I'll swap them out. I'm just trying to figure out whether I need to reset my tunables to defaults before that to avoid another extended rebuild time.
    I will try the terminal command and see what I get.

Most assuredly, it impacted it. It might be a small spot, or it could be huge. One of my future projects for this DiskSpeed app is to do a surface scan and create a heat map of the read speeds, which would show the impacted area. I would move everything you can off of the drive, ASAP.

Once you have your data moved off and the drive out, I'd like to acquire the drive if you're willing to part with it. I've got some bad drives I could test against, but yours seems to be an ideal drive to develop the heat map against.
wgstarks, posted September 25, 2019

32 minutes ago, jbartlett said:
    Once you have your data moved off and the drive out, I'd like to acquire the drive if you're willing to part with it. I've got some bad drives I could test against, but yours seems to be an ideal drive to develop the heat map against.

Sent you a PM.
jbartlett (Author), posted September 27, 2019

On 9/25/2019 at 12:05 PM, wgstarks said:
    I will try the terminal command and see what I get.

Did the dd command reveal anything?
wgstarks, posted September 27, 2019

The command completed without any errors. IIRC, the final speed was 157 MB/s. I'm guessing that's not an average, though.

As soon as I get the replacement precleared I'll send this one to you. I'm guessing about a week for the preclears.
bobokun, posted September 28, 2019

I'm hoping some of you might have some insight into what is going on. I ran the DiskSpeed test multiple times on all my drives, and it consistently showed that my Parity and Disk4 were very slow. This has caused my parity checks to double in length over the past few months, because they would slow down to 3.5MB/s for hours and then ramp back up to 150MB/s.

I figured I would need to replace both the Parity and Disk4. I started to preclear a new 10TB drive that I was planning to replace my parity with, and moved all the contents from Disk4 to Disk5 so I could shrink the array. After waiting a couple of days to move everything from Disk4 to Disk5 (remember, it fluctuates from 3.5MB/s to 50MB/s max, even with turbo write on), I finally have Disk4 empty.

Before sending it back for RMA I decided to run a DiskSpeed test one last time. Shockingly, all my drives are now running at full speed, and I'm not even sure I need to replace my Parity or Disk4 anymore. Does anyone know why this might be the case? I'm afraid it could be because Disk4 is currently empty, and once it is no longer empty it will start to slow down dramatically again.

The only differences I can think of are these. First, I ran DiskSpeed without any other docker containers running. I thought this could be the reason, so I started up all my docker containers and re-ran the tests. It still ran at full speed. Second, I changed the tunables in Disk settings from Tunable (md_num_stripes): 1xxx to Tunable (md_num_stripes): 8192.

Here are the results for Disk4 and Parity.

Disk4:

Parity:
jbartlett (Author), posted September 30, 2019

That ... that is weird. I've never seen anything like that. Even if every other drive was being pounded at the same time, you wouldn't see something like this.

Your tunables won't affect it, because the test doesn't go through the unraid driver to access the drive. It's a straight dd read using the block size the drive reports as optimal.

The drive might not be bad. The only thing I can think of is to try pinning some CPUs to the docker to see if that cleans it up if it happens again. I'm also assuming that you didn't have any CPU-intensive tasks going on, or any VMs taking all the CPUs away.
bobokun, posted September 30, 2019

3 minutes ago, jbartlett said:
    That ... that is weird. I've never seen anything like that. Even if every other drive was being pounded at the same time, you wouldn't see something like this.
    Your tunables won't affect it, because the test doesn't go through the unraid driver to access the drive. It's a straight dd read using the block size the drive reports as optimal.
    The drive might not be bad. The only thing I can think of is to try pinning some CPUs to the docker to see if that cleans it up if it happens again. I'm also assuming that you didn't have any CPU-intensive tasks going on, or any VMs taking all the CPUs away.

I don't think the DiskSpeed utility is what's incorrect, because for the past few months my parity checks have been extremely slow (fluctuating between 3.5MB/s and 50MB/s), and now when I run a parity check it runs at a full 150-200MB/s with no fluctuation. Should I still replace the parity just in case?

I think the read speeds are fine, but write speeds still seem to be slow: when copying from the cache drive to Disk4 it writes at 30MB/s. From my understanding, DiskSpeed doesn't test write speeds, only read.
jbartlett (Author), posted September 30, 2019

5 hours ago, bobokun said:
    I don't think the DiskSpeed utility is what's incorrect, because for the past few months my parity checks have been extremely slow (fluctuating between 3.5MB/s and 50MB/s), and now when I run a parity check it runs at a full 150-200MB/s with no fluctuation. Should I still replace the parity just in case?
    I think the read speeds are fine, but write speeds still seem to be slow: when copying from the cache drive to Disk4 it writes at 30MB/s. From my understanding, DiskSpeed doesn't test write speeds, only read.

That's correct, my utility only does non-destructive reads.

If you want to find out whether Disk 4 has issues and remove the Parity drive from the equation, carefully recreate your array without Disk 4. The parity rebuild speeds will tell you if the Parity drive is the issue. If it looks good, mount the old Disk 4 using the UD plugin and copy files to it to see if you can duplicate the slow writes. While the Parity is rebuilding, you can also kick off a long SMART test on the old Disk 4.

If everything still looks good, take the array offline and kick off a long SMART test on the parity drive and all of the others for a shiz-n-grins sanity check.
bobokun, posted September 30, 2019 (edited September 30, 2019)

13 hours ago, jbartlett said:
    That's correct, my utility only does non-destructive reads.
    If you want to find out whether Disk 4 has issues and remove the Parity drive from the equation, carefully recreate your array without Disk 4. The parity rebuild speeds will tell you if the Parity drive is the issue. If it looks good, mount the old Disk 4 using the UD plugin and copy files to it to see if you can duplicate the slow writes. While the Parity is rebuilding, you can also kick off a long SMART test on the old Disk 4.
    If everything still looks good, take the array offline and kick off a long SMART test on the parity drive and all of the others for a shiz-n-grins sanity check.

I did as you suggested: I created a new config and removed both Disk4 and the old Parity drive to rule out all the bad options. I put in the new parity drive, which passed preclear at 180-200MB/s. Now when I start the array and it rebuilds the parity, it's extremely slow!

I don't think it's the drives anymore, because the new parity drive precleared super fast. Do you think it could be my SAS HBA card? I have the Dell PERC H310. If it's not that, then it might be the miniSAS-to-SATA cables that I'm using. Either way, I'm once again at a loss as to what I should do. I purchased some new miniSAS-to-SATA cables and hopefully that helps.

EDIT: I powered off my machine and decided to use plain SATA cables for all 5 drives instead of the HBA card, so now all my drives are plugged directly into my motherboard. The parity rebuild started off slow at 19MB/s-30MB/s, but after a couple of minutes it ramped up to 126MB/s. I just realized my HBA card is plugged into a PCIe 3.0 x4 (in x8) slot. Would plugging it into a PCIe 3.0 x8 slot help?
jbartlett (Author), posted October 1, 2019

You shouldn't be bottlenecking at 5 drives. My x4 card started to bottleneck at 7 drives at those same read speeds.

The DiskSpeed docker has a controller benchmark test - click on the i icon next to the controller and benchmark it. It will read each drive by itself to get its single-drive speed and then read all of them at the same time. If it's total shit at the same time, your controller might be failing.

It can't hurt to try it in an x8 slot. But if things behave when connected to your motherboard and not when connected to your HBA card, then the card is quite likely the issue.
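To make the comparison in the controller benchmark concrete, here is a rough Python sketch that compares each drive's solo speed with its speed when all drives are read together. It only illustrates the idea and is not how DiskSpeed reports results: the function name, the 0.9 tolerance, and all of the speed figures are invented for the example.

```python
def bottleneck_report(solo_mb_s, concurrent_mb_s, tolerance=0.9):
    """Compare per-drive speeds read alone against speeds with all drives active.

    Returns {drive: (ratio, flagged)}, where ratio = concurrent / solo and
    flagged is True when the drive kept less than `tolerance` of its solo speed.
    """
    report = {}
    for drive, solo in solo_mb_s.items():
        ratio = concurrent_mb_s[drive] / solo
        report[drive] = (round(ratio, 2), ratio < tolerance)
    return report


# Hypothetical numbers, in MB/s.
solo = {"sdb": 180.0, "sdc": 150.0}
together = {"sdb": 90.0, "sdc": 147.0}

for drive, (ratio, flagged) in bottleneck_report(solo, together).items():
    note = "possible controller or slot bottleneck" if flagged else "ok"
    print(f"{drive}: {ratio:.2f} of solo speed ({note})")
```

If every drive drops sharply only when all are read together, the shared path (controller, slot, or cabling) is the more likely suspect than any single drive.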
patchrules2000, posted October 3, 2019

Hi all, hopefully you will be able to help.

I just upgraded my unraid rig from two SAS controllers to a single 24-port controller. I am trying to complete a controller benchmark, but it keeps failing about 2/3 of the way through with a timeout error. I suspect this is due to a software constraint set by DiskSpeed for the controller test, as all drives pass testing with good speed and no other errors when run individually. It fails around drive 16 of 24 on every run.

Is there a config value I could change to roughly double this timeout limit (if that is the actual problem), so that a full run through the 24 drives can complete?

Thanks for your help :)
doron, posted October 8, 2019

On 9/17/2019 at 10:48 PM, jbartlett said:
    This looks like something I can't easily add support for without access to the card, so I ordered an SAS 9207-8i - same as yours but with internal ports. It'll be delivered tomorrow, and I'll be able to plug it into my backup server sometime later this week.

Hi @jbartlett - has there been progress on this issue? I'm seeing the same error ("/usr/bin/lspci: option requires an argument -- 's'"), so I thought I'd ask.

Thanks!
jbartlett (Author), posted October 9, 2019 (edited October 9, 2019)

On 10/2/2019 at 5:06 PM, patchrules2000 said:
    Hi all, hopefully you will be able to help.
    I just upgraded my unraid rig from two SAS controllers to a single 24-port controller. I am trying to complete a controller benchmark, but it keeps failing about 2/3 of the way through with a timeout error. I suspect this is due to a software constraint set by DiskSpeed for the controller test, as all drives pass testing with good speed and no other errors when run individually. It fails around drive 16 of 24 on every run.
    Is there a config value I could change to roughly double this timeout limit (if that is the actual problem), so that a full run through the 24 drives can complete?

I didn't know there was a 24-port controller. I have the timeout set to five minutes to try to nip any rogue processes in the bud, but at 24 drives testing for 15 seconds each, it would take at least six minutes to do a complete pass. I've updated the default timeout to ten minutes.

I just pushed version 2.3 - please update the Docker and try again.
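The timeout arithmetic in the post above can be checked with a few lines of Python. This is only a sketch of the lower bound (drives times seconds per drive); the function names are made up here, and any per-drive overhead inside DiskSpeed is not modelled.

```python
def min_pass_seconds(drive_count, seconds_per_drive=15):
    """Lower bound on one full controller pass: each drive is read for a fixed time."""
    return drive_count * seconds_per_drive


def fits_timeout(drive_count, timeout_minutes, seconds_per_drive=15):
    """True if even the best-case pass finishes within the timeout."""
    return min_pass_seconds(drive_count, seconds_per_drive) <= timeout_minutes * 60


for timeout in (5, 10):
    verdict = "fits" if fits_timeout(24, timeout) else "cannot finish"
    print(f"24 drives, {timeout}-minute timeout: needs at least "
          f"{min_pass_seconds(24) / 60:.0f} minutes, {verdict}")
```

By the same lower bound, a ten-minute timeout covers up to 40 drives at 15 seconds each, before counting any overhead between drives.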