bsim

Members
  • Posts

    191
  • Joined

  • Last visited

Everything posted by bsim

  1. When I pulled the SMART stats from drive 3 on another system, it showed some bad numbers but still said it passed; CrystalDiskInfo (attached) at least showed some warnings... The drive also made a strange noise, so I'm leaning towards a genuinely bad drive. I have a fresh 5TB drive that's been cleared that I'm going to swap in for a rebuild. I would lean away from the controller being at fault: it's a high-end controller, and a cabling problem should show some sort of symptoms on the other drives on that port. The initial drop offline happened during a parity check on a system that hasn't been touched for months through multiple parity checks. The controller took two drives offline for some reason, but there seemed to be a setting in the controller for keeping drives online even through minor errors (read errors, no write errors)... I'll see if the rebuild works. EDIT: I also just realized that the new drive will be on a different backplane/controller port anyway, because I used the server to do the preclear during the attempted rebuilds... I'll have to wait until I put the next 5TB drive in on the same port to see if the problem reoccurs. (It's a Supermicro 24-bay with an Areca ARC-1280ML 24-port 2GB.)
  2. Thanks... it just freaked me out a bit to see the total size showing less than the actual total size of the entire array! So, I started the rebuild and it looked like it was going great... until I noticed that one drive was rebuilding fine while the second drive seemed to be logging a few writes too many. I stopped the rebuild, pulled the drive, and ran both a SMART check and a short SMART self-test on it, with success on both... Any idea why this would happen? Is this a possible cabling error that would only show up after several months of working fine? The controller uses SFF-8087s to put multiple drives on one port, so I would probably lean away from the controller having issues. I am preclearing a drive right now just in case. Should that be my next direction? Do these diagnostics tell why? unraid-diagnostics-20170829-1338.zip
  3. I'm hoping this is a bug. I am running the latest Unraid Pro, 6.3.5, and had two drives drop from the array (slot 1 = 4TB and slot 3 = 3TB) because of a controller burp (dual-parity array with 5TB parity drives, running for months). I stopped the array, unassigned both slots, started the array, stopped the array, assigned the same drives to the same slots, started the array... and it showed it was going to rebuild 4TB (I immediately stopped the rebuild)! Right now I'm running with both drives emulated... What the **** is happening? Can someone of extreme authority tell me that this is just an Unraid display bug saying 4TB because drive 1 is 4TB, and not 5TB like both parity drives?
  4. I got it! AAAAAAHHHHHH... Now I just have to correlate all 24 drives with what I figured out before (sdb=1, sdc=2, sdd=3...) and enter it for each drive! UG. Would there be a regular expression that would allow me to correlate the drive letters to their controller slots? I'm terrible at regular expressions, but having used them many times, it seems a jedi master of regular expressions could help users here (a sketch of the idea is below). Essentially, the default under Settings > Disk Settings > Global SMART Settings being set to Areca is worthless as far as I can tell, since it doesn't give Unraid a way to correlate the numbers/drive letters for you to save some time. Perhaps the global setting is what the Main page temperatures use to pull SMART data, and why the Main page temperatures still don't function correctly; at least now each disk pulls the correct SMART data. Why doesn't the Main page pull the temperature using each disk's SMART polling setting? Is there a fix, or a place to poke around for a fix? As a side note, not sure if it's because I am currently doing a parity check or not, but "Unavailable - disk must be spun up" still seems to be keeping all of the buttons in the "Self-test" section greyed out... I have several hours until I can check with the array offline, or at least not in a parity check cycle. I've executed the short SMART test from the command line using the areca option, and when pulling the report a while later the test is reported as "Interrupted (host reset)"... Perhaps because the drives are mounted and in use? I'll see if I can get more when the parity check ends (just under 8 hours yet).
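     A minimal sketch of that letter-to-slot mapping in bash rather than a pure regular expression, assuming the sdb=1, sdc=2, sdd=3... correspondence described above holds for all 24 drives (the device range and slot offset are assumptions, not verified against the controller):

       # Map each /dev/sdX letter to an Areca slot number via ASCII arithmetic:
       # 'b' is character code 98, so slot = code - 97 gives sdb=1, sdc=2, ...
       for dev in /dev/sd[b-y]; do
           letter=${dev: -1}
           slot=$(( $(printf '%d' "'$letter") - 97 ))
           echo "$dev -> areca,$slot"
       done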
  5. I used Disk 5 (normal operation) as the example disk... All disks are on the single 24-port Areca controller.
  6. Tried switching to the Areca setting... no changes under the disks... It seems to be saying the drives need to be spun up, but mine never spin down anyway. Pressing spin up just refreshes the page and sticks with the "Unavailable - disk must be spun up" message. Even downloading the report gives me an error message about being spun up. All of my testing buttons are greyed out... do I need to bring down my array or reboot my server to enable the Areca setting for SMART?
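     One way to sanity-check the GUI's spin-state claim from the shell, a sketch assuming the drive answers ATA power-mode queries directly (drives behind the Areca may not respond to this; /dev/sdx is a placeholder):

       # Ask the drive itself whether it is spun down;
       # prints "active/idle" or "standby".
       hdparm -C /dev/sdx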
  7. I saw that... but after changing it to Areca, the temps didn't change on the Main page, and still none of the attributes were enumerated under each disk's SMART stats. Is a restart required to update them?
  8. Something I have also noticed is that 99% of the SMART data in the Unraid web GUI is garbage... I've researched a bit and found that Unraid hasn't implemented controller-manufacturer-specific SMART requests. To check from the command line, it requires an initial lsscsi -g | grep "Areca" to get the controller location (/dev/sg25), then smartctl -a -d areca,1 /dev/sg25 to get the information for the "sdb" drive (the single-digit drive location corresponds to the drive letter for each disk: sdb=1, sdc=2, sdd=3...). I submitted a feature request hoping that something better would be implemented soon, as it's been requested a few times since before 6.2 and doesn't seem to have made it into the final product. No response yet. So my command lines for the Areca...
     DRIVE INFORMATION - smartctl -a -d areca,1 /dev/sg25
     DRIVE SHORT TEST - smartctl -t short -d areca,1 /dev/sg25
     DRIVE LONG TEST - smartctl -t long -d areca,1 /dev/sg25
     SHOW TEST RESULTS - smartctl -l selftest -d areca,1 /dev/sg25
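     Those per-drive commands extend naturally to a loop over all 24 slots; a sketch assuming the controller shows up once in lsscsi output and using the /dev/sg25-style addressing from the post above (the grep pattern and field position are assumptions):

       # Find the Areca's SCSI generic device, then dump SMART data
       # for every slot on the 24-port card.
       SG=$(lsscsi -g | grep -i areca | awk '{print $NF}')
       for slot in $(seq 1 24); do
           echo "=== areca,$slot on $SG ==="
           smartctl -a -d "areca,$slot" "$SG"
       done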
  9. I am running Unraid 6.3.5 and I've seen this idea floating around on the forum: hooking up the newer functions of smartctl that allow polling SMART data from drives behind different controllers. For instance, I have an Areca ARC-1280, and all of my SMART data through the interface is still really choppy and mostly garbage. If I want actual smartctl data, it requires an initial lsscsi -g | grep "Areca" to get the controller location (/dev/sg25), then smartctl -a -d areca,1 /dev/sg25 to get the information for the "sdb" drive (the single-digit drive location corresponds to the drive letter for each disk: sdb=1, sdc=2, sdd=3...). (Another issue specific to the Areca was needing to apply the disk identification fix created by bubbaQ.) Has this been considered, or is there a workaround for it yet?
  10. That hiccup is kinda what I was leaning towards, given that there hadn't been any issues with drives in previous parity checks, and then all of a sudden it burped on 2 drives with only read errors acknowledged by the controller. Looking offhand at the SMART status on the controller itself and on Unraid's dashboard, no glaring issues show (all green). I noticed that offline drives need the array to be stopped to run the SMART tests from the dashboard (the test buttons are greyed out), but do I have to shut down the array to run them on disabled drives from the command line? Just to make sure... are these the best SMART command lines on newer hardware?
     DRIVE INFORMATION - smartctl -a /dev/sdx
     DRIVE SHORT TEST - smartctl -t short /dev/sdx
     DRIVE LONG TEST - smartctl -t long /dev/sdx
     SHOW TEST RESULTS - smartctl -l selftest /dev/sdx
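     A sketch chaining those commands into one short-test-and-report pass, assuming a directly attached drive at the placeholder /dev/sdx and that a short self-test finishes within a couple of minutes:

       # Start a short self-test, wait for it to complete, then show the log.
       smartctl -t short /dev/sdx
       sleep 150   # short self-tests usually finish in about 2 minutes
       smartctl -l selftest /dev/sdx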
  11. I realized as I was shutting down to check that I should have saved the diag information... It doesn't look like anything was dumped to flash, which sucks (dumping snapshots on errors by date/time would be a great Unraid feature; a rough sketch of the idea is below). I do remember, though, that before the shutdown there were no sync errors... Having been on a stable setup for a few months, I read this as the drives just temporarily having an issue, or the controller card farting. In the Areca web interface, I did find an "Auto Activate Incomplete Raid" setting that could have been the issue. From what I have researched, when the read errors occurred, the controller may have taken the raid set (jbod = single disk) offline, causing Unraid to scram. After I brought the system up with the changed Areca setting, the two drives no longer show as failed in the Areca interface. I'm running a non-correcting parity check on the array to verify no other drives red-ball, but my next step, I'm thinking, is an extended SMART test on the "bad" drives, and then possibly rebuilding the drives to verify that nothing in parity is screwed up. I did order replacement 5TB drives that will take a few days to arrive, so assuming no other drives red-ball, I may just slap the two new drives in and go with them... I just hate the time lag when on the edge of data loss... a day and a half for the parity check, a day and a half (at least) to preclear drives, a day and a half to rebuild. Areca log (I'm thinking the "Failed" entries were the controller setting kicking in after the multiple read errors):
     2017-08-24 16:44:37 IDE Channel 13 Device Failed
     2017-08-24 16:44:37 Raid Set # 12 RaidSet Degraded
     2017-08-24 16:44:37 ST4000DM000-1F21 Volume Failed
     2017-08-24 11:00:43 IDE Channel 13 Reading Error
     2017-08-24 11:00:34 IDE Channel 13 Reading Error
     2017-08-24 11:00:26 IDE Channel 13 Reading Error
     2017-08-24 11:00:17 IDE Channel 13 Reading Error
     2017-08-24 11:00:08 IDE Channel 13 Reading Error
     2017-08-24 11:00:00 IDE Channel 13 Reading Error
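     A rough sketch of that snapshot-on-error idea, assuming Unraid's `diagnostics` command is available at the shell and that watching /var/log/syslog for error strings is good enough (the matched strings are illustrative guesses):

       # Watch the syslog and capture a dated diagnostics zip whenever
       # a disk error line appears.
       tail -Fn0 /var/log/syslog | while read -r line; do
           case "$line" in
               *"read error"*|*"disk disabled"*)
                   diagnostics   # writes a date/time-stamped zip
                   ;;
           esac
       done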
  12. Racking my brains on this... I have an Areca ARC-1280 (latest firmware) on Unraid 6.3.5 that has been working beautifully for 3 or 4 months... up until now. In my last parity check I had two drives red-X on me ("device is disabled, contents emulated"); I have dual parity set up, so both are running emulated. The Areca controller logs only show that the drives had a bunch of "Reading Error"s and no write errors. I've seen that there are a few peculiar issues with the Areca cards. Is this a controller timeout issue or something? There were 0 sync errors reported throughout the parity check. What should I do? I first shut down the server, verified that the drives were seated correctly, and reseated the SATA cables. Right now I'm running a check with no parity corrections to see if anything else comes up funky. I have run a few parity checks in the past (runs automatically every month) with the same setup, and no problems. If the controller only reports read errors on the drives, can the drives still be trusted by Unraid to work correctly? What happens if a third drive drops with only read errors from the controller? The best solution I've found, which seems risky for just having read errors reported:
     Stop the array
     Unassign both disks
     Start the array
     Stop the array
     Assign the disks again
     Start the array and allow the parity rebuild
     If the rebuild is successful, re-verify by running a non-correcting parity check
     Is this truly a drive issue, or is it just a controller nuance/setting?
  13. Never mind folks... it did have to do with the motherboard settings for the processor/memory... Choosing "Optimal" settings enables two settings that will cause Unraid to crash. Specifically, these two settings are pertinent (it is possibly just one of the two; a bit more research is necessary to differentiate):
     "C State Mode": BAD = "C6", GOOD = "Disabled"
     "CPB Mode": BAD = "Auto", GOOD = "Disabled"
     Anyone know what they do and whether they are valuable?
  14. Ahhh, the thrills of migration! Sooo... swapped in a new motherboard (Supermicro H8DGi-F) and using an Areca ARC-1280ML 24-port controller card for 24 drives on the latest Unraid version. At first I couldn't figure out why there was a problem booting from a known-bootable Unraid USB thumbdrive... then I determined that my problem was probably an "int13" issue between the card and the motherboard that ultimately disables booting from a USB drive (guessing a bug). Areca with 12 drives: Unraid boots... Areca with 16 drives: USB unbootable... no Areca: Unraid bootable... Sooo... decided to try a small SSD directly off the motherboard to kick into a PLOP install that would chain-load the Unraid USB boot (still 24 drives on the Areca)... Unraid starts booting, gets through to the "initfs" stage, and my system reboots... as far as I can see, there are no logs left behind to give me a clue as to what is causing the issue. Anyone know my next step? Does Unraid have a problem with me using PLOP to kick off the Unraid boot USB? Are there any incompatibility issues here?
  15. Excellent! Started the array, looks OK... Thanks! So I guess this means that Unraid uses the actual drive serial number, not the identification string, when it looks at drives?
  16. Thanks guys, great info! Never having had a red ball myself, and having dual parity, I feel much more comfy with my data. So just to verify: a red ball or drive error during normal reads/writes or a parity check will never be caused by SMART info, only by actual read/write failures on the array (whether during a parity check or normal reads/writes)? Does Unraid ever try to use SMART data to infer or extrapolate how to use the drives (e.g., write to less risky drives first)?
  17. I was using 3x SAT2-MV8s and an Areca ARC-1280 24-port for my array. (NOTE: the Areca did need the drive identification fix that I found on this site to make the drive identifications look normal.) I upgraded to the latest Unraid version (6.3.5) and on reboot hit a drive, cable, or SAT2-MV8 that was somehow crashing the boot. I can only guess at which, because when I moved all of my drives to the Areca and removed the SAT2-MV8s, the server booted beautifully. However, the array did not start automatically (which I totally understand, and am thankful for). All drives were detected fine, and the GUI shows the correct drive (by serial) for each device slot (verified from a screenshot of the old setup), but the "Identification" of the drives that were on the SAT2-MV8s is not exactly the same (it looks like the SAT2-MV8s prepend "Manufacturer_" to the identification). The question is: knowing the drive serials in the IDs DO match up with the correct device slots according to the previously working setup, can I start it up with the different identifications without something going horribly wrong? Thanks for your help,
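     A quick way to eyeball those identification strings outside the GUI, a sketch assuming the usual Linux by-id symlinks (which embed model and serial) are present:

       # List the model_serial identification strings per device,
       # skipping partition entries.
       ls -l /dev/disk/by-id/ | grep -v part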
  18. So I've searched quite a bit and can't find anything that really lays it out clearly (between SMART data, dual parity, and Unraid red balls), but here's what I have so far:
     If you have Reallocated sector counts, watch whether that number keeps rising to know if the drive is going to die sooner.
     If you have Reported uncorrects, it is a worse situation than reallocated sector counts.
     If you have a red ball in Unraid, something really bad has occurred, such as an entire drive not being available.
     So the newer question: if I have dual parity up and running without any red balls, and the array passes a parity check without additional errors, BUT a drive has a lot of reallocated sector counts and reported uncorrects... I'm NOT losing data in the background, and I should just look to replace the failing drive as soon as possible? Does a red ball only occur when Unraid is incapable of handling errors (unresponsive drive, bad cable... etc.)? Will red balls ever occur based on drive SMART data? Thanks for your help!
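     A sketch for keeping an eye on the two attributes mentioned above, assuming smartctl's tabular -A output and the standard attribute names (the device path is a placeholder):

       # Print the name and raw value of the two attributes worth watching.
       smartctl -A /dev/sdx | awk '/Reallocated_Sector_Ct|Reported_Uncorrect/ {print $2, $NF}'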
  19. I previously had a Supermicro H8DM8-2 and, through a lot of research, found that the BIOS had a setting to enable the IOMMU, but for some reason the chipset had issues actually allowing it; hence Unraid reported IOMMU: Disabled. So my question is whether someone out there can verify that the Supermicro H8DGi-F motherboard (SR5690/SP5100 chipset) fully supports passthrough? I've seen around the board that the SP5100 has similar BIOS options, but I don't want a replacement board to have the same issues. Thanks for the help. (Couldn't delete my original post in hardware...)
  20. Upgraded, rebooted, still strictly Firefox... looks like it worked the first time. I don't get why a client browser should be able to play with drive assignments, even without breaking other data drive assignments. Is there a different web form used for assigning cache drives than for data drives?
  21. Got it... thanks. I saw this when searching and found it a bit hokey... but I'll try the update and see if it goes away... I never use Chrome or Internet Explorer... love Firefox too much. The addons make the browser.
  22. Unraid Pro 6.2.4... So I assign the two cache drives to cache slots (identical SSDs), the shares pick up the cache storage (data goes to the cache drive duo), but on restart the cache drives go back to "unassigned devices". Luckily, if I reassign the same cache drive to the same cache slot, I don't lose any of the data that the mover hadn't moved yet. I've searched the forum and found some very strange solutions, but none of them really apply to me. I previously had the Unassigned Devices addon installed, and I removed it a while ago thinking that it might have a bug with the mirrored cache drives. Anyone have an idea?
  23. Was the native restart/reboot command hooked into the powerdown script automatically after 6.2? It allowed the hardware power button to do a clean shutdown.