SnickySnacks


Posts posted by SnickySnacks

  1. On 12/13/2019 at 5:07 AM, TJOPTJOP said:

    What I also do not understand is how those DIMMs can be broken. ECC DIMMs are error corrected, right?


    Also, this is exactly what the log is telling you.

    A "CE memory read error" or "CE memory scrubbing error" is a "Correctable Error (CE)". It would be worse if you were getting "Uncorrectable Errors (UE)".

    My concern is that you've merely turned off the memory scrubbing and whatnot, hiding the errors rather than fixing them.
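    If ECC is still enabled, the kernel's EDAC counters are an easy way to see whether corrected errors keep piling up. These sysfs paths are standard Linux; this assumes the EDAC driver for your memory controller is loaded:

    grep . /sys/devices/system/edac/mc/mc*/ce_count
    grep . /sys/devices/system/edac/mc/mc*/ue_count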

  2. Just to be clear, are you now running with ECC off? That seems like a bad idea.

    Doesn't that mean that instead of seeing the errors, it's just going to be failing silently and potentially corrupting memory? (Especially if you weren't seeing the errors in Memtest in the first place.)

    Quote

    So, I decided to install all the RAM modules again, turn off ECC checking in my BIOS, and run a full Memtest. Any suggestions on how many passes I need to confirm whether my RAM is good or bad? I think that one pass will take 12+ hours.

  3. It will be something like:

    1) Copy everything off the existing licensed USB onto a PC
    2) Delete everything off the licensed USB
    3) Copy everything from the trial USB to the licensed USB
    4) Delete the trial key from the licensed USB
    5) Copy the licensed Unraid key file from the PC backup onto the licensed USB


    If you do it this way you should still have your trial USB and a backup copy of your licensed USB in case anything goes wrong.
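    In shell terms it's roughly the following (a sketch only; the mount points are assumptions, and the key file names under config/ may differ on your sticks):

    # Assumed mount points: adjust to wherever the sticks show up on your PC
    LICENSED=/mnt/licensed   # the old, licensed stick
    TRIAL=/mnt/trial         # the new stick currently running the trial
    BACKUP="$HOME/unraid-usb-backup"

    mkdir -p "$BACKUP"
    cp -a "$LICENSED"/. "$BACKUP"/                   # 1) back it all up to the PC
    rm -rf "$LICENSED"/*                             # 2) wipe the licensed stick
    cp -a "$TRIAL"/. "$LICENSED"/                    # 3) copy the trial stick over
    rm -f "$LICENSED"/config/Trial.key               # 4) remove the trial key
    cp "$BACKUP"/config/*.key "$LICENSED"/config/    # 5) restore the licensed key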

  4. It's my preference to stop them on Unraid since I don't have to manage individual computers (or guests!).

    If you notice any new files that need to be added to the veto list, please let the community know.
    At this point my shares are all read-only as a ransomware preventative measure, except for one share that I stage files to, so nothing can really create files on the array anymore anyway.
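    For anyone searching later, the veto list goes in Samba config (on Unraid, Settings -> SMB -> SMB Extras); the patterns here are only illustrative, substitute whatever list you've collected:

    [global]
       # patterns are delimited by '/' and accept * and ? wildcards
       veto files = /.DS_Store/._*/Thumbs.db/*.crypt/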

  5. 5 hours ago, Zonediver said:

    That's normal behavior.

    After "every" boot/reboot you need to push the spin-down button only once and all is fine - until the next boot/reboot.


    Eh? Not that normal.

    I rarely if ever hit the spin-down button, and my disks always spin down normally after a reboot (once folder caching is done doing its thing).


    Flam3h:
    How sure are you that the drives aren't spinning down eventually? Fix Common Problems runs 10 minutes after your system comes up, and you can see in the log that the system issues a spin-down ~15 minutes after that, which generates a (likely harmless) error on an NVMe drive:



    Oct 14 01:26:33 Tower emhttpd: shcmd (138): /usr/sbin/hdparm -y /dev/nvme0n1
    Oct 14 01:26:33 Tower root:  HDIO_DRIVE_CMD(standby) failed: Inappropriate ioctl for device
    Oct 14 01:26:33 Tower root: 
    Oct 14 01:26:33 Tower root: /dev/nvme0n1:
    Oct 14 01:26:33 Tower root:  issuing standby command
    Oct 14 01:26:33 Tower emhttpd: shcmd (138): exit status: 25


    Other than that, everything seems normal in the log.
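    If you want to double-check whether a given disk has actually spun down, hdparm can report the power state of ATA drives (the NVMe error above is just hdparm being pointed at a device that doesn't speak ATA; adjust the device name as needed):

    hdparm -C /dev/sdb    # prints "drive state is: active/idle" or "standby"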

  6. At the time those diagnostics were created, were any CPUs showing 100% load? If so, which ones?

    Also, have you tried booting in safe mode and seeing if this occurs with no plugins/dockers/vms loaded?

    There does seem to be some corruption on one of your disks:

    Oct  7 23:56:33 Homebase kernel: BTRFS critical (device sdj1): corrupt leaf: root=5 block=1953586397184 slot=84, bad key order, prev (288230376157862467 96 4) current (6150723 96 5)
    ### [PREVIOUS LINE REPEATED 4 TIMES] ###



    Should probably run a check on that one (see the command after the trace below), as it looks like it eventually causes a kernel fault:

    Oct  8 02:02:49 Homebase kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
    Oct  8 02:02:49 Homebase kernel: PGD 4ad0b1067 P4D 4ad0b1067 PUD 4ad0b0067 PMD 0 
    Oct  8 02:02:49 Homebase kernel: Oops: 0000 [#1] SMP NOPTI
    Oct  8 02:02:49 Homebase kernel: CPU: 15 PID: 1848 Comm: fstrim Tainted: P           O      4.19.56-Unraid #1
    Oct  8 02:02:49 Homebase kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P2.10 09/09/2019
    Oct  8 02:02:49 Homebase kernel: RIP: 0010:btrfs_trim_fs+0x166/0x369
    Oct  8 02:02:49 Homebase kernel: Code: 00 00 48 c7 44 24 38 00 00 00 00 49 8b 45 10 48 c7 44 24 40 00 00 00 00 48 c7 44 24 30 00 00 00 00 48 89 44 24 20 48 8b 43 68 <48> 8b 80 80 00 00 00 48 8b 80 f8 03 00 00 48 8b 80 a8 01 00 00 0f
    Oct  8 02:02:49 Homebase kernel: RSP: 0018:ffffc9001294fc90 EFLAGS: 00010297
    Oct  8 02:02:49 Homebase kernel: RAX: 0000000000000000 RBX: ffff888f5db68200 RCX: ffff888fbf604878
    Oct  8 02:02:49 Homebase kernel: RDX: ffff888cac98de80 RSI: ffff888f5d718c00 RDI: ffff888fbf604858
    Oct  8 02:02:49 Homebase kernel: RBP: 0000000000000000 R08: ffff888f5911fa70 R09: ffff888f5911fa68
    Oct  8 02:02:49 Homebase kernel: R10: ffffea0022918ec0 R11: ffff888ffe9e0b80 R12: ffff888fbfafe000
    Oct  8 02:02:49 Homebase kernel: R13: ffffc9001294fd20 R14: 0000000000000000 R15: 0000000000000000
    Oct  8 02:02:49 Homebase kernel: FS:  000014b7fa3ac780(0000) GS:ffff888ffe9c0000(0000) knlGS:0000000000000000
    Oct  8 02:02:49 Homebase kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Oct  8 02:02:49 Homebase kernel: CR2: 0000000000000080 CR3: 00000001a6aa2000 CR4: 0000000000340ee0
    Oct  8 02:02:49 Homebase kernel: Call Trace:
    Oct  8 02:02:49 Homebase kernel: ? dput.part.6+0x24/0xf6
    Oct  8 02:02:49 Homebase kernel: btrfs_ioctl_fitrim.isra.7+0xfe/0x135
    Oct  8 02:02:49 Homebase kernel: btrfs_ioctl+0x4f6/0x28ad
    Oct  8 02:02:49 Homebase kernel: ? queue_var_show+0x12/0x15
    Oct  8 02:02:49 Homebase kernel: ? _copy_to_user+0x22/0x28
    Oct  8 02:02:49 Homebase kernel: ? cp_new_stat+0x14b/0x17a
    Oct  8 02:02:49 Homebase kernel: ? vfs_ioctl+0x19/0x26
    Oct  8 02:02:49 Homebase kernel: vfs_ioctl+0x19/0x26
    Oct  8 02:02:49 Homebase kernel: do_vfs_ioctl+0x526/0x54e
    Oct  8 02:02:49 Homebase kernel: ? __se_sys_newfstat+0x3c/0x5f
    Oct  8 02:02:49 Homebase kernel: ksys_ioctl+0x39/0x58
    Oct  8 02:02:49 Homebase kernel: __x64_sys_ioctl+0x11/0x14
    Oct  8 02:02:49 Homebase kernel: do_syscall_64+0x57/0xf2
    Oct  8 02:02:49 Homebase kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Oct  8 02:02:49 Homebase kernel: RIP: 0033:0x14b7fa4de397
    Oct  8 02:02:49 Homebase kernel: Code: 00 00 90 48 8b 05 f9 2a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 2a 0d 00 f7 d8 64 89 01 48
    Oct  8 02:02:49 Homebase kernel: RSP: 002b:00007ffc52c9f358 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    Oct  8 02:02:49 Homebase kernel: RAX: ffffffffffffffda RBX: 00007ffc52c9f4b0 RCX: 000014b7fa4de397
    Oct  8 02:02:49 Homebase kernel: RDX: 00007ffc52c9f360 RSI: 00000000c0185879 RDI: 0000000000000003
    Oct  8 02:02:49 Homebase kernel: RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000415fd0
    Oct  8 02:02:49 Homebase kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000415740
    Oct  8 02:02:49 Homebase kernel: R13: 00000000004156c0 R14: 0000000000415740 R15: 000014b7fa3ac6b0
    Oct  8 02:02:49 Homebase kernel: Modules linked in: veth xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap macvlan xt_nat ipt_MASQUERADE iptable_nat nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs dm_crypt algif_skcipher af_alg dm_mod dax md_mod bonding edac_mce_amd kvm_amd nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) drm_kms_helper btusb btrtl btbcm drm kvm btintel igb bluetooth agpgart syscopyarea sysfillrect crct10dif_pclmul sysimgblt fb_sys_fops crc32_pclmul crc32c_intel ghash_clmulni_intel i2c_piix4 i2c_algo_bit pcbc i2c_core aesni_intel aes_x86_64 crypto_simd wmi_bmof mxm_wmi ahci ecdh_generic cryptd ccp libahci glue_helper wmi button pcc_cpufreq acpi_cpufreq
    Oct  8 02:02:49 Homebase kernel: CR2: 0000000000000080
    Oct  8 02:02:49 Homebase kernel: ---[ end trace 9bdd9e618dc0d9c2 ]---
    Oct  8 02:02:49 Homebase kernel: RIP: 0010:btrfs_trim_fs+0x166/0x369
    Oct  8 02:02:49 Homebase kernel: Code: 00 00 48 c7 44 24 38 00 00 00 00 49 8b 45 10 48 c7 44 24 40 00 00 00 00 48 c7 44 24 30 00 00 00 00 48 89 44 24 20 48 8b 43 68 <48> 8b 80 80 00 00 00 48 8b 80 f8 03 00 00 48 8b 80 a8 01 00 00 0f
    Oct  8 02:02:49 Homebase kernel: RSP: 0018:ffffc9001294fc90 EFLAGS: 00010297
    Oct  8 02:02:49 Homebase kernel: RAX: 0000000000000000 RBX: ffff888f5db68200 RCX: ffff888fbf604878
    Oct  8 02:02:49 Homebase kernel: RDX: ffff888cac98de80 RSI: ffff888f5d718c00 RDI: ffff888fbf604858
    Oct  8 02:02:49 Homebase kernel: RBP: 0000000000000000 R08: ffff888f5911fa70 R09: ffff888f5911fa68
    Oct  8 02:02:49 Homebase kernel: R10: ffffea0022918ec0 R11: ffff888ffe9e0b80 R12: ffff888fbfafe000
    Oct  8 02:02:49 Homebase kernel: R13: ffffc9001294fd20 R14: 0000000000000000 R15: 0000000000000000
    Oct  8 02:02:49 Homebase kernel: FS:  000014b7fa3ac780(0000) GS:ffff888ffe9c0000(0000) knlGS:0000000000000000
    Oct  8 02:02:49 Homebase kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Oct  8 02:02:49 Homebase kernel: CR2: 0000000000000080 CR3: 00000001a6aa2000 CR4: 0000000000340ee0
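    For the corruption itself, a read-only check is the safe first step. The device name comes from the log above; the filesystem has to be unmounted (array stopped) while it runs, and don't jump to --repair without a backup:

    btrfs check --readonly /dev/sdj1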


  7. I was contemplating the same thing a while back.
    A lot of it will depend on what kind of drives you are running.

    WD Red NAS drives pull less than 2A peak:

    https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/internal-drives/wd-red-hdd/data-sheet-western-digital-wd-red-hdd-2879-800002.pdf

    While older WD Blue drives could use up to 3A, modern ones are also sub 2A:
    https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/internal-drives/wd-blue-hdd/data-sheet-wd-blue-pc-hard-drives-2879-771436.pdf

    It would seem to me that, even accounting for motherboard/CPU/etc., you should easily be able to handle 30+ drives before you need to think about expanding your PSU (a Seasonic Titan Prime 1000 should have 83A on the 12V rail). If you're running 7200 RPM drives or something this may be closer to 20, but 10 drives should be no problem at all.
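    Back-of-the-envelope, using the datasheet peaks above: 83A on the 12V rail divided by ~2A peak per drive is about 41 drives with nothing else attached. Set aside 20-25A for the motherboard, CPU, and fans and you land right around 30. And since the ~2A figure is the spin-up peak rather than steady-state draw, staggered spin-up buys even more headroom.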

    I am running 16 drives, I think, on a Seasonic 860 (71A) with no issues that I've seen. 

    And, unless you are trying to absolutely max out your storage capacity, at some point you'll likely be replacing smaller drives with larger ones as you expand. Doesn't make as much sense to add four 3TB drives when two 6TB or one 12TB will do and likely be the same price or cheaper (parity, etc, I know....)

  8. I ended up cheaping out a bit.
    Here's what I went with:

    CPU: Intel i3-9100

    Motherboard: Gigabyte C246-WU4

    RAM: Crucial CT16G4WFD8266 16GBx4


    Turns out Xeons are really expensive. Still cost around $800 for this setup, fully half of which was for the RAM.
    I decided the PCIe lanes weren't as big of a deal as I was making them out to be. I settled for x8 + x8 slots plus 8 native SATA ports, which will be enough to cover all 24 drives with two M1015s (or equivalent) and the onboard ports.

    I'm not sure going from 16GB RAM to 64GB will really do anything for me, but in the back of my mind I'm hoping it will help with Folder Caching or Crashplan or something.


    Pros:

    Parity check single-core CPU usage is closer to 30% vs the 100% that was occurring before

    Parity check speeds up to 115MB/sec vs 70MB/sec (so far, may get faster after the 2TB disks)

    I was able to set my tunables back to the default, rather than the (very low) values I was using to prevent CPU stalls.

    Cons:

    The motherboard has DisplayPort connectors on it. I'm old and grumpy and don't own a single DisplayPort monitor or adapter; I didn't even realize there was another option besides VGA/DVI/HDMI. Had to pull out an ancient, broken video card as I really have no spares. :(
    Unraid won't boot unless I boot into GUI mode. I don't really care that much, so I'll leave it until I get the display situation worked out. Worst case, if I can't fix it, I'll just set GUI mode as the default so I don't have to hook up a keyboard/monitor to reboot it. Could be related to the video card or something.

    Edit: Getting the onboard video card working instead of the old, broken one I was using seems to let it boot up properly now. yay.

  9. On 7/29/2019 at 1:23 PM, johnnie.black said:

    There can be, depending on the number of disks connected, but there isn't a performance hit just because you're using one. You can easily calculate the max available bandwidth; it just depends on whether it's SAS/SAS2/SAS3 and linked with single or dual link to the HBA.


    Yes, that is what I said: when running "many drives". (I suppose that could be misinterpreted, but I meant "when there is a numerically large number of drives".)

    I was looking at your testing thread earlier.


    I have 14 drives right now and it takes forever to get through (60MB/s last I checked), with 3 on the M1015 and 11 on the expander.

    I had assumed it was due to my CPU, since that was getting pegged at 100% usage and stalling, but...

    Now I'm wondering if I did something silly like plugging my M1015 into a x4 slot or something, because I really should be getting better speeds than this.
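    One quick way to rule out the slot question: lspci reports both the slot's maximum and the negotiated link width for the HBA. The bus address below is just an example; grab the real one from the first command:

    lspci | grep -i sas                                # find the HBA's bus address
    lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'     # x8 in LnkCap but x4 in LnkSta means the slot is the bottleneck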

    More things to think about...

  10. That's pretty cool, but at the end the only thing going through my head was:
    "Man, I really hope he is also going to set up an offsite backup"

    I'm also a bit curious how much data he actually ended up with.

    I'd be pretty surprised if those drives in the bins were actually all full.

  11. It's been a while but my thinking goes like this:

    Last I checked there's a very real performance hit when running many drives off an expander (rough numbers below).

    Plus I feel like M1015s are probably easier/cheaper to get than RES2SV240s.
    I'm not planning to replace it right now, since the expander is already paid for, but I'd like the option to run 3x M1015s for full bandwidth should my expander ever fail (running 2x M1015s with the remaining drives off the motherboard is also an option).
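    Rough numbers, assuming a single SAS2 link between the HBA and expander: 4 lanes x 6Gb/s = 24Gb/s raw, or roughly 2.2GB/s usable after encoding overhead. Spread across 20 spinning drives during a parity check, that's only ~110MB/s per drive, below what modern drives sustain on their outer tracks, so the link rather than the drives becomes the ceiling. A dual link doubles it.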

    The motherboard/CPU I've been using (just a low-end consumer board and Phenom II CPU I picked up from Microcenter) was what I had in my Unraid test build when I was seeing if it would work for what I wanted. I migrated it into the 4224 so I wouldn't have to spend money on new hardware. I've been meaning to upgrade to something a bit better for years, and the problems I've been having with parity checks mean now is probably a good time. Even though I have no plans to change the M1015 right now, it's always in the back of my mind that it, or the expander, could fail in the future.

    The Norco should fit a full ATX board, so no worries there.

    I'm a bit curious if one can actually run all the PCIe slots on an X11SCA-F with an Intel Xeon E. I thought those topped out at 16 lanes, but the board claims to support 8/8/4/1. I'm guessing the 4/1 must run through the PCH or something.

  12. Using this post to see if anyone has recommendations and to "think out loud" while I work on this.

    Currently running an AMD Phenom II X4 processor, which chokes and dies (CPU stalls) when running dual parity checks, likely due to its lack of AVX instructions. This locks up the UI/terminal when it happens. I've managed to reduce the occurrences by lowering md_stripes or something, but parity checks are very slow and will never be optimal with this setup.

    I figure this is as good a reason as any to upgrade to some real hardware.

    I run very few dockers and have no real plans to run VMs on the server (and if I do it's very unlikely they will need video card passthrough or anything as I have a dedicated gaming PC), but there's no kill like overkill so here is what I'm thinking:


    I'm not sure I have any real need for multi-processor support.

    ECC RAM. I'd prefer a configuration that allows me to start with 32GB and move to 64GB (or more) later.

    IPMI. Being able to manage the computer remotely would be great.

    Built-in graphics or the ability to run headless. Right now I have a...Diamond Stealth video card from the 90s crammed into my Unraid box because the mobo won't boot without a graphics card. I have no need for a monitor on the system if there is IPMI support, though.

    No BIOS update needed to boot the recommended processor. I have no spare processors sitting around, so the motherboard must either ship with a BIOS that boots it or be one of those magic boards that can somehow update without a processor.

    SATA ports don't matter. Running M1015+RES2SV240. The only change I'd make in the future is to move to multiple M1015s (or whatever is the new hotness) rather than using the expander. Currently running 12+2 drives in a Norco 4224.

    M.2 and internal USB would be nice.

    Intel gigabit ethernet LAN port.

    I remember back in the day some Supermicro boards (X9SCM?) had issues running multiple M1015s and wouldn't POST. Is that still a thing?

    Would the current version of that Supermicro board be fine? The X11SCM-F with probably a Xeon E-2174G or E-2176G? Or is there another direction I should be considering?

    This isn't really anything fancy or powerful, but trying to work out what I need is tough because there are so many options in server hardware out there that I'm a bit overwhelmed, to be honest. -_-

    EDIT: It appears the X11SCM-F only has one PCIe slot, so that's probably not the way to go. Hmmm.

  13. I haven't tested this or anything, as I don't have this issue.

    But, given that the error seems to be generated by the Tower (host) kernel, it's likely you'd need to set it for Unraid itself, not for each VM individually.


    What you'd need to do to test, now that I'm looking at it, is edit this file on your USB:

    /syslinux/syslinux.cfg

    and either create a new entry or add this to the append line of an existing one:

    clocksource=hpet


    So like

    label unRAID OS
      menu default
      kernel /bzimage
      append initrd=/bzroot


    would become

    label unRAID OS
      menu default
      kernel /bzimage
      append initrd=/bzroot clocksource=hpet


    It might work, do nothing, or fail to boot, but it's simple enough to undo either way.


    It's worth a try, at least.
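    If it does boot, it's easy to confirm which source actually took; these sysfs paths are standard Linux:

    cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource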


  14. Rather than replacing the motherboard/CPU, which seems like a rather drastic solution, is it possible to change the clock source in the go file itself?

    It looks like Linux should allow customization of which clock source the kernel uses. Having it start with hpet or jiffies instead of tsc might be an option?


    see https://www.kernel.org/doc/html/v4.10/admin-guide/kernel-parameters.html


    Is it possible to explicitly set clocksource=hpet for the people having this issue?
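    For what it's worth, the kernel also lets you switch the clock source at runtime through sysfs, so a single line in the go file might be enough to test it without touching syslinux.cfg (assuming hpet shows up in the available_clocksource list on the affected boards):

    # added to /boot/config/go
    echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource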