caplam

Everything posted by caplam

  1. It took quite a while as I wasn't able to find the key for the tray 😁 The rebuild is now running; hope it finishes fine. Thank you JorgeB. 👍 I have the former disk2 unplugged.
  2. Not sure I understand. My present situation is: disks 1 & 3 OK, disk4 disabled, disk2 rebuilding (paused at 2%), parity1 OK, parity2 failing with lots of errors. If I understand correctly, for reassigning the drives: I unassign parity2 (as it's failing, I suppose it's useless for the rebuild), then I unplug disk2 and replace it with a precleared one. Clear enough, although it seems mdcmd has no built-in help. At this stage I think I have only one parity disk in the array (parity2 has been unassigned), and for this step I have a new precleared disk as my disk2. So I suppose all these steps are for getting disk4 back into the array.
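For reference, a quick way I found to see what the md driver thinks of each slot, assuming the stock Unraid mdcmd is in the PATH (the exact variable names may differ between releases):

# dump the md driver state and keep only the array/slot status lines
mdcmd status | grep -E 'mdState|mdResync|rdevStatus|rdevName'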
  3. Today I think I made a big mistake. I was playing around with powertop and I think I did something I shouldn't have. Three minutes after playing with it, I had 2 disks with read errors that were disconnected from the array. Before that all was fine. A third disk had read errors but wasn't removed. I tried to stop the array without success, and I couldn't take a diagnostics file either. The server was unresponsive and I had to do a cold reboot. I started the rebuild procedure for one disk, but one of my parity drives now has read errors and the rebuild is slow (350 KB/s). I don't know what's next. I suspect the disk which is offline is actually good. Do you have any suggestions? godzilla-diagnostics-20201023-1530.zip
  4. OK, I just have to figure out how you export a panel to JSON. edit: you have a PM with the JSONs.
  5. The I/O graph is OK, but it shows performance and you can't really see the impact on SSD endurance. With my previous SSDs I didn't see that writes could be a problem until it was too late. If I remember correctly I had 700 TB written in 9 months.
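A minimal way to keep an eye on lifetime writes outside Grafana, assuming a SATA SSD that exposes a total-writes SMART attribute (the attribute name and unit vary by vendor; on many drives it is attribute 241 in LBAs, so multiply by the sector size, usually 512 bytes):

# lifetime host writes as reported by SMART (/dev/sdb is just an example device)
smartctl -A /dev/sdb | grep -iE 'written|host_writes'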
  6. I have this result using stacking: for now it suits me.
  7. I managed to get a result. It's a first shot but it seems to be almost what I'm looking for. On the screenshot you can see high writes on the 16th and 17th (I received my new SSDs on the 15th). Then I replaced the btrfs pool with 2 single-SSD xfs pools; you can see the result in the numbers. edit: to see more data on the same graph, I'd now like to display the same info for the 2 SSDs side by side. If I just display the same query for the other SSD, the two will cover each other. So I guess I have 2 options but don't know if they're possible: display the info in the same way but with 2 bars per day, one slightly shifted horizontally (I have no idea how to do that), or display the same query for the 2 SSDs with transparency so you can see them both.
  8. I imagine a bar graph: a graph for a week with a bar for each day, and a graph for a month with a bar for each day. Or it could be another type of visualisation. What I want is to track abnormal usage of the SSDs, especially high write rates like I had with my btrfs pool.
  9. I have my monitoring dashboard based on Gilbn's work. I'll probably replace it with UUD when I have time. Until then I'd like to add monitoring of the data written to my pools (as I had big trouble with excessive writes on the cache pool). I absolutely don't know how to write a query for that. I started with the diskio write_bytes field but I don't know how to use it to get the data written displayed in the same way as cost/kWh in the UPS dashboard: I mean a graph with a bar per day over a week showing the amount of data written each day, and the same over a month with the data written each week. Do you know how to do that? I'm lost with time intervals, group by, ... edit: this could also be a suggestion for an evolution; I think it would be a great idea for monitoring SSD health. It could also be done with the SMART NAND-writes attribute. edit2: I finally got it installed (only the dashboard) and modified it to use my databases, which are separated (docker, apc, unraid). @falconexe: do you have a 4K screen? 😁 I only have full HD screens and it's way too big, at least for the drive panels (smart, life, health). A quick suggestion: the threshold used for disk space is not really comfortable: the 49.9% is light yellow. With a darker yellow the text is much more visible. I don't know why, but I had to prefix all paths with /rootfs.
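A rough sketch of the kind of query I mean, assuming Telegraf's diskio measurement in an InfluxDB 1.x database named telegraf and a device called sdb (both placeholders). write_bytes is a cumulative counter, so you take the per-day difference:

# per-day bytes written for one device, computed from the cumulative write_bytes counter
influx -database 'telegraf' -execute "
SELECT non_negative_difference(max(\"write_bytes\")) AS bytes_per_day
FROM \"diskio\"
WHERE \"name\" = 'sdb' AND time > now() - 7d
GROUP BY time(1d)"

In a Grafana bar panel the same query should work with the dashboard time range instead of the hard-coded 7 days.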
  10. I had that problem too. I had 2 Western Digital Blue 500GB SSDs in a btrfs pool that died in 6 months. I replaced them with a single 860 EVO formatted in xfs (no pool possible with xfs in Unraid). I upgraded to 6.9 beta 30 and tried again with a btrfs pool of 2 brand-new WD Blue 500GB, and I had the same problem. So I'm back to xfs, except that in 6.9 you can create several pools. I now have 2 single-SSD xfs pools and no more problems. The downside is that you'd better have a strong backup, as the pools are unprotected. Now I only have to wait and see if and how zfs will be implemented.
  11. Not sure if it will help you, but I had a strange issue after upgrading to 6.9 beta 30 and changing my cache pool. I have a VM whose vdisk was set to qcow2 format. The vdisk file was renamed to vdisk1.img instead of vdisk1.qcow2. I think it might be due to the vmbackup plugin, but I couldn't reproduce it. Of course, with a vdisk named vdisk1.img and a qcow2 format the VM won't boot. So take a look at your vdisk name and at the format stated in the XML file of your VM, and make sure the format and file name match.
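A quick way to cross-check, assuming the VM is named Hermes and the vdisk lives under /mnt/user/domains/Hermes (both placeholders):

# what the file actually is, regardless of its extension
qemu-img info /mnt/user/domains/Hermes/vdisk1.img
# what libvirt thinks it is: look at <driver name='qemu' type='...'/> and <source file='...'/>
virsh dumpxml Hermes | grep -E "driver name|source file"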
  12. I started a bug report here: https://forums.unraid.net/bug-reports/prereleases/690-beta30-excessive-writes-on-ssd-pool-r1092/ In conclusion I would say that it concerns VM settings that changed for an unknown reason (perhaps a bug) and not a write problem, at least for this case (xfs-formatted pool with vdisks); from my understanding btrfs is still problematic.
  13. So my server is back online with 2 single-SSD pools formatted with xfs. I now use docker-xfs.img for the docker image. Based on the first hours of activity, I can say that I have no more excessive writes. I will stay away from btrfs in the future. I found one strange thing on my VMs explaining the excessive writes on the vm pool. For an unknown reason, in the VM settings the memory assignment had changed from 2GB to 1GB. Normally that would be OK, but yesterday at midnight it started to write at a 20 MB/s rate: the VM was swapping. Normally the cache on this VM is mounted on tmpfs. So I shut down the VM and assigned 4GB of memory: no more writes. I've never seen that before. So I checked the other VMs, and the other VM which was writing a lot had its memory down to 1GB (normally 2GB). Btrfs problems apart, I would say that was a bug in the VM settings, as with correct memory settings I have no more excessive writes.
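For what it's worth, a minimal check inside the guest to confirm it is swapping (standard Linux tools, nothing Unraid-specific):

# swap usage and free memory inside the guest
free -m
# the si/so columns show pages swapped in/out per second; sustained non-zero values mean the guest is swapping
vmstat 1 5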
  14. I transferred it with the cp command (docker and VM services stopped). I checked, and the NOCOW bit is not set on the directory or the img files. The share has "Enable COW" set to auto. But what you explain is for the initial transfer; how does it behave once the transfer is completed and the services restarted? I dropped the usage of docker.img; I'm now using the directory option with a dedicated share. I must admit I don't fully understand the COW things and left the option on auto in the share settings; until now I hadn't had to think about it. When I was using Proxmox my vdisks were qcow2 and I could take snapshots. For now I decided to use 2 single-SSD pools. Right now I have unassigned one SSD and restarted the array (a balance is running). I think the right path is then to convert to a single pool. After that I will create the second pool, assign the second SSD, format it to xfs and transfer all the data to it, then do the same thing for the first SSD (formatting it xfs) and transfer the data back as needed. I'll wait and see the future of zfs and Unraid (interesting for me as I have plenty of ECC RAM), but for sure I won't use Unraid with btrfs anymore.
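For reference, how I checked the NOCOW bit (btrfs only; the C flag only takes effect on empty files, or on a directory before files are created in it). The paths are just examples from my setup:

# a 'C' in the attribute list means NOCOW is set
lsattr -d /mnt/cache/domains
lsattr /mnt/cache/domains/Hermes/vdisk1.img
# to get NOCOW vdisks you would create a new directory with the flag and copy the files into it, e.g.:
# mkdir /mnt/cache/domains_nocow && chattr +C /mnt/cache/domains_nocow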
  15. If Limetech is offering zfs, do you know if this will be a complete zfs array replacing the entire array and cache pool, or just the ability to make a cache pool a zfs pool? If it's the former, we'll lose the ability to have different-size disks. For now I'm converting my cache pool to a single disk and then I'll make 2 single-SSD pools formatted with xfs. I'll use one as cache for the array and docker, and the other one for the domains share. edit: I remember having my Synology NAS formatted with btrfs and not having such issues.
  16. I fired up my Grafana docker and I can see that the average write rate on one of the SSDs is around 6.5 MB/s, so it will write more than 500 GB a day. edit: this means my SSDs will last about 13 months 🤬
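A back-of-the-envelope check of that number, assuming a roughly 200 TBW endurance rating for a 500 GB WD Blue (check the datasheet for the exact figure):

# 6.5 MB/s sustained is about 6.5 * 86400 / 1000 = 561 GB/day
# 200 TBW / 0.561 TB per day gives the days until the rated endurance is reached
echo "scale=0; 200000 / (6.5 * 86.4)" | bc   # prints 356, i.e. roughly a year, same ballpark as above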
  17. And what if we use qcow2 for the vdisks on ext4? That was the setup when I was using Proxmox, and I never had such a problem. At that time I had a small 256GB SSD which I used for 4 years, and it was still fine when I stopped my cluster.
  18. Wow, this study is astonishing. I will read it, but it talks about a 30x write amplification. I don't even understand why btrfs is still being used; the performance is ridiculous. I hope Unraid will offer an alternative.
  19. I read that. Does this mean there will be no solution to it? In your post and the following ones, loopback and overhead are discussed, but here I don't see excessive writes on the loop3 device, so I guess the writes are on the vdisks. 4 GB in one hour for a VM that does almost nothing; I find that huge, considering that inside the VM iotop shows 200 MB of disk activity. This VM is an IPBX with almost no traffic; it's my home line with no calls today. The other guest is a home automation system (Jeedom) based on PHP and MySQL. Inside the VMs the activity is 10 times less than what we see outside, so I find the amplification massive. And it doesn't affect all the VMs; I have others with no problem. I have had these 2 VMs for a long time (I rebuild them from time to time), and before Unraid they were running on Proxmox on an LVM pool (formatted in ext4 if I remember correctly). If I can't have VMs running on a pool, I'll probably consider another system. Perhaps the choice of raw for the vdisks is not the best on a btrfs pool. I must admit I don't really understand all the implications. I don't get the point of allowing only btrfs for the pool, and moreover we don't even have a GUI for managing snapshots.
  20. I have a problem that I've not seen so far in the forum. In 6.8.3 I had an excessive write problem; it's a known issue related to the partition layout and btrfs. I upgraded to beta30 directly from 6.8.3. Yesterday I received new SSDs, so I set up a new pool (2 SSDs) and made it my cache pool. It stores appdata, domains and system. I continued to have high write rates on the loop2 device. https://forums.unraid.net/topic/97902-getting-rid-of-high-ssd-write-rates/ https://forums.unraid.net/bug-reports/prereleases/unraid-os-version-690-beta30-available-r1076/?do=findComment&comment=11066 So I stopped the docker service and restarted it without the loop device, using the directory option. Now /var/lib/docker is mapped to a dedicated share, but I continue to have excessive writes. Here is the result of iotop -aoP on the Unraid server after 1 hour:

Total DISK READ :       0.00 B/s | Total DISK WRITE :       4.19 M/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       7.88 M/s
  PID  PRIO  USER    DISK READ  DISK WRITE>  SWAPIN      IO     COMMAND
28332  be/4  root     51.46 M     3.99 G     0.00 %   0.13 %   qemu-system-x86_64 -name guest=wazo,debu~ny,resourcecontrol=deny -msg timestamp=on
27768  be/4  root     80.00 K     2.96 G     0.00 %   0.07 %   qemu-system-x86_64 -name guest=Hermes,de~ny,resourcecontrol=deny -msg timestamp=on
24611  be/4  root      2.13 M   848.54 M     0.00 %   0.06 %   shfs /mnt/user -disks 31 -o noatime,allow_other -o remember=330
19819  be/4  root    100.00 K   507.45 M     0.00 %   0.03 %   qemu-system-x86_64 -name guest=PiHole,de~ny,resourcecontrol=deny -msg timestamp=on
21224  be/4  root      0.00 B   218.03 M     0.00 %   0.01 %   [kworker/u65:10-btrfs-endio-write]
28870  be/4  root      0.00 B   169.11 M     0.00 %   0.01 %   qemu-system-x86_64 -name guest=Apollon,d~ny,resourcecontrol=deny -msg timestamp=on
21422  be/4  root      0.00 B   159.13 M     0.00 %   0.00 %   [kworker/u65:1-btrfs-endio-write]
27287  be/4  root      0.00 B   139.56 M     0.00 %   1.22 %   dockerd -p /var/run/dockerd.pid --log-op~ --log-level=error --storage-driver=btrfs
15717  be/4  root      0.00 B   132.48 M     0.00 %   0.00 %   [kworker/u65:2-btrfs-endio-write]
25364  be/4  root      0.00 B   130.80 M     0.00 %   0.01 %   [kworker/u65:7-events_unbound]
10515  be/4  root      0.00 B   126.08 M     0.00 %   0.00 %   [kworker/u65:9-btrfs-worker]
10708  be/4  root      0.00 B    97.09 M     0.00 %   0.00 %   [kworker/u65:4-btrfs-endio-write]
10514  be/4  root      0.00 B    94.36 M     0.00 %   0.00 %   [kworker/u65:0-btrfs-endio-write]
26862  be/4  root      0.00 B    68.48 M     0.00 %   0.00 %   [kworker/u65:3-btrfs-endio-write]
22073  be/4  root      0.00 B    55.11 M     0.00 %   0.00 %   [kworker/u66:7-btrfs-endio-write]
13555  be/4  root      0.00 B    52.02 M     0.00 %   0.00 %   [kworker/u66:0-btrfs-endio-write]
13144  be/4  root      8.00 K    51.37 M     0.00 %   0.00 %   [kworker/u66:14-btrfs-endio-write]
10269  be/4  root      0.00 B    50.30 M     0.00 %   0.00 %   [kworker/u66:2-btrfs-endio-write]
25365  be/4  root      0.00 B    49.25 M     0.00 %   0.00 %   [kworker/u66:5-btrfs-endio-write]
16626  be/4  root      0.00 B    48.81 M     0.00 %   0.00 %   [kworker/u66:4-btrfs-endio-write]
 3032  be/4  root      0.00 B    41.62 M     0.00 %   0.00 %   [kworker/u66:3-btrfs-endio-write]
10709  be/4  root      0.00 B    40.86 M     0.00 %   0.00 %   [kworker/u65:11-btrfs-endio-write]
10710  be/4  root      0.00 B    37.89 M     0.00 %   0.00 %   [kworker/u65:12-btrfs-endio-write]
 8224  be/4  root      0.00 B    30.77 M     0.00 %   0.00 %   [kworker/u66:6-btrfs-endio-write]
 2808  be/4  root      0.00 B    27.78 M     0.00 %   0.00 %   [kworker/u66:1-btrfs-endio-write]
 8142  be/4  root      0.00 B    10.25 M     0.00 %   0.01 %   [kworker/u64:1-bond0]
 3432  be/4  103       0.00 B     7.12 M     0.00 %   0.00 %   postgres: 10/main: stats collector process
 8848  be/4  nobody    8.00 K     2.38 M     0.00 %  99.99 %   mono --debug Sonarr.exe -nobrowser -data=/config
26116  be/4  nobody   17.23 M     2.20 M     0.00 %   0.01 %   mono --debug Radarr.exe -nobrowser -data=/config

The first 2 lines are 2 VMs. I can't post the output of iotop run inside those guests, as it was done in an SSH session in mRemoteNG (no copy available), but for the host Hermes the amount of data written was around 10 times less, and 20 times less for the host wazo. It doesn't involve the loop3 device, and I don't know where it writes. Each guest has a single vdisk in raw format stored in the domains share using the virtio driver. I don't really know where and how to investigate.
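One way I'm considering to narrow this down, using libvirt's standard tools (wazo is the guest name; the target device name comes from the guest XML, typically vda with virtio):

# list the vdisk target names for the guest
virsh domblklist wazo
# cumulative bytes written to that vdisk as counted on the host side
virsh domblkstat wazo vda --human
# compare that over the same interval with what the guest itself reports (iotop or /proc/diskstats inside the VM)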
  21. I have found a strange thing regarding SSD writes; it's related to VMs. I'll post in bug reports.
  22. I thought setting up a cache pool with 1MiB-aligned partitions would be enough, but that's not the case. I saw your post in bug reports, and right now I've started the docker service with the directory option (with a dedicated share). For now I'm redownloading the docker images, but my DSL connection is so slow that I think it's a 24 to 48 hour process.
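For anyone wanting to check the alignment, a quick sketch (sdb is a placeholder; with the new 6.9 layout the first partition should start at sector 2048, i.e. 1MiB with 512-byte sectors):

# print the partition table in sectors and look at the Start column
fdisk -l /dev/sdb
# or
parted /dev/sdb unit s print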
  23. I've checked my containers for the "healthy" mention; none of them have it. I hadn't read anything about the format of the docker image; I thought btrfs was mandatory. I assume it has nothing to do with the pool format, which is also btrfs. You can store the image directly on the file system? Does this mean you don't have a loop2 device anymore? I didn't see such options. edit: I missed that option (docker directory), as it was explained in the beta 25 release post and I only installed the betas starting with 30. I think I will give the directory option a try. I have one question though: in the release post Squid writes that if you choose the directory option it's better to make a dedicated share, but in his example (and the default path when choosing this option) the directory is /mnt/cache/system/docker/docker, which is in the system share. I guess if I create a docker share the path is /mnt/cache/docker and that's OK?
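A minimal sketch for checking what backs the docker service after switching (standard docker/losetup commands, nothing Unraid-specific):

# with the image option you should see docker.img attached to a loop device; with the directory option it disappears
losetup -a
# storage driver in use (it follows the filesystem the directory sits on)
docker info --format '{{.Driver}}'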
  24. Hello, in 6.8.3 I had a hard time with excessive writes on my SSDs. I'm now on 6.9 beta30 and it seems I have the same problems. Like in 6.8.3, the high writes go through the loop2 device, which is /var/lib/docker. I like Unraid but I'm not going to change my SSDs every six months, so I'm considering two options: drop Unraid, which I don't want to do (now that I'm more familiar with it, it would be a waste), or drop the usage of the built-in docker engine. For option 1 I guess FreeNAS would be viable, but besides the fact of learning a new OS it has its downsides (choice of hard drives, ...). For option 2 I'm considering setting up an Ubuntu VM with the docker engine and Portainer for managing containers, networks, .... How can I set up such a VM? Which network adapter? Which size of vdisk? How do I use the existing appdata from the containers? How do I pass the GPU through to the Plex and tdarr dockers inside the VM? Which data from the VM should I back up, and how? Has anyone followed this path?
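In case it helps the discussion, a minimal sketch of what I have in mind inside the Ubuntu guest, assuming the existing appdata share is mounted in the VM at /mnt/appdata (over NFS or SMB; paths and names are only examples):

# install docker with the convenience script
curl -fsSL https://get.docker.com | sh
# run Portainer to manage containers through a web UI on port 9000
docker volume create portainer_data
docker run -d --name portainer --restart=always \
  -p 9000:9000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  portainer/portainer-ce
# existing containers would then be recreated with their config dirs bind-mounted from the share, e.g.:
# docker run -d --name plex -v /mnt/appdata/plex:/config ... plexinc/pms-docker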