warwolf7 Posted January 24 (edited)

Hi, I have been trying to troubleshoot this by reading other threads, with no success.

Problem:
Unraid version 6.12.1. A few weeks ago, the HDDs were being spun down after 2 hours of inactivity, as configured. I discovered yesterday that they don't spin down anymore — the disks are always spinning. Actually, when they are spun down, they are immediately spun back up:

Jan 24 05:36:03 ChapelleSixtine emhttpd: spinning down /dev/sdd
Jan 24 05:37:08 ChapelleSixtine emhttpd: read SMART /dev/sdd
Jan 24 05:37:09 ChapelleSixtine emhttpd: spinning down /dev/sde
Jan 24 05:37:09 ChapelleSixtine emhttpd: spinning down /dev/sdc
Jan 24 05:38:06 ChapelleSixtine emhttpd: read SMART /dev/sde
Jan 24 05:38:13 ChapelleSixtine emhttpd: read SMART /dev/sdc

What changed:
- I replaced the RAM, because of file corruption on one of the btrfs cache drives. Memtest86+ returned a few errors per hour of testing; I tested the new RAM for 24+ hours and got no errors.
- I updated all the plugins.

Here is what I tried. At every step, I manually spun down the HDDs, and they were spun back up shortly after (<1 min):
- Made sure turbo write's "Tunable (md_write_method)" is set to auto (I don't have the plugin)
- HDD spindown delay is 2 hours
- Spinup groups is disabled
- Made sure the shares were properly configured on their proper drives
- Stopped every container, then stopped Docker
- Stopped the array and restarted with "Disk > enable auto start: off", then tried maintenance mode
- Ran inotifywait -mr /mnt/disk2 and got a bunch of:

/mnt/disk2/ OPEN,ISDIR
/mnt/disk2/ ACCESS,ISDIR
/mnt/disk2/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/disk2/ OPEN,ISDIR
/mnt/disk2/ ACCESS,ISDIR
/mnt/disk2/ CLOSE_NOWRITE,CLOSE,ISDIR
/mnt/disk2/ OPEN,ISDIR
/mnt/disk2/ ACCESS,ISDIR
/mnt/disk2/ CLOSE_NOWRITE,CLOSE,ISDIR

Then I found that something was accessing the recycle bin folders.
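One way to cut through the directory-scan noise above is to filter the ISDIR-only events out of the inotifywait stream, so only real file accesses remain. A minimal sketch — the sample lines (including the Media/movie.mkv path) are made up stand-ins for live inotifywait output:

```shell
# Mocked inotifywait output; the Media/movie.mkv line is a hypothetical
# real-file event, the rest is the ISDIR noise seen above.
cat <<'EOF' > /tmp/inotify_sample.txt
/mnt/disk2/ OPEN,ISDIR
/mnt/disk2/ ACCESS,ISDIR
/mnt/disk2/Media/ OPEN movie.mkv
/mnt/disk2/ CLOSE_NOWRITE,CLOSE,ISDIR
EOF

# Keep only events on actual files, dropping directory-scan entries.
grep -v 'ISDIR' /tmp/inotify_sample.txt
```

In live use the same filter would be piped directly: inotifywait -mr /mnt/disk2 | grep -v 'ISDIR'.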
I removed the Recycle Bin plugin and restarted the server. I confirmed that my user scripts were not running. I looked at open files and file activity and couldn't find anything that would use those disks; inotifywait returned only the "/mnt/disk2/ OPEN,ISDIR / ACCESS,ISDIR / CLOSE_NOWRITE,CLOSE,ISDIR" entries. I tried inotifywait -mr -e access -e modify /mnt/disk2, but nothing other than "/mnt/disk2/ ACCESS,ISDIR" was seen. I disabled the File Activity plugin. I tried htop, but I couldn't see what was triggering the spin-up. I watched the reads and writes on the Main page: they were at 0B/s, then I saw them jump to some numbers — the same numbers for every disk — for a very short time (<1 sec; I'm not sure I was looking directly at it when it happened).

Every time I spun down the drives, they were spun back up immediately after, sequentially, for every step performed above:

Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdm
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdj
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdk
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdh
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdg
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdd
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sde
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdb
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdf
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdc
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdn
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/nvme1n1
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/nvme0n1
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdo
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdl
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdi
Jan 23 23:13:19 ChapelleSixtine emhttpd: spinning down /dev/sdp
Jan 23 23:13:20 ChapelleSixtine emhttpd: sdspin /dev/nvme1n1 down: 25
Jan 23 23:13:20 ChapelleSixtine emhttpd: sdspin /dev/nvme0n1 down: 25
Jan 23 23:13:37 ChapelleSixtine emhttpd: read SMART /dev/sdf
Jan 23 23:14:02 ChapelleSixtine emhttpd: read SMART /dev/sdo
Jan 23 23:14:26 ChapelleSixtine emhttpd: read SMART /dev/sdd
Jan 23 23:14:27 ChapelleSixtine emhttpd: read SMART /dev/sdk
Jan 23 23:14:47 ChapelleSixtine emhttpd: read SMART /dev/sdm
Jan 23 23:14:54 ChapelleSixtine emhttpd: read SMART /dev/sdb
Jan 23 23:15:01 ChapelleSixtine emhttpd: read SMART /dev/sdi
Jan 23 23:15:02 ChapelleSixtine emhttpd: read SMART /dev/sdj
Jan 23 23:15:24 ChapelleSixtine emhttpd: read SMART /dev/sdg
Jan 23 23:15:28 ChapelleSixtine emhttpd: read SMART /dev/sde
Jan 23 23:15:40 ChapelleSixtine emhttpd: read SMART /dev/sdn
Jan 23 23:15:46 ChapelleSixtine emhttpd: read SMART /dev/sdc
Jan 23 23:15:58 ChapelleSixtine emhttpd: read SMART /dev/sdl
Jan 23 23:16:09 ChapelleSixtine emhttpd: read SMART /dev/sdh
Jan 23 23:16:10 ChapelleSixtine emhttpd: read SMART /dev/sdp

Would updating to 6.12.6 solve this? I read in another thread that someone on 6.12.1 had to downgrade to 6.11.5 to get this fixed.

Any help would be greatly appreciated. I have no idea how to find out what is spinning all my drives back up.

chapellesixtine-diagnostics-20240124-0918.zip

Edited January 24 by warwolf7: I posted an incomplete post. Now it's complete.
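For anyone retracing this, the gap between "spinning down" and the next "read SMART" for each device can be measured straight from the syslog with a small awk script. A rough sketch, fed with a few of the sample lines from above (it assumes the stock "Mon DD HH:MM:SS host emhttpd: <msg> /dev/sdX" layout and doesn't handle midnight rollover):

```shell
# A few syslog lines standing in for /var/log/syslog content.
cat <<'EOF' > /tmp/syslog_sample.txt
Jan 24 05:36:03 ChapelleSixtine emhttpd: spinning down /dev/sdd
Jan 24 05:37:08 ChapelleSixtine emhttpd: read SMART /dev/sdd
Jan 24 05:37:09 ChapelleSixtine emhttpd: spinning down /dev/sde
Jan 24 05:38:06 ChapelleSixtine emhttpd: read SMART /dev/sde
EOF

# Pair each "spinning down" with the next "read SMART" for the same
# device and print how long the disk actually stayed down.
awk '
function secs(t,  a) { split(t, a, ":"); return a[1]*3600 + a[2]*60 + a[3] }
/spinning down/ { down[$NF] = secs($3) }
/read SMART/ && ($NF in down) {
    printf "%s back up after %d s\n", $NF, secs($3) - down[$NF]
    delete down[$NF]
}' /tmp/syslog_sample.txt
```

On the sample lines this prints "/dev/sdd back up after 65 s" and "/dev/sde back up after 57 s"; anything consistently under a minute or two points at a periodic task rather than random access.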
warwolf7 Posted January 24 (Author, edited)

Post 1 went out incomplete — it posted against my will. Please give me 10 minutes to finish it. Post 1 is now complete.

Edited January 24 by warwolf7
warwolf7 Posted February 4 (Author)

Hi, if anyone has any guidance I would really appreciate it.
warwolf7 Posted February 13 (Author, edited)

At least some ideas on what to look for and where, please.

Edit: I updated to 6.12.6 a couple of days ago but this hasn't changed anything. Disks spin right back up after being spun down by the spindown delay. Here is the current log:

Feb 12 17:12:08 ChapelleSixtine emhttpd: read SMART /dev/sdb
Feb 12 17:18:03 ChapelleSixtine emhttpd: spinning down /dev/sdk
Feb 12 17:18:13 ChapelleSixtine emhttpd: spinning down /dev/sdg
Feb 12 17:18:25 ChapelleSixtine emhttpd: spinning down /dev/sdh
Feb 12 17:19:12 ChapelleSixtine emhttpd: read SMART /dev/sdk
Feb 12 17:19:23 ChapelleSixtine emhttpd: read SMART /dev/sdg
Feb 12 17:19:35 ChapelleSixtine emhttpd: read SMART /dev/sdh
Feb 12 17:37:03 ChapelleSixtine emhttpd: spinning down /dev/sdi
Feb 12 17:38:08 ChapelleSixtine emhttpd: read SMART /dev/sdi
Feb 12 17:39:02 ChapelleSixtine emhttpd: spinning down /dev/sdd
Feb 12 17:39:10 ChapelleSixtine emhttpd: spinning down /dev/sdc
Feb 12 17:40:07 ChapelleSixtine emhttpd: read SMART /dev/sdd
Feb 12 17:40:14 ChapelleSixtine emhttpd: read SMART /dev/sdc
Feb 12 18:16:03 ChapelleSixtine emhttpd: spinning down /dev/sde
Feb 12 18:17:05 ChapelleSixtine emhttpd: read SMART /dev/sde
Feb 12 18:17:52 ChapelleSixtine emhttpd: spinning down /dev/sdj
Feb 12 18:18:11 ChapelleSixtine emhttpd: read SMART /dev/sdj
Feb 12 19:12:02 ChapelleSixtine emhttpd: spinning down /dev/sdb
Feb 12 19:13:08 ChapelleSixtine emhttpd: read SMART /dev/sdb
Feb 12 19:19:03 ChapelleSixtine emhttpd: spinning down /dev/sdk
Feb 12 19:19:14 ChapelleSixtine emhttpd: spinning down /dev/sdg
Feb 12 19:19:25 ChapelleSixtine emhttpd: spinning down /dev/sdh
Feb 12 19:20:11 ChapelleSixtine emhttpd: read SMART /dev/sdk
Feb 12 19:20:23 ChapelleSixtine emhttpd: read SMART /dev/sdg
Feb 12 19:20:34 ChapelleSixtine emhttpd: read SMART /dev/sdh
Feb 12 19:40:02 ChapelleSixtine emhttpd: spinning down /dev/sdd
Feb 12 19:40:09 ChapelleSixtine emhttpd: spinning down /dev/sdc
Feb 12 19:41:07 ChapelleSixtine emhttpd: read SMART /dev/sdd
Feb 12 19:41:15 ChapelleSixtine emhttpd: read SMART /dev/sdc
Feb 12 20:17:02 ChapelleSixtine emhttpd: spinning down /dev/sde
Feb 12 20:18:05 ChapelleSixtine emhttpd: read SMART /dev/sde
Feb 12 20:18:07 ChapelleSixtine emhttpd: spinning down /dev/sdj
Feb 12 20:19:13 ChapelleSixtine emhttpd: read SMART /dev/sdj

So today, I turned the spin-down delay to "never".

Edited February 13 by warwolf7: added information.
KingfisherUK Posted February 16

This is curious — I updated from 6.12.6 to 6.12.8 today and I'm now seeing this exact same issue: disks spin down and then immediately spin back up again, even if "forced" to spin down via the GUI. I've tried disabling Docker, no change. It looks like something triggers a SMART read immediately when a disk is spun down, which spins it up again.
foo_fighter Posted February 18

I'm seeing the same thing in 6.12.8 (also upgraded from 6.12.6): SMART reads on one ZFS drive every 20 or so minutes, which prevents S3 sleep.
itimpi Posted February 18

2 hours ago, foo_fighter said: I'm seeing the same thing in 6.12.8 (also upgraded from 6.12.6). SMART reads on one ZFS drive every 20 or so minutes which prevents S3 sleep.

If you are talking about the "read SMART" entries in the syslog, I do not think it is these spinning up the drives. They are issued by the system just after it detects that a drive which was spun down has been spun up because something accessed it. You need to identify what is accessing the drive to cause the spin-up. The File Activity plugin might give you a clue.
foo_fighter Posted February 18 (edited)

I see this endless cycle in syslog. But you're saying that the read SMART is caused by the spin-up, not the other way around?

Feb 18 06:23:32 Tower s3_sleep: All monitored HDDs are spun down
Feb 18 06:23:32 Tower s3_sleep: Extra delay period running: 18 minute(s)
Feb 18 06:24:32 Tower s3_sleep: All monitored HDDs are spun down
Feb 18 06:24:32 Tower s3_sleep: Extra delay period running: 17 minute(s)
Feb 18 06:25:32 Tower s3_sleep: All monitored HDDs are spun down
Feb 18 06:25:32 Tower s3_sleep: Extra delay period running: 16 minute(s)
Feb 18 06:25:34 Tower emhttpd: read SMART /dev/sdf
Feb 18 06:26:32 Tower s3_sleep: Disk activity on going: sdf
Feb 18 06:26:32 Tower s3_sleep: Disk activity detected. Reset timers.
Feb 18 06:27:33 Tower s3_sleep: Disk activity on going: sdf
Feb 18 06:27:33 Tower s3_sleep: Disk activity detected. Reset timers.
Feb 18 06:28:33 Tower s3_sleep: Disk activity on going: sdf
Feb 18 06:28:33 Tower s3_sleep: Disk activity detected. Reset timers.
Feb 18 06:29:33 Tower s3_sleep: All monitored HDDs are spun down
Feb 18 06:29:33 Tower s3_sleep: Extra delay period running: 25 minute(s)
Feb 18 06:30:14 Tower emhttpd: spinning down /dev/sde
Feb 18 06:30:33 Tower s3_sleep: All monitored HDDs are spun down
Feb 18 06:30:33 Tower s3_sleep: Extra delay period running: 24 minute(s)
Feb 18 06:31:33 Tower s3_sleep: All monitored HDDs are spun down
Feb 18 06:31:33 Tower s3_sleep: Extra delay period running: 23 minute(s)
Feb 18 06:31:41 Tower emhttpd: read SMART /dev/sde
Feb 18 06:32:34 Tower s3_sleep: Disk activity on going: sde
Feb 18 06:32:34 Tower s3_sleep: Disk activity detected. Reset timers.
Feb 18 06:32:34 Tower emhttpd: read SMART /dev/sdh
Feb 18 06:33:34 Tower s3_sleep: Disk activity on going: sdh

Edited February 18 by foo_fighter
warwolf7 Posted February 18 (Author, edited)

1 hour ago, foo_fighter said: I see this endless cycle in syslog. But you're saying that the read SMART is caused by the spin up, not the other way around?

That is what I read in all of the posts I found regarding HDDs spinning back up: the SMART read is triggered once the disk is spun up.

10 hours ago, itimpi said: If you are talking about the "read SMART" entries in the syslog I do not think it is these spinning up the drives. They are done by the system just after detecting a drive that was spun down has been spun up because something accessed the drive. You need to identify what is accessing the drive to cause the spinup. The File Activity plugin might give you a clue.

I did look at the File Activity plugin and couldn't find what was causing this. If someone is willing to go over each step to figure out what is going on, I'm willing to try anything. During the day only 25% of the disks should be spinning, but currently they are all spinning all the time. I plan on adding more disks, but I would like this issue resolved before doing so. That's a lot of wasted energy. It's not too bad in winter, but the extra heat from the PSU and HDDs in summer will add cooling cost. It all adds up.

Edited February 18 by warwolf7: spelling error.
warwolf7 Posted February 25 (Author, edited)

OK, I have some info to give. Not a solution yet, but maybe someone here can help me.

Some of my HDDs are connected directly to SATA ports on the server; most of the others go through an LSI card (Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2, rev 05). They all get woken up <1 min after being spun down, so I don't believe it is due to the card.

I disabled Docker, stopped the array, and changed one disk's spin-down delay to 15 min. This happens on every disk a minute or less after spin-down, so it doesn't matter which HDD does it; I therefore decided to spin a single disk up and down. I installed iotop through NerdTools and ran:

iotop -bktoqqq -d .1

TIME TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
15:51:01 14897 be/4 root 12159.54 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:51:01 14897 be/4 root 60745.88 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:51:01 14897 be/4 root 60019.35 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:52:02 17063 be/4 root 50308.71 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:52:02 17063 be/4 root 25501.22 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:53:01 19147 be/4 root 16931.36 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:53:01 19147 be/4 root 60612.63 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:53:01 19147 be/4 root 75370.34 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:54:01 21278 be/4 root 32661.45 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:54:02 21278 be/4 root 4982.77 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:54:02 21278 be/4 root 4704.45 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:54:02 21278 be/4 root 30341.35 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:54:02 21278 be/4 root 37.56 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:54:03 21278 be/4 root 50048.81 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:55:01 23468 be/4 root 4637.08 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:55:01 23468 be/4 root 20174.36 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:55:08 23468 be/4 root 37.27 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:55:08 23468 be/4 root 40318.78 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:55:08 23468 be/4 root 10475.77 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:55:08 23468 be/4 root 7213.60 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:55:08 23468 be/4 root 52775.84 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:55:16 14867 be/4 root 640.61 K/s 0.00 K/s ?unavailable? [kworker/u12:1-loop1]
15:55:16 24055 be/4 root 1017.45 K/s 0.00 K/s ?unavailable? smbd -D
...
16:10:01 24220 be/4 root 6987.66 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:10:01 24220 be/4 root 60275.00 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:10:01 24220 be/4 root 15163.80 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:10:01 24220 be/4 root 45080.63 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:11:02 26550 be/4 root 6994.43 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:11:02 26550 be/4 root 20261.11 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:11:08 26550 be/4 root 10327.58 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:11:08 26550 be/4 root 29950.20 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:11:08 26550 be/4 root 10215.82 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:11:09 26550 be/4 root 19700.42 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:11:09 26550 be/4 root 30249.68 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:12:01 28608 be/4 root 60618.31 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
16:12:01 28608 be/4 root 30564.69 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75

In the Unraid log I got:

Feb 25 15:54:03 ChapelleSixtine emhttpd: spinning down /dev/sdd
Feb 25 15:55:08 ChapelleSixtine emhttpd: read SMART /dev/sdd
Feb 25 16:10:03 ChapelleSixtine emhttpd: spinning down /dev/sdd
Feb 25 16:11:08 ChapelleSixtine emhttpd: read SMART /dev/sdd

Here is what I see: "btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75" is the only thing accessing a disk after a spin-down. The thing is, no drive in the array is btrfs — I have 2 cache NVMes that are btrfs. I watched the syslog and iotop populate in real time, simultaneously, while hearing the disk being accessed. Doesn't seem like a coincidence, as I heard and saw that exactly 3 times.

🤔 Hypothesis: the "32f96308-aa9a-4c86-b799-d222f4af4a75" is meant to be a disk, right? Well, I don't have any disk with that UUID. Could that cause the system to wake up all disks trying to find it? It wakes all disks whenever that trace shows up in the iotop output, if any disk is spun down.

I used "lsblk -o KNAME,TYPE,SIZE,MODEL,UUID", "blkid /dev/*", "blkid", "fdisk -l" and "lshw -class disk -class tape", and I can't find that UUID anywhere. Any ideas where that UUID could be from?

Edited February 25 by warwolf7
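One more place worth searching for a mystery UUID like this is the flash config tree itself, with a recursive grep — on the server that would be grep -r against /boot/config/. Sketched here against a mocked /tmp copy (the file name and contents are stand-ins shaped like an Unraid pool cfg):

```shell
# Mock a pool cfg on the flash drive containing the mystery UUID.
mkdir -p /tmp/config/pools
printf 'diskUUID="32f96308-aa9a-4c86-b799-d222f4af4a75"\ndiskFsType="btrfs"\n' \
    > /tmp/config/pools/vm-dbs-docker.cfg

# List every config file that mentions the UUID.
grep -rl "32f96308-aa9a-4c86-b799-d222f4af4a75" /tmp/config/
```

On the real box the equivalent would be: grep -rl "32f96308-aa9a-4c86-b799-d222f4af4a75" /boot/config/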
itimpi Posted February 25

If you have a docker.img file, that is also btrfs internally by default. I think the libvirt.img file is as well. Where do you have them located?

23 minutes ago, warwolf7 said: The "32f96308-aa9a-4c86-b799-d222f4af4a75" is meant to be a hdd right?

Not sure it is — I think it might be some sort of btrfs ID. @JorgeB is likely to know for certain.
warwolf7 Posted February 25 (Author)

First, thank you for looking at my problem — I really appreciate it a lot. Correct, the docker.img and libvirt.img are on a btrfs partition on an NVMe drive. Using blkid I get this (if it can help figure out the UUID question I have):

/dev/sda1: LABEL_FATBOOT="UNRAID" LABEL="UNRAID" UUID="2054-2B8B" BLOCK_SIZE="512" TYPE="vfat"
/dev/loop1: TYPE="squashfs"
/dev/sdf1: UUID="E04F-7A11" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="^FM-XM-^@M-mM-^AM-^[^AM-oM-2M-^PM-PM-\"" PARTUUID="d8dc08b4-476d-4b27-9fa5-c0c9afbdba3a"
/dev/sdf2: BLOCK_SIZE="512" UUID="01D6BAD60A80CFF0" TYPE="ntfs" PARTLABEL="M-aM-5M-^\M-iM-)M-&^E" PARTUUID="67dc4043-2e07-42f6-8950-7f35db9608ca"
/dev/nvme0n1p1: UUID="4f4d49b1-cf2c-4043-8ac8-5d5a2787ecd8" UUID_SUB="8a7e058f-a0ba-4872-a5ab-89447e85079b" BLOCK_SIZE="4096" TYPE="btrfs"
/dev/sdo1: LABEL="Chia_Plot6" UUID="b16cac7b-70f1-4351-8f98-9692e12ac770" BLOCK_SIZE="4096" TYPE="xfs" PARTUUID="8b453002-d47e-40e5-98cd-28c01b94d9b5"
/dev/sdd1: UUID="67ff4bbe-aa12-4dd4-969b-41066e725c21" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="400c85ba-3b66-4aa8-8210-2c0aed761d1b"
/dev/md2p1: UUID="84d9a213-f984-4041-9daf-969ab82fc1fb" BLOCK_SIZE="512" TYPE="xfs"
/dev/sdm: LABEL="Chia_Plot8" BLOCK_SIZE="512" UUID="205BF6D701F727BE" TYPE="ntfs"
/dev/sdb1: UUID="345078c8-eebd-49c2-b8dc-52c47395acc8" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="73055663-5848-4059-a745-f61416725a84"
/dev/md5p1: UUID="345078c8-eebd-49c2-b8dc-52c47395acc8" BLOCK_SIZE="512" TYPE="xfs"
/dev/sdk1: UUID="84d9a213-f984-4041-9daf-969ab82fc1fb" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="71ecb223-83ae-47f9-a4bb-d39af9bf0522"
/dev/sdi1: UUID="7345a548-0fd5-4a77-9da0-bd7bcf468d1e" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="1f4b9897-a42c-412d-9a22-0ea803beebb5"
/dev/md1p1: UUID="543c965e-17bd-4737-9c24-0714a6089066" BLOCK_SIZE="4096" TYPE="xfs"
/dev/md4p1: UUID="67ff4bbe-aa12-4dd4-969b-41066e725c21" BLOCK_SIZE="512" TYPE="xfs"
/dev/loop0: TYPE="squashfs"
/dev/sde1: UUID="b5d6290f-015b-4569-b7e7-d8d0a2b5e491" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="204051fd-ff72-4bcb-b32f-5d80a5a9b4d5"
/dev/sdn1: LABEL="Chia_Plot2" BLOCK_SIZE="512" UUID="514C8EE3245D66AC" TYPE="ntfs" PARTUUID="495769c3-478a-4607-b396-513074d28d37"
/dev/sdc1: UUID="c1c8dcff-7ed7-4019-89e8-8c0e2afc9c52" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="972bd0a5-87bc-4fac-80d3-01e86e2c8a1b"
/dev/sdl: LABEL="Chia_Plot7" BLOCK_SIZE="512" UUID="12EC49946D53211A" TYPE="ntfs"
/dev/nvme1n1p1: UUID="6790d0c5-bc2f-499f-a9cb-5b5bf6f03d07" UUID_SUB="425c9a10-1680-4bb4-9db6-09a7457e8dbc" BLOCK_SIZE="4096" TYPE="btrfs"
/dev/md28p1: UUID="b5d6290f-015b-4569-b7e7-d8d0a2b5e491" BLOCK_SIZE="512" TYPE="xfs"
/dev/sdj1: UUID="543c965e-17bd-4737-9c24-0714a6089066" BLOCK_SIZE="4096" TYPE="xfs" PARTUUID="8dd33739-1207-4e57-8804-bc2e1849b1e6"
/dev/md3p1: UUID="7345a548-0fd5-4a77-9da0-bd7bcf468d1e" BLOCK_SIZE="512" TYPE="xfs"
/dev/md6p1: UUID="c1c8dcff-7ed7-4019-89e8-8c0e2afc9c52" BLOCK_SIZE="512" TYPE="xfs"
/dev/sdp1: LABEL="Chia_Plot4" BLOCK_SIZE="512" UUID="56C15B304ACE8A26" TYPE="ntfs" PARTUUID="5fead960-cfb1-456e-8ec7-9d2cc6b2142b"
/dev/loop2: UUID="37054439-aa79-4407-98df-7c5b2c5851f4" UUID_SUB="2dc8080e-b2e6-4a03-be4f-b4fa82d40d27" BLOCK_SIZE="4096" TYPE="btrfs"
/dev/loop3: UUID="b4112bdf-b443-4437-8c49-732da8465892" UUID_SUB="3276b63f-b557-47ac-8808-dc5d7570b172" BLOCK_SIZE="4096" TYPE="btrfs"
/dev/sdg1: PARTUUID="237b19a2-c834-40e6-9565-e37cf01140c2"
/dev/sdh1: PARTUUID="528b7fd5-fb01-4dee-a41a-e61e04557662"
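To narrow that dump down to just the btrfs volume UUIDs (the only candidates a pool config could legitimately point at), a short awk filter over the blkid output works. Sketched with a few lines copied from the output above standing in for a live blkid pipe:

```shell
# Sample blkid lines (two btrfs devices, one xfs) mocking real output.
cat <<'EOF' > /tmp/blkid_sample.txt
/dev/nvme0n1p1: UUID="4f4d49b1-cf2c-4043-8ac8-5d5a2787ecd8" UUID_SUB="8a7e058f-a0ba-4872-a5ab-89447e85079b" BLOCK_SIZE="4096" TYPE="btrfs"
/dev/sdo1: LABEL="Chia_Plot6" UUID="b16cac7b-70f1-4351-8f98-9692e12ac770" BLOCK_SIZE="4096" TYPE="xfs"
/dev/nvme1n1p1: UUID="6790d0c5-bc2f-499f-a9cb-5b5bf6f03d07" UUID_SUB="425c9a10-1680-4bb4-9db6-09a7457e8dbc" BLOCK_SIZE="4096" TYPE="btrfs"
EOF

# Print device and filesystem UUID for btrfs entries only.
awk '/TYPE="btrfs"/ { match($0, /UUID="[^"]+"/); print $1, substr($0, RSTART+6, RLENGTH-7) }' /tmp/blkid_sample.txt
```

Live, the same filter would be: blkid | awk '/TYPE="btrfs"/ {...}' — and neither btrfs UUID here matches 32f96308-…, consistent with the config pointing at a filesystem that no longer exists.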
warwolf7 Posted February 28 (Author, edited)

I tried the command "btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75". It returned:

ERROR: not a valid btrfs filesystem: 32f96308-aa9a-4c86-b799-d222f4af4a75

I also ran the command immediately after spinning down the drives, and they spun right back up (although sequentially, and it took a bit more than a minute). I'm really starting to think this command is related to my problem. It is also seen every minute, which fits my problem as well.

Additionally, I had a btrfs pool of two NVMes in mirror mode. They got corrupted due to the bad RAM (this info was not in my first post; I guess I should've added it). I replaced the RAM and changed the one btrfs pool of 2 mirrored NVMes into 2 pools of 1 NVMe each. I formatted the two NVMes, but I'm not sure if I reused the existing pool or created 2 new ones. Those are the only btrfs disks that ever existed in this server.

Looking in "/boot/config/pools" I found the config details of my 2 x 1-NVMe pools. Surprise — I see this: the UUID that matches the command I saw in iotop. Why does it say it's not a valid filesystem? Why does the second pool have no diskUUID?
vm-dbs-docker.cfg (modified date Feb 25 2024, from the last reboot I guess):

diskUUID="32f96308-aa9a-4c86-b799-d222f4af4a75"
diskFsType="btrfs"
diskFsProfile="single"
diskFsWidth="0"
diskFsGroups="0"
diskNumMissing="0"
diskCompression="off"
diskAutotrim="on"
diskWarning=""
diskCritical=""
diskComment=""
diskShareEnabled="yes"
diskShareFloor=""
diskExport="-"
diskFruit="no"
diskSecurity="public"
diskReadList=""
diskWriteList=""
diskVolsizelimit=""
diskCaseSensitive="auto"
diskExportNFS="-"
diskExportNFSFsid="0"
diskSecurityNFS="public"
diskHostListNFS=""
diskId="WDC_WDS100T2B0C-00PXH0_2041BA804150"
diskIdSlot="-"
diskSize="976761560"
diskType="Cache"
diskSpindownDelay="-1"
diskSpinupGroup=""

The second pool, backup_of_vm-dbs-docker.cfg (modified date Feb 25 2024, from the last reboot I guess):

diskUUID=""
diskFsType="btrfs"
diskFsProfile="single"
diskFsWidth="0"
diskFsGroups="0"
diskNumMissing="0"
diskCompression="off"
diskAutotrim="on"
diskWarning=""
diskCritical=""
diskComment=""
diskShareEnabled="yes"
diskShareFloor="0"
diskExport="-"
diskFruit="no"
diskSecurity="public"
diskReadList=""
diskWriteList=""
diskVolsizelimit=""
diskCaseSensitive="auto"
diskExportNFS="-"
diskExportNFSFsid="0"
diskSecurityNFS="public"
diskHostListNFS=""
diskId="WDC_WDS100T2B0C-00PXH0_21146K803754"
diskIdSlot="-"
diskSize="976761560"
diskType="Cache"
diskSpindownDelay="-1"
diskSpinupGroup=""

Meanwhile in the GUI, if I click on pools 1 and 2 I see their details, and the UUID doesn't match the info from the config!?! I'm way out of my league here; I know very little about how the config files should look. Any idea what is going on? Thank you.

Edited February 28 by warwolf7: additional info.
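For anyone wanting to reproduce that comparison, here is a rough sketch of diffing the diskUUID stored in a pool cfg against the UUID the filesystem actually reports. The paths are mocked under /tmp, and the "live" UUID is hard-coded as a stand-in for whatever btrfs filesystem show prints for the pool on a real system:

```shell
# Mock the pool cfg with the stale UUID from the thread.
mkdir -p /tmp/pools
printf 'diskUUID="32f96308-aa9a-4c86-b799-d222f4af4a75"\n' > /tmp/pools/vm-dbs-docker.cfg

# Extract the UUID Unraid has recorded for the pool.
cfg_uuid=$(grep -o 'diskUUID="[^"]*"' /tmp/pools/vm-dbs-docker.cfg | cut -d'"' -f2)

# Stand-in for the UUID reported by `btrfs filesystem show` on the live pool.
live_uuid="4f4d49b1-cf2c-4043-8ac8-5d5a2787ecd8"

if [ "$cfg_uuid" != "$live_uuid" ]; then
    echo "MISMATCH: cfg=$cfg_uuid live=$live_uuid"
fi
```

A mismatch like this would be consistent with the cfg still pointing at the old, destroyed mirror pool while the monitor keeps asking btrfs about a filesystem that no longer exists.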
Veah Posted February 28 (edited)

Check your controller (HBA) settings and turn off all the disk power-save options within that LSI controller — let Unraid manage it. If you are running a RAID array on the controller, those drives will remain spun up, as Unraid cannot control drives inside the controller's array individually. Hope this helps.

Edited February 28 by Veah
warwolf7 Posted February 28 (Author)

7 minutes ago, Veah said: Check your controller (HBA) settings and turn off all your disk power save options within that LSI controller. Let Unraid manage it. If you are running some RAID array on the controller, those drives will remain spun up as Unraid will not be able to control those drives within the controller's array individually. Hope this helps.

Hi, thank you for your response. I will look into those settings when I can, but I can't get into the LSI card config right now. It would be quite weird for a setting like that to get magically switched, since the disks spun down properly until the end of Dec/Jan. The LSI card is in IT mode, thus not using any hardware array, and the drives were able to spin down before. I tried to get into the card's config, but Ctrl+C is not working; I tried a few different BIOS settings and still couldn't get in. I think I'd have to swap the card into another computer to configure it, which I hope to avoid. I'll keep going down the path I'm on; if that ends up a dead end, I'll look into the card.

Another point:

On 2/25/2024 at 4:20 PM, warwolf7 said: some of my hdd are connected directly to sata port on server, most others hdd are through an lsi card Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05). They all get woke up <1min after being spun down. (so I don't believe it is due to the card)
Veah Posted February 28

Hope you get it sorted out. I had symptoms just like that which turned out to be my Adaptec-flavored card. Curious to know what it is once you isolate it.
warwolf7 Posted March 10 (Author)

On 2/28/2024 at 5:57 AM, Veah said: Hope you get it sorted out. Had symptoms just like that which turned out to be my Adaptec flavored card. Curious to know what it is once you isolate it.

One thing I would like to sort out: if you have a btrfs pool, can you look in your "/boot/config/pools" folder, find your config files, and compare the UUID there to the one shown for the pool in the GUI? (Main tab > select the pool under "Pool Devices" > first tab, under "Scrub Status".) Does your UUID match the config file? Because currently, on my server, they don't.
Veah Posted March 10

Short answer — no, they do not match for me. Pulled this file for the cache pool (attached):

diskUUID=""

Here's the scrub section:

UUID: c5ef186b-86ad-43c8-a9f8-878617df410d
no stats available
Total to scrub: 90.33GiB
Rate: 0.00B/s
Error summary: no errors found

main_cache.cfg
warwolf7 Posted March 16 (Author)

On 3/9/2024 at 10:03 PM, Veah said: Short answer -No, they do not match for me. Pulled this file for cache pool (attached) ... diskUUID="" ... UUID: c5ef186b-86ad-43c8-a9f8-878617df410d

Hmm... OK, so I guess that is a dead end.

I changed Aggressive LPM Support to Enabled in the BIOS; that didn't change anything. I then reset the motherboard BIOS to default settings; that didn't change anything either.

About the iotop output: I noticed it happens every minute. But when the HDD is spun down, the output is delayed 15 seconds, which is the time it takes the HDD to spin up. I watched the little light on the Supermicro tray of that drive, and it lights up at exactly hh:mm:01, which is the time the btrfs filesystem show happens.

15:03:02 19475 be/4 root 13779.76 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 65614.34 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 15609.48 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 15784.98 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 14842.28 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 73454.09 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 196640.07 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 132946.68 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 201293.48 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 66717.12 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 3187.30 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 66092.91 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 69398.06 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 62686.95 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 167341.36 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 33581.05 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:02 19475 be/4 root 129263.85 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:03:32 6167 be/4 root 0.00 K/s 766.04 K/s ?unavailable? [kworker/u12:0-flush-8:0]
15:04:01 22035 be/4 root 13813.99 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:01 22035 be/4 root 65362.57 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:02 22035 be/4 root 15383.88 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:02 22035 be/4 root 15744.94 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:02 22035 be/4 root 14884.63 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:02 22035 be/4 root 68517.50 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:02 22035 be/4 root 3746.03 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:02 22035 be/4 root 197760.15 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:02 22035 be/4 root 987.48 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:02 22035 be/4 root 197501.69 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:02 22035 be/4 root 63786.35 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:15 22035 be/4 root 33670.51 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:15 22035 be/4 root 32075.12 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:15 22035 be/4 root 66182.80 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:15 22035 be/4 root 33284.72 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:16 22035 be/4 root 99792.56 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:16 22035 be/4 root 4333.78 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:16 22035 be/4 root 224846.02 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:16 22035 be/4 root 164364.76 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75
15:04:16 22035 be/4 root 68529.48 K/s 0.00 K/s ?unavailable? btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75

Why would it be spun up at hh:mm:01 if it was spun down at hh:mm:38?

Mar 16 15:03:38 ChapelleSixtine emhttpd: spinning down /dev/sdq
Mar 16 15:04:15 ChapelleSixtine emhttpd: read SMART /dev/sdq
I will update and see:

>Fix: Zpool operation that kept disks from spinning down.

That didn't change anything; while I don't have ZFS, it was worth a try.

Running the pstree command, I came across this:

├─crond───sh───monitor───sh─┬─btrfs
│                           └─grep

From this I understand that there is a cron job running every minute. I then tried to find where such a cron job could be triggered, and wondered if that was the problem:

$ nano /etc/cron.d/root

In that file, I found a few entries scheduled every minute. I made a copy of the file and changed those entries to run once every 10 minutes; that way I would have time to call them myself and see if any of them actually wakes the HDDs. But changing the file did not change the frequency of the btrfs filesystem show output — the entries changed back automatically.

So I removed all the Dynamix plugins and restarted the server. The Dynamix entry was still there in the cron file, though there was a lot less output from iotop every minute. I looked into the folder /usr/local/emhttp/plugins/dynamix/scripts/monitor — but why is there still a dynamix folder even after I uninstalled the plugins? I see some btrfs_balance, check and scrub scripts... Anyway, I commented out that entry in /etc/cron.d/root and rebooted the server, but the entry reappeared. Are those system plugins that we can't uninstall?

I have to end today's tests, no more time.

Other options not explored yet:
- downgrade Unraid back to an earlier version
- change my LSI card for another one (:-( I don't have another one). Maybe I could simply remove it, keep the HDDs connected directly to the motherboard via SATA — not starting the array, that's for sure — and see if it still happens. That would rule out the LSI card.
- create a new config on another computer and see
- uninstall all plugins one by one to see if it solves my problem

Quote Link to comment
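For anyone chasing the same symptom: entries in /etc/cron.d/root whose five schedule fields are all `*` fire every minute, so they are the first suspects. A minimal sketch of that check — the crontab content below is a sample reconstructed from this thread (the `some_weekly_job` line is invented for illustration), not the exact contents of any real file:

```shell
#!/bin/sh
# Sample cron content (reconstructed). On a live server you would grep
# the real file instead:
#   grep '^\* \* \* \* \*' /etc/cron.d/root
cron_sample='# Generated schedules:
* * * * * /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
47 3 * * 1 /usr/local/sbin/some_weekly_job'

# Five "*" fields at the start of a line mean "run every minute"
printf '%s\n' "$cron_sample" | grep '^\* \* \* \* \*'
```

Running this prints only the monitor line, which matches the per-minute pstree output above.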
itimpi Posted March 16 Share Posted March 16 The monitor task has been a standard part of Unraid for many years and is needed for things to work correctly. Quote Link to comment
Veah Posted March 17 Share Posted March 17 I did not notice if you tried disabling all the power-saving options in the LSI card. If you get to the point of going through the cost and trouble of replacing it, I highly recommend turning off those settings first; let Unraid manage that. Done kicking that horse now. Best of luck. Quote Link to comment
warwolf7 Posted March 18 Author Share Posted March 18

6 hours ago, Veah said: I did not notice if you tried disabling all the power-saving options in the LSI card. If you get to the point of going through the cost and trouble of replacing it, I highly recommend turning off those settings first; let Unraid manage that. Done kicking that horse now. Best of luck.

Hi, thank you for your help. I completely removed the LSI card. The disks spin down after 15 minutes, then spin right back up, exactly like before. The card does not seem to be the source of the problem. I'm going to try safe mode with as much as possible disabled before downgrading Unraid.

Quote Link to comment
warwolf7 Posted March 18 Author Share Posted March 18 (edited)

Unraid safe mode, no plugins, no GUI. I couldn't run the "iotop -bktoqqq -d .1" command, but the syslog shows the same behavior: no matter at what second the HDD gets spun down, it is always spun back up, and in every test in this thread the "read SMART" that follows lands at the same seconds of the minute — always hh:mm:08 or 09. Even if the disk was spun down at hh:mm:03 or hh:mm:41, the read SMART comes at the exact same time. To me it sounds like something on a strict schedule is triggering this.

I confirmed that the cron schedule was active using the pstree command; this output still appears every minute:

├─crond───sh───monitor───sh─┬─btrfs
│                           └─grep

I did another interesting test. In the earlier tests I could see the command "btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75" being sent every minute, originating from the Dynamix monitor script. As soon as I saw the "spinning down /dev/sdc" entry in the syslog, I sent the same command myself (the monitor was going to send it anyway 50 seconds later). Tadaaa — it spun the disk right back up before returning its response.

#disk is spun down after 15 minutes - SYSLOG:
Mar 17 21:53:03 ChapelleSixtine emhttpd: spinning down /dev/sdc

#I immediately send the command:
btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75

#I can hear the disk spin up, and I get this output only after the disk is completely spun up:
ERROR: not a valid btrfs filesystem: 32f96308-aa9a-4c86-b799-d222f4af4a75

#SYSLOG shows this right after (usually it would have happened at 21:54:08):
Mar 17 21:53:28 ChapelleSixtine emhttpd: read SMART /dev/sdc

So I was able to wake the disk with the very command the Dynamix monitor script sends. Also, that command always returns an error.
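The manual wake test above can be scripted as a sketch. Everything here is a placeholder-heavy outline: DISK (`/dev/sdX`) must be replaced with your real device, BAD_UUID is the stale pool UUID from this thread, and the hdparm/btrfs steps need real hardware, so the block skips them when the device is absent:

```shell
#!/bin/sh
# Sketch of the manual wake test. DISK is a placeholder; BAD_UUID is the
# stale pool UUID seen in disks.ini. The hardware steps only run when
# the block device actually exists.
DISK=/dev/sdX
BAD_UUID=32f96308-aa9a-4c86-b799-d222f4af4a75

if [ -b "$DISK" ]; then
  hdparm -C "$DISK"                   # expect "standby" right after spin-down
  btrfs filesystem show "$BAD_UUID"   # prints the ERROR, but only once...
  hdparm -C "$DISK"                   # ...the drive has spun up again
else
  echo "device $DISK not present; skipping hardware steps"
fi
```

If the second `hdparm -C` reports "active/idle" while the first reported "standby", the show command is what woke the drive.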
(The only btrfs devices that have ever existed in my server are the NVMe drives, and they were accessible during all of these tests.)

So here I am, trying to understand the code in /usr/local/emhttp/plugins/dynamix/scripts/monitor.

The first thing I see is the line reading /var/local/emhttp/disks.ini. In that file I notice my two pools, each containing one NVMe. One of them (the first pool I created) has a UUID, but the second pool doesn't. That UUID is "32f96308-aa9a-4c86-b799-d222f4af4a75". Why would one have a UUID and not the other? Also, no other disks have that value — I guess that's a btrfs or pool specificity.

Back to the monitor script. This line uses the UUID:

if (exec("/sbin/btrfs filesystem show "._var($disk,'uuid')." 2>/dev/null|grep -c 'missing'")>0) {

but only runs if there is a UUID in disks.ini:

// check file system of cache pool
$item = 'pool';
if (in_array($name,$pools) && strpos(_var($disk,'fsType'),'btrfs')!==false && _var($disk,'uuid')!=="") {
  $attr = 'missing';
  if (exec("/sbin/btrfs filesystem show "._var($disk,'uuid')." 2>/dev/null|grep -c 'missing'")>0) {
    if (empty($saved[$item][$attr])) {
      exec("$notify -l '/Main' -e ".escapeshellarg("Unraid $text message")." -s ".escapeshellarg("Warning [$server] - Cache pool BTRFS missing device(s)")." -d ".escapeshellarg("$info")." -i \"warning\" 2>/dev/null");
      $saved[$item][$attr] = 1;
    }
  } elseif (isset($saved[$item][$attr])) unset($saved[$item][$attr]);
  $attr = "profile-$name";
  if (exec("/sbin/btrfs filesystem df /mnt/$name 2>/dev/null|grep -c '^Data'")>1) {
    if (empty($saved[$item][$attr])) {
      exec("$notify -l '/Main' -e ".escapeshellarg("Unraid $text message")." -s ".escapeshellarg("Warning [$server] - $pool pool BTRFS too many profiles (You can ignore this warning when a pool balance operation is in progress)")." -d ".escapeshellarg("$info")." -i \"warning\" 2>/dev/null");
      $saved[$item][$attr] = 1;
    }
  } elseif (isset($saved[$item][$attr])) unset($saved[$item][$attr]);
}
}

So here is where I believe this leads. If I run the command btrfs filesystem show:

$ btrfs filesystem show
Label: none  uuid: 4f4d49b1-cf2c-4043-8ac8-5d5a2787ecd8
	Total devices 1  FS bytes used 262.94GiB
	devid 1 size 931.51GiB used 286.02GiB path /dev/nvme0n1p1

Label: none  uuid: 6790d0c5-bc2f-499f-a9cb-5b5bf6f03d07
	Total devices 1  FS bytes used 349.50GiB
	devid 1 size 931.51GiB used 372.02GiB path /dev/nvme1n1p1

Label: none  uuid: 37054439-aa79-4407-98df-7c5b2c5851f4
	Total devices 1  FS bytes used 4.17GiB
	devid 1 size 20.00GiB used 6.52GiB path /dev/loop2

Label: none  uuid: b4112bdf-b443-4437-8c49-732da8465892
	Total devices 1  FS bytes used 1.28MiB
	devid 1 size 1.00GiB used 126.38MiB path /dev/loop3

Those are valid UUIDs; the UUID in disks.ini ("32f96308-aa9a-4c86-b799-d222f4af4a75") is not shown here. I conclude there is a bug within Unraid somewhere that doesn't properly register the UUID in disks.ini.

Can someone tell me which file I should modify to correct this problem? Should I just remove the wrong UUID from the file under /boot/config/pools/ that contains it? That's what I'm going to try right now. I'll see what happens.

Edited March 18 by warwolf7 added btrfs filesystem show command while array running Quote Link to comment
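The disks.ini-vs-reality mismatch described above can be checked mechanically. This is a sketch under the assumption that you pull recorded pool UUIDs from /var/local/emhttp/disks.ini and live ones from `btrfs filesystem show`; here both lists are hard-coded with the values quoted in this thread:

```shell
#!/bin/sh
# Recorded pool UUID (value from disks.ini in this thread). On a live
# server: grep -o 'uuid="[^"]*"' /var/local/emhttp/disks.ini
recorded='32f96308-aa9a-4c86-b799-d222f4af4a75'
# UUIDs actually reported by `btrfs filesystem show` (the NVMe pools above).
actual='4f4d49b1-cf2c-4043-8ac8-5d5a2787ecd8
6790d0c5-bc2f-499f-a9cb-5b5bf6f03d07'

for u in $recorded; do
  if printf '%s\n' "$actual" | grep -qx "$u"; then
    echo "$u: OK"
  else
    echo "$u: stale - not reported by btrfs, may wake disks every minute"
  fi
done
```

Any UUID flagged as stale is one the monitor script will keep feeding to `btrfs filesystem show` with no matching filesystem.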
Solution warwolf7 Posted March 18 Author Solution Share Posted March 18 (edited)

SOLVED — CONCLUSION: my Feb 25th hypothesis was correct.

The command "btrfs filesystem show 32f96308-aa9a-4c86-b799-d222f4af4a75" was causing all disks to spin up because it couldn't find the requested UUID. That command is issued by /usr/local/emhttp/plugins/dynamix/scripts/monitor, which takes the UUID from /var/local/emhttp/disks.ini — a file that probably gets populated from the files under /boot/config/. The Dynamix monitor is called every minute from a cron job in /etc/cron.d/root.

The UUID field in the config files under /boot/config/pools/ is normally left empty — why? That would be my question. In my case, my older pool config had a UUID that became invalid (or always was, I don't know) when the array failed due to bad RAM and I reformatted one of the two mirror drives in that pool and removed the other. That UUID was left behind in the pool's config file. When I removed it, the monitor script stopped waking all disks every minute. It is my understanding that Unraid created this problem during one of those manipulations, because I had never touched any of those files before this investigation 🔍

Proposition 1 to Unraid: look at what happens when you create a pool with 2 devices, remove one device, reformat the other, and then look at the UUID in the config file. (I can't recreate that — I don't have an unused disk right now.) Why is there no UUID in the btrfs pool config file? Why did my old pool have one but not my new one — was there a change in the code from previous versions? Also look for other cases that can create this type of wrong UUID; there might be more than just mine.

Proposition 2 to Unraid: the monitor script also fails to recognize this error and send a notification about it:

if (exec("/sbin/btrfs filesystem show "._var($disk,'uuid')." 2>/dev/null|grep -c 'missing'")>0) {

We can see here that it only looks for 'missing', but when a filesystem is not found, btrfs actually returns an ERROR:

ERROR: not a valid btrfs filesystem: 6790d0c5-bc2f-499f-a9cb-5b5bf6f03d0z

The pattern should also match 'ERROR'. But since this script executes every minute, and the next section raises a warning notification on a match, it could potentially flood an Unraid admin with warning emails — which could be too much. (I wish I could create a pull request, but I couldn't find the repo on GitHub; the only one I found was 8 years old and its code did not match the current Unraid release.) However, proposition 1 has to be implemented first, otherwise a lot of users might get wrong warnings.

Notes: @foo_fighter, do you use the Unassigned Devices plugin? If so, I suggest you update it. This is in its changelog:

>Fix: Zpool operation that kept disks from spinning down.

Thank you everyone who took a look at my thread, and for the help and guidance along the way. Finally, big hurray to me — I have very, very basic knowledge of any of this, so congrats to me for finding it. It can potentially affect a lot of users. Please, Unraid, acknowledge in this thread that something will be done.

Edited April 15 by warwolf7 added information 1 Quote Link to comment
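A quick illustration of proposition 2, using the error text quoted above. One caveat worth verifying: btrfs appears to print that ERROR line on stderr (the interactive run in this thread showed it despite the script redirecting stderr away), so in the monitor's pipeline an 'ERROR' match would likely also require replacing `2>/dev/null` with `2>&1`:

```shell
#!/bin/sh
# Error text as returned for a stale UUID (copied from this thread).
err='ERROR: not a valid btrfs filesystem: 6790d0c5-bc2f-499f-a9cb-5b5bf6f03d0z'

printf '%s\n' "$err" | grep -c 'missing'         # current check: prints 0, no alert
printf '%s\n' "$err" | grep -cE 'missing|ERROR'  # proposed check: prints 1, alert fires
```

The single-pattern grep returns a zero count for the not-found case, which is exactly why the monitor never warned about the stale UUID it was probing every minute.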
foo_fighter Posted March 24 Share Posted March 24 Yes, I do use the unassigned devices plugin. I'll update it and see if that helps. Quote Link to comment