Addy90

Everything posted by Addy90

  1. Yes, definitely. I assume Limetech has plenty of automated tests for the edge cases they know about, to be sure a change cannot cause data loss. Of course this is nothing one can implement quickly, but since a parity check can already be paused and resumed during operation, it should also be safe to resume after a reboot, especially because many conditions are already checked after a reboot: whether the array and all disks are available and whether anything about the array has changed. But it is not for me to assume too much here. This is just a feature request thread, and I wanted to ask for something I would love to see and that I think is feasible (much easier than dual parity or multi-stream I/O and other features we have seen recently). It is a fair feature request in my opinion, plus some suggestions about possible implementations, because you prompted me to write about them. The rest is hope and patience, and I have both. Like most of us here, I prefer a solidly tested feature over any other feature; that reliability is one reason we use Unraid in the first place. Don't you agree?
  2. Ah, yes, of course it is nearly impossible to do from a plugin! I know you have it on the wish list for your great Parity Check Tuning plugin, and it is exactly as you describe there (quoting from your Parity Check Tuning page): the offset ability has to be implemented by Limetech. This is why I am asking in the Unraid feature request thread and not in your plugin thread; I know it is not possible for you and that you need this feature. But I think it is not very difficult for Limetech, and this is my understanding of why:
     There is a function md_do_sync() with a variable mddev->curr_resync in the md driver that holds the position of the current parity check run. This variable is incremented (mddev->curr_resync += sectors) inside a loop until the maximum sector (mddev->recovery_running) is reached, which marks the end of the parity check operation. A parity check can already be resumed by passing options to the check_array() function, which revives mddev->recovery_thread and continues from the mddev->curr_resync position held in memory. It should be easy to read the content of this variable (and others if needed, perhaps protected with a checksum, just to be sure no modification happened). The status_md() function already prints the sync position, for example seq_printf(seq, "mdResyncPos=%llu\n", mddev->curr_resync/2); so maybe the needed values can already be obtained this way.
     Once the position can be read, check_array() only needs an additional parameter for resuming at a specific position: it sets mddev->curr_resync, wakes up the sync thread, and the operation continues where it left off after the reboot. That parameter can be read from a file (validated by the checksum), and the mdcmd tool could be extended to accept this value for a parity check on the command line. Everything could then be driven through mdcmd calls from the UI: the UI saves the current position to a file (with a checksum, just to be sure), reads it back after a reboot, deletes the position file while a check is running, and writes it when the check is paused or the array is shut down cleanly. That way, after an unclean shutdown or a successfully finished check the file is not there, and the next start begins the check from the beginning if no file exists. I think this is not very difficult for Limetech to allow, and I would LOVE to see it. It would make my day, and some other people's days, @itimpi's included, a bit brighter. A rough sketch of the flow I have in mind is below.
     PS: I don't mind pausing and resuming a parity check across multiple reboots. It is as easy as: shut down the array, the file with the position is saved, and after the reboot the UI reports that the last check was not finished and can be resumed (or restarted from the beginning via a checkbox, which calls mdcmd without the position argument). Not difficult for the user, but very handy for this use case!
     PPS: As far as I understand, the same sync mechanism is also used for rebuilding disks, so it should technically be possible to resume a rebuild after a reboot as well. In that case I would certainly warn the user about doing so, or gray out the shutdown option, but it should be possible. After an unclean shutdown we have to restart from the beginning either way.
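     A minimal shell sketch of that save-and-resume flow, only to illustrate the idea: the mdResyncPos value is the one status_md() already prints, while the position file path and especially the extra position argument to mdcmd are hypothetical, since that argument is exactly what is being requested here.
       # --- on pause or clean array stop: remember how far the check got ---
       POSFILE=/boot/config/parity-check.pos                            # hypothetical location on the flash drive
       POS=$(grep -o 'mdResyncPos=[0-9]*' /proc/mdstat | cut -d= -f2)   # value printed by status_md(), as described above
       if [ -n "$POS" ] && [ "$POS" -gt 0 ]; then
           echo "$POS" > "$POSFILE"
           sha256sum "$POSFILE" > "$POSFILE.sha256"                     # checksum so a damaged file is never trusted
       fi

       # --- after reboot, once the array is up and unchanged ---
       if [ -f "$POSFILE" ] && sha256sum -c "$POSFILE.sha256" >/dev/null 2>&1; then
           POS=$(cat "$POSFILE")
           rm -f "$POSFILE" "$POSFILE.sha256"                           # delete first, so an unclean shutdown never resumes blindly
           mdcmd check RESUME "$POS"                                    # hypothetical: today mdcmd takes no start position
       else
           mdcmd check                                                  # normal check from the beginning
       fi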
  3. What I would find really great is the possibility to pause a parity sync across multiple reboots! I have two reasons for this. First: I live in a flat and normally keep all doors open during the night for better airflow, and because I have no radiator in the bedroom, I get my heat from the other rooms. Unfortunately, a parity sync takes longer than 24 hours, and this is the only time in the month I have to close the doors because of the noise, so it gets cold, too. Second: normally I only sync backups from my other systems several times a week up to once a day, or I need some older data stored on the array that is no longer on my live systems a few times a month. That means my array is shut down most of the day and only online for a few hours, maybe half a day, because of noise and energy costs. I could use that uptime, which I need anyway, to continue the parity sync. This would save a lot of energy, because the only time in the month the server runs for more than 24 hours, even at night when I don't access it, is for the parity sync.
     So what I would appreciate is that the parity sync operation writes a small state file to the flash drive when it is paused explicitly and/or on a clean system shutdown, and offers the possibility to continue where it left off (or to restart from the beginning, for example via an additional checkbox). That way, a parity sync could be paused at shutdown and resumed whenever the system is running anyway. I don't think this is difficult, because the parity sync operation always knows where it is. I know the Parity Check Tuning plugin can run a parity sync in intervals, but that does not survive a reboot, which is critical for my use case. I hope this is possible. Thank you!
     BTW this was already requested, but maybe the importance was not conveyed as strongly as in my text:
  4. Replace it with a disk at least as large as the current one. So you need a 6 TB drive with no fewer sectors than the current disk, or you use a bigger disk, which also lets you upgrade or replace another data disk with a bigger one later.
  5. The errors shown there are read errors that were corrected on the fly, so at that moment no files are damaged. Your parity disk sdg definitely has read errors (critical medium error):
     Mar 1 09:15:52 Jewel kernel: sd 7:0:5:0: [sdg] tag#158 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
     Mar 1 09:15:52 Jewel kernel: sd 7:0:5:0: [sdg] tag#158 Sense Key : 0x3 [current] [descriptor]
     Mar 1 09:15:52 Jewel kernel: sd 7:0:5:0: [sdg] tag#158 ASC=0x11 ASCQ=0x0
     Mar 1 09:15:52 Jewel kernel: sd 7:0:5:0: [sdg] tag#158 CDB: opcode=0x88 88 00 00 00 00 01 d1 17 f7 38 00 00 01 d0 00 00
     Mar 1 09:15:52 Jewel kernel: blk_update_request: critical medium error, dev sdg, sector 7802976056 op 0x0:(READ) flags 0x0 phys_seg 58 prio class 0
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802975992
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976000
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976008
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976016
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976024
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976032
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976040
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976048
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976056
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976064
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976072
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976080
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976088
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976096
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976104
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976112
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976120
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976128
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976136
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976144
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976152
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976160
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976168
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976176
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976184
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976192
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976200
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976208
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976216
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976224
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976232
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976240
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976248
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976256
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976264
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976272
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976280
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976288
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976296
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976304
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976312
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976320
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976328
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976336
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976344
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976352
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976360
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976368
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976376
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976384
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976392
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976400
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976408
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976416
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976424
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976432
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976440
     Mar 1 09:15:52 Jewel kernel: md: disk0 read error, sector=7802976448
     If you want, run a second parity sync and check whether more errors come up after that run. If yes, it can be the cable or the disk; to me it looks like the disk, because of the pending sectors, which no cable should cause. Be sure these are only read errors on the disk and not parity sync errors. Parity sync errors mean data cannot be rebuilt correctly. Luckily your parity disk does not contain data, but if another disk fails, you could end up with lost data, because parity cannot rebuild from the faulty sectors. In other words, the corrected read errors are rebuilt from the remaining disks on the fly; if another disk fails, those read errors will result in data loss, as you only have single parity. If you want to be extra sure, replace the cable and run another parity sync (or replace the cable before the second sync). If any more read errors appear and it is not the cable: replace the disk. Nothing is worse than data corruption because a failing disk was kept for too long. Really, replace it; you cannot rebuild your data from a faulty parity disk. I had this too, replaced the disk that showed errors during every parity sync, and now the errors are gone. I have to admit I never had a faulty cable...
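     To double-check the counters this advice relies on, the SMART attributes can be read directly on the console; sdg is taken from the log above, and the attribute names can differ slightly between vendors:
       # reallocated and pending sector counters for the parity disk from the log
       smartctl -A /dev/sdg | grep -Ei 'Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable'
     If Current_Pending_Sector keeps growing between parity checks, that points at the disk rather than the cable.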
  6. I would definitely recommend preclearing both disks first. When that is finished, add the disk you want to add, check that everything is up and running, and THEN replace the other disk and let it rebuild from parity. Adding the precleared disk should only take a minute or so; the replacement is what takes time because of the parity rebuild. Doing it in this order keeps the downtime to a minimum: during the preclear everything stays up and running, and during the rebuild of the replacement disk its contents are emulated and remain available. Do not skip the preclear on either disk just to save a run. Nothing is more painful than adding or rebuilding onto a disk that dies within hours because it was never tested.
  7. When I built my Unraid at home, I had a bunch of WD Green drives; they are currently all in my array, and so far I have had two failures but could always rebuild the data. The drives were never dropped from the array, they just had read errors that were corrected on the fly. Of course, I got rid of those disks once the errors started showing. I also have a Purple disk that was previously used for video editing of multiple HD streams (it works great for that, but has been replaced with a big SSD now). It works fine, though random access is a bit slower than with Red drives; these disks are made for parallel streams rather than the access patterns of normal RAID use, so they are slower than Red there. The rest are all Red drives, added as I grew the array, and they behave fine, too. Then I have an old Samsung and a Hitachi disk as well. I once had two Seagate disks, but both developed reallocated sectors, one of them so many that it started to corrupt files, so I got rid of them. As cache, I have two Samsung SSDs in a BTRFS RAID1. I can say: it works! Of course, old disks are slower than new disks, but it was cheaper not to replace all of them. So currently Green, Purple, and Red all work. But of course, as @Michael_P says, Red drives are made for RAID and have TLER (time-limited error recovery). Purple has TLER, too, by the way. An older benchmark: https://www.hwcooling.net/en/test-of-six-hhds-from-wd-which-colour-is-for-you/ - the principle is still comparable. Though I also have a Gold drive in my desktop, and it is not as fast in random access as that benchmark suggests... who knows. Nevertheless, the Gold drive is a nice single disk.
  8. You can approximate this with the File Integrity plugin. It gives you a txt file per drive listing every file together with its hash sum. The plus here: when you have a drive failure and rebuild the drive, you can check the rebuilt files for validity against the previously exported hash sums. You can also automate calculating hash sums for new files and exporting them as txt files. It is not a tree view, but it has every file in it. If not automatically, you export manually (I do it manually, because the automatic mode drastically drops write speed). Not perfect, but maybe good enough for you?
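     For reference, a rough manual equivalent of what the plugin exports, using standard tools; the paths and file names here are only examples, not what the plugin actually writes:
       # export one hash list per data disk
       find /mnt/disk1 -type f -print0 | xargs -0 sha256sum > /mnt/user/hashes/disk1-hashes.txt
       # after a rebuild, verify the rebuilt files against the stored list
       sha256sum --quiet -c /mnt/user/hashes/disk1-hashes.txt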
  9. I did not read the whole article, but if you are looking for an enterprise solution, use something like CEPH: https://ceph.io/ I am using Unraid in my home setup because it is an incredibly flexible and cheap solution for backing up and archiving data; nothing is as flexible and cheap to set up, in my opinion. At work I set up a CEPH cluster; there you can have hundreds of drives across dozens of servers, with failover and parity spanning multiple servers. You can reboot parts of the cluster for updates with no downtime and have virtual machines use CEPH as the underlying network storage, for example with Proxmox (built in). If you need additional backups, use a second technology such as GlusterFS, which can also run over multiple disks on multiple servers; if there is a problem with CEPH, your second storage on GlusterFS is not affected. Never confuse backups with failover! Then you have your scalable, enterprise-ready system with more than 2 parity disks, more than 28 data disks, server and even location failover, and backups.
     None of this can be provided by Unraid, and Unraid should not be designed for it, as that would complicate the setup for every home and small business user - and it would compete with CEPH and GlusterFS for no reason, since these systems already solve the problems you seem to be looking for. CEPH also supports multiple pools with different redundancy settings, different synchronization strategies, and distinct SSD and HDD pools on the same infrastructure for high-performance and high-capacity tiers. You can choose which disks go into which pool, choose the parity calculation, and more; you can use CEPH as block storage and as file storage with CephFS. CEPH also has integrity verification built in (scrubbing). And so on - everything enterprise-ready storage needs. I am sure it is possible to run SMB sharing on top of CephFS pools, though I am not sure about SMB failover, so the gateway running SMB might have downtime on reboots. Maybe SMB can be made highly available via keepalived (VRRP) and conntrackd; then the SMB shares would fail over, too, like the rest of the storage system.
     I don't want to advertise these technologies. I am not involved in the development of any of them, but I use them all where applicable: Unraid at home; CEPH for high-performance, highly available, highly scalable enterprise VM/container storage at work; GlusterFS for backups of CEPH VMs/containers at work; keepalived for failover of some services (but not SMB); conntrackd on VyOS firewalls for connection failover across reboots (but not with SMB). I just want to say: there are solutions out there! @shEiD Use the right tool for the right task. My opinion! Learn Linux administration; all the tools I mentioned are free of charge. Unraid costs money, CEPH does not, but professional support can be bought. Proxmox has a GUI for CEPH management, and CEPH has its own dashboard if you want to use it standalone, but for advanced management you need to learn Linux administration. For performance, see this: https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
     PS: yes, the thread is old, but for someone returning and being unhappy with Unraid's limits (like @Lev, for example), I would like some alternatives written down here, so nobody stays unhappy with Unraid without knowing there is another tool for the task. Unraid is great, but I cannot repeat it enough: use the right tool for the right task. If all you have is a hammer, everything looks like a nail. But screws need a screwdriver.
  10. I meant the client timeout (https://linux.die.net/man/5/dhclient.conf), not the router's lease time. I thought the client timeout was set to 10 seconds in Unraid during boot, but I cannot confirm that now, because I have switched to a static IP. And yes, I have a bonded interface. With DHCP enabled on the bonded interface, a 10-second client timeout was too short for obtaining an IP address. As you say, the interfaces are briefly taken down since 6.8.0, which now makes total sense as the reason I could not get an IP address. In my case, I did 2. and will wait for 3. 😄 For the original author of this topic: since the network connection is lost after some time, I suspected a DHCP client timeout like mine. Maybe it is something else then, as you say this issue only occurs while the network interfaces are being initialized; that sounds like it should not happen on renewal, but I cannot confirm it, maybe the author knows more... Thanks @boniel for the clarification!
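     For reference, the client-side timeout the linked dhclient.conf man page describes looks like this; whether Unraid's network scripts actually use dhclient, and where its configuration would live, is an assumption on my part:
       timeout 60;   # wait up to 60 seconds for a DHCP answer instead of the suspected 10
       retry 30;     # if no lease was obtained, wait 30 seconds before trying again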
  11. Are you using DHCP, or do you have a static IP for your Unraid system? I encountered issues with DHCP; I don't know why, but sometimes the server did not get an IP address and had no network connectivity after booting. Since DHCP leases are renewed after some time (normally one day), it could be that the renewal sometimes fails and the network stops because the Unraid system no longer has a valid IP. I suspect the DHCP client timeout in Unraid is too low, but I did not investigate; I set a static IP instead (I had a static DHCP lease anyway) and have not had a single network connectivity drop since. By the way, this started with 6.8.0 and never happened with any older version, and my DHCP server did not change.
  12. This would be a very good feature. I normally shut down my array when I don't need it, especially to save power. Being able to shut down and (ideally automatically) continue the parity check on the next boot until it is finished would be best. It would also be good to automatically start parity checks that were scheduled but never started because the system was shut down. So after starting, Unraid should simply check whether a scheduled check is due or whether an unfinished session should be resumed. Once a run finishes, the next parity check starts when the schedule is reached again, either immediately if the system is online or on the next boot, and it continues across every reboot until it is finished. The only exception is manual intervention: a manual stop should of course stop it until the next scheduled check. That would be the perfect solution. It would not bother the people running their servers 24/7, but it would be most comfortable for everyone else.
  13. @limetech Turning off hard link support indeed increases listing performance by a factor of more than ten, as far as I can tell. Listing my backups now takes around two minutes instead of the half hour it took with hard link support enabled. This is much better! I will leave it disabled; I never used hard links on my Unraid storage anyway. Thank you for this suggestion!
  14. @limetech What happens when you compare a recursive listing of folders with hundreds of subfolders, each with hundreds of files, between 6.8.0 and 6.7.2? In my case I am listing backup directories with thousands of directories and several hundred thousand files, and this is slower than on 6.7.2 by a large factor. Just now, while I was sending new backups to my Unraid and wanted to browse a folder, everything hung: the transfer stopped for minutes, Unraid had the mover running during the transfer, and browsing via Explorer timed out. Shouldn't the new multi-stream I/O handle such cases? Might the new multi-stream I/O be the culprit? Can we disable it and test the legacy behaviour on the current Unraid to compare? It is difficult to tell you exactly what is going wrong here because I have no direct comparison, but something is definitely not right with the performance, like never before...
  15. I can confirm the problem. In my case I have large backup directories that are synced with third-party software between my computer and my Unraid system. With 6.7.2, a recursive directory listing via SMB of thousands of subdirectories with tens of thousands of files took seconds to minutes; with 6.8.0 stable it now takes minutes to hours. It is already very slow when you right-click -> Properties on a directory and Windows starts summing up the size of all files inside it. This is definitely much slower than before; you can almost watch it add up file by file. The amount of data transferred also looks much higher than before: I see a constant 20-30 kB/s while listing directories. When I capture the SMB2 packets in Wireshark, there is quite some delay between requests and responses. BTW: I have the Cache Dirs plugin active, but it does not seem to make any difference whether it is enabled or disabled.
  16. Thank you very much, that is very nice of you!
  17. I am not sure whether this has been asked already, but I would like to suggest being able to disable logging and history completely. The idea: when you have a fully encrypted array with your business and private documents (e.g. invoices, customer data, and the like), your metadata and files are safe on the encrypted array against theft or against warranty handling of failed drives. But when you use unBALANCE to scatter or gather that data, the metadata of the files is written to the unprotected USB flash drive in the history and the log file, leaking business and personal data to an unencrypted area. It would be much safer to be able to disable the log and history, or to hold them entirely in memory until reboot without ever writing them to the flash drive. Currently, manual deletion is the only way to get rid of these traces.
  18. "Yes this will be corrected in the next release." Alright, that is good enough for me. Thank you!
  19. My unRAID just started the array automatically even though my parity disks were missing, which invalidated my parity information; it now has to be rebuilt. The automatic mounting of the disks should only happen when everything is fine, nothing is missing, and no IDs have changed, shouldn't it?
  20. Thank you boniel - I tried your script! But still no valid SMART reports. Only the USB flash drive is queried (so a txt file exists only for the flash drive), and it gives the following output:
     smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.6-unRAID] (local build)
     Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
     /dev/sda: Unknown device type '3ware'
     =======> VALID ARGUMENTS ARE: ata, scsi, sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbsunplus, marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, aacraid,H,L,ID, cciss,N, auto, test <=======
     Use smartctl -h to get a usage summary
     That part is understandable, because the USB flash drive surely has no SMART; it does not show any in the WebUI either. But the other disks are not covered at all - no txt files for any other disk...
     Nevertheless, I am not sure the SMART values have anything to do with the problem that the entire(!) filesystem stops responding. No disk is accessible anymore; every terminal, WebUI, or read/write command freezes as soon as it touches a disk. Even when I copy files between two fairly new disks without any SMART errors, everything stops responding. All disks have passed SMART. There were occasional problems in the past, but the disks did not change, only the unRAID version. The problem is that as soon as the system freezes while copying, or while writing and reading at the same time (so a copy is not even needed; it is enough that I read a file from one disk and write to another at the same time!), the file being written becomes corrupt immediately, because nothing gets written anymore. See the screenshot: file one is copied from disk 1, file 2 is copied to disk 2 (in this case old disks that never had any problem), started only a few seconds apart; the system froze until I pressed reset. The SMART values of the three disks involved in this test are attached: disk 1 is the Hitachi, disk 2 the Samsung, and parity the WD Red. As I already said, it does not matter which disks are involved (well, parity obviously always is), and I do not have this problem when I only write without simultaneously reading from another disk - so it cannot be a disk problem, as it only happens when at least two disks are involved with simultaneous writing AND reading. A parity check with 15 disks reading at the same time is no problem and runs flawlessly to the end - so it cannot be a power, disk, or cabling problem either, because then the parity check would not run flawlessly for a whole day. There must be something wrong with the software. Original report here: https://lime-technology.com/forum/index.php?topic=47875.msg460521#new
     UPDATE: For a test I went back to a clean 6.1.9 install - everything works again. I can read and write at the same time, which always led to the freeze on 6.2-beta, and everything works, no hanging at all. I will try updating to 6.2-beta again with a clean install. So 6.1.9 works(!) flawlessly with my setup. It MUST be a software problem.
     UPDATE 2: A fresh, clean 6.2-beta seems to work. So it must have been something in the configuration. I will reconfigure now and try to find out whether I can reproduce it. In any case, I will look for the differences between the new and the old config to find out what caused this issue.
     UPDATE 3: 6.2-beta froze again. Investigating now. I had only changed the time zone so far, but it still does not work after putting it back - perhaps my previous test was too short... I deactivated everything not absolutely necessary, like Docker, VMs, and network bonding/bridging - it still freezes with exactly the same symptoms under the same procedure. And, as usual, nothing to see in syslog.
     UPDATE 4: Back to 6.1.9, which again works flawlessly. I will skip 6.2-beta, and with a bit of bad luck I will have to skip every new release whose changes go deep (like the kernel or the parity write path). It seems I have to stay on 6.1.9 from now on. As I can always reproduce the problem with 6.2 and never with 6.1, it is a software problem in 6.2. I have no idea which problem it could be, as no log entries are created about it. I had only 1 parity disk and no cache drives, 15 active drives in total including parity. smart.zip
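     A side note on the '3ware' error above: smartctl rejects the bare device type because the 3ware form needs a port number, and the query goes to the controller's own device node rather than /dev/sdX. The node and port below are only examples; the right values depend on the actual card:
       # port 0 behind the first 3ware controller (example node /dev/twa0)
       smartctl -a -d 3ware,0 /dev/twa0
       # disks attached directly to the mainboard work with the plain ATA type
       smartctl -a -d ata /dev/sdb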
  21. I updated my post with the following information: UPDATE: There seem to be some misunderstandings. The logfile was created while the system had already been hanging for 4 minutes, so the error happened before the diagnostics were created. That was a requirement for creating an error post in this thread: generate the diagnostics after the error has happened and before the system is rebooted. Indeed, there are NO signs of the error at all in the logfile. I even enabled Samba logging, but it simply stops logging and no error is visible. There is nothing to see in syslog either, yet all disks are unresponsive. I was asked for SMART values. One can see in the logfiles that I set the SMART configuration correctly for my controller card: I can see all SMART values in the WebUI, run tests, and see all raw values. If the diagnostics do not export the values even though SMART is configured correctly for every disk, that is a bug in the diagnostics tool, which does not honor the SMART configuration. Nevertheless, all disks are running and show a green thumbs-up.
  22. Since 6.2 I have a very annoying bug that makes unRAID so unresponsive that I have to reset the computer; no normal shutdown is possible anymore, not at all.
     unRAID OS Version: 6.2.0-beta20
     Description: The filesystem hangs completely. Local file operations are no longer possible; nothing that reads or writes files on the disks works anymore. Shutdown is impossible. The WebGUI stays responsive until you trigger anything that reads or writes a disk, like clicking a file browser icon. Only a hard reset helps. SSH login still works, but any shutdown command that tries to sync or unmount the filesystem hangs immediately.
     How to reproduce: Safe mode, no plugins loaded. Copy a file from one disk to another (different disks, not the same disk!). The problem ONLY occurs when reading and writing happen at the same time; a parity check works fine so far. It is easiest to trigger by copying a file over the network (e.g. SMB): it takes about 10 seconds until the throughput drops to zero and the filesystems hang. So the copy works at first, but after about 10 seconds it stops completely. Copying via SSH from one disk to another worked so far, but as soon as you also access a file on a third disk, the filesystems start to hang after about 10 seconds. So: copy a big file between two disks via SSH while reading another big file from a third disk - it hangs. Or copy over the network between two disks - it hangs. A concrete repro sketch is below.
     Expected results: No hanging at all.
     Actual results: Hangs until hard reset and reboot.
     Other information: Logfiles added. The problem did not exist in 6.1 as far as I remember. Reading one file at a time or writing one file at a time works flawlessly.
     UPDATE: There seem to be some misunderstandings. The logfile was created while the system had already been hanging for 4 minutes, so the error happened before the diagnostics were created. That was a requirement for creating an error post in this thread: generate the diagnostics after the error has happened and before the system is rebooted. Indeed, there are NO signs of the error at all in the logfile. I even enabled Samba logging, but it simply stops logging and no error is visible. There is nothing to see in syslog either, yet all disks are unresponsive. I was asked for SMART values. One can see in the logfiles that I set the SMART configuration correctly for my controller card: I can see all SMART values in the WebUI, run tests, and see all raw values. If the diagnostics do not export the values even though SMART is configured correctly for every disk, that is a bug in the diagnostics tool, which does not honor the SMART configuration. Nevertheless, all disks are running and show a green thumbs-up. More updates here: https://lime-technology.com/forum/index.php?topic=47875.msg460756#msg460756 tower-diagnostics-20160331-2347.zip
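     A concrete version of the reproduction steps above, with hypothetical file names (per the description, any two or three different array disks should do):
       # read a large file from a third disk while copying between two other disks
       cat /mnt/disk3/some-large-file.bin > /dev/null &     # hypothetical file, background read
       cp /mnt/disk1/another-large-file.bin /mnt/disk2/     # hypothetical file, simultaneous write to a different disk
       wait                                                 # on the affected 6.2-beta system this stalls after roughly 10 seconds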
  23. Hi boniel, thanks for your great work - we nearly have it! What works now are all ports, so this is the first time I could see the SMART values of all disks attached to my RAID controller - great! What also works - besides the SMART details - are the self-tests: I can start them and see the results; this worked on all disks (I ran a short test on each). Also great! One very last thing (see the attached screenshot): the "Last SMART test result" field still shows "disk spun down" instead of the last SMART test result, which is shown correctly in the history. (The other details below are cropped but all valid!) So perhaps "disk spun down" should only be shown when the disk really is spun down and not when the state is "unknown" - the same case as with the self-test buttons, which all work now. Apart from this, everything else works perfectly now - a big thanks!!! A really big one.
  24. Wow, okay, up to 128 is quite a lot, but who knows if and when unRAID will support more than 25 disks - so it should be fine to go that high just to be safe. At least when using a RAID controller, the state "unknown" could be accepted as valid. It worked while the disks were plugged into the mainboard, but it does not work with the controller card - so when a controller card is used, "unknown" could be ignored, at least for the self-tests. I guess being a little bit lenient here should not be a problem.
  25. Hi boniel, okay, I guess setting it to up to 31 should solve the problem - I guess no RAID card has more ports than that... By the way: your argument for the user-friendly approach with the port numbers is a good one! Well, the hdparm command might never work, as far as I can see... I get the following output:
     root@Tower:~# hdparm -C /dev/sdb
     /dev/sdb:
      drive state is: unknown
     It makes no difference whether the drive is really active or not... I have disabled spin-down completely because it does not work for me. But it should nevertheless be possible to run the SMART self-tests without hdparm, right?
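     For what it is worth, smartctl can check the power state itself, which might be a workaround where hdparm -C only reports "unknown"; how reliably this works behind a given RAID controller is not something I can promise:
       # skip the query if the drive is in standby instead of waking it up;
       # smartctl then reports the detected power mode and exits with status 2,
       # otherwise it proceeds with the normal SMART query
       smartctl -n standby -i /dev/sdb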