Tezmo

Members
  • Posts: 42
  • Joined
  • Last visited

Everything posted by Tezmo

  1. [quoting the exchange in the posts below] As a follow-up to this, I've established that it does count as a very minor bug - the problem is that the 'Stop Preclear' red X button doesn't clear down the status of a device if you click it in Safari. It works just as intended in Chrome. Just submitting for clarity!
  2. [quoting the exchange in the posts below] I just got some help from gfjardim via PM - my problem was that I was missing the Unassigned Devices plugin; with it I was able to kick off my preclear as I wanted. Mostly I'd just like to give thanks to gfjardim for his help here - there wasn't a bug, rather a misunderstanding on my part, and the support I got was the kind of top notch you'd not dare hope for even if you'd parted with a great deal of money. Really above and beyond the call of duty and I am very thankful. Outstanding.
  3. In reply to "Is it mounted by Unassigned Devices?": this is it in Unassigned Devices: the tooltip over its blue icon shows "New Device", but it looks as though the only available action is to spin it down. Thank you for replying to my questions, gfjardim - it is very much appreciated.
  4. In reply to "Do you mean that you physically move the drive without shutting down the server?": I stop the array, pull the drive, relocate the drive, and restart the array. So when I hover over the red X, the tooltip says "Stop Preclear" (obviously there isn't one ongoing), and clicking it refreshes the page but doesn't actually change anything that I can see - I've still no option to start a preclear. I'm pretty sure by now that I'm missing something obvious, but not sure what it is!
  5. A bug report of sorts to raise - I just popped a new disk in (fresh out of a virgin anti-static bag) and went to use the plugin. I have previously used it to preclear another disk, which (perhaps unrelated) was the same make/model/capacity. The disk shows as preclear having already completed, so there is no option to start it. Similarly, in the main unraid dashboard it shows in Unassigned Devices as "Preclear in progress... Preclear Finished Successfully!" I suspect this is because I have a permanently empty bay which I use for preclearing; when it's complete I move the drive to its 'correct' location. This means (again, speculative) that the new drive has come up with the same sdX as the previous successful preclear, and the plugin thinks it's already been dealt with. If I run the old-style preclear_disk.sh script with -l, it does indeed identify this new one as being in need of a preclear, and this serves as my stopgap method using the old screen approach (see the sketch below). I'm not sure if a reboot would fix this, and it's not a show-stopper by any means, but I thought it couldn't hurt to raise it. Thank you for your ongoing efforts, gfjardim
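     A rough sketch of that stopgap, for anyone else who hits this (assuming the old preclear_disk.sh lives on the flash drive - adjust the path, and double-check which /dev/sdX the new disk really is before clearing anything):
        # list the drives the old script thinks are unassigned and still need clearing
        /boot/preclear_disk.sh -l
        # run the clear inside screen so it survives the telnet/SSH session dropping
        screen -S preclear /boot/preclear_disk.sh /dev/sdX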
  6. I want to use GnuPG, so I've pulled a package from Slackware: gnupg-1.4.20-x86_64-1.txz. When I come to execute gpg, I get the following error:
        root@Stash:~# gpg
        gpg: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory
     Any ideas how I can get libreadline.so.6 installed so that gnupg can run? I see that libreadline.so.5.2 and libreadline.so.5 are baked in, but all variations of gnupg that I can find depend upon 6. All suggestions gratefully received!
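     The route that would normally work here (untested in this case, and the exact readline filename below is an assumption - use whatever the Slackware mirror actually offers) is to install a matching readline 6.x package alongside the existing libraries:
        # confirm which shared libraries gpg cannot resolve
        ldd /usr/bin/gpg | grep 'not found'
        # install a readline 6.x Slackware package (filename is a guess)
        installpkg readline-6.3-x86_64-1.txz
        # verify the loader can now find it
        ldconfig -p | grep libreadline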
  7. [quoting the earlier exchange:]
        "Interesting. That is not a 'normal' private IP range for consumer routers, and I think is one of the ranges used by the internal docker network. I assume you set it up that way on purpose?"
        "...but still a perfectly valid private IP range, and Docker is supposed to select a range that is not in use on the host machine (which in my similar case typically results in 10.0.x.x)."
        "Very true. I was just looking things over and that was one of the details that jumped out at me. My next suggestion is to remove reiserfs from the equation. I had some similar stuff happening (100% CPU shfs, box is unresponsive) and moving to xfs solved the issue."
     It just sits on the LAN at work, and such is the range that was pre-existing. I am actually in the process of an XFS migration right now, in an attempt to knock this on the head. It would be nice to get something concrete, though; I guess if it never happens again after the change to XFS, it'd only qualify anecdotally as a fix. Thank you both!
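     For reference, a quick way to see which subnet Docker has actually claimed for its default bridge (a sketch using the standard docker CLI):
        # print the subnet of Docker's built-in bridge network
        docker network inspect bridge --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'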
  8. Can affirm I have indeed, mate (through the GUI), with no ill effects reported on any devices.
  9. I have now managed to tie this to a particular event: it is triggered by an attempt to delete a file. Now running 6.1.6. Here is the output from syslog:
        Jan 21 16:52:09 Stash kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        Jan 21 16:52:09 Stash kernel: IP: [<ffffffff811538d4>] __discard_prealloc+0x98/0xb3
        Jan 21 16:52:09 Stash kernel: PGD 112bb0067 PUD 12ada1067 PMD 0
        Jan 21 16:52:09 Stash kernel: Oops: 0002 [#1] PREEMPT SMP
        Jan 21 16:52:09 Stash kernel: Modules linked in: md_mod xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat e1000e ptp mvsas ahci libsas i2c_i801 libahci scsi_transport_sas pps_core [last unloaded: md_mod]
        Jan 21 16:52:09 Stash kernel: CPU: 6 PID: 24670 Comm: shfs Not tainted 4.1.13-unRAID #1
        Jan 21 16:52:09 Stash kernel: Hardware name: ASUS All Series/Z87-PLUS, BIOS 2103 08/15/2014
        Jan 21 16:52:09 Stash kernel: task: ffff88041d3cdc20 ti: ffff880002bc8000 task.ti: ffff880002bc8000
        Jan 21 16:52:09 Stash kernel: RIP: 0010:[<ffffffff811538d4>] [<ffffffff811538d4>] __discard_prealloc+0x98/0xb3
        Jan 21 16:52:09 Stash kernel: RSP: 0018:ffff880002bcbb68 EFLAGS: 00010246
        Jan 21 16:52:09 Stash kernel: RAX: ffff8800254d3ca8 RBX: ffff8800254d3c80 RCX: 0000000000000000
        Jan 21 16:52:09 Stash kernel: RDX: 0000000000000000 RSI: ffff8800254d3c80 RDI: ffff880002bcbca8
        Jan 21 16:52:09 Stash kernel: RBP: ffff880002bcbb98 R08: 0000000000000293 R09: 000000000000e3dd
        Jan 21 16:52:09 Stash kernel: R10: ffffffff00001001 R11: ffffffff8116106d R12: ffff880002bcbca8
        Jan 21 16:52:09 Stash kernel: R13: ffff8800254d3d20 R14: ffff880002bcbca8 R15: 0000000000000000
        Jan 21 16:52:09 Stash kernel: FS: 00002b33b5e0e700(0000) GS:ffff88042fb80000(0000) knlGS:0000000000000000
        Jan 21 16:52:09 Stash kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        Jan 21 16:52:09 Stash kernel: CR2: 0000000000000008 CR3: 0000000414396000 CR4: 00000000001406e0
        Jan 21 16:52:09 Stash kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        Jan 21 16:52:09 Stash kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Jan 21 16:52:09 Stash kernel: Stack:
        Jan 21 16:52:09 Stash kernel: ffff8803aa92c000 ffff880002bcbca8 ffffc900077da000 ffffc900077fa1e8
        Jan 21 16:52:09 Stash kernel: ffff880002bcbca8 ffff8800069d3000 ffff880002bcbbc8 ffffffff81153953
        Jan 21 16:52:09 Stash kernel: ffff880002bcbca8 ffff88041d3cdc20 ffffc900077da000 ffffc900077da000
        Jan 21 16:52:09 Stash kernel: Call Trace:
        Jan 21 16:52:09 Stash kernel: [<ffffffff81153953>] reiserfs_discard_all_prealloc+0x44/0x4e
        Jan 21 16:52:09 Stash kernel: [<ffffffff811701cc>] do_journal_end+0x4e7/0xc78
        Jan 21 16:52:09 Stash kernel: [<ffffffff81170ebc>] journal_end+0xae/0xb6
        Jan 21 16:52:09 Stash kernel: [<ffffffff811576f8>] reiserfs_unlink+0x1de/0x23f
        Jan 21 16:52:09 Stash kernel: [<ffffffff811062b2>] vfs_unlink+0xc5/0x165
        Jan 21 16:52:09 Stash kernel: [<ffffffff8110a063>] do_unlinkat+0x107/0x24c
        Jan 21 16:52:09 Stash kernel: [<ffffffff81101863>] ? SyS_newlstat+0x25/0x2e
        Jan 21 16:52:09 Stash kernel: [<ffffffff8110a7b0>] SyS_unlink+0x11/0x13
        Jan 21 16:52:09 Stash kernel: [<ffffffff815f69ae>] system_call_fastpath+0x12/0x71
        Jan 21 16:52:09 Stash kernel: Code: 1c 75 bb 0f 0b 85 c0 74 12 48 8b 93 e8 00 00 00 4c 89 ee 4c 89 e7 e8 be 6e 00 00 48 8b 4b 28 44 89 7b 1c 48 8d 43 28 48 8b 53 30 <48> 89 51 08 48 89 0a 48 89 43 28 48 89 43 30 58 5b 41 5c 41 5d
        Jan 21 16:52:09 Stash kernel: RIP [<ffffffff811538d4>] __discard_prealloc+0x98/0xb3
        Jan 21 16:52:09 Stash kernel: RSP <ffff880002bcbb68>
        Jan 21 16:52:09 Stash kernel: CR2: 0000000000000008
        Jan 21 16:52:09 Stash kernel: ---[ end trace 2ca03a34b3b066f1 ]---
     I've seen various references to this null pointer dereference in other forum posts, all of which point to it being a problem that's been fixed. Does anyone have any great ideas?
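     In the meantime, one low-risk check (a sketch - start the array in maintenance mode first, and /dev/md15 is only an example; substitute the md device of the disk that holds the file being deleted) is a read-only ReiserFS consistency pass:
        # read-only filesystem check of the suspect ReiserFS disk
        reiserfsck --check /dev/md15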
  10. Thanks mate - I am currently migrating to XFS (long process!) to see if this knocks it on the head.
  11. Exactly the same symptom here, with a flat unraid and no dockers or anything running. I've been experiencing this for a while and have slowly stripped back additions, and it still happens. It might take a month, it might take two days, but it always results in 100% shfs, IO lockup, and is always triggered by a write.
  12. That's it! Thank you itimpi! It doesn't solve my broader instability problems, but that's one more thing ruled out! Thank you
  13. I am seeing exactly the same thing; they appear in chunks like so:
  14. But now it's back! And this time I have a syslog. CPU is stuck at 100% as before. I can't access the array when telnetted in - can't run ls, for example. The web interface is unresponsive. All the esoteric stuff in my initial post is now gone - I have a straight-up 21-drive array, plus parity, running nothing more than the latest version of unraid. I believe it packed in at 13:33. Here is the tail end of my syslog:
        Oct 23 12:58:11 Stash avahi-daemon[6433]: Received response from host 172.16.20.60 with invalid source port 57544 on interface 'eth0.0'
        Oct 23 13:33:50 Stash udevd[25024]: timeout 'ata_id --export /dev/sdc'
        Oct 23 13:33:50 Stash udevd[25024]: timeout 'scsi_id --export --whitelisted -d /dev/sdc'
        Oct 23 13:34:43 Stash kernel: sdc: sdc1
        Oct 23 19:10:19 Stash in.telnetd[3163]: connect from 172.16.22.12 (172.16.22.12)
     It looks like these timeouts are to blame? Here is the top of my top output:
        PID   USER  PR  NI  VIRT   RES  SHR  S  %CPU  %MEM  TIME+      COMMAND
        11811 root  20   0  4827m  20m  800  S   100   0.1  434:17.83  shfs
        9852  root  20   0      0    0    0  S     4   0.0   92:18.90  unraidd
     Here is the SMART report on sdc:
        SMART Attributes Data Structure revision number: 16
        Vendor Specific SMART Attributes with Thresholds:
        ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
          1 Raw_Read_Error_Rate     0x002f  200   200   051    Pre-fail Always  -           0
          3 Spin_Up_Time            0x0027  100   253   021    Pre-fail Always  -           0
          4 Start_Stop_Count        0x0032  100   100   000    Old_age  Always  -           2
          5 Reallocated_Sector_Ct   0x0033  200   200   140    Pre-fail Always  -           0
          7 Seek_Error_Rate         0x002e  100   253   000    Old_age  Always  -           0
          9 Power_On_Hours          0x0032  100   100   000    Old_age  Always  -           125
         10 Spin_Retry_Count        0x0032  100   253   000    Old_age  Always  -           0
         11 Calibration_Retry_Count 0x0032  100   253   000    Old_age  Always  -           0
         12 Power_Cycle_Count       0x0032  100   100   000    Old_age  Always  -           2
        192 Power-Off_Retract_Count 0x0032  200   200   000    Old_age  Always  -           0
        193 Load_Cycle_Count        0x0032  200   200   000    Old_age  Always  -           80
        194 Temperature_Celsius     0x0022  122   120   000    Old_age  Always  -           30
        196 Reallocated_Event_Count 0x0032  200   200   000    Old_age  Always  -           0
        197 Current_Pending_Sector  0x0032  200   200   000    Old_age  Always  -           0
        198 Offline_Uncorrectable   0x0030  100   253   000    Old_age  Offline -           0
        199 UDMA_CRC_Error_Count    0x0032  200   200   000    Old_age  Always  -           0
        200 Multi_Zone_Error_Rate   0x0008  100   253   000    Old_age  Offline -           0
     1) Can I bring this thing back to life without rebooting? Physical access is tricky.
     2) Is sdc to blame here? My array is set to never power down drives.
     Thank you all for your thoughts, Adrian.
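     On question 1, one thing that can usually be done over a telnet session without touching the array (a sketch, assuming the kernel's SysRq interface is available) is to dump all blocked tasks to the kernel log, which at least shows what shfs is stuck waiting on:
        # enable all SysRq functions, then log every task in uninterruptible (D) state
        echo 1 > /proc/sys/kernel/sysrq
        echo w > /proc/sysrq-trigger
        # the backtraces land in the syslog
        tail -n 100 /var/log/syslog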
  15. Ultimately put this down to RAM - having swapped it out for something newer (and bigger!) the problem has gone away entirely.
  16. Thank you mate - will strip out all the extraneous bollocks - fwiw just checked the contents of my copy operation and there aren't any binaries in there - just bash scripts. Doing this process now and will update with another stability report.
  17. Would appreciate some general ideas with regard to my problems - because they involve a hard lock I can't retrieve any log files, but would value any suggestions nonetheless. Running unraid 6.0.1 with 21 drives in the array and 1 parity drive - this is an array migrated as a 'clean' install from 5, and the drives are formatted with RFS, various sizes from 2 to 6 TB. I have a further 2TB drive outside the array (ext4 formatted) which I mount in my go script, which looks like this:
        root@Stash:/mnt# cat /boot/config/go
        #!/bin/bash
        # Start the Management Utility
        /usr/local/sbin/emhttp &
        mkdir /mnt/scratch
        mount /dev/disk/by-id/ata-WDC_WD20EARX-00PASB0_WD-WMAZA4999308-part1 /mnt/scratch
        cp /boot/bin/* /usr/bin/
     The last line is to make available a few binaries I use like unrar etc. The server also runs plugins for sabnzbd, sickbeard, and plex server. The symptom is that I am experiencing a full IO lockup of unraid - anecdotally it seems related to copying/moving data from my drive outside the array to my array - and this could be triggered either manually by me in a console, or by one of my plugins. It corresponds to a 100% load on the shfs process, and any attempt to read from the array sits there in perpetuity. I cannot unmount the devices nor stop the array elegantly. The web interface is inaccessible - I suspect it still works fine, it's just that it wants to perform some array IO. The drive outside of the array remains readable and writeable. The network stays up so I can telnet/ssh in but cannot perform any array IO. The diagnostics command just hangs when I run it to generate logs as per the Read me first! instructions. All I can do to recover the situation is an unclean power down. Symptomatically it looks a lot like the ReiserFS kernel panic issue (https://lime-technology.com/forum/index.php?topic=35788.0) but I am told that has been fixed. Any pointers would be very much appreciated!
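     Since the unclean power-down loses the RAM-based logs, one way to get something to survive the reset (a rough sketch - the one-minute interval and the destination folder are arbitrary choices) is to have the go script keep snapshotting the syslog to the flash drive:
        # appended to /boot/config/go: copy the syslog to flash every minute
        mkdir -p /boot/logs
        while true; do
            cp /var/log/syslog /boot/logs/syslog.latest
            sleep 60
        done &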
  18. Good to know Jon, thank you - I was sure this was it because the symptoms sound identical but I will begin gathering log files the next time it happens!
  19. Has there been any movement on a fix for this issue in reiserfs? I'm getting the very same difficulty on a bare metal install of 6.0.1, having upgraded from a perfectly working install of 5, and the prospect of changing filesystem to XFS for 76TB over 22 drives does not sound great. Thank you all!
  20. I have a 21 disk + parity array, everything checks out fine with no errors, no smart problems, running unraid 5.0.5. Suddenly and spontaneously this morning, disk15 has been completely erased. There is nothing in the log that relates to this. Here is a snippet: the wiping happened some time between 10:21 and 13:53 (when I stopped and started the array to try and bring it back). The only other piece of information I can think that would be of note is that this disk more than any was approaching being full - it had about 70GB free at the time of the incident. I can't even rebuild with parity since it seems to unraid that everything is as it should be. Any suggestions would be very gratefully received indeed.
  21. Just another thank you, Gary, as I come back to mark the question as solved. I have calculated hashes for everything on my drive and, given the nature of the data, was able to compare them to known values; it's now a force of habit of mine to routinely check my CRCs just as I check parity every week (see the sketch below). Thank you again for answering my woolly, non-specific question!
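     For anyone wanting to do similar, the kind of thing that works (a sketch - the share name and checksum location are just examples):
        # generate MD5 checksums for every file under a share (example path)
        mkdir -p /boot/checksums
        cd /mnt/user/Media && find . -type f -exec md5sum {} + > /boot/checksums/media.md5
        # later, re-verify; --quiet shows only files that fail to match
        cd /mnt/user/Media && md5sum -c --quiet /boot/checksums/media.md5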
  22. Thank you, Gary - if the errors occurred during a parity check, whilst no shares were set to be exported, would it be safe to say that the errors were likely all read errors? I.e., the parity process never writes back to data drives? I suspect the read errors all occurred at the time the PSU let the drives down - they spontaneously showed as missing - does this sound plausible? Good thinking as regards checksums - I shall make a habit of this going forwards.
  23. After I added two more drives to my unraid 5 RC box, for a total of fifteen 2 & 3 TB drives, I had problems with preclears and parity checks failing to complete, followed by errors on four disks and a red dot on one (DISK_INVALID), which I rebuilt when I'd established I had 30A worth of drives on one of four 18A rails; i.e. all my problems related to power. Now that everything appears fixed, and my parity has been tested and passed, can I be 100% sure about the integrity of the data that's on there? I suppose my question is quite general really, in so far as I'm unsure of what's happening under the hood. How does unraid decide that what's on disk is correct, and if it says everything's okay, can I believe it? Thank you all!
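     For context on the 'under the hood' part: with single parity, as far as I understand it, the parity disk simply stores the XOR of all the data disks at each position, so any one missing disk can be recomputed from parity plus the survivors. A toy one-byte illustration (values invented):
        d1=0xA5; d2=0x3C; d3=0x5F                  # bytes at the same offset on three data disks
        parity=$(( d1 ^ d2 ^ d3 ))                 # what the parity disk stores at that offset
        printf 'parity byte: 0x%02X\n' "$parity"
        printf 'rebuilt d2:  0x%02X\n' $(( parity ^ d1 ^ d3 ))   # a lost disk's byte comes back via XOR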
  24. Thank you Joe; I shall perform a pre-clear and assign it to the array upon completion! Very helpful and much appreciated