
Posts posted by cpetro45

  1. 6 hours ago, JorgeB said:

    Because there wasn't one; all checks were non-correcting.

    Hey Jorge,

     

    Thanks.  The last full check shown below (16 hr, 20 min, 57 sec) did have correction enabled.  I was expecting something to show up in the log like "parity check started" and "parity check completed."  So you're saying there is basically no log entry identifying this, and things would only show up in the log if errors were found and corrected (or not corrected, as the previous run shows in the log)?

     

    Thanks,

    Chris

    Screenshot 2022-06-15 153533.png

  2. 3 hours ago, JorgeB said:

    If it was a bit flip it might be hard to catch any issues with memtest, but it's still worth running a couple of passes; if no errors are found, see how the next parity checks go.

    Alright, I'll give it a go, thanks.  I didn't see anything related to the full parity check with correction in the syslog... did I need to have debug turned on or something?


    Thanks
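
    Follow-up for anyone searching later: the check events do land in the syslog as kernel mdcmd / recovery-thread lines, so a plain grep pulls them out.  A sketch, demonstrated against a small sample instead of the real /var/log/syslog:

    ```shell
    # Filter the parity-check related kernel lines (mdcmd commands and
    # md recovery-thread status) out of a syslog.  On a live box:
    #   grep -E 'mdcmd \([0-9]+\)|md: recovery thread' /var/log/syslog
    # Demonstrated against a small sample so it can run anywhere:
    printf '%s\n' \
      'Jun 13 06:05:27 Tower kernel: mdcmd (42): check' \
      'Jun 13 06:08:57 Tower nmbd[6513]: *****' \
      'Jun 13 07:06:23 Tower kernel: md: recovery thread: PQ incorrect, sector=807021208' \
      > /tmp/sample_syslog
    grep -E 'mdcmd \([0-9]+\)|md: recovery thread' /tmp/sample_syslog
    ```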

  3. 1 hour ago, JorgeB said:

    Log snippet doesn't show the full correcting check.  Assuming it did run until the end without finding any errors, it suggests the previous error found was unrelated to the unclean shutdown, and was possibly something like a RAM bit flip.  Unclean-shutdown related errors, when they exist, mostly occur at the beginning of the disks, which is where the metadata is stored, assuming an XFS filesystem for the array.

    Thanks so much Jorge, let me get the whole log.  All disks are XFS.

     

    I attached the whole diagnostic stack.  I appreciate your reply and help. 

     

    Let me know if I can provide any more information.  I also provided a screenshot of the times for the 1-sync-error run vs. the clean run. 

     

     

    Screenshot 2022-06-15 153533.png

    tower-diagnostics-20220615-1543.zip

  4. Hey there everyone,


    I'm just wondering how / why this would occur.  8-disk array, dual parity, no cache.  There was an unclean shutdown due to power issues (I have a UPS, but don't have the automation set up to shut down yet).  There were no writes going on and the disks were spun down. 


    The next day, I ran a parity check with correction disabled and it found 1 error about 60% of the way through.  I read these forums and decided to cancel and restart the parity check with correction enabled. 

     

    The parity check ran with no errors detected. 


    What could be the reason this happened?  Should I be concerned?

     

    Here are the related lines from the log (suspect line: Jun 13 07:06:23 Tower kernel: md: recovery thread: PQ incorrect, sector=807021208):

    Jun 13 06:05:27 Tower kernel: mdcmd (42): check 
    Jun 13 06:05:27 Tower kernel: md: recovery thread: check P Q ...
    Jun 13 06:05:50 Tower kernel: mdcmd (43): nocheck Cancel
    Jun 13 06:05:50 Tower kernel: md: recovery thread: exit status: -4
    Jun 13 06:05:55 Tower kernel: mdcmd (44): check nocorrect
    Jun 13 06:05:55 Tower kernel: md: recovery thread: check P Q ...
    Jun 13 06:08:57 Tower nmbd[6513]: [2022/06/13 06:08:57.865107,  0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
    Jun 13 06:08:57 Tower nmbd[6513]:   *****
    Jun 13 06:08:57 Tower nmbd[6513]:   
    Jun 13 06:08:57 Tower nmbd[6513]:   Samba name server TOWER is now a local master browser for workgroup HOME on subnet 192.168.122.1
    Jun 13 06:08:57 Tower nmbd[6513]:   
    Jun 13 06:08:57 Tower nmbd[6513]:   *****
    Jun 13 06:08:57 Tower nmbd[6513]: [2022/06/13 06:08:57.865228,  0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
    Jun 13 06:08:57 Tower nmbd[6513]:   *****
    Jun 13 06:08:57 Tower nmbd[6513]:   
    Jun 13 06:08:57 Tower nmbd[6513]:   Samba name server TOWER is now a local master browser for workgroup HOME on subnet 172.17.0.1
    Jun 13 06:08:57 Tower nmbd[6513]:   
    Jun 13 06:08:57 Tower nmbd[6513]:   *****
    Jun 13 07:06:23 Tower kernel: md: recovery thread: PQ incorrect, sector=807021208
    Jun 13 13:03:31 Tower kernel: mdcmd (45): nocheck Cancel
    Jun 13 13:03:31 Tower kernel: md: recovery thread: exit status: -4
    Jun 13 13:03:40 Tower kernel: mdcmd (46): check 
    Jun 13 13:03:40 Tower kernel: md: recovery thread: check P Q ...
    Jun 13 13:03:58 Tower kernel: mdcmd (47): nocheck Cancel
    Jun 13 13:03:58 Tower kernel: md: recovery thread: exit status: -4
    Jun 13 13:04:02 Tower kernel: mdcmd (48): check nocorrect
    Jun 13 13:04:02 Tower kernel: md: recovery thread: check P Q ...
    Jun 13 13:04:10 Tower kernel: mdcmd (49): nocheck Cancel
    Jun 13 13:04:10 Tower kernel: md: recovery thread: exit status: -4
    Jun 13 13:04:14 Tower kernel: mdcmd (50): check 
    Jun 13 13:04:14 Tower kernel: md: recovery thread: check P Q ...
    Jun 13 13:05:29 Tower kernel: mdcmd (51): nocheck Cancel
    Jun 13 13:05:29 Tower kernel: md: recovery thread: exit status: -4
    Jun 13 13:12:29 Tower kernel: mdcmd (52): check 
    Jun 13 13:12:29 Tower kernel: md: recovery thread: check P Q ...

     

    Any thoughts much appreciated. 


    Thanks

    I've been running Unraid for 10 years.  I have a 6-disk array that at one time was 9 or 10.  All has been well.  As you can imagine, there have been drive failures here and there over 10 years.  I updated disks and consolidated a few years ago.  I have an 8TB parity disk and 5 data disks of 2 - 4 TB each.  I never updated the hardware.  This thing has been running on a Core 2 Duo E6600 or something with 2 GB of RAM... an ancient Gigabyte board and an 8800GTX, still no issues besides regular disk maintenance... until now.

     

    I logged in to the WebUI the other day and saw a disabled disk... I was pretty annoyed, as nothing had changed and I don't even write to this that much.  I didn't grab the logs.  The SMART report on the disabled disk showed a UDMA CRC error count.  I read all about that all over this forum.  In the meantime, I stopped the array and shut down to replace the SATA cables.  I ordered new SATA cables from Amazon and replaced all of them (they were old).  I didn't care about the SATA plug / disk order because with Unraid 6 there's no need to worry.  I was concerned enough that I hooked up a monitor as well to check the boot.  Well, I thought it got stuck loading bzroot since it was taking forever (more than 3 - 4 min).  I got sidetracked and ended up replacing the USB drive (14 years old), thinking that was bad (it wasn't).  I had to use the HP media preparation tool since this MOBO is really picky.  Did all that BS (figuring out again that it wouldn't boot unless specially formatted with a 15-year-old tool) and finally got it booting from the new USB thumb drive.  BTW, it was taking like 10 minutes to boot (first hint, since it never usually took that long)... remember, this thing was still booting up; I just didn't think it was booting and thought the USB drive was bad.  I used the new Unraid 6 USB drive prep first, obviously, but it definitely needed that HP tool (thanks to this forum... again... I was able to find that info again and re-download it).

     

     

    Next, the forums say: oh, just check the file system of the disabled disk, it's probably that.  I started the array in maintenance mode (after a 10-minute boot or so) and ran the file system check; nothing really came up with the -n option.  I figured, what the hell, run it without -n.  Next thing you know, I got a UDMA CRC error count warning on the PARITY disk during the file system check of the disabled disk.  I'm like, oh man, parity had better not get disabled.  There were, I think, 21 writes in the parity disk column and 20 errors.  I started getting really nervous.  Somehow the retries must have been successful (I have that syslog).  I cancelled the file system check on the disabled disk.  I had screenshots of the config from before and after I moved the disks around.  It wasn't the same SATA port that threw the UDMA CRC error count on the data disk and the parity disk.  At this point, I knew something on the board was toast. 

     

    Luckily, I have another system (literally only 2 years newer) but with 8GB of RAM and a Core 2 Xtreme 3000 quad core (a $1200 processor back in its day, bought on eBay for like $250 or something 8 years ago).  Anyway, a huge upgrade compared to the 2GB and Core 2 Duo, LOL.  We're talking DDR2 here. 

     

    So, conveniently enough, the "new" computer had a removable MOBO tray, so right now I'm running the array off the new board, rebuilding the failed disk on top of itself.  Something went really bad in the old CPU.  Added some pics... hope this was a fun blast from the past for some.  Also: BACK UP YOUR DATA, because you never know...

     

    Ultimately, I do feel like a few bits may be off (I hope not).  I'm definitely going to back up all the rest of my important items once the array is back up completely. 

     

     

    20220225_220627_copy_2774x3699.jpg

    20220225_220704_copy_3699x2774.jpg

  6. 6 hours ago, quincyg said:

    Preclear post-read failed.  Any suggestions here?  I got a failure at the end of preclear.  This drive is connected via an onboard SATA connector that I don't normally utilize.  The WD Red drive is running pretty hot as well, 47-50C.  Log attached.

    preclear_disk_log.txt

    Jul 31 09:43:51 preclear_disk_VDHX6KUK_30153: Command: /usr/local/emhttp/plugins/preclear.disk/script/preclear_disk.sh --notify 3 --frequency 1 --cycles 1 --no-prompt /dev/sdb
    Jul 31 09:43:51 preclear_disk_VDHX6KUK_30153: Preclear Disk Version: 1.0.16
    Jul 31 09:43:52 preclear_disk_VDHX6KUK_30153: S.M.A.R.T. info type: default
    Jul 31 09:43:52 preclear_disk_VDHX6KUK_30153: S.M.A.R.T. attrs type: default
    Jul 31 09:43:52 preclear_disk_VDHX6KUK_30153: Disk size: 8001563222016
    Jul 31 09:43:52 preclear_disk_VDHX6KUK_30153: Disk blocks: 1953506646
    Jul 31 09:43:52 preclear_disk_VDHX6KUK_30153: Blocks (512 bytes): 15628053168
    Jul 31 09:43:52 preclear_disk_VDHX6KUK_30153: Block size: 4096
    Jul 31 09:43:52 preclear_disk_VDHX6KUK_30153: Start sector: 0
    Jul 31 09:43:57 preclear_disk_VDHX6KUK_30153: Pre-Read: dd if=/dev/sdb of=/dev/null bs=2097152 skip=0 count=8001563222016 conv=notrunc,noerror iflag=nocache,count_bytes,skip_bytes
    Jul 31 10:53:15 preclear_disk_VDHX6KUK_30153: Pre-Read: progress - 10% read @ 188 MB/s
    Jul 31 12:03:47 preclear_disk_VDHX6KUK_30153: Pre-Read: progress - 20% read @ 181 MB/s
    Jul 31 13:16:54 preclear_disk_VDHX6KUK_30153: Pre-Read: progress - 30% read @ 174 MB/s
    Jul 31 14:33:20 preclear_disk_VDHX6KUK_30153: Pre-Read: progress - 40% read @ 167 MB/s
    Jul 31 15:53:54 preclear_disk_VDHX6KUK_30153: Pre-Read: progress - 50% read @ 161 MB/s
    Jul 31 17:19:43 preclear_disk_VDHX6KUK_30153: Pre-Read: progress - 60% read @ 146 MB/s
    Jul 31 18:52:27 preclear_disk_VDHX6KUK_30153: Pre-Read: progress - 70% read @ 138 MB/s
    Jul 31 20:34:15 preclear_disk_VDHX6KUK_30153: Pre-Read: progress - 80% read @ 121 MB/s
    Jul 31 22:29:41 preclear_disk_VDHX6KUK_30153: Pre-Read: progress - 90% read @ 104 MB/s
    Aug 01 00:45:46 preclear_disk_VDHX6KUK_30153: Pre-Read: dd - read 8001565319168 of 8001563222016.
    Aug 01 00:45:47 preclear_disk_VDHX6KUK_30153: Pre-Read: elapsed time - 15:01:47
    Aug 01 00:45:47 preclear_disk_VDHX6KUK_30153: Pre-Read: dd exit code - 0
    Aug 01 00:45:51 preclear_disk_VDHX6KUK_30153: Zeroing: emptying the MBR.
    Aug 01 00:45:51 preclear_disk_VDHX6KUK_30153: Zeroing: dd if=/dev/zero of=/dev/sdb bs=2097152 seek=2097152 count=8001561124864 conv=notrunc iflag=count_bytes,nocache,fullblock oflag=seek_bytes
    Aug 01 00:45:51 preclear_disk_VDHX6KUK_30153: Zeroing: dd pid [3609]
    Aug 01 02:49:23 preclear_disk_VDHX6KUK_30153: Zeroing: progress - 10% zeroed @ 111 MB/s
    Aug 01 04:35:33 preclear_disk_VDHX6KUK_30153: Zeroing: progress - 20% zeroed @ 146 MB/s
    Aug 01 06:13:56 preclear_disk_VDHX6KUK_30153: Zeroing: progress - 30% zeroed @ 127 MB/s
    Aug 01 07:57:00 preclear_disk_VDHX6KUK_30153: Zeroing: progress - 40% zeroed @ 113 MB/s
    Aug 01 09:56:06 preclear_disk_VDHX6KUK_30153: Zeroing: progress - 50% zeroed @ 110 MB/s
    Aug 01 11:51:07 preclear_disk_VDHX6KUK_30153: Zeroing: progress - 60% zeroed @ 117 MB/s
    Aug 01 13:41:27 preclear_disk_VDHX6KUK_30153: Zeroing: progress - 70% zeroed @ 130 MB/s
    Aug 01 15:19:57 preclear_disk_VDHX6KUK_30153: Zeroing: progress - 80% zeroed @ 136 MB/s
    Aug 01 17:07:26 preclear_disk_VDHX6KUK_30153: Zeroing: progress - 90% zeroed @ 113 MB/s
    Aug 01 19:13:15 preclear_disk_VDHX6KUK_30153: Zeroing: dd - wrote 8001563222016 of 8001563222016.
    Aug 01 19:13:16 preclear_disk_VDHX6KUK_30153: Zeroing: elapsed time - 18:27:23
    Aug 01 19:13:17 preclear_disk_VDHX6KUK_30153: Zeroing: dd exit code - 0
    Aug 01 19:13:18 preclear_disk_VDHX6KUK_30153: Writing signature:    0   0   2   0   0 255 255 255   1   0   0   0 255 255 255 255
    Aug 01 19:13:22 preclear_disk_VDHX6KUK_30153: Post-Read: verifying the beggining of the disk.
    Aug 01 19:13:22 preclear_disk_VDHX6KUK_30153: Post-Read: cmp /tmp/.preclear/sdb/fifo /dev/zero
    Aug 01 19:13:22 preclear_disk_VDHX6KUK_30153: Post-Read: dd if=/dev/sdb of=/tmp/.preclear/sdb/fifo count=2096640 skip=512 conv=notrunc iflag=nocache,count_bytes,skip_bytes
    Aug 01 19:13:23 preclear_disk_VDHX6KUK_30153: Post-Read: verifying the rest of the disk.
    Aug 01 19:13:23 preclear_disk_VDHX6KUK_30153: Post-Read: cmp /tmp/.preclear/sdb/fifo /dev/zero
    Aug 01 19:13:23 preclear_disk_VDHX6KUK_30153: Post-Read: dd if=/dev/sdb of=/tmp/.preclear/sdb/fifo bs=2097152 skip=2097152 count=8001561124864 conv=notrunc iflag=nocache,count_bytes,skip_bytes
    Aug 01 20:34:20 preclear_disk_VDHX6KUK_30153: Post-Read: progress - 10% verified @ 169 MB/s
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd - read 1527356166144 of 8001563222016.
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: elapsed time - 2:33:21
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd command failed, exit code [1].
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 1518178664448 bytes (1.5 TB, 1.4 TiB) copied, 9128.02 s, 166 MB/s
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 725089+0 records in
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 725088+0 records out
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 1520619749376 bytes (1.5 TB, 1.4 TiB) copied, 9142.07 s, 166 MB/s
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 726061+0 records in
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 726060+0 records out
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 1522658181120 bytes (1.5 TB, 1.4 TiB) copied, 9154.33 s, 166 MB/s
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 727177+0 records in
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 727176+0 records out
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 1524998602752 bytes (1.5 TB, 1.4 TiB) copied, 9168.43 s, 166 MB/s
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 728256+0 records in
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 728255+0 records out
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 1527261429760 bytes (1.5 TB, 1.4 TiB) copied, 9182.28 s, 166 MB/s
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: dd: error reading '/dev/sdb': Input/output error
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 728299+1 records in
    Aug 01 21:46:45 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 728299+1 records out
    Aug 01 21:46:46 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 1527354068992 bytes (1.5 TB, 1.4 TiB) copied, 9198.16 s, 166 MB/s
    Aug 01 21:46:46 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 728299+1 records in
    Aug 01 21:46:46 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 728299+1 records out
    Aug 01 21:46:46 preclear_disk_VDHX6KUK_30153: Post-Read: dd output: 1527354068992 bytes (1.5 TB, 1.4 TiB) copied, 9198.16 s, 166 MB/s
    Aug 01 21:46:48 preclear_disk_VDHX6KUK_30153: ssmtp: Authorization failed (535 5.7.0 (#AUTH005) Too many bad auth attempts.)
    Aug 01 21:46:49 preclear_disk_VDHX6KUK_30153: S.M.A.R.T.: 5    Reallocated_Sector_Ct    6
    Aug 01 21:46:49 preclear_disk_VDHX6KUK_30153: S.M.A.R.T.: 9    Power_On_Hours           77
    Aug 01 21:46:49 preclear_disk_VDHX6KUK_30153: S.M.A.R.T.: 194  Temperature_Celsius      50
    Aug 01 21:46:49 preclear_disk_VDHX6KUK_30153: S.M.A.R.T.: 196  Reallocated_Event_Count  6
    Aug 01 21:46:49 preclear_disk_VDHX6KUK_30153: S.M.A.R.T.: 197  Current_Pending_Sector   16
    Aug 01 21:46:49 preclear_disk_VDHX6KUK_30153: S.M.A.R.T.: 198  Offline_Uncorrectable    0
    Aug 01 21:46:49 preclear_disk_VDHX6KUK_30153: S.M.A.R.T.: 199  UDMA_CRC_Error_Count     0
    Aug 01 21:46:49 preclear_disk_VDHX6KUK_30153: error encountered, exiting...
     

    Warranty that thing: reallocated sectors.

    Just an update on my issues (see back a few posts) with this plugin and 6.3.5.  On my 2nd 4TB drive, I've got the array stopped and I'm using the original preclear_disk.sh script (patched for the latest version of Unraid; see the script thread) without issue.  The server hasn't locked up at all like last time when using this plugin.  Using screen is working well.  I'm not saying I don't like the idea of this plugin; it just wasn't working for me.

     

    Thanks

  8. On 12/2/2017 at 5:41 PM, Frank1940 said:

     

    If it locks up again, you could install the 'Tips and Tweaks' plugin.  Then go to the Tweaks page and set the Disk Cache 'vm.dirty_background_ratio' (%) parameter to 1 and the Disk Cache 'vm.dirty_ratio' (%) parameter to 2.  This will free up a big block of memory without any observable effect on performance.  You can read a bit more about these parameters by clicking on the Help function.

     

    Hey Frank, thanks.  I implemented this setting and it seems like maybe it did something.  However, it looks like the preclear job is mostly locking up in the pre-read phase, pegging 1 CPU at 100% with the timer stopping.  I've nursed it through 2 cycles and it's working on the 3rd, but I think for my 2nd HD I may spend the time to figure out how to do it via SSH.  I'm not going to spend too much time, as I believe my system resources are not constrained and the script is just bombing out for whatever reason.  I do appreciate the help and input!

     

    Chris
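
    For reference, the same two values Frank suggested can also be set straight from the console with sysctl; a sketch (needs root, takes effect immediately, and is lost on reboot unless added to the go script):

    ```shell
    # Shrink the dirty page-cache thresholds, per Frank1940's suggestion.
    # Takes effect immediately; lost on reboot unless re-applied from
    # the go script.
    sysctl -w vm.dirty_background_ratio=1
    sysctl -w vm.dirty_ratio=2
    # Confirm the values took:
    sysctl vm.dirty_background_ratio vm.dirty_ratio
    ```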

  9. 1 hour ago, Frank1940 said:

     

    The E6600 should be up to the task.  How much memory do you have installed?  It takes between 2GB and 4GB of RAM with ver-6.3.5 to have a system that is not RAM constrained if you are only running the basic NAS function, the 'usual plugins', and perhaps a Docker or two.  The preclear function does require a fair amount of RAM, so that might be an issue.

     

    Thanks, yeah, it's got 2GB of RAM, a little light for sure.  It didn't seem like RAM usage was pegged though, about 50%, versus higher CPU usage.  It's pre-reading now; it locked up earlier (no IP, no GUI) but now it seems like it's moving again... after I reset it.  We shall see...  Thanks 

  10. Is there a minimum CPU system requirement to get this script to run successfully all the way through?

     

    I've never used it before, and I have a pretty old Core 2 Duo E6600 2.4GHz.  Anyway, I've got two new 4TB disks hooked up to a PCIe v1 x1 SATA controller.  I tried both at the same time, and realized that was a big mistake as it was saturating the PCIe bus.  Now I'm trying 1 at a time, and I had preclear stop responding during the post-read with the CPUs pegged at 100%.  The WebGUI was still responding.  I uninstalled the plugin, rebooted, reinstalled it, rebooted again, and am attempting preclear once more with 1 drive and the array stopped.  Hopefully it makes it all the way through this time; I'll definitely post back.  This is with the latest version 2017.11.14 and Unraid 6.3.5.  I'm not really using any of the new features like Docker containers yet.  Any thoughts?  If this doesn't work, I understand I should try it manually without the plugin.  Also, I could get the disks off this crappy PCIe v1 x1 controller (which I plan to do after I remove some other disks; the MOBO has 8 SATA ports), but I figured I'd be OK with the preclear.  


    Thanks !

     

    Chris
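
    In case it helps anyone else weighing the plugin vs. the script: running it manually under screen looks roughly like this.  A sketch only; the /boot path and /dev/sdX are placeholders for wherever you copied the script and whichever disk you're clearing:

    ```shell
    # Start a named screen session so the preclear survives a dropped
    # telnet/SSH connection, then launch the script against the target
    # disk.  /dev/sdX is a placeholder -- verify the device first
    # (e.g. with fdisk -l) before writing to it!
    screen -S preclear
    /boot/preclear_disk.sh /dev/sdX
    # Detach with Ctrl-A d; reattach later with:
    #   screen -r preclear
    ```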

  11. Hi All,

     

    I've been running unraid 4.7 for a few years and have now upgraded to 5.0.4.  All is good, but I've recently installed some plugins (simplefeatures and unmenu).  I had a few drive problems, but all are resolved. 

     

    I'm running a parity check; with 4.7 I usually would get 60-65MB/s, which is definitely normal and good.  Now, with 5.0.4 and simplefeatures, here is the behavior: 

     

    Parity Check Running:

    Accessing the regular simplefeatures UI (not unmenu): 

    The parity check speed is reported at 19-21MB/s consistently.  The disks in the array seem like they are "churning" (making a bit of noise), which I thought was normal.  However, I know this isn't right compared to previous performance, and I've been messing around with it.  If I access the unmenu interface instead of the simplefeatures UI, the speed hops up to 60-65MB/s and the "churning" noise goes away.

     

    Has anyone ever heard of that?  I don't know if it has something to do with my flash drive read speed, which tested at: Timing buffered disk reads: 60 MB in 3.06 seconds = 19.61 MB/sec.  This is pretty similar to the speed I get when accessing the simplefeatures UI.  I wonder if somehow the simplefeatures UI is causing the parity check to be throttled by the most limiting factor, the thumb drive speed?  I don't know; I'm confused.  All other drives are at 80-135MB/s read. 
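
    (For anyone wanting to reproduce the comparison: the "Timing buffered disk reads" figure is hdparm's read benchmark.  Device names below are examples, not necessarily yours:)

    ```shell
    # hdparm's buffered read benchmark -- the same test that produced
    # the 'Timing buffered disk reads' line above.  Device names are
    # examples; check yours with fdisk -l first.
    hdparm -t /dev/sda   # flash drive
    hdparm -t /dev/sdb   # an array disk, for comparison
    ```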

     

    I attached the latest system log.  It's definitely something with simplefeatures.  Like I said: access the simplefeatures UI, and the parity check speed is choked to 20MB/s.  Leave the simplefeatures UI and check unmenu, and it hops back up to 65+MB/s.

     

    Perhaps I'll just uninstall simplefeatures, as it seems unmenu is more than sufficient. 

     

    Any thoughts?

     

    Thanks,

    Chris

     

     

     

     

     

     

    syslog-2014-01-02.txt

  12. When power is lost, data not yet flushed to the disks is lost.

     

    The file system on the data disks is a journaled file system.  It is able to deal with the loss by replaying the journal.

     

    The parity disk has no such mechanism to recover.  As you said, it is most likely the one in error.

     

    Joe, what I'm hearing from you is that you think I'm OK since there weren't any read errors on the disks?  Could I have created this when I manually unmounted disks while the array was potentially already doing a parity check (the first reboot after the power loss)?  In order to prevent an automatic parity re-check after a power loss, I'd basically have to set the array not to start on power up.  I'm not sure I want to do that, but is there a way to not have it do an error-correcting parity check automatically? 

     

    Thanks again for your thoughts. 

     

    Chris

  13. It won't be at exactly the same point in the process, but you can add "beep;beep;beep"  at the end of your GO script and you'll hear 3 beeps when it executes  :)

     

    FWIW, I definitely consider a UPS a mandatory accessory for ALL of my computers ... especially my UnRAID servers  :)

     

    Thanks again.  I started looking at UPSes today.  It's been on the list for a while; they aren't as expensive as I thought.  I'm looking at the APC Back-UPS Pro 1500. 
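
    For my own notes, the beep trick is just appended to the go script on the flash drive; a sketch, assuming the stock /boot/config/go location and that the beep utility is present:

    ```shell
    # Tail end of /boot/config/go -- three beeps from the PC speaker
    # once the go script finishes starting everything up (assumes the
    # 'beep' utility is available, as suggested above).
    beep; beep; beep
    ```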

  14. I use the md5deep program that I install from unMenu to generate my checksums.

     

    Thank you Bob, I will look at that. 

     

    The best way to confirm that all is okay is to compare the data on UnRAID to your backups.

     

    If you don't have backups, then there's really no way to know for sure.  I can say that the few times I've had sync errors in the last 5-6 years that I've used UnRAID, they have ALL been legitimate sync errors on the parity disk, and never an error in the actual data.  [I've run a complete compare against my backups every time that happened.]

     

    If you don't have a backup strategy, and/or don't think your data is important enough to bother with backups, then you can at least create checksums so you'll have a way to identify any files that have been corrupted.  I do this in addition to backups these days.  I use Corz's excellent checksum utility: http://corz.org/windows/software/checksum/    You simply install it on a Windows box, point to a share on the server, right-click, and select "Create checksums".  This will, of course, run for a long time, but when it's done you'll have checksums for every file on that share.  Repeat for each share (or do it for each disk, whichever you prefer).  Then in the future, you can easily check whether any files have changed by just right-clicking and selecting "Verify checksums".

     

    Gary, thank you as well.  I've seen your other posts about this.  I feel a little bit better.  I don't have backups of EVERYTHING on the array, but perhaps for the stuff I do have backups of, I'll do a checksum compare to be sure. 

     

    I really appreciate all this.  I definitely need a better strategy, as power outages on sunny days do happen occasionally, and so does "operator error".  One thing that I liked about 4.7 vs. 5.0.4 is that when the array came up successfully, I'd get 2 very quick beeps from the PC speaker.  I'd say to myself: great, let me log in to the WebGUI.  With 5.0.4 it seems this is gone, so I really have no way to tell when the WebGUI is accessible besides hitting the IP a few times. 

     

    Thanks again,

     

    Chris
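
    Following up for the archives: the create-then-verify cycle Gary describes can also be done from the console with plain md5sum (no Windows box needed).  A sketch, using a throwaway /tmp directory in place of a real share path like /mnt/user/Media:

    ```shell
    # Create a checksum file for every file under a directory, then
    # verify it later -- the same create/verify cycle as md5deep or
    # Corz checksum, using stock md5sum.  /tmp/demo_share stands in
    # for a real share such as /mnt/user/Media (example path).
    DIR=/tmp/demo_share
    mkdir -p "$DIR"
    printf 'some data\n' > "$DIR/example.txt"
    cd "$DIR"
    find . -type f ! -name checksums.md5 -exec md5sum {} + > checksums.md5
    # Re-run this later from the same directory to catch changed or
    # corrupted files:
    md5sum -c checksums.md5
    ```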

  15. Unfortunately there is no way to identify the files that are affected. 

     

    This is why many believe in creating CRC checksums of all the files on the disks to protect against data 'bit rot'.    You can then periodically repeat this to see if any file checksums have changed.

     

    Thanks for the reply.  Interesting... not what I was hoping for, but I had a feeling.  In your opinion, should I be concerned?  I mean, 60 sectors not matching the parity disk 2 days after a parity check that was perfect concerns me.  Nothing was being written to the disks at either time (the power outage or the hard reset while it was likely doing the parity check).

     

    What is the best tool to map out CRC checksums, in your opinion?  I have about 4TB of data; I imagine this mapping would take a long time and lots of CPU cycles?

     

    Anyone else have any thoughts?

     

    Thanks again.  I feel like I will always have a doubt now; I need to get a UPS urgently. 

     

    Chris

  16. Hi All,

     

    This is my 2nd topic in the last week, as the unRAID box has been through some issues (the box is now stable, running 5.0.4, no bad disks, good parity as of 2 days ago). 

     

    I ran a parity check 2 days ago with no errors.  Yesterday I came home to no power for whatever reason (I don't have a UPS yet, but it was a clear day).  Power came back on; I started up my machines and the unRAID box.  After about 5 minutes, the WebGUI wasn't coming up, but I could telnet into the box.  It was showing HD activity.  I hastily tried to do a manual powerdown in telnet.  It wasn't successful, and ultimately I unmounted a few disks, but it said some weren't mounted, etc.  Anyway, a hard power-off had to occur; I believe the box had come up and initiated a parity check (I don't know what happened on first boot; likely everything was fine and I just had to wait a bit longer for the WebGUI).

     

    The parity check completed with 60 sync errors.  I don't see any read errors in the log, but the sectors that had the errors are spread out.  I was wondering if there was any way I could check to see what data was affected and "test" it to see if it's corrupt.  Should I be worried about this? 

     

    Here is a truncated piece of the log:

    Dec 19 19:28:37 PetroNAS kernel: md: correcting parity, sector=87817552

    Dec 19 19:37:47 PetroNAS kernel: md: correcting parity, sector=159118360

    Dec 19 20:29:32 PetroNAS kernel: md: correcting parity, sector=565978800

    Dec 19 20:33:52 PetroNAS kernel: md: correcting parity, sector=598843768

    Dec 19 21:05:12 PetroNAS kernel: md: correcting parity, sector=812384432

    Dec 19 21:42:43 PetroNAS kernel: md: correcting parity, sector=1045692616

    Dec 19 21:42:43 PetroNAS kernel: md: correcting parity, sector=1045692808

    Dec 19 21:42:46 PetroNAS kernel: md: correcting parity, sector=1046099800

    Dec 19 21:42:47 PetroNAS kernel: md: correcting parity, sector=1046295368

     

    As you can see, it's spread out all over.  Do I need to look at the reiserfsck command from this thread: http://lime-technology.com/forum/index.php?topic=3309.0 (which, btw, has a great explanation of parity)?  Based on that thread, it seems like I shouldn't be worried since I didn't have any read errors, but I just want some expert opinions, and is there some way I can check what data is across these sectors?
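
    From what I gather in that thread, the read-only check would look like this.  A sketch only: run it with the array started in maintenance mode, and against the md device rather than the raw sdX disk so parity stays valid (/dev/md1 is an example, corresponding to disk1):

    ```shell
    # Read-only reiserfs check of disk1, array in maintenance mode.
    # Use the md device (not /dev/sdX) so parity is not invalidated;
    # /dev/md1 here is an example for disk1.
    reiserfsck --check /dev/md1
    ```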

     

    Thank you all for your help and time. 

     

    Chris

     

     

    syslog_parity.txt

  17. Jonathan,

     

    I've attached the system log.  I really appreciate you taking the time (whenever you have a minute) to take a look and see if you see anything funny. 

     

    The webui was not accessible at this point (it doesn't take long). 

     

    I should also note that if I go into the share settings under SMB and hit Apply, then the WebUI no longer responds at all. 

     

    Thanks,

    Chris

     

    syslog_2.txt

  18. Hey Jonathanm,

    Thanks for the response.

     

    I've confirmed it's likely not the client.

     

    I have 2 different PCs on the same network.  When it stops responding on the 1st PC, it won't load on the 2nd PC.  It seems like the service running the WebUI completely stops or freezes.

     

    I've tried all different browsers (FF, IE, Chrome) and have tried waiting overnight to see if the WebUI comes back.  It doesn't.  Again, the box still responds via console and network without issue.

     

    This is a fresh install of 5.0.4, no mods, and it happens right away.  (The WebUI is never fast.)

     

    One of my drives is making a slight click noise every once in a while.  The array parity-checked 45 days ago, and all SMART looks OK.  I'm thinking of unplugging all drives, doing a blank install, and seeing how the WebUI responds then.

     

    Do you think this is the next best option to rule out the box?

     

    Thanks

  19. Hi everyone,

     

    I've been running unraid 4.7 for quite some time (a few years).  I noticed that access to the WebUI has been very slow lately.  I remember when it was very fast (changing screens within 1-2 seconds).  I'm having a lag of about 10-15 seconds per operation (e.g., clicking from Main to Shares).  I figured it was time to upgrade to 5.0.4, so I did that.  The stock WebUI is still very slow.  I decided to do a clean install of 5.0.4; the WebUI is still painfully slow.  Everything is hardwired on a Gigabit network.  If I ping the box, it responds in <1ms every time.

     

    The WebUI will also lock up if I push it too hard.  The box is always available via telnet and I can gracefully stop services and unmount disks. 

     

    I have 2GB of RAM and it's a bit older of a system.  I've never had data problems, and read and write speeds are within spec (60-75MB/s read, 15-20MB/s write, no cache).

     

    There are 8 HDs (one 2TB, five 1.5TB, and two 500GB).  Do you think I need more RAM?  It's a Gigabyte board (P35C-DS3R), but I swear this thing was fine before.  I can't remember if I switched MOBOs out or when the problem actually started. 

     

    My problem is: I can ping the box fine, I can read/write to the box at expected speeds, and everything has been stable.  So why is the WebUI so slow?

     

    Thanks,

    Chris

     

    EDIT: I should add that, ultimately, the WebUI becomes inaccessible after playing with it for a while.  The server still responds to ping and telnet, and the share is accessible.  I get the "webpage not available" error from the browser.  I appreciate any thoughts. 

     
