papnikol

Members
  • Posts

    341
Everything posted by papnikol

  1. "Those are not a problem; they are the attributes unRAID is monitoring, and it will warn if they change."
     Thanks for the info, good to know.
  2. "I know; unfortunately, at that moment the unRAID web interface was not working, so I got the syslog using the command line."
     "Read the link. It tells you how to get the diagnostics from the command line."
     Next time (hopefully not soon) I will know.
     By the way, after fixing everything and rechecking the drive (the reiserfs check shows no problems anymore), I got the attached SMART report, which seems OK. But clicking on the disk from Main I get this (screencap attached):
       5 Reallocated sectors count
       187 Reported uncorrectable errors
       188 Command time-out
       197 Current pending sector count
       198 Uncorrectable sector count
     It is difficult to know which one is correct when I have to decide whether I should RMA my WD Red 6TB. Why this difference?
     WDC_WD60EFRX-68MYMN1_WD-WX31D743YHKP-20161107-1419.txt
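
One way to compare the two views is to pull the raw values for just those five attribute IDs out of the `smartctl -A` table. The following is a minimal Python sketch, not unRAID's own logic: the name `watched_raw_values` is illustrative, the parser assumes the usual ten-column attribute table with RAW_VALUE last, and the sample rows are copied from a WD SMART table in another post in this listing.

```python
import re

# Attribute IDs shown on the unRAID disk page (taken from the post above).
WATCHED_IDS = {5, 187, 188, 197, 198}

# One table row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value (only the leading digits of the raw value are kept).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def watched_raw_values(smart_text):
    """Return {id: (name, raw_value)} for watched attributes found in the text."""
    found = {}
    for line in smart_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines and anything that is not a table row
        attr_id = int(match.group(1))
        if attr_id in WATCHED_IDS:
            found[attr_id] = (match.group(2), int(match.group(3)))
    return found

# Sample rows (assumption: same layout as real smartctl output).
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
197 Current_Pending_Sector  0x0032 200 200 000 Old_age  Always  - 0
198 Offline_Uncorrectable   0x0030 200 200 000 Old_age  Offline - 0
199 UDMA_CRC_Error_Count    0x0032 200 200 000 Old_age  Always  - 0
"""

if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(watched_raw_values(SAMPLE).items()):
        status = "OK" if raw == 0 else "CHECK"
        print(f"{attr_id:>3} {name:<24} raw={raw} {status}")
```

IDs that a drive does not report at all (187 and 188 are absent from the sample) are simply missing from the result, so a watched ID with no row is not the same thing as a watched ID with a nonzero raw value.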
  3. Thanks, doing that now, keeping my fingers crossed.
     There were errors according to the test, so I am running it with the --rebuild-tree option.
     I did not manually start it. But, come to think of it, I suspect that it was the result of using unmenu->Disk Management->Filesystem check.
     I know; unfortunately, at that moment the unRAID web interface was not working, so I got the syslog using the command line.
  4. Hi everyone,
     I have a problem with my unRAID server. It appeared after upgrading to 6.2.3, but it could certainly be unrelated. These are the symptoms/timeline:
     1. Two days ago I saw a redball on disk6 of my 10 disk + parity array. After checking it (there was a write error) and checking the drive, I found it OK and rebuilt it. Everything went fine and all files were accessible.
     2. I was planning a parity check but hadn't started it yet when I saw these messages:
        Nov 4 12:02:13 towerP shfs/user: err: shfs_readdir: fstatat: Dirk.Gentlys.Holistic.Detective.Agency.S01E02.720p.HDTV.x264-KILLERS.mp4 (2) No such file or directory
        Nov 4 12:02:13 towerP shfs/user: err: shfs_readdir: readdir_r: /mnt/disk6/-TV/-- NOT SEEN (2) No such file or directory
        Nov 4 12:02:39 towerP shfs/user: err: shfs_readdir: fstatat: Dirk.Gentlys.Holistic.Detective.Agency.S01E02.720p.HDTV.x264-KILLERS.mp4 (2) No such file or directory
        Nov 4 12:02:39 towerP shfs/user: err: shfs_readdir: readdir_r: /mnt/disk6/-TV/-- NOT SEEN (2) No such file or directory
        Nov 5 03:00:01 towerP shfs/user: err: shfs_readdir: fstatat: Dirk.Gentlys.Holistic.Detective.Agency.S01E02.720p.HDTV.x264-KILLERS.mp4 (2) No such file or directory
        Nov 5 03:00:01 towerP shfs/user: err: shfs_readdir: readdir_r: /mnt/disk6/-TV/-- NOT SEEN (2) No such file or directory
        Nov 5 10:30:02 towerP kernel: md: sync done. time=100688sec
        Nov 5 10:30:02 towerP kernel: md: recovery thread: completion status: 0
        Nov 5 10:35:33 towerP kernel: REISERFS warning: reiserfs-5089 is_internal: free space seems wrong: level=3, nr_items=159, free_space=376 rdkey
        Nov 5 10:35:33 towerP kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 973198707. Fsck?
        Nov 5 10:35:33 towerP kernel: REISERFS (device md6): Remounting filesystem read-only
        Nov 5 10:35:33 towerP kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [4379 26687 0x0 SD]
        Nov 5 10:35:33 towerP kernel: REISERFS warning: reiserfs-5089 is_internal: free space seems wrong: level=3, nr_items=159, free_space=376 rdkey
        Nov 5 10:35:33 towerP kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 973198707. Fsck?
        Nov 5 10:35:33 towerP kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [4379 26640 0x0 SD]
        Then, this message keeps repeating:
        Nov 5 13:59:55 towerP kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 973198707. Fsck?
        Nov 5 13:59:55 towerP kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [4379 26663 0x0 SD]
     3. I could access the array for a while (even files on disk 6), but now it is not accessible through the web interface, only from the command line. UPDATE: accessible through the web interface again.
     4. There is a "reiserfsck" process running, so I don't want to reboot.
     Any input as to how I should proceed? (syslog - pruned because of size - attached)
     syslog.bak.zip
  5. "In network settings, try setting static DNS addresses of 8.8.8.8 and 8.8.4.4 and try again."
     Thanks, that solved the problem, although I do not understand why Google's DNS is helping.
  6. Hi, I am trying to install the plugin through the unRAID Plugin Manager and I get the following:
     plugin: installing: https://raw.githubusercontent.com/theone11/serverlayout_plugin/master/serverlayout-package-2015.09.25.tar.gz
     plugin: downloading https://raw.githubusercontent.com/theone11/serverlayout_plugin/master/serverlayout-package-2015.09.25.tar.gz
     plugin: downloading: https://raw.githubusercontent.com/theone11/serverlayout_plugin/master/serverlayout-package-2015.09.25.tar.gz ... done
     Warning: simplexml_load_file(): /tmp/plugins/serverlayout-package-2015.09.25.tar.gz:1: parser error : Start tag expected, '<' not found in /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin on line 193
     Warning: simplexml_load_file(): ??V in /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin on line 193
     Warning: simplexml_load_file(): ^ in /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin on line 193
     plugin: xml parse error
     Can someone help?
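
The parser error suggests the plugin manager was handed a binary archive where it expected XML: the URL ends in .tar.gz, and gzip data always begins with the bytes 0x1f 0x8b rather than '<'. A minimal Python sketch of that check follows. `classify_download` is a hypothetical helper, not part of unRAID or its plugin manager, and the conclusion that the manager wants an XML plugin file instead of the package archive is an assumption drawn from the error text.

```python
GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of every gzip stream
UTF8_BOM = b"\xef\xbb\xbf"    # optional byte order mark before XML text

def classify_download(data: bytes) -> str:
    """Roughly classify downloaded bytes as 'gzip', 'xml', or 'unknown'."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Tolerate leading whitespace; an XML parser needs '<' as the first token.
    if data.lstrip().startswith(b"<"):
        return "xml"
    return "unknown"

if __name__ == "__main__":
    print(classify_download(b"\x1f\x8b\x08\x00rest-of-archive"))
    print(classify_download(b'<?xml version="1.0"?><PLUGIN></PLUGIN>'))
```

If the downloaded file classifies as gzip, pointing the installer at the archive itself would be expected to fail in exactly this way, whatever the archive contains.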
  7. OK, I ran 9 passes of memtest with no errors (I wish it were the memory; the solution would have been easy).
  8. "Thanks for the info, I had never seen this. I am trying it now, although I think it is not the perfect choice for my case, since errors appear in different places of the HDDs."
     "If the errors are in different places each time, it is more likely to be a memory problem, disk controller problem, or a power supply problem. Very first thing to check is to run a memory test, preferably overnight (or at least several full passes). As often as not, a bad memory strip is the issue. Joe L."
     It took me some time, but I am back. I tried a memory test (although the unRAID memtest allows only one pass, for some reason) and there were no errors. Just for good measure I swapped a fairly recent SAS2LP-MV8 (the only expansion card) for an old SASLP-MV8 and ran a non-correcting parity check. I let it get to around 7% twice and I still get errors, but I noticed something very strange: the 2nd run has only 2 errors, and they are the same as 2 of the 4 errors of the first run.
     run 1:
     Sep 20 17:34:51 towerP kernel: md: parity incorrect, sector=188020848 (Errors)
     Sep 20 17:49:47 towerP kernel: md: parity incorrect, sector=311953656 (Errors)
     Sep 20 17:54:02 towerP kernel: md: parity incorrect, sector=358760056 (Errors)
     Sep 20 17:59:23 towerP kernel: md: parity incorrect, sector=420290960 (Errors)
     run 2:
     Sep 20 19:01:26 towerP kernel: md: parity incorrect, sector=311953656 (Errors)
     Sep 20 19:10:50 towerP kernel: md: parity incorrect, sector=420290960 (Errors)
     This is REALLY strange, because if the cause of the problem were the RAM, the controller or the PSU, I would expect the errors to be erratic.
     UPDATE: I ran the check a 3rd time; the aforementioned 2 errors persist while other stochastic errors appear. Whatever the underlying fault might be, I think I probably ran a parity check once without disabling parity correction. This means that "wrong" corrections were written to the parity drive and are now being found.
     Of course the problem of parity errors persists and I have yet to pinpoint the reason.
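
To make the run-to-run comparison less manual, here is a small Python sketch that extracts the sector numbers from `md: parity incorrect` syslog lines and intersects them across runs. The log lines are the ones quoted in the post; the function names are illustrative, and splitting the syslog into one chunk per run is assumed to be done by hand.

```python
import re

SECTOR = re.compile(r"md: parity incorrect, sector=(\d+)")

def sectors_in(log_text):
    """Return the set of sector numbers reported as parity-incorrect."""
    return {int(m.group(1)) for m in SECTOR.finditer(log_text)}

def repeated_sectors(*runs):
    """Sectors reported in every run, sorted for stable output."""
    sets = [sectors_in(run) for run in runs]
    return sorted(set.intersection(*sets)) if sets else []

RUN_1 = """\
Sep 20 17:34:51 towerP kernel: md: parity incorrect, sector=188020848 (Errors)
Sep 20 17:49:47 towerP kernel: md: parity incorrect, sector=311953656 (Errors)
Sep 20 17:54:02 towerP kernel: md: parity incorrect, sector=358760056 (Errors)
Sep 20 17:59:23 towerP kernel: md: parity incorrect, sector=420290960 (Errors)
"""

RUN_2 = """\
Sep 20 19:01:26 towerP kernel: md: parity incorrect, sector=311953656 (Errors)
Sep 20 19:10:50 towerP kernel: md: parity incorrect, sector=420290960 (Errors)
"""

if __name__ == "__main__":
    print(repeated_sectors(RUN_1, RUN_2))  # [311953656, 420290960]
```

Sectors that show up in every run would be consistent with something actually stored wrong on disk (data or parity), while sectors that move between runs look more like transient read problems; the sketch only separates the two groups and does not diagnose either.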
  9. Thanks for the info, I had never seen this. I am trying it now, although I think it is not the perfect choice for my case, since errors appear in different places of the HDDs.
  10. Hi everyone,
      I have a problem with my unRAID server. I started a parity check and noticed it found quite a few errors (about 10 errors at 10%). So I stopped it and restarted. It still finds errors, some in the same positions and some in different ones. The number of errors does not seem to get higher after every run.
      - I performed a memcheck but there does not seem to be any problem.
      - SMART status seems OK for all disks.
      I am starting to fear it might be a controller problem (I have an AOC-SAS2LP-MV8 on an Asus P5Q Deluxe mobo), although I would expect more errors in that case. What else should I check in order to pinpoint the problem? Thanks for any help.
      PS1: I attached the results of Tools/diagnostics.
      PS2: I notice that the syslog does not mention the parity errors, probably because I ran the check without writing corrections to the parity disk. But here are the sector errors from 2 consecutive runs up to 10% that I ran some time ago (red font highlights same sector in both runs):
      1ST RUN
      sector=227271416 sector=326803560 sector=870691376 sector=1254335696 sector=1635813392 sector=2133668016 sector=2361685768 sector=2571393240 sector=2628717368 sector=2763282288 sector=3294123952 sector=3680661280 sector=4450802440 sector=5136242464 sector=5705459328 sector=6185627984 sector=8193815688 sector=9479063848 sector=9653427048 sector=1050839046
      2ND RUN
      sector=187728488 sector=227271416 sector=247795216 sector=326803560 sector=747245168 sector=870691376 sector=247795216 sector=747245168 sector=949971664 sector=978378680 sector=999114088 sector=1034802856 sector=1142471208 sector=1170450440 sector=1328714912
      towerp-diagnostics-20150902-0015.zip
  11. Hi everyone,
      I have 2 unRAID servers, one of which is getting quite old (SATA1, nine 2TB drives; it takes more than 30 hrs to perform a parity sync). So I was thinking of replacing the core components. I mainly need a mobo + CPU + memory (I have a good PSU, case, and controllers), and I need a few suggestions for a setup that is cheap but fulfills these requirements:
      - more than 4 SATA2/3 ports onboard (ideally maybe
      - 2 PCI Express x8 slots that can be used simultaneously for additional hard disk controllers
      - onboard graphics (so I don't have to use an additional graphics card)
      - a CPU with low power consumption
      I live in Greece, but I guess most of the components can be found here. Thanks in advance for any help.
  12. So I ran a preclear cycle with no problems. I will try to write some data after a parity check, which on my SATA I mobo takes more than 25 hrs even though all disks are only 2TB. But I think everything is OK. It is still strange that the drive stopped responding, but probably a little mystery in our life is a good thing.
  13. Just adding my experience: I have a 7 disk array with four 6TB HDDs (one of which is parity) and three older 3TB WD Green HDDs (but without a cache disk). I have not encountered any problems of this type.
  14. I rebooted the PC after securing the files of the problematic disk. The SMART status does not show any problems:
      SMART Attributes Data Structure revision number: 16
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
        1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
        3 Spin_Up_Time            0x0027   154   130   021    Pre-fail  Always       -       9258
        4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       663
        5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
        7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
        9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8089
       10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
       11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
       12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       234
      192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       132
      193 Load_Cycle_Count        0x0032   195   195   000    Old_age   Always       -       16118
      194 Temperature_Celsius     0x0022   114   105   000    Old_age   Always       -       38
      196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
      197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
      198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
      199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
      200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
      I will run a preclear cycle and get back to you (it takes long; it is a SATA I mobo).
  15. I did:
      smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
      Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
      === START OF INFORMATION SECTION ===
      Vendor: /5:0:2:0
      Product:
      User Capacity: 600,332,565,813,390,450 bytes [600 PB]
      Logical block size: 774843950 bytes
      Physical block size: 3099375800 bytes
      Lowest aligned LBA: 14896
      scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
      scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
      >> Terminate command early due to bad response to IEC mode page
      A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
      Seems I have a 600 petabyte HDD. By the way, SMART works for my other drives.
  16. I get this:
      smartctl -a -d ata /dev/sdh
      smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
      Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
      Read Device Identity failed: Input/output error
      A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
      I guess the drive stopped responding and I need to reboot in order to check. (I am currently copying the data that exists on this disk, which is actually being emulated by the rest of the array, back to my PC, and then I will reboot.) But does the fact that it does not respond imply something about the nature of the problem?
  17. Hi everyone,
      I had just added a precleared drive to my array and left it overnight while copying data to it. In the morning it was redballed. I noticed this error, which coincides with the time around which I put the disk into the array:
      Nov 4 01:07:53 towerS emhttp: disk9 mount error: 32 (Errors)
      But then there are no more errors while files are being copied. After 7 hours there is a flood of errors (see attached file). The errors are of this type:
      Nov 4 08:33:31 towerS kernel: sas: sas_eh_handle_sas_errors: task 0xf4260e00 is aborted (Errors)
      Nov 4 08:33:31 towerS kernel: sas: ata7: end_device-5:2: cmd error handler (Errors)
      Nov 4 08:33:31 towerS kernel: ata7.00: exception Emask 0x0 SAct 0x80 SErr 0x0 action 0x6 frozen (Errors)
      Nov 4 08:33:31 towerS kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) (Errors)
      Nov 4 08:33:36 towerS kernel: sas: sas_ata_task_done: SAS error 8a (Errors)
      Nov 4 08:33:36 towerS kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x11) (Errors)
      Nov 4 08:33:44 towerS kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x5) (Errors)
      Nov 4 08:33:49 towerS kernel: sas: sas_ata_task_done: SAS error 8a (Errors)
      Nov 4 08:33:49 towerS kernel: ata7.00: disabled (Errors)
      Nov 4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
      Nov 4 08:33:49 towerS kernel: end_request: I/O error, dev sdh, sector 1969226936 (Errors)
      Nov 4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
      Nov 4 08:33:49 towerS kernel: end_request: I/O error, dev sdh, sector 1970817768 (Errors)
      Nov 4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
      Nov 4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
      Nov 4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
      and (predominantly) of this type:
      Nov 4 08:33:49 towerS kernel: md: disk9 read error, sector=1969226872 (Errors)
      . . .
      Nov 4 08:33:49 towerS kernel: md: disk9 write error, sector=1970817784 (Errors)
      . . .
      The disk is on a SASLP-MV8 card using a SAS-to-4-SATA cable. The same cable connects 2 more drives without problems. Additionally, it is the kind of cable that has a clip, so I doubt it was moved. Any suggestions as to whether this is a hard disk problem or something else?
      red_ball_4_nov_2014.txt
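
With a flood like this, a tally per disk and operation can show at a glance whether a single disk accounts for everything. A minimal Python sketch, assuming the `md: diskN read error` / `md: diskN write error` line shape quoted above; `tally_md_errors` is an illustrative name, and the two sample lines are the ones from the post.

```python
import re
from collections import Counter

MD_ERROR = re.compile(r"md: (disk\d+) (read|write) error, sector=(\d+)")

def tally_md_errors(log_text):
    """Count md read/write errors per (disk, operation) pair."""
    counts = Counter()
    for match in MD_ERROR.finditer(log_text):
        counts[(match.group(1), match.group(2))] += 1
    return counts

SAMPLE = """\
Nov 4 08:33:49 towerS kernel: md: disk9 read error, sector=1969226872 (Errors)
Nov 4 08:33:49 towerS kernel: md: disk9 write error, sector=1970817784 (Errors)
"""

if __name__ == "__main__":
    for (disk, operation), count in sorted(tally_md_errors(SAMPLE).items()):
        print(f"{disk} {operation}: {count}")
```

Run against the full attached syslog, this would show whether any disk other than disk9 logged md errors, which is one input (not a verdict) for deciding between a disk fault and a shared cable or controller fault.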
  18. I use
      fuser -mv /mnt/*/*
      lsof /mnt/*
      The only process shown is smbd.
      Thanks, I will keep that in mind, although I have a feeling that for some reason, when I have this problem, the drive does not stop being busy.
  19. Actually, in this case I had 2 drives mounted through SNAP; only one was being used, but the other one could not be unmounted for some reason. When I stopped using the 1st drive, I could unmount both of them. Still, there have been some cases where the drive appears to be busy while it is not, so I cannot unmount...
  20. Hi everybody,
      I am using SNAP to mount some drives (ReiserFS drives on unRAID 5) in 2 different unRAID servers. When I try to unmount them, all choices on right click are greyed out (except for 'refresh status' and 'delete share name'). So I cannot unmount or reject unless I reboot. Should I stop Samba? Is there something obvious I am doing wrong? Basically, I would be happy if I could force an unmount action. Thanks in advance.
  21. Just FYI, I also recently started migrating to a new server, so I tried 2 solutions that both worked:
      1. mounting my HDD on the new server using an external docking station and SNAP
      2. mounting my HDD on my Windows machine using the external docking station and yareg (which allows Windows machines to read ReiserFS)
      The 1st solution was much easier and faster, obviously.
  22. I guess you mean that if I know each drive's position I can accept the parity as valid. But what if for some reason you cannot boot (e.g. a mobo or CPU malfunction) and you have no screenshot? (This is not my case, just wondering.)
  23. Yes, but I am using the HX850 for my other unRAID server.
      Also, I know I could use a non-modular PSU, but it is nice not having all those cables in your case. Additionally, I recently found these: http://i.ebayimg.com/00/s/NzUwWDEwNTA=/z/YTIAAOSwPe1T0UmH/$_57.JPG which means I can reduce the number of cables even further.