
cyrnel

Members
  • Posts

    560
Everything posted by cyrnel

  1. I rebooted. The array starts on its own, but Disk6 still comes up as:
       Partition format: unknown
       File system type: reiserfs
     The other disks look normal, with "Partition format: MBR: 4K-aligned". If I stop and start the array, Disk6 changes to "Partition format: MBR: 4K-aligned". A reiserfsck check on Disk6 says:
       No corruptions found
       There are on the filesystem:
       Leaves 494905
       Internal nodes 2979
       Directories 20601
       Other files 178043
       Data block pointers 469501241 (179201 of them are zero)
       Safe links 0
       ########### reiserfsck finished at Tue Jun 18 14:07:58 2013 ###########
     Scratching my head here. I'm tempted to start another nc parity check but will wait for more experienced input. The last thing I want is more magic.
  2. garycase: array was running when disk6 format showed as unknown, hence my concern. Other drives showed normal MBR: 4K-aligned. BTW, drives are: 3TB parity, d1-d9 all 2TB, and a 500GB cache. Could a mod please move my post and replies?
  3. This is messy. Move to a separate support post or leave here? I upgraded from RC13 to RC15 last night and have slight chaos.
     Back story: RC13 had been working well, though I had never tried shutting it down. I had to leave town for a week, so I shut it down via the GUI without visible problems. I returned yesterday to see RC15 released. I powered up the server (first time since the trip), but SMB wasn't responding. I shut down via the GUI, moved the flash drive to my desktop, and copied the new image/root/cfg directly. I returned the flash drive to the server and booted. GUI and SMB were both responding normally.
     I started a nc parity check. Great speeds. I returned a bit ago to find the summary: "Last checked on Tue Jun 18 09:25:46 2013 PDT, finding 2 errors." Yet in the drive list above it, all of the drives show 0 errors. It's been a very long time since I've seen errors after a parity check. Should the individual drive counts reflect the errors?
     Looking in syslog, I see many identical errors for md6:
       Jun 18 00:xx:xx Tower1 kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 14453855. Fsck?
     There are hundreds of lines for every advancing second, all for the same block, making for a 130MB syslog. I've trimmed most of it in the attached log. Checking the disk6 settings from the GUI shows "Partition Format: unknown". I've stopped the array and could use suggestions. Unmenu is running.
     Hardware: Supermicro MBD-X9SCL-F-O MB, i3-2100 CPU, 4GB RAM, Supermicro AOC-SASLP-MV8 card, medley of drives.
     syslog.txt
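When a syslog balloons with the same message repeated hundreds of times per second, it can help to collapse it before reading or attaching it. The following is a minimal sketch, not an Unraid tool: it assumes the usual "Mon DD HH:MM:SS" syslog prefix, the function name `summarize` is made up for illustration, and the timestamps in the sample lines are invented (the post above elides them).

```python
import re
from collections import Counter

# Matches the leading "Mon DD HH:MM:SS " syslog timestamp (assumed format).
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+")


def summarize(lines):
    """Count messages after stripping the timestamp, most frequent first."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.rstrip("\n"))
        if message:
            counts[message] += 1
    return counts.most_common()


# Invented sample lines; only the message text is taken from the post.
SAMPLE = [
    "Jun 18 00:00:01 Tower1 kernel: REISERFS error (device md6): vs-5150 "
    "search_by_key: invalid format found in block 14453855. Fsck?\n",
    "Jun 18 00:00:01 Tower1 kernel: REISERFS error (device md6): vs-5150 "
    "search_by_key: invalid format found in block 14453855. Fsck?\n",
    "Jun 18 00:00:02 Tower1 kernel: some other message\n",
]

for message, count in summarize(SAMPLE):
    print(f"{count:8d}  {message}")
```

To run it against a real log, pass an open file object to `summarize` instead of `SAMPLE`; the function only iterates over lines, so it does not load the whole file at once, and memory grows only with the number of distinct messages.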
  4. Of course. I wasn't questioning anything. I'd noticed the other posts, possibly mirroring my "wouldn't that be nice" optimism that the rebuild would somehow be smart about the last TB and only consider it once a data drive intruded into those sectors.
  5. "The last 1TB would go very fast, as it would only be reading the parity disk and doing xor operations with the zeros simulated from where data disks are smaller than the parity disk. I'll bet you could see parity check speeds well over 100MB/s at that point." (Joe L.)
     Yes, it does the whole thing. I just did one of these weirdo upgrades after updating to rc11. This system's new parity drive is a 3TB Seagate ST3000DM001, with the rest a mix of 2TB drives. That last non-existent TB was surprisingly slow, at least compared with my initial hope that it might jump to the end. In practice it ran at the raw speed of the drive on a MB SATA port, starting in the 140MB/s range and dropping to near 100MB/s toward the end.
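The "zeros simulated" idea from the quote can be sketched in a few lines. This is only an illustration of single-parity XOR, assuming parity is a byte-wise XOR across the data disks at each offset; it is not Unraid's driver code, and `parity_block` is a made-up name.

```python
def parity_block(data_blocks, block_size):
    """XOR the blocks found at one offset on every data disk.

    A disk that has already ended at this offset supplies a short or
    empty block, which contributes zeros (assumption for this sketch).
    Blocks longer than block_size are not supported here.
    """
    parity = bytearray(block_size)  # starts as all zeros
    for block in data_blocks:
        for index, byte in enumerate(block):
            parity[index] ^= byte
    return bytes(parity)


# Inside the region covered by two data disks:
inside = parity_block([b"\x0f\xf0", b"\xff\x00"], 2)

# Past the end of every data disk: nothing to read but parity itself,
# and the expected parity is all zeros.
beyond = parity_block([b"", b""], 2)

print(inside.hex(), beyond.hex())
```

Past the end of the smaller disks there are no data reads at all, so the pass is limited only by how fast the parity drive itself can be read, which matches the 140MB/s falling to 100MB/s observed above.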
  6. Don't want to suggest the obvious or go all off-topic, but how are individual drive reads and writes on your system w/rc11? Best would be to see raw r/w speeds of the same drives connected via onboard vs SAS2LP ports, say via dd. And if you already have threads going trying to track this down then feel free to ignore me completely.
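The post suggests dd for this. For anyone who would rather script the comparison, here is a rough Python equivalent of a sequential read test. It is a sketch under assumptions: `read_throughput` is a hypothetical helper, the path you pass is up to you (a large file, or a raw device if you have permission), and results will be inflated by the page cache unless the data has not been read recently.

```python
import time


def read_throughput(path, chunk_size=1024 * 1024, max_bytes=None):
    """Read path sequentially and return (bytes_read, megabytes_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as source:
        while max_bytes is None or total < max_bytes:
            want = chunk_size if max_bytes is None else min(chunk_size, max_bytes - total)
            chunk = source.read(want)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

Running the same call against the same drive on an onboard port and then on the SAS2LP, with the same `max_bytes`, gives two numbers that can be compared directly.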
  7. A list of attached hardware might help, too. udev looks to be taking a wild ride.
  8. Very nice, Tom. This is acting somewhat faster than rc8. 5GB test file. Single test each, so FWIW.
     Copied locally via shell:
       - protected array drive -> cache drive: 97MB/s
       - cache drive -> protected array drive: 42MB/s
     SMB client:
       - protected array drive -> Win7 Pro SSD: 105MB/s
       - Win7 Pro SSD -> protected array drive: 49MB/s
     Thanks!
  9. Attached are my two latest syslogs (most recent saved and current). Sorry for the minimal information; drowning a bit here. syslog-20121208-033018.zip syslog-2012-12-11.zip
  10. I just ran into what looks like the same thing, though I was also preclearing a disk. In my case New Permissions was on disk 6 of 10 when emhttp stopped responding. Shell access still okay. All screen sessions got this output:
        root@Tower1:~# screen -r
        917203+7 records in
        917203+7 records out
        1923519705088 bytes (1.9 TB) copied, 10398.5 s, 185 MB/s
        Wrote 1,923,519,705,088 bytes out of 3,000,592,982,016 bytes (64% Done)
        Message from syslogd@Tower1 at Mon Dec 10 02:20:48 2012 ...
        Message from syslogd@Tower1 at Mon Dec 10 02:20:48 2012 ...
        Tower1 kernel: Process scsi_eh_0 (pid: 853, ti=f2602000 task=f76806c0 task.ti=f2602000)
        Tower1 kernel: Stack:
        Message from syslogd@Tower1 at Mon Dec 10 02:20:48 2012 ...
        Tower1 kernel: Call Trace:
        Message from syslogd@Tower1 at Mon Dec 10 02:20:48 2012 ...
        Tower1 kernel: EIP: [<f851a7bf>] mvs_slot_task_free+0xf/0x118 [mvsas] SS:ESP 0068:f2603eb8
        Message from syslogd@Tower1 at Mon Dec 10 02:20:48 2012 ...
        Tower1 kernel: Code: 02 00 ff 75 dc ff 70 10 89 d8 ff 96 c0 00 00 00 31 c0 5b 5e 8d 65 f4 5b 5e 5f 5d c3 55 89 e5 57 89 c7 56 89 d6 53 89 cb 83 ec 0c <83> 79 08 00 0f 84 f7 00 00 00 f6 42 14 05 75 44 8b 49 0c 85 c9
      This is 5.0-rc8a AiO. Hate seeing mvsas.
  11. I saw this with earlier versions and just bumped up VLC's buffering as a band-aid. It feels like a QoS problem, where something in Unraid/stack/samba... is reordering or prioritizing requests for efficiency over responsiveness.
  12. It could be a simple matter of tuning. While the CPU may be idle the chipset could be experiencing a train wreck of interrupts and memory transfers from a non-optimal configuration. Yes, this is still speculation. I apologize for not even trying multiple configs on my i3. Too much else fighting for time right now.
  13. I suspect it's important with the lower powered processors to set processor affinity for IO code. Disk and network should probably each stick with a single core. This is speculation, but I'm thinking the effects of caches being invalidated could increase with user share layers, resulting in more work/less results. My smallest box is now an i3. Would any e350/450/Atom users care to test?
  14. Does it matter if the media being played is on the same share or same disk as where the file operation is taking place? Something must be suspending activity. Does deleting a large file cause the same problem? I'm thinking this might be another Reiserism.
  15. rsync tends to pretty well saturate a link but I don't see link problems in your log. Is this log from one of the times your system failed? Have you seen anything interesting on the console? A tail -f /var/log/syslog at the console can be very revealing if telnet sessions aren't showing you anything. Is there a difference in the contents of the disks that might contribute? When I was still fighting realtek nics in early versions of 5 I often found differences in the failures depending on the files I was rsyncing. Many small files had a better chance than large files. Lots of small files require more work by the cpu & disk systems while few large files are mostly waiting on the network. Of course my issues were the basic dropped packets & overruns so YMMV.
  16. WD Green? My 5.0 server has several drives with issues but the one that always seems to trip things up is a 2G-EARS that generates a hot-unplug/plug. Anyway, been running the new version since just after you posted. I'm dropping my spin-down timer to 15min again to see how things fare.
  17. Has anyone seen code to consolidate paths & files to as few disks as possible? Common situation with our UR servers I'm sure. You know, trade files around as space allows to keep things together. I could change my split levels and move things around but it's too warm to think clearly. I'll fund this one with pizza & beer. :thumbsup:
  18. Can you duplicate the problem when making sure none of the drives are sleeping? Can you duplicate the problem without the plugins? Just RC5.
  19. If some of your disks are smaller they'll finish first and the remaining disks can proceed more quickly.
  20. *&#(&@(&#!! Pardon my French. Finally another fatal BLK_EH_NOT_HANDLED. This time the server ran from 6/6 to 7/2. The error actually hit last night: Jul 1 21:22:21 Tower1 kernel: sas: command 0xf76a9000, task 0xf02f7040, timed out: BLK_EH_NOT_HANDLED We watched a movie tonight without any problem. Things only went south - and now the server is blocked - after I refreshed the web interface a few minutes ago.
  21. Your syslog shows couch potato and a bunch of plugins are being started. Disabling them as suggested is the first step. If it doesn't work with the extras disabled then that syslog would be helpful.
  22. Likewise, no problems here with rc4 with all my drives always spinning. I have a BLK_EH_NOT_HANDLED but nothing bad happened. The next activity was mover several hours later. System's been up since the 6th. sas_scsi_recover_host took 4 seconds to complete. Can't help but wonder what's happening with interrupts during that time, and what happens if hdparm or anything drive-related fires.
        Jun 18 01:40:27 Tower1 kernel: sas: command 0xf2654540, task 0xf0237680, timed out: BLK_EH_NOT_HANDLED (Drive related)
        Jun 18 01:40:27 Tower1 kernel: sas: Enter sas_scsi_recover_host (Drive related)
        Jun 18 01:40:27 Tower1 kernel: sas: trying to find task 0xf0237680 (Drive related)
        Jun 18 01:40:27 Tower1 kernel: sas: sas_scsi_find_task: aborting task 0xf0237680 (Drive related)
        Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1818:<7>mv_abort_task() mvi=f74a0000 task=f0237680 slot=f74b1640 slot_idx=x2 (System)
        Jun 18 01:40:27 Tower1 kernel: sas: sas_scsi_find_task: querying task 0xf0237680 (Drive related)
        Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5 (System)
        Jun 18 01:40:27 Tower1 kernel: sas: sas_scsi_find_task: task 0xf0237680 failed to abort (Minor Issues)
        Jun 18 01:40:27 Tower1 kernel: sas: task 0xf0237680 is not at LU: I_T recover (Drive related)
        Jun 18 01:40:27 Tower1 kernel: sas: I_T nexus reset for dev 0300000000000000 (Drive related)
        Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
        Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001001 (System)
        Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
        Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
        Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1011081 (System)
        Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
        Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2278:plugin interrupt but phy3 is gone (System)
        Jun 18 01:40:29 Tower1 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis (Drive related)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2139:phy3 Attached Device (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001 (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x81 (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x10000 (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 3 attach dev info is 400 (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 3 attach sas addr is 3 (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 3 byte dmaded. (System)
        Jun 18 01:40:29 Tower1 kernel: sas: sas_form_port: phy3 belongs to port0 already(1)! (Drive related)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0 (System)
        Jun 18 01:40:29 Tower1 kernel: sas: I_T 0300000000000000 recovered (Drive related)
        Jun 18 01:40:29 Tower1 kernel: sas: sas_ata_task_done: SAS error 8d (Errors)
        Jun 18 01:40:29 Tower1 kernel: ata7: sas eh calling libata port error handler (Errors)
        Jun 18 01:40:29 Tower1 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 t0 (Errors)
        Jun 18 01:40:29 Tower1 kernel: ata7.00: failed command: CHECK POWER MODE (Minor Issues)
        Jun 18 01:40:29 Tower1 kernel: ata7.00: cmd e5/00:00:00:00:00/00:00:00:00:00/40 tag 0 (Drive related)
        Jun 18 01:40:29 Tower1 kernel: res 01/04:04:b8:07:a7/00:00:16:00:00/40 Emask 0x3 (HSM violation) (Errors)
        Jun 18 01:40:29 Tower1 kernel: ata7.00: status: { ERR } (Drive related)
        Jun 18 01:40:29 Tower1 kernel: ata7.00: error: { ABRT } (Errors)
        Jun 18 01:40:29 Tower1 kernel: ata7: hard resetting link (Minor Issues)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001 (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x11081 (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
        Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2278:plugin interrupt but phy3 is gone (System)
        Jun 18 01:40:31 Tower1 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis (Drive related)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2139:phy3 Attached Device (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001 (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x81 (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x10000 (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 3 attach dev info is 400 (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 3 attach sas addr is 3 (System)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 3 byte dmaded. (System)
        Jun 18 01:40:31 Tower1 kernel: sas: sas_form_port: phy3 belongs to port0 already(1)! (Drive related)
        Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0 (System)
        Jun 18 01:40:31 Tower1 kernel: sas: sas_ata_hard_reset: Found ATA device. (Drive related)
        Jun 18 01:40:31 Tower1 kernel: ata7.00: configured for UDMA/133 (Drive related)
        Jun 18 01:40:31 Tower1 kernel: ata7: EH complete (Drive related)
        Jun 18 01:40:31 Tower1 kernel: ata8: sas eh calling libata port error handler (Errors)
        Jun 18 01:40:31 Tower1 kernel: ata9: sas eh calling libata port error handler (Errors)
        Jun 18 01:40:31 Tower1 kernel: ata10: sas eh calling libata port error handler (Errors)
        Jun 18 01:40:31 Tower1 kernel: ata11: sas eh calling libata port error handler (Errors)
        Jun 18 01:40:31 Tower1 kernel: sas: --- Exit sas_scsi_recover_host (Drive related)
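The "took 4 seconds" figure comes from the gap between the Enter and Exit sas_scsi_recover_host entries. A small sketch for pulling that out of a longer syslog follows. It assumes one entry per line with the standard "Mon DD HH:MM:SS" prefix, as in the raw syslog file; `recovery_windows` is a hypothetical helper, and since syslog lines carry no year, the `year` default is only a placeholder.

```python
import re
from datetime import datetime

# Captures the "Mon DD HH:MM:SS" prefix of a syslog line (assumed format).
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")


def recovery_windows(lines, year=2000):
    """Return (start, end, seconds) for each Enter/Exit recovery pair.

    year is a placeholder because syslog timestamps omit it; it only
    matters for logs that cross a year boundary or Feb 29.
    """
    windows = []
    start = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue
        when = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        if "Enter sas_scsi_recover_host" in line:
            start = when
        elif "Exit sas_scsi_recover_host" in line and start is not None:
            windows.append((start, when, (when - start).total_seconds()))
            start = None
    return windows


# The two boundary entries from the log above.
LOG = [
    "Jun 18 01:40:27 Tower1 kernel: sas: Enter sas_scsi_recover_host",
    "Jun 18 01:40:31 Tower1 kernel: sas: --- Exit sas_scsi_recover_host",
]

for begin, end, seconds in recovery_windows(LOG):
    print(f"{begin:%b %d %H:%M:%S} -> {end:%H:%M:%S}: {seconds:.0f}s")
```

Comparing these windows against the times of drive spin-downs or hdparm calls in the same log would be one way to test the suspicion that something drive-related firing mid-recovery is what turns a recoverable error into a hang.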
  23. Could you post a syslog that includes activity from when you notice the slow read speeds? i.e.:
        - Boot
        - Do your read test, preferably using large files
        - After confirming things were slow, submit syslog
      Apologies if that's what you already did, but at first glance there wasn't much interesting in the syslog I downloaded.
  24. Yes, though it defeats one of unraid's main benefits. In my case the failures seemed to follow this sequence:
        1. system fine, I leave
        2. BLK_EH_NOT_HANDLED
        3. normal timer-based drive spin down
        4. I'm back. The first SMB access times out. hdparm will have gone zombie, presumably when waking drives. Most drive access blocked. Telnet still functional.
      My rc4 system has been up since the 6th. In that time syslog shows two BLK_EH_NOT_HANDLED errors but the system has remained up. I don't know whether that's luck, because these errors have been recoverable, or because drives aren't being spun down/up. And I don't have enough failures saved to be certain of the sequence. When the drives went away previously I didn't make the effort to copy syslog to the telnet session.
  25. This was without any other change in hardware or unraid config? Hate it when that happens.