ShangHangin

Members
  • Posts

    102

Everything posted by ShangHangin

  1. After 3 days running with no reboots (I had added a new drive the other day), my issue returned. Running 5.0b14. At the time, I was running a TM backup from my desktop Mac to a dedicated drive/user share, and accessing music files in user shares for iTunes from the Media Center, when everything locked up - both unMenu and the unRaid GUI. I did a hard power down, but with the script installed the array stopped cleanly. The system never fully powered down - I had to do a physical reset. When I restarted the system, the array required a manual restart through the GUI. It's off on a parity check now. SMB is accessible, but AFP is slow to give access. The array shows in a Mac Finder window, but none of the shares are accessible. Within a few minutes they become accessible. [Is it usual behavior that AFP is not accessible during the start of a parity check?]
     System:
     ASUS P8Z68-V LX motherboard with 6 onboard SATA; running latest 3703 BIOS
     Intel i5-2300, 2.8GHz; 2-2G RAM
     Supermicro AOC SASLP MV8
     Antec 750W PSU
     6- 2TB data, 1- 2TB Parity, 1- 500G Cache - all Seagate drives
     Antec 300 case
     Sony 4GB flash drive
     Drive connections:
     Motherboard SATA: 5 data + 1 parity (4 data in user shares, 1 data dedicated to TM backup)
     SuperMicro SATA: 1 data, 1 cache (1 data slated for future TM backup - not used yet)
     I have checked all the cables; they seem tight and well connected. The SuperMicro card is well seated.
     Installed: unMenu
     Additional Packages: "C" Compiler; Mail & ssmtp; Monthly Parity; Apple . file remover; Hourly mail update; Network and disk performance; Powerdown script
     Thoughts and guidance appreciated. SH
     Here is an excerpt - full syslog attached, plus a syslog taken after the hard power down was initiated.
     May 3 19:43:02 HAL_9000 vsftpd[18661]: connect from 192.168.11.1 (192.168.11.1) (Routine)
     HERE IS WHERE EVERYTHING WENT SIDEWAYS:
     May 3 19:44:20 HAL_9000 kernel: irq 16: nobody cared (try booting with the "irqpoll" option) (Errors)
     May 3 19:44:20 HAL_9000 kernel: Pid: 0, comm: swapper Not tainted 3.1.1-unRAID #1 (Errors)
     May 3 19:44:20 HAL_9000 kernel: Call Trace: (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c104fa8c>] __report_bad_irq+0x1f/0x95 (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c104fc39>] note_interrupt+0x137/0x1a8 (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c104e776>] handle_irq_event_percpu+0xef/0x100 (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c1050152>] ? handle_edge_irq+0xcb/0xcb (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c104e7ab>] handle_irq_event+0x24/0x3b (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c1050152>] ? handle_edge_irq+0xcb/0xcb (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c10501bb>] handle_fasteoi_irq+0x69/0x82 (Errors)
     May 3 19:44:20 HAL_9000 kernel: <IRQ> [<c1003566>] ? do_IRQ+0x37/0x90
     May 3 19:44:20 HAL_9000 kernel: [<c130c669>] ? common_interrupt+0x29/0x30 (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c11ddd89>] ? acpi_idle_enter_bm+0x22a/0x25e (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c12734f0>] ? cpuidle_idle_call+0x75/0xbd (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c1001a5f>] ? cpu_idle+0x39/0x5a (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c12fbd40>] ? rest_init+0x58/0x5a (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c145172d>] ? start_kernel+0x28c/0x291 (Errors)
     May 3 19:44:20 HAL_9000 kernel: [<c14510b0>] ? i386_start_kernel+0xb0/0xb7 (Errors)
     May 3 19:44:20 HAL_9000 kernel: handlers:
     May 3 19:44:20 HAL_9000 kernel: [<f84750c9>] mvs_interrupt
     May 3 19:44:20 HAL_9000 kernel: Disabling IRQ #16
     May 3 19:44:35 HAL_9000 kernel: sas: command 0xdfeb7e40, task 0xf2acf2c0, timed out: BLK_EH_NOT_HANDLED (Drive related)
     May 3 19:44:35 HAL_9000 kernel: sas: command 0xdff5fb40, task 0xdf9b6140, timed out: BLK_EH_NOT_HANDLED (Drive related)
     May 3 19:44:35 HAL_9000 kernel: sas: Enter sas_scsi_recover_host (Drive related)
     May 3 19:44:35 HAL_9000 kernel: sas: trying to find task 0xf2acf2c0 (Drive related)
     May 3 19:44:35 HAL_9000 kernel: sas: sas_scsi_find_task: aborting task 0xf2acf2c0 (Drive related)
     May 3 19:44:35 HAL_9000 kernel: drivers/scsi/mvsas/mv_sas.c 1678:mvs_abort_task() mvi=dfa40000 task=f2acf2c0 slot=dfa516a0 slot_idx=x0 (System)
     May 3 19:44:35 HAL_9000 kernel: sas: sas_scsi_find_task: task 0xf2acf2c0 is aborted (Drive related)
     May 3 19:44:35 HAL_9000 kernel: sas: sas_eh_handle_sas_errors: task 0xf2acf2c0 is aborted (Errors)
     May 3 19:44:35 HAL_9000 kernel: sas: trying to find task 0xdf9b6140 (Drive related)
     May 3 19:44:35 HAL_9000 kernel: sas: sas_scsi_find_task: aborting task 0xdf9b6140 (Drive related)
     May 3 19:44:35 HAL_9000 kernel: drivers/scsi/mvsas/mv_sas.c 1678:mvs_abort_task() mvi=dfa40000 task=df9b6140 slot=dfa516d4 slot_idx=x1 (System)
     May 3 19:44:35 HAL_9000 kernel: sas: sas_scsi_find_task: task 0xdf9b6140 is aborted (Drive related)
     May 3 19:44:35 HAL_9000 kernel: sas: sas_eh_handle_sas_errors: task 0xdf9b6140 is aborted (Errors)
     May 3 19:44:35 HAL_9000 kernel: ata5: sas eh calling libata port error handler (Errors)
     May 3 19:44:35 HAL_9000 kernel: ata6: sas eh calling libata port error handler (Errors)
     May 3 19:44:35 HAL_9000 kernel: sas: --- Exit sas_scsi_recover_host (Drive related)
     May 3 19:44:53 HAL_9000 kernel: sas: command 0xdca7af00, task 0xf2b6fcc0, timed out: BLK_EH_NOT_HANDLED (Drive related)
     May 3 19:44:53 HAL_9000 kernel: sas: command 0xea103300, task 0xdf9b6140, timed out: BLK_EH_NOT_HANDLED (Drive related)
     May 3 19:44:53 HAL_9000 kernel: sas: Enter sas_scsi_recover_host (Drive related)
     May 3 19:44:53 HAL_9000 kernel: sas: trying to find task 0xf2b6fcc0 (Drive related)
     May 3 19:44:53 HAL_9000 kernel: sas: sas_scsi_find_task: aborting task 0xf2b6fcc0 (Drive related)
     May 3 19:44:53 HAL_9000 kernel: drivers/scsi/mvsas/mv_sas.c 1678:mvs_abort_task() mvi=dfa40000 task=f2b6fcc0 slot=dfa516d4 slot_idx=x1 (System)
     Syslog_20120503_@_Powerdown.txt.zip
     Syslog_20120503.txt.zip
  2. I was adding a new drive yesterday. Precleared it, added it to the array, and the array did its format routine, all without issues. I then went to create a user share on the drive (dedicated to TM), but it would not create the share. I had my console on the server and it showed a message (which I forgot to write down or capture) along the lines of "no room on disk /mnt/user/" (not verbatim). It hit me that I needed to run the permissions script, which I did. Then I could assign the user share without issue. Is this normal behavior? Is the sequence of steps documented anywhere? If not, is there an unofficial 5.0 user guide that can be edited? SH
  3. http://lime-technology.com/forum/index.php?topic=19819.0 Since I have adopted patience and waited for the GUI to come back rather than reloading, I have had no issues. From Tom's comment in the 5.0rc1 thread above, GUI performance is a known issue. I will continue using 5.0b14 until rc2 or later is released. I will mark this as resolved for the moderators. Lesson learned: Have patience, good things come to those who wait! SH
  4. Moved to a SuperMicro AOC SASLP MV8 card. 4 days and so far so good. Have marked SOLVED. Will keep monitoring for further issues. Lesson learned: Don't buy a cheap Chinese SiI card - most likely a fake. SH
  5. I found my issue. I went back to my MB to check the BIOS version and settings. I thought I had the latest BIOS based on the January 2012 build date. But on further investigation, found that the BIOS rev I had was not even listed on the ASUS website for my MB (how this BIOS got on the MB is a mystery to me). But after a reflash of the BIOS to the latest, all is good. The lesson learned: Do not assume your BIOS is correct or up to date. Check out all your hardware and BIOS versions first. I was pleased that ASUS had a very easy built-in tool to reflash from the USB. SH
  6. dgaschk, Rob L - thanks. That is the direction I have been going. I had already re-flashed the BIOS on the SiI 3114 card to 5.4.0.3, but this was not too successful. I saw that there is a 5.5.0.0 BIOS version, but I am not sure about its compatibility - any thoughts? Currently, I have pulled my Cache drive from the array and connected the other 6 drives to the SATA ports on the MB to run and stabilize for a bit. I am looking to add another card - any thoughts on LSI 3081E compatibility? (There are limited quality add-on cards here in Shanghai - lots of gaming gear, but no server gear.) Thanks, SH
  7. After another day and another successful parity rebuild, same results. Any idea why the system disables IRQ 17? Thanks, SH
     Apr 22 04:20:34 HAL_9000 kernel: irq 17: nobody cared (try booting with the "irqpoll" option)
     Apr 22 04:20:34 HAL_9000 kernel: Pid: 0, comm: swapper Not tainted 3.1.1-unRAID #1
     Apr 22 04:20:34 HAL_9000 kernel: Call Trace:
     Apr 22 04:20:34 HAL_9000 kernel: [<c104fa8c>] __report_bad_irq+0x1f/0x95
     Apr 22 04:20:34 HAL_9000 kernel: [<c104fc39>] note_interrupt+0x137/0x1a8
     Apr 22 04:20:34 HAL_9000 kernel: [<f8492832>] ? sil_interrupt+0x1d/0x67 [sata_sil]
     Apr 22 04:20:34 HAL_9000 kernel: [<c104e776>] handle_irq_event_percpu+0xef/0x100
     Apr 22 04:20:34 HAL_9000 kernel: [<c1050152>] ? handle_edge_irq+0xcb/0xcb
     Apr 22 04:20:34 HAL_9000 kernel: [<c104e7ab>] handle_irq_event+0x24/0x3b
     Apr 22 04:20:34 HAL_9000 kernel: [<c1050152>] ? handle_edge_irq+0xcb/0xcb
     Apr 22 04:20:34 HAL_9000 kernel: [<c10501bb>] handle_fasteoi_irq+0x69/0x82
     Apr 22 04:20:34 HAL_9000 kernel: <IRQ> [<c1003566>] ? do_IRQ+0x37/0x90
     Apr 22 04:20:34 HAL_9000 kernel: [<c130c669>] ? common_interrupt+0x29/0x30
     Apr 22 04:20:34 HAL_9000 kernel: [<c11ddd89>] ? acpi_idle_enter_bm+0x22a/0x25e
     Apr 22 04:20:34 HAL_9000 kernel: [<c12734f0>] ? cpuidle_idle_call+0x75/0xbd
     Apr 22 04:20:34 HAL_9000 kernel: [<c1001a5f>] ? cpu_idle+0x39/0x5a
     Apr 22 04:20:34 HAL_9000 kernel: [<c12fbd40>] ? rest_init+0x58/0x5a
     Apr 22 04:20:34 HAL_9000 kernel: [<c145172d>] ? start_kernel+0x28c/0x291
     Apr 22 04:20:34 HAL_9000 kernel: [<c14510b0>] ? i386_start_kernel+0xb0/0xb7
     Apr 22 04:20:34 HAL_9000 kernel: handlers:
     Apr 22 04:20:34 HAL_9000 kernel: [<f8492815>] sil_interrupt
     Apr 22 04:20:34 HAL_9000 kernel: Disabling IRQ #17
     Message from syslogd@HAL_9000 at Sun Apr 22 04:20:34 2012 ...
     HAL_9000 kernel: Disabling IRQ #17
     Apr 22 04:21:04 HAL_9000 kernel: ata2: lost interrupt (Status 0x58)
     Apr 22 04:21:04 HAL_9000 kernel: ata2: drained 512 bytes to clear DRQ
     Apr 22 04:21:04 HAL_9000 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
     Apr 22 04:21:04 HAL_9000 kernel: ata2.00: failed command: IDENTIFY DEVICE
     Apr 22 04:21:04 HAL_9000 kernel: ata2.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
     Apr 22 04:21:04 HAL_9000 kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
     Apr 22 04:21:04 HAL_9000 kernel: ata2.00: status: { DRDY }
     Apr 22 04:21:04 HAL_9000 kernel: ata2: hard resetting link
     Apr 22 04:21:05 HAL_9000 kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
     Apr 22 04:21:10 HAL_9000 kernel: ata2.00: qc timeout (cmd 0x27)
     Apr 22 04:21:10 HAL_9000 kernel: ata2.00: failed to read native max address (err_mask=0x4)
     Apr 22 04:21:10 HAL_9000 kernel: ata2.00: HPA support seems broken, skipping HPA handling
     Apr 22 04:21:10 HAL_9000 kernel: ata2.00: revalidation failed (errno=-5)
     syslog_2012422.txt.zip
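On the "why" in the post above: in general, the kernel prints "nobody cared" when interrupts keep arriving on a line and no registered handler claims them, and it then disables that line. Any device sharing the line stops receiving interrupts, which would be consistent with the "lost interrupt" entries that follow. This is a general description, not a diagnosis of this system. To see which devices share a line, `cat /proc/interrupts` lists them. The sketch below pulls the IRQ number and the handler named under "handlers:" out of syslog text; the function name `disabled_irqs` is made up for this example, and it assumes the usual one-entry-per-line syslog format.

```shell
# Sketch: summarize "nobody cared" events from syslog text on stdin.
# Prints one line per event in the form "<irq> <handler>".
disabled_irqs() {
    awk '
        # The handler is logged as "[<address>] name" with nothing after the name.
        # Call-trace lines carry "+0x.../0x..." offsets and so do not match.
        /kernel: \[<[0-9a-f]+>\] [a-z_0-9]+$/ { handler = $NF }
        /kernel: Disabling IRQ #/ {
            irq = $NF
            sub(/^#/, "", irq)
            print irq, handler
            handler = ""
        }
    '
}

# Usage (path as used elsewhere in these posts):
#   disabled_irqs < /var/log/syslog
```

Knowing the handler (here `sil_interrupt`, from the `sata_sil` driver) narrows down which controller was attached to the disabled line.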
  8. I am looking into everything to try to find why things keep going upside-down. I have an ASUS P8Z68-V LX motherboard and was going through the BIOS. There are 3 options for SATA configuration [AHCI Mode default]:
     1. IDE Mode: Set to [IDE Mode] when you want to use the Serial ATA hard disk drives as parallel ATA physical storage devices.
     2. AHCI Mode (default): Set to [AHCI Mode] when you want the SATA hard disk drives to use the AHCI (Advanced Host Controller Interface). AHCI allows the onboard storage driver to enable advanced Serial ATA features that increase storage performance on random workloads by allowing the drive to internally optimize the order of commands.
     3. RAID Mode: Set to [RAID Mode] when you want to create a RAID configuration from the SATA hard disk drives.
     In [IDE Mode], you can set:
     Serial ATA Controller 0 [Enhanced]
       Disabled: Disables the SATA function
       Enhanced: Supports more than four SATA devices
       Compatible: When using Windows 98/NT/2000/MS-DOS
     Serial ATA Controller 1 [Enhanced]
       Disabled: Disables the SATA function
       Enhanced: Supports more than four SATA devices
     I also have a PCI 4-port SATA card based on the SiI 3114 chipset. A question for the community on the SATA BIOS settings: at present they are set to AHCI Mode, which was the BIOS default. Should they be set to something else, i.e. IDE mode? Advice appreciated. SH
  9. After all my webGUI issues (http://lime-technology.com/forum/index.php?topic=19560.0), I reformatted my flash and started over with 5.0b14. All was running well, rebuilding parity, all complete and happy ... then:
     Apr 21 12:11:58 HAL_9000 kernel: md: sync done. time=56692sec
     Apr 21 12:11:58 HAL_9000 kernel: md: recovery thread sync completion status: 0
     Apr 21 12:34:55 HAL_9000 kernel: irq 17: nobody cared (try booting with the "irqpoll" option)
     Apr 21 12:34:55 HAL_9000 kernel: Pid: 0, comm: swapper Not tainted 3.1.1-unRAID #1
     Apr 21 12:34:55 HAL_9000 kernel: Call Trace:
     Apr 21 12:34:55 HAL_9000 kernel: [<c104fa8c>] __report_bad_irq+0x1f/0x95
     Apr 21 12:34:55 HAL_9000 kernel: [<c104fc39>] note_interrupt+0x137/0x1a8
     Apr 21 12:34:55 HAL_9000 kernel: [<f8492832>] ? sil_interrupt+0x1d/0x67 [sata_sil]
     Apr 21 12:34:55 HAL_9000 kernel: [<c104e776>] handle_irq_event_percpu+0xef/0x100
     Apr 21 12:34:55 HAL_9000 kernel: [<c1050152>] ? handle_edge_irq+0xcb/0xcb
     Apr 21 12:34:55 HAL_9000 kernel: [<c104e7ab>] handle_irq_event+0x24/0x3b
     Apr 21 12:34:55 HAL_9000 kernel: [<c1050152>] ? handle_edge_irq+0xcb/0xcb
     Apr 21 12:34:55 HAL_9000 kernel: [<c10501bb>] handle_fasteoi_irq+0x69/0x82
     Apr 21 12:34:55 HAL_9000 kernel: <IRQ> [<c1003566>] ? do_IRQ+0x37/0x90
     Apr 21 12:34:55 HAL_9000 kernel: [<c130c669>] ? common_interrupt+0x29/0x30
     Apr 21 12:34:55 HAL_9000 kernel: [<c11ddd89>] ? acpi_idle_enter_bm+0x22a/0x25e
     Apr 21 12:34:55 HAL_9000 kernel: [<c12734f0>] ? cpuidle_idle_call+0x75/0xbd
     Apr 21 12:34:55 HAL_9000 kernel: [<c1001a5f>] ? cpu_idle+0x39/0x5a
     Apr 21 12:34:55 HAL_9000 kernel: [<c12fbd40>] ? rest_init+0x58/0x5a
     Apr 21 12:34:55 HAL_9000 kernel: [<c145172d>] ? start_kernel+0x28c/0x291
     Apr 21 12:34:55 HAL_9000 kernel: [<c14510b0>] ? i386_start_kernel+0xb0/0xb7
     Apr 21 12:34:55 HAL_9000 kernel: handlers:
     Apr 21 12:34:55 HAL_9000 kernel: [<f8492815>] sil_interrupt
     Apr 21 12:34:55 HAL_9000 kernel: Disabling IRQ #17
     Apr 21 12:35:10 HAL_9000 kernel: ata2: lost interrupt (Status 0x50)
     The system can be accessed via telnet or the console, but not via the web (unRaid or unMenu, even after restarting both). AFP is offline; I can still get to the flash drive via SMB. Full syslog attached. Help appreciated. SH
     syslog.txt.zip
  10. I have reformatted my USB and started clean with 5.0b14 - still the same issue: the webGUI hangs, and then I cannot get it back. Again, I ran the following to restart:
      root@HAL_9000:~# killall emhttp
      root@HAL_9000:~# nohup /usr/local/sbin/emhttp &
      [1] 6930
      root@HAL_9000:~# nohup: ignoring input and appending output to `nohup.out'
      [1]+ Segmentation fault nohup /usr/local/sbin/emhttp
      root@HAL_9000:~#
      A bit of the syslog:
      Apr 20 19:24:44 HAL_9000 unmenu-status: Starting unmenu web-server
      Apr 20 19:25:11 HAL_9000 vsftpd[6291]: connect from 192.168.11.1 (192.168.11.1) (Routine)
      Apr 20 19:27:07 HAL_9000 emhttp: unRAID System Management Utility version 5.0-beta14 (Lime Tech)
      Apr 20 19:27:07 HAL_9000 emhttp: Copyright © 2005-2011, Lime Technology, LLC (Lime Tech)
      Apr 20 19:27:07 HAL_9000 emhttp: Pro key detected, GUID: 054C-05B8-2211-108174003136 (Other emhttp)
      Apr 20 19:27:07 HAL_9000 emhttp: get_config_idx: fopen /boot/config/flash.cfg: No such file or directory - assigning defaults (Other emhttp)
      Apr 20 19:27:07 HAL_9000 emhttp: rdevName.22 not found (Other emhttp)
      Apr 20 19:27:07 HAL_9000 emhttp: diskFsStatus.1 not found (Other emhttp)
      Apr 20 19:27:07 HAL_9000 kernel: emhttp[6861]: segfault at 0 ip b748a760 sp bf9fde40 error 4 in libc-2.11.1.so[b7411000+15c000] (Errors)
      Apr 20 19:28:55 HAL_9000 vsftpd[6928]: connect from 192.168.11.1 (192.168.11.1) (Routine)
      Apr 20 19:28:55 HAL_9000 emhttp: unRAID System Management Utility version 5.0-beta14 (Lime Tech)
      Apr 20 19:28:55 HAL_9000 emhttp: Copyright © 2005-2011, Lime Technology, LLC (Lime Tech)
      Apr 20 19:28:55 HAL_9000 emhttp: Pro key detected, GUID: 054C-05B8-2211-108174003136 (Other emhttp)
      Apr 20 19:28:55 HAL_9000 emhttp: get_config_idx: fopen /boot/config/flash.cfg: No such file or directory - assigning defaults (Other emhttp)
      Apr 20 19:28:55 HAL_9000 emhttp: rdevName.22 not found (Other emhttp)
      Apr 20 19:28:55 HAL_9000 emhttp: diskFsStatus.1 not found (Other emhttp)
      Apr 20 19:28:55 HAL_9000 kernel: emhttp[6930]: segfault at 0 ip b74a1760 sp bfcf88b0 error 4 in libc-2.11.1.so[b7428000+15c000] (Errors)
      Apr 20 19:29:51 HAL_9000 vsftpd[6942]: connect from 192.168.11.1 (192.168.11.1) (Routine)
      The menu is stopped, but will not restart. Any other ideas? Thanks,
      syslog-2012-04-20.txt.zip
  11. Log in as root. No password. You can set all that in the GUI.
  12. Yes, in the case I posted, I did type the wrong directory. But even when drilling into the directory, I still get a segmentation fault:
      root@HAL_9000:/usr/local/sbin# emhttp
      Segmentation fault
      root@HAL_9000:/usr/local/sbin#
      Now, here is a very stupid question, as I have no Linux experience: where does the /usr directory live? Is it on a disk or in virtual memory? It seems I could just recopy the file if it lived on a physical disk. (Guess I will have to reformat the flash.)
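One way to approach the question above from the command line is to ask which filesystem backs a given path. A minimal sketch, assuming a `df` that supports the POSIX `-P` flag; the function name `backing_fs` is made up for this example. On unRAID the flash drive is mounted at /boot (the syslog in these posts refers to /boot/config), while the operating system itself is generally loaded into RAM at boot, so /usr would be expected to report a RAM-based root filesystem rather than a /dev/sd* device. Treat that as an expectation to verify, not a confirmed result for this system.

```shell
# Sketch: report which filesystem holds a path.
backing_fs() {
    # df -P prints a header line, then one line for the filesystem holding the path.
    # The first field is the filesystem, the last field is its mount point.
    df -P "$1" | awk 'NR == 2 { print $1, "mounted on", $NF }'
}

# Usage:
#   backing_fs /usr
#   backing_fs /boot
```

If /usr does turn out to be RAM-backed, a file there cannot be repaired in place across reboots; it is recreated from the flash contents at each boot, which fits the poster's conclusion that the flash is the thing to rebuild.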
  13. If you are using 5.0b14, did you run the permissions script on the Utilities tab? I had the same problem and this cleared it up for my Mac environment.
  14. Back in business - all green balls. There is an interesting message at the top of unMenu: "STARTED, 6 disks in array. Parity is Valid:. Last parity check 15446 days ago with no sync errors." Does the community have any thoughts on the cause, and on how I can avoid these issues again? Thanks in advance.
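The 15446-day figure is worth a quick arithmetic check. Dividing a Unix timestamp by 86400 gives the number of whole days since 1 January 1970. Assuming the message was read on 16 April 2012 (an assumption; that date comes from the SMART report posted around the same time), the result is exactly 15446, which would be consistent with a last-check timestamp of zero, i.e. never recorded, rather than a real check 42 years ago. The helper name `days_since_epoch` is made up for this sketch.

```shell
# Sketch: whole days between the Unix epoch and a given timestamp.
days_since_epoch() {
    # $1: Unix timestamp in seconds; integer division drops the partial day.
    echo $(( $1 / 86400 ))
}

# 1334534400 is 2012-04-16 00:00:00 UTC.
#   days_since_epoch 1334534400    # prints 15446
```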
  15. === START OF INFORMATION SECTION ===
      Device Model: ST2000DL001-9VT156
      Serial Number: 5YD0WX92
      Firmware Version: CC96
      User Capacity: 2,000,398,934,016 bytes
      Device is: Not in smartctl database [for details use: -P showall]
      ATA Version is: 8
      ATA Standard is: ATA-8-ACS revision 4
      Local Time is: Mon Apr 16 08:47:35 2012 CST
      SMART support is: Available - device has SMART capability.
      SMART support is: Enabled

      === START OF READ SMART DATA SECTION ===
      SMART overall-health self-assessment test result: PASSED
      See vendor-specific Attribute list for marginal Attributes.

      General SMART Values:
      Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
      Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
      Total time to complete Offline data collection: ( 612) seconds.
      Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
      SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
      Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
      Short self-test routine recommended polling time: ( 1) minutes.
      Extended self-test routine recommended polling time: ( 255) minutes.
      Conveyance self-test routine recommended polling time: ( 2) minutes.
      SCT capabilities: (0x103b) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

      SMART Attributes Data Structure revision number: 10
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
        1 Raw_Read_Error_Rate     0x000f   128   100   006    Pre-fail  Always       -       1766439400
        3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
        4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       294
        5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
        7 Seek_Error_Rate         0x000f   071   069   030    Pre-fail  Always       -       14267327
        9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1790
       10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
       12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       71
      183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
      184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
      187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
      188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       9
      189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
      190 Airflow_Temperature_Cel 0x0022   072   034   045    Old_age   Always   In_the_past 28 (1 27 28 28)
      191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
      192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       19
      193 Load_Cycle_Count        0x0032   096   096   000    Old_age   Always       -       8433
      194 Temperature_Celsius     0x0022   028   066   000    Old_age   Always       -       28 (0 22 0 0)
      195 Hardware_ECC_Recovered  0x001a   128   100   000    Old_age   Always       -       1766439400
      197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
      198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
      199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       19
      240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       175728586917572
      241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1759028975
      242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1928648207

      SMART Error Log Version: 1
      No Errors Logged

      SMART Self-test log structure revision number 1
      No self-tests have been logged. [To run self-tests, use: smartctl -t]
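A report this long is easier to scan if only a handful of attributes are pulled out. The sketch below filters an attribute table like the one above and prints the attribute name with its raw value. The function name `smart_watchlist` and the choice of four attributes are assumptions made for this example, not a recommendation from the thread. It assumes the ten-column layout shown above, with the raw value in the tenth column. `UDMA_CRC_Error_Count` is commonly read as a count of transfer errors on the link between drive and controller, which is why it is included alongside the sector counters.

```shell
# Sketch: print "NAME RAW_VALUE" for a few attributes from smartctl
# attribute-table text on stdin.
smart_watchlist() {
    awk '
        $2 == "Reallocated_Sector_Ct"  ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable"  ||
        $2 == "UDMA_CRC_Error_Count"   { print $2, $10 }
    '
}

# Usage (device name is a placeholder):
#   smartctl -A /dev/sdX | smart_watchlist
```

Run against the report above, this would surface the three sector counters at 0 and the CRC error count at 19.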
  16. I woke this morning to a system in a coma. Running 5.0b14. Background: yesterday I lost power, and on restart the system said parity was not valid, so I started a parity check. That seems to have completed at midnight. The mover started at 3:40; along the way:
      Message from syslogd@HAL_9000 at Mon Apr 16 03:52:24 2012 ...
      HAL_9000 kernel: Disabling IRQ #17
      The mover kept on working till all hell broke loose:
      Apr 16 03:52:59 HAL_9000 kernel: ata1: lost interrupt (Status 0x50)
      Apr 16 03:52:59 HAL_9000 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
      Apr 16 03:52:59 HAL_9000 kernel: ata1.00: failed command: READ DMA
      Apr 16 03:52:59 HAL_9000 kernel: ata1.00: cmd c8/00:08:a8:4a:00/00:00:00:00:00/e0 tag 0 dma 4096 in
      Apr 16 03:52:59 HAL_9000 kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
      Apr 16 03:52:59 HAL_9000 kernel: ata1.00: status: { DRDY }
      Apr 16 03:52:59 HAL_9000 kernel: ata1: hard resetting link
      Apr 16 03:52:59 HAL_9000 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
      Apr 16 03:53:04 HAL_9000 kernel: ata1.00: qc timeout (cmd 0x27)
      Apr 16 03:53:04 HAL_9000 kernel: ata1.00: failed to read native max address (err_mask=0x4)
      Apr 16 03:53:04 HAL_9000 kernel: ata1.00: HPA support seems broken, skipping HPA handling
      Apr 16 03:53:04 HAL_9000 kernel: ata1.00: revalidation failed (errno=-5)
      Apr 16 03:53:04 HAL_9000 kernel: ata1: hard resetting link
      Apr 16 03:53:05 HAL_9000 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
      Apr 16 03:53:05 HAL_9000 kernel: ata1.00: configured for UDMA/100
      Apr 16 03:53:05 HAL_9000 kernel: ata1.00: device reported invalid CHS sector 0
      Apr 16 03:53:05 HAL_9000 kernel: ata1: EH complete
      Full syslog attached. I have not done anything to the system. I do have command line access and restarted unMenu (which freezes up) to see this message: "STARTED, 6 disks in array. PARITY NOT VALID: DISK_DSBL". I would like recommendations on how to gracefully bring the system down at the command line (I can spell Linux, but that is about it), and of course on how to bring it back up to diagnose the issues. Thanks in advance.
      System:
      ASUS P8Z68-V LX motherboard with 6 onboard SATA; Intel i5-2300, 2.8GHz; 2-2G RAM
      Additional SATA card (4 port) - no-name Chinese card
      Antec 750W PSU
      5- 2TB data, 1- 2TB Parity, 1- 500G Cache - all Seagate drives
      Antec 300 case
      Sony 4GB flash drive
      Installed: unMenu
      Additional Packages: "C" Compiler; Mail & ssmtp; UnRaid-Web
      syslog_20120416.txt.zip
  17. Joe, I am running all 2TB Seagate drives: 1 parity, 5 data; plus 1 500G Seagate cache. The PSU I am running is as follows (cut and paste from the spec sheet):
      Antec HCG-750 HIGH CURRENT GAMER SERIES
      SPECIFICATIONS:
      • 750 watts Continuous Power
      • NVIDIA® SLI®-Ready certified, ATI CrossFire™ certified
      • 80 PLUS® Bronze certified – up to 88% efficient
      • Quad High Current +12V rails
      • Quiet 135mm double ball bearing cooling fan
      • Four gold-plated 8-pin (6+2)-pin PCI-E connectors for multiple-graphics card configurations
      • All Japanese-brand capacitors for reliability
      • Gold-plated High Current terminals for optimal conductivity
      • Universal Input – works on any 100V-240V grid
      • Active PFC with PF: 0.99
      • MTBF: 100,000 hours
      • Meets ErP Lot 6: 2010 requirement: 5Vsb < 1W
      • AQ5 Antec Quality 5-year limited warranty on parts and labor
      INPUT:
      Input Voltage: 100 ~ 240 Vac ± 10%
      Input Frequency Range: 47 Hz ~ 63 Hz
      Efficiency: Up to 88%
      Operating Temperature: 0°C – 50°C
      OUTPUT (Max Load):
      +3.3V 25A; +5V 25A; +12V1 40A; +12V2 40A; +12V3 40A; +12V4 40A; -12V 0.5A; +5Vsb 3A
      Everything starts up OK. It is only at shut down, or when spinning down the disks - always when the last disk is spun down - that it's like the reset button is hit. The system goes "black" and restarts.
  18. I don't have another PSU - will have to get one. It seems counter-intuitive that the PSU would drop out and reboot when the drives spin down (lowest power usage). It always happens when the last drive is given the command to spin down - I can spin down the drives one by one using the Disk Management tab in unMenu. I have checked the MB BIOS and turned off all energy management, thinking that the MB goes into some sleep mode when the drives go down. I have set the disk spin-down delay to "never" at this point, which is not ideal for energy consumption (though there are different schools of thought on whether running continuously or spinning down is better for drive life). I am at a loss for where to turn next.
  19. Did that; it found some orphaned clusters and repaired them. The GUI works OK, but seems a bit intermittent. I am still getting the same series of messages on my terminal even though the GUI is working. Is there a way to reload the code for it?
  20. Checked the cables, swapped out a couple for good measure. Spun down the drives one by one. Once the last drive spin down command is given - BOOM, the whole system powers down and reboots. I have tried spinning down in different orders, same result. Every time the command to spin the last drive down is given, the system drops power and resets. Also - time to rename this thread. I am at a complete loss.
      Apr 14 16:52:05 HAL_9000 login[4810]: ROOT LOGIN on '/dev/pts/0' from '192.168.11.10'
      Apr 14 16:52:43 HAL_9000 kernel: NTFS driver 2.1.30 [Flags: R/W MODULE].
      Apr 14 16:53:24 HAL_9000 kernel: mdcmd (39): spindown 5
      Apr 14 16:53:24 HAL_9000 kernel:
      Apr 14 16:53:34 HAL_9000 kernel: mdcmd (40): spindown 4
      Apr 14 16:53:34 HAL_9000 kernel:
      Apr 14 16:53:41 HAL_9000 kernel: mdcmd (41): spindown 3
      Apr 14 16:53:41 HAL_9000 kernel:
      Apr 14 16:53:47 HAL_9000 kernel: mdcmd (42): spindown 2
      Apr 14 16:53:47 HAL_9000 kernel:
      Apr 14 16:53:52 HAL_9000 kernel: mdcmd (43): spindown 1
      Apr 14 16:53:52 HAL_9000 kernel:
      Apr 14 16:53:58 HAL_9000 kernel: mdcmd (44): spindown 0
      Apr 14 16:53:58 HAL_9000 kernel:
  21. A couple of my own learnings:
      1. Using the power button does not properly shut the array down and unmount the disks. unRaid will run a parity check upon restart.
      2. If the menus are not running, telnet into your system and run the following at the command line to restart the unRaid webGUI:
         killall emhttp
         nohup /usr/local/sbin/emhttp &
      3. For unMenu:
         cd /boot/unmenu
         uu
      Hope it helps, as I have had the same problems at times. SH
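The webGUI restart in step 2 can be wrapped so that a mistyped path is caught before the running process is killed; the next post in this list shows `/user/local/sbin/emhttp` failing with "No such file or directory" after `killall` had already run. A minimal sketch: the function name `restart_emhttp` and its optional path argument are inventions for this example, while the two commands inside are the ones from the post.

```shell
# Sketch: restart emhttp only if the binary is actually there.
restart_emhttp() {
    # Default path is the one from the post; $1 overrides it (hypothetical option).
    bin="${1:-/usr/local/sbin/emhttp}"
    if [ ! -x "$bin" ]; then
        echo "not found or not executable: $bin" >&2
        return 1
    fi
    killall emhttp 2>/dev/null    # stop the running instance, if any
    nohup "$bin" &                # relaunch detached, as in the post
}

# Usage:
#   restart_emhttp
```

The check guards against a wrong path only; it does nothing for the segmentation fault reported elsewhere in these posts, where the binary exists but crashes on start.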
  22. I have been having issues with the 5.0b14 webGUI. When I restarted my system, the GUI was not accessible. On the terminal I have on the server, I had a series of errors like this:
      PHP Notice: Undefined index: Icon in user/local/emhttp/plugins/webGUI/template.php on line 128
      All of them are PHP Notices pointing at the template.php file, not only for Icon but also for Title, etc., and at different lines of template.php. I tried to restart the webGUI - see the command line below:
      root@HAL_9000:/# killall emhttp
      root@HAL_9000:/# nohup /user/local/sbin/emhttp &
      [1] 8014
      root@HAL_9000:/# nohup: ignoring input and appending output to `nohup.out'
      nohup: failed to run command `/user/local/sbin/emhttp': No such file or directory
      [1]+ Exit 127 nohup /user/local/sbin/emhttp
      root@HAL_9000:/#
      Syslog below:
      root@HAL_9000:~# tail -f /var/log/syslog
      Apr 14 11:27:25 HAL_9000 unmenu-status: Starting unmenu web-server
      Apr 14 11:28:23 HAL_9000 emhttp: unRAID System Management Utility version 5.0-beta14
      Apr 14 11:28:23 HAL_9000 emhttp: Copyright © 2005-2011, Lime Technology, LLC
      Apr 14 11:28:23 HAL_9000 emhttp: Pro key detected, GUID:xxxx
      Apr 14 11:28:23 HAL_9000 emhttp: rdevName.22 not found
      Apr 14 11:28:23 HAL_9000 emhttp: diskFsStatus.1 not found
      Apr 14 11:28:23 HAL_9000 kernel: emhttp[5680]: segfault at 0 ip b754f760 sp bfb21270 error 4 in libc-2.11.1.so[b74d6000+15c000]
      Apr 14 11:31:53 HAL_9000 in.telnetd[6237]: connect from 192.168.11.10 (192.168.11.10)
      Apr 14 11:31:54 HAL_9000 login[6238]: invalid password for 'UNKNOWN' on '/dev/pts/1' from '192.168.11.10'
      Apr 14 11:31:56 HAL_9000 login[6238]: ROOT LOGIN on '/dev/pts/1' from '192.168.11.10'
      Apr 14 11:45:05 HAL_9000 login[3728]: ROOT LOGIN on '/dev/tty1'
      Apr 14 11:45:52 HAL_9000 unmenu-status: Starting unmenu web-server
      Apr 14 11:46:48 HAL_9000 emhttp: unRAID System Management Utility version 5.0-beta14
      Apr 14 11:46:48 HAL_9000 emhttp: Copyright © 2005-2011, Lime Technology, LLC
      Apr 14 11:46:48 HAL_9000 emhttp: Pro key detected, GUID:
      Apr 14 11:46:48 HAL_9000 emhttp: rdevName.22 not found
      Apr 14 11:46:48 HAL_9000 emhttp: diskFsStatus.1 not found
      Apr 14 11:46:48 HAL_9000 kernel: emhttp[8234]: segfault at 0 ip b764a760 sp bfa82ba0 error 4 in libc-2.11.1.so[b75d1000+15c000]
      Attached is the full syslog. Aside from restarting, which has not seemed to help, any advice?
      syslog-2012-04-14-2.txt
  23. OK - I think I found the source of my problem. All had been running well on 5.0b14 until the disks started to spin down. In the log from last night/this morning, the disks were spinning down when out of use. I was using a couple of disks until 11:30 PM, thus the 3:30 AM spin down (4 hour lag). The system reset itself. This morning I tried a manual spin down, thinking that the system was resetting for some reason when all the disks spin down - and yes, it did. Also, if I use the power down command in unMenu/user scripts, the system powers down cleanly but automatically restarts. Now the question is why? Is it a BIOS issue, or is it something else? I am not sure why the power supply would fall over when the disks are spinning down. Thoughts?
      SYSLOG from early this AM:
      Apr 13 22:59:33 HAL_9000 unmenu-status: Starting unmenu web-server
      Apr 13 23:56:58 HAL_9000 kernel: mdcmd (52): spindown 1
      Apr 13 23:56:59 HAL_9000 kernel: mdcmd (53): spindown 4
      Apr 14 02:14:14 HAL_9000 kernel: mdcmd (54): spindown 3
      Apr 14 03:26:31 HAL_9000 kernel: mdcmd (55): spindown 0
      Apr 14 03:31:53 HAL_9000 kernel: mdcmd (56): spindown 2
      SYSTEM reset and went into a parity check upon start up:
      root@HAL_9000:~# tail -f /var/log/syslog
      Apr 14 04:40:01 HAL_9000 syslogd 1.4.1: restart.
      Apr 14 10:06:28 HAL_9000 in.telnetd[12875]: connect from 192.168.11.10 (192.168.11.10)
      Apr 14 10:06:30 HAL_9000 login[12876]: invalid password for 'UNKNOWN' on '/dev/pts/0' from '192.168.11.10'
      Apr 14 10:06:32 HAL_9000 login[12876]: ROOT LOGIN on '/dev/pts/0' from '192.168.11.10'
      Apr 14 10:07:21 HAL_9000 kernel: md: sync done. time=23674sec
      Apr 14 10:07:23 HAL_9000 kernel: md: recovery thread sync completion status: 0
      Apr 14 10:07:32 HAL_9000 emhttp: Spinning down all drives...
      Apr 14 10:07:32 HAL_9000 kernel: mdcmd (39): spindown 0
      Apr 14 10:07:33 HAL_9000 kernel: mdcmd (40): spindown 1
      Apr 14 10:07:34 HAL_9000 kernel: mdcmd (41): spindown 2
      Apr 14 10:07:35 HAL_9000 kernel: mdcmd (42): spindown 3
      Apr 14 10:07:36 HAL_9000 kernel: mdcmd (43): spindown 4
      Apr 14 10:07:37 HAL_9000 kernel: mdcmd (44): spindown 5
      Apr 14 10:07:38 HAL_9000 emhttp: shcmd (59): /usr/sbin/hdparm -y /dev/sdb &> /dev/null
      Apr 14 10:07:38 HAL_9000 kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
      Apr 14 10:07:38 HAL_9000 kernel: ata5.00: irq_stat 0x00400040, connection status changed
      Apr 14 10:07:38 HAL_9000 kernel: ata5: SError: { PHYRdyChg 10B8B DevExch }
      Apr 14 10:07:38 HAL_9000 kernel: ata5.00: failed command: STANDBY IMMEDIATE
      Apr 14 10:07:38 HAL_9000 kernel: ata5.00: cmd e0/00:00:00:00:00/00:00:00:00:00/40 tag 0
      Apr 14 10:07:38 HAL_9000 kernel: res 50/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
      Apr 14 10:07:38 HAL_9000 kernel: ata5.00: status: { DRDY }
      Apr 14 10:07:38 HAL_9000 kernel: ata5: hard resetting link
      This was a manual spin down via the GUI - after the disks were all down, the system reset.
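Since the resets in these posts are tied to the order in which disks spin down, it can help to list just the spin-down events with their times. A minimal sketch that reads syslog text and prints the time and disk number for each `mdcmd ... spindown` entry. The function name `spindown_sequence` is made up for this example, and it assumes the field layout seen in the excerpts above (month, day, time, host, `kernel:`, `mdcmd`, counter, `spindown`, disk number).

```shell
# Sketch: list spin-down events from syslog text on stdin as "<time> disk <n>".
spindown_sequence() {
    awk '$6 == "mdcmd" && $8 == "spindown" { print $3, "disk", $9 }'
}

# Usage:
#   spindown_sequence < /var/log/syslog
```

Comparing the final line of this output with the timestamp of the reset shows whether the last spin-down and the reset really coincide each time.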
  24. This is probably a very basic question, but... Now having migrated to 5.0b14 from 4.7, I have a question on using AFP v. SMB. I am running an all Mac environment. My original shares were created and shared via SMB in 4.7. Is there any reason to continue to use SMB, or can I just start exporting AFP and stop exporting SMB? Thanks,