July 13, 2011

Hi gang,

I'm pretty sure I've had a drive die, as it is now "Red", though theoretically it will still spin up. My question is how to verify this. I tried the unMenu commands (checking SMART status, hdparm, etc.), and now unMenu fails to load at all. All I get is a Windows "Diagnose Connection Problem" dialog, which states "the remote device or resource won't accept the connection".

I discovered the problem while trying to access one of my accounting programs, so it is imperative that I get the system back up. I tried to obtain the syslog from unMenu but got the error above. The last parity check was about 50 days ago, without any errors or sync problems.

Replacing the drive is not a problem (aside from getting a new one installed), but I want to make sure I'm not seeing a cascading problem. Direct access to Tower via the unRaid menu is still available, which is where I discovered the "Red" disk #2 error.

Suggestions, please?

Thanks,
Dave
July 14, 2011

If direct access is available, then you should be able to get the syslog. You don't need unMenu for that. Did you try this link? http://tower/log/syslog

Can you ping the tower or telnet to it from your system?

Also, see this.

Shawn
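For readers who would rather script the reachability check Shawn suggests, here is a minimal sketch in Python. It is not a true ping (that uses ICMP); it only tests whether TCP ports accept a connection. The port numbers (80 for the web page, 23 for telnet, 445 for Windows shares) and the names `port_open`, `check_server`, and `DEFAULT_PORTS` are assumptions made for illustration, not something stated in the thread.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

def check_server(host, ports):
    """Map each service name to whether its TCP port accepted a connection."""
    return {name: port_open(host, port) for name, port in ports.items()}

# Assumed ports: web management page, telnet, and Windows (SMB) shares.
DEFAULT_PORTS = {"web": 80, "telnet": 23, "smb": 445}
```

Calling `check_server("tower", DEFAULT_PORTS)` (or using the server's IP address in place of the name) would show which services still answer. If the web port answers but the SMB port does not, that would be consistent with a management page that loads while the shares are unavailable.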
July 15, 2011 Author

Hello Shawn,

Thanks, I just tried that and got the following:

Jul 15 04:40:01 Tower syslogd 1.4.1: restart.
Jul 15 04:40:07 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:40:07 Tower kernel: mdcmd (31765): spindown 2
Jul 15 04:40:07 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:40:17 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:40:17 Tower kernel: mdcmd (31766): spindown 2
Jul 15 04:40:17 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:40:27 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:40:27 Tower kernel: mdcmd (31767): spindown 2
Jul 15 04:40:27 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:40:37 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:40:37 Tower kernel: mdcmd (31768): spindown 2
Jul 15 04:40:37 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:40:47 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:40:47 Tower kernel: mdcmd (31769): spindown 2
Jul 15 04:40:47 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:40:57 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:40:57 Tower kernel: mdcmd (31770): spindown 2
Jul 15 04:40:57 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:41:07 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:41:07 Tower kernel: mdcmd (31771): spindown 2
Jul 15 04:41:07 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:41:17 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:41:17 Tower kernel: mdcmd (31772): spindown 2
Jul 15 04:41:17 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:41:27 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:41:27 Tower kernel: mdcmd (31773): spindown 2
Jul 15 04:41:27 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:41:37 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:41:37 Tower kernel: mdcmd (31774): spindown 2
Jul 15 04:41:37 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:41:47 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:41:47 Tower kernel: mdcmd (31775): spindown 2
Jul 15 04:41:47 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:41:57 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:41:57 Tower kernel: mdcmd (31776): spindown 2
Jul 15 04:41:57 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:42:07 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:42:07 Tower kernel: mdcmd (31777): spindown 2
Jul 15 04:42:07 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
Jul 15 04:42:17 Tower emhttp: mdcmd: write: Input/output error
Jul 15 04:42:17 Tower kernel: mdcmd (31778): spindown 2
etc.

There is nothing before today. When I go to "tower" (I have a fixed IP, 10.0.0.50), I can see the Main page and it performs "refresh" etc. But now all of my shares are "x'd out" on the network.

Help!

Dave
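The log above repeats the same three lines every ten seconds, so a quick tally makes the pattern easier to see. The following is a minimal sketch, assuming the syslog has been saved as plain text with one entry per line; the function name `count_ioctl_errors` and the regular expression are illustrative and only match the kernel line format shown in this post. The `-5` in those lines is the negative of errno 5 (EIO), which is the same "Input/output error" that emhttp reports.

```python
import errno
import re
from collections import Counter

# Matches kernel lines such as:
# "Jul 15 04:40:07 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5"
IOCTL_ERROR = re.compile(r"kernel: md: (disk\d+): (\w+) ioctl error: (-?\d+)")

def count_ioctl_errors(lines):
    """Count ioctl errors per (disk, operation, error code) in syslog lines."""
    counts = Counter()
    for line in lines:
        match = IOCTL_ERROR.search(line)
        if match:
            disk, operation, code = match.group(1), match.group(2), int(match.group(3))
            counts[(disk, operation, code)] += 1
    return counts

def describe(code):
    """Translate a negative kernel error code into its errno name, if known."""
    return errno.errorcode.get(-code, "unknown")
```

For the excerpt above, every matching line falls under the single key `("disk2", "ATA_OP_STANDBYNOW1", -5)`, meaning only disk2 is rejecting the spin-down command, and `describe(-5)` gives `"EIO"`.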
July 16, 2011 Author

Hi all,

Quick follow-up to the syslog post this morning. I just looked at my drive: it's my first (original) 500GB IDE drive, ST3500841A_3PM03VX2, and is listed as hda. I've just looked at NewEgg and Fry's, and the largest IDE drive available is 320GB.

Question: can my parity drive rebuild my drive if I change to a SATA drive? How would I replace hda as the Sata would become sdc (I have a Parity Sata (listed as sda) & 1 additional Sata drive (sdc))? That is a total of 6 IDE drives and 1 SATA drive (plus parity).

Dave :'(
July 16, 2011

Just to further what dgaschk said, those drive assignments (sda, sdb, etc.) can actually change from boot to boot. All you need to do is add your new SATA drive and assign it to the slot that the old, failed drive was assigned to. Then unRaid will rebuild. unRaid doesn't care whether it is IDE or SATA, just that the same serial-numbered drive is always in the same slot (connected to the same port).
July 16, 2011 Author

Thank you both!

To clarify: I still need a drive that is as "big or bigger" than the failed drive, and I need to add it to the slot that is currently identified as my failed drive in the array. I really can't afford to mess up that data, so I want to verify the process, since this is really the first drive failure I've had since I created my unRaid in 2006. I appreciate all the help everyone has offered.

Dave

PS - When I try to see the unRaid, none of my "drive shares" are available. I also cannot access any individual disk or the flash, even though I can get to the main Lime Technology screen, etc. My Windows program is now reporting "Your computer appears to be correctly configured, but the device or resource (tower) is not responding". Is this a "bigger problem" than just a dead drive???
July 17, 2011

Quote: I still need a drive that is as "big or bigger" than the failed drive and I need to add it to the slot that is currently identified as my failed drive in the array

Correct.

Quote: PS - When I try to see the unRaid, none of my "drive shares" are available. I also cannot access any individual disk or the flash, even though I can get to the main Lime Technology screen/etc. My windows program is now reporting "Your computer appears to be correctly configured, but the device or resource (tower) is not responding". Is this a "bigger problem" than just a dead drive???

That is a bigger issue. Even with a single failed drive, your shares should still be available. You should post a syslog.

Shawn
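The "as big or bigger" rule confirmed above is easy to check before buying a drive. Here is a minimal sketch, with two assumptions labeled plainly: the upper bound (a data drive no larger than the parity drive) is the general unRAID rule rather than something stated in this thread, and the sizes are the figures from the Main screen listing Dave posts in this thread, in whatever unit that screen uses. The function name `replacement_ok` is hypothetical.

```python
def replacement_ok(new_size, failed_size, parity_size):
    """Return True if a replacement data drive is an acceptable size.

    Assumed rule: at least as large as the failed drive (confirmed in the
    thread) and no larger than the parity drive (general unRAID rule, an
    assumption here). All sizes must be in the same unit.
    """
    return failed_size <= new_size <= parity_size

# Sizes as listed on the Main screen in this thread.
FAILED_DISK2 = 488_386_552
PARITY = 1_465_138_552
```

For example, `replacement_ok(976_762_552, FAILED_DISK2, PARITY)` is true for a drive the size of disk1, while a drive the size of disk3 (293,057,320) would be rejected as too small.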
July 17, 2011 Author

Hi Shawn,

I've tried your command to update my syslog, but I have not been able to access the flash in order to get one to post. I sent a note under "Replace" (separate post) to Joe L. The only syslog that I've been able to get (partially) was the one above, from before I shut the unRaid down in order to remove the "bad" drive. For whatever reason, the array wants to start without the old drive and/or the new drive inserted in the old drive's "slot". I "stopped" the array over 5 hours ago, and this is still what I get on the LimeTech Main screen:

LIMEtechnology
Server name: Tower
Comments: Media server
unRAID Server Pro
Main | Users | Shares | Settings | Devices
version: 4.7

Disk status  Model / Serial No.                     Temperature  Size           Free        Reads  Writes  Errors
parity       ST31500341AS_9VS37M0J                  *            1,465,138,552  -           135    183     0
disk1        ST31000528AS_9VP1XTQX                  *            976,762,552    Unmounting  83     16      0
disk2        Not installed                          -            488,386,552    -           -      -       -
disk3        Maxtor_6L300R0_L60VMZGG                *            293,057,320    -           79     25      0
disk4        WDC_WD2500JB-00REA0_WD-WMANK1838271    *            244,198,552    -           80     27      0
disk5        WDC_WD2500JB-00REA0_WD-WMANK1917605    *            244,198,552    -           80     27      0
disk6        ST3500830A_9QG2T30C                    *            488,386,552    -           85     29      0
disk7        ST3500830A_9QG2ECSV                    *            488,386,552    -           86     29      0

Command area: Stopping....

Per your comment, I may have a "bigger" problem and am not sure what to do next! Thanks for the thoughts and any recommendations!

Dave
July 17, 2011

Ahh... OK, that is why you cannot access anything. You did not need to stop the array, by the way. It will run OK with a failed drive, which is why it started up. You just want to replace that drive as soon as possible, and you should not write any data to the system while you have a failed drive.

With the main screen showing "Stopping...", I assume you have refreshed it? It sounds like you have a stuck process that is accessing a drive, which is stopping unRaid from unmounting the drive. To see which process is busy, run:

/usr/bin/fuser -mv /mnt/disk* /mnt/user/*

Then you can kill that process.

Shawn
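If you want to collect the busy process IDs programmatically rather than read them off the screen, here is a rough sketch. It assumes a Python 3.7+ interpreter is available on the machine running it (a stock unRAID 4.7 install may well not have one), and it relies on `fuser` printing only the PIDs to standard output while the path names and access flags go to standard error. The function names are hypothetical, and the sketch drops the `-v` flag from Shawn's command so that the output is easier to parse.

```python
import glob
import subprocess

def parse_fuser_pids(stdout_text):
    """Extract the unique process IDs from fuser's standard output."""
    return sorted({int(token) for token in stdout_text.split() if token.isdigit()})

def busy_pids(patterns=("/mnt/disk*", "/mnt/user/*"), fuser="/usr/bin/fuser"):
    """Return PIDs of processes using the mounted file systems matching patterns."""
    # Expand the wildcards ourselves, since no shell is involved here.
    paths = [path for pattern in patterns for path in glob.glob(pattern)]
    if not paths:
        return []
    # fuser exits non-zero when nothing is in use, so the exit code is not checked.
    result = subprocess.run([fuser, "-m", *paths], capture_output=True, text=True)
    return parse_fuser_pids(result.stdout)
```

Once `busy_pids()` returns a list, each entry can be inspected with `ps` and then stopped with `kill`, which is the step Shawn describes.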
July 17, 2011 Author

Hi Shawn,

I just sent a set of syslogs to the other post for Joe L. Rather than post it twice, please check this link: Archivist link=topic=14035.msg132979#msg132979 date=1310914053

Thanks for your help. Yes, the process definitely "hung" for over 5 hours, so I've killed the boot several times. I'll try your command to see if it will stop that process. Hopefully nothing else is wrong.

Dave