Says Drives Missing, yet fdisk -l shows all there?



So, my friend has been having issues with his unRAID server AGAIN. He shut it down for about a month while he was doing basement renovations, and when he turned it on last week, it ran for about an hour and then took out his network (his words).

 

He brought it over, and I took a look at it. The first 2 times I powered it on, it didn't really do anything. Then I got it to POST and boot, but it wasn't reachable over the network, and the keyboard worked during boot but stopped responding once booting finished. After a hard reset, I got access to the server via the web interface and telnet, and the keyboard was working again.

 

Of the 5 data, 1 parity, and 1 cache drives, 2 data drives were not shown. After rebooting a few more times, I could get it down to 1 missing drive, but I never had all drives shown.

 

When I ran `fdisk -l | grep "/dev"` (there are far too many lines without the grep), it showed that all the drives are there.
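For anyone who wants to count the disks the kernel sees instead of eyeballing the grep output, here is a minimal sketch. It assumes the saved `fdisk -l` output uses the classic `Disk /dev/sdX: <size>, <n> bytes` header lines; the sample text and the `list_disks` helper are made up for illustration and are not from the server in this thread.

```python
import re

# Matches header lines such as "Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes".
# Partition lines ("/dev/sdb1 ...") do not start with "Disk" and are skipped.
DISK_LINE = re.compile(r"^Disk (/dev/[^:\s]+):.*?(\d+) bytes", re.MULTILINE)


def list_disks(fdisk_output):
    """Return {device: size_in_bytes} for every 'Disk /dev/...' header line."""
    return {dev: int(size) for dev, size in DISK_LINE.findall(fdisk_output)}


# Hypothetical sample output (an assumption, not copied from the real server).
SAMPLE = """\
Disk /dev/sda: 4009 MB, 4009754624 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63     7831551     3915744+   b  W95 FAT32
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
/dev/sdb1              63  3907029167  1953514552+  83  Linux
"""

disks = list_disks(SAMPLE)
for dev, size in sorted(disks.items()):
    print(f"{dev}: {size / 1e9:.1f} GB")
print(f"{len(disks)} disk(s) visible to the kernel")
```

Comparing that count against the number of drives assigned in the unRAID web interface shows whether a drive is missing at the kernel level or only in the array view.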

 

Any ideas what I should be looking for/at? The server is running 5.0 RC8, and it was running fine before the shutdown. At first I thought maybe the power supply was flaky, but fdisk shows that everything is there, so now I'm not sure.

 

I am lost... :(  please help!

Link to comment

You mentioned that the machine was shut down while renovation was going on which suggests that it could have been moved around quite a bit so that the internals got disturbed.

 

I would check that any daughter boards are properly seated and then check that all the cabling appears to be securely plugged in.  Also check that there is not a lot of dust inside the system (in case it was not protected from this during renovation).  The fact that you were having trouble even getting the system to boot is a bit suspicious.

 

Link to comment

I have done that: I reseated the RAID card and triple-checked all the SATA and power connectors. There is no dust inside the case, and I know 100% for a fact that it was not jostled or moved around (it was put into another room, out of harm's way, on purpose). He is pretty anal about his computer stuff.

 

Are you thinking maybe power issues as well (i.e. the power supply)?

Link to comment

chkdsk the flash drive.

 

Can I ask why you gave this advice? I find it hard to believe that a program would be corrupt, yet work for all disks except one.

 

I'd love to expand my knowledge, this isn't me being an ass.

 

He mentioned that drives would sometimes appear and sometimes not appear during boot. My experience in the forums and with unRAID is that an error on the flash drive (USB stick) can cause really funky things to happen.

 

It's an easy check to perform.

Link to comment

Ok, chkdsk didn't find anything. I actually had all drives shown on my second-to-last boot. Then when I started the array (it said it wasn't started due to an improper shutdown), it started... and then hung. So I rebooted, and now 1 drive is not started.

 

I copied the dmesg and the syslog. Here is the syslog:

http://pastebin.com/JP9tMvtR

 

Looks like I have 1, possibly 2, drives that are failing or have failed, which would explain why I sometimes see all drives and sometimes am missing up to 2 when I boot up.

 

[Some of] the errors are:

Mar  5 18:50:08 Tower kernel: sd 1:0:3:0: [sde] Unhandled error code
Mar  5 18:50:08 Tower kernel: sd 1:0:3:0: [sde]  Result: hostbyte=0x01 driverbyte=0x00
Mar  5 18:50:08 Tower kernel: sd 1:0:3:0: [sde] CDB: cdb[0]=0x28: 28 00 00 00 10 3f 00 00 08 00
Mar  5 18:50:08 Tower kernel: end_request: I/O error, dev sde, sector 4159
Mar  5 18:50:08 Tower kernel: Buffer I/O error on device sde1, logical block 4096
Mar  5 18:50:08 Tower kernel: Buffer I/O error on device sde1, logical block 4097
Mar  5 18:50:08 Tower kernel: Buffer I/O error on device sde1, logical block 4098
Mar  5 18:50:08 Tower kernel: Buffer I/O error on device sde1, logical block 4099
Mar  5 18:50:08 Tower kernel: Buffer I/O error on device sde1, logical block 4100
Mar  5 18:50:08 Tower kernel: Buffer I/O error on device sde1, logical block 4101
Mar  5 18:50:08 Tower kernel: Buffer I/O error on device sde1, logical block 4102
Mar  5 18:50:08 Tower kernel: Buffer I/O error on device sde1, logical block 4103
Mar  5 18:50:08 Tower kernel: sd 1:0:3:0: [sde] Synchronizing SCSI cache
Mar  5 18:50:08 Tower kernel: sd 1:0:3:0: [sde]  Result: hostbyte=0x01 driverbyte=0x00
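As a rough aid for reading lines like these, here is a small sketch that decodes two of the fields. It assumes the CDB is a SCSI READ(10) command (opcode 0x28, as in the log) and uses the host byte names from the Linux kernel's SCSI headers; the helper names `decode_read10` and `host_results` are made up for illustration. In the kernel headers, host byte 0x01 is DID_NO_CONNECT, which generally means the host adapter could not reach the device. That can come from cabling, the controller, or power as well as from the disk itself, so on its own it does not prove the drive has failed.

```python
import re

# A few host_byte values from the Linux SCSI headers (not the full list).
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
}


def decode_read10(cdb_hex):
    """Return (start_lba, block_count) from a 10-byte READ(10) CDB hex dump."""
    cdb = bytes(int(b, 16) for b in cdb_hex.split())
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, count


def host_results(log_text):
    """Return [(device, host_byte_name)] for every 'Result: hostbyte=' line."""
    pairs = re.findall(r"\[(\w+)\]\s+Result: hostbyte=0x([0-9a-fA-F]+)", log_text)
    return [(dev, HOST_BYTES.get(int(hb, 16), "unknown")) for dev, hb in pairs]


# Two lines copied from the syslog excerpt above.
LOG = """\
Mar  5 18:50:08 Tower kernel: sd 1:0:3:0: [sde]  Result: hostbyte=0x01 driverbyte=0x00
Mar  5 18:50:08 Tower kernel: sd 1:0:3:0: [sde] CDB: cdb[0]=0x28: 28 00 00 00 10 3f 00 00 08 00
"""

print(host_results(LOG))
print(decode_read10("28 00 00 00 10 3f 00 00 08 00"))
```

The decoded CDB is a read of 8 blocks starting at LBA 4159, which lines up with the `sector 4159` message and the 8 consecutive buffer errors that follow it.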

 

Am I right to think that the drives are failed/failing, and that I should probably pull them out and recover (at least 1 of) them manually?

 

Link to comment
