[Solved] New to unRaid: issue with disks missing

xisplo · October 11, 2015

Hello everyone,

First and foremost, I've been looking forward to using unRaid since I was shown its great storage capability.

I am extremely green to the unRaid operating system, so my knowledge is limited. However, I have been reading about it like a madman.

I have a mix of WD Red, WD Black, and Seagate Barracuda drives, all 2TB.

My Parity drive is an HGST 4TB

unRaid: 6.1.3

Case: NORCO RPC-4020

SAS Card: SM AOC-SASLP-MV8

MB: SUPERMICRO MBD-X10SLM-F-O

CPU: Xeon E3-1246 v3 Haswell 3.5GHz

RAM: Crucial (2 x 8GB) DDR3 SDRAM ECC Unbuffered DDR3 1600 (PC3 12800)

2x 3ware CBL-SFF8087OCF-10M SFF-8087

PSU: CORSAIR CX750W

Purpose:

Plex media storage/download box/media maintenance

My issues:

Any time I stop the array or reboot the box, random drives go missing.

I initially mounted and created the parity, which took a while.

Then I started creating my shares and moving data over for testing purposes.

I created 3 shares and only moved ~140GB into one of them.

I stopped the array.

I started the array and it took a couple of minutes to initiate.

Disk 2 missing.

Okay, MAYBE the disk has gone bad.

Replaced it. Same thing with the new disk. (I did follow the proper replacement instructions.)

At this point my test data share was accessible, but with empty directories in it.

I read that another member had a similar issue, so I followed his steps to fix it: stopped the array, rebooted, and started the array.

The drive showed up as unmountable. Clicking Format would do nothing, so I found another member with a formatting issue. The solution was to stop the array, reboot, start, and format. That worked.

To troubleshoot my share with empty directories that I could not delete from the UI, I stopped the array and tried using SSH (PuTTY) to manually rm the share. NFS issue.

When I looked back at the stopped array, multiple disks were missing.

I have tried creating a new config and recreating everything from scratch (including recreating the parity, which took 22hrs).

Initially I thought the cables were bad, but after trying multiple backplanes on the NORCO case and randomizing the SFF-to-disk locations, disks still randomly go missing.

Does anyone have an idea of what I should be looking at now?

EDIT:

Syslog

Oct 11 09:41:18 MS emhttp: ckmbr error: -1

Oct 11 09:41:18 MS emhttp: ckmbr: read: Input/output error

Oct 11 09:41:18 MS emhttp: ckmbr error: -1

Oct 11 09:41:18 MS kernel: mdcmd (2): import 1 8,32 1953514552 WDC_WD2003FZEX-00Z4SA0_WD-WCC5CEASJ5AX

Oct 11 09:41:18 MS kernel: md: import disk1: [8,32] (sdc) WDC_WD2003FZEX-00Z4SA0_WD-WCC5CEASJ5AX size: 1953514552

Oct 11 09:41:18 MS kernel: Buffer I/O error on dev sdd, logical block 0, async page read

Oct 11 09:41:18 MS kernel: mdcmd (3): import 2 0,0

Oct 11 09:41:18 MS kernel: md: disk2 removed

Oct 11 09:41:18 MS kernel: Buffer I/O error on dev sde, logical block 0, async page read

Oct 11 09:41:18 MS kernel: mdcmd (4): import 3 0,0

Oct 11 09:41:18 MS kernel: md: disk3 missing

Oct 11 09:41:18 MS kernel: mdcmd (5): import 4 0,0

Oct 11 09:41:18 MS kernel: md: disk4 missing

Oct 11 09:41:18 MS kernel: mdcmd (6): import 5 8,96 1953514552 ST2000DL001-9VT156_5YD2V7HE

Oct 11 09:41:18 MS kernel: md: import disk5: [8,96] (sdg) ST2000DL001-9VT156_5YD2V7HE size: 1953514552

Oct 11 09:41:18 MS emhttp: ckmbr: read: Input/output error

Oct 11 09:41:18 MS emhttp: ckmbr error: -1

Oct 11 09:41:18 MS kernel: mdcmd (7): import 6 0,0

Oct 11 09:41:18 MS kernel: md: disk6 missing

Oct 11 09:41:18 MS kernel: mdcmd (8): import 7 0,0

Oct 11 09:41:18 MS kernel: mdcmd (9): import 8 0,0

Oct 11 09:41:18 MS kernel: mdcmd (10): import 9 0,0

Oct 11 09:41:18 MS kernel: mdcmd (11): import 10 0,0

Oct 11 09:41:18 MS kernel: mdcmd (12): import 11 0,0

Oct 11 09:41:18 MS kernel: mdcmd (13): import 12 0,0

Oct 11 09:41:18 MS kernel: mdcmd (14): import 13 0,0

Oct 11 09:41:18 MS kernel: mdcmd (15): import 14 0,0

Oct 11 09:41:18 MS kernel: mdcmd (16): import 15 0,0

Oct 11 09:41:18 MS kernel: mdcmd (17): import 16 0,0

Oct 11 09:41:18 MS kernel: mdcmd (18): import 17 0,0

Oct 11 09:41:18 MS kernel: mdcmd (19): import 18 0,0

Oct 11 09:41:18 MS kernel: mdcmd (20): import 19 0,0

Oct 11 09:41:18 MS kernel: mdcmd (21): import 20 0,0

Oct 11 09:41:18 MS kernel: mdcmd (22): import 21 0,0

Oct 11 09:41:18 MS kernel: mdcmd (23): import 22 0,0

Oct 11 09:41:18 MS kernel: mdcmd (24): import 23 0,0

Oct 11 09:41:18 MS emhttp: import flash device: sda

Oct 11 09:41:19 MS avahi-daemon[4873]: Server startup complete. Host name is MS.local. Local service cookie is 596943745.

Oct 11 09:41:20 MS avahi-daemon[4873]: Service "MS" (/services/ssh.service) successfully established.

Oct 11 09:41:20 MS avahi-daemon[4873]: Service "MS" (/services/smb.service) successfully established.

Oct 11 09:41:20 MS avahi-daemon[4873]: Service "MS" (/services/sftp-ssh.service) successfully established.

Oct 11 10:12:46 MS emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
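
As a reading aid, the excerpt above can be summarized with a short script. This is only a sketch: the patterns are inferred from the line shapes visible in this excerpt, and `summarize` and `SAMPLE` are names made up for illustration, not part of unRAID.

```python
import re
from collections import defaultdict

# Patterns inferred from the syslog excerpt in this post (an assumption,
# not an official description of the unRAID log format).
IMPORTED = re.compile(r"md: import disk(\d+): \[\d+,\d+\] \((\w+)\)")
ABSENT = re.compile(r"md: disk(\d+) (missing|removed)")
IO_ERROR = re.compile(r"Buffer I/O error on dev (\w+),")


def summarize(lines):
    """Return (imported, absent, io_errors) extracted from syslog lines."""
    imported = {}                 # slot number -> device name
    absent = {}                   # slot number -> "missing" or "removed"
    io_errors = defaultdict(int)  # device name -> number of error lines
    for line in lines:
        if m := IMPORTED.search(line):
            imported[int(m.group(1))] = m.group(2)
        elif m := ABSENT.search(line):
            absent[int(m.group(1))] = m.group(2)
        elif m := IO_ERROR.search(line):
            io_errors[m.group(1)] += 1
    return imported, absent, dict(io_errors)


# A few lines copied from the excerpt above.
SAMPLE = """\
Oct 11 09:41:18 MS kernel: md: import disk1: [8,32] (sdc) WDC_WD2003FZEX-00Z4SA0_WD-WCC5CEASJ5AX size: 1953514552
Oct 11 09:41:18 MS kernel: Buffer I/O error on dev sdd, logical block 0, async page read
Oct 11 09:41:18 MS kernel: md: disk2 removed
Oct 11 09:41:18 MS kernel: Buffer I/O error on dev sde, logical block 0, async page read
Oct 11 09:41:18 MS kernel: md: disk3 missing
Oct 11 09:41:18 MS kernel: md: import disk5: [8,96] (sdg) ST2000DL001-9VT156_5YD2V7HE size: 1953514552
"""

if __name__ == "__main__":
    imported, absent, io_errors = summarize(SAMPLE.splitlines())
    print("imported:", imported)
    print("absent:", absent)
    print("I/O errors:", io_errors)
```

On these sample lines, the devices that report I/O errors (sdd, sde) are not among the devices imported into a slot, which is the kind of pattern worth looking for in the full syslog.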

Squid · October 11, 2015

Full diagnostics would be helpful in seeing what's going on.

I don't know about your power setup, but my initial thought is a power problem (weak or old supply, splitter issues, etc.).

xisplo · October 11, 2015

Thank you very much for the reply.

I have another power supply I will throw in.

I am running no power splitters yet, Molex straight from the PSU.

I will update once I swap out the PSU, and will provide a pre-swap diag.

trurl · October 11, 2015

The PSU in your OP is single rail 750W so unless it is defective it should be fine. What PSU are you planning to replace it with?

Go to Tools - Diagnostics and post the zip. That should have been the first thing you did instead of all the other things you have tried.

I would be more inclined to consider the SATA ports. Are the missing drives on the SAS or on the motherboard or both?

xisplo · October 12, 2015

Hey,

Thanks for the reply.

The power supply I'm replacing it with is a Thermaltake 600W single-rail PSU. Still no luck.

Attached is the diag .zip.

Overnight I created a new config, ran a clean on all drives, and switched to XFS instead of btrfs.

Everything was green and ready to go.

As soon as I started copying over test data (140GB), I got a red X on the 1st drive.

I started thinking it is the SFF breakout cable or the actual RAID card.

The drives, except the parity drive, are connected to the SM RAID card, so this does add the possibility of a bad cable/card.

Thanks for all the input; looking forward to your findings in the diag.

At the time of the diag, only disk 1 was showing the red X.

I know one of the Seagate drives is reporting bad SMART (it came from a LaCie NAS).

ms-diagnostics-20151012-1247.zip

xisplo · October 13, 2015

Hello,

So I read a forum entry by Rajahal which stated a few things regarding the SM AOC-SASLP-MV8 RAID card.

There is no RAID on it; it simply says "JBOD", which is good. And there is no way to change it from JBOD to RAID (with .21 firmware).

I did disable INT 13 (I was unsure of what it actually was). To my understanding, it tries to make a disk connected to the card a bootable disk.

Once disabled I rebuilt the array and I have transferred over 2TB of data. Everything seems good as of right now.

Could the INT 13 have caused the issue I initially had?

Thanks

itimpi · October 13, 2015

xisplo said:

"There is no raid on it, it simply says 'JBOD' which is good. And no way to change it from JBOD to RAID (with .21 firmware)"

You do not want the RAID setting; that would be incompatible with unRAID.

xisplo said:

"I did disable INT 13 (was unsure of what it actually was) to my understanding it tries to make a disk, connected to the card, a bootable disk."

Since (as you surmised) INT 13 only relates to whether a disk is seen as a potential boot candidate, it should have no impact once you are past that point.

xisplo said:

"Once disabled I rebuilt the array and I have transferred over 2TB of data. Everything seems good as of right now. Could the INT 13 have caused the issue I initially had?"

The INT 13 setting will not have been relevant. It seems more likely to have been a cabling/bad connection type issue that you have managed to rectify.

xisplo · October 13, 2015

Hey,

Sorry if I was misunderstood; yes, I know it should be running as JBOD and NOT RAID.

Strange issue.

I have ordered another RAID card and a few more SFF-8087 forward breakout cables just to have around.

Thank you for all of your help.

An analysis of the diag logs would be welcomed. Or pointing me to where I should be looking in them.

thanks

xisplo · October 13, 2015

I feel extremely dumb right now.

itimpi said:

"Seems more likely to have been a cabling/bad connection type issue that you have managed to rectify."

and that had me thinking.

I pulled everything out and looked at my motherboard closely (including the manual [RTFM!]).

The very last PCI-E slot on the board is labeled 'PCI-E 2.0 X4 (in X8)', and this is where the SASLP was initially seated.

I recently moved it to the X16 PCI-E slot, and since then there have been no issues.
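
For anyone chasing a similar slot problem, one thing that could be checked is how the card's PCIe link negotiated, by comparing the `LnkCap` and `LnkSta` lines in `lspci -vv` output. The sketch below only parses such text. The sample text is made up for illustration and was not captured from this server, and `link_report` and `is_degraded` are hypothetical helper names. Nothing in this thread confirms that the link was actually degraded in the original slot.

```python
import re

# Matches the "LnkCap:" (capability) and "LnkSta:" (negotiated status) lines
# printed by `lspci -vv`; the trailing colon excludes LnkCap2/LnkSta2/LnkCtl.
LNK = re.compile(r"Lnk(Cap|Sta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def link_report(lspci_text):
    """Return {'Cap': (speed, width), 'Sta': (speed, width)} for one device."""
    found = {}
    for line in lspci_text.splitlines():
        m = LNK.search(line)
        if m and m.group(1) not in found:
            found[m.group(1)] = (float(m.group(2)), int(m.group(3)))
    return found


def is_degraded(report):
    """True if the negotiated speed or width is below what the card supports."""
    cap, sta = report["Cap"], report["Sta"]
    return sta[0] < cap[0] or sta[1] < cap[1]


# Illustrative text only (an assumption, not output from the poster's machine).
SAMPLE_LSPCI = """\
LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
"""

if __name__ == "__main__":
    report = link_report(SAMPLE_LSPCI)
    print(report, "degraded:", is_degraded(report))
```

In practice the text would come from running `lspci -vv` for the controller's own device entry and passing that section to `link_report`.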

I feel like smacking myself for having wasted your time.

Until further notice, the problem has been fixed. (PEBKAC)

Sincerely,

xisplo
