Not sure if backplane is playing up



I was getting issues with disks being disabled when I was using an internal caddy. I ended up buying a 24-bay rackmount server case and it seemed to be fine, but my drive 3 has been disabled again. I'm thinking of just letting it rebuild? I hope it's not the backplane, as it's a new case. Maybe the drive is failing?

warptower-smart-20210418-1613 (1).zip warptower-diagnostics-20210418-1525.zip

Edited by loady
Link to comment

Disk 3 has file system corruption that causes a call trace when Unraid tries to mount it, so I'd try to repair that first. That's a separate issue from the disk being disabled. I notice it's the older XFS V4 format, not the current V5. The SMART report for the disk looks OK, except for the reported power cycle count of 32823 - has it had power cable problems? In fact, other counters look suspicious too - only 798 power on hours, for example. Did you know that that particular model is so notorious it has its own Wikipedia article?
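To put a rough number on why those counters look suspicious, here is a small Python sketch. The two values are the ones quoted above; the threshold of one power cycle per hour is an arbitrary assumption chosen for illustration, not anything defined by SMART.

```python
def cycles_per_hour(power_cycles: int, power_on_hours: int) -> float:
    """Average number of power cycles per hour the disk has been powered on."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return power_cycles / power_on_hours


# Values reported for disk 3 in this thread.
POWER_CYCLES = 32823
POWER_ON_HOURS = 798

# Hypothetical threshold, picked only for this illustration.
SUSPICIOUS_CYCLES_PER_HOUR = 1.0

rate = cycles_per_hour(POWER_CYCLES, POWER_ON_HOURS)
suspicious = rate > SUSPICIOUS_CYCLES_PER_HOUR
print(f"{rate:.1f} power cycles per power-on hour, suspicious: {suspicious}")
```

Dozens of power cycles per powered-on hour is far more than manually switching a server on and off would produce, which is why a power or cabling problem is worth ruling out.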

 

Link to comment
On 4/18/2021 at 7:08 PM, John_M said:

Disk 3 has file system corruption that causes a call trace when Unraid tries to mount it, so I'd try to repair that first. That's a separate issue from the disk being disabled. I notice it's the older XFS V4 format, not the current V5. The SMART report for the disk looks OK, except for the reported power cycle count of 32823 - has it had power cable problems? In fact, other counters look suspicious too - only 798 power on hours, for example. Did you know that that particular model is so notorious it has its own Wikipedia article?

 

That disk was the one that kept getting spat out and disabled when I had it in the 5.25" bay to 5x 3.5" HDD enclosure; I believe it had issues. The disks now reside in a new 24-bay rackmount server running from a SAS RAID card, and everything had been running fine. I do turn the server on and off as I need it, as it is mainly a media server. I am, however, thinking of using it to run a Windows VM and get rid of my desktop taking up space.

 

How do I fix the corrupt filesystem and go about getting the drive rebuilt?

Link to comment
On 4/19/2021 at 9:17 PM, John_M said:

Thanks, I read that through and started a check with the -n option. I read that it can take a long time, so I'm not sure if it has finished already? It was reeling off lots of text and then stopped, and the read counts for the drive don't seem to be moving. I've pasted just the very last part:

 

entry "WeVILLAGE_DOOR_UNBREAK24_SF.xxx" at block 17 offset 3376 in directory inode 14068600723 references non-existent inode 13626909168
	would clear inode number in entry at offset 3376...
entry "WeWASP_QUEEN_HEAD_01_SF.xxx" at block 17 offset 3424 in directory inode 14068600723 references non-existent inode 13626909169
	would clear inode number in entry at offset 3424...
entry "WeWasp_Weapon_SF.xxx" at block 17 offset 3464 in directory inode 14068600723 references non-existent inode 13626909170
	would clear inode number in entry at offset 3464...
entry "WeWHISPER_STAFF_SF.xxx" at block 17 offset 3496 in directory inode 14068600723 references non-existent inode 13626909171
	would clear inode number in entry at offset 3496...
entry "WeWill_Users_Sceptre_SF.xxx" at block 17 offset 3536 in directory inode 14068600723 references non-existent inode 13626909172
	would clear inode number in entry at offset 3536...
entry "WeWINDMILL_SAILS_01_SF.xxx" at block 17 offset 3576 in directory inode 14068600723 references non-existent inode 13626909173
	would clear inode number in entry at offset 3576...
entry "WeWINE_SKIN_01_SF.xxx" at block 17 offset 3616 in directory inode 14068600723 references non-existent inode 13626909174
	would clear inode number in entry at offset 3616...
entry "WeWITCH_SPOON_01_SF.xxx" at block 17 offset 3648 in directory inode 14068600723 references non-existent inode 13626909175
	would clear inode number in entry at offset 3648...
entry "WeWOODEN_LAMP_01_OFF_SF.xxx" at block 17 offset 3688 in directory inode 14068600723 references non-existent inode 13626909176
	would clear inode number in entry at offset 3688...
entry "WeWOODEN_LAMP_01_ON_SF.xxx" at block 17 offset 3728 in directory inode 14068600723 references non-existent inode 13626909177
	would clear inode number in entry at offset 3728...
entry "WeYOUNGHERO_02_SF.xxx" at block 17 offset 3768 in directory inode 14068600723 references non-existent inode 13626909178
	would clear inode number in entry at offset 3768...
entry "WeYOUNG_SISTER_01_SF.xxx" at block 17 offset 3800 in directory inode 14068600723 references non-existent inode 13626909179
	would clear inode number in entry at offset 3800...
No modify flag set, skipping phase 5
Inode allocation btrees are too corrupted, skipping phases 6 and 7
No modify flag set, skipping filesystem flush and exiting.
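For anyone wanting to tell at a glance whether a check-only run like the one above has completed, here is a minimal Python sketch that summarizes the pasted text. It only looks for the message strings that appear in this post; the function name and the shape of the summary are made up for illustration, and other xfs_repair versions may word things differently.

```python
import re

# Matches the "entry ... references non-existent inode" lines shown above.
BAD_ENTRY = re.compile(
    r'entry "(?P<name>[^"]+)" at block \d+ offset \d+ '
    r"in directory inode \d+ references non-existent inode \d+"
)

FINISHED_MARKER = "No modify flag set, skipping filesystem flush and exiting."
BTREE_MARKER = "Inode allocation btrees are too corrupted"


def summarize_dry_run(text: str) -> dict:
    """Count damaged directory entries and detect the closing messages."""
    names = [m.group("name") for m in BAD_ENTRY.finditer(text)]
    return {
        "bad_entries": len(names),
        "finished": FINISHED_MARKER in text,
        "btrees_corrupt": BTREE_MARKER in text,
    }


# Last few lines of the output pasted above.
sample = """entry "WeYOUNGHERO_02_SF.xxx" at block 17 offset 3768 in directory inode 14068600723 references non-existent inode 13626909178
    would clear inode number in entry at offset 3768...
entry "WeYOUNG_SISTER_01_SF.xxx" at block 17 offset 3800 in directory inode 14068600723 references non-existent inode 13626909179
    would clear inode number in entry at offset 3800...
No modify flag set, skipping phase 5
Inode allocation btrees are too corrupted, skipping phases 6 and 7
No modify flag set, skipping filesystem flush and exiting.
"""

summary = summarize_dry_run(sample)
print(summary)
```

If "finished" is true, the check has run to the end and nothing on the disk was changed, since the run used the no-modify flag.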

 

Edited by loady
Link to comment
5 hours ago, John_M said:

It said "exiting" so it has finished the check. You need to do it again without the "-n" option so that it can actually make the repairs.

 

Yes, that's what I thought, but where I was reading about the checking phase it said it can take hours, so I just want to be 100% sure I am doing it correctly.

 

Now I got this when I removed the -n option:

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.
Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.
If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

 

Now it says on the Main page that the filesystem is unmountable. Should I go straight in with the -L option?

Link to comment
1 hour ago, JorgeB said:

Yes.

 

Right, I did it. It has finished already, as it says "done" at the bottom. Is that normal? When I was reading the guide for repair it said it can take a long time (hours), but it seems to have done the trick. I started the array and the "unmountable" messages have gone; the disk is just disabled and its contents emulated. Does this mean I just rebuild that disk now?

 

EDIT: Disk is now rebuilding, fingers crossed it was a previous power issue to that disk.

 

Thanks for the help and pointers to the info.

Edited by loady
Link to comment
  • loady changed the title to [SOLVED] Not sure if backplane is playing up
1 hour ago, loady said:

When I was reading the guide for repair it said it can take a long time (hours), but it seems to have done the trick

 

XFS repairs are normally a matter of minutes.  Perhaps the section you saw was referring to ReiserFS repairs as they can definitely take hours.  
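For reference, the sequence followed in the posts above (check only, then repair, then zero the log when it cannot be replayed by mounting) could be sketched in Python roughly as below. This only builds and prints the command lines; it does not run anything. The device path /dev/md3 is an assumption for disk 3 with the array started in maintenance mode and may differ on your system. Refusing to combine -n with -L is a choice made in this sketch, not documented xfs_repair behaviour.

```python
from typing import List


def xfs_repair_cmd(device: str, dry_run: bool = False, zero_log: bool = False) -> List[str]:
    """Build an xfs_repair command line as a list of arguments."""
    if dry_run and zero_log:
        # Design choice of this sketch: a check-only run should not touch the log.
        raise ValueError("do not combine -n (check only) with -L (zero log)")
    cmd = ["xfs_repair"]
    if dry_run:
        cmd.append("-n")  # no-modify mode: report problems, change nothing
    if zero_log:
        cmd.append("-L")  # destroy the log; may lose metadata changes held in it
    cmd.append(device)
    return cmd


# Assumed device for disk 3; adjust to your own system.
DEVICE = "/dev/md3"

steps = [
    xfs_repair_cmd(DEVICE, dry_run=True),   # 1. check only
    xfs_repair_cmd(DEVICE),                 # 2. actual repair
    xfs_repair_cmd(DEVICE, zero_log=True),  # 3. only if step 2 asks for -L and mounting fails
]

for step in steps:
    print(" ".join(step))
```

As the error message itself warns, -L can cause corruption, so it is a last resort for when the filesystem cannot be mounted to replay the log.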

Link to comment
  • 2 weeks later...
10 hours ago, JorgeB said:

If it's the same disk use a different slot or swap with another disk and see where the problem follows.

 

The data rebuild finished with a million errors. The slots are SAS and the RAID card is fully utilised with 8 drives; I could maybe connect drive 3 to a SATA port on the motherboard.

 

From what I can see the drive is no longer disabled and has rebuilt, but what do I do about all the errors?

warptower-diagnostics-20210505-2253.zip

Link to comment
  • loady changed the title to Not sure if backplane is playing up
26 minutes ago, JorgeB said:

Parity won't be valid because of all the read errors, do this:

and try again.

When you say try again, do you mean invoke a rebuild of the drive or do a parity check? The box to write corrections to parity is checked. Is there a certain way I need to do this, or do I just pull the drive and connect it to the SATA port?

Link to comment
1 hour ago, loady said:

invoke a rebuild of the drive or do a parity check, the box to correct errors to parity is checked,

Either one will do, but if there are many errors a sync will be faster.

 

1 hour ago, loady said:

or just pull the drive and connect to the sata 

This.

Link to comment
On 5/6/2021 at 9:33 AM, JorgeB said:

Either one will do, but if there are many errors a sync will be faster.

 

This.

 

I rebooted the server after the rebuild and the errors went; it said zero errors, so I didn't change the port. It's been fine for a week. Currently it's taking reads and writes as I use the server, and I know it's that disk being used. Just now I heard that noise an HDD makes when you pull power, that chirp sound. I looked in the Main section and it's disabled again. Is the disk dying? Is there anything else I can do to test and post here for you good people to see? There have been no writes to it since it did this, so I am just going to add it back in without doing a sync.

 

EDIT: Actually, I think I will just buy another 3TB drive and find out once and for all if it's the drive or a power issue. Can anyone recommend a relatively cheap 3TB drive appropriate for use in my server?

Edited by loady
Link to comment
