Share Folder Empty - Files still exist and accessible



Hi all,

 

When I try to access my Movies folder, everything is gone. Plex can still access everything in it, and I can see the files in MC when browsing disks, but not when browsing user shares. Every time I try to access the folder from Windows, the error "BTRFS error (device md1): parent transid verify failed" appears in the system log.

 

disk1 is reporting errors and I'm not sure why. Only last month I removed it from the array, formatted it and put it back in. The SMART diagnostics don't show anything to be concerned about (though I may be wrong). I think this is unrelated, because last month I didn't lose a whole share folder like this. I can still browse disk1 through MC.

 

This could be linked to a sudden power loss which caused an unclean shutdown.

 

I have ordered a new 8TB Red and an APC UPS to try to fix the issue, but they won't arrive for another 6 days.

 

If anyone can help, that would be greatly appreciated. I'm running an HP Gen8 MicroServer.

tower-diagnostics-20181016-2229.zip

unraid-missing-files.jpg

Link to comment
1 hour ago, sawdustfarmer said:

disk1 is reporting errors and i'm not sure why, only last month I removed it from the array formatted it and put it back in

Everything about this sentence is wrong. The disk is DISABLED. Removing a disk from the array, formatting it, and putting it back so that Unraid will use it again isn't actually possible without doing a lot more than just that. Are there any more details you can provide about exactly what you did?

 

Please seek advice on the forum in the future before doing anything else with the disks until you understand how Unraid works.

 

 

Unraid DISABLES a disk when a write to it fails. Most of the time, I/O failures are not due to any actual disk problem. Bad connections are by far the most frequent cause, sometimes controller issues. Actual disk problems are far less common.

 

When a write to a disk in the parity array fails, Unraid disables the disk, but it still updates parity with the data from that write. After the disk is disabled, Unraid won't actually use the disk again until it is rebuilt, because its data is out of sync and no longer valid. Instead, Unraid emulates the disk, calculating its data by reading parity plus all the other disks in the array. It not only reads the emulated disk from the parity calculation, it also writes the emulated disk by updating parity. So the disabled disk isn't used, but the emulated disk can still be read and written.
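
A minimal sketch of the emulation described above, assuming single parity computed as the bytewise XOR of all data disks. The toy disk contents and the helper names `xor_blocks`, `emulated_read` and `emulated_write` are made up for illustration; this is not Unraid's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy array: three data disks plus one parity disk, 4 bytes each.
disks = {
    "disk1": b"\x11\x22\x33\x44",
    "disk2": b"\xaa\xbb\xcc\xdd",
    "disk3": b"\x01\x02\x03\x04",
}
parity = xor_blocks(list(disks.values()))

def emulated_read(disabled, disks, parity):
    """Compute the disabled disk's contents from parity plus all other disks."""
    others = [data for name, data in disks.items() if name != disabled]
    return xor_blocks(others + [parity])

def emulated_write(disabled, new_data, disks):
    """Return the new parity for a write to the emulated disk.

    The physical disabled disk is never touched; only parity changes.
    """
    others = [data for name, data in disks.items() if name != disabled]
    return xor_blocks(others + [new_data])

original = disks["disk1"]
assert emulated_read("disk1", disks, parity) == original

# Write new contents to the emulated disk1: only parity is updated.
parity = emulated_write("disk1", b"\xde\xad\xbe\xef", disks)
assert emulated_read("disk1", disks, parity) == b"\xde\xad\xbe\xef"
assert disks["disk1"] == original  # the physical disk is now out of sync
```

The last assertion is the reason a disabled disk has to be rebuilt before it can be used again: its physical contents no longer match what parity says they should be.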

 

Also, unless you have more parity disks than disabled disks, you are no longer protected by parity since Unraid requires that parity plus all other disks be working in order to rebuild a disk.

 

Shut down and check all connections. You must rebuild the disk. Do you have a spare?

 

If not, there is a method to get Unraid to rebuild to the same disk.

 

 

Link to comment

Disk1 is failing and needs to be replaced.

 

Oct 16 13:20:13 Tower kernel: BTRFS error (device md1): parent transid verify failed on 582139904 wanted 116345 found 116343

 

This is usually a fatal filesystem error. After replacing the disk you'll need to reformat it and restore the data. You can use this to help back up the data from the emulated disk if needed:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

 

It should also be possible to copy most data from old disk1, though some files will have read errors.
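
A rough sketch of that kind of salvage copy, assuming the old disk is mounted read-only somewhere and you just want to copy what is readable and get a list of what is not. The function name `salvage_copy` and the paths in the usage comment are hypothetical; tools such as rsync or ddrescue may well be a better fit in practice.

```python
import os
import shutil

def salvage_copy(src_root, dst_root):
    """Copy every readable file from src_root to dst_root.

    Files that raise an I/O error are skipped and returned as a list of
    (path, reason) tuples, so one bad file does not abort the whole copy.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)
            except OSError as err:
                failed.append((src, str(err)))
                # Remove any partial copy so it is not mistaken for a good file.
                if os.path.exists(dst):
                    os.remove(dst)
    return failed

# Usage (hypothetical mount points, adjust to your system):
# for path, reason in salvage_copy("/mnt/disks/old_disk1", "/mnt/disk2/salvage"):
#     print(f"FAILED {path}: {reason}")
```

The returned list doubles as the list of files to restore from backup or download again.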

 

 

 

Edited by johnnie.black
Link to comment
Just now, trurl said:

Is this because of SMART attribute 200? I forgot to look at that one.

Yep, that together with recent UNC @ LBA errors (aka read errors):


 

Quote

 

Error 55 [6] occurred at disk power-on lifetime: 12255 hours (510 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 02 b8 00 00 3d d7 a7 b0 e0 00  Error: UNC 696 sectors at LBA = 0x3dd7a7b0 = 1037543344
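
The "UNC 696 sectors at LBA" summary can be checked against the raw register bytes in that line. A small sketch, assuming the column layout given in the header row (two COUNT bytes followed by six LBA_48 bytes, most significant first); the variable names are just for illustration.

```python
# Register dump from the SMART error log entry quoted above.
registers = "40 -- 51 02 b8 00 00 3d d7 a7 b0 e0 00".split()

# Columns per the header row: ER, --, ST, COUNT (2 bytes), LBA_48 (6 bytes), DV, DC
count_bytes = registers[3:5]
lba_bytes = registers[5:11]

sector_count = int("".join(count_bytes), 16)
lba = int("".join(lba_bytes), 16)

print(f"sectors = {sector_count}")   # sectors = 696
print(f"LBA = {hex(lba)} = {lba}")   # LBA = 0x3dd7a7b0 = 1037543344

# Power-on time from the same entry: 12255 hours as days + hours.
days, hours = divmod(12255, 24)
print(f"{days} days + {hours} hours")  # 510 days + 15 hours
```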

 

And also how the read errors appear in the syslog:

 

Quote

Oct 14 23:10:37 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 14 23:10:37 Tower kernel: ata1.00: irq_stat 0x40000001
Oct 14 23:10:37 Tower kernel: ata1.00: failed command: READ DMA EXT
Oct 14 23:10:37 Tower kernel: ata1.00: cmd 25/00:40:10:28:ad/00:05:c9:01:00/e0 tag 17 dma 688128 in
Oct 14 23:10:37 Tower kernel:         res 51/40:40:c0:2c:ad/00:05:c9:01:00/e0 Emask 0x9 (media error)
Oct 14 23:10:37 Tower kernel: ata1.00: status: { DRDY ERR }
Oct 14 23:10:37 Tower kernel: ata1.00: error: { UNC }
Oct 14 23:10:37 Tower kernel: ata1.00: configured for UDMA/133
Oct 14 23:10:37 Tower kernel: sd 1:0:0:0: [sdb] tag#17 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Oct 14 23:10:37 Tower kernel: sd 1:0:0:0: [sdb] tag#17 Sense Key : 0x3 [current]
Oct 14 23:10:37 Tower kernel: sd 1:0:0:0: [sdb] tag#17 ASC=0x11 ASCQ=0x4
Oct 14 23:10:37 Tower kernel: sd 1:0:0:0: [sdb] tag#17 CDB: opcode=0x88 88 00 00 00 00 01 c9 ad 28 10 00 00 05 40 00 00
Oct 14 23:10:37 Tower kernel: print_req_error: I/O error, dev sdb, sector 7678535696
Oct 14 23:10:37 Tower kernel: md: disk1 read error, sector=7678535632
Oct 14 23:10:37 Tower kernel: md: disk1 read error, sector=7678535640
Oct 14 23:10:37 Tower kernel: md: disk1 read error, sector=7678535648

Media errors almost always mean a disk problem.
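
The sector numbers in the syslog lines above are consistent with each other, which can be checked by decoding the CDB. A short sketch, assuming the standard READ(16) layout (opcode 0x88, bytes 2-9 the starting LBA, bytes 10-13 the transfer length) and 512-byte logical sectors. The 64-sector gap to the md sector numbers is presumably the partition start offset; that is my assumption, not something the log states.

```python
# CDB bytes copied from the "[sdb] tag#17 CDB:" syslog line.
cdb = bytes.fromhex("88 00 00 00 00 01 c9 ad 28 10 00 00 05 40 00 00")

opcode = cdb[0]
start_lba = int.from_bytes(cdb[2:10], "big")         # bytes 2-9: starting LBA
transfer_len = int.from_bytes(cdb[10:14], "big")     # bytes 10-13: sector count

SECTOR_SIZE = 512  # assumed logical sector size

print(hex(opcode))                  # 0x88, READ(16)
print(start_lba)                    # 7678535696, matches "dev sdb, sector 7678535696"
print(transfer_len)                 # 1344 sectors
print(transfer_len * SECTOR_SIZE)   # 688128, matches "dma 688128 in"

# First sector reported by the md driver for disk1 in the same second.
md_sector = 7678535632
print(start_lba - md_sector)        # 64
```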

 

With those 3 indications I'm 99.99% sure it's a failing disk. OP can confirm by running an extended SMART test; it should result in "read failure".

Link to comment
8 hours ago, trurl said:

Everything about this sentence is wrong. The disk is DISABLED. Removing a disk from the array, formatting it, and putting it back so that Unraid will use it again isn't actually possible without doing a lot more than just that. Are there any more details you can provide about exactly what you did?

 

Please seek advice on the forum in the future before doing anything else with the disks until you understand how Unraid works.

 

Sorry, that was poorly worded. I used the instructions at https://wiki.unraid.net/Troubleshooting#Re-enable_the_drive to re-enable the drive ("format" was the wrong word). From what I understood of the extended SMART report, everything seemed OK. I checked for a bad connection and that all seemed OK too. In hindsight I should have asked someone else to look over it.

 

 

8 hours ago, johnnie.black said:

Disk1 is failing and needs to be replaced.

 


Oct 16 13:20:13 Tower kernel: BTRFS error (device md1): parent transid verify failed on 582139904 wanted 116345 found 116343

 

This is usually a fatal filesystem error. After replacing the disk you'll need to reformat it and restore the data. You can use this to help back up the data from the emulated disk if needed:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

 

It should also be possible to copy most data from old disk1, though some files will have read errors.

 

Thank you. I found that topic while researching the missing files in my Movies share and thought I'd better ask for advice before going further. I'll wait until my new 8TB arrives and back up the data.

 

So you think the reason I can't access the files in the Movies share is Disk1 failing? I thought parity would emulate the files when a disk dies or is disabled, and that you could use Unraid as normal until it's replaced (without parity protection, of course). Why have I lost access to ALL the files in the share, while things like Plex and MC can still access them? Or is this something bigger than just a failing disk?

 

Sorry if these questions seem a bit dumb, I'm just trying to understand the situation.

 

 

8 hours ago, johnnie.black said:

With those 3 indications I'm 99.99% sure it's a failing disk. OP can confirm by running an extended SMART test; it should result in "read failure".

 

I started the extended SMART test. I'll get back to you with the results when I get home.

Link to comment
1 hour ago, sawdustfarmer said:

format was the wrong word

Format means "write an empty filesystem to this disk". Many people are confused about this. Sometimes they even format a disk instead of rebuilding it. And Unraid updates parity for any write operation, including formatting, so after a format the data can't be rebuilt.
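
To see why, here is a toy sketch, assuming single parity that is the bytewise XOR of the data disks. The byte values and the `xor_blocks` helper are invented for illustration and this is not Unraid's code. Formatting the emulated disk is just another write, so parity ends up describing the empty filesystem.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

disk2 = b"\xaa\xbb\xcc\xdd"
disk3 = b"\x01\x02\x03\x04"
disk1_data = b"MOVI"            # stands in for the files on the disabled disk1
empty_fs = b"\x00\x00\x00\x00"  # stands in for a freshly written empty filesystem

# Before the format: parity still encodes disk1's files, so a rebuild recovers them.
parity = xor_blocks([disk1_data, disk2, disk3])
rebuild_before = xor_blocks([disk2, disk3, parity])
assert rebuild_before == disk1_data

# Formatting the emulated disk1 is a write, so parity is updated to match it.
parity = xor_blocks([empty_fs, disk2, disk3])
rebuild_after = xor_blocks([disk2, disk3, parity])
assert rebuild_after == empty_fs  # a rebuild now yields the empty filesystem
```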

Link to comment

I finally copied everything I could from Disk1 (just under 4TB). A few files failed to copy, but nothing I can't re-download.

 

But now I have more issues.

 

My system completed its weekly parity check while I was away and things have gone to custard. I've lost all options to shut down, the page just won't load, and I can't download diagnostics, it just fails. All I've got are these screenshots.

unraid-3.jpg

unraid 2.jpg

unraid 1.jpg

Link to comment

Important data is backed up offline and to cloud services.

 

This was my plan, tell me if it's a bad idea...

- Remove disk1 (4TB), replace with new 8TB as disk1

- Let the parity rebuild (this is my only option, isn't it? I can't completely remove the drive and files because unRAID will see it's missing?)

- Create a new Movies share, Movies2 on disk1 (8TB)

- Copy all the files from Movies on disk1, disk2 and disk3 to Movies2 on disk1 (8TB)

- Delete the Movies share and rename Movies2 to Movies (this will keep the dockers accessing it happy)

 

But after last night, when the unRAID GUI stopped responding and acted weird after the parity check, I'm worried that more is corrupt than just disk1.

In one of my previous screenshots you can see disk2 came back with 399 errors after the parity check. The notification at 9.59am says 'array has 2 disks with read errors', then a few hours later at 6.47pm 'read check finished (0 errors)'. Does that mean they're fixed, or do they still exist?

 

And then the logging window shows 'tower kernel: md: do_drive_cmd: disk2: ATA_OP e3 ioctl error: -5'. Is that a sign disk2 has issues?

 

After seeing my diagnostics, what would you recommend moving forward?

Link to comment
10 hours ago, sawdustfarmer said:

Here's a diagnostics. Do I need to bring the array on-line?

tower-diagnostics-20181022-2335.zip

 

I ended up doing a forced shutdown by holding the power button because of the unresponsive/half-loaded GUI.

 

After powering the server back on I ran the diagnostics while the array was offline. I haven't brought the array back online since the forced shutdown.

Link to comment

Errors on Disk2 are now at 0. I have physically removed Disk1. There was an odd sound coming from one of the drives and I suspected it was Disk1; the sound seems to have gone since removing it. I put it in as an unassigned drive and grabbed a file or two off it with no issue, hoping to hear the noise again. One of the files I grabbed was one I couldn't copy before, and it seems OK. Does that mean it's corrupt in the parity and not on Disk1?


In other news my Movies share folder is accessible again?!
 

What do you think is the best method for getting everything going again? Just rebuild the parity?

tower-diagnostics-20181023-2202.zip

Link to comment
