Everything Has Gone To Hell - General Problems (losing data, slow speed, crashing) - General Support

July 29, 2017

System Hardware:

Case: Lian-Li PC-A77F
Motherboard: SuperMicro X9SCM-F
Processor: Intel Core i3-2120 Sandy Bridge Dual-Core 3.3 GHz LGA 1155
RAM: Kingston 8GB
SATA Extender: Supermicro AOC-SASLP-MV8 SATA Extender
Hot-swap Bays: SuperMicro CSE-M35T-1B (x 2)
Power Supply: CORSAIR Enthusiast Series TX750M
USB (unRAID): Lexar JD Firefly 8GB

Hard Drives:

Seagate BarraCuda - ST3000DM008-2DM1 - 3.00TB
Seagate BarraCuda - ST3000DM001-1ER1 - 3.00TB
Western Digital Green - WDC WD10EADS-00L - 1.00TB
Western Digital Green - WDC WD30EZRX-00S - 3.00TB
Seagate BarraCuda - ST3000DM001-1CH1 - 3.00TB
Western Digital Green - WDC WD20EARS-00M - 2.00TB
Western Digital Green - WDC WD30EZRX-00D - 3.00TB
Western Digital Green - WDC WD20EADS-00R - 2.00TB
Seagate BarraCuda - ST31500341AS - 1.50TB
Western Digital Green - WDC WD30EZRX-19D - 3.00TB

Problems:

All dockers are sluggish or non-responsive. Often they will completely crash minutes after starting up the server. Other times they will run for a few days. This is a screenshot of all of them failing to start this morning after a reboot.
The main system will occasionally become sluggish, and it can take a good 30 seconds for the interface to respond before I can reboot (I mean, I'll click on a link in the WebUI and nothing will happen for 20-30 seconds).
Data is disappearing at random. My system is used almost exclusively for media downloading, storage, and management. Lately, I've started to notice that episodes from shows are missing at random. When I go to the location on the drives where these are supposed to be, I'll often see the filename followed by "partial" at the end.
I'm getting errors that I don't understand. As I'm not THAT technical, some errors I'm not sure on the severity of. The system was working, so I (stupidly) tended to ignore errors that would pop up that didn't appear critical. It is likely that some of my drives are failing and I'm just not aware of it. It is also possible that my USB stick on which unRAID runs is failing (though that is less likely).

I've posted a few times over the past few days and haven't had many responses. The people who did respond have been most gracious, but the problems always re-emerge.

I've also asked in Reddit:

I'm getting overwhelmed here and I don't know where to start.

I do have some new components on the way to replace the heart of the system:

Motherboard: Asus X99-A II ATX LGA2011-3 Motherboard
Processor: Intel Core i7-6850K 3.6GHz 6-Core Processor
Cooler: Corsair H100i v2 70.7 CFM Liquid CPU Cooler
RAM: G.Skill Ripjaws V Series 16GB (2 x 8GB) DDR4-3200 Memory

But it will probably be 8-12 weeks before I can install them. Hopefully I'll be able to add 1 or 2 cache drives at that point as well, plus 2 more hot-swap bays, another SATA extender if needed, and a few more drives.

I'm attaching all the diagnostic information I can think of from the system:

Two diagnostic reports from yesterday and today
Smart reports for all the drives
A system log

**Please, I'm in way over my head and getting depressed and frustrated as these symptoms are getting worse. If someone could spend a little time helping me figure out what the hell is going on here, I would really really appreciate it.**

Thank you for your time.

tower-smart-20170729-0943 (1).zip

tower-smart-20170729-0943 (2).zip

tower-smart-20170729-0943 (3).zip

tower-smart-20170729-0943 (4).zip

tower-smart-20170729-0943 (5).zip

tower-smart-20170729-0943 (6).zip

tower-smart-20170729-0943.zip

tower-smart-20170729-0944 (1).zip

tower-smart-20170729-0944.zip

tower-syslog-20170729-0942.zip

tower-diagnostics-20170728-0006.zip

tower-diagnostics-20170729-0941.zip

tower-smart-20170729-0942.zip

July 29, 2017

Jul 29 08:54:56 Tower kernel: REISERFS error (device md4): reiserfs-2025 reiserfs_cache_bitmap_metadata: bitmap block 4849664 is corrupted: first bit must be 1
Jul 29 08:54:56 Tower kernel: REISERFS (device md4): Remounting filesystem read-only
Jul 29 08:54:56 Tower kernel: REISERFS warning (device md4): clm-6006 reiserfs_dirty_inode: writing inode 49051 on readonly FS

You need to Check Disk File System on disk 4

Additionally, ReiserFS has been known since day 1 to be terrible on drives that are 90%+ full (very, very slow). You really should convert over to XFS if possible. (Beyond that, Reiser has been troublesome as it's an old filesystem and is no longer being maintained in the kernel.)

Also, no need to separately upload the smart reports. The Diagnostics Zip has everything in it.
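For anyone who wants to spot this kind of entry in their own syslog, here is a minimal Python sketch. It assumes the kernel lines look like the three quoted above; the function name `find_reiserfs_problems` and the regular expression are illustrative only and are not part of unRAID or any of its tools.

```python
import re

# Matches kernel lines shaped like the ones quoted above, for example:
#   "... kernel: REISERFS error (device md4): ..."
#   "... kernel: REISERFS (device md4): Remounting filesystem read-only"
REISERFS_LINE = re.compile(
    r"kernel: REISERFS(?: (error|warning))? \(device (\w+)\): (.*)"
)


def find_reiserfs_problems(lines):
    """Return {device: [messages]} for ReiserFS errors and read-only remounts."""
    problems = {}
    for line in lines:
        match = REISERFS_LINE.search(line)
        if not match:
            continue
        level, device, message = match.groups()
        # Keep hard errors and the "Remounting filesystem read-only" notice;
        # plain warnings are skipped to keep the summary short.
        if level == "error" or "read-only" in message:
            problems.setdefault(device, []).append(message)
    return problems
```

Calling it on the lines of an extracted syslog file gives a per-device summary; any device that shows up (md4 corresponds to disk 4 here) is a candidate for a file system check.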

July 29, 2017

Author

21 minutes ago, Squid said:
Jul 29 08:54:56 Tower kernel: REISERFS error (device md4): reiserfs-2025 reiserfs_cache_bitmap_metadata: bitmap block 4849664 is corrupted: first bit must be 1
Jul 29 08:54:56 Tower kernel: REISERFS (device md4): Remounting filesystem read-only
Jul 29 08:54:56 Tower kernel: REISERFS warning (device md4): clm-6006 reiserfs_dirty_inode: writing inode 49051 on readonly FS
You need to Check Disk File System on disk 4

Additionally, ReiserFS has been known since day 1 to be terrible on drives that are 90%+ full (very, very slow). You really should convert over to XFS if possible. (Beyond that, Reiser has been troublesome as it's an old filesystem and is no longer being maintained in the kernel.)

Also, no need to separately upload the smart reports. The Diagnostics Zip has everything in it.

Thank you so much for the quick reply! I will start the check disk file system in a moment after this file transfer completes. Hopefully it will be clear as to what I have to do if there are problems.

I have been slowly converting to XFS, but I didn't realize that ReiserFS had such trouble with drives that are almost full. I will see what I can do about getting them converted. The trouble right now is that I'm out of space. I don't have anywhere to put the data to give myself a blank drive. If the report on disk 4 indicates that I need to replace it, perhaps I could get a larger one and use that... I'd have to replace my parity as well...

July 29, 2017

Disk 2 is way out of my comfort zone

  5 Reallocated_Sector_Ct   0x0033   066   066   036    Pre-fail  Always       -       1398

Disk 4 has problems

197 Current_Pending_Sector  0x0032   198   196   000    Old_age   Always       -       1090

Everything else looks good
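A small Python sketch of the same check, for going through the SMART reports in the diagnostics zip. It assumes the usual ten-column `smartctl -A` attribute table shown in the two lines above (raw value in the last column). The watch list and the "anything above zero" rule are my own conservative assumptions, not an official unRAID threshold, and `flag_smart_attributes` is a made-up name.

```python
# Attributes worth a second look when their raw value is non-zero.
# This list is an assumption for illustration, not an official rule.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def flag_smart_attributes(report_lines, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes above zero."""
    flagged = {}
    for line in report_lines:
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in watched and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged
```

Fed the two lines quoted above, it reports 1398 reallocated sectors for disk 2 and 1090 pending sectors for disk 4.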

July 29, 2017

Community Expert

Squid already gave you good advice. I just wanted to add: first replace disk4, and only then run reiserfsck. Also make sure you enable notifications. Disk4's pending sectors are probably the #1 reason for your issues, and you would have received a warning about them.

July 29, 2017

Author

Here are the results:

reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/md4
Will put log info to 'stdout'
###########
reiserfsck --check started at Sat Jul 29 10:26:17 2017
###########
Replaying journal: Trans replayed: mountid 142, transid 184259, desc 6013, len 6, commit 6020, next trans offset 6003

Replaying journal: |                                        |  0.1%  1 trans
Trans replayed: mountid 142, transid 184260, desc 6021, len 1, commit 6023, next trans offset 6006

Replaying journal: |                                        /  0.2%  2 trans
Trans replayed: mountid 142, transid 184261, desc 6024, len 1, commit 6026, next trans offset 6009
Trans replayed: mountid 142, transid 184262, desc 6027, len 1, commit 6029, next trans offset 6012
Trans replayed: mountid 142, transid 184263, desc 6030, len 1, commit 6032, next trans offset 6015
Trans replayed: mountid 142, transid 184264, desc 6033, len 1, commit 6035, next trans offset 6018
Trans replayed: mountid 142, transid 184265, desc 6036, len 13, commit 6050, next trans offset 6033

Replaying journal: |                                        -  0.6%  7 trans

                                                                                

Replaying journal: Done.
Reiserfs journal '/dev/md4' in blocks [18..8211]: 7 transactions replayed
Checking internal tree..  finished
Comparing bitmaps..Bad nodes were found, Semantic pass skipped
6 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Sat Jul 29 10:38:16 2017
###########
block 5504537: The level of the node (47074) is not correct, (1) expected
 the problem in the internal node occured (5504537), whole subtree is skipped
block 1946122: The level of the node (2) is not correct, (1) expected
 the problem in the internal node occured (1946122), whole subtree is skipped
block 4800674: The level of the node (47801) is not correct, (1) expected
 the problem in the internal node occured (4800674), whole subtree is skipped
block 558244350: The level of the node (40478) is not correct, (1) expected
 the problem in the internal node occured (558244350), whole subtree is skipped
block 5174938: The level of the node (59681) is not correct, (1) expected
 the problem in the internal node occured (5174938), whole subtree is skipped
block 5174937: The level of the node (25567) is not correct, (1) expected
 the problem in the internal node occured (5174937), whole subtree is skipped
vpf-10640: The on-disk and the correct bitmaps differs.

So disk 4 definitely needs replacing then? Should I run the same test on disk 2?

July 29, 2017

Community Expert

It would not be a bad idea to run file system checks on all drives to look for file system corruption, just to be safe. Just because a drive is not reporting problems at the SMART level does not mean file system corruption cannot have occurred. I personally run checks once a month to try to pre-emptively spot any issues. You will find that the checks on XFS drives run MUCH faster than those on ReiserFS drives, so it is less of a hassle to check such drives.

July 29, 2017

Author

Should I run the --rebuild-tree command as it suggests?

July 29, 2017

Community Expert

After the disk is replaced.

July 29, 2017

Author

This is what it gave me for disk 2:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

Does that mean it's ok?

It's really hard to interpret these logs

July 29, 2017

Author

Well shit. Now disk 4 is "unmountable"

I'm gonna go out right now and buy a replacement drive.

Do I need to pre-clear for it to repair when I put it in?

Edited July 29, 2017 by Netbug

July 29, 2017

Author

Ordered 2 Seagate BarraCuda 4TB 3.5-Inch SATA III 6 Gb/s Internal Hard Drives (ST4000DM005). They should arrive tomorrow with Prime.

Do I replace the parity drive first? I'm not sure what to do here since they're larger than my parity drive at present.

July 29, 2017

Community Expert

Disk2 seems fine, but sometimes it's hard to say with xfs_repair. If in doubt, run it without -n (the no-modify flag).

As for the disk replacement, you'll need to do the parity swap procedure. You can then use the old parity disk or the new disk to rebuild disk4.

https://wiki.lime-technology.com/The_parity_swap_procedure

July 29, 2017

Author

Sorry, I'm still a little confused by the next step.

If drive 4 is unmountable, do I not have to replace that BEFORE I can replace the parity? I will have to purchase a third drive (3TB) if that is the case.

How can it rebuild the parity drive if one from the array is unmountable?

Edited July 29, 2017 by Netbug

July 29, 2017

Community Expert

Unmountable is a filesystem problem; it has nothing to do with disk rebuilding. If you don't have a 3TB disk, you'll need to do a parity swap, then rebuild disk4, and only then run reiserfsck to fix the filesystem. If you can get a 3TB disk, first replace disk4, then run reiserfsck.

July 29, 2017

Author

3 minutes ago, johnnie.black said:

Unmountable is a filesystem problem; it has nothing to do with disk rebuilding. If you don't have a 3TB disk, you'll need to do a parity swap, then rebuild disk4, and only then run reiserfsck to fix the filesystem. If you can get a 3TB disk, first replace disk4, then run reiserfsck.

I'm so sorry. Forgive my confusion here.

How can I do a parity swap, when disk 4 is missing? Like, how can it rebuild when both disk 4 (unmountable) and the parity drive (removed to swap) are missing? Can it still use the data from that drive even when it's unmountable?

July 29, 2017

Community Expert

Parity swap copies data from the old parity disk to the new parity disk; no other disks are involved. Then disk 4 is rebuilt (still unmountable) using the new parity and all the other disks. Only after all that do you deal with the filesystem problem. And no, you can't access disk4 data while it's unmountable.
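To illustrate why the rebuilt disk comes back still unmountable, here is a toy Python sketch. It assumes single parity computed as a bytewise XOR across the data disks, which is how unRAID's single parity is generally described; the helper names `xor_blocks` and `rebuild_missing` are hypothetical, and real disks are of course handled by unRAID itself, not by a script like this.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(parity, surviving_disks):
    """Recover the one missing disk from parity plus every surviving disk."""
    return xor_blocks([parity, *surviving_disks])


# Toy example: three tiny "disks"; disk4 holds a damaged filesystem.
disk1 = b"\x01\x02\x03"
disk2 = b"\x10\x20\x30"
disk4 = b"\xde\xad\x00"  # pretend these bytes are the corrupted ReiserFS data

parity = xor_blocks([disk1, disk2, disk4])
rebuilt = rebuild_missing(parity, [disk1, disk2])

# The rebuild reproduces disk4 bit for bit, corruption included.
assert rebuilt == disk4
```

The rebuild only restores whatever bytes parity says were there, so the filesystem damage is restored along with everything else. That is why reiserfsck still has to be run after the rebuild finishes.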

July 29, 2017

Author

Ah! ok. That makes sense. I'll need somewhere to mount the parity drive then while it is copying. Can I do it as follows then?

Remove one of the other drives (say disk 4)
Place the new drive in that spot
Do the parity duplication
Remove the old parity drive
Put the new parity drive where the old one was
Then put the old parity drive where disk 4 was and rebuild after a pre-clear?

July 29, 2017

Community Expert

You can't do any of that; unRAID does the copying. You just need to follow the instructions I linked earlier for the parity swap procedure.

July 29, 2017

3 hours ago, Netbug said:

System Hardware:

SATA Extender: Supermicro AOC-SASLP-MV8 SATA Extender

Your server contains a disk controller (e.g., SASLP, SASLP2) based on a Marvell chip. The Marvell chips contain a defect that can cause drives to drop offline, parity errors, and even data corruption. (unRAID can't fix a controller chip.) Consider a replacement controller like the LSI SAS9201-8i, LSI SAS9211-8i, IBM M1015, or Dell H310. Read this post https://forums.lime-technology.com/topic/39003-marvell-disk-controller-chipsets-and-virtualization for more information on the problem and potential workarounds. If you are not experiencing problems, you may be able to safely ignore this warning, but educate yourself to make that determination.

July 29, 2017

Author

3 hours ago, johnnie.black said:

You can't do any of that; unRAID does the copying. You just need to follow the instructions I linked earlier for the parity swap procedure.

Ok. I think I understand now. Mostly.

I went and bought a 3TB drive and just replaced disk 4. It is now rebuilding that drive. I assume, since it's doing a rebuild, it's going to put it back to reiserfs, which is annoying, but unavoidable at this point.

I have 2 4TB drives on the way that should arrive on Wednesday. When they arrive, I'll replace the smallest drive in the array with one of them, and replace the parity drive (using the instructions you linked) with the other one. That SHOULD give me enough room to use unBALANCE and free up enough space to convert the file systems of the remaining drives to xfs.

Fingers crossed.

3 hours ago, bjp999 said:

Your server contains a disk controller (e.g., SASLP, SASLP2) based on a Marvell chip. The Marvell chips contain a defect that can cause drives to drop offline, parity errors, and even data corruption. (unRAID can't fix a controller chip.) Consider a replacement controller like the LSI SAS9201-8i, LSI SAS9211-8i, IBM M1015, or Dell H310. Read this post https://forums.lime-technology.com/topic/39003-marvell-disk-controller-chipsets-and-virtualization for more information on the problem and potential workarounds. If you are not experiencing problems, you may be able to safely ignore this warning, but educate yourself to make that determination.

That's good information to have. I believe that I need to get a new SATA controller anyways as I'm almost out of ports on this one (if I recall correctly). I'll look into a different make/model when I do. Thank you.

July 29, 2017

Community Expert

32 minutes ago, Netbug said:

Ok. I think I understand now. Mostly.

I went and bought a 3TB drive and just replaced disk 4. It is now rebuilding that drive. I assume, since it's doing a rebuild, it's going to put it back to reiserfs, which is annoying, but unavoidable at this point.

I have 2 4TB drives on the way that should arrive on Wednesday. When they arrive, I'll replace the smallest drive in the array with one of them, and replace the parity drive (using the instructions you linked) with the other one. That SHOULD give me enough room to use unBALANCE and free up enough space to convert the file systems of the remaining drives to xfs.

Fingers crossed.

That's good information to have. I believe that I need to get a new SATA controller anyways as I'm almost out of ports on this one (if I recall correctly). I'll look into a different make/model when I do. Thank you.

Note that you will have to replace the parity drive first with a 4TB drive as you can never have a data disk that is larger than the parity drive. Once that has been done successfully you can add the new 4TB data drive.
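The rule can be written down as a quick sanity check before planning an upgrade. This is only a sketch: `oversized_data_disks` is a hypothetical helper, and the sizes in the example are assumptions based on this thread (a 3TB parity drive and the 4TB drives on order).

```python
def oversized_data_disks(parity_tb, data_disks_tb):
    """Return the data disks that are larger than the parity drive.

    parity_tb: size of the parity drive in TB.
    data_disks_tb: mapping of disk name to size in TB.
    """
    return {
        name: size
        for name, size in data_disks_tb.items()
        if size > parity_tb
    }


# With a 3TB parity drive, a new 4TB data disk is not allowed yet.
planned = {"disk4": 3.0, "new data disk": 4.0}
too_big_now = oversized_data_disks(3.0, planned)

# After the parity drive is upgraded to 4TB, nothing is oversized.
too_big_after_swap = oversized_data_disks(4.0, planned)
```

An empty result means the planned layout respects the rule; anything listed has to wait until the parity drive has been upgraded.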

July 30, 2017

Author

I replaced the bad drive (Disk 4), and ran the parity check. It said it was repairing (left it to do so overnight).

But I'm still showing the disk as "unmountable"

What did I do wrong? How do I rebuild that disk?

July 30, 2017

Community Expert

That's normal, when the rebuild finishes run reiserfsck again.

July 30, 2017

Author

4 minutes ago, johnnie.black said:

That's normal, when the rebuild finishes run reiserfsck again.

Okie. Running that again now with the --check flag first.
