July 29, 2017

System Hardware:
Case: Lian-Li PC-A77F
Motherboard: SuperMicro X9SCM-F
Processor: Intel Core i3-2120 Sandy Bridge Dual-Core 3.3 GHz LGA 1155
RAM: Kingston 8GB
SATA Extender: Supermicro AOC-SASLP-MV8 SATA Extender
Hot-swap Bays: SuperMicro CSE-M35T-1B (x 2)
Power Supply: CORSAIR Enthusiast Series TX750M
USB (unRAID): Lexar JD Firefly 8GB

Hard Drives:
Seagate BarraCuda - ST3000DM008-2DM1 - 3.00TB
Seagate BarraCuda - ST3000DM001-1ER1 - 3.00TB
Western Digital Green - WDC WD10EADS-00L - 1.00TB
Western Digital Green - WDC WD30EZRX-00S - 3.00TB
Seagate BarraCuda - ST3000DM001-1CH1 - 3.00TB
Western Digital Green - WDC WD20EARS-00M - 2.00TB
Western Digital Green - WDC WD30EZRX-00D - 3.00TB
Western Digital Green - WDC WD20EADS-00R - 2.00TB
Seagate BarraCuda - ST31500341AS - 1.50TB
Western Digital Green - WDC WD30EZRX-19D - 3.00TB

Problems:

All dockers are sluggish or non-responsive. Often they crash completely within minutes of starting up the server. Other times they run for a few days. This is a screenshot of all of them failing to start this morning after a reboot.

The main system will occasionally become sluggish, and it takes a good 30 seconds to use the interface before I can reboot (I mean, I'll click on a link in the WebUI and nothing will happen for 20-30 seconds).

Data is disappearing at random. My system is used almost exclusively for media downloading, storage, and management. Lately, I've started to notice that episodes from shows are missing at random. When I go to the location on the drives where they are supposed to be, I'll often see the filename followed by "partial" at the end.

I'm getting errors that I don't understand. As I'm not THAT technical, I'm not sure how severe some of them are. The system was working, so I (stupidly) tended to ignore errors that popped up and didn't appear critical.

It is likely that some of my drives are failing and I'm just not aware of it. It is also possible that the USB stick unRAID runs on is failing (though that is less likely).

I've posted a few times over the past few days and haven't had many responses. Those who did respond have been most gracious, but the problems always re-emerge.
Post asking about Sonarr problems
Post asking about how to check disk health
Post asking about possibility of problems

I've also asked on Reddit:
Post asking about a Sonarr docker error in the Sonarr subreddit
Cross-post asking the same question in the unRAID subreddit

I'm getting overwhelmed here and I don't know where to start.

I do have some new components on the way to replace the heart of the system:
Motherboard: Asus X99-A II ATX LGA2011-3 Motherboard
Processor: Intel Core i7-6850K 3.6GHz 6-Core Processor
Cooler: Corsair H100i v2 70.7 CFM Liquid CPU Cooler
RAM: G.Skill Ripjaws V Series 16GB (2 x 8GB) DDR4-3200 Memory

But it will probably be 8-12 weeks before I can install them. Hopefully I'll be able to add 1 or 2 cache drives at that point as well, plus 2 more hot-swap bays, another SATA extender if needed, and a few more drives.

I'm attaching all the diagnostic information I can think of from the system:
Two diagnostic reports from yesterday and today
SMART reports for all the drives
A system log

**Please, I'm in way over my head and getting depressed and frustrated as these symptoms are getting worse. If someone could spend a little time helping me figure out what the hell is going on here, I would really really appreciate it.**

Thank you for your time.
tower-smart-20170729-0943 (1).zip tower-smart-20170729-0943 (2).zip tower-smart-20170729-0943 (3).zip tower-smart-20170729-0943 (4).zip tower-smart-20170729-0943 (5).zip tower-smart-20170729-0943 (6).zip tower-smart-20170729-0943.zip tower-smart-20170729-0944 (1).zip tower-smart-20170729-0944.zip tower-syslog-20170729-0942.zip tower-diagnostics-20170728-0006.zip tower-diagnostics-20170729-0941.zip tower-smart-20170729-0942.zip
July 29, 2017

Jul 29 08:54:56 Tower kernel: REISERFS error (device md4): reiserfs-2025 reiserfs_cache_bitmap_metadata: bitmap block 4849664 is corrupted: first bit must be 1
Jul 29 08:54:56 Tower kernel: REISERFS (device md4): Remounting filesystem read-only
Jul 29 08:54:56 Tower kernel: REISERFS warning (device md4): clm-6006 reiserfs_dirty_inode: writing inode 49051 on readonly FS

You need to Check Disk File System on disk 4.

Additionally, ReiserFS has been known since day 1 to be terrible on drives that are 90%+ full (very, very slow). You really should convert over to XFS if possible. (Beyond that, Reiser has been troublesome as it's an old filesystem and is no longer being maintained in the kernel.)

Also, no need to upload the SMART reports separately. The Diagnostics zip has everything in it.
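For anyone who wants to scan a syslog for the same kind of message, here is a minimal sketch in Python. It only knows about the three line shapes quoted above; the regular expression and the function names are illustrative assumptions, not part of unRAID or its diagnostics tooling. In this thread, device md4 is the one Squid maps to disk 4.

```python
import re

# Sample lines copied from the syslog excerpt quoted in this post.
SYSLOG = """\
Jul 29 08:54:56 Tower kernel: REISERFS error (device md4): reiserfs-2025 reiserfs_cache_bitmap_metadata: bitmap block 4849664 is corrupted: first bit must be 1
Jul 29 08:54:56 Tower kernel: REISERFS (device md4): Remounting filesystem read-only
Jul 29 08:54:56 Tower kernel: REISERFS warning (device md4): clm-6006 reiserfs_dirty_inode: writing inode 49051 on readonly FS
"""

# Hypothetical pattern: optional severity word, then the md device, then the message.
PATTERN = re.compile(r"REISERFS (error|warning)?\s*\(device (md\d+)\): (.*)")


def find_reiserfs_events(text):
    """Return (device, severity, message) for every ReiserFS kernel line."""
    events = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            severity = match.group(1) or "info"
            events.append((match.group(2), severity, match.group(3)))
    return events


def devices_remounted_read_only(events):
    """Devices the kernel reported remounting read-only."""
    return sorted(
        {dev for dev, _, msg in events if "Remounting filesystem read-only" in msg}
    )


events = find_reiserfs_events(SYSLOG)
for device, severity, message in events:
    print(device, severity, message)
print("read-only:", devices_remounted_read_only(events))
```

A filesystem that has been remounted read-only would be consistent with the reported symptoms of downloads left as "partial" files, although the thread itself does not confirm that link.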
July 29, 2017 (Author)

Thank you so much for the quick reply!

I will start the check disk file system in a moment, after this file transfer completes. Hopefully it will be clear what I have to do if there are problems.

I have been slowly converting to XFS, but I didn't realize that ReiserFS had such trouble with drives that are almost full. I will see what I can do about getting them converted. The trouble right now is that I'm out of space. I don't have anywhere to put the data to give myself a blank drive.

If the report on disk 4 indicates that I need to replace it, perhaps I could get a larger one and use that... I'd have to replace my parity as well...
July 29, 2017

Disk 2 is way out of my comfort zone:
5 Reallocated_Sector_Ct 0x0033 066 066 036 Pre-fail Always - 1398

Disk 4 has problems:
197 Current_Pending_Sector 0x0032 198 196 000 Old_age Always - 1090

Everything else looks good.
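The two SMART attribute lines above can be read mechanically. The sketch below assumes the usual ten-column attribute table layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value) and treats any non-zero raw count on attributes 5 and 197 as worth a look. That rule is an assumption for illustration, not an official limit.

```python
from dataclasses import dataclass

# The two attribute lines quoted in this post, keyed by the disk they belong to.
REPORTS = {
    "disk2": "5 Reallocated_Sector_Ct 0x0033 066 066 036 Pre-fail Always - 1398",
    "disk4": "197 Current_Pending_Sector 0x0032 198 196 000 Old_age Always - 1090",
}

# Assumption: only the two attributes discussed in the thread are watched.
WATCHED_IDS = {5, 197}


@dataclass
class SmartAttr:
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: int


def parse_smart_line(line):
    """Split one attribute row into its columns (ten whitespace-separated fields)."""
    parts = line.split()
    return SmartAttr(
        attr_id=int(parts[0]),
        name=parts[1],
        value=int(parts[3]),
        worst=int(parts[4]),
        thresh=int(parts[5]),
        raw=int(parts[9]),
    )


def needs_attention(attr):
    """Flag watched attributes whose raw count is above zero (illustrative rule)."""
    return attr.attr_id in WATCHED_IDS and attr.raw > 0


parsed = {disk: parse_smart_line(line) for disk, line in REPORTS.items()}
flagged = sorted(disk for disk, attr in parsed.items() if needs_attention(attr))
for disk in flagged:
    print(disk, parsed[disk].name, parsed[disk].raw)
```

Note that neither line has crossed its vendor threshold (normalized value 66 against 36, and 198 against 0), which is one reason raw counts are worth checking in addition to the pass/fail status.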
July 29, 2017 (Community Expert)

Squid already gave you good advice; I just wanted to add: replace disk4 first, and only then run reiserfsck. Also make sure you enable notifications. The pending sectors on disk4 are probably the #1 reason for your issues, and you would have received a warning about them.
July 29, 2017 (Author)

Here are the results:

reiserfsck 3.6.24
Will read-only check consistency of the filesystem on /dev/md4
Will put log info to 'stdout'
########### reiserfsck --check started at Sat Jul 29 10:26:17 2017 ###########
Replaying journal: Trans replayed: mountid 142, transid 184259, desc 6013, len 6, commit 6020, next trans offset 6003
Replaying journal: | | 0.1% 1 trans
Trans replayed: mountid 142, transid 184260, desc 6021, len 1, commit 6023, next trans offset 6006
Replaying journal: | / 0.2% 2 trans
Trans replayed: mountid 142, transid 184261, desc 6024, len 1, commit 6026, next trans offset 6009
Trans replayed: mountid 142, transid 184262, desc 6027, len 1, commit 6029, next trans offset 6012
Trans replayed: mountid 142, transid 184263, desc 6030, len 1, commit 6032, next trans offset 6015
Trans replayed: mountid 142, transid 184264, desc 6033, len 1, commit 6035, next trans offset 6018
Trans replayed: mountid 142, transid 184265, desc 6036, len 13, commit 6050, next trans offset 6033
Replaying journal: | - 0.6% 7 trans
Replaying journal: Done.
Reiserfs journal '/dev/md4' in blocks [18..8211]: 7 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..Bad nodes were found, Semantic pass skipped
6 found corruptions can be fixed only when running with --rebuild-tree
########### reiserfsck finished at Sat Jul 29 10:38:16 2017 ###########
block 5504537: The level of the node (47074) is not correct, (1) expected
the problem in the internal node occured (5504537), whole subtree is skipped
block 1946122: The level of the node (2) is not correct, (1) expected
the problem in the internal node occured (1946122), whole subtree is skipped
block 4800674: The level of the node (47801) is not correct, (1) expected
the problem in the internal node occured (4800674), whole subtree is skipped
block 558244350: The level of the node (40478) is not correct, (1) expected
the problem in the internal node occured (558244350), whole subtree is skipped
block 5174938: The level of the node (59681) is not correct, (1) expected
the problem in the internal node occured (5174938), whole subtree is skipped
block 5174937: The level of the node (25567) is not correct, (1) expected
the problem in the internal node occured (5174937), whole subtree is skipped
vpf-10640: The on-disk and the correct bitmaps differs.

So disk 4 definitely needs replacing then? Should I run the same test on disk 2?
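Since the reiserfsck output is hard to read at a glance, here is a small Python sketch that pulls out the two facts that matter from it: which blocks were reported as bad, and whether the tool says `--rebuild-tree` is required. The sample text is an abridged copy of the output above; the function name and the regular expressions are assumptions for illustration and only cover the message shapes seen in this thread.

```python
import re

# Abridged copy of the reiserfsck --check output posted above.
OUTPUT = """\
Checking internal tree.. finished
Comparing bitmaps..Bad nodes were found, Semantic pass skipped
6 found corruptions can be fixed only when running with --rebuild-tree
block 5504537: The level of the node (47074) is not correct, (1) expected
block 1946122: The level of the node (2) is not correct, (1) expected
block 4800674: The level of the node (47801) is not correct, (1) expected
block 558244350: The level of the node (40478) is not correct, (1) expected
block 5174938: The level of the node (59681) is not correct, (1) expected
block 5174937: The level of the node (25567) is not correct, (1) expected
vpf-10640: The on-disk and the correct bitmaps differs.
"""


def summarize_reiserfsck(text):
    """Collect bad block numbers and the tool's own corruption count."""
    bad_blocks = [
        int(num) for num in re.findall(r"block (\d+): The level of the node", text)
    ]
    fatal = re.search(
        r"(\d+) found corruptions can be fixed only when running with --rebuild-tree",
        text,
    )
    return {
        "bad_blocks": bad_blocks,
        "reported_corruptions": int(fatal.group(1)) if fatal else 0,
        "needs_rebuild_tree": fatal is not None,
    }


summary = summarize_reiserfsck(OUTPUT)
print(summary)
```

The count reported by the tool (6) matches the six bad blocks listed. These are filesystem-level findings; whether the physical drive needs replacing is a separate question, answered by the SMART data.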
July 29, 2017 (Community Expert)

It would not be a bad idea to run file system checks on all drives, just to be safe. A drive that is not reporting problems at the SMART level can still have file system corruption. I personally run checks once a month to try to spot any issues pre-emptively. You will find that checks on XFS drives run MUCH faster than those on ReiserFS drives, so they are less of a hassle.
July 29, 2017 (Author)

This is what it gave me for disk 2:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

Does that mean it's ok? It's really hard to interpret these logs.
July 29, 2017 (Author)

Well shit. Now disk 4 is "unmountable".

I'm gonna go out right now and buy a replacement drive. Do I need to pre-clear it so that the array can rebuild onto it when I put it in?
July 29, 2017 (Author)

Ordered 2 Seagate BarraCuda 4TB 3.5-Inch SATA III 6 Gb/s Internal Hard Drives (ST4000DM005). They should arrive tomorrow with Prime.

Do I replace the parity drive first? I'm not sure what to do here, since they're larger than my current parity drive.
July 29, 2017 (Community Expert)

Disk2 seems fine, but sometimes it's hard to say with xfs_repair; if in doubt, run it without -n (the no-modify flag).

As for the disk replacement, you'll need to do the parity swap procedure. You can then use either the old parity disk or the new disk to rebuild disk4.

https://wiki.lime-technology.com/The_parity_swap_procedure
July 29, 2017 (Author)

Sorry, I'm still a little confused by the next step. If drive 4 is unmountable, don't I have to replace that BEFORE I can replace the parity? I will have to purchase a third drive (3TB) if that is the case.

How can it rebuild the parity drive if one drive in the array is unmountable?
July 29, 2017 (Community Expert)

Unmountable is a filesystem problem; it has nothing to do with disk rebuilding. If you don't have a 3TB disk, you'll need to do a parity swap, then rebuild disk4, and only then run reiserfsck to fix the filesystem. If you can get a 3TB disk, first replace disk4, then run reiserfsck.
July 29, 2017 (Author)

I'm so sorry. Forgive my confusion here. How can I do a parity swap when disk 4 is missing? Like, how can it rebuild when both disk 4 (unmountable) and the parity drive (removed to swap) are missing? Can it still use the data from that drive even when it's unmountable?
July 29, 2017 (Community Expert)

Parity swap copies data from the old parity disk to the new parity disk; no other disks are involved. Then disk 4 is rebuilt (still unmountable) using the new parity and all the other disks. Only after all that do you deal with the filesystem problem. And no, you can't access disk4's data while it's unmountable.
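A toy model may help show why the rebuilt disk 4 comes back still unmountable. The Python sketch below assumes single parity computed as a byte-wise XOR across the data disks; the four-byte "disks" and the function name are made up for illustration. Parity works on raw bytes and knows nothing about filesystems, so a rebuild reproduces whatever was on the disk, corruption included.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up 4-byte stand-ins for the contents of three data disks.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0f\x0e\x0d\x0c"
disk4 = b"\xde\xad\xbe\xef"  # includes whatever filesystem damage is on disk 4

# Parity as it was computed while all disks were present.
parity = xor_blocks([disk1, disk2, disk4])

# Parity swap: the old parity is copied as-is to the new, larger drive.
new_parity = bytes(parity)

# Rebuild of disk 4 from the new parity plus every other data disk.
rebuilt_disk4 = xor_blocks([new_parity, disk1, disk2])

print(rebuilt_disk4 == disk4)
```

The rebuilt bytes are identical to the original ones, which is why the filesystem repair (reiserfsck) has to happen as a separate step after the rebuild.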
July 29, 2017 (Author)

Ah! Ok. That makes sense.

I'll need somewhere to mount the parity drive then while it is copying. Can I do it as follows then?

1. Remove one of the other drives (say disk 4)
2. Place the new drive in that spot
3. Do the parity duplication
4. Remove the old parity drive
5. Put the new parity drive where the old one was
6. Then put the old parity drive where disk 4 was and rebuild after a pre-clear?
July 29, 2017 (Community Expert)

You can't do any of that. unRAID does the copying; you just need to follow the instructions I linked earlier for the parity swap procedure.
July 29, 2017

3 hours ago, Netbug said:
System Hardware: SATA Extender: Supermicro AOC-SASLP-MV8 SATA Extender

Your server contains a disk controller (e.g., SASLP, SASLP2) based on a Marvell chip. The Marvell chips contain a defect that can cause drives to drop offline, parity errors, and even data corruption. (unRAID can't fix a controller chip.) Consider a replacement controller like the LSI SAS9201-8i, LSI SAS9211-8i, IBM M1015, or Dell H310. Read this post https://forums.lime-technology.com/topic/39003-marvell-disk-controller-chipsets-and-virtualization for more information on the problem and potential workarounds. If you are not experiencing problems, you may be able to safely ignore this warning, but educate yourself to make that determination.
July 29, 2017 (Author)

3 hours ago, johnnie.black said:
You can't do any of that, unRAID does the copying, you just need to follow the instructions I linked earlier for the parity swap procedure.

Ok. I think I understand now. Mostly.

I went and bought a 3TB drive and just replaced disk 4. It is now rebuilding that drive. I assume that, since it's doing a rebuild, it's going to put it back to ReiserFS, which is annoying but unavoidable at this point.

I have 2 4TB drives on the way that should arrive on Wednesday. When they arrive, I'll replace the smallest drive in the array with one of them, and replace the parity drive (using the instructions you linked) with the other one. That SHOULD give me enough room to use unBALANCE and free up enough space to convert the file systems of the remaining drives to XFS.

Fingers crossed.

3 hours ago, bjp999 said:
Your server contains a disk controller (e.g., SASLP, SASLP2) based on a Marvell chip. The Marvell chips contain a defect that can cause drives to drop offline, parity errors, and even data corruption. (unRAID can't fix a controller chip.) Consider a replacement controller like the LSI SAS9201-8i, LSI SAS9211-8i, IBM M1015, or Dell H310. Read this post https://forums.lime-technology.com/topic/39003-marvell-disk-controller-chipsets-and-virtualization for more information on the problem and potential workarounds. If you are not experiencing problems, you may be able to safely ignore this warning, but educate yourself to make that determination.

That's good information to have. I believe that I need to get a new SATA controller anyway, as I'm almost out of ports on this one (if I recall correctly). I'll look into a different make/model when I do.

Thank you.
July 29, 2017 (Community Expert)

Note that you will have to replace the parity drive with a 4TB drive first, as you can never have a data disk that is larger than the parity drive. Once that has been done successfully, you can add the new 4TB data drive.
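The ordering rule in this reply can be written down as a tiny check. In the Python sketch below, the 3TB size of the current parity drive is an assumption (the thread only says the new 4TB drives are larger than it), and the function names are hypothetical.

```python
def can_join_array(disk_tb, parity_tb):
    """A data disk is allowed only if it is no larger than the parity disk."""
    return disk_tb <= parity_tb


def plan_upgrade(parity_tb, new_parity_tb, new_data_tb):
    """Return the order of steps needed to add a new data disk."""
    steps = []
    if not can_join_array(new_data_tb, parity_tb):
        if not can_join_array(new_data_tb, new_parity_tb):
            raise ValueError("new data disk is larger than the new parity disk")
        steps.append(f"replace parity with the {new_parity_tb:g}TB drive")
        parity_tb = new_parity_tb
    steps.append(f"add the {new_data_tb:g}TB data drive")
    return steps


# Assumed sizes: 3TB parity today, two new 4TB drives on the way.
steps = plan_upgrade(parity_tb=3.0, new_parity_tb=4.0, new_data_tb=4.0)
for number, step in enumerate(steps, start=1):
    print(number, step)
```

With those sizes the plan comes out as parity first, data drive second, matching the advice above.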
July 30, 2017 (Author)

I replaced the bad drive (disk 4) and ran the parity check. It said it was repairing, so I left it to do so overnight. But the disk is still showing as "unmountable".

What did I do wrong? How do I rebuild that disk?
July 30, 2017 (Author)

4 minutes ago, johnnie.black said:
That's normal, when the rebuild finishes run reiserfsck again.

Okie. Running that again now with the --check flag first.