Apeiron

Posts
  1. Thank you for the advice. It makes good sense. I always joked about people who ran rm -rf / as root, and now I'm practically guilty of the same. Lesson learned. Thanks.
  2. I was performing data copy and cleanup using disk-to-disk transfers. I scrolled to the wrong command and mistakenly ran rm -rf /mnt/disk4. I saw the error immediately and killed the process. Based on a disk usage comparison and modified directory times, it looks like I deleted about 100GB of data from one of the folders on disk4.

     Reviewing the forums and FAQ, I saw recovery info for ReiserFS, and the "general help" link on the FAQ is broken. My data drive is using encrypted XFS. Fortunately, I did not delete anything critical and I have a backup. No other changes to the data filesystem have taken place.

     I'm looking for advice on next steps. Should I attempt a recovery operation, or restore from backup? I think restoring would guarantee 100% recovery, but a recovery is probably easier and faster. I'd be OK with a <100% recovery operation if most of the data could be recovered. I have experience recovering NTFS drives for Windows using testdisk on Linux, but never with XFS or an encrypted drive.

     solidsnake-diagnostics-20220802-1026.zip
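     In case it helps anyone searching later, here is a rough sketch of the image-first approach I'm considering. It assumes the encrypted volume is already unlocked by Unraid and that the unlocked mapping is /dev/mapper/md4 (the device names and scratch paths are illustrative, not taken from my diagnostics):

       # Keep the array stopped or the disk unmounted so nothing writes to it.
       # 1) Image the unlocked (decrypted) device before any recovery attempt:
       ddrescue -d -r3 /dev/mapper/md4 /mnt/disks/scratch/disk4.img /mnt/disks/scratch/disk4.map

       # 2) Run recovery tools against the image, never the live disk.
       #    photorec carves file contents by signature (names/paths are lost):
       photorec /mnt/disks/scratch/disk4.img

     Since file names and directory structure don't survive carving, restoring from a known-good backup is probably still the cleaner option.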
  3. I finally received my second SATA controller. It's the same model I had previously, SI-PEX40139, so now I have 2 of these cards installed. I have 8 hard drives total, split 4 per card. Nothing is connected via the motherboard SATA ports. I have one 1TB M.2 drive installed as cache, plus the Unraid USB.

     After getting the server booted up, I added the new drive as a 7th data drive and started the array in maintenance mode. I immediately started getting I/O buffer errors again. These started popping up without any load on the system, and I have not started formatting the new drive yet.

     I've now replaced the PSU and all SATA cables, added an additional SATA PCIe card, added cooling fans, and cleaned everything. The drive I am adding is new, and no errors were detected during the preclear.

       Jul 24 19:09:26 solidsnake emhttpd: Opening encrypted volumes...
       Jul 24 19:09:26 solidsnake kernel: Buffer I/O error on dev md7, logical block 0, async page read

     Suggestions for next steps or isolating the issue? Thank you.

     solidsnake-diagnostics-20220724-1921.zip
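     For my own notes, a quick check to see whether the error follows the slot or the physical disk. sdX is a placeholder for whatever device letter disk 7 actually has:

       # Read the raw disk behind md7 directly, bypassing the array layer:
       dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct status=progress

       # Then check the drive's own error counters:
       smartctl -A /dev/sdX | grep -i -e crc -e pending -e reallocated

     If the raw disk reads cleanly on one controller but errors on the other, that points at the card or its PCIe slot rather than the drive.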
  4. It seems I was premature. I'm not getting a CRC error, but a different error now. I had gotten full parity back with no apparent issues, so I swapped out a drive and started the data rebuild process again. Several hours in, I received the following errors; then it stabilized and continued. The rebuild is still running right now.

     I've been looking into this error using the keywords AMD-Vi and IO_PAGE_FAULT, and there are a lot of interesting results. I've seen comments around IOMMU settings, PCIe lanes, BIOS upgrades, and kernel bugs. Notably, several comments related to my hardware (Ryzen 2700X, ASMedia X470 chipset) having SATA issues with some configurations.

       Jul 16 01:00:04 solidsnake kernel: ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x407d4db010608040 flags=0x0030]
       Jul 16 01:00:35 solidsnake kernel: ata2.00: exception Emask 0x0 SAct 0x80fe7fc0 SErr 0x0 action 0x6 frozen

     Followed by:

       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/20:30:38:49:7d/02:00:f9:01:00/40 tag 6 ncq dma 278528 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/58:38:58:4b:7d/02:00:f9:01:00/40 tag 7 ncq dma 307200 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/10:40:b0:4d:7d/02:00:f9:01:00/40 tag 8 ncq dma 270336 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/10:48:c0:4f:7d/02:00:f9:01:00/40 tag 9 ncq dma 270336 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/30:50:d0:51:7d/01:00:f9:01:00/40 tag 10 ncq dma 155648 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/f8:58:00:53:7d/01:00:f9:01:00/40 tag 11 ncq dma 258048 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/08:60:f8:54:7d/01:00:f9:01:00/40 tag 12 ncq dma 135168 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/c0:68:00:56:7d/01:00:f9:01:00/40 tag 13 ncq dma 229376 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/40:70:c0:57:7d/01:00:f9:01:00/40 tag 14 ncq dma 163840 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/d8:88:00:59:7d/01:00:f9:01:00/40 tag 17 ncq dma 241664 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/a0:90:d8:5a:7d/01:00:f9:01:00/40 tag 18 ncq dma 212992 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/20:98:78:5c:7d/01:00:f9:01:00/40 tag 19 ncq dma 147456 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/e0:a0:98:5d:7d/00:00:f9:01:00/40 tag 20 ncq dma 114688 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/d8:a8:78:5e:7d/02:00:f9:01:00/40 tag 21 ncq dma 372736 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/68:b0:50:61:7d/02:00:f9:01:00/40 tag 22 ncq dma 315392 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/60:b8:b8:63:7d/02:00:f9:01:00/40 tag 23 ncq dma 311296 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2.00: failed command: READ FPDMA QUEUED
       Jul 16 01:00:35 solidsnake kernel: ata2.00: cmd 60/20:f8:18:66:7d/03:00:f9:01:00/40 tag 31 ncq dma 409600 in
       Jul 16 01:00:35 solidsnake kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
       Jul 16 01:00:35 solidsnake kernel: ata2.00: status: { DRDY }
       Jul 16 01:00:35 solidsnake kernel: ata2: hard resetting link
       Jul 16 01:00:40 solidsnake kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
       Jul 16 01:00:40 solidsnake kernel: ata2.00: configured for UDMA/133
       Jul 16 01:00:40 solidsnake kernel: ata2.00: device reported invalid CHS sector 0
       Jul 16 01:00:40 solidsnake kernel: ata2.00: device reported invalid CHS sector 0
       Jul 16 01:00:40 solidsnake kernel: ata2.00: device reported invalid CHS sector 0
       Jul 16 01:00:40 solidsnake kernel: ata2.00: device reported invalid CHS sector 0
       Jul 16 01:00:40 solidsnake kernel: ata2.00: device reported invalid CHS sector 0
       Jul 16 01:00:40 solidsnake kernel: ata2: EH complete

     My next NAS build is definitely going to be a real server. Thoughts on these errors?

     Note: I can't seem to unmark this as solved, so I can start a new thread if necessary. Thank you.

     solidsnake-diagnostics-20220716-0137.zip
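     For anyone finding this thread with the same AMD-Vi errors: one mitigation that comes up repeatedly for Ryzen boards is putting the IOMMU into passthrough mode. I haven't confirmed it fixes anything on my hardware; this is just a sketch of what the boot entry in /boot/syslinux/syslinux.cfg would look like with the parameter added:

       label Unraid OS
         menu default
         kernel /bzimage
         # iommu=pt keeps the IOMMU available for device passthrough but skips
         # DMA translation for host devices; some Ryzen/X470 users report it
         # quiets IO_PAGE_FAULT events. A BIOS update is the other usual advice.
         append iommu=pt initrd=/bzroot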
  5. The results won't be too scientific, as I made multiple changes at once. I have parity back, with the new 16TB drive. No other disk changes have been made yet. I ran SMART short self-tests on all disks, which passed. It does not appear that the CRC error rate has increased since the first instance of the problem.

     Looking at some historical logs, I noticed my UPS has been engaging for a second or two somewhat regularly (once every few days to once daily). I attribute the fluctuations to the local power service constantly adjusting loads during this summer heat. I don't have a correlating UPS log at the time of the read errors, so it's probably just an anecdote, but maybe not outside the realm of possibility that I had a power issue.

     Changes made:
     • Added several more cooling fans I had laying around and cleaned the case. The case wasn't very dirty, but hard disk temps would go ~41C+ during parity checks. Temps dropped to 36C max during the latest parity check.
     • Split 2 SATA cables from the motherboard to the PCI Syba SI-PEX40139 SATA card. Currently 4x SATA, 3x PCI; previously 6/1.
     • Matched all SATA cables by manufacturer and connector.
     • Arranged cables to reduce cable touching on long parallel SATA runs.
     • Updated PSU - HX750.

     On boot-up, a read check was executed on all data disks while parity was still disabled. After the read check succeeded, I started the parity rebuild, which was also successful. Next step is to replace one of the data drives with the old parity drive, and then add a new drive to the array.

     Finally, posting the diagnostics again, with new SMART reports, in case anything looks awry. Thanks for the help.

     solidsnake-diagnostics-20220715-1541.zip
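     If anyone wants to reproduce the temperature comparison during a parity check, a loop like this works from the console. The log path and the 5-minute interval are arbitrary choices, not anything Unraid-specific:

       # Append a timestamped temperature line for each drive every 5 minutes:
       while true; do
         for d in /dev/sd?; do
           t=$(smartctl -A "$d" | awk '/Temperature_Celsius/ {print $10}')
           echo "$(date '+%F %T') $d ${t}C"
         done >> /boot/logs/drive-temps.log
         sleep 300
       done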
  6. Ordered some stuff, should be here in a couple days. Will table this until the next update. Thank you for your response.
  7. Thank you for your response. I have two questions. 1) Should I attempt to get the array back to a known working state? That would involve removing the new drive and putting the old parity drive back in place. Once I'm sure everything is working without errors, I can try the swap again. The alternative would be to just swap the cables and reattempt using the new drive. 2) Excuse my ignorance, but if it were a single failed cable, that would make sense to me. Having multiple failures at once doesn't lead me to believe it is only cables; that's just my intuition talking. Is there any other testing I could perform to help isolate the issue? I have a large pile of used SATA cables, but I'd order a complete set of new ones if I can confirm that is the problem. edit: ended up ordering a full set of new SATA cables anyway. Cheap enough to rule it out as a factor.
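     One way to test the cable theory without guessing, sketched here with illustrative device globs: UDMA CRC errors are counted per drive and the counter only ever increases, so any counter that moves during a sustained read implicates that specific drive's cable or port.

       # Snapshot the CRC counters:
       for d in /dev/sd?; do
         smartctl -A "$d" | awk -v d="$d" '/UDMA_CRC_Error_Count/ {print d, $10}'
       done > /tmp/crc-before.txt

       # Stress every link with a long sequential read (~100GB per drive here):
       for d in /dev/sd?; do
         dd if="$d" of=/dev/null bs=1M count=100000 iflag=direct
       done

       # Snapshot again and compare; unchanged counters exonerate that link:
       for d in /dev/sd?; do
         smartctl -A "$d" | awk -v d="$d" '/UDMA_CRC_Error_Count/ {print d, $10}'
       done > /tmp/crc-after.txt
       diff /tmp/crc-before.txt /tmp/crc-after.txt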
  8. I'm working on adding 2 new 16TB disks to my array. I ran the preclear script on both new drives without issue. One drive is slated to upgrade parity from 8TB to 16TB; the other will be added as additional data.

     I installed one of the drives. After booting up, I swapped the existing parity drive and the new drive, and started the array/rebuild. The existing parity drive is in unassigned devices. I let it run and went to bed, and got up to a long string of Unraid notifications about read errors. Three existing drives are showing UDMA CRC errors in SMART, with these counts: 97, 232, 1. The parity drive shows 1024 errors in the Unraid array web interface. The parity drive disk log shows numerous errors, starting with:

       Jul 12 01:42:29 solidsnake kernel: ata9.00: exception Emask 0x10 SAct 0x400000 SErr 0x480000 action 0x6 frozen
       Jul 12 01:42:29 solidsnake kernel: ata9.00: irq_stat 0x08000000, interface fatal error
       Jul 12 01:42:29 solidsnake kernel: ata9: SError: { 10B8B Handshk }
       Jul 12 01:42:29 solidsnake kernel: ata9.00: failed command: WRITE FPDMA QUEUED
       Jul 12 01:42:29 solidsnake kernel: ata9.00: cmd 61/50:b0:a8:76:08/03:00:c0:00:00/40 tag 22 ncq dma 434176 out
       Jul 12 01:42:29 solidsnake kernel: ata9.00: status: { DRDY }
       Jul 12 01:42:29 solidsnake kernel: ata9: hard resetting link
       Jul 12 01:42:39 solidsnake kernel: ata9: softreset failed (1st FIS failed)
       Jul 12 01:42:39 solidsnake kernel: ata9: hard resetting link
       Jul 12 01:42:49 solidsnake kernel: ata9: softreset failed (1st FIS failed)
       Jul 12 01:42:49 solidsnake kernel: ata9: hard resetting link
       Jul 12 01:43:24 solidsnake kernel: ata9: softreset failed (1st FIS failed)
       Jul 12 01:43:24 solidsnake kernel: ata9: limiting SATA link speed to 3.0 Gbps
       Jul 12 01:43:24 solidsnake kernel: ata9: hard resetting link
       Jul 12 01:43:29 solidsnake kernel: ata9: softreset failed (1st FIS failed)
       Jul 12 01:43:29 solidsnake kernel: ata9: reset failed, giving up
       Jul 12 01:43:29 solidsnake kernel: ata9.00: disabled
       Jul 12 01:43:29 solidsnake kernel: ata9: EH complete
       Jul 12 01:43:29 solidsnake kernel: sd 9:0:0:0: [sdh] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
       Jul 12 01:43:29 solidsnake kernel: sd 9:0:0:0: [sdh] tag#2 CDB: opcode=0x8a 8a 00 00 00 00 00 c0 08 87 88 00 00 04 a0 00 00
       Jul 12 01:43:29 solidsnake kernel: blk_update_request: I/O error, dev sdh, sector 3221784456 op 0x1:(WRITE) flags 0x0 phys_seg 148 prio class 0

     After some initial research, the CRC errors are indicative of a physical failure, like a cable. Because several drives blew up at once, I'm wondering whether I overloaded the SATA controller. I have syslog running, and it looks like the very first error was on ata4, and then everything started blowing up from there. I'm shutting down the server for now.

     My recovery idea is to swap the existing parity drive back in place, removing the new drive, and let the data rebuild to the previous good state. However, I'm apprehensive because I suspect this is not a drive problem but a motherboard problem. I've never had an issue before today. This system has been live for about two years, with multiple drive upgrades in the past. I'd appreciate any and all advice on next steps.

     solidsnake-diagnostics-20220712-0649.zip
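     For reference while untangling this, ata port numbers can be mapped back to /dev/sdX devices from sysfs. A rough loop (the output format is mine; the sysfs layout is standard Linux):

       # Each disk's sysfs path passes through the ataN port it hangs off:
       for d in /sys/block/sd*; do
         echo "/dev/${d##*/} -> $(readlink -f "$d" | grep -o 'ata[0-9]*' | head -n1)"
       done

     That makes it easy to check whether ata4 and ata9 sit on the same controller, which would support the overloaded-controller theory over a multiple-cable failure.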
  9. Thanks. I will report back with any BIOS findings tomorrow. I included the additional syslog because I had started the array after pulling diagnostics, just to provide as complete a picture as possible. This is my first crash, but since it happened once, it's bound to happen again. I'll get syslog set up right away.
  10. Thanks. Powered it down and back up; it came right up. I started the array in maintenance mode and kicked off a parity check, which should be done by tomorrow. As soon as the server came up, and before starting the array, I pulled the diagnostics. After starting the array and parity check, I pulled the syslog. Let me know if you see anything out of order; nothing jumped out at me as problematic. I'll get a syslog server set up for Unraid. Thanks for the idea. solidsnake-diagnostics-20201005-1244-POST-UNCLEAN.zip solidsnake-syslog-20201005-1650-PARITY-CHECK-STARTED.zip
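      Unraid has a built-in option for this under Settings > Syslog Server, but for the record, the underlying mechanism is ordinary rsyslog forwarding. Something along these lines, with the collector address being an example:

        # rsyslog forwarding rule: send everything to a remote collector over
        # UDP so the tail end of the log survives a hard lockup.
        # A single @ means UDP; @@ means TCP.
        *.* @192.168.1.50:514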
  11. When I got up this morning, my Unraid server was inaccessible. It was working before I went to bed last night and has been working flawlessly for about 6 months. Plugging in an HDMI cable shows just a black screen. I plugged in two different keyboards; neither will light up the Num Lock key, so they appear not to be getting initialized. I cannot ping either of the server's 2 network interfaces, and nbtstat -A and nmap cannot find it. My Docker container and VM running on the server are also both inaccessible. It appears the Unraid server is completely locked up. Before I do anything to the server (like a hard reboot), I wanted to ask for advice on next steps. Because people have experienced so many different kinds of unresponsiveness with Unraid, solutions to my particular issue were a bit hard to search for.
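      One note for anyone in the same spot: if the kernel is still servicing interrupts (which, given the dead keyboards here, it may not be), Magic SysRq can sometimes sync the disks and reboot more safely than the power button. It has to be enabled before the hang, so this is preparation for next time rather than a fix. The sysctl line is standard Linux, not Unraid-specific:

        # Enable Magic SysRq (add to the 'go' file to persist across reboots):
        echo 1 > /proc/sys/kernel/sysrq

        # On a hung console: hold Alt+SysRq and press, with pauses between:
        #   R - take keyboard back from X    E - SIGTERM all processes
        #   I - SIGKILL all processes        S - sync all filesystems
        #   U - remount filesystems read-only
        #   B - reboot immediately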
  12. There are some forum posts on this topic. Google: "site:unraid.net linux bootable usb". Some people have had success with various methods; GParted and Syslinux both have ways of making a USB bootable. If you do figure out a good method, maybe you can update the wiki. It appears a Linux method has been requested for a long time.
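      Roughly, the manual Syslinux route looks like this. Treat it as a sketch: the mbr.bin path varies by distro, and the Unraid zip contents still need to be copied onto the stick (which must be FAT32 and, as I understand it, labeled UNRAID):

        # /dev/sdX is the flash drive; triple-check the device before running.
        parted /dev/sdX set 1 boot on    # mark partition 1 bootable
        syslinux -i /dev/sdX1            # install syslinux to the FAT partition
        # Write the 440-byte MBR boot code (path varies by distro):
        dd if=/usr/lib/syslinux/mbr/mbr.bin of=/dev/sdX bs=440 count=1 conv=notrunc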
  13. You mention that you've run Plex Media Server for a couple of years, but I'm assuming this Docker is a new install. If you are building a new media database in the Docker and have just added new libraries, then Plex is most likely performing its initial scan and database build. With very large libraries, this can take days. If you are using the same media library as your old server, you should be able to copy the database over to the new server and skip this initialization step. As Hoopster mentioned, you might be able to get further support on how to do this on the Plex forums. Alternatively, if you want a new database, start a new Plex server and don't add any libraries on startup. Disable all the content generation options and tasks, and add one library at a time. You'll still have to wait for it to finish, but you might have more control over the process. Good luck. https://forums.plex.tv/t/official-plex-media-server-docker-images-getting-started/172291
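      If it helps, the database copy is essentially a file-level move of the "Plex Media Server" directory while both servers are stopped. A hedged sketch: the appdata path depends entirely on your Docker template and the old server's install location, so both paths below are only examples:

        # Stop the container so the database files are quiescent:
        docker stop plex

        # Copy the whole metadata/database tree from the old server
        # (-s protects the remote path's embedded spaces):
        rsync -avhs --progress \
          "oldserver:/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/" \
          "/mnt/user/appdata/plex/Library/Application Support/Plex Media Server/"

        docker start plex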