mattiapsu

Everything posted by mattiapsu

  1. Quick update on this... I had another read error, so I took the server down and rebuilt it in the new case. I replaced as many cables as I had new ones for, and I'm still using the same power splitter off the Molex power cable (I'm using some shucked drives). I'm back up and running, added 2 new drives, and added 1 to the array successfully. It's been up for 2 days with no issues so far. So in the end, I had to:
     1) reload my flash drive from backup
     2) replace / reconnect SATA and power cables
     3) rebuild an array drive
     4) run a scrub on my cache SSD
     5) delete and replace my docker and libvirt images, as these were corrupted along the way.
     So far so good. Thanks for the help, @JorgeB. I'm sure I'll be back in the future.
  2. I started the rebuild and got horrible speeds, so I stopped immediately. I replaced the parity cable and reconnected disk 1 at the drive and MB. I got about 25 ATA errors in the first minute, but nothing since, and the speeds have averaged about 140MB/sec for the rebuild. Would the errors be any indication of drive integrity, or purely of the connection? I see some 'slow to respond' messages as well, but don't know if that's just my hardware, as I started this for fun and never looked for top specs. I plan on replacing more cables when I move cases and add disks in the next month (or sooner, if more issues pop up). Any other words of wisdom? Thanks for getting me this far. syslog.txt oldmain-diagnostics-20240330-0907.zip
  3. Thanks, I'll get those cables replaced, hopefully soon. Probably when I move to a new case; I don't have a lot of free time right now. Is parity at risk with those ATA errors? The disk is back up. I can read the emulated contents. It looks good from a file standpoint. There are a few files in lost+found; I'm not sure what those might be, but there are only a few, and they're not critical for me. Sorry, I don't really know what the next steps are. I found your other posts on rebuilding the drive... I assume I'll just follow those steps. oldmain-diagnostics-20240329-1516.zip oldmain-syslog-20240329-1917.zip
  4. I ran it through the GUI... I want to make sure I did it correctly. To repair, it's running Check without -n, and in my case with the -L argument? I can follow the article to run it manually if that's next.
  5. Still have the red x and unmountable message across Size-Used-Free. Disk log:
     Mar 29 14:27:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
     Mar 29 14:27:11 OldMain kernel: ata5: COMRESET failed (errno=-16)
     Mar 29 14:27:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
     Mar 29 14:27:13 OldMain kernel: ata5.00: configured for UDMA/133
     Mar 29 14:32:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
     Mar 29 14:32:11 OldMain kernel: ata5: COMRESET failed (errno=-16)
     Mar 29 14:32:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
     Mar 29 14:32:13 OldMain kernel: ata5.00: configured for UDMA/133
     Mar 29 14:35:15 OldMain emhttpd: read SMART /dev/sdf
     Mar 29 14:37:36 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
     Mar 29 14:37:40 OldMain kernel: ata5: COMRESET failed (errno=-16)
     Mar 29 14:37:41 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
     Mar 29 14:37:42 OldMain kernel: ata5.00: configured for UDMA/133
     Mar 29 14:37:47 OldMain emhttpd: WDC_WD80EMAZ-00WJTA0_7HKHX2BJ (sdf) 512 15628053168
     Mar 29 14:37:47 OldMain kernel: mdcmd (2): import 1 sdf 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7HKHX2BJ
     Mar 29 14:37:47 OldMain kernel: md: import disk1: (sdf) WDC_WD80EMAZ-00WJTA0_7HKHX2BJ size: 7814026532
     Mar 29 14:37:53 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
     Mar 29 14:37:57 OldMain kernel: ata5: COMRESET failed (errno=-16)
     Mar 29 14:37:58 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
     Mar 29 14:37:59 OldMain kernel: ata5.00: configured for UDMA/133
     Mar 29 14:37:59 OldMain emhttpd: read SMART /dev/sdf
     Mar 29 14:40:49 OldMain emhttpd: shcmd (1772): echo 128 > /sys/block/sdf/queue/nr_requests
     Mar 29 14:49:20 OldMain emhttpd: read SMART /dev/sdf
     Mar 29 14:49:22 OldMain emhttpd: WDC_WD80EMAZ-00WJTA0_7HKHX2BJ (sdf) 512 15628053168
     Mar 29 14:49:22 OldMain kernel: mdcmd (2): import 1 sdf 64 7814026532 0 WDC_WD80EMAZ-00WJTA0_7HKHX2BJ
     Mar 29 14:49:22 OldMain kernel: md: import disk1: (sdf) WDC_WD80EMAZ-00WJTA0_7HKHX2BJ size: 7814026532
     Mar 29 14:49:22 OldMain emhttpd: read SMART /dev/sdf
     Mar 29 14:49:53 OldMain emhttpd: shcmd (1848): echo 128 > /sys/block/sdf/queue/nr_requests
     Mar 29 14:54:23 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)
     Mar 29 14:54:27 OldMain kernel: ata5: COMRESET failed (errno=-16)
     Mar 29 14:54:28 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
     Mar 29 14:54:29 OldMain kernel: ata5.00: configured for UDMA/133
     oldmain-diagnostics-20240329-1450.zip oldmain-syslog-20240329-1851.zip
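For anyone wanting to see at a glance which port keeps resetting, here is a minimal Python sketch that tallies the link-level messages in a disk log like the one above. The function name `count_link_events` and the event labels are made up for illustration, and the regular expression only assumes the `kernel: ataN: message` shape seen in this log; it is not an Unraid tool.

```python
import re
from collections import Counter

# Matches kernel libata lines of the shape seen in the log above, e.g.
#   "Mar 29 14:27:11 OldMain kernel: ata5: COMRESET failed (errno=-16)"
# The optional ".00" suffix covers device lines such as "ata5.00: configured ...".
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")


def count_link_events(lines):
    """Count COMRESET failures and slow-link warnings per ATA port.

    The event labels ("comreset_failed", "slow_link") are hypothetical names
    chosen for this sketch.
    """
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue  # not a libata line (e.g. emhttpd or md messages)
        port, message = match.groups()
        if "COMRESET failed" in message:
            counts[(port, "comreset_failed")] += 1
        elif "link is slow to respond" in message:
            counts[(port, "slow_link")] += 1
    return counts


# Four lines copied from the disk log above.
sample = [
    "Mar 29 14:27:07 OldMain kernel: ata5: link is slow to respond, please be patient (ready=0)",
    "Mar 29 14:27:11 OldMain kernel: ata5: COMRESET failed (errno=-16)",
    "Mar 29 14:27:12 OldMain kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
    "Mar 29 14:27:13 OldMain kernel: ata5.00: configured for UDMA/133",
]

for (port, event), n in sorted(count_link_events(sample).items()):
    print(port, event, n)
```

Run against a full syslog (one line per list entry), a count concentrated on a single port points at that port's cable, connector, or drive rather than at the controller as a whole. That is only a rule of thumb, not a diagnosis.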
  6. Ran the check, without -n, then with -L as instructed. Output is below. It appears that it completed successfully. However, when I start the array (non-maintenance), the drive is still showing unmountable. Again, much appreciated for the support.
     Phase 1 - find and verify superblock...
     Phase 2 - using internal log
             - zero log...
     ALERT: The filesystem has valuable metadata changes in a log which is being
     destroyed because the -L option was used.
             - scan filesystem freespace and inode maps...
     clearing needsrepair flag and regenerating metadata
             - found root inode chunk
     Phase 3 - for each AG...
             - scan and clear agi unlinked lists...
             - process known inodes and perform inode discovery...
             - agno = 0
             - agno = 1
             - agno = 2
             - agno = 3
             - agno = 4
             - agno = 5
             - agno = 6
             - agno = 7
             - process newly discovered inodes...
     Phase 4 - check for duplicate blocks...
             - setting up duplicate extent list...
             - check for inodes claiming duplicate blocks...
             - agno = 0
             - agno = 1
             - agno = 5
             - agno = 2
             - agno = 4
             - agno = 6
             - agno = 7
             - agno = 3
     Phase 5 - rebuild AG headers and trees...
             - reset superblock...
     Phase 6 - check inode connectivity...
             - resetting contents of realtime bitmap and summary inodes
             - traversing filesystem ...
             - traversal finished ...
             - moving disconnected inodes to lost+found ...
     Phase 7 - verify and correct link counts...
     Maximum metadata LSN (1:976460) is ahead of log (1:2).
     Format log to cycle 4.
     done
  7. And the SMART report on the drive. oldmain-smart-20240329-1314.zip
  8. Looks like it's not mounting... unmountable / unsupported or no file system oldmain-diagnostics-20240329-1307.zip oldmain-syslog-20240329-1708.zip
  9. Back up with a flash restore to the same USB. Disk 1 emulated. Logs and diagnostics attached. I did not attempt to start the array. syslog.txt oldmain-diagnostics-20240329-1245.zip
  10. This topic is on the forum a few times, and most threads end with a new USB. However, my problems started last night and have cascaded to this. I wanted to get help before I start with the USB. Logs are attached, and a description of events is below.
      1) Last night, a few of my docker containers were not working. I thought it was due to a bad update; I was able to roll a couple back and they were working.
      2) This morning at 7:42am, something happened and the server became unresponsive (in the logs).
      3) Performed a bad shutdown (power button hold).
      4) I would normally start back up and see what I've got, but in prep for a new case and additional drives I opened up the case to take inventory, took some pics, closed it back up and started up.
      5) Upon restart, Disk 1 had UDMA CRC errors and went offline. As this can be a connection issue, I did a clean shutdown and made sure all the connections were good.
      6) Put it back together and started up, and now I'm getting a kernel panic: not syncing, VFS, unable to mount root fs.
      Note: I know I have an error on my cache drive, which I have not reformatted. I had to rebuild a VM when it first errored, but it has been fine since I deleted the corrupted files. I haven't taken the time to move everything off and reformat to hopefully correct it.
      Any help is appreciated as I'm totally down now. unraid-2.log
  11. I finally took the time to pull the box down and mess with power cables.
      1) Plugged a new power cable into the drive and started up... it showed the drive as missing; opened back up.
      2) Plugged the old power cable back into the drive and started up... it showed up with zero errors and is running a parity check happily.
      I'll monitor, but does this sound like:
      1) a random occurrence, slightly bad cable connection
      2) a power cable or PSU issue
      3) a drive issue?
      Appreciate your experience here. I don't have a spare PSU or drives around, so depending on your thoughts I may get some backups. Thanks
  12. I swapped SATA cables around (a 3-way swap between 3 HDDs). The ATA error stuck with the parity drive but moved from ata1 to ata2. I didn't mess with any power cables at this point. oldmain-syslog-20220524-0241.zip
  13. A couple of days ago I checked my cables and made sure everything was well connected. I ran a parity check after I closed back up; it started slow but returned to normal parity check speed. However, it noted over 50,000 errors once finished and said parity was valid. The errors are in the attached syslog. I started a check today to see how it would react; it started slow and picked up parity errors immediately, so I canceled it without seeing if it sped up. Similar errors are being thrown today as a couple of days ago. Could it be a power supply issue? I'm using a 400W supply (started with modest ambitions, but I've added more hardware). I can change out data cables as well to check those, but it may be very apparent to you that the PSU is undersized.
      Quick snapshot of hardware:
      - 2 sticks RAM
      - 1080 Ti GPU used in VM CCTV only
      - USB 3.0 PCIe card
      - SATA PCIe expansion card
      - 3 HDD (2 data, 1 parity)
      - 2 SSD (1 appdata cache, 1 cache pool)
      - 2 USB HDD (old laptop drives) - 1 as unassigned drive, 1 cache pool
      oldmain-syslog-20220515-1852.zip
  14. Thanks, will take me a couple of days to get to it. But I'll report back.
  15. I did confirm that I installed the new hardware and then ran a parity check that ran normally with no errors found. The only thing I did after the hardware/parity check and before this issue was remove some docker containers (photoprism, mariadb), delete their appdata folder, and then reinstall them. I wanted fresh containers to start over. Thanks. oldmain-diagnostics-20220508-1512.zip
  16. The parity check started last evening and was only running at 1MB/s. I checked to see if anything was going on with the array, docker, or VMs, but found nothing I could see there. I rebooted and attached two log files: one just after startup, one catching errors after the parity check started. Reading around, I see causes for these errors ranging from something as simple as a bad SATA cable, to a power issue, or worse. Looking at the disk logs, I can see that the error occurs on my parity disk. One other thing to add: I recently added a SATA PCI card and a second cache drive (running 1 cache drive for appdata, and a cache pool of 2 drives for downloads). That was working fine for about 2 weeks from what I could tell, and I'm pretty sure I ran a successful parity check since then. Before I start opening the box and switching cables, I wanted to get some expert advice. Thanks in advance. oldmain-syslog-20220507-1416.zip oldmain-syslog-20220507-1148.zip
  17. I've been attempting to install WordPress, and the install.php file just downloads when I attempt to open it. I can open the readme.html file, but any .php or "non-existing file" results in a download. "Non-existing file" just refers to any address or file that doesn't exist in the www folder. Everything I've tried makes me think PHP is not running correctly, but I have no clue about that. I've found various "fixes" across the web, but nothing has worked. Thanks in advance.
      1) I own the domain, call it jm.us (this is not my default domain in the container, which is mfam.com)
      2) I've listed *.jm.us as EXTRA DOMAIN in the container
      3) I have a jmus.conf file with server info pointing to a subfolder in www
      4) I can reach a standard html file by reverse proxy no problem (mfam.com to index.html works as well)
      5) I untarred the WordPress archive on my computer and uploaded it over SSH to the www subdirectory
      6) Attempting to reach install.php results in a download of the file
      7) Also, I have tried to update wp-config.php manually with the db info, but still nothing
      Note: some of the log and conf has been modified for my illusion of privacy

      SWAG log
      [cont-init.d] 10-adduser: exited 0.
      [cont-init.d] 20-config: executing...
      [cont-init.d] 20-config: exited 0.
      [cont-init.d] 30-keygen: executing...
      using keys found in /config/keys
      [cont-init.d] 30-keygen: exited 0.
      [cont-init.d] 50-config: executing...
      Variables set:
      PUID=99
      PGID=100
      TZ=America/New_York
      URL=mfam.com
      SUBDOMAINS=wildcard
      EXTRA_DOMAINS=*.jm.us
      ONLY_SUBDOMAINS=true
      VALIDATION=dns
      CERTPROVIDER=
      DNSPLUGIN=cloudflare
      [email protected]
      STAGING=false
      Using Let's Encrypt as the cert provider
      SUBDOMAINS entered, processing
      Wildcard cert for only the subdomains of mfam.com will be requested
      EXTRA_DOMAINS entered, processing
      Extra domains processed are: -d *.jm.us
      E-mail address entered: [email protected]
      dns validation via cloudflare plugin is selected
      Certificate exists; parameters unchanged; starting nginx
      Starting 2019/12/30, GeoIP2 databases require personal license key to download. Please retrieve a free license key from MaxMind, and add a new env variable "MAXMINDDB_LICENSE_KEY", set to your license key.
      [cont-init.d] 50-config: exited 0.
      [cont-init.d] 60-renew: executing...
      The cert does not expire within the next day. Letting the cron script handle the renewal attempts overnight (2:08am).
      [cont-init.d] 60-renew: exited 0.
      [cont-init.d] 70-templates: executing...
      **** The following nginx confs have different version dates than the defaults that are shipped. ****
      **** This may be due to user customization or an update to the defaults. ****
      **** To update them to the latest defaults shipped within the image, delete these files and restart the container. ****
      **** If they are user customized, check the date version at the top and compare to the upstream changelog via the link. ****
      /config/nginx/ssl.conf
      /config/nginx/site-confs/default
      /config/nginx/proxy.conf
      /config/nginx/nginx.conf
      /config/nginx/authelia-server.conf
      /config/nginx/authelia-location.conf
      **** The following reverse proxy confs have different version dates than the samples that are shipped. ****
      **** This may be due to user customization or an update to the samples. ****
      **** You should compare them to the samples in the same folder to make sure you have the latest updates. ****
      /config/nginx/proxy-confs/****.subdomain.conf
      /config/nginx/proxy-confs/****.subdomain.conf
      /config/nginx/proxy-confs/****.subdomain.conf
      [cont-init.d] 70-templates: exited 0.
      [cont-init.d] 90-custom-folders: executing...
      [cont-init.d] 90-custom-folders: exited 0.
      [cont-init.d] 99-custom-files: executing...
      [custom-init] no custom files found exiting...
      [cont-init.d] 99-custom-files: exited 0.
      [cont-init.d] done.
      [services.d] starting services
      [services.d] done.
      Server ready

      default
      # redirect all traffic to https
      server {
          listen 80;
          listen [::]:80;
          server_name mfam.com;
          return 301 https://$host$request_uri;
      }

      # main server block
      server {
          listen 443 ssl http2;
          listen [::]:443 ssl http2;

          root /config/www;
          index index.html index.htm index.php;

          server_name mfam.com;

          # enable subfolder method reverse proxy confs
          include /config/nginx/proxy-confs/*.subfolder.conf;

          # all ssl related config moved to ssl.conf
          include /config/nginx/ssl.conf;

          # enable for ldap auth
          #include /config/nginx/ldap.conf;

          # enable for Authelia
          #include /config/nginx/authelia-server.conf;

          # enable for geo blocking
          # See /config/nginx/geoip2.conf for more information.
          #if ($allowed_country = no) {
          #return 444;
          #}

          client_max_body_size 0;

          location / {
              try_files $uri $uri/ /index.html /index.php?$args =404;
          }

          location ~ \.php$ {
              fastcgi_split_path_info ^(.+\.php)(/.+)$;
              fastcgi_pass 127.0.0.1:9000;
              fastcgi_index index.php;
              include /etc/nginx/fastcgi_params;
          }

          fastcgi_buffer_size 4K; #from a fix online
          fastcgi_buffers 64 4k; #from a fix online
      }

      # enable subdomain method reverse proxy confs
      include /config/nginx/proxy-confs/*.subdomain.conf;
      # enable proxy cache for auth
      proxy_cache_path cache/ keys_zone=auth_cache:10m;

      jmus.conf
      # redirect all traffic to https
      server {
          listen 80;
          listen [::]:80;
          server_name *.jm.us;
          return 301 https://$host$request_uri;
      }

      # main server block
      server {
          listen 443 ssl http2;
          listen [::]:443 ssl http2;

          root /config/www/wp_jmus/;
          index index.html index.htm index.php;

          server_name *.jm.us;

          # all ssl related config moved to ssl.conf
          include /config/nginx/ssl.conf;
      }
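One thing that stands out when comparing the two confs above: the default server block has a `location ~ \.php$` section that hands requests to FastCGI, while the jmus.conf server block has none. When no location passes .php requests to a PHP handler, nginx typically serves the file as a static download, which would match the symptom. That reading is an assumption and has not been confirmed in this thread. The Python sketch below checks a conf for this; the helper names (`strip_comments`, `server_blocks`, `handles_php`) are hypothetical, the parsing is deliberately naive (simple brace matching, no handling of `#` inside quoted strings), and the sample is a trimmed-down version of the two main server blocks.

```python
import re


def strip_comments(conf):
    # Drop everything from '#' to end of line (naive: ignores '#' inside quotes).
    return re.sub(r"#[^\n]*", "", conf)


def server_blocks(conf):
    """Yield the body of each 'server { ... }' block using simple brace matching."""
    text = strip_comments(conf)
    for match in re.finditer(r"\bserver\s*\{", text):
        depth, start = 1, match.end()
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    yield text[start:i]
                    break


def handles_php(block):
    # True when the block has a regex location for .php and a fastcgi_pass directive.
    has_location = re.search(r"location\s+~\s*\\\.php\$", block) is not None
    return has_location and "fastcgi_pass" in block


# Trimmed-down versions of the two main server blocks posted above.
sample = r"""
server {
    listen 443 ssl http2;
    server_name mfam.com;
    root /config/www;
    location ~ \.php$ {
        fastcgi_pass 127.0.0.1:9000;
        include /etc/nginx/fastcgi_params;
    }
}
server {
    listen 443 ssl http2;
    server_name *.jm.us;
    root /config/www/wp_jmus/;
}
"""

for block in server_blocks(sample):
    name = re.search(r"server_name\s+([^;]+);", block)
    print(name.group(1), "php handler:", handles_php(block))
```

Blocks that only redirect (the port 80 ones) will also report no PHP handler, which is expected since they never serve files. If the check flags the block that serves WordPress, adding a PHP location equivalent to the one in the default conf is the usual next thing to try.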