
SP67

Posts posted by SP67

  1. Yeah, but the failing drive was part of a cache pool (I have one SSD for app data and one HDD for torrent downloads). So, AFAIK, parity would not have worked in this case.

     

    Copying the data from the old drive was just to avoid having to re-download what hadn’t yet been moved to the array. If this is not the proper way to do it, please correct me, as I’m still learning.

  2. OK, so I shut the server down, added a 4 TB disk, moved the contents of the failing drive to the new one and added the new disk to the cache pool. Then I shut the server down again and removed the old drive.
     

    So far so good, everything is going well.

     

    Thanks!

  3. Reported uncorrect has grown from 4 to 10 in less than 24 hours. The drive is probably on its last legs...

     

    For what it's worth, I've found that this drive is from Seagate's 7200.14 series, which had early-death problems that were supposedly fixed by a later firmware. I never saw this update, so the drive has been running the factory firmware since I bought it.

  4. Hi, 

     

    This morning the server reported some SMART errors for a 2 TB drive I use as a torrent download cache before moving data to the array (I've read that this reduces wear on the array).

     

    The errors are:

     

    ID#  Attribute               Flag    Value  Worst  Thresh  Type     Updated  Failed  Raw
    187  Reported uncorrect      0x0032  096    096    000     Old age  Always   Never   4
    197  Current pending sector  0x0012  100    100    000     Old age  Always   Never   8
    198  Offline uncorrectable   0x0010  100    100    000     Old age  Offline  Never   8

     

    I've read online that I might be able to ignore the errors as the drive will just stop using those sectors, but there didn't seem to be much consensus about it.
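
    For anyone who wants to watch these counters over time, here is a minimal Python sketch that flags the attributes quoted above when their raw value is non-zero. The set of watched IDs and the function name are my own assumptions, not anything defined by Unraid or smartctl, and a non-zero value is only a hint, not a verdict on the drive.

```python
# Attribute IDs often watched for media trouble (an assumption of this sketch).
WATCHED_IDS = {5, 187, 197, 198}

def flag_attributes(rows):
    """Return {id: raw_value} for watched attributes with a non-zero raw value.

    Each row is a whitespace-separated line whose first field is the
    attribute ID and whose last field is the raw value.
    """
    flagged = {}
    for row in rows:
        fields = row.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip headers and blank lines
        attr_id, raw = int(fields[0]), fields[-1]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            flagged[attr_id] = int(raw)
    return flagged

# The three lines reported in this post.
rows = [
    "187 Reported uncorrect 0x0032 096 096 000 Old age Always Never 4",
    "197 Current pending sector 0x0012 100 100 000 Old age Always Never 8",
    "198 Offline uncorrectable 0x0010 100 100 000 Old age Offline Never 8",
]
print(flag_attributes(rows))  # {187: 4, 197: 8, 198: 8}
```

    Running it again after a day or two and comparing the numbers shows whether the counts are stable or still climbing.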

     

    Any suggestions?

    Thanks

     

    Full SMART report:

     

    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-Unraid] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Seagate Barracuda 7200.14 (AF)
    Device Model:     ST2000DM001-9YN164
    Serial Number:    
    LU WWN Device Id: 
    Firmware Version: CC4B
    User Capacity:    2,000,398,934,016 bytes [2.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    7200 rpm
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS T13/1699-D revision 4
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Thu Mar 10 12:32:32 2022 CET
    
    ==> WARNING: A firmware update for this drive may be available,
    see the following Seagate web pages:
    http://knowledge.seagate.com/articles/en_US/FAQ/207931en
    http://knowledge.seagate.com/articles/en_US/FAQ/223651en
    
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    AAM feature is:   Unavailable
    APM level is:     128 (minimum power consumption without standby)
    Rd look-ahead is: Enabled
    Write cache is:   Enabled
    DSN feature is:   Unavailable
    ATA Security is:  Disabled, frozen [SEC2]
    Wt Cache Reorder: Unavailable
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00)	Offline data collection activity
    					was never started.
    					Auto Offline Data Collection: Disabled.
    Self-test execution status:      ( 121)	The previous self-test completed having
    					the read element of the test failed.
    Total time to complete Offline 
    data collection: 		(  592) seconds.
    Offline data collection
    capabilities: 			 (0x73) SMART execute Offline immediate.
    					Auto Offline data collection on/off support.
    					Suspend Offline collection upon new
    					command.
    					No Offline surface scan supported.
    					Self-test supported.
    					Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0003)	Saves SMART data before entering
    					power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine 
    recommended polling time: 	 (   1) minutes.
    Extended self-test routine
    recommended polling time: 	 ( 247) minutes.
    Conveyance self-test routine
    recommended polling time: 	 (   2) minutes.
    SCT capabilities: 	       (0x3085)	SCT Status supported.
    
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate     POSR--   118   099   006    -    170591376
      3 Spin_Up_Time            PO----   093   092   000    -    0
      4 Start_Stop_Count        -O--CK   097   097   020    -    3850
      5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
      7 Seek_Error_Rate         POSR--   076   060   030    -    45291769
      9 Power_On_Hours          -O--CK   091   091   000    -    8254
     10 Spin_Retry_Count        PO--C-   100   100   097    -    0
     12 Power_Cycle_Count       -O--CK   097   097   020    -    3531
    183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
    184 End-to-End_Error        -O--CK   100   100   099    -    0
    187 Reported_Uncorrect      -O--CK   096   096   000    -    4
    188 Command_Timeout         -O--CK   100   099   000    -    1 3 3
    189 High_Fly_Writes         -O-RCK   096   096   000    -    4
    190 Airflow_Temperature_Cel -O---K   062   051   045    -    38 (Min/Max 21/45 #1)
    191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
    192 Power-Off_Retract_Count -O--CK   100   100   000    -    191
    193 Load_Cycle_Count        -O--CK   001   001   000    -    242985
    194 Temperature_Celsius     -O---K   038   049   000    -    38 (128 0 0 0 0)
    197 Current_Pending_Sector  -O--C-   100   100   000    -    8
    198 Offline_Uncorrectable   ----C-   100   100   000    -    8
    199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
    240 Head_Flying_Hours       ------   100   253   000    -    4183h+31m+57.974s
    241 Total_LBAs_Written      ------   100   253   000    -    183471166860639
    242 Total_LBAs_Read         ------   100   253   000    -    81944486528248
                                ||||||_ K auto-keep
                                |||||__ C event count
                                ||||___ R error rate
                                |||____ S speed/performance
                                ||_____ O updated online
                                |______ P prefailure warning
    
    General Purpose Log Directory Version 1
    SMART           Log Directory Version 1 [multi-sector log support]
    Address    Access  R/W   Size  Description
    0x00       GPL,SL  R/O      1  Log Directory
    0x01           SL  R/O      1  Summary SMART error log
    0x02           SL  R/O      5  Comprehensive SMART error log
    0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
    0x06           SL  R/O      1  SMART self-test log
    0x07       GPL     R/O      1  Extended self-test log
    0x09           SL  R/W      1  Selective self-test log
    0x10       GPL     R/O      1  NCQ Command Error log
    0x11       GPL     R/O      1  SATA Phy Event Counters log
    0x21       GPL     R/O      1  Write stream error log
    0x22       GPL     R/O      1  Read stream error log
    0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
    0xa1       GPL,SL  VS      20  Device vendor specific log
    0xa2       GPL     VS    4496  Device vendor specific log
    0xa8       GPL,SL  VS      20  Device vendor specific log
    0xa9       GPL,SL  VS       1  Device vendor specific log
    0xab       GPL     VS       1  Device vendor specific log
    0xb0       GPL     VS    5067  Device vendor specific log
    0xbd       GPL     VS     512  Device vendor specific log
    0xbe-0xbf  GPL     VS   65535  Device vendor specific log
    0xc0       GPL,SL  VS       1  Device vendor specific log
    0xe0       GPL,SL  R/W      1  SCT Command/Status
    0xe1       GPL,SL  R/W      1  SCT Data Transfer
    
    SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
    Device Error Count: 4
    	CR     = Command Register
    	FEATR  = Features Register
    	COUNT  = Count (was: Sector Count) Register
    	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    	LH     = LBA High (was: Cylinder High) Register    ]   LBA
    	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    	LL     = LBA Low (was: Sector Number) Register     ]
    	DV     = Device (was: Device/Head) Register
    	DC     = Device Control Register
    	ER     = Error register
    	ST     = Status register
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.
    
    Error 4 [3] occurred at disk power-on lifetime: 8245 hours (343 days + 13 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER -- ST COUNT  LBA_48  LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      40 -- 51 00 00 00 00 2e 36 f7 38 00 00  Error: WP at LBA = 0x2e36f738 = 775354168
    
      Commands leading to the command that caused the error were:
      CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
      61 00 00 00 08 00 00 3c a1 01 60 40 00  1d+05:44:31.301  WRITE FPDMA QUEUED
      61 00 00 00 08 00 00 6f 00 d0 68 40 00  1d+05:44:31.083  WRITE FPDMA QUEUED
      61 00 00 05 20 00 00 6e f4 25 f8 40 00  1d+05:44:31.081  WRITE FPDMA QUEUED
      61 00 00 00 08 00 00 3c a1 01 58 40 00  1d+05:44:31.081  WRITE FPDMA QUEUED
      61 00 00 00 48 00 00 3c 93 15 10 40 00  1d+05:44:31.081  WRITE FPDMA QUEUED
    
    Error 3 [2] occurred at disk power-on lifetime: 8245 hours (343 days + 13 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER -- ST COUNT  LBA_48  LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      40 -- 51 00 00 00 00 2e 36 f7 38 00 00  Error: WP at LBA = 0x2e36f738 = 775354168
    
      Commands leading to the command that caused the error were:
      CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
      61 00 00 08 40 00 00 3f 2a 19 80 40 00  1d+05:44:28.089  WRITE FPDMA QUEUED
      61 00 00 00 08 00 00 3f 40 d9 a8 40 00  1d+05:44:28.089  WRITE FPDMA QUEUED
      61 00 00 04 60 00 00 3c 93 0b 38 40 00  1d+05:44:28.088  WRITE FPDMA QUEUED
      61 00 00 00 08 00 00 3c a1 01 50 40 00  1d+05:44:28.088  WRITE FPDMA QUEUED
      60 00 00 00 08 00 00 2e 36 f7 38 40 00  1d+05:44:28.085  READ FPDMA QUEUED
    
    Error 2 [1] occurred at disk power-on lifetime: 8245 hours (343 days + 13 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER -- ST COUNT  LBA_48  LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      40 -- 51 00 00 00 00 2e 36 f7 38 00 00  Error: WP at LBA = 0x2e36f738 = 775354168
    
      Commands leading to the command that caused the error were:
      CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
      61 00 00 04 c0 00 00 16 e6 3c 48 40 00  1d+05:44:25.346  WRITE FPDMA QUEUED
      61 00 00 00 08 00 00 17 1d a5 78 40 00  1d+05:44:25.346  WRITE FPDMA QUEUED
      61 00 00 04 00 00 00 6e f4 1a f8 40 00  1d+05:44:25.346  WRITE FPDMA QUEUED
      61 00 00 00 08 00 00 6f 00 d0 58 40 00  1d+05:44:25.346  WRITE FPDMA QUEUED
      60 00 00 00 08 00 00 2e 36 f7 38 40 00  1d+05:44:25.119  READ FPDMA QUEUED
    
    Error 1 [0] occurred at disk power-on lifetime: 8245 hours (343 days + 13 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER -- ST COUNT  LBA_48  LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      40 -- 51 00 00 00 00 2e 36 f7 38 00 00  Error: UNC at LBA = 0x2e36f738 = 775354168
    
      Commands leading to the command that caused the error were:
      CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
      60 00 00 08 78 00 00 2e 37 1f 30 40 00  1d+05:44:20.856  READ FPDMA QUEUED
      60 00 00 0a 00 00 00 2e 37 15 30 40 00  1d+05:44:20.855  READ FPDMA QUEUED
      60 00 00 00 08 00 00 2e 3f 31 20 40 00  1d+05:44:20.855  READ FPDMA QUEUED
      60 00 00 03 80 00 00 2e 37 11 a8 40 00  1d+05:44:20.855  READ FPDMA QUEUED
      60 00 00 0a 00 00 00 2e 37 07 a8 40 00  1d+05:44:20.855  READ FPDMA QUEUED
    
    SMART Extended Self-test Log Version: 1 (1 sectors)
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: read failure       90%      8249         775354168
    # 2  Short offline       Completed without error       00%      1536         -
    # 3  Short offline       Completed without error       00%       641         -
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    
    SCT Status Version:                  3
    SCT Version (vendor specific):       522 (0x020a)
    Device State:                        Active (0)
    Current Temperature:                    37 Celsius
    Power Cycle Min/Max Temperature:     21/45 Celsius
    Lifetime    Min/Max Temperature:      5/49 Celsius
    Under/Over Temperature Limit Count:   0/0
    
    SCT Data Table command not supported
    
    SCT Error Recovery Control command not supported
    
    Device Statistics (GP/SMART Log 0x04) not supported
    
    Pending Defects log (GP Log 0x0c) not supported
    
    SATA Phy Event Counters (GP Log 0x11)
    ID      Size     Value  Description
    0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
    0x0001  2            0  Command failed due to ICRC error
    0x0003  2            0  R_ERR response for device-to-host data FIS
    0x0004  2            0  R_ERR response for host-to-device data FIS
    0x0006  2            0  R_ERR response for device-to-host non-data FIS
    0x0007  2            0  R_ERR response for host-to-device non-data FIS
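
    A side note on reading the report: every logged error and the failed extended self-test point at the same LBA, and the pending count of 8 equals the number of 512-byte logical sectors in one 4096-byte physical sector. The small Python sketch below does that arithmetic using only the sector sizes and LBA printed above; the function name is mine. It would be consistent with a single bad physical sector, although the report alone cannot prove that.

```python
LOGICAL_SECTOR = 512    # bytes, from "Sector Sizes" in the report
PHYSICAL_SECTOR = 4096  # bytes, from "Sector Sizes" in the report

def locate(lba):
    """Return (byte_offset, physical_sector_index, offset_within_physical_sector)."""
    per_physical = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8 logical sectors per physical one
    return lba * LOGICAL_SECTOR, lba // per_physical, lba % per_physical

lba = int("2e36f738", 16)  # LBA from the error log
offset, phys, within = locate(lba)
print(lba)     # 775354168, matching the decimal value in the log
print(offset)  # 396981334016 bytes from the start of the disk
print(phys)    # 96919271
print(within)  # 0, so the LBA sits at the start of a physical sector
```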

     

  5. OK, so I think my USB drive just died. Here is what I did, following this guide: https://wiki.unraid.net/Console#To_cleanly_Stop_the_array_from_the_command_line

     

    Stopped mover

    Stopped docker

    Stopped Samba

    umount /dev/md1

    umount /dev/md2 (only 2 disks + parity in the array)

    /root/mdcmd stop

    reboot
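
    For reference, one way to double-check the unmount steps before stopping the array is to look for the md devices in the kernel's mount table. This is only a sketch in Python: the function name and the sample table are made up for illustration, and it is not part of the wiki guide.

```python
def still_mounted(mounts_text, devices):
    """Return the entries of `devices` that appear as a source in a /proc/mounts-style table."""
    mounted = {line.split()[0] for line in mounts_text.splitlines() if line.strip()}
    return [dev for dev in devices if dev in mounted]

# Hypothetical sample in /proc/mounts format:
# device, mount point, filesystem type, options, dump, pass.
sample = """\
/dev/md1 /mnt/disk1 xfs rw,noatime 0 0
tmpfs /run tmpfs rw 0 0
"""
print(still_mounted(sample, ["/dev/md1", "/dev/md2"]))  # ['/dev/md1']
```

    On a live system the first argument would be the contents of /proc/mounts; an empty result would mean both devices are unmounted.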

     

    Now the server refuses to boot. I do have a backup of the USB drive made with the My Servers plugin, but I’m not sure how to proceed.

     

     

    [Attached image: EF519811-23D2-4327-8EB3-02E5E90B60F4.jpeg]

  6. Hi, 

     

    For the past few days, I've been getting very slow performance (close to unusable). Access via SMB (over VPN) has become really slow (down from a couple of MB/s to 50-100 kB/s), using the web interface is quite painful, etc.

     

    I've checked the system log and found the following error repeated: TCP: out of memory -- consider tuning tcp_mem.

     

    After searching online I have not found anything helpful. I'm attaching some more lines of the log in case they are of use.

     

    Thanks!

     

    Feb 20 16:48:37 NAS kernel: docker0: port 11(veth7126499) entered disabled state
    Feb 20 16:48:37 NAS kernel: veth31a3898: renamed from eth0
    Feb 20 16:48:37 NAS avahi-daemon[1742]: Interface veth7126499.IPv6 no longer relevant for mDNS.
    Feb 20 16:48:37 NAS avahi-daemon[1742]: Leaving mDNS multicast group on interface veth7126499.IPv6 with address fe80::5036:b8ff:fe52:19d.
    Feb 20 16:48:37 NAS kernel: docker0: port 11(veth7126499) entered disabled state
    Feb 20 16:48:37 NAS kernel: device veth7126499 left promiscuous mode
    Feb 20 16:48:37 NAS kernel: docker0: port 11(veth7126499) entered disabled state
    Feb 20 16:48:37 NAS avahi-daemon[1742]: Withdrawing address record for fe80::5036:b8ff:fe52:19d on veth7126499.
    Feb 20 16:50:23 NAS kernel: br-926959b88699: port 2(veth1cd4cd0) entered disabled state
    Feb 20 16:50:23 NAS kernel: veth75ec41e: renamed from eth0
    Feb 20 16:50:24 NAS avahi-daemon[1742]: Interface veth1cd4cd0.IPv6 no longer relevant for mDNS.
    Feb 20 16:50:24 NAS avahi-daemon[1742]: Leaving mDNS multicast group on interface veth1cd4cd0.IPv6 with address fe80::4470:62ff:fe0c:e210.
    Feb 20 16:50:24 NAS kernel: br-926959b88699: port 2(veth1cd4cd0) entered disabled state
    Feb 20 16:50:24 NAS kernel: device veth1cd4cd0 left promiscuous mode
    Feb 20 16:50:24 NAS kernel: br-926959b88699: port 2(veth1cd4cd0) entered disabled state
    Feb 20 16:50:24 NAS avahi-daemon[1742]: Withdrawing address record for fe80::4470:62ff:fe0c:e210 on veth1cd4cd0.
    Feb 20 16:53:46 NAS kernel: br-926959b88699: port 1(veth9105c65) entered disabled state
    Feb 20 16:53:46 NAS kernel: veth945447c: renamed from eth0
    Feb 20 16:53:46 NAS avahi-daemon[1742]: Interface veth9105c65.IPv6 no longer relevant for mDNS.
    Feb 20 16:53:46 NAS avahi-daemon[1742]: Leaving mDNS multicast group on interface veth9105c65.IPv6 with address fe80::64e4:56ff:fe1f:baa6.
    Feb 20 16:53:46 NAS kernel: br-926959b88699: port 1(veth9105c65) entered disabled state
    Feb 20 16:53:46 NAS kernel: device veth9105c65 left promiscuous mode
    Feb 20 16:53:46 NAS kernel: br-926959b88699: port 1(veth9105c65) entered disabled state
    Feb 20 16:53:46 NAS avahi-daemon[1742]: Withdrawing address record for fe80::64e4:56ff:fe1f:baa6 on veth9105c65.
    Feb 20 16:54:14 NAS kernel: docker0: port 12(veth33f6111) entered disabled state
    Feb 20 16:54:14 NAS kernel: veth5922185: renamed from eth0
    Feb 20 16:54:14 NAS avahi-daemon[1742]: Interface veth33f6111.IPv6 no longer relevant for mDNS.
    Feb 20 16:54:14 NAS avahi-daemon[1742]: Leaving mDNS multicast group on interface veth33f6111.IPv6 with address fe80::40ba:fbff:fef8:f50d.
    Feb 20 16:54:14 NAS kernel: docker0: port 12(veth33f6111) entered disabled state
    Feb 20 16:54:14 NAS kernel: device veth33f6111 left promiscuous mode
    Feb 20 16:54:14 NAS kernel: docker0: port 12(veth33f6111) entered disabled state
    Feb 20 16:54:14 NAS avahi-daemon[1742]: Withdrawing address record for fe80::40ba:fbff:fef8:f50d on veth33f6111.
    Feb 20 16:54:32 NAS kernel: TCP: out of memory -- consider tuning tcp_mem
    Feb 20 16:55:15 NAS kernel: TCP: out of memory -- consider tuning tcp_mem
    Feb 20 16:56:03 NAS kernel: TCP: out of memory -- consider tuning tcp_mem
    Feb 20 16:59:27 NAS kernel: TCP: out of memory -- consider tuning tcp_mem
    Feb 20 17:00:13 NAS kernel: TCP: out of memory -- consider tuning tcp_mem
    Feb 20 17:00:13 NAS kernel: TCP: out of memory -- consider tuning tcp_mem
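
    The message refers to the tcp_mem limits, which Linux exposes as three page counts (low, pressure, high) in /proc/sys/net/ipv4/tcp_mem, while current TCP memory use appears in the "mem" field of the TCP line in /proc/net/sockstat. Below is a rough Python sketch for comparing the two. The sample numbers are invented for illustration and are not taken from this server, and the three-state classification is my own simplification.

```python
def parse_tcp_mem(text):
    """Parse the three page counts (low, pressure, high) from tcp_mem."""
    low, pressure, high = (int(v) for v in text.split())
    return low, pressure, high

def tcp_pages_in_use(sockstat_text):
    """Extract the 'mem' field (in pages) from the TCP line of sockstat."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()
            return int(fields[fields.index("mem") + 1])
    raise ValueError("no TCP line found")

def pressure_state(used, limits):
    """Classify usage against the (low, pressure, high) limits."""
    _low, pressure, high = limits
    if used >= high:
        return "over limit"
    if used >= pressure:
        return "under pressure"
    return "ok"

# Hypothetical sample contents of the two files.
sample_tcp_mem = "92730\t123641\t185460\n"
sample_sockstat = (
    "sockets: used 300\n"
    "TCP: inuse 40 orphan 2 tw 10 alloc 55 mem 190000\n"
    "UDP: inuse 5 mem 3\n"
)
limits = parse_tcp_mem(sample_tcp_mem)
used = tcp_pages_in_use(sample_sockstat)
print(pressure_state(used, limits))  # over limit
```

    If the real numbers look like this, the next question is which container or connection is holding that much buffer memory, rather than simply raising the limit.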

     

  7. Hi,

     

    I'm having some issues trying to set up the Docker container without a VPN. I have changed the /data folder, but how should I proceed with the other options that refer to the VPN?

     

    Another issue I'm having is that rTorrent is uploading practically nothing (even on torrents with leechers). I think it is because I need to open a port on my router, but I'm not sure which one it is. I assume it is the listening port that ruTorrent shows in its settings, but should I add anything to the Docker config? Or just open it on my router and point it to the Unraid server?
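
    To test whether a forwarded port actually accepts connections, something like the Python sketch below could be used. The function name is mine, and the demo only probes a throwaway listener on localhost so that it runs anywhere; a real test would need the server's public address and the actual listening port, and would have to be run from outside the home network.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demo: open a listener on an ephemeral local port, then probe it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
print(port_is_open("127.0.0.1", demo_port))  # True while the listener is up
listener.close()
```

    A False result from outside only says the connection failed; it does not say whether the router, the container's port mapping or the client itself is the cause.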

     

    Thanks in advance
