NAS Posted June 12, 2010

And now there is talk about another stable without any public beta testing. We should learn from the fact that a "4.5.3-TEST" was needed at all. Use the community at your disposal. More betas, fewer stables.

I agree, get rid of this Microsoft mentality of releasing what should be a beta version as a full version. It would be nice to see a new version posted as a beta for a month or so, and then, once there is confidence that any critical bugs have been ironed out, release it as final.

Patches to released versions are generally for isolated bug fixes or minor additions such as drivers for different hardware. In these cases I am almost always working with the person(s) who requested the change. 4.5.2 was a bit special in that I had to quickly generate a release to support h/w needed to ship in servers. The -TEST version was also a special case, released to get help from the Community to solve that format problem.

That makes sense, but to my eye there are two distinct user groups here: people like us who always stay up to date, and others (likely the majority) who update far less frequently and only to stable versions. I still say release everything as a Beta or RC, then periodically re-release one with no changes as a stable once it has proven worthy. If you look at the last few stables with an impartial eye, there were some comparatively big changes in them, like Samba and kernel updates, that were released as stable without any community testing. With such a ragtag collection of hardware, it's the safest way. In essence some of this is semantics, but to me at least it is the sensible option. It also means you have far fewer stables to officially support.
Rajahal Posted June 14, 2010

I still say release everything as a Beta or RC then periodically re-release one with no changes as a stable once it has proven worthy.

I'm with NAS on this one. From my limited perspective, it seems that the whole 'unformatted bug' fiasco with 4.5.3 could have been avoided with the proper use of beta versions. At least one user lost data because of the bug, and that's one too many. I understand the Adaptec to SuperMicro hardware change forced your hand a bit in releasing a new version quickly, but as I understand it, that change was unrelated to the unformatted bug (please correct me if I'm wrong here).

On a separate note, I would love to be a part of the alpha testing, and I do have two test servers I could throw at it, but I'll be traveling for the next month, so I expect I'll miss out on that. If it is still going on in mid-July, I'll see if I can pitch in at that time.
slaveunit Posted June 14, 2010

Hello All,

Since there is no 4.5.x thread in the support section, I am posting here. This morning I noticed some errors on my console suggesting that I run FSCK. So, following the wiki, I ran "reiserfsck --check /dev/md1". My FSCK output was:

Trans replayed: mountid 20, transid 66029, desc 428, len 9, commit 438, next trans offset 421
Trans replayed: mountid 20, transid 66030, desc 439, len 9, commit 449, next trans offset 432
Trans replayed: mountid 20, transid 66031, desc 450, len 10, commit 461, next trans offset 444
Trans replayed: mountid 20, transid 66032, desc 462, len 14, commit 477, next trans offset 460
Trans replayed: mountid 20, transid 66033, desc 478, len 1, commit 480, next trans offset 463
Trans replayed: mountid 20, transid 66034, desc 481, len 9, commit 491, next trans offset 474
Trans replayed: mountid 20, transid 66035, desc 492, len 71, commit 564, next trans offset 547
Trans replayed: mountid 20, transid 66036, desc 565, len 14, commit 580, next trans offset 563
Reiserfs journal '/dev/md15' in blocks [18..8211]: 618 transactions replayed
Checking internal tree../ 1 (of 6)/ 13 (of 170)/144 (of 170)block 38887627: The level of the node (0) is not correct , (1) expected
the problem in the internal node occured (38887627), whole subtree is skipped
finished )
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
########### reiserfsck finished at Mon Jun 14 08:02:01 2010 ###########
root@Tower:~#

Since it is asking me to run with the "--rebuild-tree" switch, I am asking for advice here. If it matters at all, I did not reboot the box before running FSCK. The syslog is too large to attach, so I have posted it on pastebin: http://pastebin.com/EqUQuUF1
Joe L. Posted June 14, 2010

You are proceeding as you should. Run with the --rebuild-tree next.

Joe L.
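The decision Joe L. makes here comes from the last lines of the --check output. As a rough sketch (Python; the helper name `suggest_next_step` is hypothetical), it only recognises the two phrases that actually appear in this thread, "No corruptions found" and "--rebuild-tree", and sends anything else to a human. It does not run reiserfsck and is not a complete interpretation of its messages.

```python
def suggest_next_step(check_output: str) -> str:
    """Map `reiserfsck --check` output to a follow-up action (hypothetical helper).

    Only the phrases quoted in this thread are recognised; anything else
    falls through to "review" so a person reads the full output.
    """
    if "No corruptions found" in check_output:
        return "none"          # filesystem is consistent, nothing more to do
    if "--rebuild-tree" in check_output:
        return "rebuild-tree"  # reiserfsck says only --rebuild-tree can fix it
    return "review"            # unrecognised result: read the output manually


first_run = "1 found corruptions can be fixed only when running with --rebuild-tree"
second_run = "Checking Semantic tree:\nfinished\nNo corruptions found"
print(suggest_next_step(first_run))   # rebuild-tree
print(suggest_next_step(second_run))  # none
```

Falling back to "review" is deliberate: reiserfsck has other outcomes than the two seen here, and a wrong guess before a --rebuild-tree is costly.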
slaveunit Posted June 14, 2010

Thanks for the quick reply. Is there any risk of data loss by running this? Would it be safer just to RMA the drive?
Joe L. Posted June 14, 2010

Thanks for the quick reply. Is there any risk of data loss by running this? Would it be safer just to RMA the drive?

It has nothing to do with the drive itself. It has to do with a corrupted file-system on it. Have you gotten a smartctl report on the drive? That would let you know its health.

Joe L.
slaveunit Posted June 14, 2010

I didn't look previously. I'm not 100% clear on how to read all the sections here. I do see some errors with the on time.

root@Tower:/dev# smartctl -d ata -a /dev/sdq
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST31500341AS
Serial Number:    9VS1R38D
Firmware Version: CC1H
User Capacity:    1,500,301,910,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Jun 14 09:10:16 2010 GMT+8
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error.
        Auto Offline Data Collection: Enabled.
Self-test execution status: (0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (609) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate.
        Auto Offline data collection on/off support.
        Suspend Offline collection upon new command.
        Offline surface scan supported.
        Self-test supported.
        Conveyance Self-test supported.
        Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
        Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
        General Purpose Logging supported.
Short self-test routine recommended polling time: (1) minutes.
Extended self-test routine recommended polling time: (255) minutes.
Conveyance self-test routine recommended polling time: (2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
        SCT Feature Control supported.
        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   095   006    Pre-fail  Always       -       8067228
  3 Spin_Up_Time            0x0003   096   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       56
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   061   060   030    Pre-fail  Always       -       1367503
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       896
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       156
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   041   041   000    Old_age   Always       -       59
190 Airflow_Temperature_Cel 0x0022   070   068   045    Old_age   Always       -       30 (Lifetime Min/Max 22/31)
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a   040   033   000    Old_age   Always       -       8067228
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       38001870635228
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       3759789166
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       3403232947

SMART Error Log Version: 1
ATA Error Count: 156 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 156 occurred at disk power-on lifetime: 175 hours (7 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 ff ff ff ef 00  1d+10:45:08.277  READ DMA EXT
  27 00 00 00 00 00 e0 00  1d+10:45:08.248  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  1d+10:45:08.228  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  1d+10:45:08.209  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00  1d+10:45:08.047  READ NATIVE MAX ADDRESS EXT

Error 155 occurred at disk power-on lifetime: 175 hours (7 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 ff ff ff ef 00  1d+10:45:05.291  READ DMA EXT
  27 00 00 00 00 00 e0 00  1d+10:45:05.261  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  1d+10:45:05.241  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  1d+10:45:05.222  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00  1d+10:45:05.131  READ NATIVE MAX ADDRESS EXT

Error 154 occurred at disk power-on lifetime: 175 hours (7 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 ff ff ff ef 00  1d+10:45:02.284  READ DMA EXT
  27 00 00 00 00 00 e0 00  1d+10:45:02.255  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  1d+10:45:02.235  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  1d+10:45:02.215  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00  1d+10:45:02.054  READ NATIVE MAX ADDRESS EXT

Error 153 occurred at disk power-on lifetime: 175 hours (7 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 ff ff ff ef 00  1d+10:44:59.277  READ DMA EXT
  27 00 00 00 00 00 e0 00  1d+10:44:59.248  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  1d+10:44:59.228  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  1d+10:44:59.209  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00  1d+10:44:59.128  READ NATIVE MAX ADDRESS EXT

Error 152 occurred at disk power-on lifetime: 175 hours (7 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 ff ff ff ef 00  1d+10:44:56.281  READ DMA EXT
  27 00 00 00 00 00 e0 00  1d+10:44:56.252  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  1d+10:44:56.232  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  1d+10:44:56.212  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00  1d+10:44:56.052  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@Tower:/dev#
Joe L. Posted June 14, 2010

The drive does not look like it is having any errors in the SMART report. It is responding and has no sectors pending re-allocation or that have been re-allocated.

Run the --rebuild-tree on /dev/md15.

Joe L.
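The check Joe L. does by eye (no re-allocated sectors, none pending re-allocation) can be sketched in Python against the attribute table printed by `smartctl -a`. This is a minimal sketch under assumptions: the function names are hypothetical, the column positions assume the 10-column layout shown in this thread, and Offline_Uncorrectable is an extra counter added here, not something Joe L. mentions.

```python
# Sector-health counters to watch, looked up by ATTRIBUTE_NAME.
WATCHED = (
    "Reallocated_Sector_Ct",   # sectors already re-allocated (Joe L.'s check)
    "Current_Pending_Sector",  # sectors pending re-allocation (Joe L.'s check)
    "Offline_Uncorrectable",   # extra counter, added here as an assumption
)


def parse_attributes(text: str) -> dict:
    """Return {ATTRIBUTE_NAME: RAW_VALUE} for rows of a smartctl attribute table.

    Assumes the layout: ID, name, flag, value, worst, thresh, type,
    updated, when_failed, raw value (the raw value is the 10th field).
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 fields.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9]
    return attrs


def sector_warnings(attrs: dict) -> list:
    """List watched counters whose raw value is a nonzero integer."""
    warnings = []
    for name in WATCHED:
        raw = attrs.get(name, "0")
        if raw.isdigit() and int(raw) > 0:
            warnings.append(f"{name}={raw}")
    return warnings


# Excerpt of the attribute table posted above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always  - 0
187 Reported_Uncorrect      0x0032 001 001 000 Old_age  Always  - 156
197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always  - 0
198 Offline_Uncorrectable   0x0010 100 100 000 Old_age  Offline - 0
"""

print(sector_warnings(parse_attributes(SAMPLE)))  # []
```

Note that Reported_Uncorrect (156 on this drive) and the ATA error log are not checked by this sketch; it only mirrors the re-allocation reading above. Other numeric rows in a full smartctl dump (for example the error-log register lines) may get parsed as bogus entries, which is harmless here because only the watched names are looked up.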
Joe L. Posted June 14, 2010

As far as data loss goes: the file tree is unable to get to all the data now. It might be referencing blocks of data from files you deleted, or there might be files you just don't know are missing from the directory listings (or they are listed, but their contents are incomplete). With any luck, running reiserfsck will fix things.

Other than that, looking in your log file, were you plugging and unplugging the LAN cable? It shows the link being lost again and again.

Jun 13 12:52:41 Tower kernel: r8169: eth0: link down
Jun 13 12:52:42 Tower ifplugd(eth0)[1517]: Link beat lost.
Jun 13 12:52:42 Tower kernel: r8169: eth0: link up
Jun 13 12:52:43 Tower ifplugd(eth0)[1517]: Link beat detected.
Jun 13 12:52:57 Tower kernel: r8169: eth0: link down
Jun 13 12:52:58 Tower ifplugd(eth0)[1517]: Link beat lost.
Jun 13 12:53:01 Tower kernel: r8169: eth0: link up
Jun 13 12:53:01 Tower ifplugd(eth0)[1517]: Link beat detected.
Jun 13 12:57:35 Tower kernel: mdcmd (18155): spindown 13
Jun 13 12:57:56 Tower kernel: mdcmd (18159): spindown 6
Jun 13 12:57:56 Tower kernel: mdcmd (18160): spindown 8
Jun 13 13:05:29 Tower kernel: mdcmd (18207): spindown 5
Jun 13 13:05:46 Tower kernel: r8169: eth0: link down
Jun 13 13:05:47 Tower ifplugd(eth0)[1517]: Link beat lost.
Jun 13 13:05:49 Tower kernel: r8169: eth0: link up
Jun 13 13:05:50 Tower ifplugd(eth0)[1517]: Link beat detected.
Jun 13 13:06:02 Tower kernel: r8169: eth0: link down
Jun 13 13:06:03 Tower ifplugd(eth0)[1517]: Link beat lost.
Jun 13 13:06:04 Tower kernel: r8169: eth0: link up
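Spotting a flapping link like this means counting the kernel "link down" messages per interface. A small Python sketch of that count follows; the regex assumes the "kernel: <driver>: <iface>: link down" wording seen in this r8169 log, and other NIC drivers may phrase it differently. The excerpt is trimmed from the lines Joe L. quoted.

```python
import re
from collections import Counter

# Matches kernel lines such as "kernel: r8169: eth0: link down".
# The driver field is matched generically; that other drivers use the same
# wording is an assumption.
LINK_DOWN = re.compile(r"kernel: \S+: (\S+): link down")


def count_link_drops(lines):
    """Count "link down" kernel messages per network interface."""
    drops = Counter()
    for line in lines:
        match = LINK_DOWN.search(line)
        if match:
            drops[match.group(1)] += 1
    return drops


SYSLOG_EXCERPT = """\
Jun 13 12:52:41 Tower kernel: r8169: eth0: link down
Jun 13 12:52:42 Tower ifplugd(eth0)[1517]: Link beat lost.
Jun 13 12:52:42 Tower kernel: r8169: eth0: link up
Jun 13 12:52:57 Tower kernel: r8169: eth0: link down
Jun 13 12:57:35 Tower kernel: mdcmd (18155): spindown 13
Jun 13 13:05:46 Tower kernel: r8169: eth0: link down
Jun 13 13:06:02 Tower kernel: r8169: eth0: link down
"""

print(count_link_drops(SYSLOG_EXCERPT.splitlines()))  # Counter({'eth0': 4})
```

The ifplugd "Link beat lost." lines are skipped on purpose so each drop is counted once, from the kernel message only.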
slaveunit Posted June 14, 2010

On the NIC question, yes. I was updating the firmware on the router the unRAID server is connected to.

I am running reiserfsck --rebuild-tree now and will then run another --check to see if it fixed the issue. I will report results when complete.

Update: It's now completed. Here are the results. It looks like it did fix it.

root@Tower:~# reiserfsck --check /dev/md15
reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md15
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
########### reiserfsck --check started at Mon Jun 14 12:53:19 2010 ###########
Replaying journal..
Reiserfs journal '/dev/md15' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
        Leaves 159504
        Internal nodes 970
        Directories 82
        Other files 138
        Data block pointers 161395625 (0 of them are zero)
        Safe links 0
########### reiserfsck finished at Mon Jun 14 13:41:11 2010 ###########
root@Tower:~#

Thanks very much for your time, Joe!!
unraided Posted June 22, 2010

Hi guys. I've been out of the loop regarding the forum for a while. Does v4.5.4 cause any well-known problems? I'm still running v4.5.1. Any reports/problems, especially with WOL not working on the Realtek RTL8112L NICs?

Thanks.
Calvin Posted November 21, 2010

Wow. Searching my email shows it's an MD-1500/LL that I bought in 2007.

I think the problem is that the system image size has increased beyond a certain value, and the 'ldlinux.sys' file on your flash (from 2007) cannot load it correctly. After shutting down your server, remove the Flash and plug it into your PC. Now back up the contents of your 'config' directory, e.g., drag it to the Windows desktop. Next, download the version of syslinux found here:
http://lime-technology.com/download/cat_view/55-utilities
Now follow the instructions for installing the latest release found here:
http://lime-technology.com/support/unraid-server-installation
Finally, restore the 'config' directory to the flash. If this is unclear, then send me an email: [email protected].

OK, so it's been a while... over 5 months. Looks like this was it. My heart skipped a beat when it said access to the flash drive was denied, but another thread said to start syslinux as administrator, and that let me do it. It might be a good idea to put this Win7 gotcha on the server installation page that explains how to use syslinux.

I'm not sure what the latest stable is. There is a thread saying 4.5.8 is available, but 4.5.6 was the one the main page let me download, so that is what I've installed for now. It looks like there is a 4.6 going stable shortly, if I read that right.

At any rate, thanks. The system booted this time, taking me from ver 4.5 to 4.5.6, and started a Parity-Check. The RAID is up and I can access my files. Thanks for the info about syslinux and the likely issue with the system image size from my older 2007 files.
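The syslinux and release install steps above are manual, but the two copy steps around them (save the 'config' directory, then put it back afterwards) can be sketched in Python. This is only an illustration under assumptions: the function names and the example paths are hypothetical, it does not install syslinux, and it is not Lime Technology's procedure.

```python
import shutil
from pathlib import Path


def backup_config(flash_root, backup_root):
    """Copy <flash_root>/config to <backup_root>/config.

    copytree refuses to overwrite an existing destination, so an older
    backup is never silently clobbered.
    """
    src = Path(flash_root) / "config"
    dest = Path(backup_root) / "config"
    shutil.copytree(src, dest)
    return dest


def restore_config(backup_root, flash_root):
    """Copy the saved config directory back onto the flash drive.

    dirs_exist_ok=True (Python 3.8+) lets the saved files overwrite
    whatever the fresh install left in <flash_root>/config.
    """
    src = Path(backup_root) / "config"
    dest = Path(flash_root) / "config"
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest


# Example with hypothetical paths (flash drive mounted as E:):
#   backup_config("E:/", "C:/Users/me/Desktop/flash-backup")
#   ... reinstall syslinux and the new release ...
#   restore_config("C:/Users/me/Desktop/flash-backup", "E:/")
```

Keeping backup strict and restore permissive matches the order of operations: the backup happens before anything is touched, while the restore lands on a freshly prepared flash.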