[SOLVED] Did the M1015 bug corrupt my system?

November 3, 2012

Hi,

I have been running unRAID for a couple of months on a standalone system without problems (I wasn't using the M1015 then). On Friday I decided to consolidate my servers and install it in VMware. With all the guides around, that went smoothly and everything seemed to work fine (I passed the M1015 through and fixed USB booting). Today I noticed the system acting strangely: after the sleep command it gave me access errors, although nothing was showing as red.

I ran a parity check overnight, which showed no errors. Long story short: I did some research and found the "sleep" bug. To verify that this was the cause, I spun down one drive and tried to access it. That gave me an error, so I spun the drive up manually. That didn't work properly and the disk got a red ball (I could still access the files on the disk). A check with smartctl gave "SMART overall-health self-assessment test result: PASSED", so the disk seemed fine. This is what I did next:

1) I stopped the array.

2) Removed the disk and started the array again.

3) Stopped the array and selected the same disk again.

4) The rebuild process started.

Six hours later, the files seem to be corrupted.

Nov 3 07:24:24 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 9361 does not match to the expected one 1

Nov 3 07:24:24 Tower kernel: REISERFS error (device md2): vs-5150 search_by_key: invalid format found in block 251756545. Fsck?

Nov 3 07:24:24 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 37722 does not match to the expected one 1

Nov 3 07:24:24 Tower kernel: REISERFS error (device md2): vs-5150 search_by_key: invalid format found in block 251756547. Fsck?

Nov 3 07:24:24 Tower kernel: REISERFS error (device md2): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [27184 27605 0x0 SD]

Nov 3 08:00:22 Tower kernel: end_request: I/O error, dev sdb, sector 112460208

Nov 3 08:00:27 Tower kernel: sd 0:0:1:0: [sdb] Device not ready

Nov 3 08:00:27 Tower kernel: sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08

Nov 3 08:00:27 Tower kernel: sd 0:0:1:0: [sdb] Sense Key : 0x2 [current]

Nov 3 08:00:27 Tower kernel: sd 0:0:1:0: [sdb] ASC=0x4 ASCQ=0x2

Nov 3 08:00:27 Tower kernel: sd 0:0:1:0: [sdb] CDB: cdb[0]=0x28: 28 00 06 b4 01 b0 00 00
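For anyone reading a log like this, it can help to separate the filesystem-level errors (md2) from the disk-level I/O errors (sdb). Below is a minimal Python sketch of that, assuming the syslog line formats shown above. The sample lines are copied from this post; the `summarize` helper and the regexes are only illustrative and not part of unRAID.

```python
import re

# Sample lines copied from the syslog excerpt in this post.
SAMPLE_LOG = """\
Nov 3 07:24:24 Tower kernel: REISERFS error (device md2): vs-5150 search_by_key: invalid format found in block 251756545. Fsck?
Nov 3 08:00:22 Tower kernel: end_request: I/O error, dev sdb, sector 112460208
Nov 3 08:00:27 Tower kernel: sd 0:0:1:0: [sdb] Device not ready
"""

# Assumed patterns, based only on the lines above.
REISERFS_ERR = re.compile(r"REISERFS error \(device (\w+)\)")
IO_ERR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def summarize(log_text):
    """Return (set of devices with ReiserFS errors, {disk: [failed sectors]})."""
    fs_devices = set()
    io_errors = {}
    for line in log_text.splitlines():
        match = REISERFS_ERR.search(line)
        if match:
            fs_devices.add(match.group(1))
        match = IO_ERR.search(line)
        if match:
            io_errors.setdefault(match.group(1), []).append(int(match.group(2)))
    return fs_devices, io_errors


fs_devices, io_errors = summarize(SAMPLE_LOG)
print("ReiserFS errors on:", sorted(fs_devices))
print("I/O errors by disk:", io_errors)
```

To run it against a real log, pass the contents of the syslog file to `summarize` instead of `SAMPLE_LOG`.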

--------------------------------------

root@Tower:~# smartctl -A /dev/sdd

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

=== START OF READ SMART DATA SECTION ===

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0

3 Spin_Up_Time 0x0027 162 139 021 Pre-fail Always - 8866

4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1681

5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0

7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0

9 Power_On_Hours 0x0032 069 069 000 Old_age Always - 22716

10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0

11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0

12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 268

192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 130

193 Load_Cycle_Count 0x0032 057 057 000 Old_age Always - 429367

194 Temperature_Celsius 0x0022 108 083 000 Old_age Always - 44

196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0

197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0

198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0

199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0

200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED
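A table like the one above can also be checked programmatically. The following is a rough Python sketch, assuming the ten-column `smartctl -A` layout shown in this post. It parses each attribute row and lists any attribute whose normalized VALUE has dropped to or below a non-zero THRESH. The function names are hypothetical, and the sample rows are copied from the output above.

```python
# Three rows copied from the smartctl -A output in this post.
SAMPLE_ROWS = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
193 Load_Cycle_Count 0x0032 057 057 000 Old_age Always - 429367
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
"""


def parse_attributes(text):
    """Parse attribute rows into {name: {"value", "worst", "thresh", "raw"}}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # at most 10 columns; RAW_VALUE is last
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers, banners and blank lines
        raw_token = fields[9].split()[0]  # some drives append extra text
        attrs[fields[1]] = {
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": int(raw_token) if raw_token.isdigit() else None,
        }
    return attrs


def below_threshold(attrs):
    """Names of attributes whose VALUE is at or below a non-zero THRESH."""
    return [name for name, a in attrs.items()
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


attrs = parse_attributes(SAMPLE_ROWS)
failing = below_threshold(attrs)
print("Attributes at or below threshold:", failing)
print("Reallocated sectors (raw):", attrs["Reallocated_Sector_Ct"]["raw"])
print("Pending sectors (raw):", attrs["Current_Pending_Sector"]["raw"])
```

A clean result here only says the drive is not reporting failing attributes. It says nothing about the state of the filesystem on top of it.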

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --check started at Sat Nov 3 18:56:15 2012

###########

Replaying journal: Trans replayed: mountid 73, transid 60485, desc 1153, len 1, commit 1155, next trans offset 1138

Trans replayed: mountid 73, transid 60486, desc 1156, len 1, commit 1158, next trans offset 1141

Trans replayed: mountid 73, transid 60487, desc 1159, len 1, commit 1161, next trans offset 1144

Replaying journal: Done.

Reiserfs journal '/dev/md2' in blocks [18..8211]: 3 transactions replayed

Checking internal tree.. \/ 2 (of 13\/ 8 (of 158//138 (of 152\block 225160929: The level of the node (19471) is not correct, (1) expected

the problem in the internal node occured (225160929), whole subtree is skipped / 13 (of 13\/170 (of 170//108 (of 108/bad_stat_data: The objectid (25737) is shared by at least two files. Can be fixed with --rebuild-tree only.

bad_leaf: block 214009923, items 5 and 6: The wrong order of items: [225885 25737 0x0 SD (0)], [225885 225886 0x1 DIR (3)]finished

Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.

Bad nodes were found, Semantic pass skipped

2 found corruptions can be fixed only when running with --rebuild-tree

###########

reiserfsck finished at Sat Nov 3 19:23:52 2012

###########
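The check output itself says which follow-up step it recommends. As a small Python sketch, assuming the wording shown in the output above, the saved text can be scanned for the recommended option. The `--fix-fixable` branch is an assumption about how reiserfsck words its milder recommendation, and `suggested_action` is only a hypothetical helper.

```python
# Lines copied from the reiserfsck --check output in this post.
SAMPLE_OUTPUT = """\
bad_stat_data: The objectid (25737) is shared by at least two files. Can be fixed with --rebuild-tree only.
Bad nodes were found, Semantic pass skipped
2 found corruptions can be fixed only when running with --rebuild-tree
"""


def suggested_action(check_output):
    """Return the option the check output points to, or None if none is named."""
    # The heavier repair takes priority if both options are mentioned.
    if "--rebuild-tree" in check_output:
        return "--rebuild-tree"
    # Assumption: a milder report names --fix-fixable in the same way.
    if "--fix-fixable" in check_output:
        return "--fix-fixable"
    return None


action = suggested_action(SAMPLE_OUTPUT)
print("Suggested reiserfsck option:", action)
```

This only reads the report. It does not run reiserfsck or change anything on disk.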

I guess the parity disk now holds the "corrupt" data as well, so is my only option to run --rebuild-tree?

If I did something in the wrong order, please tell me, since I'm not eager to have this happen again.

Regards

Martin


November 3, 2012


Yes, you must now run

reiserfsck --rebuild-tree /dev/md2

(as instructed)


November 4, 2012

Author

After a 12-hour rebuild-tree I probably got really lucky. There was only an empty folder in lost+found, and all the files I have tried to access seem to be OK. I have turned off spin-down on all disks and removed my cache disk (since it still went to sleep and triggered the sleep bug). I'm running a parity check now to make sure the parity disk has the "updated" structure.

After that I will upgrade to the latest RC8, since this problem seems to be fixed in that kernel.


November 4, 2012

Author

OK. Upgraded to RC8 and to Samba 3.6.8. Everything runs fine, with no problems with the sleep function.
