unRAID Server Release 5.0-beta6a Available


This is puzzling and frustrating.  I'm sure that you experienced precisely the condition which Tom is trying to catch.  Unfortunately, contrary to expectation, the MBR was not 'unknown' prior to the system being started.  However, once started, unRAID decided that it should create a new partition on some of your drives.  This, in itself, may provide a valuable clue.

 

To be honest, I can't guarantee with absolute certainty that there weren't any "MBR: Unknown"s. I was preoccupied with verifying that all S/Ns matched the ones in my documentation, and that all disks were assigned to the correct slots. Then I looked for anything out of the ordinary - error messages and such.

 

It is possible that I overlooked whether any of the disks said "MBR: Unknown", as I didn't realize the significance of that message.  :-[

 

I do not understand why you would have installed a BETA release of unRAID without thoroughly reading the announcement thread to understand outstanding issues with the release.  There was a warning at the top of the general forum that there was a bug related to the MBR, and several reports of users losing data.  I am glad that you did not overreact and were able to salvage your data, but I am still puzzled that you'd think that a new beta was a good choice for a production array.

 

What would you have expected that would have prevented you from installing a potentially dangerous beta on your system?

Link to comment

To be honest, I can't guarantee with absolute certainty that there weren't any "MBR: Unknown"s.

 

Ah, in that case, the clue(s) which Tom has been looking for may well have been lost! >:(

 

The problem is that no machine that is already running b6 or b6a seems to be able to reproduce the circumstances which led to this problem occurring, whether or not they were afflicted initially.  However, with all the warnings which have been posted, it seems that few are willing to put their data at risk.

Link to comment

The issues only occurred when the cache drive had been formatted in a non-standard way by power users who had created multiple partitions on it.

 

Hey Joe...

 

I installed my cache drive last night and it showed up as "MBR unknown" (as expected, I assume).  I assigned it to the cache slot, started the array, and formatted the drive.  After that it showed up as "MBR aligned 4k".  I then went into Share Settings to confirm the config and left things as they were (defaults):

 

Share Settings

--------------------------------------------------------------------------------

Enable User Shares:  Yes

Included disk(s):  

Excluded disk(s):  

 

Cache Settings

--------------------------------------------------------------------------------

Use cache disk:  Yes

Min. free space:  2000000

 

Mover Settings

--------------------------------------------------------------------------------

Mover schedule:  40 3 * * *

Mover logging:  Enabled

 

 

I then edited each of my shares to use the cache disk... at least the ones I actually wanted to... one of them being "Downloads".

 

I then copied a test file (~1GB AVI) to the Downloads folder and verified that it was indeed copied to the cache drive (and also visible in the Downloads share).

 

However, it appears that the mover script was not invoked at 3:40am as expected.  Here is the syslog from that time.  You can see that at 6:13 I invoked it manually and it worked as expected.

 

 

Mar 14 22:05:40 unRAID kernel: mdcmd (53): spindown 10

Mar 14 22:05:40 unRAID kernel: mdcmd (54): spindown 12

Mar 14 22:05:40 unRAID kernel: mdcmd (55): spindown 13

Mar 14 22:05:50 unRAID kernel: mdcmd (56): spindown 0

Mar 14 22:05:51 unRAID kernel: mdcmd (57): spindown 3

Mar 14 22:05:51 unRAID kernel: mdcmd (58): spindown 11

Mar 14 23:28:39 unRAID kernel: mdcmd (59): spindown 2

Mar 14 23:28:40 unRAID kernel: mdcmd (60): spindown 18

Mar 14 23:30:01 unRAID kernel: mdcmd (61): spindown 1

Mar 15 04:40:01 unRAID logrotate: ALERT - exited abnormally.

Mar 15 06:13:45 unRAID emhttp: shcmd (132): /usr/local/sbin/mover 2>&1 |logger &

Mar 15 06:13:45 unRAID logger: mover started

Mar 15 06:13:45 unRAID logger: ./Downloads/fant1-int-illusion.avi

Mar 15 06:13:45 unRAID logger: .d..t...... ./

Mar 15 06:13:45 unRAID logger: .d..t...... Downloads/

Mar 15 06:13:45 unRAID logger: >f+++++++++ Downloads/fant1-int-illusion.avi

Mar 15 06:14:41 unRAID logger: .d..t...... Downloads/

Mar 15 06:14:41 unRAID logger: mover finished

 

Any ideas?  I think I remember someone saying that an additional reboot was required.

 

Thanks,

 

John

Link to comment

It may take a reboot to get the "mover" scheduled for the middle of the night. I don't use a cache drive, so I really don't know.

 

If you type

crontab -l

you can see what is scheduled.  I do know one thing: if the program you used to move the file still had it open, or if the program you used to play it still had it open, it would be skipped, as open files are not moved.
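For example, a quick way to check whether the mover ever made it into the schedule (a sketch; the exact entry unRAID writes may differ):

# List the current cron schedule and look for a mover entry:
crontab -l | grep -i mover

# If nothing shows up, the mover can be started by hand, exactly as
# emhttp does it in the syslog above:
/usr/local/sbin/mover 2>&1 | logger &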

Link to comment

It may take a reboot to get the "mover" scheduled for the middle of the night. I don't use a cache drive, so I really don't know.

 

If you type

crontab -l

you can see what is scheduled.  I do know one thing: if the program you used to move the file still had it open, or if the program you used to play it still had it open, it would be skipped, as open files are not moved.

 

# Run hourly cron jobs at 47 minutes after the hour:

47 * * * * /usr/bin/run-parts /etc/cron.hourly 1> /dev/null

#

# Run daily cron jobs at 4:40 every day:

40 4 * * * /usr/bin/run-parts /etc/cron.daily 1> /dev/null

#

# Run weekly cron jobs at 4:30 on the first day of the week:

30 4 * * 0 /usr/bin/run-parts /etc/cron.weekly 1> /dev/null

#

# Run monthly cron jobs at 4:20 on the first day of the month:

20 4 1 * * /usr/bin/run-parts /etc/cron.monthly 1> /dev/null
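If the mover had been scheduled, something along these lines would be expected at the end of that listing (a hypothetical reconstruction based on the "Mover schedule: 40 3 * * *" setting above, not the exact text emhttp writes):

# mover (hypothetical entry matching the configured schedule)
40 3 * * * /usr/local/sbin/mover 2>&1 | logger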

 

Link to comment

It may take a reboot to get the "mover" scheduled for the middle of the night. I don't use a cache drive, so I really don't know.

...

 

Please create a topic in the General Support forum for this.  It is not related to the beta6a release.  Thanks!

Link to comment

The advanced format drives, except for a jumpered EARS, just work better when the partition starts on sector 64.

 

There was a newsflash for the Intel admins in my environment some time ago about partition alignment on Wintel servers.  The issue was that if the partitions weren't aligned on the disk boundaries, the system would be doing more IO due to split blocks...  I have a PowerPoint somewhere that explained it really well, I just can't find it atm.

 

Is this the same "issue"?

Link to comment

The advanced format drives, except for a jumpered EARS, just work better when the partition starts on sector 64.

 

There was a newsflash for the Intel admins in my environment some time ago about partition alignment on Wintel servers.  The issue was that if the partitions weren't aligned on the disk boundaries, the system would be doing more IO due to split blocks...  I have a PowerPoint somewhere that explained it really well, I just can't find it atm.

 

Is this the same "issue"?

Exactly the same issue.
Link to comment

The advanced format drives, except for a jumpered EARS, just work better when the partition starts on sector 64.

 

There was a newsflash for the Intel admins in my environment some time ago about partition alignment on Wintel servers.  The issue was that if the partitions weren't aligned on the disk boundaries, the system would be doing more IO due to split blocks...  I have a PowerPoint somewhere that explained it really well, I just can't find it atm.

 

Is this the same "issue"?

 

Not exactly. The issue has nothing to do with there being more system IO; the drive itself has to do more IO internally. Someone here did before-and-after alignment tests on an EARS and saw about a 10 MB/s increase in the array write speed (around a 30% speed increase).
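To make the arithmetic concrete (a sketch; /dev/sdX is a placeholder, not any particular drive in this thread):

# Show partition start positions in 512-byte sectors; a start of 63 is
# misaligned for a 4K-sector (Advanced Format) drive, 64 is aligned:
fdisk -l -u /dev/sdX

# Why: 63 * 512 = 32256 is not a multiple of 4096, so every 4K write
# straddles two physical sectors and the drive must read-modify-write.
# 64 * 512 = 32768 = 8 * 4096, so writes line up with physical sectors.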

 

Peter

Link to comment

The problem is that no machine that is already running b6 or b6a seems to be able to reproduce the circumstances which led to this problem occurring, whether or not they were afflicted initially.

 

Not entirely true. I can reproduce my issue at will. It comes down to running unRAID on a full Slackware distro and using LILO as the boot manager.

Link to comment

The problem is that no machine that is already running b6 or b6a seems to be able to reproduce the circumstances which led to this problem occurring, whether or not they were afflicted initially.

 

Not entirely true. I can reproduce my issue at will. It comes down to running unRAID on a full Slackware distro and using LILO as the boot manager.

And that is good to know.  It is the LILO boot manager writing to the MBR.

 

Lime-tech should be able to code around that issue.

Link to comment

Yeah, though something changed in unRAID around the time of removing the slot-concept, as I ran the same way with multiple invocations of LILO over a 19-month period without the MBRs being overwritten (up until unRAID 5.05b). LILO had been using the 4 bytes for its own calculated drive Serial ID and the following 2 bytes as a Magic Signature verification for some time now.

 

I'm guessing that when unRAID used the slot concept, it didn't bother to check or rewrite the MBRs if the drive serial IDs matched. Now that it doesn't use drive slots, in addition to using the drive hardware serial numbers it also checks the MBRs, and will rewrite the MBR if it doesn't meet expectations.
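As an illustration only (this assumes the standard MBR layout, where a 4-byte disk signature sits at byte offset 440 followed by 2 more bytes, and /dev/sdX is a placeholder):

# Dump the 6 bytes LILO stamps: its 4-byte drive Serial ID plus the
# 2-byte magic that follows it:
dd if=/dev/sdX bs=1 skip=440 count=6 2>/dev/null | od -A d -t x1

# (Assumption: on a stock unRAID/preclear MBR these bytes are zero, so
# non-zero values here would suggest LILO or something else wrote them.)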

 

My workaround for now is to be proactive, and after any LILO runs I run makembr appropriately on my data drives before I reboot. Perhaps a bit of a hassle, but it's enough to get by.

 

 

Link to comment

The problem is that no machine that is already running b6 or b6a seems to be able to reproduce the circumstances which led to this problem occurring, whether or not they were afflicted initially.

 

Not entirely true. I can reproduce my issue at will. It comes down to running unRAID on a full Slackware distro and using LILO as the boot manager.

 

Yes, but I suspect that your case is special.  I doubt that everyone who has experienced (apparent) MBR corruption is using LILO.

Link to comment

Yeah, though something changed in unRAID around the time of removing the slot-concept, ...

 

My guess is that Tom has enhanced the testing performed for a valid unRAID MBR as part of the checking for 63/64 sector partition alignment.

 

This would cause a number of 'dirty' disks (i.e. those whose MBR does not precisely match what any version of unRAID/Preclear would have created) to fall foul of the more stringent testing.

Link to comment

Yeah, though something changed in unRAID around the time of removing the slot-concept, ...

 

My guess is that Tom has enhanced the testing performed for a valid unRAID MBR as part of the checking for 63/64 sector partition alignment.

 

This would cause a number of 'dirty' disks (i.e. those whose MBR does not precisely match what any version of unRAID/Preclear would have created) to fall foul of the more stringent testing.

Actually, I think Tom @ lime-tech had posted that in prior releases the MBR was not examined for exact structure once a disk had its model/serial number registered in the super.dat file.  Since the 5.0beta6 release has us delete the existing super.dat file, the non-standard MBR structures were examined because the disks were all treated as newly added while it created a new super.dat.

 

The LILO disk IDs were not expected, nor were any partitions that originally had HPAs that subsequently were removed.

 

Joe L.

Link to comment

It may take a reboot to get the "mover" scheduled for the middle of the night. I don't use a cache drive, so I really don't know.

...

 

Please create a topic in the General Support forum for this.  It is not related to the beta6a release.  Thanks!

 

A reboot fixed the issue.

Link to comment

Limetech has gone silent on updates for a new beta.

 

He is still trying to gather data on what exactly is causing this and how he can work around it.

 

With so many code changes lately and the removal of the slots concept, there are probably quite a few things that have changed.

Link to comment

Actually, I think Tom @ lime-tech had posted that in prior releases the MBR was not examined for exact structure once a disk had its model/serial number registered in the super.dat file.  Since the 5.0beta6 release has us delete the existing super.dat file, the non-standard MBR structures were examined because the disks were all treated as newly added while it created a new super.dat.

 

The LILO disk IDs were not expected, nor were any partitions that originally had HPAs that subsequently were removed.

 

Unfortunately, now all drives' MBRs are also examined on every array start cycle even if super.dat exists.
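Given that, a cautious sketch before starting the array on a new beta is to keep a copy of each drive's MBR on the flash so it can be compared or restored later (/dev/sdX and the file name are placeholders):

# Save the first 512 bytes (boot code + partition table) of a drive:
dd if=/dev/sdX of=/boot/mbr_sdX.bin bs=512 count=1

# Later, compare the current MBR against the saved copy:
dd if=/dev/sdX bs=512 count=1 2>/dev/null | cmp - /boot/mbr_sdX.bin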

Link to comment

I followed the steps to remove the super.dat file. I reassigned my drives and pressed start. All drives began "clearing", with the write count going up on the drives while the read count remained the same. Realizing that unRAID appeared to be formatting my drives, I walked over to the server and pressed the power button to trigger my shutdown script. After a quick reboot, 6 of 7 drives on the server showed as unformatted while one seemed fine (disk 5, for what it matters).

 

Having nothing of importance on drives 2-7, I ran the preclear script to get the server back online, assigned the now-cleared drives, and started the array. The array appears to be functioning now; however, I am having the same issue I had using 5.4b, in that AFP becomes unavailable and the GUI for the server becomes unresponsive. SMB does appear to stay up.

 

Disabling AFP seems to alleviate the issue but my system log is now showing repeated errors.

 

Mar 17 18:30:21 DarkStar kernel: ------------[ cut here ]------------

Mar 17 18:30:21 DarkStar kernel: WARNING: at arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb9() (Minor Issues)

Mar 17 18:30:21 DarkStar kernel: Hardware name: H57M-USB3

Mar 17 18:30:21 DarkStar kernel: empty IPI mask

Mar 17 18:30:21 DarkStar kernel: Modules linked in: md_mod xor pata_jmicron mvsas libsas i2c_i801 i2c_core ahci r8169 scsi_transport_sas jmicron libahci [last unloaded: md_mod] (Drive related)

Mar 17 18:30:21 DarkStar kernel: Pid: 13665, comm: dd Not tainted 2.6.36.2-unRAID #8 (Errors)

Mar 17 18:30:21 DarkStar kernel: Call Trace: (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1027b52>] warn_slowpath_common+0x65/0x7a (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1015a12>] ? default_send_IPI_mask_logical+0x2f/0xb9 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1027bcb>] warn_slowpath_fmt+0x26/0x2a (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1015a12>] default_send_IPI_mask_logical+0x2f/0xb9 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c10142d0>] native_send_call_func_ipi+0x4f/0x51 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1046c60>] smp_call_function_many+0x15e/0x176 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1059794>] ? drain_local_pages+0x0/0x10 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1059794>] ? drain_local_pages+0x0/0x10 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1046c92>] smp_call_function+0x1a/0x20 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c102bb86>] on_each_cpu+0xf/0x1e (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c105a48c>] drain_all_pages+0x14/0x16 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c105ab8a>] __alloc_pages_nodemask+0x316/0x450 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1056f88>] grab_cache_page_write_begin+0x4f/0x8b (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1096386>] block_write_begin+0x21/0x68 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1099965>] ? blkdev_write_end+0x2d/0x36 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c109998c>] blkdev_write_begin+0x1e/0x20 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1098e53>] ? blkdev_get_block+0x0/0xc4 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1055afa>] generic_file_buffered_write+0xb5/0x1a9 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1056c9b>] __generic_file_aio_write+0x392/0x3d3 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1099096>] blkdev_aio_write+0x2e/0x6d (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1079fc2>] do_sync_write+0x8a/0xc5 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1003bf1>] ? do_IRQ+0x86/0x9a (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1191803>] ? __clear_user+0x11/0x28 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c107a7d5>] vfs_write+0x8a/0xfd (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c1079f38>] ? do_sync_write+0x0/0xc5 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c107a8df>] sys_write+0x3b/0x60 (Errors)

Mar 17 18:30:21 DarkStar kernel:  [<c130fb5d>] syscall_call+0x7/0xb (Errors)

Mar 17 18:30:21 DarkStar kernel: ---[ end trace 037ddde044816d85 ]---

 

Mar 18 22:15:01 DarkStar kernel: sas: command 0xf6b1fa80, task 0xf6cc5540, timed out: BLK_EH_NOT_HANDLED (Drive related)

Mar 18 22:15:01 DarkStar kernel: sas: Enter sas_scsi_recover_host (Drive related)

Mar 18 22:15:01 DarkStar kernel: sas: trying to find task 0xf6cc5540 (Drive related)

Mar 18 22:15:01 DarkStar kernel: sas: sas_scsi_find_task: aborting task 0xf6cc5540 (Drive related)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=c5160000 task=f6cc5540 slot=c517163c slot_idx=x2 (System)

Mar 18 22:15:01 DarkStar kernel: sas: sas_scsi_find_task: querying task 0xf6cc5540 (Drive related)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5 (System)

Mar 18 22:15:01 DarkStar kernel: sas: sas_scsi_find_task: task 0xf6cc5540 failed to abort (Minor Issues)

Mar 18 22:15:01 DarkStar kernel: sas: task 0xf6cc5540 is not at LU: I_T recover (Drive related)

Mar 18 22:15:01 DarkStar kernel: sas: I_T nexus reset for dev 0400000000000000 (Drive related)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 2083:port 4 ctrl sts=0x89800. (System)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 2085:Port 4 irq sts = 0x1001001 (System)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 2111:phy4 Unplug Notice (System)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 2083:port 4 ctrl sts=0x199800. (System)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 2085:Port 4 irq sts = 0x1001081 (System)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 2083:port 4 ctrl sts=0x199800. (System)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 2085:Port 4 irq sts = 0x10000 (System)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[4] (System)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 1224:port 4 attach dev info is 60400 (System)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 1226:port 4 attach sas addr is 4 (System)

Mar 18 22:15:01 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 378:phy 4 byte dmaded. (System)

Mar 18 22:15:01 DarkStar kernel: sas: sas_form_port: phy4 belongs to port4 already(1)! (Drive related)

Mar 18 22:15:03 DarkStar kernel: drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[4]:rc= 0 (System)

Mar 18 22:15:03 DarkStar kernel: sas: I_T 0400000000000000 recovered (Drive related)

Mar 18 22:15:03 DarkStar kernel: sas: sas_ata_task_done: SAS error 8d (Errors)

Mar 18 22:15:03 DarkStar kernel: ata13: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)

Mar 18 22:15:03 DarkStar kernel: ata13: status=0x01 { Error } (Errors)

Mar 18 22:15:03 DarkStar kernel: ata13: error=0x04 { DriveStatusError } (Errors)

Mar 18 22:15:03 DarkStar kernel: sas: --- Exit sas_scsi_recover_host (Drive related)

 

As a side note, none of the recovery options for reiserfs appear to work on drive 1; every attempt ends in "aborted".

Link to comment

Here is one I have not seen before. I have 18 2TB drives installed, no parity, no cache. Raid was set up with no issues, the drives cleared properly, and a share named Movies was set up, which I have been testing r/w to extensively over the course of the last 2 weeks with no issues. Today, all of a sudden, I received an error while trying to copy a 1GB file over to the server: "not enough space available". When I go into //Tower and review the unRAID web GUI, it shows 24TB of disk space available, as it should. When I check in Windows Explorer, it shows "0 bytes" of free space. I restarted the server and restarted my Windows 7 machine, yet the error still persists. I am running 6a, by the way.

 

Any ideas?
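One quick server-side check (a sketch; it assumes the standard /mnt/user and /mnt/disk* mount points) is to compare what the server itself reports with what Windows shows:

# Free space as seen for the user share and for the individual disks:
df -h /mnt/user
df -h /mnt/disk*

# If these numbers look right, the free space is being misreported to
# the SMB client rather than actually exhausted.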

 

 

Link to comment

Wrong word used; RAID is not set up. The array was set up without a hitch, and then a share across all the drives. I wanted to do testing on the array before I set up unRAID. So the problem is actually cropping up on the share of the array, where it shows 0 bytes available when it actually has 24TB of space available.

Link to comment
