Posts posted by NAS
-
-
An update.
I have changed to 4.2-beta-2 and started again, slowly testing every piece of equipment. As it stands I am back to all the original hardware, including the 3rd TX4 card, and the system is rock solid. Not a single error so far.
The last test will be to add in the 9th drive, but I want to run with the current hardware for a bit first to see if it glitches.
-
Ah sorry, I get you now. It's still a pity that nothing is logged when a disk actually spins up or down.
-
Swings and roundabouts. I would suggest that one bigger unit would be better, since there is an economy of scale in UPSs (to a point).
The thing is, though, unless you have some serious systems, one UPS to cover 3 of them still won't be that huge or expensive. It all comes down to uptime in the event of a power failure; that's where the money disappears.
If you want 3 systems to be up for 2 hours after a complete power outage you will spend a lot more than if you only need 5 minutes.
I would suggest that the main protection people here need is from short blackouts and, more importantly, brownouts (the real kit killer).
If unRAID could implement some UPS integration agent, then all you would need is 10 minutes of battery coverage to allow for a graceful shutdown.
One final note: pay careful attention to weight. UPS companies have deals with delivery specialists to make delivery cheap, so you may be unpleasantly surprised by how heavy some of these things are.
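To put rough numbers on the runtime point, here is a back-of-envelope sketch. The 150 W per-system load and the 85% inverter efficiency are assumptions picked for illustration, not measurements. Real batteries also deliver less than their rated energy at high discharge rates, so treat the results as optimistic.

```python
def battery_wh_needed(load_watts, minutes, inverter_efficiency=0.85):
    """Usable battery energy (watt-hours) needed to carry load_watts for the given minutes."""
    hours = minutes / 60.0
    # Energy drawn from the battery is higher than the load because the inverter wastes some.
    return load_watts * hours / inverter_efficiency


SYSTEMS = 3             # number of boxes on the one UPS
WATTS_PER_SYSTEM = 150  # assumed average draw per box, not a measured figure
total_load = SYSTEMS * WATTS_PER_SYSTEM

for minutes in (5, 10, 120):
    wh = battery_wh_needed(total_load, minutes)
    print(f"{minutes:>3} min at {total_load} W needs roughly {wh:.0f} Wh of battery")
```

Energy scales linearly with runtime, so 2 hours needs 12 times the battery of a 10 minute graceful-shutdown window at the same load.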
-
Ahh that explains it.
FYI this "may" have fixed a problem I reported in my TX4 thread. I need to do a lot of incremental testing to confirm that though. (I only post this in case someone else is in the same boat.)
Update: thinking about it, since this is an event, perhaps it and any other relevant events should be sent to the syslog, i.e. when unRAID does something, log it.
-
What I would do is buy a big brand UPS such as APC.
Now before you say "too expensive", check out eBay and get one even if it has no battery. Then you can get a brand new proper battery by mail order. What you end up with is essentially a brand new top-end UPS for a fraction of the cost.
As for how big: once you go to a proper brand you no longer have to think about quality or protection problems and are merely calculating how long the system will run without mains.
-
A couple of messages that used to appear in syslog a lot are:
hdparm -y ...
and
ACPI Error etc
I assume the 4.2 branch has ACPI turned off by default again, which explains that one (thanks for that btw).
However, where have all the disk-to-sleep messages gone?
-
A small update.
I've found a pattern that may or may not be a red herring. Several of the fatal lockups seem to happen after /usr/sbin/hdparm -y /dev/sdg >/dev/null, like:
Sep 2 17:37:44 NASBOX emhttp[1294]: shcmd (14): /usr/sbin/hdparm -y /dev/sdg >/dev/null
Sep 2 17:37:54 NASBOX kernel: [18173.780487] ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Does that hint at anything?
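A quick way to check whether this pattern holds across a whole syslog is to scan for an hdparm -y spin-down command followed shortly by an ATA exception. The sketch below is only that, a sketch: the 30 second window, the function names and the assumed year are all made up for illustration, and it shows timing coincidence only, not that the spun-down device sits on the port that threw the exception.

```python
import re
from datetime import datetime, timedelta

# "Sep 2 17:37:44 NASBOX rest-of-message" -> timestamp, message
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+(.*)$")
SPINDOWN_RE = re.compile(r"hdparm -y (/dev/\w+)")
ATA_ERR_RE = re.compile(r"(ata\d+(?:\.\d+)?): exception")


def parse_time(stamp, year=2008):
    # Syslog lines carry no year; 2008 is an arbitrary assumption (a leap year, so Feb 29 parses).
    return datetime.strptime(f"{year} {' '.join(stamp.split())}", "%Y %b %d %H:%M:%S")


def spindowns_before_errors(lines, window_seconds=30):
    """Return (device, ata_port, seconds_between) for each ATA exception
    logged within window_seconds after the most recent hdparm -y command."""
    window = timedelta(seconds=window_seconds)
    last_spindown = None  # (time, device) of the latest spin-down command seen
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        when, msg = parse_time(m.group(1)), m.group(2)
        spindown = SPINDOWN_RE.search(msg)
        if spindown:
            last_spindown = (when, spindown.group(1))
            continue
        err = ATA_ERR_RE.search(msg)
        if err and last_spindown:
            gap = when - last_spindown[0]
            if timedelta(0) <= gap <= window:
                hits.append((last_spindown[1], err.group(1), int(gap.total_seconds())))
    return hits


SAMPLE = [
    "Sep 2 17:37:44 NASBOX emhttp[1294]: shcmd (14): /usr/sbin/hdparm -y /dev/sdg >/dev/null",
    "Sep 2 17:37:54 NASBOX kernel: [18173.780487] ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen",
]

for device, port, gap in spindowns_before_errors(SAMPLE):
    print(f"{port} exception {gap}s after spin-down of {device}")
```

To run it over a real log, pass an open file object instead of SAMPLE; the path to the syslog depends on the system.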
-
I will give both ideas a go. Obviously stripping out all the drives and checking them will take some time, so if I don't get back to the thread for a couple of days that will be why.
I appreciate the continued help.
-
Thanks for the replies. Hopefully I answer all the questions here:
department of longshots:
I seem to recall something about problems with more than two Promise controllers on one motherboard. Not a Unraid issue, but a Promise issue.
/Rene
If anyone can point me at any of these it would be appreciated. I will keep looking as well.
The posts I saw on this topic referred to it being solved by swapping PCI slots, but you did that.
I have even tried swapping cards out with new ones, as I have spares. Have I tried every mathematical combination of cards and slots? Probably not, but I'm not sure that's completely practical... that's an awful lot of reboots.
* You mention not having changed anything but ACPI, does this mean it worked before?
No, sorry, that was perhaps a red herring. All the ACPI fix did was remove a recurring syslog entry complaining about a second CPU (which doesn't exist).
The fix was applied as per a post here and the error went away.
* Did you try turning ACPI back to its original setting?
Yes, and the problem persists. As the problem is sporadic it sometimes takes a while to come back, so at first I thought the ACPI fix had done it, but as I mentioned previously it was a red herring.
* Have you tried getting a smaller number of drives working? Then add them until it is no longer stable.
The system has grown, so in essence I have done this. It wasn't until I added this latest drive (the first on the 3rd card) that the problem became quite so prevalent, although it's always been there.
* Are there drives hooked up directly to the mobo?
No, they are all on the TX4 cards. In fact the mobo BIOS has all the SATA ports turned off, as well as the secondary IDE. The primary IDE has a DVDR on it.
* Are your drives set so they work with SATA150?
That's a new one on me. What needs to be done to set a drive to SATA150?
* Did you reseat the cards and cables (power and signal)? SATA cables can come loose fairly easily.
Obviously there are a lot of cables etc. but they are relatively neat, and I believe they are all OK, having checked them several times.
* Is your PSU up to the task? For nine drives I would expect at least a good 500W supply
It's a Hyper Type R 580W power supply, so I assume yes.
* Please post your complete config - be specific (mobo, age, cpu, drives, psu, memory, other devices, brands/models, how hooked up, ...)
Hyper Type R 580W power
Asrock K7S8XE+
AMD Athlon XP 2500+
2 GeIL 512MB PC3200 Memory Sticks
3 Promise SATA150 TX4 controllers in PCI slots 1,2 and 3
DVDR (Would need to identify the brand but brand new)
Parity WDC WD5000AAKS-0
disk1 WDC WD5000YS
disk2 WDC WD5000AAKS
disk3 WDC WD5000YS
disk4 WDC WD5000YS
disk5 WDC WD5000AAKS
disk6 WDC WD5000YS
disk7 WDC WD5000AAKS
disk8 WDC WD5000YS
-
I am having a lot of problems getting my unRAID box stable. It has nine 500GB WD SATA II drives connected to three Promise SATA150 TX4 cards.
The symptoms are that sporadically I receive countless syslog errors (posted next) and then the box becomes completely unresponsive; I can't even ping it. I have everything unnecessary turned off in the BIOS, and have tried swapping PCI slots, rolling back unRAID versions, swapping NICs, turning off spin down and as many other things as I can think of.
I am prepared to post as much information as anyone would like and try basically anything necessary to get to the root of this problem, as it is exceptionally annoying and it's just a matter of time before I lose data.
Nothing has been changed in unRAID except turning off ACPI via the syslinux boot code.
Syslog samples
Sep 2 17:37:43 NASBOX emhttp[1294]: shcmd (12): /usr/sbin/hdparm -y /dev/sde >/dev/null
Sep 2 17:37:44 NASBOX emhttp[1294]: shcmd (13): /usr/sbin/hdparm -y /dev/sdf >/dev/null
Sep 2 17:37:44 NASBOX emhttp[1294]: shcmd (14): /usr/sbin/hdparm -y /dev/sdg >/dev/null
Sep 2 17:37:54 NASBOX kernel: [18173.780487] ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 2 17:37:54 NASBOX kernel: [18173.780500] ata9.00: cmd e0/00:00:00:00:00/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
Sep 2 17:37:54 NASBOX kernel: [18173.780502] res 40/00:00:00:40:00/0c:00:3a:00:00/00 Emask 0x4 (timeout)
Sep 2 17:37:55 NASBOX kernel: [18174.110115] ata9: soft resetting port
Sep 2 17:37:55 NASBOX kernel: [18174.110123] ata9: SATA link down (SStatus 614 SControl 300)
Sep 2 17:37:55 NASBOX kernel: [18174.110131] ata9: failed to recover some devices, retrying in 5 secs
Sep 2 17:47:12 NASBOX kernel: [ 234.527461] ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2
Sep 2 17:47:12 NASBOX kernel: [ 234.527469] ata9.00: (port_status 0x20200000)
Sep 2 17:47:12 NASBOX kernel: [ 234.527478] ata9.00: cmd 25/00:f8:ef:c9:0b/00:01:00:00:00/e0 tag 0 cdb 0x0 data 258048 in
Sep 2 17:47:12 NASBOX kernel: [ 234.527480] res 51/0c:97:50:cb:0b/0c:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Sep 2 17:47:12 NASBOX kernel: [ 234.850346] ata9: soft resetting port
Sep 2 17:47:12 NASBOX kernel: [ 235.010193] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 2 17:47:12 NASBOX kernel: [ 235.060666] ata9.00: configured for UDMA/33
Sep 2 17:47:12 NASBOX kernel: [ 235.060680] ata9: EH complete
Sep 2 17:47:12 NASBOX kernel: [ 235.129956] sd 9:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
Sep 2 17:47:12 NASBOX kernel: [ 235.178340] sd 9:0:0:0: [sdg] Write Protect is off
Sep 2 17:47:12 NASBOX kernel: [ 235.178347] sd 9:0:0:0: [sdg] Mode Sense: 00 3a 00 00
Sep 2 17:47:12 NASBOX kernel: [ 235.255839] sd 9:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO
Slow telnet login
in General Support (V5 and Older)
Posted
When I try to telnet to the box it works but takes about 10 seconds to give a prompt. This happens with a few different telnet clients, so the problem is on the unRAID side.
I suspect it is to do with the static IP and no real entries in resolv.conf, but before I start messing about with it, does anyone else get this?
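One way to test the resolv.conf suspicion is to time a reverse lookup of the client's IP address on a machine that uses the same DNS settings as the server. That the telnet daemon does a reverse lookup on connect is an assumption here, as are the function name and the placeholder address in the usage comment.

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (seconds_taken, hostname_or_None) for a reverse lookup of ip.

    resolver is injectable so the timing logic can be exercised without a network.
    """
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        hostname = None
    elapsed = time.monotonic() - start
    return elapsed, hostname


# Usage (192.168.1.50 is a placeholder for the telnet client's address):
#   elapsed, name = timed_reverse_lookup("192.168.1.50")
#   print(f"reverse lookup took {elapsed:.1f}s, result: {name}")
```

If the lookup takes about as long as the login delay before failing, that would point at name resolution rather than telnet itself.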