I have some experience with this interface and have found that the problem lies in the way the Escient handles TCP connections. Heavy traffic will shut this thing down, and both Escient and RTI have acknowledged the problem. I verified the problem with RTI. For light use the interface is great, but do not expect it to work under heavy loads.
Here are some excerpts from my exchanges with Escient/RTI that describe my findings on this issue.
-------------------------------------------------------------------------
Hello,
I am running an E2-100 Fireball and controlling the device using the web interface. The system is running with 10 RTI RK3 touchscreens. We are noticing that the Fireball occasionally quits responding over Ethernet. Specifically, the web interface stops working from both the RTI and the regular web interface. Sometimes it recovers after a period of time and sometimes it does not. In general, the performance of the Escient seems to deteriorate over time until it finally locks up, which is corrected with a power cycle of the Escient.
All devices are running on UPS with surge and properly cooled.
We have another web application running from the same touchscreens and it seems to work fine. We have updated to the latest firmware on all devices in question. From my extensive experience with TCP/IP, I suspect there is a problem either with the number of open connections allowed on the E2 or with getting the TCP connection closed somewhere.
I am copying both Escient and RTI as I suspect the problem could be on either end.
Question for RTI: When does the TCP connection get closed from the IE browser? Is it closed every time the IE Object page is torn down? Is there a possibility it is not being closed?
Question for Escient: Is your web server holding open the connection to the RTI? I think you are. I suspect this because you are refreshing songs on the "Now Playing" menu.
I would bet the issue is that some persistent TCP connections are not being closed, and eventually the Escient gets bogged down either trying to process a connection that no longer exists or exceeding the maximum number of open sockets. As the sockets time out and are closed, the Escient recovers. Just a guess based on how it is behaving.
NOTE: If I am exceeding the limits of these devices by running 10 touchscreens, that's fine, but I need to understand specifically what the limitations are. We are pushing this as hard as we can before turning it over to the customer. If we need to add an additional zone, that's fine, BUT if I throw more hardware at it, it MUST work. I am open to any suggestions anyone may have here.
Can I get a case opened up with both companies? I guess I am going to find out what the term "integration partner" really means. I hope this can be solved without a bunch of finger pointing.
Thanks,
John
P.S.
====================================================================
If your products are going to support TCP/IP you need to have the ability to troubleshoot problems from the device. Pass this along to your engineers.
===================================================================
All TCP/IP enabled devices need to have some basic tools for troubleshooting connections without having to have a packet sniffer. At a minimum the following tools should be available from a diagnostic menu or control panel.
1. netstat (full implementation)
2. arp
3. ping
4. ifconfig (ability to check for transmission errors)
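As a rough illustration of why a netstat-style view matters here, the sketch below tallies TCP connections by state from netstat-like text. The sample output, addresses, and the function name count_tcp_states are made up for illustration; neither the Escient nor the RTI panels are known to expose anything like this.

```python
from collections import Counter

# Hypothetical sample of `netstat -an` style output.
# The addresses, ports, and states are invented for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address      Foreign Address    State
tcp        0      0 192.168.1.50:80    192.168.1.101:1045 ESTABLISHED
tcp        0      0 192.168.1.50:80    192.168.1.101:1046 ESTABLISHED
tcp        0      0 192.168.1.50:80    192.168.1.102:2211 FIN_WAIT2
tcp        0      0 192.168.1.50:80    192.168.1.102:2212 TIME_WAIT
tcp        0      0 192.168.1.50:80    192.168.1.103:3310 TIME_WAIT
tcp        0      0 0.0.0.0:80         0.0.0.0:*          LISTEN
"""


def count_tcp_states(netstat_text):
    """Tally TCP connections by state from netstat-style text."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows start with "tcp" and carry the state in the last column;
        # the header row and blank lines are skipped.
        if len(fields) >= 6 and fields[0].lower().startswith("tcp"):
            states[fields[-1]] += 1
    return states


if __name__ == "__main__":
    for state, count in sorted(count_tcp_states(SAMPLE).items()):
        print(f"{state:12} {count}")
```

A growing count of sockets in the closing states relative to ESTABLISHED would be the kind of evidence that confirms or rules out the theory above without a packet sniffer.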
On 7/16/07, Cory Poore wrote:
John,
I consulted with engineering regarding the issue you are encountering in integrating the RTI RK3 panels and the Fireball's RTI interface. Here is the response I received from engineering.
" As per this customer's comments, there is a limit on the number of open TCP connections permitted at a time. The limit is based on the limited resources on the box, in particular memory. The limit can be exceeded when too many connections are in the FIN_WAIT state. This occurs when a connection is closed but the OS must keep the socket around for some period of time to make sure that extra packets that may arrive out of order are handled correctly.
It sounds like the configuration is exceeding the number of clients supported using the web application at a time. This number was established to avoid problems like this. I don't recall what the number was; Rob or Brian should be able to comment on that. "
Exactly how many IP-enabled devices, such as the RK3 panels, do you have accessing the Fireball at any given time? This should include both active and idle devices.
My Response back
Subject: Re: Escient/RK3 Issue
Got to pondering your response more on the way home this evening and have a few thoughts on the overall problem. First of all, I described the problem in great detail. I have better things to do than write long-winded emails, but I need this problem solved. Go back and read my description again and you should find the answers to the redundant questions you asked me. I appreciate your attempt to answer, but it was not an acceptable answer or solution to my problem. Let's try this again.
1. The technical term for the behavior you have described is a DENIAL OF SERVICE ATTACK. (Not an attack, but certainly DoS.)
2. If there is indeed a FIN-WAIT timeout issue as I suspected, then why are you asking ME about idle connections? You need to be asking RTI. Since neither the Escient nor the RTI has any type of netstat -a command to figure out what is open or closed, I could not possibly answer that question.
3. I would also argue that this is a major design flaw and that this issue should have been dealt with during the design of this interface. I observed several different URLs being issued from RK3 to Escient. This would imply that some of the interface is not a persistent TCP connection. Therefore, the very act of USING the interface itself is going to cause sockets to be opened and closed hence TCP connections in FIN-WAIT state.
4. RTI has a part in this issue as well. They do not support persistent IE Browser Objects. So when I leave the Escient screen to do something else and then come back, I have probably closed and opened a couple of sockets. Hence more TCP connections in FIN-WAIT state. I would contend I can lock up the Fireball with one RK3 just by going in and out of the interface. If this is true, then the number of panels is irrelevant to this problem.
5. The other humorous comment was the reference to memory preservation. I used to worry about this greatly back in 1986, when I wrote assembly language on the Intel 80186 processor with 32K of static RAM. The last memory sticks I bought a few months ago were less than $100 for 512MB of DDR2 (old school by today's standards). For the price of this machine, it should be loaded with memory. RTI, your so-called "Integration Partner", should not be able to shut your system down just by interfacing to it. I am a dried-up old computer guy, so please take the following in that context, as I mean no disrespect. You guys want to pat yourselves on the back because your stuff is actually interoperable with something (sort of). I would have been fired if I designed something that wasn't. We used the term "Open Platform" twenty years ago. Unfortunately, most of the AV world still does not get it, although companies like RTI and Escient are at least trying to make an effort.
6. Lastly, the FIN-WAIT timer should be set to a very small number. Most drivers default this to several minutes. This is way too long for this application. We can generally assume we are running on a LAN of at least 10 Mbps with probably 0 hops between the RK3 and the Escient. In rare cases one might access it across a VPN (I actually tested this from my office to my house and it worked fine), but for most production environments this should really not be an issue. Unless a device on the network is failing, it would be extremely rare to see out-of-order packets, so setting this number fairly low would not be unreasonable. I would take my chances with this versus having the interface lock up just by using it. Depending on your OS, you may not be able to get to this setting; if you're running Linux, it can be changed.
7. Finally, I did not receive case numbers as requested from either Escient or RTI. This is my 2nd request. I paid good money for this hardware and still am within a window to return everything (particularly the Escient). We were a very loyal Audio Request dealer and bought this device specifically because of the integration (?) with RTI. Get your engineers on the phone together and figure this out. I'll be happy to assist in any way I can.
Regards,
John
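For reference on point 6: on a Linux system, the timeout for sockets lingering in FIN-WAIT-2 is exposed as net.ipv4.tcp_fin_timeout. The sketch below only reads that value. It assumes a stock Linux /proc layout, and the helper name read_fin_timeout is made up for illustration; whether anything comparable exists on the FireBall's embedded OS is unknown.

```python
from pathlib import Path

# Standard Linux location of the FIN-WAIT-2 timeout, in seconds.
# ASSUMPTION: a stock Linux /proc layout; other systems will not have this file.
FIN_TIMEOUT_PATH = "/proc/sys/net/ipv4/tcp_fin_timeout"


def read_fin_timeout(path=FIN_TIMEOUT_PATH):
    """Return the timeout in seconds, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not a plain integer.
        return None


if __name__ == "__main__":
    value = read_fin_timeout()
    if value is None:
        print("tcp_fin_timeout is not available on this system")
    else:
        print(f"tcp_fin_timeout = {value} seconds")
```

On Linux the value can be lowered by root, for example with sysctl -w net.ipv4.tcp_fin_timeout=N. Note that this setting covers FIN-WAIT-2 only, so it may not be the timer that matters if the sockets are actually stuck in a different closing state.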
Date: Jul 17, 2007 5:02 PM
Subject: RE: Escient/RK3 Issue
John,
The FireBall application runs on an embedded OS and processor with either 32 or 64MB of RAM and no MMU. It is not a PC. We allocate memory for 100 simultaneous sockets at boot. Each RTI web interface opens and maintains 6 persistent sockets. The FireBall application also uses the socket pool for other internal and external communications. Given this, the maximum number of simultaneous RTI web connections the FireBall supports is 11. We are not able to change the FIN_WAIT parameter in our network stack. RTI closes all socket connections to the FireBall when the web application exits on the RK3 and K4 panels. These 6 sockets are not immediately returned to the FireBall socket pool because of the FIN_WAIT requirement. If the RTI user enters and exits the FireBall web interface a number of times in succession on a single panel, or if, as in your test case, you open 10 simultaneous connections on 10 different panels and then start closing and reopening them, it is possible to run the FireBall out of available sockets. However, this is only a temporary issue, as sockets are automatically returned to the pool. We understand your need to pressure test the RTI and FireBall interface, but the FireBall and RTI interface should support 10 RTI panels without issue under normal use.
Regards,
Cory Poore
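
The numbers in this reply can be turned into a back-of-the-envelope model. The pool size (100) and the sockets per panel (6) come from the email. The number of sockets reserved for internal use is NOT stated, so the value of 34 below is an assumption, chosen only because it makes the panel limit come out at the quoted 11. The model also assumes every closed socket is still lingering in FIN_WAIT, which is the worst case for rapid exit and re-entry.

```python
POOL_SIZE = 100        # from the reply: sockets allocated at boot
SOCKETS_PER_PANEL = 6  # from the reply: persistent sockets per RTI web interface
RESERVED = 34          # ASSUMPTION: internal use; picked so the panel limit is 11


def max_panels(pool=POOL_SIZE, reserved=RESERVED, per_panel=SOCKETS_PER_PANEL):
    """Largest number of panels that can hold their sockets at the same time."""
    return (pool - reserved) // per_panel


def quick_reentries(active_panels, pool=POOL_SIZE, reserved=RESERVED,
                    per_panel=SOCKETS_PER_PANEL):
    """Exit/re-enter cycles that succeed before the pool runs dry.

    Worst case: every socket closed so far is still lingering, so each
    re-entry consumes per_panel fresh sockets from the free pool.
    """
    free = pool - reserved - active_panels * per_panel
    return max(free, 0) // per_panel


if __name__ == "__main__":
    print("max simultaneous panels:", max_panels())
    for panels in (1, 10):
        print(f"{panels} panel(s) connected:",
              quick_reentries(panels), "quick re-entries before exhaustion")
```

Under these assumptions, with 10 panels connected only one quick exit-and-re-enter succeeds before the pool is empty, while a single panel gets ten. That is consistent with both claims in the thread: a single panel can exhaust the pool by going in and out fast enough, and the condition clears once the lingering sockets are returned.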