
JMeter Load Testing Luminis 4 - 503 errors (too many outstanding requests)

Submitted by jwade on Wed, 03/23/2011 - 10:07

We are finally getting around to load testing our Luminis 4 environment after having problems at the start of registration every semester. I am using JMeter 2.4. I have a full script that mimics what a user does during peak registration: it logs into the portal (downloading all the graphics, CSS files, etc.), switches to our registration tab, calls the cpip URL to single sign-on to Banner, navigates to a couple of menus in Banner, retrieves the user's current registrations, and logs out of Luminis. I have some response assertions checking for failed logins, etc. At low levels of activity, the script works well and returns the expected results. As I increase the load, it quickly starts getting 503 errors from Luminis:

Apache Tomcat/5.5.27 - Error report
HTTP Status 503 - You have too many outstanding requests - wait for them to complete and try again

It seems that Tomcat is cutting me off. Has anyone successfully load tested Luminis? Do I have to run distributed tests from many machines?

Any workarounds?


Comments

I have been investigating this error recently. I don't think the "you" in the error statement has anything to do with the person reading the error. I think it means the Tomcat server has too many open connections and will not serve any more page requests until some of them close.

One thing you can do is raise the maximum number of threads (the maxThreads attribute on the Connector elements) in ${CP_ROOT}/products/tomcat/tomcat-cp/conf/server.xml for both the HTTP and HTTPS connectors. I currently have mine set to 300. But this won't permanently fix the problem; it will merely delay it.
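Before changing anything, it can help to check what each connector is currently allowed. A minimal sketch using Python's standard xml.etree.ElementTree; the sample XML and port numbers below are assumptions, not a real Luminis file. A Connector with no maxThreads attribute is reported as None, meaning Tomcat's built-in default applies (check the docs for your Tomcat version).

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down server.xml; a real Luminis file has many more elements.
SAMPLE_SERVER_XML = """
<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="80" protocol="HTTP/1.1" maxThreads="300"/>
    <Connector port="443" protocol="HTTP/1.1" scheme="https" secure="true" maxThreads="300"/>
    <Connector port="8009" protocol="AJP/1.3"/>
  </Service>
</Server>
"""

def connector_thread_limits(xml_text):
    """Return {port: maxThreads} for every <Connector>; None if maxThreads is not set."""
    root = ET.fromstring(xml_text.strip())
    limits = {}
    for conn in root.iter("Connector"):
        max_threads = conn.get("maxThreads")
        limits[int(conn.get("port"))] = int(max_threads) if max_threads else None
    return limits

if __name__ == "__main__":
    # Pass the expanded path to ${CP_ROOT}/products/tomcat/tomcat-cp/conf/server.xml,
    # or run with no argument to use the sample above.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE_SERVER_XML
    for port, limit in sorted(connector_thread_limits(text).items()):
        print(f"port {port}: maxThreads={limit if limit is not None else 'Tomcat default'}")
```

Remember to restart Tomcat after editing server.xml so the new limits take effect.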

Would you be open to posting your JMeter script? I have been meaning to write one.

Luminis will try to prevent someone from abusing the system by serving up 503 pages if it thinks it is being DDoSed.

I believe there are two different configman settings for authenticated vs unauthenticated users. Look for:

configman -g server.policy.maxrequests.user.limit
and
configman -g server.policy.maxrequests.guest.limit

The default settings are pretty low. Try setting the user.limit to 100 and the guest.limit to 50. Does your script take into account any sleep time, i.e., waiting between clicks? You might need to slow the load generator down with some sleep times to simulate a real user.
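Why sleep time matters so much: the interactive response time law says that with N virtual users, each waiting R seconds for a response and then pausing Z seconds, throughput is X = N / (R + Z), so the average number of requests in flight is X × R = N × R / (R + Z). A minimal sketch; the numbers in the usage lines are made-up illustration values, not measurements. In JMeter itself, think time is usually added with a timer such as the Constant Timer or Uniform Random Timer.

```python
def in_flight_requests(virtual_users, response_time, think_time):
    """Average number of outstanding requests for a closed-loop load test.

    Interactive response time law: X = N / (R + Z), and requests in flight = X * R.
    """
    if response_time + think_time <= 0:
        raise ValueError("response_time + think_time must be positive")
    return virtual_users * response_time / (response_time + think_time)

if __name__ == "__main__":
    # Illustrative values only: 200 users, 0.5 s per response.
    print(in_flight_requests(200, 0.5, 0.0))  # no think time: 200.0 outstanding
    print(in_flight_requests(200, 0.5, 4.5))  # 4.5 s think time: 20.0 outstanding
```

With no think time, every virtual user always has a request outstanding, so a modest thread count can trip a throttle that real users at the same headcount never would.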

Ah, that explains it - we do not have these limits set. And since our portal is still allowing people to log in, a non-existing value must mean "unlimited" as opposed to "none".

Todd

We were having problems with 503s caused by too many simultaneous logins.

We used JMeter to test what happened as we varied the settings.

The guest limit is how many HTTP requests Luminis will handle simultaneously (per Portal web tier) for unauthenticated users.

The user limit is the same thing for authenticated session traffic.

Couple this with the maximum simultaneous logins setting (which defaults to 15) in the admin tool chest (System Usage), and at 11 a.m. you can easily go beyond the built-in throttle.
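A toy model of the throttle as described here: each web tier keeps a count of in-flight requests per category (guest vs. user) and answers 503 once that category is at its limit. This illustrates the behaviour only; it is not Luminis's actual code, and the class and method names are invented.

```python
import threading

class TierThrottle:
    """Hypothetical per-web-tier throttle with separate guest and user limits."""

    def __init__(self, user_limit, guest_limit):
        self._limits = {"user": user_limit, "guest": guest_limit}
        self._in_flight = {"user": 0, "guest": 0}
        self._lock = threading.Lock()  # requests arrive on many Tomcat threads at once

    def admit(self, kind):
        """Count the request and return 200, or return 503 if the category is full."""
        with self._lock:
            if self._in_flight[kind] >= self._limits[kind]:
                return 503  # "too many outstanding requests"
            self._in_flight[kind] += 1
            return 200

    def finish(self, kind):
        """Call when a request completes, freeing a slot in its category."""
        with self._lock:
            if self._in_flight[kind] > 0:
                self._in_flight[kind] -= 1

if __name__ == "__main__":
    tier = TierThrottle(user_limit=100, guest_limit=50)  # values suggested earlier in the thread
    print(tier.admit("guest"))
```

The model also shows why a slow step matters: if logins take longer, slots are released later, and new arrivals hit 503 even though the request rate has not changed.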

We prefer to set these too high (higher than SGHE recommends) and let our Netscaler take care of DoS attacks.

Derek
University of Leeds, UK

Thanks!!

This is what I thought, but I could not find it. I will boost the settings on our test system and see what happens. We are trying to debug performance issues in single sign-on from Luminis to Banner, and I need to get Luminis to allow me to test.

I had bumped up the maximum concurrent logins, but had not found this setting.

Thanks again!

What version are you running? Are you retrieving user attributes from Banner? We recently performed some load testing on our portal with SunGard and found that pulling attributes from Banner was causing a bottleneck in our login process. Since the process slowed down it took longer for users to be authenticated and the ones 'in line' were getting the 503 error. We are currently on 4.2.0 and SunGard support identified a fix in 4.2.1 (I think) that provides some caching for the personDirectory lookups. We're preparing to upgrade to 4.3.0 and then plan on re-testing. I was hoping to find some people using JMeter because I'd like to benchmark the process as much as I can along the way without a formal load test.
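The effect of caching attribute lookups is easy to picture: without a cache, every login pays for a round trip to Banner. A minimal sketch of a generic time-to-live cache in front of such a lookup; `fetch_banner_attributes` is a hypothetical stand-in, and this is only the general technique, not how SunGard implemented the personDirectory caching.

```python
import time

class TTLCache:
    """Remember each key's lookup result for `ttl` seconds before fetching it again.

    Not thread-safe; a real server-side cache would need locking.
    """

    def __init__(self, loader, ttl=300.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (time fetched, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the expensive lookup
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value

def fetch_banner_attributes(username):
    """Hypothetical stand-in for a slow attribute lookup against Banner."""
    time.sleep(0.05)
    return {"username": username, "roles": ["student"]}

if __name__ == "__main__":
    attrs = TTLCache(fetch_banner_attributes, ttl=300.0)
    attrs.get("jdoe")  # first login pays for the lookup
    attrs.get("jdoe")  # a repeat login within 5 minutes is served from memory
```

The trade-off is staleness: a longer TTL means fewer Banner calls but slower pickup of attribute changes.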

Tom
https://link.jwu.edu
Johnson and Wales University

I am keen to find out where the bottleneck and latency comes from within the authentication process.

I had not properly taken into account the personDirectory lookups.

How did you work out it was these lookups as opposed to the call to Luminis LDAP/Active Directory?

We have got a simple JMeter test script for spotting the increase in latency caused by too many simultaneous logins, but I cannot think of a neat way of checking the timing of each individual step.

(Actually I can think of quite an exciting way to do this... involving a rewrite of the auth code split into multiple URL calls with timing checks added so that parallel processes can be compared)
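The idea of timing each step separately can be sketched as wrapping every stage of a login in a timer and collecting the durations per step. A minimal sketch; the step names are hypothetical and the `time.sleep` calls are placeholders for whatever calls you would actually instrument (in Luminis this would mean server-side instrumentation or log timestamps, not client-side code).

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(step, timings):
    """Record the wall-clock duration of the enclosed block under `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(step, []).append(time.perf_counter() - start)

def simulated_login(timings):
    # Hypothetical stand-ins: replace each sleep with the real call being measured.
    with timed("ldap_bind", timings):
        time.sleep(0.01)
    with timed("person_directory_lookup", timings):
        time.sleep(0.02)
    with timed("session_setup", timings):
        time.sleep(0.005)

if __name__ == "__main__":
    timings = {}
    for _ in range(5):
        simulated_login(timings)
    for step, samples in timings.items():
        mean_ms = sum(samples) / len(samples) * 1000
        print(f"{step}: mean {mean_ms:.1f} ms over {len(samples)} runs")
```

Running many of these in parallel and comparing the per-step means shows which step degrades first as concurrency rises.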

Derek
University of Leeds, UK

Derek,

We don't know for certain that the attribute lookup is the issue because, as you said, we cannot easily measure the timing of each part of the authentication process separately.

We determined that it was very likely the bottleneck through a full analysis of our server resources and the traffic generated during the load test. We saw login times slow to a crawl and the load increase heavily on our portal tier while our app and database tiers remained low. We also saw a slight load increase on our Banner server at the same time, which we believe was due to the attribute lookups.

As I mentioned before, we performed this testing as an engagement with SunGard; we both reviewed the logs and didn't find any errors that would indicate a problem slowing down the authentication process. SunGard also pointed out that a newer minor version of 4.2.* (I believe it was 4.2.1) included a fix that adds caching to the personDirectory calls. They highly recommended upgrading, so we are looking to go to 4.3.0.

I still need to develop our jMeter scripts (I haven't used it yet) but I can think of a way that you could test the theory with your setup. If you have a test instance of your portal you could run the load test against it as-is with the personDirectory lookup enabled and get a benchmark. Then you can remove the attribute lookup call from the configuration so that step just doesn't happen and then run the same test again. If there is in fact a problem with the lookup in your installation I would think it should be somewhat obvious.
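The with/without comparison boils down to comparing two sets of login response times from the same test plan. A minimal sketch, assuming you export the login samples from each run as a JMeter CSV results file saved with a header row (so the `elapsed` and `label` columns can be read by name); the function names and file names are hypothetical. Comparing the median and 90th percentile, rather than only the mean, keeps a few slow outliers from dominating the result.

```python
import csv
import statistics

def load_elapsed(path, label):
    """Read the `elapsed` times (ms) for one sampler label from a CSV results file."""
    with open(path, newline="") as f:
        return [int(row["elapsed"]) for row in csv.DictReader(f) if row["label"] == label]

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; `pct` is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]

def lookup_cost(with_lookup, without_lookup):
    """How much slower logins are with the attribute lookup enabled (same units as input)."""
    return {
        "median": statistics.median(with_lookup) - statistics.median(without_lookup),
        "p90": percentile(with_lookup, 90) - percentile(without_lookup, 90),
    }

if __name__ == "__main__":
    import sys
    if len(sys.argv) == 4:
        # e.g. python compare.py with_lookup.csv without_lookup.csv Login
        print(lookup_cost(load_elapsed(sys.argv[1], sys.argv[3]),
                          load_elapsed(sys.argv[2], sys.argv[3])))
```

If the lookup is the bottleneck, the p90 difference should grow noticeably as the number of simultaneous logins rises.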

I would be very interested in talking about jMeter some time. Hopefully I can get around to it soon but my focus now is the upgrade.

Tom
https://link.jwu.edu
Johnson and Wales University

When I have the chance I will run jMeter to produce these metrics. Might even go further and build up a whole matrix of timings...

login with all personDs, login with none, with just A, with just B, with just C

against ramping up for 20 users over 5s, 40 users over 5s, up to 200 say, with a silly outlier of 1000 over 5s
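The matrix above is easy to generate, along with a rough idea of how many logins each cell puts in flight at once. JMeter spreads thread starts evenly over the ramp-up period (thread i starts at i × ramp-up / threads), so arrival rate is threads / ramp-up, and Little's law gives concurrent logins ≈ arrival rate × time per login. A minimal sketch; the config labels are placeholders, the step of 20 users is an assumption, and the 2-second login time is a hypothetical value to replace with a measured one.

```python
from itertools import product

# Placeholder labels for the person-directory setups listed above.
CONFIGS = ["all", "none", "A only", "B only", "C only"]
# Assumption: steps of 20 users from 20 to 200, plus the 1000-user outlier.
USER_COUNTS = list(range(20, 201, 20)) + [1000]
RAMP_UP_SECONDS = 5.0
DEFAULT_MAX_LOGINS = 15  # default maximum simultaneous logins mentioned in this thread

def start_offsets(threads, ramp_up):
    """Thread start times when starts are spread evenly across the ramp-up."""
    return [i * ramp_up / threads for i in range(threads)]

def est_concurrent_logins(threads, ramp_up, login_seconds):
    """Rough Little's-law estimate (arrival rate x time per login), capped at thread count."""
    return min(threads, threads / ramp_up * login_seconds)

def build_matrix(ramp_up=RAMP_UP_SECONDS):
    """One run per (config, user count) pair."""
    return [{"config": c, "threads": n, "ramp_up": ramp_up}
            for c, n in product(CONFIGS, USER_COUNTS)]

if __name__ == "__main__":
    login_seconds = 2.0  # hypothetical time per login; replace with a measured value
    for run in build_matrix():
        est = est_concurrent_logins(run["threads"], run["ramp_up"], login_seconds)
        flag = "over" if est > DEFAULT_MAX_LOGINS else "under"
        print(f'{run["config"]:>7} {run["threads"]:>5} users: ~{est:.0f} concurrent logins ({flag} {DEFAULT_MAX_LOGINS})')
```

Cells flagged "over" are the ones where the login throttle, rather than the personDirectory configuration, is likely to dominate the timings.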

Might even remember to report the matrix back here.

Derek
University of Leeds, UK