one of my channels is not working - layout manager in a pickle

We finally managed to launch our version (built by combining four variants) of the "Portal Classified Adverts" channel.

(Most credit goes to edu.iastate.ait.channels.classifieds and Saskatchewan - of course we made some local modifications too.)

After "understanding" how to manage permissions and push things through on a parallel-deployed system, we put the channel onto three layouts (student-lo, faculty-lo, employee-lo). Then we received reports that the channel was not working. Thinking this was related to permissions, we modified the code so that if the user doesn't have permission, it sends a message to the log file and grants permission. (Once everything is fixed and we need to remove permissions, this will be changed back.)

Then we discovered that several users cannot use the channel. All of these inherit the employee-lo layout. Yesterday I found an entry in cp.log that I think is causing all of my woes.

(similar to error in http://www.lumdev.net/node/856)

[2008-04-24 23:37:40,495] [ERROR] WebServlet [org.jasig.portal.ChannelManager]: u12l1n538: failed to contruct channel - error code: 2
org.jasig.portal.PortalException: Could not find an infrastructure node for id: u12l1n538
at org.jasig.portal.layout.InfrastructureUserLayoutManagerWrapper.getInfrastructureNode(InfrastructureUserLayoutManagerWrapper.java:382)
at org.jasig.portal.layout.InfrastructureUserLayoutManagerWrapper.getNode(InfrastructureUserLayoutManagerWrapper.java:169)
at org.jasig.portal.ChannelManager.instantiateChannel(ChannelManager.java:814)
at org.jasig.portal.ChannelManager.getChannelInstance(ChannelManager.java:1284)
at org.jasig.portal.ChannelManager.processRequestChannelParameters(ChannelManager.java:1145)
at org.jasig.portal.ChannelManager.startRenderingCycle(ChannelManager.java:1399)
at org.jasig.portal.UserInstance.renderState(UserInstance.java:478)
at org.jasig.portal.UserInstance.writeContent(UserInstance.java:302)
at org.jasig.portal.PortalSessionManager.doGet(PortalSessionManager.java:277)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)
at com.pipeline.uportal.CpPortalSessionManager.service(CpPortalSessionManager.java:150)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
at com.iplanet.server.http.servlet.NSServletRunner.invokeServletService(NSServletRunner.java:952)
at com.iplanet.server.http.servlet.WebApplication.service(WebApplication.java:1094)
at com.iplanet.server.http.servlet.NSServletRunner.ServiceWebApp(NSServletRunner.java:1023)

more about the error

So, the error basically says:

When the iPlanet webapp wrapper (where Luminis is the webapp) receives some parameters in the GET request, it knows to pass them on to the Luminis channel, i.e. these lines in the stack trace:

at ChannelManager.getChannelInstance(ChannelManager.java:1284)
at ChannelManager.processRequestChannelParameters(ChannelManager.java:1145)

But when it tries to pass them to the correct channel (Classified Adverts), InfrastructureUserLayoutManagerWrapper.getInfrastructureNode does not quite get to the right channel instance...

org.jasig.portal.PortalException:
Could not find an infrastructure node for id: u12l1n538

Looking in the UP database this becomes clearer

select * from up_layout_struct where user_id=12 and struct_id=538;
(we always have l1 at the moment)

this shows chan_id = 911
which is indeed Portal Classified Adverts
select * from UP_CHANNEL where CHAN_ID=911;

BUT
select * from up_user where user_id=12;
shows "student-lo", not "employee-lo"

The user that triggers the exception does not load any channels from u12, but does from u14
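
For anyone following along, here is a minimal sketch (Python, not part of Luminis) of how I am reading these node ids. It assumes the id is always of the form u<user_id>l<layout_id>n<struct_id>, and that up_layout_struct has a layout_id column alongside user_id and struct_id; the helper names parse_node_id and lookup_sql are made up for illustration.

```python
import re
from typing import NamedTuple


class NodeRef(NamedTuple):
    user_id: int
    layout_id: int
    struct_id: int


# Assumed pattern: 'u' + owning user_id, 'l' + layout_id, 'n' + struct_id.
_NODE_ID = re.compile(r"^u(\d+)l(\d+)n(\d+)$")


def parse_node_id(node_id: str) -> NodeRef:
    """Split an id such as 'u12l1n538' into its three numeric parts."""
    match = _NODE_ID.match(node_id)
    if match is None:
        raise ValueError(f"not a uXlYnZ node id: {node_id!r}")
    return NodeRef(*(int(group) for group in match.groups()))


def lookup_sql(ref: NodeRef) -> str:
    """Build the lookup query (the layout_id column is an assumption)."""
    return (
        "select * from up_layout_struct "
        f"where user_id={ref.user_id} and layout_id={ref.layout_id} "
        f"and struct_id={ref.struct_id};"
    )


ref = parse_node_id("u12l1n538")
print(lookup_sql(ref))
```

The point is only that the prefix of the id names the layout owner, so an id starting with u12 appearing for an employee-lo user is already suspicious.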

the user experience of the broken channel

I meant to say how this manifests itself to our humble users.

employee -
Sees a list of adverts and thinks that they would like to visit page 2; the Portal ignores their click.
Decides to change the setting to "50 ads per page"; the Portal ignores their selection.
Decides to click an ad to see if any part of the channel works; the Portal ignores their click.
Assumes that it is completely broken and never visits it again.

student -
The Portal responds to every click and they can advertise wonderful things.

Derek
P.S. Hopefully I will write up the experience of "hacking Classifieds", although it will not be as good as "Pimp my Portal".

Linked List problem

From your description, it sounds like a problem in the node that is *pointing* to u12l1n538, which would likely be the channel just above the classifieds on the employee-lo.

To fix this, you can try going in to the employee-lo fragment and deleting all of the channels on the tab, and re-adding them. However, you may have to use the layout resetter channel on employee-lo to blow away all the layout entries and then re-create the tab.

If you're really masochistic, you could select * from up_layout_struct where user_id = 14 and trace through the linked list by hand and fix things up manually. :)
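
If tracing by hand gets tedious, something along these lines could do the walk for you. It is only a sketch in Python: it assumes each up_layout_struct row links to its neighbours through a "next" pointer and a "child" pointer (I believe the columns are next_struct_id and chld_struct_id, but check your schema), and it works on rows you have already fetched into a dict rather than talking to the database. The function name find_layout_problems is made up.

```python
from typing import Dict, List, Optional, Tuple

# struct_id -> (next pointer, child pointer); None stands for SQL NULL.
Row = Tuple[Optional[int], Optional[int]]


def find_layout_problems(rows: Dict[int, Row], root: int) -> List[str]:
    """Walk the next/child pointers from root and report anything odd."""
    if root not in rows:
        return [f"root struct {root} is missing"]
    problems: List[str] = []
    seen = set()
    stack = [root]
    while stack:
        sid = stack.pop()
        if sid in seen:
            problems.append(f"struct {sid} reached twice (cycle or shared pointer)")
            continue
        seen.add(sid)
        for label, target in zip(("next", "child"), rows[sid]):
            if target is None:
                continue
            if target not in rows:
                problems.append(f"struct {sid}: {label} pointer to missing struct {target}")
            else:
                stack.append(target)
    # Anything never visited is not linked into the layout at all.
    for sid in sorted(set(rows) - seen):
        problems.append(f"struct {sid} is not reachable from root {root}")
    return problems


# Made-up rows: struct 2 points at a struct that does not exist, 4 is orphaned.
sample = {1: (None, 2), 2: (99, None), 4: (None, None)}
for line in find_layout_problems(sample, root=1):
    print(line)
```

A clean layout should produce an empty list; a dangling pointer or an orphaned row is the sort of thing I would expect to see if the list really is broken.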

Todd

tracing things manually

Todd,

Of course I had already looked at the print out from user_id=14

There is nothing suspect there at all (that I can see)!

Derek

even more about the error

After a weekend (and two Portal restarts), I decided to have a look from a fresh angle.

The links (href) in the HTML generated for the Channel differ:

So, we have a couple of parts
1) the channel title header with the icons (since this is maximised channel view, there are only Help and About icons)
2) the channel content - cut down to be just indicating a single link

In 1)
<TABLE id="channel" cellspacing="0" cellpadding="0" border="0" width="100%">
<TR>
<TD width="100%"><span id="section_head_txt" class="uportal-head24">Portal Classified Adverts</span></TD>
<TD nowrap="" valign="BOTTOM" align="RIGHT"><a onmouseover="window.status=''; return true;" href="tag.40153226fb343f6e.render.userLayoutRootNode.uP?uP_help_target=u14l1n364"><img border="0" height="16" width="16" src="help.gif" title="help" alt="help"></a><a href="target=u14l1n364">"about"</a></TD>
</TR>
<TR><TD colspan="2">we have a couple of spacer rows transparent.gif</TD></TR>
<script>checkParam();</script>
</TABLE>

all of the actions are targeted at u14l1n364 (the correct channel)

In 2)
<table cellspacing="0" cellpadding="0" border="0" width="100%">
<tr>
<td class="uportal-channel-text">
<form xmlns:locale="java.util.Locale"
xmlns:bundle="edu.iastate.ait.channels.JarPropertyResourceBundle"
method="post"
action="tag.40153226fb343f6e.render.userLayoutRootNode.target.u12l1n538.uP?action=list"
name="header">
</form>
</td>
</tr>
</table>

all of the actions are targeted at u12l1n538 (a stray reference to a node owned by a different layout user, student-lo)
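
A quick way to spot this mismatch in a saved copy of the page is to pull every node id out of the markup and flag the ones owned by an unexpected user. This is a rough Python sketch; the regular expression assumes ids always look like u<digits>l<digits>n<digits>, and foreign_targets is a made-up helper name.

```python
import re
from typing import List

# Assumed id shape: u<user_id>l<layout_id>n<struct_id>
NODE_ID = re.compile(r"\bu(\d+)l\d+n\d+\b")


def foreign_targets(markup: str, expected_user_id: int) -> List[str]:
    """Return the node ids in markup that belong to some other layout owner."""
    return sorted(
        {
            match.group(0)
            for match in NODE_ID.finditer(markup)
            if int(match.group(1)) != expected_user_id
        }
    )


header = 'href="tag.40153226fb343f6e.render.userLayoutRootNode.uP?uP_help_target=u14l1n364"'
body = 'action="tag.40153226fb343f6e.render.userLayoutRootNode.target.u12l1n538.uP?action=list"'

# The page was rendered for a user on the u14 layout.
print(foreign_targets(header + body, expected_user_id=14))
```

Run against the two fragments above, only the form action in the channel content is reported; the header link is fine.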

** ** **
Does this mean that the baseActionURL has been cached, or even the whole channel content (the XHTML resulting from the content XML and transformed by default.xsl)?

If so, how can I clear it?

Derek

Baseline

Have you tried putting a more "baseline" channel in place, one that hasn't had all of your local changes made to it?

Todd

ermm...

The problem with this is:

The features we have changed are crucial (particularly an enforcement of "I have read the terms and conditions")

Two other template users (student-lo, faculty-lo) are correctly giving the mainstay of our Portal community a working version of the channel.

Every fix with Luminis seems to be "wait until a problem shows up, and then take a short cut workaround." We would prefer an understanding (actually we would prefer a tool that could detect/diagnose) of why this area has gone wrong, and aim to avoid it in future.

The main problem is that we do not have another Luminis instance with the same fault - so all of the experimentation is having to be on our parallel deployed Live Portal.

(Of course, if someone can suggest how to migrate the broken layout to a different Luminis instance - then perhaps I would be laughing)

The other thing I have not done very much of yet is:
looking at the uPortal documentation and working out if the XHTML output for the channel is cached in a file/database somewhere (and the timeout on the cache is corrupt?)

Derek

how do I fix this

How do I fix this

a) to remove the immediate problem with one channel
b) to resolve any database integrity issues

My thoughts are:

Can I force a reload of the employee-lo (it must be cached in memory or some scratch space somewhere)? The up_layout_struct entry for employee-lo is correct. How do I force the reload?

Can I just forget it and rebuild the tab where the channel appears? Would this solve the problem?

What is the best way of checking where the "actual layout loaded by the layout manager" and the "layout implied by the database" differ?

Derek

what happens when the output gets cached

Hang on a minute - have I overlooked something?

When I subscribe to the channel (as a user with the problem role), the bad entry still exists.

So, the conclusion is that when the channel called its renderXML method, for some reason the baseActionURL was wrongly set, which meant that

stateParameters (like baseActionURL) + actual contentXML + XSLT = renderedXHTML fragment

does not produce an accurate renderedXHTML fragment.

Which means one of:
a) renderedXHTML is cached, and I need a method to clear the cache
b) the baseActionURL value is cached, although I would expect subscribing to a new channel instance to clear that cache
c) the XSLT is being confused - maybe it is not the file I expect, or it becomes optimised?

Does any of this sound reasonable?

My next step is to destroy and recreate a tab, but fortunately I have forgotten a password which I need to do so!

Derek

solution found (new bug found too)

The crux of the problem was that the Channel implements
ICacheable (actually it implements IMultithreadedCacheable).

If the key, or the KeyScope, is completely wrong (well, slightly
wrong), then the channel cache is used even though the
render(ed)XML contains the baseActionURL from the first
call. This baseActionURL is determined by the channel in
the student-lo layout, which means the channel works fine
for all students - but it does not work fine for anyone that
picks up the employee-lo layout.

Making a few changes to the generateKey() method sorted this
out.
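
For anyone hitting the same thing, here is a minimal sketch of the failure mode as I understand it. It is written in Python rather than Java and does not use the real uPortal interfaces; ChannelCache, render and the two key functions are invented for illustration, and the URL shape is simplified. The idea is that the cached output embeds the layout node id, so a cache key that leaves the node id out hands one layout's markup to another.

```python
from typing import Callable, Dict


def render(base_action_url: str) -> str:
    """Stand-in for the XSLT step: the output embeds the base action URL."""
    return f'<form action="{base_action_url}?action=list"></form>'


class ChannelCache:
    """Toy cache: the first render for a given key is reused for later calls."""

    def __init__(self, make_key: Callable[[str, str], str]) -> None:
        self._make_key = make_key
        self._store: Dict[str, str] = {}

    def get(self, node_id: str, view: str) -> str:
        key = self._make_key(node_id, view)
        if key not in self._store:
            # Simplified, hypothetical URL shape containing the node id.
            base = f"tag.render.userLayoutRootNode.target.{node_id}.uP"
            self._store[key] = render(base)
        return self._store[key]


def key_by_view_only(node_id: str, view: str) -> str:
    return view  # too coarse: ignores which layout node is rendering


def key_with_node(node_id: str, view: str) -> str:
    return f"{node_id}:{view}"  # one entry per layout node


broken = ChannelCache(key_by_view_only)
broken.get("u12l1n538", "list")            # a student renders first
stale = broken.get("u14l1n364", "list")    # an employee gets the student's markup

fixed = ChannelCache(key_with_node)
fixed.get("u12l1n538", "list")
fresh = fixed.get("u14l1n364", "list")
```

With the coarse key, the employee's form posts to u12l1n538, which matches the "Could not find an infrastructure node" error in cp.log; with the node id in the key, each layout gets its own markup.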

Now I just have to see whether the problem with two adverts
being posted (one from the posting user, and one from a user
posting a different ad) is also caused by bad use of
the caching interface, or by some other difficult-to-trace
factor.

cf SQL
select tb1.id, tb1.category, tb1.title, tb1.description, tb1.userid, tb1.contact, tb1.price,
tb2.id, tb2.userid, tb2.contact, tb2.price
from classifieds_ads tb1, classifieds_ads tb2
where not(tb1.id = tb2.id) and
tb2.title=tb1.title and
tb2.price=tb1.price;
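
A small sketch for trying this kind of query out away from the live database, using Python's sqlite3 with an in-memory table. The column names come from the query above; the column types and the sample rows are made up. It uses tb1.id < tb2.id instead of not(tb1.id = tb2.id), so each duplicate pair is reported once rather than twice.

```python
import sqlite3


def duplicate_pairs(conn: sqlite3.Connection) -> list:
    """Return (id, userid, id, userid) for ads sharing a title and price."""
    return conn.execute(
        """
        select tb1.id, tb1.userid, tb2.id, tb2.userid
        from classifieds_ads tb1
        join classifieds_ads tb2
          on tb2.title = tb1.title and tb2.price = tb1.price
        where tb1.id < tb2.id
        order by tb1.id, tb2.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table classifieds_ads ("
    "id integer primary key, category text, title text, "
    "description text, userid text, contact text, price text)"
)
# Made-up sample data: ads 1 and 2 are the suspicious pair.
conn.executemany(
    "insert into classifieds_ads values (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "bikes", "Road bike", "", "alice", "a@example.org", "50"),
        (2, "bikes", "Road bike", "", "bob", "b@example.org", "50"),
        (3, "books", "Atlas", "", "carol", "c@example.org", "5"),
    ],
)
pairs = duplicate_pairs(conn)
print(pairs)
```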

Derek