Discussion:
JM does not see wotaskd -- after some time
Ondřej Čada
2010-03-05 15:07:28 UTC
Hello there,

I've got a weird problem. We have a pretty plain WO installation on a 10.6 Server: installed through WOInstaller.jar, replaced the Apache adaptor with the 64-bit one from Wonder, added the launchd plist for wotaskd, updated the Apache config, yadda yadda. Installed four or so applications, most with one instance, one of them with two instances.

Thing is: for about a day all works perfectly.

Then, JavaMonitor stops seeing wotaskd ("Failed to contact localhost-1085").

At about the same time (and quite probably for the same reason) the two-instance application starts behaving a bit weirdly: sometimes I can't log in at all; sometimes one instance never gets a request and all requests are directed to the other. Even if I try the "server/cgi-bin/WO/app/1" URL, I get redirected to ".../2". Indeed, as seen in the instance logs, instance 2 does all the work (instance 1 does run though -- there are a couple of WOTimer-launched internal actions there, and they tick all right all the time).

Now for the really weird thing: wotaskd does run and is accessible. If I switch JM to the Hosts page (where one host, "localhost", is configured), _IT REPORTS AVAILABLE: YES_! (And clicking YES, I do get the configuration all right in a new window.) Yet, switching back to Applications and clicking "Detail View", I again get "Failed to contact localhost-1085". This can be repeated. I'll be damned :-O

All the logs look OK; about the only thing which seems related is that the wotaskd log contains a few entries of this kind:

[2010-03-05 15:02:58 CET] <WorkerThread10> <WOWorkerThread id=10 socket=Socket[addr=/127.0.0.1,port=50809,localport=1085]> Exception while sending response: java.net.SocketException: Broken pipe

Nevertheless, there are not many of them -- such a report definitely does NOT occur every time JavaMonitor tries to connect to wotaskd and fails; the entry appears only occasionally.

The one cure I've found so far is ugly: rebooting the server. After that, all runs perfectly -- for about a day again, and then the problems are back. Note: they seem to pop up at a fixed time of day rather than depending on uptime; this weakly hints that the problem is related to some timed task on the server which fouls something, rather than to a buffer/cache/whatever overload -- although the latter is definitely possible, too.
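
One way to test the fixed-time hunch against the wotaskd log is to count the "Broken pipe" entries per hour of day and see whether they cluster. What follows is only a minimal sketch: the timestamp format is taken from the log entry quoted above, while the name broken_pipe_hours and the one-line sample are made up for illustration.

```python
import re
from collections import Counter

# Timestamp prefix as it appears in the quoted log entry: [YYYY-MM-DD HH:MM:SS TZ]
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2} \w+\]")


def broken_pipe_hours(lines):
    """Count 'Broken pipe' log entries per hour of day (0-23)."""
    hours = Counter()
    for line in lines:
        if "Broken pipe" not in line:
            continue  # only the SocketException entries are of interest
        match = STAMP.match(line)
        if match:
            hours[int(match.group(2))] += 1
    return hours


# Illustrative input: the single entry quoted in this post.
sample = [
    "[2010-03-05 15:02:58 CET] <WorkerThread10> <WOWorkerThread id=10 "
    "socket=Socket[addr=/127.0.0.1,port=50809,localport=1085]> "
    "Exception while sending response: java.net.SocketException: Broken pipe",
]

print(sorted(broken_pipe_hours(sample).items()))  # prints [(15, 1)]
```

To run it on a real log, pass an open file object instead of sample; where the wotaskd log lives depends on the installation. If the counts pile up in one or two hours across several days, that would support the timed-task theory; an even spread would point more towards load.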

I'd be pretty glad for any hint; at this moment I do not really know what to do :(

Thanks a lot,
---
Ondra Čada
OCSoftware: ***@ocs.cz http://www.ocs.cz
private ***@ocs.cz http://www.ocs.cz/oc
