+2 votes
1.4k views

Has anyone else run into fetches freezing after the collaboration server has been running for a while? Specifically, the openLCA fetch status bar reaches the end, or close to it, and just stops. If we restart the server, openLCA errors out of the fetch and lets us continue working, and after the restart the fetch works. If we leave the server up, we need to use Task Manager to end the openLCA process. I've tried just restarting the Tomcat process after this happens and the error persists. I believe I've also tried stopping Tomcat, restarting Elasticsearch, and then starting Tomcat again, without fixing the problem.

It seems like we run into issues when Tomcat's memory usage reaches around 3 GB, although I have 7 GB allocated to it. Put another way, I've never seen memory usage go above around 3 GB.

-------

Updating (again on 10/17 to reflect what I think is the correct understanding of how this works): The behavior we're seeing for the fetches is that the openLCA client issues the GET to see the changes:

"GET /lca-collaboration/ws/public/fetch/request/netl/NETL_Starter_DB/?sync=false"

And then openLCA issues a POST to start the download:

"POST /lca-collaboration/ws/public/fetch/netl/NETL_Starter_DB/?download=false"

But the POST isn't reflected in the server log "localhost_access_log.yyyy-mm-dd.txt". This only happens when the fetch freezes. Looking at the network traffic, it seems like the server actually does respond by sending the data: there's a spike in network utilization, and the status bar takes longer on bigger databases.

On successful fetches, the POST is reflected in the server log.
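To make the comparison between the GET and the POST less manual, the access log can be scanned for repositories that have more fetch requests than fetch downloads. This is only a rough sketch: it assumes the access log uses Tomcat's default pattern (client, timestamp, quoted request line, status, size), and the function name `unmatched_fetches` is made up for illustration.

```python
import re
from collections import Counter

# Request part of a line in Tomcat's default access log pattern, e.g.
#   10.0.0.5 - - [17/Oct/2018:09:15:02 -0400] "GET /lca-collaboration/... HTTP/1.1" 200 312
REQUEST = re.compile(r'"(GET|POST) (\S+) HTTP/[\d.]+" \d{3}')
# GET .../fetch/request/<group>/<repository>/ asks which changes exist.
FETCH_REQUEST = re.compile(r"/ws/public/fetch/request/([^/?]+/[^/?]+)")
# POST .../fetch/<group>/<repository>/ starts the download.
FETCH_DOWNLOAD = re.compile(r"/ws/public/fetch/(?!request/)([^/?]+/[^/?]+)")


def unmatched_fetches(lines):
    """Return {repository: number of fetch GETs that have no logged POST}."""
    gets, posts = Counter(), Counter()
    for line in lines:
        request = REQUEST.search(line)
        if request is None:
            continue  # not an access log line in the assumed format
        method, path = request.groups()
        if method == "GET":
            pattern, counter = FETCH_REQUEST, gets
        else:
            pattern, counter = FETCH_DOWNLOAD, posts
        hit = pattern.search(path)
        if hit:
            counter[hit.group(1)] += 1
    return {repo: gets[repo] - posts[repo] for repo in gets if gets[repo] > posts[repo]}


# Example use (the path is a placeholder for the actual log file):
# with open(r"C:\apache-tomcat-8.5.31\logs\localhost_access_log.yyyy-mm-dd.txt") as log:
#     print(unmatched_fetches(log))
```

A surplus of GETs is only a hint, not proof of a frozen fetch, since a client may check for changes without going on to download them.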

11-Oct-2018 14:29:44.201 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version:        Apache Tomcat/8.5.31
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          Apr 27 2018 20:24:25 UTC
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server number:         8.5.31.0
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Windows Server 2012 R2
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version:            6.3
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture:          amd64
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home:             C:\jdk1.8.0_172\jre
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version:           1.8.0_172-b11
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:            Oracle Corporation
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE:         C:\apache-tomcat-8.5.31
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_HOME:         C:\apache-tomcat-8.5.31
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.home=C:\apache-tomcat-8.5.31
11-Oct-2018 14:29:44.279 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.base=C:\apache-tomcat-8.5.31
11-Oct-2018 14:29:44.295 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dignore.endorsed.dirs=C:\apache-tomcat-8.5.31\endorsed
11-Oct-2018 14:29:44.295 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.io.tmpdir=C:\apache-tomcat-8.5.31\temp
11-Oct-2018 14:29:44.295 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
11-Oct-2018 14:29:44.295 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.config.file=C:\apache-tomcat-8.5.31\conf\logging.properties
11-Oct-2018 14:29:44.295 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Xms7168M
11-Oct-2018 14:29:44.295 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Xmx7168M
in LCA Collaboration Server by (5.3k points)
by (8.9k points)
At the end of the fetch process, the collaboration server indexes the data set through Elasticsearch. It sounds like your Elasticsearch instance stops or hangs (it has a separate memory configuration, so maybe Elasticsearch runs out of memory). Can you check the Elasticsearch service as well?
by (8.9k points)
My bad, I was thinking of the commit process (I experienced the same issue while a commit was finishing up, not during a fetch).
by (8.9k points)
We don't experience issues with the fetch, but our server is regularly updated and sometimes restarted, maybe once every week or two. How long does your server run before you first experience this problem?
by (5.3k points)
I think in this instance it had only been a day before this happened, but we did have more traffic than usual. I guess it usually just takes a while for us to reach ~3GB memory usage on the server; it just happened to take far less time yesterday.

Elasticsearch is a bit harder to check. It allocates and uses all of its memory on startup (-Xms and -Xmx are both at 5GB). The logs look pretty benign.

It may also be worth mentioning that the collaboration server is the only thing our tomcat is running, so the memory usage is almost completely attributed to CS. What's the max memory usage you've seen out of CS?
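Since the process memory of Elasticsearch says little when -Xms and -Xmx are equal, one way to look inside is the node stats API, which reports how much of the heap is actually in use. The sketch below is only an illustration: it assumes Elasticsearch is reachable over HTTP at `http://localhost:9200` (the default port, not something confirmed for this setup), and the function names are made up.

```python
import json
from urllib.request import urlopen


def heap_usage(stats):
    """Return {node name: (heap used in percent, heap max in GB)}.

    `stats` is the parsed JSON body of GET /_nodes/stats/jvm.
    """
    usage = {}
    for node in stats["nodes"].values():
        mem = node["jvm"]["mem"]
        max_gb = round(mem["heap_max_in_bytes"] / 1024 ** 3, 2)
        usage[node["name"]] = (mem["heap_used_percent"], max_gb)
    return usage


def fetch_heap_usage(base_url="http://localhost:9200"):
    # base_url is an assumption: default Elasticsearch HTTP port on the same host.
    with urlopen(base_url + "/_nodes/stats/jvm", timeout=10) as response:
        return heap_usage(json.load(response))


# Example use while a fetch is hanging:
# print(fetch_heap_usage())
```

If the used percentage stays well below 100 while a fetch hangs, that would point away from Elasticsearch running out of heap.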
by (8.9k points)
We are running an instance with 8GB for Tomcat and 7GB for Elasticsearch, because we ran into problems committing bigger LCI databases with 4GB and 3GB (but this was when committing). Also, Tomcat takes up about 7.7GB of the memory.
by (5.3k points)
Well, dang. We're not dealing with databases anywhere near that size; in most cases they're <50MB. Are you using Windows or Linux?

I also suspect that the problem may be outside of the scope of CS and elasticsearch because we have to do a full server restart for it to start working again, but I can't even begin to figure out how I would troubleshoot that.
by (8.9k points)
We run it on Ubuntu 16.04.5 LTS. In general it is just Tomcat and Elasticsearch, since the database is embedded (Derby).

I just remembered there was an issue with Elasticsearch and the number of open file descriptors. I never ran into this problem again, but it might be worth looking into. I'm not sure, though, whether this is a Linux-specific issue.

https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
by (5.3k points)
This came up once before - it's not supposed to be a Windows issue.
by (5.3k points)
I made some changes to the Tomcat server.xml today that seemed to fix this. Specifically, adding "asyncTimeout" with a value of "300000", in place of the default "30000", is what seemed to fix the fetch issues we were having. Changing "connectionTimeout" from "20000" to "300000" also helped with some commit issues we've been running into the past couple of days. I also added "maxPostSize", "maxHeaderCount", "maxHttpHeaderSize", and "maxExtensionSize", so those are included below, but I don't think these are the fixes to our problems. Perhaps the GreenDelta team can confirm.

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="300000"
           asyncTimeout="300000"
           redirectPort="8443"
           address="0.0.0.0"
           maxPostSize="-1"
           maxHeaderCount="-1"
           maxHttpHeaderSize="32768"
           maxExtensionSize="-1"
/>
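For comparing setups between servers, the timeout attributes of each Connector can be read straight out of server.xml. This is a small sketch rather than anything official: the function name `connector_timeouts` is made up, and it only reports what is written in the file (an attribute that is not set shows up as `None`, meaning Tomcat falls back to its default).

```python
import xml.etree.ElementTree as ET


def connector_timeouts(server_xml_text):
    """List port, connectionTimeout and asyncTimeout for every <Connector>."""
    root = ET.fromstring(server_xml_text)
    return [
        {
            "port": connector.get("port"),
            "connectionTimeout": connector.get("connectionTimeout"),
            "asyncTimeout": connector.get("asyncTimeout"),  # None if not set
        }
        for connector in root.iter("Connector")
    ]


# Example use (path assumed from the CATALINA_BASE shown in the startup log):
# with open(r"C:\apache-tomcat-8.5.31\conf\server.xml", encoding="utf-8") as f:
#     for entry in connector_timeouts(f.read()):
#         print(entry)
```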
by (8.9k points)
How long did your longest fetch/commit run approx.? We had successful commits of 20-30 minutes without changing any of these values:

 <Connector port="80" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="443"
               compression="on" compressionMinSize="2048"
               noCompressionUserAgents="gozilla, traviata"
               compressableMimeType="text/html,text/xml,text/plain,text/css,
                 text/javascript,text/json,application/x-javascript,
                 application/javascript,application/json"/>
     <Connector port="443" SSLEnabled="true"
               maxThreads="200" scheme="https" secure="true"
               keystoreFile="/path/to/keystore.jks" keystorePass="xxxxxxxxxxxx"
               clientAuth="false" sslProtocol="TLS"
               compression="on" compressionMinSize="2048"
               noCompressionUserAgents="gozilla, traviata"
               compressableMimeType="text/html,text/xml,text/plain,text/css,
                 text/javascript,text/json,application/x-javascript,
                 application/javascript,application/json"/>
by (5.3k points)
They weren't taking that long - for the most part under a couple minutes. I think the longest has been about 10 mins. I had to restart the server this morning. The compression parts of your server are interesting. I'll try plugging those in as well.

I think I'll add some more details to the original description about what's happening.
by (5.3k points)
Updated the original post to provide some more details on at least the symptoms of what's going on. Today I completely opened up port 8080, rather than relying on just the application-specific firewall rule in Windows for tomcat.exe.
