The Java application (running on a Windows server) receives requests embedded in XML and translates them into SQL that is executed on IBM iSeries machines. It is a multi-threaded application that usually processes these requests in high volumes. It works fine in all environments; however, we suddenly started seeing the following pattern, with an error that causes the process to break:
- Receive request.
- Translate it to SQL.
- Open a connection to the DB, or reuse an existing one if available.
- Execute the SQL and get the results.
- Commit the transaction.
- Close the connection.
- Sometimes, randomly, we see this error:
java.sql.SQLNonTransientConnectionException: The connection does not exist
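The steps above map onto a per-request method in which each request borrows its own connection and gives it back when done. This is a minimal sketch, assuming the connections come from a `javax.sql.DataSource` pool; `RequestFlowSketch`, `process`, and the proxy-based `fakePool` test double are hypothetical names standing in for the real code and the IBM i database, and rollback handling is left out to keep the flow visible.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import javax.sql.DataSource;

final class RequestFlowSketch {

    /** One request: borrow a connection, run the SQL, commit, and give the connection back. */
    static List<String> process(DataSource pool, String sql) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Connection con = pool.getConnection()) { // referenced by this request only
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(sql);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getString(1));
                }
            }
            con.commit();
        } // close() returns the connection to the pool
        return rows;
    }

    // Hypothetical test double: a fake pool that records Connection calls instead of
    // talking to IBM i, and serves a two-row result set.
    @SuppressWarnings("unchecked")
    static <T> T stub(Class<T> type, InvocationHandler handler) {
        return (T) Proxy.newProxyInstance(
                RequestFlowSketch.class.getClassLoader(), new Class<?>[] {type}, handler);
    }

    static DataSource fakePool(List<String> calls) {
        AtomicInteger row = new AtomicInteger();
        ResultSet rs = stub(ResultSet.class, (p, m, a) -> {
            switch (m.getName()) {
                case "next": return row.incrementAndGet() <= 2;
                case "getString": return "row" + row.get();
                default: return null; // close()
            }
        });
        PreparedStatement ps = stub(PreparedStatement.class,
                (p, m, a) -> m.getName().equals("executeQuery") ? rs : null);
        Connection con = stub(Connection.class, (p, m, a) -> {
            calls.add(m.getName());
            return m.getName().equals("prepareStatement") ? ps : null;
        });
        return stub(DataSource.class, (p, m, a) -> con); // getConnection() -> con
    }

    public static void main(String[] args) throws SQLException {
        List<String> calls = new ArrayList<>();
        System.out.println(process(fakePool(calls), "SELECT 1 FROM SYSIBM.SYSDUMMY1"));
        System.out.println(calls);
    }
}
```

Because the `Connection` never leaves `process`, no other request-handling thread holds a reference to it; code that caches a connection in a field shared across threads would not have this property.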
The logs show the above pattern almost consistently. When I researched the error, most sources suggest adding a validationQuery to the connection pool definition or the connection string. The purpose of this question is to help with troubleshooting, since finding the root cause could take some time. I also need a reasonable explanation of what might be causing this error, for example the following:
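Adding a `validationQuery` usually goes together with telling the pool when to run it. A sketch of such settings, assuming the pool is the Tomcat JDBC pool (an assumption: names such as `validationInterval` and `maxAge` are attributes of that pool); the values are placeholders, not recommendations. `SELECT 1 FROM SYSIBM.SYSDUMMY1` is a common cheap validation query for DB2, including DB2 for i. `PoolValidationConfig` is a hypothetical name.

```java
import java.util.Properties;

final class PoolValidationConfig {

    /** Hypothetical validation settings, keyed by Tomcat JDBC pool attribute names. */
    static Properties validationSettings() {
        Properties p = new Properties();
        p.setProperty("validationQuery", "SELECT 1 FROM SYSIBM.SYSDUMMY1"); // cheap DB2 for i query
        p.setProperty("validationQueryTimeout", "5");             // seconds
        p.setProperty("testOnBorrow", "true");                    // validate before handing out
        p.setProperty("validationInterval", "30000");             // ms between validations of a connection
        p.setProperty("testWhileIdle", "true");                   // evictor also validates idle connections
        p.setProperty("timeBetweenEvictionRunsMillis", "60000");  // ms between evictor runs
        p.setProperty("minEvictableIdleTimeMillis", "300000");    // ms idle before eviction
        p.setProperty("maxAge", "1800000");                       // ms before a connection is replaced
        return p;
    }

    public static void main(String[] args) {
        validationSettings().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Validation runs only when a connection is borrowed, returned, or idle; it confirms the connection is alive at that moment and would not prevent a connection from being dropped while a query is running.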
Multiple threads run in parallel, and the code closes the connection, so it could be that, due to a timing issue, the JDBC driver tries to use a connection that was just closed by another thread and removed from the pool.
Consider this scenario: threads T1, T2, T3, ..., Tn all use the connection, and they all succeed except for a few attempts somewhere between 1 and n. Thread Ty, for example, failed to close the connection. Perhaps when Ty started, the DB connection was reused, and the SQL executed successfully.
However, by the time Ty tried to close the connection, it no longer existed. Maybe it had been removed after the previous thread, Tx, finished processing its SQL and closed/removed the connection, because of the way the pool was configured with some valid combination of the following parameters: validationQuery, minEvictableIdleTimeMillis, validationQueryTimeout, validationInterval, maxAge, testOnBorrow, testOnReturn, testWhileIdle.
Then, thread Tz succeeds in processing the SQL because the process starts from the beginning by opening a new connection.
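The hypothesis above can be checked in isolation. A minimal sketch, assuming the failure comes from two threads sharing one `Connection` object: thread Tx closes it while thread Ty still holds it, and Ty's next call fails. The `fakeConnection` stub is hypothetical; it only imitates a driver that rejects calls on a closed connection, and its message is copied from the error above.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLNonTransientConnectionException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

final class SharedConnectionRace {

    /** Hypothetical stub: any call after close() fails, like a dropped connection. */
    static Connection fakeConnection() {
        AtomicBoolean closed = new AtomicBoolean();
        return (Connection) Proxy.newProxyInstance(
                SharedConnectionRace.class.getClassLoader(), new Class<?>[] {Connection.class},
                (p, m, a) -> {
                    if (m.getName().equals("close")) { closed.set(true); return null; }
                    if (m.getName().equals("isClosed")) return closed.get();
                    if (closed.get()) {
                        throw new SQLNonTransientConnectionException("The connection does not exist.");
                    }
                    return null; // commit(), rollback(), ... succeed silently while open
                });
    }

    /** Tx closes a connection Ty is still using; returns the error Ty sees (or null). */
    static Throwable runRace() throws InterruptedException {
        Connection shared = fakeConnection();
        CountDownLatch txClosed = new CountDownLatch(1);
        AtomicReference<Throwable> tyError = new AtomicReference<>();

        Thread tx = new Thread(() -> {
            try { shared.close(); } catch (Exception ignored) { }
            txClosed.countDown();
        });
        Thread ty = new Thread(() -> {
            try {
                txClosed.await(); // force the unlucky interleaving
                shared.commit();
            } catch (Exception e) {
                tyError.set(e);
            }
        });
        tx.start();
        ty.start();
        tx.join();
        ty.join();
        return tyError.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runRace());
    }
}
```

A pool normally hands each borrower its own connection, so this interleaving requires a shared reference, for example a connection cached in a field or in a pooled processor object reused across requests; that is one thing to look for in the code.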
I would appreciate your feedback on my explanation above, and any additional information that would help troubleshoot this issue.
Update 1:
I am updating this question because I still have not found a solution. I need help with the following points, which I am researching:
How can I enable debugging to see more details? Debugging needs to be enabled both on the Windows machines where the Java client runs and on the IBM i machine where the SQL database is hosted.
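On the Java side, two knobs are worth trying. A minimal sketch, assuming the IBM Toolbox for Java (jt400) driver with `jdbc:as400://` URLs: its `trace=true` connection property enables the driver's JDBC trace, and `errors=full` asks for more detailed IBM i error text. `DriverManager.setLogWriter` is the JDK's generic JDBC log sink, which a driver may or may not write to. `JdbcDiagnostics`, `withDiagnostics`, `MYIBMI`, and `MYLIB` are hypothetical; check the Toolbox JDBC properties documentation for the driver version in use.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.sql.DriverManager;

final class JdbcDiagnostics {

    /** Hypothetical helper: append jt400 diagnostic properties to an existing jdbc:as400 URL. */
    static String withDiagnostics(String baseUrl) {
        String base = baseUrl.endsWith(";") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        // trace=true: driver-side JDBC trace; errors=full: more detailed IBM i error text
        return base + ";trace=true;errors=full";
    }

    public static void main(String[] args) {
        // Route DriverManager-level JDBC logging somewhere visible (here an in-memory buffer).
        StringWriter buffer = new StringWriter();
        DriverManager.setLogWriter(new PrintWriter(buffer, true));
        DriverManager.println("using " + withDiagnostics("jdbc:as400://MYIBMI;libraries=MYLIB"));
        System.out.print(buffer);
    }
}
```

In production the buffer would more likely be a file-backed `PrintWriter`; driver traces are verbose, so enabling them for a limited window around the failures is probably more practical.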
How can I dump all the effective SQL connection parameters used after running the process? I am thinking of dumping the values on both the IBM i side and the Java side.
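For the Java side, a connection can report much of its effective state through `Connection` and `DatabaseMetaData`. A minimal sketch, assuming it is called on a connection borrowed from the same pool the failing code uses; `ConnectionDump` and the proxy-based `fakeConnection` are hypothetical, and pool-level settings (timeouts, validation) would have to be read from the pool's own configuration instead. The IBM i side is not covered here.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

final class ConnectionDump {

    /** Collect the settings a live Connection reports about itself (Java side only). */
    static Map<String, String> describe(Connection con) throws SQLException {
        DatabaseMetaData md = con.getMetaData();
        Map<String, String> info = new LinkedHashMap<>();
        info.put("url", md.getURL());
        info.put("user", md.getUserName());
        info.put("driver", md.getDriverName() + " " + md.getDriverVersion());
        info.put("database", md.getDatabaseProductName() + " " + md.getDatabaseProductVersion());
        info.put("autoCommit", String.valueOf(con.getAutoCommit()));
        info.put("isolation", String.valueOf(con.getTransactionIsolation()));
        return info;
    }

    // Hypothetical test double standing in for a real IBM i connection.
    static Connection fakeConnection() {
        ClassLoader cl = ConnectionDump.class.getClassLoader();
        DatabaseMetaData md = (DatabaseMetaData) Proxy.newProxyInstance(
                cl, new Class<?>[] {DatabaseMetaData.class},
                (p, m, a) -> m.getName().equals("getURL") ? "jdbc:as400://MYIBMI" : "stub");
        return (Connection) Proxy.newProxyInstance(cl, new Class<?>[] {Connection.class},
                (p, m, a) -> {
                    switch (m.getName()) {
                        case "getMetaData": return md;
                        case "getAutoCommit": return false;
                        case "getTransactionIsolation": return Connection.TRANSACTION_READ_COMMITTED;
                        default: return null;
                    }
                });
    }

    public static void main(String[] args) throws SQLException {
        describe(fakeConnection()).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```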
I am thinking of catching this exception and ending the process safely. Are there any issues with this approach?
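Catching the exception and ending the request seems reasonable as long as the original error is logged, the broken connection is not handed back to the pool as healthy, and the caller is told the request failed. A small sketch for recognizing connection-loss errors, assuming the driver reports them either with the JDBC 4 connection exception subclasses or with SQLState class `08` (connection exception); `FailureTriage` and its methods are hypothetical.

```java
import java.sql.SQLException;
import java.sql.SQLNonTransientConnectionException;
import java.sql.SQLTransientConnectionException;

final class FailureTriage {

    /** True if this error, or any chained via getNextException(), means the connection is gone. */
    static boolean isConnectionLost(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            String state = cur.getSQLState();
            if (cur instanceof SQLNonTransientConnectionException
                    || cur instanceof SQLTransientConnectionException
                    || (state != null && state.startsWith("08"))) { // SQLState class 08
                return true;
            }
        }
        return false;
    }

    /** Message for the caller; a lost connection should be discarded, not reused. */
    static String failureMessage(SQLException e) {
        return isConnectionLost(e)
                ? "Request failed: database connection lost (SQLState " + e.getSQLState() + ")"
                : "Request failed: " + e.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(failureMessage(
                new SQLNonTransientConnectionException("The connection does not exist.", "08003")));
    }
}
```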
I analyzed the logs for about 10 transactions in detail and found that the exception occurs when the duration between the start and the error is 19000 ms. In normal cases, when there is no error, the duration from start to commit is less than 40 ms.
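To see which step consumes the 19000 ms, each phase can be timed separately. A minimal sketch, assuming the phases can be wrapped as `Callable`s; `PhaseTimer` and the phase names are hypothetical, and the `Consumer<String>` stands in for the application's real logger.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class PhaseTimer {

    /** Run one phase (e.g. "executeQuery"), log its duration, and rethrow any failure. */
    static <T> T timed(String phase, Callable<T> work, Consumer<String> log) throws Exception {
        long start = System.nanoTime();
        try {
            T result = work.call();
            log.accept(phase + " ok in " + elapsedMs(start) + " ms");
            return result;
        } catch (Exception e) {
            log.accept(phase + " FAILED after " + elapsedMs(start) + " ms: " + e);
            throw e;
        }
    }

    private static long elapsedMs(long startNanos) {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) throws Exception {
        timed("executeQuery", () -> { Thread.sleep(25); return "rows"; }, System.out::println);
    }
}
```

Wrapping `getConnection`, `executeQuery`, `commit`, and `close` separately would show whether the time goes to waiting for a connection, to the query itself, or to cleanup. A consistent duration such as 19000 ms often points at a configured timeout somewhere (JDBC query timeout, socket or network timeout, or a limit on the IBM i side) rather than at query cost.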
The following is the stack trace:
java.sql.SQLNonTransientConnectionException: The connection does not exist.
    at com...SQLTransactionProcessor.rollback(***SQLTransactionProcessor.java:642)
    at com...***RequestProcessorImpl$TransactionProcessorTracker.releasePooledProcessors(***RequestProcessorImpl.java:812)
    at com...***RequestProcessorImpl.processRequest(***RequestProcessorImpl.java:225)
    at com...***ServerImpl.submit(***ServerImpl.java:203)
    at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$2.run(Unknown Source)
    at sun.rmi.transport.Transport$2.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.access$400(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Update 2:
After further analysis, we believe that the SQL execution exceeds a configured threshold, the process is killed, and the connection is removed.
I compared the Java code with the logs, and it is clear that the error comes from this statement:
rsResultSet = stmtPreparedStatement.executeQuery();
This statement fails, but the catch block does not log the error that occurred. The code then proceeds to close the connection, and that is when we see the error mentioned above.
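Here the real failure from `executeQuery()` is lost, and only the later connection error surfaces. A minimal sketch of a catch block that logs the original exception first, makes the Java-side time limit explicit with `setQueryTimeout`, and attaches any rollback failure as a suppressed exception so it cannot replace the original. `QueryWithDiagnostics`, `countRows`, and the `failingConnection` stub (which simulates a cancelled query with SQLState `57014` and a dead connection on rollback) are hypothetical.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLNonTransientConnectionException;
import java.util.function.Consumer;

final class QueryWithDiagnostics {

    /** Run the query; log the real failure before any cleanup can replace it. */
    static int countRows(Connection con, String sql, int timeoutSeconds, Consumer<String> log)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setQueryTimeout(timeoutSeconds); // explicit limit instead of an implicit one
            int n = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) n++;
            }
            con.commit();
            return n;
        } catch (SQLException e) {
            log.accept("executeQuery failed: SQLState=" + e.getSQLState()
                    + " errorCode=" + e.getErrorCode() + " " + e);
            try {
                con.rollback();
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure); // the original stays the primary error
            }
            throw e;
        }
    }

    // Hypothetical test double: the query is cancelled, then the connection is gone.
    static Connection failingConnection() {
        ClassLoader cl = QueryWithDiagnostics.class.getClassLoader();
        PreparedStatement ps = (PreparedStatement) Proxy.newProxyInstance(
                cl, new Class<?>[] {PreparedStatement.class}, (p, m, a) -> {
                    if (m.getName().equals("executeQuery")) {
                        throw new SQLException("query cancelled (simulated)", "57014");
                    }
                    return null; // setQueryTimeout(), close()
                });
        return (Connection) Proxy.newProxyInstance(cl, new Class<?>[] {Connection.class},
                (p, m, a) -> {
                    switch (m.getName()) {
                        case "prepareStatement": return ps;
                        case "rollback": throw new SQLNonTransientConnectionException(
                                "The connection does not exist.", "08003");
                        default: return null;
                    }
                });
    }

    public static void main(String[] args) {
        try {
            countRows(failingConnection(), "SELECT * FROM MYLIB.MYTABLE", 30, System.err::println);
        } catch (SQLException e) {
            System.err.println("primary: " + e + ", suppressed: " + e.getSuppressed().length);
        }
    }
}
```

In the real code, logging `getSQLState()` and `getErrorCode()` in the catch block that currently swallows the exception should indicate whether the query was cancelled by a timeout or lost its connection.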
Successful transactions usually complete in less than 1 second. Failed transactions end with the above error only after more than 5 seconds, and in some cases after 19 seconds.
I am thinking of turning on a higher level of debugging to see why the SQL execution is being killed and the connection removed. How can we check what is going on? Do we need to do anything on the IBM i SQL side? Is there a way to turn on debugging on IBM i SQL, or in the JVM where the connection is created? How?