I have been looking into this issue for a few days now and I am at a loss.
We have deployed an application to a GlassFish 3.1.2.2 (build 5) cluster, which uses JAX-WS to publish a SOAP endpoint. Most of the traffic involves uploading files via the SOAP interface.
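For context, the upload path looks roughly like the sketch below. This is only a hypothetical illustration (class, method and path names are mine, not the actual application code), but it shows the kind of code that writes uploaded data to disk:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import javax.activation.DataHandler;
import javax.jws.WebService;
import javax.xml.ws.soap.MTOM;

// Hypothetical sketch, not the real service: a JAX-WS endpoint that
// streams an uploaded attachment to a file on disk.
@MTOM
@WebService
public class FileUploadService {

    public void upload(String fileName, DataHandler content) throws IOException {
        // "/tmp/uploads" is an assumed target directory, for illustration only.
        File target = new File("/tmp/uploads", fileName);
        OutputStream out = new FileOutputStream(target);
        try {
            // DataHandler.writeTo copies the attachment data to the stream.
            content.writeTo(out);
        } finally {
            out.close();
        }
    }
}
```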
After a few weeks, all instances of the cluster crashed with
java.io.FileNotFoundException: [filename] (Too many open files)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(...)
Running lsof against the GlassFish PID shows a lot of entries of these two types:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 24502 user 749w FIFO 0,8 0t0 12559690 pipe
java 24502 user 750u 0000 0,9 0 1972 anon_inode
After restarting GlassFish, about 1200 file handles are open.
After a few days, this increases to about 4000: roughly 2000 of them are pipes and roughly 1000 are anon_inodes. The count only seems to grow slowly, even though a lot of requests are made.
ulimit -Hn shows a hard limit of 8192 open files.
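To correlate the growth with application activity, I have also been logging the descriptor count from inside the JVM. A minimal sketch of that, assuming a Sun/Oracle JVM on Unix (the FdMonitor class name is mine):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Logs the JVM's current and maximum file descriptor counts so the growth
// seen in lsof can be tracked over time alongside request volume.
public class FdMonitor {
    public static void logOpenFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The cast only succeeds on a Sun/Oracle JVM running on Unix.
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                    + " / max: " + unix.getMaxFileDescriptorCount());
        }
    }
}
```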
I have been trying to reproduce this issue with a local installation, but haven't been able to: everything runs fine and no file handles accumulate. It looks like the handles only leak every now and then, and I haven't been able to find any pattern so far.
I have been taking heap dumps from the VM and analyzing them with VisualVM. I have noticed that there are a lot of stream objects that no longer have any references.
For instance, there are about 2000 DataOutputStream instances and about 2000 java.io.File instances. I am wondering why these don't seem to be cleaned up by the GC; the GC log shows a full GC every few minutes.
There are about 300 instances of FileOutputStream, but about 700 of java.io.FileDescriptor, which seems a little high. Again, a lot of them don't have any references.
We have reviewed the code but haven't found any critical issues, and FindBugs does not report any unclosed streams either.
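In particular, the review was looking for the classic pattern below (method names and paths are made up; the point is the structural difference FindBugs is supposed to flag), and we only found the safe variant:

```java
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class StreamHandling {

    // Leaky variant: if write() throws, the FileOutputStream and its
    // underlying file descriptor are never closed.
    static void writeLeaky(byte[] data) throws IOException {
        DataOutputStream out = new DataOutputStream(new FileOutputStream("/tmp/example.bin"));
        out.write(data);
        out.close();
    }

    // Safe variant: the descriptor is released even when write() fails.
    static void writeSafely(byte[] data) throws IOException {
        DataOutputStream out = new DataOutputStream(new FileOutputStream("/tmp/example.bin"));
        try {
            out.write(data);
        } finally {
            out.close();
        }
    }
}
```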
Does anyone have any idea how I could further narrow down the cause of this issue, or what could be happening here?