Troubleshooting Out of Memory (OOM) issues in Oracle Fusion Middleware applications requires planning and patience; it can feel like looking for a needle in a haystack. The approach should be to monitor, gather data, and then analyze that data to determine the root cause. OOM errors can be caused by the code deployed on the middleware infrastructure, or by an underlying issue with the Fusion Middleware infrastructure itself (framework or cache issues). This post outlines an approach to understanding the problem and identifying the root cause of OOM issues, along with commands that assist in gathering and analyzing the relevant data.
There are different types of Out of Memory issues. The following are some of the most common types:
- java.lang.OutOfMemoryError: requested XXXX bytes for Chunk::new. Out of swap space?
- java.lang.OutOfMemoryError: unable to create new native thread
- java.lang.OutOfMemoryError: PermGen space
- java.lang.OutOfMemoryError: allocLargeObjectOrArray
- java.lang.OutOfMemoryError: getNewTla
- java.lang.OutOfMemoryError: Java Heap Space
- java.lang.StackOverflowError
1. Determine whether your managed servers are reporting OOM:
Go to the managed servers' location and execute a find command to locate files in which OOM exceptions have been logged:
cd /Oracle/Middleware/user_projects/domains/mydomain/servers
find ./ -type f -exec grep -l java.lang.OutOfMemoryError {} \;
An entry from a .log or .out file is an indication that the corresponding managed server is reporting OOM. From the output, identify which managed server is reporting OOM exceptions. For example, from the following output you can determine that WLS_Spaces1 is reporting OOM:
./WLS_Spaces1/adr/diag/ofm/webcenter_domain/WLS_Spaces1/alert/log.xml
./WLS_Spaces1/logs/WLS_Spaces1.out00010
Go to the log location of that managed server and execute find again, this time printing the matching lines:
cd WLS_Spaces1/logs
find ./ -type f -exec grep java.lang.OutOfMemoryError {} \;
Note: Your log location may be different based on your settings
From the output, look at the text that immediately follows java.lang.OutOfMemoryError; it tells you which type of OOM the server is hitting. Pipe the output to more for convenience if required.
For example, the following output suggests that this managed server is experiencing allocLargeObjectOrArray and getNewTla OOM exceptions:
Exception in thread "Thread-665" java.lang.OutOfMemoryError: allocLargeObjectOrArray: [B, size 8208
Exception in thread "Thread-664" java.lang.OutOfMemoryError: allocLargeObjectOrArray: [B, size 8208
Exception in thread "WsMgmtWorkScheduler" java.lang.OutOfMemoryError: getNewTla
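If OOM errors have been occurring for a while, it can also help to count how often each type appears. The following is a minimal sketch, assuming the log file names shown earlier and run from the same logs directory:

# Strip each line down to the OOM type, then count occurrences of each type
grep -h "java.lang.OutOfMemoryError" WLS_Spaces1.out* \
  | sed 's/.*java.lang.OutOfMemoryError: //; s/:.*//' \
  | sort | uniq -c | sort -rn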
2. Collect Thread and Heap Dumps
Thread Dumps:
A thread dump is a snapshot of the state of all the threads in the process. To collect thread dumps, execute the following command every 5 seconds, 8 to 10 times:
>kill -3 <PID>
Dumps will be written to the <Server_Name>.out file in the defined log directory (Ex: /Oracle/Middleware/user_projects/domains/mydomain/servers/myserver/logs).
OR
>. ./setDomainEnv.sh
>jrcmd <pid> print_threads >tdump.txt
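Rather than issuing kill -3 by hand every few seconds, you can script the collection. A minimal sketch, assuming the managed server PID is 29381 (replace it with your own):

PID=29381                  # managed server Java process id (example value)
for i in $(seq 1 10); do   # take 10 dumps...
  kill -3 $PID             # ...each QUIT signal appends a thread dump to the server's .out file
  sleep 5                  # ...5 seconds apart
done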
Heap Dumps:
A heap dump is a snapshot of the memory of a Java process at a certain point in time. To collect a heap dump, execute the following command once:
>/<Jrockit_home>/bin/jrcmd <PID> hprofdump filename=heapdump.hprof
The dump file will be written to the domain home (Ex: /Oracle/Middleware/user_projects/domains/mydomain/).
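Note that jrcmd and its hprofdump command are JRockit utilities. If your managed servers run Sun (HotSpot) Java instead, a roughly equivalent command (a sketch; adjust the file name and PID) is:
>jmap -dump:format=b,file=heapdump.hprof <PID>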
GC Logs:
Enable GC tracing in the application server by including the following JRockit options as Java startup parameters:
-Xverbose:gc -Xverbosetimestamp -Xverboselog:/<dir_name>/<file_name>
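If you are running Sun (HotSpot) Java rather than JRockit, a roughly equivalent set of startup parameters (a sketch; exact flags vary by JDK version) is:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/<dir_name>/<file_name>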
3. Analyze dumps
Analyzing Thread Dumps:
When your managed server is experiencing slow response times or showing a health warning in the WebLogic admin console, run top to determine which process is consuming the most resources.
On the host that is running the managed server, run top:
top
top - 16:26:55 up 584 days, 5:57, 4 users, load average: 4.96, 5.08, 5.10
Tasks: 167 total, 2 running, 165 sleeping, 0 stopped, 0 zombie
Cpu(s): 86.0%us, 0.5%sy, 13.5%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16436932k total, 15797128k used, 639804k free, 166504k buffers
Swap: 12289684k total, 1662108k used, 10627576k free, 1229076k cached

  PID USER    PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
29381 oracle  15  0 4249m 3.0g  32m S 171.8 19.0 190:12.57 java
21162 oracle  18  0 3499m 2.2g 4240 S  10.0 14.0 123:06.66 java
21139 oracle  18  0 3450m 2.1g 7984 S   9.3 13.6 109:11.02 java
21365 oracle  20  0 3196m 1.7g  16m S   7.0 10.9  80:56.99 java
Note the PID of the process that is consuming high resources; in the above top output, PID 29381 is our guy. Type q to quit top.
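If you want to confirm which managed server a PID belongs to, you can inspect the process arguments; the -Dweblogic.Name flag identifies the server. A quick sketch, using the example PID above:

ps -p 29381 -o args= | tr ' ' '\n' | grep weblogic.Name   # prints -Dweblogic.Name=<server_name>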
The JVM on Linux implements Java threads as native threads, so each Java thread shows up as a separate lightweight process. To determine which thread is consuming the most resources, run top -H -b -p <PID>:
top -H -b -p 29381 >tophbp.txt
CTRL+C
vi tophbp.txt

top - 16:25:21 up 584 days, 5:55, 4 users, load average: 4.96, 5.17, 5.14
Tasks: 129 total, 3 running, 126 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.5%us, 0.4%sy, 9.8%ni, 86.1%id, 0.9%wa, 0.1%hi, 0.2%si, 0.0%st
Mem: 16436932k total, 15794704k used, 642228k free, 165840k buffers
Swap: 12289684k total, 1662108k used, 10627576k free, 1228912k cached

  PID USER    PR NI  VIRT  RES SHR S %CPU %MEM    TIME+ COMMAND
29876 oracle  25  0 4248m 3.0g 32m R 77.0 19.0 61:41.25 [STANDBY] Execu
29389 oracle  16  0 4248m 3.0g 32m R 55.3 19.0 24:15.73 (Code Optimizat
29449 oracle  18  0 4248m 3.0g 32m R 43.4 19.0 61:51.18 [STANDBY] Execu
29381 oracle  15  0 4248m 3.0g 32m S  0.0 19.0  0:00.00 java
Note the thread process id that is consuming high resources. In the above top output, PID 29876 is the one of interest.
Export the domain variables and collect a thread dump of the managed server's Java process (use the jcmd command if you are using Sun Java):
cd /Oracle/Middleware/user_projects/domains/mydomain/bin
. ./setDomainEnv.sh
cd ~/
jrcmd 29381 print_threads >tdump.txt
Examine the thread dump file and look for that thread process id:
vi tdump.txt

"[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" id=134 idx=0x200 tid=29876 prio=1 alive, native_blocked, daemon
    at java/util/HashMap.buildCache(HashMap.java:589)
    at java/util/HashMap.resize(HashMap.java:576)
Note: If you are using Sun Java, you need to convert the thread pid to hex using the following command; if you are using JRockit, this is not required:
printf "%x \n" <thread_pid>
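As an illustration, assume the dump had been taken with Sun Java and take the high-CPU thread id from the earlier top output (29876). HotSpot thread dumps record the native thread id in hex as nid, so you would convert the id and then search for it:

printf "%x \n" 29876           # prints 74b4
grep "nid=0x74b4" tdump.txt    # locate the matching thread entry in the dump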
In the above example, it was HashMap.buildCache that was causing issues. If faulty code is leaking memory, you now have a clue to where it lives: open the referenced code in JDeveloper to investigate further. When you open a Support Ticket with Oracle Support, providing the heap and thread dumps along with the top output will further assist the support engineers in determining the root cause of the issue.
A thread deadlock is another situation that can contribute to OOM. If you suspect a deadlock, you can feed your thread dump to a thread dump analyzer such as Samurai or TDA, which can identify the deadlock.
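Before reaching for a separate tool, you can also grep the dump directly. HotSpot-format thread dumps flag detected deadlocks with the text shown below (a quick check only; it does not apply to JRockit-format dumps):

grep -A 20 "Found one Java-level deadlock" tdump.txt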
Analyzing Heap Dumps:
Use MAT (the Eclipse Memory Analyzer Tool) to analyze your heap dumps and determine any leak suspects. You can run the Leak Suspects report to determine which objects are consuming the most heap memory.
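If you prefer to generate the report without the GUI, MAT distributions also ship a ParseHeapDump.sh script that can produce the Leak Suspects report from the command line. A sketch, assuming MAT is installed and the script name and report id match your version:

./ParseHeapDump.sh /path/to/heapdump.hprof org.eclipse.mat.api:suspects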