Troubleshooting Out of Memory (OOM) issues in Oracle Fusion Middleware applications requires planning and patience; it can feel like looking for a needle in a haystack. The approach should be to monitor, gather data, and then analyze that data to determine the root cause. OOM can be caused by the code deployed onto the middleware infrastructure, or by an underlying issue in the Fusion Middleware infrastructure itself (framework/cache issues). This post outlines an approach to understanding the problem and identifying the root cause of OOM issues, along with commands that assist with data gathering and analysis.

There are different types of Out of Memory issues. The following are some of the most common types:

  1. java.lang.OutOfMemoryError: requested XXXX bytes for Chunk::new. Out of swap space?
  2. java.lang.OutOfMemoryError: unable to create new native thread
  3. java.lang.OutOfMemoryError: PermGen space
  4. java.lang.OutOfMemoryError: allocLargeObjectOrArray
  5. java.lang.OutOfMemoryError: getNewTla
  6. java.lang.OutOfMemoryError: Java Heap Space
  7. java.lang.StackOverflowError (strictly not an OutOfMemoryError, but often investigated alongside OOM)

1. Determine whether your managed servers are reporting OOM:

Go to the servers directory under your domain and run the find command to locate files with OOM exceptions logged:

cd /Oracle/Middleware/user_projects/domains/mydomain/servers
find ./ -type f -exec grep -l java.lang.OutOfMemoryError {} \;
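
If the servers directory contains many files, you can narrow the search to recently modified logs (a variation of the above, assuming GNU find; adjust the -mtime window as needed):

find ./ -type f \( -name "*.log*" -o -name "*.out*" \) -mtime -7 -exec grep -l java.lang.OutOfMemoryError {} \;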

An entry from a .log or .out file indicates that the corresponding managed server is reporting OOM. From the output, identify which managed server is reporting OOM exceptions. For example, from the following output you can determine that WLS_Spaces1 is reporting OOM:

./WLS_Spaces1/adr/diag/ofm/webcenter_domain/WLS_Spaces1/alert/log.xml
./WLS_Spaces1/logs/WLS_Spaces1.out00010

Go to that managed server's log directory and run find again, this time printing the matching lines:

cd WLS_Spaces1/logs
find ./ -type f -exec grep java.lang.OutOfMemoryError {} \;

Note: Your log location may differ depending on your configuration.

In the output, look at the text that immediately follows java.lang.OutOfMemoryError; it tells you which type of OOM the server is hitting. Pipe the output to more for convenience if required.

For example, the following output suggests that this managed server is experiencing allocLargeObjectOrArray and getNewTla OOM exceptions:

Exception in thread "Thread-665" java.lang.OutOfMemoryError: allocLargeObjectOrArray: [B, size 8208
Exception in thread "Thread-664" java.lang.OutOfMemoryError: allocLargeObjectOrArray: [B, size 8208
Exception in thread "WsMgmtWorkScheduler" java.lang.OutOfMemoryError: getNewTla
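
To get a quick tally of how often each OOM type occurs across a server's logs, a pipeline such as the following can help (a sketch, assuming GNU grep; the pattern captures the error type and drops the per-occurrence details):

grep -rho "java.lang.OutOfMemoryError: [A-Za-z ]*" ./ | sort | uniq -c | sort -rn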

2. Collect Thread and Heap Dumps

Thread Dumps:

A thread dump is a snapshot of the state of all the threads in a process. To collect thread dumps, execute the following command 8 to 10 times at 5-second intervals:

>kill -3 <PID>

Dumps will be written to the <Server_Name>.out file in the defined log directory (Ex: /Oracle/Middleware/user_projects/domains/mydomain/servers/myserver/logs).
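
A simple shell loop can automate the collection (a minimal sketch; replace <PID> with the Java process ID of the managed server):

for i in 1 2 3 4 5 6 7 8 9 10; do kill -3 <PID>; sleep 5; done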

OR

>. ./setDomainEnv.sh
>jrcmd <pid> print_threads >tdump.txt

Heap Dumps:

A heap dump is a snapshot of the memory of a Java process at a point in time. To collect a heap dump, execute the following command once:

>/<Jrockit_home>/bin/jrcmd <PID> hprofdump filename=heapdump.hprof

The dump file will be written to the domain home (Ex: /Oracle/Middleware/user_projects/domains/mydomain/).
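
If you are running Sun Java rather than JRockit, jmap can produce an equivalent HPROF dump (a sketch; the output path is illustrative):

jmap -dump:format=b,file=/tmp/heapdump.hprof <PID>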

GC Logs:

Enable GC tracing in the application server by including the following as Java startup parameters:

-Xverbose:gc -Xverbosetimestamp -Xverboselog:/<dir_name>/<file_name>
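
The flags above are JRockit-specific. On Sun Java, the roughly equivalent startup parameters are:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/<dir_name>/<file_name>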

3. Analyze dumps 

Analyzing Thread Dumps:

When your managed server is experiencing slow response times or showing a health warning in the WebLogic Admin Console, run top to determine which process is consuming the most resources.

On the host that is running the managed server, run top:

top
top - 16:26:55 up 584 days,  5:57,  4 users,  load average: 4.96, 5.08, 5.10
Tasks: 167 total,   2 running, 165 sleeping,   0 stopped,   0 zombie
Cpu(s): 86.0%us,  0.5%sy, 13.5%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16436932k total, 15797128k used,   639804k free,   166504k buffers
Swap: 12289684k total,  1662108k used, 10627576k free,  1229076k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29381 oracle    15   0 4249m 3.0g  32m S 171.8 19.0 190:12.57 java
21162 oracle    18   0 3499m 2.2g 4240 S 10.0 14.0 123:06.66 java
21139 oracle    18   0 3450m 2.1g 7984 S  9.3 13.6 109:11.02 java
21365 oracle    20   0 3196m 1.7g  16m S  7.0 10.9  80:56.99 java

Note the PID of the process that is consuming high resources. In the above top output, PID 29381 is our guy. Type q to quit top.

The JVM on Linux implements Java threads as native threads, so each Java thread shows up as a separate lightweight process. To determine which thread is consuming the most resources, run top -H -b -p <PID>:

top -H -b -p 29381 >tophbp.txt
CTRL+C

vi tophbp.txt
top - 16:25:21 up 584 days,  5:55,  4 users,  load average: 4.96, 5.17, 5.14
Tasks: 129 total,   3 running, 126 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.5%us,  0.4%sy,  9.8%ni, 86.1%id,  0.9%wa,  0.1%hi,  0.2%si,  0.0%st
Mem:  16436932k total, 15794704k used,   642228k free,   165840k buffers
Swap: 12289684k total,  1662108k used, 10627576k free,  1228912k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29876 oracle    25   0 4248m 3.0g  32m R 77.0 19.0  61:41.25 [STANDBY] Execu
29389 oracle    16   0 4248m 3.0g  32m R 55.3 19.0  24:15.73 (Code Optimizat
29449 oracle    18   0 4248m 3.0g  32m R 43.4 19.0  61:51.18 [STANDBY] Execu
29381 oracle    15   0 4248m 3.0g  32m S  0.0 19.0   0:00.00 java

Note the thread process ID that is consuming high resources. In the above top output, PID 29876 is the one of interest.

Export the domain environment variables and collect a thread dump of the managed server's Java process (use the jcmd command if you are using Sun Java):

cd /Oracle/Middleware/user_projects/domains/mydomain/bin
. ./setDomainEnv.sh
cd ~/
jrcmd 29381 print_threads >tdump.txt

Examine the thread dump file and look for that thread process ID:

vi tdump.txt
"[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" id=134 
idx=0x200 tid=29876 prio=1 alive, native_blocked, daemon
at java/util/HashMap.buildCache(HashMap.java:589)
at java/util/HashMap.resize(HashMap.java:576)
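
To jump straight to the offending thread's stack, you can grep for the tid noted from top (a convenience, assuming the JRockit dump format shown above; -A controls how many stack lines are printed):

grep -B 1 -A 20 "tid=29876" tdump.txt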

Note: If you are using Sun Java, you need to convert the thread PID to hex using the following command, because HotSpot thread dumps record the native thread ID in hex (the nid field). If you are using JRockit, this is not required:

printf "%x \n" <thread_pid>

In the above example, HashMap.buildCache was the culprit. If you have faulty code that is leaking memory, you now have a clue to work from: examine the referenced code in JDeveloper to investigate further. When you open a Support ticket with Oracle Support, providing the heap and thread dumps along with the top output will further assist the support engineers in determining the root cause of the issue.

Thread deadlock is another situation that can contribute to OOM. If you suspect a deadlock, you can feed your thread dump to a thread dump analyzer such as Samurai or TDA, which can identify the deadlock.
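
As a quick first check before reaching for a GUI analyzer, you can grep the dump itself; HotSpot prints a "Found one Java-level deadlock" section and JRockit reports circular (deadlocked) lock chains (a convenience sketch):

grep -i -A 10 "deadlock" tdump.txt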

Analyzing Heap Dumps:

Use the Eclipse Memory Analyzer Tool (MAT) to analyze your heap dumps and identify leak suspects. You can run the Leak Suspects report to determine which objects are consuming the most heap memory.

[Screenshot: MAT Leak Suspects report]
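
MAT can also generate the Leak Suspects report headlessly, via the ParseHeapDump.sh script that ships in the MAT installation directory (paths are illustrative):

cd /<mat_install_dir>
./ParseHeapDump.sh /<domain_home>/heapdump.hprof org.eclipse.mat.api:suspects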

Here are some more nice articles that explain OOM: