Java garbage collector frees the heap memory occupied by the objects that are no longer reachable. It takes the job from programmers who often forget to release the memory when they no longer need it and sometimes find it hard to debug memory issues.
What does JVM benefit from garbage collection other than preventing programmers from messing up with the memory? The answer is the efficiency at allocating memory for new objects. This is because garbage collector not only frees memory but also compacts objects in the heap. I wonder if other languages without garbage collector have some similar mechanisms for compaction.
This post is about the implementation of garbage collection in JVM. JVM adopts a hybrid of various strategies, but generational GC is the cornerstone.
I. Generational GC
Generational GC divides objects into several generations and has separate memory regions for different ages of objects, and applies different mechanisms of garbage collection to these generations.
Take Hotspot JVM for example, the heap is divided into two regions: young generation and old generation.
The Young Generation is where new objects are allocated and aged. It is further divided into one Eden space and two Survivor spaces. This is how it works (more details can be found here):
- The majority (but not all) of newly created objects are stored in the Eden space.
- When the memory is running low in the Eden space, a minor GC occurs. Supposing that Survivor0 is empty and Survivor1 is occupied by objects.
a. The objects that are unreachable are left in the Eden space, and the surviving objects are moved to Survivor0. The Eden space is cleared.
b. The reachable objects in Survivor1 are moved to Survivor0, and Survivor1 is empty as well.
c. If an object survives one minor GC, its age will be increased by 1. Now Survivor0 is occupied by objects with different ages.
d. If an object exceeds the age threshold, it will be promoted from the Young Generation to the Old Generation.
The Old Generation is used to store long surviving objects. The garbage collection occurs in the Old Generation is called a full GC. JVM provides different algorithms for full GC: Serial GC, Parallel GC, Parallel Compacting GC, Concurrent Mark & Sweep GC (aka. CMS), and Garge First (G1) GC.
Note that both minor GC and full GC are Stop the World events, which means the application stop running util gc completes.
Quick questions:
Q1. Why generational GC?
There is empirical analysis of applications which shows that most objects are short lived. From there we have a generational hypothesis:
The most recently created objects are also those most likely to become unreachable quickly.
This can be used to improve the performance of garbage collection because different strategies can be applied to different generations. Since most objects in the young generation will soon become unreachable, it is more efficient to move surviving objects from one region to another than removing unreachable objects while compacting the space; but for the old generation, it is pointless to move the majority of objects from one place to another without achieving anything.
Q2: Why two Survivor spaces?
First of all, not all JVMs implement two survivor spaces.
If there is only one survivor space, one has to compact the objects first in the survivor space when a minor GC occurs, and then move the surviving objects from the eden space. But as mentioned above, most newly created objects are likely to become unreachable, so it is more simple and effient to move objects round than compacting the survivor space.
References
1 Wikipedia: Tracing Garbage Collection
2 Java Garbage Collection Basics
3 Understand Java Garbage Collection
4 Java GC: Why Two Survivor Regions