Evaluate and improve object copy time by micro-optimizations and splitting out slow and fast paths aggressively.
Co-authored-by: Tony Printezis <tprintezis@twitter.com>
Reviewed-by: kbarrett, mgerdin, jmasa
When G1 is run with -XX:-G1DeferredRSUpdate, the VM crashes because of wrong initialization order of member variables. The change makes the initalization explicit, not relying on initialization order any more.
Reviewed-by: brutisso, mgerdin
Mentioned closures are actually wrapped methods. This adds confusion to readers, and in this case also increases code size as G1ParScanHeapEvacClosure is part of the oop_oop_iterate() methods. Move them into G1ParScanThreadState as methods.
Reviewed-by: stefank
Move methods that are not dependent on any of G1ParCopyClosure's template parameters into G1ParCopyHelper. Further remove unused methods and members of the class hierarchy.
Reviewed-by: mgerdin, stefank
Clean up and slightly optimize reference handling from the GC reference task queue. Since we never push partial array chunks as narrowOop* we can manually specialize the code so that some code can be optimized away.
Reviewed-by: tonyp, brutisso, stefank
Remove the above mentioned template parameter and related unused code. Also remove some classes that are never used.
Reviewed-by: stefank, mgerdin, jwilhelm
Remove PermGen, allocate meta-data in metaspace linked to class loaders, rewrite GC walking, rewrite and rename metadata to be C++ classes
Co-authored-by: Stefan Karlsson <stefan.karlsson@oracle.com>
Co-authored-by: Mikael Gerdin <mikael.gerdin@oracle.com>
Co-authored-by: Tom Rodriguez <tom.rodriguez@oracle.com>
Reviewed-by: jmasa, stefank, never, coleenp, kvn, brutisso, mgerdin, dholmes, jrose, twisti, roland
Some minor profile based optimizations. Reduce the number of branches and branch mispredicts by removing some virtual calls, through closure specalization, and refactoring some conditional statements.
Reviewed-by: brutisso, tonyp
Re-enable survivors during the initial-mark pause. Afterwards, the concurrent marking threads have to scan them and mark everything reachable from them. The next GC will have to wait for the survivors to be scanned.
Reviewed-by: brutisso, johnc
Remove the separate counting phase of concurrent marking by tracking the amount of marked bytes and the cards spanned by marked objects in marking task/worker thread local data structures, which are updated as individual objects are marked.
Reviewed-by: brutisso, tonyp
This change simplifies the interaction between GC and concurrent marking. By disabling survivor spaces during the initial-mark pause we don't need to propagate marks of objects we copy during each GC (since we never need to copy an explicitly marked object).
Reviewed-by: johnc, brutisso
During remembered set scanning, the reference processor could discover a reference object whose referent was in the process of being copied and so may not be completely initialized. Do not perform reference discovery during remembered set scanning.
Reviewed-by: tonyp, ysr
G1 now uses two reference processors - one is used by concurrent marking and the other is used by STW GCs (both full and incremental evacuation pauses). In an evacuation pause, the reference processor is embedded into the closures used to scan objects. Doing so causes causes reference objects to be 'discovered' by the reference processor. At the end of the evacuation pause, these discovered reference objects are processed - preserving (and copying) referent objects (and their reachable graphs) as appropriate.
Reviewed-by: ysr, jwilhelm, brutisso, stefank, tonyp
As a result of the changes for 7080389, an evacuation failure during an initial mark pause may result in some root objects not being marked. Pass whether the caller is a root scanning closure into the evacuation failure handling code so that the thread that successfully forwards an object to itself also marks the object.
Reviewed-by: ysr, brutisso, tonyp
Refactor code marking code in the evacuation pause copy closures so that an evacuated object is only marked by the thread that successfully copies it.
Reviewed-by: stefank, brutisso, tonyp
Some optimizations to improve the concurrent marking phase: specialize the main oop closure, make sure a few methods in the fast path are properly inlined, a few more bits and pieces, and some cosmetic fixes.
Reviewed-by: stefank, johnc
Implemented block-based work stealing. Moved copying during the rset scanning phase to the main copying phase. Made the size of rset table depend on the region size.
Reviewed-by: apetrusenko, tonyp