Duplicated code from G1RemSet::par_write_ref() inlined into G1UpdateRSOrPushRefOopClosure::do_oop_nv() was showing up in profiles with a fairly high amount of CPU time. Manually inline the main part of G1RemSet::par_write_ref() to eliminate the code duplication.
Reviewed-by: azeemj, brutisso
7151532: DCmd for hotspot native memory tracking
Implementation of native memory tracking phase 1, which tracks VM native memory usage, and related DCmd
Reviewed-by: acorn, coleenp, fparain
Remove the per-thread expansion tables (PosParPRT) and associated expansion and compaction from the fine grain RSet entries. This code has been unused for a while.
Reviewed-by: johnc, brutisso
Before the last thread to leave a JNI critical region was able to schedule a GCLocker Initiated GC, another thread was attempting an allocation and saw that the GCLocker region was no longer active and successfully scheduled a GC. Stall allocating threads until the GCLocker Initiated GC is performed and then retry the allocation.
Reviewed-by: brutisso, huntch
Rename "to-space-overflow" to "to-space-exhausted", Introduce one decimal point in the size format, Add Sum to the aggregate and re-order the entries, Add number of GC workers to the log output
Reviewed-by: johnc, jwilhelm
Removed the HeapRegion::_top_at_conc_mark_count field. It is no longer needed as a result of the changes for 6888336 and 7127706. Refactored the closures that finalize and verify the liveness counting data so that common functionality was placed into a base class.
Reviewed-by: brutisso, tonyp
Cleanup of the CSet chooser class: standardize on uints for region num and indexes (instead of int, jint, etc.), make the method / field naming style more consistent, remove a lot of dead code.
Reviewed-by: johnc, brutisso
Change the type of fields / variables / etc. that represent region counts and indeces from size_t to uint.
Reviewed-by: iveresov, brutisso, jmasa, jwilhelm
Added log levels "fine", "finer" and "finest". Let PrintGC map to "fine" and PrintGCDetails map to "finer". Separated out the per worker information in the G1 logging to the "finest" level.
Reviewed-by: stefank, jwilhelm, tonyp, johnc
Tiered compilation has increased the number of nmethods in the code cache. This has, in turn, significantly increased the number of marked nmethods processed during the StrongRootsScope destructor. Create a specialized version of CodeBlobToOopClosure for G1 which places only those nmethods that contain pointers into the collection set on to the marked nmethods list.
Reviewed-by: iveresov, tonyp
Make two G1 cmd line flags available in product builds: G1HeapWastePercent (previously called: G1OldReclaimableThresholdPercent) and G1MixedGCCountTarget (previous called: G1MaxMixedGCNum). Also changed the default of the former from 1% to 5% and the default for G1OldCSetRegionLiveThresholdPercent to 90%.
Reviewed-by: azeemj, jwilhelm, johnc
Attempting to initiate a marking cycle when allocating a humongous object can, if a marking cycle is successfully initiated by another thread, result in the allocating thread spinning until the marking cycle is complete. Eliminate a deadlock between the main ConcurrentMarkThread, the SurrogateLocker thread, the VM thread, and a mutator thread waiting on the SecondaryFreeList_lock (while free regions are going to become available) by not manipulating the pending list lock during the prologue and epilogue of the cleanup pause.
Reviewed-by: brutisso, jcoomes, tonyp
Revamp of the mechanism that chooses old regions for inclusion in the CSet. It simplifies the code and introduces min and max bounds on the number of old regions added to the CSet at each mixed GC to avoid pathological cases. It also ensures that when we do a mixed GC we'll always find old regions to add to the CSet (i.e., it eliminates the case where a mixed GC will collect no old regions which can happen today).
Reviewed-by: johnc, brutisso
If we try to schedule an initial-mark GC in order to explicit start a conc mark cycle and it gets pre-empted by antoher GC, we should retry the attempt as long as it's appropriate for the GC cause.
Reviewed-by: brutisso, johnc
Some minor profile based optimizations. Reduce the number of branches and branch mispredicts by removing some virtual calls, through closure specalization, and refactoring some conditional statements.
Reviewed-by: brutisso, tonyp
Re-enable survivors during the initial-mark pause. Afterwards, the concurrent marking threads have to scan them and mark everything reachable from them. The next GC will have to wait for the survivors to be scanned.
Reviewed-by: brutisso, johnc
Remove the separate counting phase of concurrent marking by tracking the amount of marked bytes and the cards spanned by marked objects in marking task/worker thread local data structures, which are updated as individual objects are marked.
Reviewed-by: brutisso, tonyp
During an initial mark pause, signal the Concurrent Mark thread after the pause output from PrintGC/PrintGCDetails is complete.
Reviewed-by: tonyp, brutisso
There is a high number of mispredicted branches associated with calling BitMap::iteratate() from within CMBitMapRO::iterate(). Implement a version of CMBitMapRO::iterate() directly using inline-able routines.
Reviewed-by: tonyp, iveresov
This change simplifies the interaction between GC and concurrent marking. By disabling survivor spaces during the initial-mark pause we don't need to propagate marks of objects we copy during each GC (since we never need to copy an explicitly marked object).
Reviewed-by: johnc, brutisso
Store the "next chunk start index" in the length field of the to-space object, instead of the from-space object, so that we can always reliably read the size of all from-space objects.
Reviewed-by: johnc, ysr, jmasa
Parallelize the removal of self forwarding pointers etc. by wrapping in a HeapRegion closure, which is then wrapped inside an AbstractGangTask.
Reviewed-by: tonyp, iveresov
Introduce a flag that is set when a heap expansion attempt during a GC fails so that we do not consantly attempt to expand the heap when it's going to fail anyway. This not only prevents the excessive ergo output (which is generated when a region allocation fails) but also avoids excessive and ultimately unsuccessful expansion attempts.
Reviewed-by: jmasa, johnc