This is a collection of rules, guidelines, and suggestions for writing HotSpot code. Following these will help new code fit in with existing HotSpot code, making it easier to read and maintain. Failure to follow these guidelines may lead to discussion during code reviews, if not outright rejection of a change.
Some programmers seem to have lexers and even C preprocessors installed directly behind their eyeballs. The rest of us require code that is not only functionally correct but also easy to read. More than that, since there is no one style for easy-to-read code, and since a mashup of many styles is just as confusing as no style at all, it is important for coders to be conscious of the many implicit stylistic choices that historically have gone into the HotSpot code base.
Some of these guidelines are driven by the cross-platform requirements for HotSpot. Shared code must work on a variety of platforms, and may encounter deficiencies in some. Using platform conditionalization in shared code is usually avoided, while shared code is strongly preferred to multiple platform-dependent implementations, so some language features may be recommended against.
Some of the guidelines here are relatively arbitrary choices among equally plausible alternatives. The purpose of stating and enforcing these rules is largely to provide a consistent look to the code. That consistency makes the code more readable by avoiding non-functional distractions from the interesting functionality.
When changing pre-existing code, it is reasonable to adjust it to match these conventions. Exception: If the pre-existing code clearly conforms locally to its own peculiar conventions, it is not worth reformatting the whole thing. Also consider separating changes that make extensive stylistic updates from those which make functional changes.
Many of the guidelines mentioned here have (sometimes widespread) counterexamples in the HotSpot code base. Finding a counterexample is not sufficient justification for new code to follow the counterexample as a precedent, since readers of your code will rightfully expect your code to follow the greater bulk of precedents documented here.
Occasionally a guideline mentioned here may be just out of sync with the actual HotSpot code base. If you find that a guideline is consistently contradicted by a large number of counterexamples, please bring it up for discussion and possible change. The architectural rule, of course, is "When in Rome do as the Romans". Sometimes in the suburbs of Rome the rules are a little different; these differences can be pointed out here.
Proposed changes should be discussed on the HotSpot Developers mailing list. Changes are likely to be cautious and incremental, since HotSpot coders have been using these guidelines for years.
Substantive changes are approved by rough consensus of the HotSpot Group Members. The Group Lead determines whether consensus has been reached.
Editorial changes (changes that only affect the description of HotSpot style, not its substance) do not require the full consensus gathering process. The normal HotSpot pull request process may be used for editorial changes, with the additional requirement that the requisite reviewers are also HotSpot Group Members.
Group related code together, so readers can concentrate on one section of one file.
Classes are the primary code structuring mechanism. Place related functionality in a class, or a set of related classes. Use of either namespaces or public non-member functions is rare in HotSpot code. Static non-member functions are not uncommon.
If a class FooBar is going to be used in more than one place, put it in files named fooBar.hpp and fooBar.cpp. If the class is a sidekick to a more important class BazBat, it can go in bazBat.hpp.
Put a member function FooBar::bang into the same file that defined FooBar, or its associated .inline.hpp or .cpp file.
Use public accessor functions for member variables accessed outside the class.
Assign names to constant literals and use the names instead.
Keep functions small, a screenful at most. Split out chunks of logic into file-local classes or static functions if needed.
Factor away nonessential complexity into local inline helper functions and helper classes.
Think clearly about internal invariants that apply to each class, and document them in the form of asserts within member functions.
Make simple, self-evident contracts for member functions. If you cannot communicate a simple contract, redesign the class.
Implement classes as if expecting rough usage by clients. Check for incorrect usage of a class using assert(...), guarantee(...), ShouldNotReachHere() and comments wherever needed. Performance is almost never a reason to omit asserts.
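A minimal sketch of checking for rough usage and documenting an invariant in executable form. It uses the standard assert from <cassert> as a stand-in for HotSpot's own macro of the same name, and the class BoundedCounter with all of its members is hypothetical.

```cpp
#include <cassert>

// Hypothetical class: a counter that must stay within [0, _limit].
class BoundedCounter {
  int _count;
  const int _limit;

public:
  explicit BoundedCounter(int limit) : _count(0), _limit(limit) {
    assert(limit > 0);  // Reject nonsensical construction.
  }

  int count() const { return _count; }
  int limit() const { return _limit; }

  void increment() {
    assert(_count < _limit);  // The caller must not overflow the counter.
    ++_count;
    assert(is_consistent());
  }

private:
  // The class invariant, stated once and checked by member functions.
  bool is_consistent() const {
    return (_count >= 0) && (_count <= _limit);
  }
};
```

The asserts cost nothing in builds that compile them out, which is why performance is rarely a reason to leave them off.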
When possible, design as if for reusability. This forces a clear design of the class's externals, and clean hiding of its internals.
Initialize all variables and data structures to a known state. If a class has a constructor, initialize it there.
Do no optimization before its time. Prove the need to optimize.
When you must defactor to optimize, preserve as much structure as possible. If you must hand-inline some name, label the local copy with the original name.
If you need to use a hidden detail (e.g., a structure offset), name it (as a constant or function) in the class that owns it.
Don't use the Copy and Paste keys to replicate more than a couple lines of code. Name what you must repeat.
If a class needs a member function to change a user-visible attribute, the change should be done with a "setter" accessor matched to the simple "getter".
All source files must have a globally unique basename. The build system depends on this uniqueness.
Do not put non-trivial function implementations in .hpp files. If the implementation depends on other .hpp files, put it in a .cpp or a .inline.hpp file.
.inline.hpp files should only be included in .cpp or .inline.hpp files.
All .inline.hpp files should include their corresponding .hpp file as the first include line. Declarations needed by other files should be put in the .hpp file, and not in the .inline.hpp file. This rule exists to resolve problems with circular dependencies between .inline.hpp files.
All .cpp files include precompiled.hpp as the first include line.
precompiled.hpp is just a build time optimization, so don't rely on it to resolve include problems.
Keep the include lines alphabetically sorted.
Put conditional inclusions (#if ...) at the end of the include list.
JTReg tests should have meaningful names.
JTReg tests associated with specific bugs should be tagged with the @bug keyword in the test description.
JTReg tests should be organized by component or feature under test/, in a directory hierarchy that generally follows that of the src/ directory. There may be additional subdirectories to further categorize tests by feature. This structure makes it easy to run a collection of tests associated with a specific feature by specifying the associated directory as the source of the tests to run.
The length of a name may be correlated to the size of its scope. In particular, short names (even single letter names) may be fine in a small scope, but are usually inappropriate for larger scopes.
Prefer whole words rather than abbreviations, unless the abbreviation is more widely used than the long form in the code's domain.
Choose names consistently. Do not introduce spurious variations. Abbreviate corresponding terms to a consistent length.
Global names must be unique, to avoid One Definition Rule (ODR) violations. A common prefixing scheme for related global names is often used. (This is instead of using namespaces, which are mostly avoided in HotSpot.)
Don't give two names to the semantically same thing. But use different names for semantically different things, even if they are representationally the same. (So use meaningful typedef or template alias names where appropriate.)
When choosing names, avoid categorical nouns like "variable", "field", "parameter", "value", and verbs like "compute", "get". (storeValue(int param) is bad.)
Type names and global names should use mixed-case with the first letter of each word capitalized (FooBar).
Embedded abbreviations in otherwise mixed-case names are usually capitalized entirely rather than being treated as a single word with only the initial letter capitalized, e.g. "HTML" rather than "Html".
Function and local variable names use lowercase with words separated by a single underscore (foo_bar).
Class data member names have a leading underscore, and use lowercase with words separated by a single underscore (_foo_bar).
Constant names may be upper-case or mixed-case, according to historical necessity. (Note: There are many examples of constants with lowercase names.)
Constant names should follow an existing pattern, and must have a distinct appearance from other names in related APIs.
Class and type names should be noun phrases. Consider an "er" suffix for a class that represents an action.
Function names should be verb phrases that reflect changes of state known to a class's user, or else noun phrases if they cause no change of state visible to the class's user.
Getter accessor names are noun phrases, with no "get_" noise word. Boolean getters can also begin with "is_" or "has_". Member functions for reading data members usually have the same name as the data member, exclusive of the leading underscore.
Setter accessor names prepend "set_" to the getter name.
Other member function names are verb phrases, as if commands to the receiver.
Avoid leading underscores (as "_oop") except in cases required above. (Names with leading underscores can cause portability problems.)
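The naming rules above can be seen together in one small class. This is only an illustrative sketch: PacketBuffer and its members are hypothetical names chosen for the example, and the standard assert stands in for HotSpot's own.

```cpp
#include <cassert>

// Type names use mixed case with the first letter of each word capitalized.
class PacketBuffer {
  // Data members: leading underscore, lowercase, underscore-separated.
  int  _byte_count;
  bool _is_sealed;

public:
  PacketBuffer() : _byte_count(0), _is_sealed(false) {}

  // Getters are noun phrases named after the member, with no "get_".
  int byte_count() const { return _byte_count; }

  // Boolean getters may begin with "is_" or "has_".
  bool is_sealed() const { return _is_sealed; }
  bool has_data() const { return _byte_count > 0; }

  // Setters prepend "set_" to the getter name.
  void set_byte_count(int byte_count) {
    assert(!_is_sealed);
    assert(byte_count >= 0);
    _byte_count = byte_count;
  }

  // Other member functions are verb phrases, as if commands to the receiver.
  void seal() { _is_sealed = true; }
};
```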
Clearly comment subtle fixes.
Clearly comment tricky classes and functions.
If you have to choose between commenting code and writing wiki content, comment the code. Link from the wiki to the source file if it makes sense.
As a general rule don't add bug numbers to comments (they would soon overwhelm the code). But if the bug report contains significant information that can't reasonably be added as a comment, then refer to the bug report.
Personal names are discouraged in the source code, which is a team product.
You can almost always use an inline function or class instead of a macro. Use a macro only when you really need it.
Templates may be preferable to multi-line macros. (There may be subtle performance effects with templates on some platforms; revert to macros if absolutely necessary.)
#ifdefs should not be used to introduce platform-specific code into shared code (except for _LP64). They must be used to manage header files, in the pattern found at the top of every source file. They should be used mainly for major build features, including PRODUCT, ASSERT, _LP64, INCLUDE_SERIALGC, COMPILER1, etc.
For build features such as PRODUCT, use #ifdef PRODUCT for multiple-line inclusions or exclusions.
For short inclusions or exclusions based on build features, use macros like PRODUCT_ONLY and NOT_PRODUCT. But avoid using them with multiple-line arguments, since debuggers do not handle that well.
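A sketch of the two forms side by side. The macro definitions below are stand-ins written for this example so that it compiles on its own; HotSpot code gets the real ones from its own headers. EventLog and its members are hypothetical.

```cpp
#include <cassert>

// Stand-in definitions for this sketch only.
#ifdef PRODUCT
#define PRODUCT_ONLY(code) code
#define NOT_PRODUCT(code)
#else
#define PRODUCT_ONLY(code)
#define NOT_PRODUCT(code) code
#endif

// Hypothetical class that verifies itself only in non-product builds.
class EventLog {
  int _event_count;

#ifndef PRODUCT
  // Multiple-line exclusion: use the preprocessor conditional directly.
  int _verify_count;

  void verify() {
    assert(_event_count >= 0);
    ++_verify_count;
  }
#endif

public:
  EventLog() : _event_count(0) {
    NOT_PRODUCT(_verify_count = 0;)  // Short, single-line exclusion.
  }

  void record() {
    ++_event_count;
    NOT_PRODUCT(verify();)           // Short, single-line exclusion.
  }

  int event_count() const { return _event_count; }
};
```

Keeping each macro argument on one line means a debugger can still step through the statement it wraps.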
Use CATCH, THROW, etc. for HotSpot-specific exception processing.
In general, don't change whitespace unless it improves readability or consistency. Gratuitous whitespace changes will make integrations and backports more difficult.
Use One-True-Brace-Style. The opening brace for a function or class is normally at the end of the line; it is sometimes moved to the beginning of the next line for emphasis. Substatements are enclosed in braces, even if there is only a single statement. Extremely simple one-line statements may drop braces around a substatement.
Indentation levels are two columns.
There is no hard line length limit. That said, bear in mind that excessively long lines can cause difficulties. Some people like to have multiple side-by-side windows in their editors, and long lines may force them to choose among unpleasant options. They can use wide windows, reducing the number that can fit across the screen, and wasting a lot of screen real estate because most lines are not that long. Alternatively, they can have more windows across the screen, with long lines wrapping (or worse, requiring scrolling to see in their entirety), which is harder to read. Similar issues exist for side-by-side code reviews.
Tabs are not allowed in code. Set your editor accordingly. (Emacs: (setq-default indent-tabs-mode nil).)
Use good taste to break lines and align corresponding tokens on adjacent lines.
Use spaces around operators, especially comparisons and assignments. (Relaxable for boolean expressions and high-precedence operators in classic math-style formulas.)
Put spaces on both sides of control flow keywords if, else, for, switch, etc. Don't add spaces around the associated control expressions. Examples:
while (test_foo(args...)) { // Yes
while(test_foo(args...)) { // No, missing space after while
while ( test_foo(args...) ) { // No, excess spaces around control
Use extra parentheses in expressions whenever operator precedence seems doubtful. Always use parentheses in shift/mask expressions (<<, &, |). Don't add whitespace immediately inside parentheses.
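As an illustration of why shift/mask expressions are always parenthesized: in C++ the comparison operators bind more tightly than &, so word & mask == 0 parses as word & (mask == 0). The sketch below assumes a hypothetical 4-bit field at bits 8..11; the names kind_shift, kind_mask, extract_kind and insert_kind are invented for the example.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical layout: a 4-bit "kind" field stored at bits 8..11 of a word.
const int      kind_shift = 8;
const uint32_t kind_mask  = 0xF;

inline uint32_t extract_kind(uint32_t word) {
  return (word >> kind_shift) & kind_mask;
}

inline uint32_t insert_kind(uint32_t word, uint32_t kind) {
  assert((kind & ~kind_mask) == 0);  // Parenthesized: == binds tighter than &.
  return (word & ~(kind_mask << kind_shift)) | (kind << kind_shift);
}
```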
Use more spaces and blank lines between larger constructs, such as classes or function definitions.
If the surrounding code has any sort of vertical organization, adjust new lines horizontally to be consistent with that organization. (E.g., trailing backslashes on long macro definitions often align.)
Use the Resource Acquisition Is Initialization (RAII) design pattern to manage bracketed critical sections. See class ResourceMark for an example.
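A minimal sketch of the pattern, not of ResourceMark itself. NestingTracker, NestingMark and depth_inside_mark are hypothetical names; the point is only that the constructor opens the bracket and the destructor closes it on every exit path.

```cpp
#include <cassert>

// Hypothetical resource with explicit enter/leave bracketing.
class NestingTracker {
  int _depth;

public:
  NestingTracker() : _depth(0) {}

  int depth() const { return _depth; }

  void enter() { ++_depth; }

  void leave() {
    assert(_depth > 0);
    --_depth;
  }
};

// RAII guard: the constructor enters, the destructor leaves.
class NestingMark {
  NestingTracker* const _tracker;

  // Not copyable; a copy would leave twice.
  NestingMark(const NestingMark&) = delete;
  NestingMark& operator=(const NestingMark&) = delete;

public:
  explicit NestingMark(NestingTracker* tracker) : _tracker(tracker) {
    assert(tracker != nullptr);
    _tracker->enter();
  }

  ~NestingMark() { _tracker->leave(); }
};

// Returns the depth observed inside the bracketed section. The mark's
// destructor restores the depth before the caller resumes.
inline int depth_inside_mark(NestingTracker* tracker) {
  NestingMark mark(tracker);
  return tracker->depth();
}
```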
Avoid implicit conversions to bool.
Use bool for boolean values.
Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.
Do not use declarations in condition forms, i.e. don't use if (T v = value) { ... }.
Use functions from globalDefinitions.hpp and related files when performing bitwise operations on integers. Do not code directly as C operators, unless they are extremely simple. (Examples: align_up, is_power_of_2, exact_log2.)
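The sketch below shows explicit comparisons in place of implicit conversions to bool, and bit manipulation hidden behind named helpers. The helpers carry a _sketch suffix because they are stand-ins written for this example; real HotSpot code would call the functions provided by globalDefinitions.hpp and related files, whose exact signatures are not reproduced here. aligned_length is likewise hypothetical.

```cpp
#include <cassert>
#include <stddef.h>

// Hypothetical stand-ins; real code would call the HotSpot helpers instead.
inline bool is_power_of_2_sketch(size_t x) {
  return (x != 0) && ((x & (x - 1)) == 0);
}

inline size_t align_up_sketch(size_t size, size_t alignment) {
  assert(is_power_of_2_sketch(alignment));
  return (size + (alignment - 1)) & ~(alignment - 1);
}

// Explicit comparisons instead of implicit conversions to bool.
inline size_t aligned_length(const char* text, size_t alignment) {
  if (text == nullptr) {            // Not: if (!text)
    return 0;
  }
  size_t length = 0;
  while (text[length] != '\0') {    // Not: while (text[length])
    ++length;
  }
  return align_up_sketch(length, alignment);
}
```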
Use arrays with abstractions supporting range checks.
Always enumerate all cases in a switch statement or provide a default case. It is ok to have an empty default with comment.
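Both acceptable shapes can be sketched briefly: a switch that enumerates every case, and one that handles a single case with a commented empty default. Phase, phase_name and work_units are hypothetical, and assert(false) is used here only as a stand-in for ShouldNotReachHere().

```cpp
#include <cassert>

enum class Phase : int { idle, marking, sweeping };

// Every enumerator is handled; the default catches corrupted values.
inline const char* phase_name(Phase phase) {
  switch (phase) {
  case Phase::idle:     return "idle";
  case Phase::marking:  return "marking";
  case Phase::sweeping: return "sweeping";
  default:
    assert(false);  // Stand-in for ShouldNotReachHere().
    return "unknown";
  }
}

// Only one case is interesting; the empty default says so explicitly.
inline int work_units(Phase phase) {
  int units = 0;
  switch (phase) {
  case Phase::marking:
    units = 2;
    break;
  default:
    // Nothing to do for the other phases.
    break;
  }
  return units;
}
```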
HotSpot was originally written in a subset of the C++98/03 language. More recently, support for C++14 is provided, though again, HotSpot only uses a subset. (Backports to JDK versions lacking support for more recent Standards must of course stick with the original C++98/03 subset.)
This section describes that subset. Features from the C++98/03 language may be used unless explicitly excluded here. Features from C++11 and C++14 may be explicitly permitted or explicitly excluded, and discussed accordingly here. There is a third category, undecided features, about which HotSpot developers have not yet reached a consensus, or perhaps have not discussed at all. Use of these features is also excluded.
(The use of some features may not be immediately obvious and may slip in anyway, since the compiler will accept them. The code review process is the main defense against this.)
Some features are discussed in their own subsection, typically to provide more extensive discussion or rationale for limitations. Features that don't have their own subsection are listed in omnibus feature sections for permitted, excluded, and undecided features.
Lists of new features for C++11 and C++14, along with links to their descriptions, can be found in the online documentation for some of the compilers and libraries. The C++14 Standard is the definitive description.
As a rule of thumb, permitting features which simplify writing code and, especially, reading code, is encouraged.
Similar discussions for some other projects:
Google C++ Style Guide — Currently (2020) targeting C++17.
C++11 and C++14 use in Chromium — Categorizes features as allowed, banned, or to be discussed.
llvm Coding Standards — Currently (2020) targeting C++14.
Using C++ in Mozilla code — C++17 support is required for recent versions (2020).
Do not use exceptions. Exceptions are disabled by the build configuration for some platforms.
Rationale: There is significant concern over the performance cost of exceptions and their usage model and implications for maintainable code. That's not just a matter of history that has been fixed; there remain questions and problems even today (2019). See, for example, Zero cost deterministic exceptions. Because of this, HotSpot has always used a build configuration that disables exceptions where that is available. As a result, HotSpot code uses error handling mechanisms such as two-phase construction, factory functions, returning error codes, and immediate termination. Even if the cost of exceptions were not a concern, the existing body of code was not written with exception safety in mind. Making HotSpot exception safe would be a very large undertaking.
In addition to the usual alternatives to exceptions, HotSpot provides its own exception mechanism. This is based on a set of macros defined in utilities/exceptions.hpp.
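Two of the alternatives named above, two-phase construction and returning error codes, might look like the following. This is a sketch under assumptions: FixedTable, InitStatus and their members are hypothetical, and it does not show the macros from utilities/exceptions.hpp.

```cpp
#include <cassert>

// Hypothetical error codes for this sketch.
enum class InitStatus : int { ok, bad_capacity };

// Two-phase construction: the constructor cannot fail; initialize() can,
// and reports failure through its return value rather than by throwing.
class FixedTable {
  static const int max_capacity = 16;

  int _slots[max_capacity];
  int _capacity;

public:
  FixedTable() : _capacity(0) {
    for (int i = 0; i < max_capacity; i++) {
      _slots[i] = 0;
    }
  }

  bool is_initialized() const { return _capacity > 0; }
  int capacity() const { return _capacity; }

  InitStatus initialize(int capacity) {
    assert(!is_initialized());
    if ((capacity <= 0) || (capacity > max_capacity)) {
      return InitStatus::bad_capacity;
    }
    _capacity = capacity;
    return InitStatus::ok;
  }
};
```

A caller checks the returned status and decides whether to retry, fall back, or terminate.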
Do not use Runtime Type Information (RTTI). RTTI is disabled by the build configuration for some platforms. Among other things, this means dynamic_cast cannot be used.
Rationale: Other than to implement exceptions (which HotSpot doesn't use), most potential uses of RTTI are better done via virtual functions. Some of the remainder can be replaced by bespoke mechanisms. The cost of the additional runtime data structures needed to support RTTI is deemed not worthwhile, given the alternatives.
Do not use the standard global allocation and deallocation functions (operator new and related functions). Use of these functions by HotSpot code is disabled for some platforms.
Rationale: HotSpot often uses "resource" or "arena" allocation. Even where heap allocation is used, the standard global functions are avoided in favor of wrappers around malloc and free that support the VM's Native Memory Tracking (NMT) feature. Typically, uses of the global operator new are inadvertent and therefore often associated with memory leaks.
Native memory allocation failures are often treated as non-recoverable. The place where "out of memory" is (first) detected may be an innocent bystander, unrelated to the actual culprit.
Use public single inheritance.
Prefer composition rather than non-public inheritance.
Restrict inheritance to the "is-a" case; use composition rather than non-is-a related inheritance.
Avoid multiple inheritance. Never use virtual inheritance.
Avoid using namespaces. HotSpot code normally uses "all static" classes rather than namespaces for grouping. An "all static" class is not instantiable, has only static members, and is normally derived (possibly indirectly) from the helper class AllStatic.
Benefits of using such classes include:
Provides access control for members, which is unavailable with namespaces.
Avoids Argument Dependent Lookup (ADL).
Closed for additional members. Namespaces allow names to be added in multiple contexts, making it harder to see the complete API.
Namespaces should be used only in cases where one of those "benefits" is actually a hindrance.
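A sketch of an "all static" class. AllStaticSketch is a hypothetical stand-in for the AllStatic helper so that the example compiles on its own, and RequestStats is likewise invented; the sketch is meant to show the access control that a namespace cannot provide.

```cpp
#include <cassert>

// Hypothetical stand-in for the AllStatic helper class: a class derived
// from it cannot be instantiated.
class AllStaticSketch {
public:
  AllStaticSketch() = delete;
  ~AllStaticSketch() = delete;
};

// Groups related functions and state, with access control.
class RequestStats : public AllStaticSketch {
  static int _request_count;  // Private: not reachable by clients.

public:
  static void record_request() {
    ++_request_count;
    assert(_request_count > 0);
  }

  static int request_count() { return _request_count; }
};

int RequestStats::_request_count = 0;
```

Clients write RequestStats::record_request(), much as they would with a namespace, but cannot touch _request_count or add members from another file.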
In particular, don't use anonymous namespaces. They seem like they should be useful, and indeed have some real benefits for naming and generated code size on some platforms. Unfortunately, debuggers don't seem to like them at all.
https://groups.google.com/forum/#!topic/mozilla.dev.platform/KsaG3lEEaRM
Suggests Visual Studio debugger might not be able to refer to anonymous namespace symbols, so can't set breakpoints in them. Though the discussion seems to go back and forth on that.
https://firefox-source-docs.mozilla.org/code-quality/coding-style/coding_style_cpp.html
Search for "Anonymous namespaces". Suggests preferring "static" to anonymous namespaces where applicable, because of poor debugger support for anonymous namespaces.
https://sourceware.org/bugzilla/show_bug.cgi?id=16874
Bug for similar gdb problems.
Avoid using the C++ Standard Library.
Historically, HotSpot has mostly avoided use of the Standard Library.
(It used to be impossible to use most of it in shared code, because the build configuration for Solaris with Solaris Studio made all but a couple of pieces inaccessible. Support for header-only parts was added in mid-2017. Support for Solaris was removed in 2020.)
Some reasons for this include
Exceptions. Perhaps the largest core issue with adopting the use of Standard Library facilities is exceptions. HotSpot does not use exceptions and, for platforms which allow doing so, builds with them turned off. Many Standard Library facilities implicitly or explicitly use exceptions.
assert. An issue that is quickly encountered is the assert macro name collision (JDK-8007770). Some mechanism for addressing this would be needed before much of the Standard Library could be used. (Not all Standard Library implementations use assert in header files, but some do.)
Memory allocation. HotSpot requires explicit control over where allocations occur. The C++98/03 std::allocator class is too limited to support our usage. (Changes in more recent Standards may remove this limitation.)
Implementation vagaries. Bugs, or simply different implementation choices, can lead to different behaviors among the various Standard Libraries we need to deal with.
Inconsistent naming conventions. HotSpot and the C++ Standard use different naming conventions. The coexistence of those different conventions might appear jarring and reduce readability.
There are a few exceptions to this rule.
#include <new> to use placement new, std::nothrow, and std::nothrow_t.
#include <limits> to use std::numeric_limits.
#include <type_traits> with some restrictions, listed below.
#include <cstddef> to use std::nullptr_t and std::max_align_t.
Certain restrictions apply to the declarations provided by <type_traits>.
The alignof operator should be used rather than std::alignment_of<>.
TODO: Rather than directly #including (permitted) Standard Library headers, use a convention of #including wrapper headers (in some location like hotspot/shared/stdcpp). This provides a single place for dealing with issues we might have for any given header, esp. platform-specific issues.
Use type deduction only if it makes the code clearer or safer. Do not use it merely to avoid the inconvenience of writing an explicit type, unless that type is itself difficult to write. An example of the latter is a function template return type that depends on template parameters in a non-trivial way.
There are several contexts where types are deduced.
Function argument deduction. This is always permitted, and indeed encouraged. It is nearly always better to allow the type of a function template argument to be deduced rather than explicitly specified.
auto variable declarations (n1984). For local variables, this can be used to make the code clearer by eliminating type information that is obvious or irrelevant. Excessive use can make code much harder to understand.
Function return type deduction (n3638). Only use if the function body has a very small number of return statements, and generally relatively little other code.
Also see lambda expressions.
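The three deduction contexts can be sketched in a few lines. The function names larger_of, product_of and scaled_maximum are hypothetical; product_of is the kind of template whose return type depends on its parameters in a way that is awkward to spell out.

```cpp
#include <cassert>

// Function argument deduction: T is deduced at each call site.
template<typename T>
inline T larger_of(T a, T b) {
  return (a < b) ? b : a;
}

// Function return type deduction (C++14): reasonable here because the body
// is a single return statement and the type depends on both T and U.
template<typename T, typename U>
inline auto product_of(T a, U b) {
  return a * b;
}

inline long scaled_maximum(int x, int y, long scale) {
  assert(scale > 0);
  // auto for a local whose type is evident from its initializer (int).
  auto maximum = larger_of(x, y);
  return product_of(maximum, scale);  // int * long yields long.
}
```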
Substitution Failure Is Not An Error (SFINAE) is a template metaprogramming technique that makes use of template parameter substitution failures to make compile-time decisions.
C++11 relaxed the rules for what constitutes a hard-error when attempting to substitute template parameters with template arguments, making most deduction errors be substitution errors; see (n2634). This makes SFINAE more powerful and easier to use. However, the implementation complexity for this change is significant, and this seems to be a place where obscure corner-case bugs in various compilers can be found. So while this feature can (and indeed should) be used (and would be difficult to avoid), caution should be used when pushing to extremes.
Here are a few closely related example bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95468
https://developercommunity.visualstudio.com/content/problem/396562/sizeof-deduced-type-is-sometimes-not-a-constant-ex.html
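For readers unfamiliar with the technique, here is a deliberately simple SFINAE sketch using std::enable_if from <type_traits>. The function type_category and its return codes are hypothetical. Each overload is viable only when its condition holds; a failed substitution silently removes the other overload instead of producing an error.

```cpp
#include <cassert>
#include <type_traits>

// Selected only when T is an integral type.
template<typename T,
         typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
inline int type_category(T) {
  return 1;
}

// Selected only when T is a floating point type.
template<typename T,
         typename std::enable_if<std::is_floating_point<T>::value, int>::type = 0>
inline int type_category(T) {
  return 2;
}

// Usage: the argument type alone picks the overload.
inline void type_category_demo() {
  assert(type_category(42) == 1);
  assert(type_category(1.5) == 2);
}
```

Staying close to simple, well-trodden forms like this one is the cautious end of the spectrum described above.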
Where appropriate, scoped-enums should be used. (n2347)
Use of unscoped-enums is permitted, though ordinary constants may be preferable when the automatic initializer feature isn't used.
The underlying type (the enum-base) of an unscoped enum type should always be specified explicitly. When unspecified, the underlying type is dependent on the range of the enumerator values and the platform.
The underlying type of a scoped-enum should also be specified explicitly if conversions may be applied to values of that type.
Due to bugs in certain (very old) compilers, there is widespread use of enums and avoidance of in-class initialization of static integral constant members. Compilers having such bugs are no longer supported. Except where an enum is semantically appropriate, new code should use integral constants.
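A short sketch of the enum guidance, with hypothetical names throughout (CardState, LegacyFlag, CardTableSketch, and the value 9 are all invented for illustration).

```cpp
#include <cassert>
#include <stdint.h>

// Scoped enum with an explicit underlying type, since values are converted.
enum class CardState : uint8_t {
  clean = 0,
  dirty = 1
};

// Unscoped enum: the underlying type is always given explicitly.
enum LegacyFlag : uint32_t {
  legacy_flag_none   = 0,
  legacy_flag_marked = 1u << 0
};

// Where an enum is not semantically appropriate, use an integral constant.
class CardTableSketch {
public:
  static const int card_shift = 9;  // Hypothetical value.
};

static_assert(sizeof(CardState) == sizeof(uint8_t), "underlying type is fixed");
static_assert(sizeof(LegacyFlag) == sizeof(uint32_t), "underlying type is fixed");

// Conversions to and from a scoped enum are always explicit.
inline uint8_t encode(CardState state) {
  return static_cast<uint8_t>(state);
}

inline CardState decode(uint8_t bits) {
  assert(bits <= 1);
  return static_cast<CardState>(bits);
}
```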
Alignment-specifiers (alignas, n2341) are permitted, with restrictions.
Alignment-specifiers are permitted when the requested alignment is a fundamental alignment (not greater than alignof(std::max_align_t); C++14 3.11/2).
Alignment-specifiers with an extended alignment (greater than alignof(std::max_align_t); C++14 3.11/3) may only be used to align variables with static or automatic storage duration (C++14 3.7.1, 3.7.3). As a consequence, over-aligned types are forbidden; this may change if HotSpot updates to using C++17 or later (p0035r4).
Large extended alignments should be avoided, particularly for stack allocated objects. What is a large value may depend on the platform and configuration. There may also be hard limits for some platforms.
An alignment-specifier must always be applied to a definition (C++14 10.6.2/6). (C++ allows an alignment-specifier to optionally also be applied to a declaration, so long as the definition has equivalent alignment. There isn't any known benefit from duplicating the alignment in a non-definition declaration, so such duplication should be avoided in HotSpot code.)
Enumerations are forbidden from having alignment-specifiers. Aligned enumerations were originally permitted but insufficiently specified, and were later (C++20) removed (CWG 2354). Permitting such usage in HotSpot now would just cause problems in the future.
Alignment-specifiers are forbidden in typedef and alias-declarations. This may work or may have worked in some versions of some compilers, but was later (C++14) explicitly disallowed (CWG 1437).
The HotSpot macro ATTRIBUTE_ALIGNED provides similar capabilities for platforms that define it. This macro predates the use by HotSpot of C++ versions providing alignas. New code should use alignas.
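A sketch of the two permitted uses of alignas. It assumes that 8 is a fundamental alignment and that 64 is an extended alignment on the platform at hand, which holds on common 64-bit targets but is not guaranteed; PaddedCounter, scratch_area and is_aligned_to are hypothetical names.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Fundamental alignment applied to a data member: permitted.
struct PaddedCounter {
  alignas(8) int _value;
};

static_assert(alignof(PaddedCounter) == 8, "member alignment raises the type's");

// Extended alignment (assumed here to exceed alignof(std::max_align_t)):
// only on a variable with static or automatic storage duration, and on
// its definition.
alignas(64) static char scratch_area[64];

inline bool is_aligned_to(const void* p, size_t alignment) {
  assert(alignment != 0);
  return (reinterpret_cast<uintptr_t>(p) % alignment) == 0;
}
```

Note that the extended alignment is attached to one variable rather than to a type, so no over-aligned type is introduced.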
Avoid use of thread_local (n2659); instead, use the HotSpot macro THREAD_LOCAL, for which the initializer must be a constant expression. When thread_local must be used, use the HotSpot macro APPROVED_CPP_THREAD_LOCAL to indicate that the use has been given appropriate consideration.
As was discussed in the review for JDK-8230877, thread_local allows dynamic initialization and destruction semantics. However, that support requires a run-time penalty for references to non-function-local thread_local variables defined in a different translation unit, even if they don't need dynamic initialization. Dynamic initialization and destruction of non-local thread_local variables also have the same ordering problems as for ordinary non-local variables. So we avoid use of thread_local in general, limiting its use to only those cases where dynamic initialization or destruction are essential. See JDK-8282469 for further discussion.
Use nullptr (n2431) rather than NULL. See the paper for reasons to avoid NULL.
Don't use (constant expression or literal) 0 for pointers. Note that C++14 removed non-literal 0 constants from null pointer constants, though some compilers continue to treat them as such. For historical reasons there may be lingering uses of 0 as a pointer.
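One practical difference can be sketched with a pair of overloads: a literal 0 is an int before it is anything else, while nullptr converts only to pointer types. The functions describe, is_empty and describe_demo are hypothetical.

```cpp
#include <cassert>

// Two overloads that a null pointer constant must choose between.
inline int describe(int)         { return 1; }  // Integer overload.
inline int describe(const char*) { return 2; }  // Pointer overload.

// Explicit comparison against nullptr, never against 0.
inline bool is_empty(const char* text) {
  return (text == nullptr) || (text[0] == '\0');
}

inline void describe_demo() {
  assert(describe(0) == 1);        // 0 selects the integer overload.
  assert(describe(nullptr) == 2);  // nullptr selects the pointer overload.
}
```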
Do not use facilities provided by the <atomic> header (n2427), (n2752); instead, use the HotSpot Atomic class and related facilities.
Atomic operations in HotSpot code must have semantics which are consistent with those provided by the JDK's compilers for Java. There are platform-specific implementation choices that a C++ compiler might make or change that are outside the scope of the C++ Standard, and might differ from what the Java compilers implement.
In addition, HotSpot Atomic has a concept of "conservative" memory ordering, which may differ from (may be stronger than) sequentially consistent. There are algorithms in HotSpot that are believed to rely on that ordering.
The use of uniform initialization (n2672), also known as brace initialization, is permitted.
Some relevant sections from cppreference.com:
Although related, the use of std::initializer_list remains forbidden, as part of the avoidance of the C++ Standard Library in HotSpot code.
Prefer [&] as the capture list of a lambda expression.
A lambda expression must not be mutable.
Single-use function objects can be defined locally within a function, directly at the point of use. This is an alternative to having a function or function object class defined at class or namespace scope.
This usage was somewhat limited by C++03, which does not permit such a class to be used as a template parameter. That restriction was removed by C++11 (n2657). Use of this feature is permitted.
Many HotSpot protocols involve "function-like" objects that involve some named member function rather than a call operator. For example, a function that performs some action on all threads might be written as
void do_something() {
  struct DoSomething : public ThreadClosure {
    virtual void do_thread(Thread* t) {
      ... do something with t ...
    }
  } closure;
  Threads::threads_do(&closure);
}
HotSpot code has historically usually placed the DoSomething class at namespace (or sometimes class) scope. This separates the function's code from its use, often to the detriment of readability. It requires giving the class a globally unique name (if at namespace scope). It also loses the information that the class is intended for use in exactly one place, and does not have any subclasses. (However, the latter can now be indicated by declaring it final.) Often, for simplicity, a local class will skip things like access control and accessor functions, giving the enclosing function direct access to the implementation and eliminating some boilerplate that might be provided if the class is in some outer (more accessible) scope. On the other hand, if there is a lot of surrounding code in the function body or the local class is of significant size, defining it locally can increase clutter and reduce readability.
C++11 added lambda expressions as a new way to write a function object. Simple lambda expressions can be significantly more concise than a function object, eliminating a lot of boiler-plate. On the other hand, a complex lambda expression may not provide much, if any, readability benefit compared to an ordinary function object. Also, while a lambda can encapsulate a call to a "function-like" object, it cannot be used in place of such.
A common use for local functions is as one-use RAII objects. The amount of boilerplate for a function object class (local or not) makes such usage somewhat clumsy and verbose. But with the help of a small amount of supporting utility code, lambdas work particularly well for this use case.
Another use for local functions is partial application. Again here, lambdas are typically much simpler and less verbose than function object classes.
Because of these benefits, lambda expressions are permitted in HotSpot code, with some restrictions and usage guidance. An anonymous lambda is one which is passed directly as an argument. A named lambda is the value of a variable, which is its name.
Lambda expressions should only be passed downward. In particular, a
lambda should not be returned from a function or stored in a global
variable, whether directly or as the value of a member of some other
object. Lambda capture is syntactically subtle (by design), and
propagating a lambda in such ways can easily pass references to captured
values to places where they are no longer valid. In particular, members
of the enclosing this object are effectively captured by reference,
even if the default capture is by-value. For such use-cases
a function object class should be used to make the desired value
capturing and propagation explicit.
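The contrast between a downward lambda and an escaping callable might look like the following sketch. `AddOffset`, `make_adder`, `apply_twice`, and `downward_example` are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Explicit function object: the captured value is a visible member, so it
// is clear what is copied and how long it lives.
class AddOffset {
  int _offset;   // Held by value; safe after the creating scope exits.
public:
  explicit AddOffset(int offset) : _offset(offset) {}
  int operator()(int x) const { return x + _offset; }
};

// Escaping use: the callable outlives this function's locals, so a
// function object class is used instead of a lambda.
AddOffset make_adder(int base) {
  int offset = base * 2;
  // A [&] lambda referring to offset would dangle once this function
  // returns.
  return AddOffset(offset);
}

// Downward use: the lambda never outlives the variables it refers to.
template<typename F>
int apply_twice(F f, int x) { return f(f(x)); }

int downward_example(int step) {
  return apply_twice([&] (int x) { return x + step; }, 0);
}
```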
Limiting the capture list to [&] (implicitly capture by reference) is a
simplifying restriction that still provides good
support for HotSpot usage, while reducing the cases a reader must
recognize and understand.
Many common lambda uses require reference capture. Not permitting it would substantially reduce the utility of lambdas.
Referential transparency. Implicit reference capture makes variable references in the lambda body have the same meaning they would have in the enclosing code. There isn't a semantic barrier across which the meaning of a variable changes.
Explicit reference capture introduces significant clutter, especially when lambda expressions are relatively small and simple, as they should be in HotSpot code.
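To make the referential transparency point concrete, here is a small sketch, assuming a hypothetical iteration helper `for_each_int`. Inside the lambda, `threshold` and `count` denote the same objects as in the enclosing function.

```cpp
#include <cassert>

// Hypothetical iteration helper; calls f on each element.
template<typename F>
void for_each_int(const int* data, int n, F f) {
  for (int i = 0; i < n; ++i) f(data[i]);
}

int count_above(const int* data, int n, int threshold) {
  int count = 0;
  // With [&], threshold and count name the same objects as in the
  // enclosing function, so the update is visible after the call.
  for_each_int(data, n, [&] (int v) { if (v > threshold) ++count; });
  return count;
}
```

Writing the capture list as `[&count, &threshold]` would behave the same here but adds clutter to a one-line lambda.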
There are a number of reasons why by-value capture might be used, but for the most part they don't apply to HotSpot code, given other usage restrictions.
A primary use-case for by-value capture is to support escaping uses, where values captured by-reference might become invalid. That use-case doesn't apply if only downward lambdas are used.
By-value capture can also make a lambda-local copy for mutation,
which requires making the lambda mutable; see below.
By-value capture might be viewed as an optimization, avoiding any overhead for reference capture of cheap to copy values. But the compiler can often eliminate any such overhead.
By-value capture by a non-mutable lambda makes the captured values
const, preventing any modification by the lambda and
making the captured value unaffected by modifications to the outer
variable. But this only applies to captured auto variables, not member
variables, and is inconsistent with referential transparency.
Non-capturing lambdas (with an empty capture list - []) have limited
utility. There are cases where no captures
are required (pure functions, for example), but if the function is small
and simple then that's obvious anyway.
Capture initializers (a C++14 feature - N3649) are not permitted. Capture initializers inherently increase the complexity of the capture list, and provide little benefit over an additional in-scope local variable.
The use of mutable lambda expressions is forbidden
because there don't seem to be many, if any, good use-cases for them in
HotSpot. A lambda expression needs to be mutable in order to modify a
by-value captured value. But with only downward lambdas, such usage
seems likely to be rare and complicated. It is better to use a function
object class in any such cases that arise, rather than requiring all
HotSpot developers to understand this relatively obscure feature.
While it is possible to directly invoke an anonymous lambda expression, that feature should not be used, as such a form can be confusing to readers. Instead, name the lambda and call it by name.
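A brief sketch of the preferred form, using a hypothetical function `pick_limit`:

```cpp
#include <cassert>

int pick_limit(bool aggressive, int base) {
  // Discouraged: invoking the lambda expression directly, as in
  //   const int limit = [&] { return aggressive ? base * 4 : base; }();
  // Preferred: name the lambda, then call it by name.
  auto compute_limit = [&] { return aggressive ? base * 4 : base; };
  const int limit = compute_limit();
  return limit;
}
```

The trailing `()` of a directly invoked lambda is easy to miss, which is why the named form tends to read better.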
Some reasons to prefer a named lambda instead of an anonymous lambda are:
The body contains non-trivial control flow or declarations or other nested constructs.
Its role in an argument list is hard to guess without examining the function declaration. Give it a name that indicates its purpose.
It has an unusual capture list.
It has a complex explicit return type or parameter types.
Lambda expressions, and particularly anonymous lambda expressions, should be simple and compact. One-liners are good. Anonymous lambdas should usually be limited to a couple lines of body code. More complex lambdas should be named. A named lambda should not clutter the enclosing function and make it long and complex; do continue to break up large functions via the use of separate helper functions.
An anonymous lambda expression should either be a one-liner in a one-line expression, or isolated in its own set of lines. Don't place part of a lambda expression on the same line as other arguments to a function. The body of a multi-line lambda argument should be indented from the start of the capture list, as if that were the start of an ordinary function definition. The body of a multi-line named lambda should be indented one step from the variable's indentation.
Some examples:
1. foo([&] { ++counter; });
2. foo(x, [&] { ++counter; });
3. foo([&] { if (predicate) ++counter; });
4. foo([&] { auto tmp = process(x); tmp.f(); return tmp.g(); });
5. Separate one-line lambda from other arguments:
       foo(c.begin(), c.end(),
           [&] (const X& x) { do_something(x); return x.value(); });
6. Indentation for multi-line lambda:
       c.do_entries([&] (const X& x) {
                      do_something(x, a);
                      do_something1(x, b);
                      do_something2(x, c);
                    });
7. Separate multi-line lambda from other arguments:
       foo(c.begin(), c.end(),
           [&] (const X& x) {
             do_something(x, a);
             do_something1(x, b);
             do_something2(x, c);
           });
8. Multi-line named lambda:
       auto do_entry = [&] (const X& x) {
         do_something(x, a);
         do_something1(x, b);
         do_something2(x, c);
       };
Item 4, and especially items 6 and 7, are pushing the simplicity limits for anonymous lambdas. Item 6 might be better written using a named lambda:
c.do_entries(do_entry);
Note that C++11 also added bind expressions as a way to write a function
object for partial application, using std::bind and related facilities
from the Standard Library. std::bind generalizes and replaces some of
the binders from C++03. Bind expressions are not permitted in HotSpot
code. They don't
provide enough benefit over lambdas or local function classes in the
cases where bind expressions are applicable to warrant the introduction
of yet another mechanism in this space into HotSpot code.
Do not use inheriting constructors (n2540).
C++11 provides simple syntax allowing a class to inherit the constructors of a base class. Unfortunately there are a number of problems with the original specification, and C++17 contains significant revisions (p0136r1 opens with a list of 8 Core Issues). Since HotSpot doesn't support use of C++17, use of inherited constructors could run into those problems. Such uses might also change behavior in a future HotSpot update to use C++17 or later, potentially in subtle ways that could lead to hard-to-diagnose problems. Because of this, HotSpot code must not use inherited constructors.
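One way to follow this rule is to declare and forward constructors explicitly, as in the sketch below. `Base` and `Derived` are placeholder classes invented for this example.

```cpp
#include <cassert>

class Base {
  int _value;
public:
  explicit Base(int value) : _value(value) {}
  int value() const { return _value; }
};

class Derived : public Base {
public:
  // Not permitted: using Base::Base;
  // Instead, declare the constructor and forward to the base explicitly.
  explicit Derived(int value) : Base(value) {}
  int doubled() const { return value() * 2; }
};
```

The explicit constructor has the same meaning under C++11, C++14, and C++17, so it is unaffected by the p0136r1 changes.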
Note that gcc7 provides the -fnew-inheriting-ctors option to use the
p0136r1 semantics. This is enabled by default when using C++17 or later.
It is also enabled by default for -fabi-version=11 (introduced by gcc7)
or higher when using
C++11/14, as the change is considered a Defect Report that applies to
those versions. Earlier versions of gcc don't have that option, and
other supported compilers may not have anything similar.
The use of some attributes (n2761) (listed below) is permitted. (Note that some of the attributes defined in that paper didn't make it into the final specification.)
Attributes are syntactically permitted in a broad set of locations, but specific attributes are only permitted in a subset of those locations. In some cases an attribute that appertains to a given element may be placed in any of several locations with the same meaning. In those cases HotSpot has a preferred location.
Only the following attributes are permitted:
[[noreturn]]
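A minimal sketch of `[[noreturn]]` in use follows. The function names are hypothetical, and the attribute placement shown (at the start of the declaration) is just one valid position, not necessarily the location HotSpot prefers.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical error-reporting function. The attribute tells the compiler
// that control never returns to the caller.
[[noreturn]] void report_fatal() {
  std::abort();
}

int checked_divide(int a, int b) {
  if (b == 0) {
    report_fatal();   // No return statement is needed after this call.
  }
  return a / b;
}
```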
The following attributes are expressly forbidden:
[[carries_dependency]] - Related to memory_order_consume.
[[deprecated]] - Not relevant in HotSpot code.

alignof (n2341)
Sized deallocation (n3778)
Static assertions (n1720)
Right angle brackets (n1757)
Default template arguments for function templates (CWG D226)
Template aliases (n2258)
Delegating constructors (n1986)
Explicit conversion operators (n2437)
Standard Layout Types (n2342)
Defaulted and deleted functions (n2346)
Dynamic initialization and destruction with concurrency (n2660)
final virtual specifiers for classes and virtual functions (n2928), (n3206), (n3272)
override virtual specifiers for virtual functions (n2928), (n3206), (n3272)
Unrestricted Unions (n2544)
New string and character literals
HotSpot doesn't need any of the new character and string literal types.
User-defined literals (n2765) — User-defined literals should not be added casually, but only through a proposal to add a specific UDL.
Inline namespaces (n2535) — HotSpot makes very limited use of namespaces.
using namespace directives. In particular, don't use using namespace std; to avoid needing to qualify Standard Library names.
Propagating exceptions (n2179) — HotSpot does not permit the use of exceptions, so this feature isn't useful.
Avoid non-local variables with non-constexpr initialization. In particular, avoid variables with types requiring non-trivial initialization or destruction. Initialization order problems can be difficult to deal with and lead to surprises, as can destruction ordering. HotSpot doesn't generally try to clean up on exit, and running destructors at exit can also lead to problems.
Avoid most operator overloading, preferring named functions. When operator overloading is used, ensure the semantics conform to the normal expected behavior of the operation.
Avoid most implicit conversion constructors and (implicit or explicit) conversion operators. (Note that conversion to bool isn't needed in HotSpot code because of the "no implicit boolean" guideline.)
Avoid goto statements.
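For the guideline above on non-local variables, the following sketch shows namespace-scope objects that are constant-initialized and trivially destructible. `max_entries`, `Range`, and `default_range` are names invented for this example.

```cpp
#include <cassert>

// Constant-initialized: no dynamic initialization order to worry about.
constexpr int max_entries = 64;

// A literal type with a constexpr constructor and a trivial destructor can
// also be constant-initialized at namespace scope.
struct Range {
  int lo;
  int hi;
  constexpr Range(int l, int h) : lo(l), hi(h) {}
  constexpr int size() const { return hi - lo; }
};
constexpr Range default_range(0, max_entries);

// Checked at compile time, which confirms no runtime initializer is needed.
static_assert(default_range.size() == 64, "constant-initialized");

// To be avoided: a namespace-scope object whose type has a non-trivial
// constructor or destructor, since it needs dynamic initialization at
// startup and a destructor call at exit.
```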
This list is incomplete; it serves to explicitly call out some features that have not yet been discussed.