Ecoop2024_TIforWildFJ/introduction.tex
2024-05-31 00:10:22 +02:00

386 lines
19 KiB
TeX

%TODO:
% -Explain <c
% - Kapitel 1.2 + 1.3 ist noch zu komplex
% - Constraints und Unifikation erklären, bevor Unifikation erwähnt wird
%Inhalt:
% Global type inference example (the sameList example)
% Wildcards are needed because Java has side effects
% Capture Conversion
% Explain difference between local and global type inference
% \begin{recap}\textbf{TI for FGJ without Wildcards:}
% \TFGJ{} generates subtype constraints $(\type{T} \lessdot \type{T})$ consisting of named types and type placeholders.
% For example the method invocation \texttt{concat(l, new Object())} generates the constraints
% $\tv{l} \lessdot \exptype{List}{\tv{a}}, \type{Object} \lessdot \tv{a}$.
% Subtyping without the use of wildcards is invariant \cite{FJ}:
% Therefore the only possible solution for the type placeholder $\tv{a}$ is $\tv{a} \doteq \type{Object}$. % in Java and Featherweight Java.
% The correct type for the variable \texttt{l} is $\exptype{List}{\type{Object}}$ (or a direct subtype).
% %- usually the type of e must be subtypes of the method parameters
% %- in case of a polymorphic method: type placeholders resemble type parameters
% The type inference algorithm for Featherweight Generic Java \cite{TIforFGJ} (called \TFGJ{}) is complete and sound:
% It is able to find a type solution for a Featherweight Generic Java program, which has no type annotations at all,
% if there is any.
% It's only restriction is that no polymorphic recursion is allowed.
% \end{recap}
\section{Introduction}
\label{sec:introduction}
Type inference is a hallmark of functional programming, where it
improves readability and conciseness while
enabling safe rapid prototyping by allowing the compiler to deduce
types automatically without explicit type annotations. It allows
developers to safely refactor their programs before fixing function
signatures. In object-oriented programming (OOP), however, the use of
type inference is more restricted. It is often employed for local
variables and the instantiation of generic type parameters, offering
similar readability and conciseness benefits as in functional
programming. In languages like Java, the overall architecture of
method signatures must be designed by the programmer up-front.
Java features a limited form of local type inference for local
variables defined with the \texttt{var} keyword, lambda expressions,
and method calls. Java infers the type of variables defined with
\texttt{var}. The rationale here is that instantiations of generic
types (e.g., maps of maps) are often unwiedly and they can be easily
inferred from the initializing expression. The type of a lambda
expression is inferred from the target type, i.e., the type of the
method argument that accepts the lambda. The rationale is the intended
use of lambdas as callback functions for listeners etc. For calls
to generic methods, Java infers the most specific instantiation of the
generic parameters for each individual call.
The local type inference features of Java add some convenience, but
programmers still have to provide method signatures beforehand. So there
is scope for a global type inference algorithm that infers
method signatures even if no type information (except class headers
and types of field variables) is given.
We present such a global type inference algorithm for Featherweight
Generic Java with wildcards, a Java core calculus with generics and
wildcards in the tradition of Featherweight Java \cite{FJ} and its
extensions \cite{WildFJ,WildcardsNeedWitnessProtection}. Our approach can also be used for regular Java programs,
but we limit the formal presentation to this core calculus.
%The goal is to find a correct typing for a given Java program.
Our algorithm enables programmers to write Java code with only a
minimum of type annotations (class headers and types of field
variables), it can fill in missing type annotations in partially
type-annotated programs, and it can suggest better (more general)
types for ordinary, typed Java programs.
Listing~\ref{lst:intro-example-typeless} shows an example input of a
method implementation without any type annotation. Our algorithm
infers the type annotations in Listing~\ref{lst:intro-example-typed}:
it adds the type arguments to \texttt{List} at object creation and it
infers the most specific return type \texttt{List<?>}, which is a
wildcard type. Our algorithm is the first to infer wildcard types that
comes with a soundness proof.
Previous work on global type inference for Java either does not
consider wildcards or it simplifies the problem by not modeling key
features of Java wildcards.
%The algorithm proposed in this paper can determine a correct typing for the untyped Java source code example shown in listing \ref{lst:intro-example-typeless}.
%In this case our algorithm would also be able to propose a solution including wildcards as shown in listing \ref{lst:intro-example-typed}.
%This paper extends a type inference algorithm for Featherweight Java \cite{TIforFGJ} by adding wildcards.
%The last step to create a type inference algorithm compatible to the Java type system.
\begin{figure}[tp]
\begin{minipage}{0.43\textwidth}
\begin{lstlisting}[style=java,label=lst:intro-example-typeless,caption=Missing return type]
genList() {
if( ... ) {
return new List(1);
} else {
return new List("Str");
}
}
\end{lstlisting}
\end{minipage}%
\hfill
\begin{minipage}{0.55\textwidth}
\begin{lstlisting}[style=tfgj,caption=Type inference solution,label=lst:intro-example-typed]
List<?> genList() {
if( ... ) {
return new List<Integer>(1);
} else {
return new List<String>("Str");
}
}
\end{lstlisting}
\end{minipage}
\end{figure}
Global Type Inference for Featherweight Java \cite{TIforFGJ} is a
sound, but incomplete inference algorithm for Featherweight Generic
Java. It does not support wildcards, which means that it infers the
return type \texttt{Object} for the method \texttt{genList} in
Listing~\ref{lst:intro-example-typeless}. Like our algorithm, their
algorithm restricts to monomorphic recursion, which leads to
incompleteness.
A type unification algorithm for Java with wildcards \cite{plue09_1} states the same capabilities,
but it does not handle capture conversion correctly because it only supports types which are expressible in Java syntax (more details in
section~\ref{challenges}).
Moreover, it appears that the subtype relation changes depending on whether a type is used as an argument to a method invocation.
We resolve this problem by modeling Java wildcards as existential
types \cite{WildFJ,WildcardsNeedWitnessProtection}, which also serve
as the basis of our soundness proof.
% forces us to define a new kind of subtype constraint. %, the current state of the art
%and is able to deal with types that are not directly denotable in Java.
%The algorithm presented in this paper is able to solve all those challenges correctly
%and it's correctness is proven using a Featherweight Java calculus \cite{WildFJ}.
%But they are all correctly solved by our new type inference algorithm presented in this paper.
\begin{lstlisting}[caption=Part of a valid Java program, style=tfgj, label=lst:tiExample,float=tp]
<T> List<T> emptyList() { ... }
List<String> ls = emptyList();
\end{lstlisting}
Standard Java provides type inference in a restricted form % namely {Local Type Inference}.
which only works for local environments where the surrounding context has known types.
The example in Listing~\ref{lst:tiExample} exhibits the main differences to our global type inference algorithm.
The method call \texttt{emptyList} lacks its type parameters.
Java relies on a matching algorithm \cite{javaTIisBroken} to instantiate \texttt{T} with \texttt{String}
resulting in the correct expected type \texttt{List<String>}.
This local type inference depends on the type annotation on the left side of the assignment.
When calling the \texttt{emptyList} method without this type context
its return value will be inferred as \texttt{List<Object>}.
Therefore, Java rejects the code snippet in Listing~\ref{lst:tiLimit}:
it infers the type \texttt{List<Object>} for
\texttt{emptyList()} instead of the required
$\exptype{List}{\exptype{List}{String}}$.
\begin{lstlisting}[caption=Limitations of Java's Type Inference,
label=lst:tiLimit, float=tp]
emptyList().add(new List<String>())
.get(0)
.get(0); //Typing Error
\end{lstlisting}
%List<String> <. A
%List<A> <: List<B>, B <: List<C>
% B = A and therefore A on the left and right side of constraints.
% this makes matching impossible
Hence, the second call to \texttt{get} produces a type error.
The type inference algorithm presented in this paper correctly instantiates the type parameter \texttt{T} of the \texttt{emptyList}
method with \texttt{List<List<String>>} and render this code snippet correct.
% %motivate constraints:
% To solve this example our Type Inference algorithm will create constraints
% $
% \exptype{List}{\tv{a}} \lessdot \tv{b},
% \tv{b} \lessdot \exptype{List}{\tv{c}},
% \tv{c} \lessdot \exptype{List}{\tv{d}}
% $
%\subsection{Conclusion}
% Additions: TODO
% - Global Type Inference Algorithm, no type annotations are needed
% - Soundness Proof
% - Easy to implement
% - Capture Conversion support
% - Existential Types support
We summarize our contributions:
\begin{itemize}
\item
We introduce the language \tifj{} (section \ref{sec:tifj}),
a Featherweight Java derivative including generics, wildcards, and type inference.
\item
Our algorithm handles existential types in a form which is not
denotable by Java syntax \cite{aModelForJavaWithWildcards}. Thus, we support capture conversion and Java style method calls.
\item
We present a novel approach to deal with existential types and capture conversion during constraint unification.
% \item The algorithm is split in two parts. A constraint generation step and an unification step.
% Input language and constraint generations can be extended without altering the unification part.
\item
We prove soundness and aim for a good compromise between completeness and time complexity.
\end{itemize}
% Our algorithm finds a correct type solution for the following example, where the Java local type inference fails:
% \begin{verbatim}
% class SuperPair<A,B>{
% A a;
% B b;
% }
% class Pair<A,B> extends SuperPair<B,A>{
% A a;
% B b;
% <X> X choose(X a, X b){ return b; }
% String m(List<? extends Pair<Integer, String>> a, List<? extends Pair<Integer, String>> b){
% return choose(choose(a,b).value.a,b.value.b);
% }
% }
% \end{verbatim}
% \subsection{Wildcards}
% Java subtyping involving generics is invariant.
% For example \texttt{List<String>} is not a subtype of \texttt{List<Object>}.
% %Wildcards introduce variance by allowing \texttt{List<String>} to be a subtype of \texttt{List<?>}.
% \texttt{List<Object>} is not a valid return type for the method \texttt{genList}.
% The type inference algorithm has to find the correct type involving wildcards (\texttt{List<?>}).
\section{Java Wildcards}
\label{sec:java-wildcards}
As Java is an imperative language, subtyping for generic types is invariant.
%but it incooperates use-site variance via so called wildcard types.
Even though \texttt{String} is subtype of \texttt{Object},
a \texttt{List<String>} is not a subtype of \texttt{List<Object>}:
because someone might store an \texttt{Integer} in the list, which is
compatible with \texttt{Object}, but not with \texttt{String} (see Listing~\ref{lst:invarianceExample}).
Invariance is overly restrictive in read-only or write-only
contexts. Hence, Java incooperates use-site variance by allowing wildcards (\texttt{?}) in types.
For example, the type \texttt{List<?>} (short for \texttt{List<? extends Object>}, with \texttt{?} being a placeholder for any type)
is a supertype of \texttt{List<String>} and \texttt{List<Object>}.
%The \texttt{?} is a wildcard type which can be replaced by any type as needed.
%
Listing~\ref{lst:wildcardIntro} shows a use of wildcards that renders the assignment \texttt{lo = ls} correct.
The program still does not compile, because the addition of an \texttt{Integer} to \texttt{lo} is still incorrect.
\begin{figure}[tp]
\begin{minipage}{0.48\textwidth}
\begin{lstlisting}[caption=Java Invariance Example,label=lst:invarianceExample]{java}
List<String> ls = ...;
List<Object> lo = ...;
lo = ls; // typing error!
lo.add(new Integer(1));
\end{lstlisting}
\end{minipage}
\hfill
\begin{minipage}{0.5\textwidth}
\begin{lstlisting}[caption=Use-Site Variance Example,label=lst:wildcardIntro]{java}
List<String> ls = ...;
List<? extends Object> lo = ...;
lo = ls; // correct
lo.add(new Integer(1)); // error!
\end{lstlisting}
\end{minipage}
\end{figure}
Wildcard types like \texttt{List<?>} are virtual types, i.e., the run-time
type of an object is always a fully instantiated type like
\texttt{List<String>} or \texttt{List<Object>}.
The issue is that the run-time type underlying a wildcard type can
change at any time, for example when multiple threads share a reference to the same field.
Hence, a wildcard \texttt{?} must be considered a different type everytime it is accessed.
For that reason, the call to the method \texttt{concat} with two wildcard lists in Listing~\ref{lst:concatError} is rejected.
% The \texttt{concat} method does not create a new list,
% but adds all elements from the second argument to the list given as the first argument.
% This is allowed in Java, because both lists are of the polymorphic type \texttt{List<X>}.
% As shown in listing \ref{lst:concatError} this leads to a inconsistent \texttt{List<String>}
% if Java would treat \texttt{?} as a regular type and instantiate the type variable \texttt{X}
% of the \texttt{concat} function with \texttt{?}.
\begin{figure}[tp]
\begin{lstlisting}[caption=Wildcard Example with faulty call to a concat method,label=lst:concatError]{java}
<X> List<X> concat(List<X> l1, List<X> l2){
return l1.addAll(l2);
}
List<?> l1 = new List<String>("foo");
List<?> l2 = new List<Integer>(1); // List containing Integer
concat(l1, l2); // Error! Would concat two different lists
\end{lstlisting}
\end{figure}
To determine the correctness of method calls involving wildcard types Java's typechecker
makes use of a concept called \textbf{capture conversion}.
% was designed to make Java wildcards useful.
% - without capture conversion
% - is used to open wildcard types
% -
One way to formalize this concept is by replacing wildcards with
existential types and modeling capture conversion with suitably
inserted let statements \cite{WildcardsNeedWitnessProtection}.
Our Featherweight Java derivative called \TamedFJ{} is modeled after
Bierhoff's calculus \cite{WildcardsNeedWitnessProtection} (see section~\ref{sec:tifj}).
To express the example in Listing~\ref{lst:wildcardIntro} in our calculus we first translate the wildcard types:
\texttt{List<? extends Object>} becomes
$\wctype{\wildcard{A}{\type{Object}}{\bot}}{List}{\rwildcard{A}}$,
where the existentially bound variable \texttt{A} has a lower bound
$\bot$ and an upper bound $\type{Object}$.
Before we can call the \texttt{add} method on this type we perform
capture conversion by inserting a let statement:
\begin{lstlisting}
let v : (*@$\wctype{\wildcard{A}{\type{Object}}{\bot}}{List}{\rwildcard{A}}$@*) = lo in v.<A>add(new Integer(1));
\end{lstlisting}
The variable \expr{lo} (from Listing~\ref{lst:wildcardIntro}) is
assigned to a new immutable variable \expr{v} with type
$\wctype{\wildcard{A}{\type{Object}}{\bot}}{List}{\rwildcard{X}}$,
but inside the let statement the variable \expr{v} will be treated as
$\exptype{List}{\rwildcard{A}}$.
Here $\rwildcard{A}$ is a fresh variable or a captured wildcard.
The only information we have about $\rwildcard{A}$ is that it is a
supertype of $\bot$ and a subtype of $\type{Object}$
It is important to give the captured wildcard type $\rwildcard{A}$ an unique name which is used nowhere else.
This approach also clarifies why the method call to \texttt{concat}
in listing \ref{lst:concatError} is rejected (see Listing~\ref{lst:concatTamedFJ}).
\begin{figure}[tp]
\begin{lstlisting}[style=TamedFJ,caption=\TamedFJ{} representation of the concat call from listing \ref{lst:concatError}, label=lst:concatTamedFJ]
let l1' : (*@$\wctype{\rwildcard{X}}{List}{\exptype{List}{\rwildcard{X}}}$@*) = l1 in
let l2' : (*@$\wctype{\rwildcard{Y}}{List}{\exptype{List}{\rwildcard{Y}}}$@*) = l2 in
concat(l1', l2') // Error!
\end{lstlisting}
\end{figure}
% % TODO intro to Featherweight Java
% is a formal model of the Java programming language reduced to a core set of instructions.
% - We extend this model by existential types and let expressions.
% - We copy this from \ref{WildFJ} but make type annotations optional
% - Our calculus is called \TamedFJ{}
% - \TamedFJ{} binds every method argument with a let statement.
% To enable the use of wildcards in argument types of a method invocation
% Java uses a process called \textit{Capture Conversion}.
% This behaviour is emulated by our language \TamedFJ{};
% a Featherweight Java \cite{FJ} derivative with added wildcard support
% and a global type inference feature (syntax definition in section \ref{sec:tifj}).
% %\TamedFJ{} is basically the language described by \textit{Bierhoff} \cite{WildcardsNeedWitnessProtection} with optional type annotations.
% Let's have a look at a representation of the \texttt{add} call from the last line in listing \ref{lst:wildcardIntro} with our calculus \TamedFJ{}:
% %The \texttt{add} call in listing \ref{lst:wildcardIntro} needs to be encased by a \texttt{let} statement in our calculus.
% %This makes the capture converion explicit.
% \begin{lstlisting}
% let v : (*@$\wctype{\wildcard{A}{\type{Object}}{\bot}}{List}{\rwildcard{A}}$@*) = lo in v.<A>add(new Integer(1));
% \end{lstlisting}
% The method call is encased in a \texttt{let} statement and
% \expr{lo} is assigned to a new variable \expr{v} of \linebreak[2]
% \textbf{Existential Type} $\wctype{\wildcard{A}{\type{Object}}{\bot}}{List}{\rwildcard{A}}$.
% Our calculus uses existential types \cite{WildFJ} to formalize wildcards:
% \texttt{List<? extends Object>} is translated to $\wctype{\wildcard{X}{\type{Object}}{\bot}}{List}{\rwildcard{X}}$
% and \texttt{List<? super String>} is expressed as $\wctype{\wildcard{X}{\type{Object}}{\type{String}}}{List}{\rwildcard{X}}$.
% The syntax used here allows for wildcard parameters to have a name, an uppper and lower bound,
% and a type they are bound to.
% In this case the name is $\rwildcard{A}$ and it's bound to the the type \texttt{List}.
% Inside the \texttt{let} statement the variable \expr{v} has the type
% $\exptype{List}{\rwildcard{A}}$.
% This is an explicit version of \linebreak[2]
% \textbf{Capture Conversion},
% which makes use of the fact that a concrete type must be behind every wildcard type.
% There is no instantiation of a \texttt{List<?>},
% but there exists some unknown type $\exptype{List}{\rwildcard{A}}$, with $\rwildcard{A}$ inbetween the bounds $\bot$ (bottom type, subtype of all types) and
% \texttt{Object}.
% Inside the body of the let statement \expr{v} is treated as a value with the constant type $\exptype{List}{\rwildcard{A}}$.
% Existential types enable us to formalize \textit{Capture Conversion}.
% Polymorphic method calls need to be wrapped in a process which \textit{opens} existential types \cite{addingWildcardsToJava}.
% In Java this is done implicitly in a process called capture conversion (as proposed in Wild FJ \cite{WildFJ}).