Ecoop2024_TIforWildFJ/introduction.tex
2024-02-11 21:59:57 +01:00

379 lines
16 KiB
TeX

\section{Type Inference for Java}
%The goal is to find a correct typing for a given Java program.
Type inference for Java has many use cases and could be used to help programmers by inserting correct types for them,
Finding better type solutions for already typed Java programs (for example more generical ones),
or allowing to write typeless Java code which is then type infered and thereby type checked by our algorithm.
The algorithm proposed in this paper can determine a correct typing for the untyped Java source code example shown in figure \ref{fig:intro-example-typeless}.
Our algorithm is also capable of finding solutions involving wildcards as shown in figure \ref{fig:intro-example-typed}.
%This paper extends a type inference algorithm for Featherweight Java \cite{TIforFGJ} by adding wildcards.
%The last step to create a type inference algorithm compatible to the Java type system.
The algorithm presented in this paper is a slightly improved version of the one in \cite{TIforFGJ} including wildcard support.
%a modified version of the \unify{} algorithm presented in \cite{plue09_1}.
The input to the type inference algorithm is a Featherweight Java program (example in figure \ref{fig:nested-list-example-typeless}) conforming to the syntax shown in figure \ref{fig:syntax}.
The \fjtype{} algorithm calculates constraints based on this intermediate representation,
which are then solved by the \unify{} algorithm
resulting in a correctly typed program (see figure \ref{fig:nested-list-example-typed}).
We support capture conversion and Java style method calls.
This requires existential types in a form which is not denotable by Java syntax \cite{aModelForJavaWithWildcards}.
The algorithm is able find the correct type for the method \texttt{m} in the following example:
\begin{verbatim}
<X> Pair<X,X> make(List<X> l)
Boolean compare(Pair<X,X> p)
List<?> b;
Boolean m() = this.compare(this.make(b));
\end{verbatim}
We present a novel approach to deal with existential types and capture conversion during constraint unification.
The algorithm is split in two parts. A constraint generation step and an unification step.
We proof soundness and aim for a good compromise between completeness and time complexity.
Our algorithm finds a correct type solution for the following example, where the Java local type inference fails:
\begin{verbatim}
class SuperPair<A,B>{
A a;
B b;
}
class Pair<A,B> extends SuperPair<B,A>{
A a;
B b;
<X> X choose(X a, X b){ return b; }
String m(List<? extends Pair<Integer, String>> a, List<? extends Pair<Integer, String>> b){
return choose(choose(a,b).value.a,b.value.b);
}
}
\end{verbatim}
\begin{figure}[tp]
\begin{subfigure}[t]{\linewidth}
\begin{lstlisting}[style=fgj]
class List<A> {
List<A> add(A v) { ... }
}
class Example {
m(l, la, lb){
return l
.add(la.add(1))
.add(lb.add("str"));
}
}
\end{lstlisting}
\caption{Java method with missing return type}
\label{fig:nested-list-example-typeless}
\end{subfigure}
~
% \begin{subfigure}[t]{\linewidth}
% \begin{lstlisting}[style=tfgj]
% class List<A> {
% List<A> add(A v) { ... }
% }
% class Example {
% m(l, la, lb){
% return let r2 : (*@$\wctype{\wildcard{X}{\type{Object}}{\bot}}{List}{\rwildcard{X}}$@*) = {
% let r1 : (*@$\wctype{\wildcard{X}{\type{Object}}{\bot}}{List}{\rwildcard{X}}$@*) = l in {
% let p1 : (*@$\exptype{List}{\type{Integer}}$@*) = {
% let xa = la in xa.add(1)
% } in x1.add(p1)
% } in {
% let p2 = {
% let xb = lb in xb.add("str")
% } in x2.add(p2)
% };
% }
% }
% \end{lstlisting}
% \caption{Featherweight Java Representation}
% \label{fig:nested-list-example-let}
% \end{subfigure}
% ~
\begin{subfigure}[t]{\linewidth}
\begin{lstlisting}[style=tfgj]
class List<A> {
List<A> add(A v) { ... }
}
class Example {
m(List<List<? extends Object>> l, List<Integer> la, List<String> lb){
return l
.add(la.add(1))
.add(lb.add("str"));
}
}
\end{lstlisting}
\caption{Java Representation}
\label{fig:nested-list-example-typed}
\end{subfigure}
%\caption{Example code}
%\label{fig:intro-example-code}
\end{figure}
%TODO
% The goal is to proof soundness in respect to the type rules introduced by \cite{aModelForJavaWithWildcards}
% and \cite{WildcardsNeedWitnessProtection}.
\begin{figure}[tp]
\begin{subfigure}[t]{0.49\linewidth}
\begin{lstlisting}[style=fgj]
genList() {
if( ... ) {
return new List<String>();
} else {
return new List<Integer>();
}
}
\end{lstlisting}
\caption{Java method with missing return type}
\label{fig:intro-example-typeless}
\end{subfigure}
~
\begin{subfigure}[t]{0.49\linewidth}
\begin{lstlisting}[style=tfgj]
List<?> genList() {
if( ... ) {
return new List<String>();
} else {
return new List<Integer>();
}
}
\end{lstlisting}
\caption{Correct type}
\label{fig:intro-example-typed}
\end{subfigure}
%\caption{Example code}
%\label{fig:intro-example-code}
\end{figure}
% \subsection{Wildcards}
% Java subtyping involving generics is invariant.
% For example \texttt{List<String>} is not a subtype of \texttt{List<Object>}.
% %Wildcards introduce variance by allowing \texttt{List<String>} to be a subtype of \texttt{List<?>}.
% \texttt{List<Object>} is not a valid return type for the method \texttt{genList}.
% The type inference algorithm has to find the correct type involving wildcards (\texttt{List<?>}).
\subsection{Global Type Inference}
A global type inference algorithm works on an input with no type annotations at all.
%TODO: Describe global type inference and lateron why it is so hard to
\subsection{Java Wildcards}
Wildcards are expressed by a \texttt{?} in Java and can be used as parameter types.
Wildcards can be formalized as existential types \cite{WildFJ}.
\texttt{List<? extends Object>} and \texttt{List<? super String>} are both wildcard types
denoted in our syntax by $\wctype{\wildcard{X}{\type{Object}}{\bot}}{List}{\rwildcard{X}}$ and
$\wctype{\wildcard{X}{\type{Object}}{\type{String}}}{List}{\rwildcard{X}}$.
Wildcards add variance to Java type parameters.
In Java a \texttt{List<String>} is not a subtype of \texttt{List<Object>}
even though it seems intuitive with \texttt{String} being a subtype of \texttt{Object}.
Here wildcards come into play.
$\exptype{List}{String} <: \wctype{\wildcard{X}{\bot}{\type{Object}}}{List}{\rwildcard{X}}$
means \texttt{List<String>} is a subtype of \texttt{List<? extend Object>}.
The syntax used here allows for wildcard parameters to have a name, an uppper and lower bound at the same time,
and a type they are bound to.
In this case the name is $\rwildcard{X}$ and it's bound to the the type \texttt{List}.
Those properties are needed to formalize capture conversion.
Polymorphic method calls need to be wraped in a process which \textit{opens} existential types \cite{addingWildcardsToJava}.
In Java this is done implicitly in a process called capture conversion.
When calling a polymorphic method like \texttt{<X> List<X> m(List<X> l1, List<X> l2)} with a \texttt{List<?>}
it is not possible to substitute \texttt{?} for \texttt{X}.
This would lead to the method header \texttt{List<?> m(List<?> l1, List<?> l2)}
where now a method invocation with \texttt{List<String>} and \texttt{List<Integer>} would be possible,
because both are subtypes of \texttt{List<?>}.
Capture conversion solves this problem by generating a fresh type variable for every wildcard.
Calling \texttt{<X> X head(List<X> l1)} with the type \texttt{List<?>} ($\wctype{\rwildcard{X}}{List}{\rwildcard{X}}$)
creates a fresh type variable $\rwildcard{Y}$ resulting in
$\generics{\rwildcard{Y}}\texttt{head}(\exptype{List}{\rwildcard{Y}})$
with $\rwildcard{Y}$ being used as generic parameter \texttt{X}.
The $\rwildcard{Y}$ in $\exptype{List}{\rwildcard{Y}}$ is a free variable now.
%TODO: Read taming wildcards and see if we solve some of the problems presented in section 5 and 6
% Additionally they can hold a upper or lower bound restriction like \texttt{List<? super String>}.
% Our representation of this type is: $\wctype{\wildcard{X}{\type{String}}{\type{Object}}}{List}{\rwildcard{X}}$
% Every wildcard has a name ($\rwildcard{X}$ in this case) and an upper and lower bound (respectively \texttt{Object} and \texttt{String}).
% \subsection{Extensibility} % NOTE: this thing is useless, because final local variables do not need to contain wildcards
% In \cite{WildcardsNeedWitnessProtection} capture converson is made explicit with \texttt{let} statements.
% We chain let statements in a way that emulates Java behaviour. Allowing the example in \cite{aModelForJavaWithWildcards}
% % TODO: Explain the advantage of constraints and how we control the way capture conversion is executed
% But it would be also possible to alter the constraint generation step to include additional features.
% Final variables or effectively final variables could be expressed with a
% \begin{verbatim}
% <X> concat(List<X> l1, List<X> l2){...}
% \end{verbatim}
% \begin{minipage}{0.50\textwidth}
% Java:
% \begin{verbatim}
% final List<?> l = ...;
% concat(l, l);
% \end{verbatim}
% \end{minipage}%
% \begin{minipage}{0.50\textwidth}
% Featherweight Java:
% \begin{verbatim}
% let l : X.List<X> = ... in
% concat(l, l)
% \end{verbatim}
% \end{minipage}
\subsection{Challenges}\label{challenges}
%TODO: Wildcard subtyping is infinite see \cite{TamingWildcards}
One problem is the divergence between denotable and expressable types in Java \cite{semanticWildcardModel}.
A wildcard in the Java syntax has no name and is bound to its enclosing type.
\texttt{List<List<?>>} equates to $\exptype{List}{\wctype{\rwildcard{X}}{List}{\rwildcard{X}}}$.
During type checking intermediate types like $\wctype{\rwildcard{X}}{List}{\exptype{List}{\rwildcard{X}}}$
or $\wctype{\rwildcard{X}}{Pair}{\rwildcard{X}, \rwildcard{X}}$ can emerge, which have no equivalent in the Java syntax.
Our type inference algorithm uses existential types internally but spawns type solutions compatible with Java.
The introduction of wildcards adds additional challenges.
% we cannot replace every type variable with a wildcard
Type variables can also be used as type parameters, for example
$\exptype{List}{String} \lessdot \exptype{List}{\tv{a}}$.
A problem arises when replacing type variables with wildcards.
% Wildcards are not reflexive.
% ( on the equals property ), every wildcard has to be capture converted when leaving its scope
% do not substitute free type variables
Lets have a look at two examples:
\begin{itemize}
\item \begin{example} \label{intro-example1}
The first one is a valid Java program.
The type \texttt{List<? super String>} is \textit{capture converted} to a fresh type variable $\rwildcard{X}$
which is used as the generic method parameter \texttt{A}.
Java uses capture conversion to replace the generic \texttt{A} by a capture converted version of the
\texttt{? super String} wildcard.
Knowing that the type \texttt{String} is a subtype of any type the wildcard \texttt{? super String} can inherit
it is safe to pass \texttt{"String"} for the first parameter of the function.
\begin{verbatim}
<A> List<A> add(A a, List<A> la) {}
List<? super String> list = ...;
add("String", list);
\end{verbatim}
The constraints representing this code snippet are:
\begin{displaymath}
\type{String} \lessdotCC \wtv{a},\,
\wctype{\wildcard{X}{\type{Object}}{\type{String}}}{List}{\rwildcard{X}} \lessdotCC \exptype{List}{\wtv{a}}
\end{displaymath}
Here $\sigma(\tv{a}) = \rwildcard{X}$ is a valid solution.
\end{example}
\item \begin{example}\label{intro-example2}
This example displays an incorrect Java program.
The method call to \texttt{concat} with two wildcard lists is unsound.
Each list could be of a different kind and therefore the \texttt{concat} cannot succeed.
\begin{verbatim}
<A> List<A> concat(List<A> l1, List<A> l2) {}
List<?> list = ... ;
concat(list, list);
\end{verbatim}
The constraints for this example are:
$\wctype{\wildcard{X}{\type{Object}}{\bot}}{List}{\rwildcard{X}} \lessdot \exptype{List}{\wtv{a}}, \\
\wctype{\wildcard{X}{\type{Object}}{\bot}}{List}{\rwildcard{X}} \lessdot \exptype{List}{\wtv{a}}$
%$\exptype{List}{?} \lessdot \exptype{List}{\tv{a}},
%\exptype{List}{?} \lessdot \exptype{List}{\tv{a}}$
Remember that the given constraints cannot have a valid solution.
%This is due to the fact that the wildcard variables cannot leave their scope.
In this example the \unify{} algorithm should not replace $\tv{a}$ with the captured wildcard $\rwildcard{X}$.
\end{example}
\end{itemize}
The \unify{} algorithm only sees the constraints with no information about the program they originated from.
The main challenge was to find an algorithm which computes $\sigma(\tv{a}) = \rwildcard{X}$ for example \ref{intro-example1} but not for example \ref{intro-example2}.
% Solution: A type variable can only take one type and not a wildcard type.
% A wildcard type is only treated like a wildcard while his definition is in scope (during the reduce rule)
% \subsection{Capture Conversion}
% \begin{verbatim}
% <A> List<A> add(A a, List<A> la) {}
% List<? super String> list = ...;
% add("String", list);
% \end{verbatim}
% The constraints representig this code snippet are:
% $\type{String} \lessdot \tv{a},
% \wctype{\wildcard{X}{\type{Object}}{\type{String}}}{List}{\rwildcard{X}} \lessdot \exptype{List}{\tv{a}}$
% Under the hood the typechecker has to find a replacement for the generic method parameter \texttt{A}.
% Java allows a method \texttt{<A> List<A> add(A a, List<A> la)} to be called with
% \texttt{String} and \texttt{List<? super String>}.
% A naive approach would be to treat wildcards like any other type and replace \texttt{A}
% with \texttt{? super String}.
% Generating the method type \texttt{List<? super String> add(? super String a, List<? super String> la)}.
% This does not work for a method like
% \texttt{<A> void merge(List<A> l1, List<A> l2)}.
% It is crucial for this method to be called with two lists of the same type.
% Substituting a wildcard for the generic \texttt{A} leads to
% \texttt{void merge(List<?> l1, List<?> l2)},
% which is unsound.
% Capture conversion utilizes the fact that instantiated classes always have an actual type.
% \texttt{new List<String>()} and \texttt{new List<Object>()} are
% valid instances.
% \texttt{new List<? super String>()} is not.
% Every type $\wcNtype{\Delta}{N}$ contains an underlying instance of a type $\type{N}$.
% Knowing this fact allows to make additional assumptions.
% \wildFJ{} type rules allow for generic parameters to be replaced by wildcard types.
% This is possible because wildcards are split into two parts.
% Their declaration at the environment part $\Delta$ of a type $\wcNtype{\Delta}{N}$
% and their actual use in the body of the type $\type{N}$.
% Replacing the generic \texttt{A} in the type \texttt{List<A>}
% by a wildcard $\type{X}$ will not generate the type \texttt{List<?>}.
% \texttt{List<?>} is the equivalent of $\wctype{\wildcard{X}{\type{Object}}{\bot}}{List}{\type{X}}$.
% The generated type here is $\exptype{List}{\type{X}}$, which has different subtype properties.
% There is no information about what the type $\exptype{List}{\type{X}}$ actually is.
% It has itself as a subtype and for example a type like $\exptype{ArrayList}{\type{X}}$.
% A method call for example only works with values, which are correct instances of classes.
% Capture conversion automatically converts wildcard type to a concrete class type,
% which then can be used as a parameter for a method call.
% In \wildFJ{} this is done via let statements.
% Written in FJ syntax: $\type{N}$ is a initialized type and
% $\wcNtype{\Delta}{N}$ is a type containing wildcards $\Delta$.
% It is crucial for the wildcard names to never be used twice.
% Otherwise the members of a list with type $\wctype{\type{X}}{List}{\type{X}}$ and
% a list with type $\wctype{\type{X}}{List}{\type{X}}$ would be the same. $\type{X}$ in fact.
% The let statement adds wildcards to the environment.
% Those wildcards are not allowed to escape the scope of the let statement.
% Which is a problem for our type inference algorithm.
% The capture converted version of the type $\wctype{\rwildcard{X}}{Box}{\rwildcard{X}}$ is
% $\exptype{Box}{\rwildcard{X}}$.
% The captured type is only used as an intermediate type during a method call.