Add Motivation and Bib

This commit is contained in:
Andreas Stadelmeier 2024-07-01 00:12:38 +02:00
parent df402dfb2b
commit 0373eb69e8
3 changed files with 5857 additions and 54 deletions

View File

@ -26,19 +26,20 @@
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{enumitem}
\usepackage{xcolor}
%\usepackage{amsthm}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\title{Global Type Inference for Java using SAT Solvers}
\title{Global Type Inference for Java using Answer Set Programming}
\author{Andreas Stadelmeier
\institute{DHBW\\ Stuttgart}
\email{a.stadelmeier@hb.dhbw-stuttgart.de}
}
\newcommand{\authorrunning}{Andreas Stadelmeier}
\newcommand{\titlerunning}{Type Unification to SAT}
\newcommand{\titlerunning}{Type Unification with ASP}
\hypersetup{
bookmarksnumbered,
pdftitle = {\titlerunning},
@ -57,14 +58,16 @@
%
\begin{abstract}
Global type inference for Java is able to compute correct types for an input program
which has no type annotations at all.
Global type inference for Featherweigt Generic Java is NP-hard.
Former implementations of the algorithm struggled with bad runtime performance.
In this paper we translated the type inference algorithm for Featherweight Generic Java to an
which has no type annotations at all, but turns out to be NP-hard.
Former implementations of the algorithm therefore struggled with bad runtime performance.
In this paper we translate the type inference algorithm for Featherweight Generic Java to an
Answer Set Program.
Answer Set Programming (ASP) promises to solve complex computational search problems
as long as they are finite domain.
This paper shows that it is possible to implement global type inference for FGJ as an ASP program.
% We also show soundness, completeness and termination for our ASP implementation.
Another advantage is that the specification of the algorithm can be translated one on one to ASP code
leading to less errors in the respective implementation.
%TODO: Is this correct?
\end{abstract}
%
@ -94,6 +97,94 @@ fast execution times.
These programs tend to be in exponential time upper bounded by the amount of ambiguous method calls and subtype relations.
%TODO: Example with multiple calls to m.toString(). All possibilities are checked in a naive implementation!
\section{Motivation}
The nature of the global type inference algorithm causes the search space of the unification algorithm to
increase exponentially.
Java allows overloaded methods causing our type inference algorithm to create so called Or-Constraints.
This can also happen if multiple classes implement a method with the same name and the same amount of parameters.
Given the following input program we do not know, which method \texttt{self} is meant for the method call
\texttt{var.self()}, because there is no type annotation for \texttt{var}.
%\begin{figure}[h]
%\begin{minipage}{0.49\textwidth}
\begin{lstlisting}
class C1 {
C1 self(){
return this;
}
}
class C2 {
C2 self(){
return this;
}
}
class Example {
untypedMethod(var){
return var.self(); (*@$\implies \set{\tv{var} \lessdot \type{C1}, \tv{ret} \doteq \type{C1} }\ ||\ \set{\tv{var} \lessdot \type{C2}, \tv{ret} \doteq \type{C2}}$@*)
}
}
\end{lstlisting}
% \end{minipage}%
% \hfill
% \begin{minipage}{0.49\textwidth}
% \begin{lstlisting}[style=constraints]
% (*@$\tv{var} \lessdot \type{C1} | $@*),
% (*@$\sigma(\tv{xp}) = \wctype{\wildcard{A}{\type{String}}{\bot}}{List}{\rwildcard{A}}$@*)
% \end{lstlisting}
% \end{minipage}
%\end{figure}
An Or-Constraint like $\set{\tv{var} \lessdot \type{C1}, \tv{ret} \doteq \type{C1} }\ ||\ \set{\tv{var} \lessdot \type{C2}, \tv{ret} \doteq \type{C2}}$
means that either the the constraint set $\set{\tv{var} \lessdot \type{C1}, \tv{ret} \doteq \type{C1} }$
or $\set{\tv{var} \lessdot \type{C2}, \tv{ret} \doteq \type{C2}}$ has to be satisfied.
In this example the type placeholder $\tv{var}$ is a placeholder for the type of the \texttt{var} variable.
The $\tv{ret}$ placeholder represents the return type of the \texttt{untypedMethod} method.
If we set the type of \texttt{var} to \texttt{C1}, then the return type of the method will be \texttt{C1} aswell.
If we set it to \texttt{C2} then also the return type will be \texttt{C2}.
There are two possibilities therefore the Or-Constraint.
If we chain multiple overloaded method calls together we end up with multiple Or-Constraints.
The type unification algorithm \unify{} only sees the supplied constraints and has to potentially try all combinations of them.
This is proven in \cite{TIforGFJ} and is the reason why type inference for Featherweight Generic Java is NP-hard.
Let's have a look at the following alternation of our example:
\begin{lstlisting}
untypedMethod(var){
return var.self().self().self().self();
}
\end{lstlisting}
Now there are four chained method calls leading to four Or-Constraints:
\begin{align*}
\set{\tv{var} \lessdot \type{C1}, \tv{r1} \doteq \type{C1} }\ ||\ \set{\tv{var} \lessdot \type{C2}, \tv{r1} \doteq \type{C2}} \\
\set{\tv{r1} \lessdot \type{C1}, \tv{r2} \doteq \type{C1} }\ ||\ \set{\tv{r1} \lessdot \type{C2}, \tv{r2} \doteq \type{C2}} \\
\set{\tv{r2} \lessdot \type{C1}, \tv{r3} \doteq \type{C1} }\ ||\ \set{\tv{r2} \lessdot \type{C2}, \tv{r3} \doteq \type{C2}} \\
\set{\tv{r3} \lessdot \type{C1}, \tv{ret} \doteq \type{C1} }\ ||\ \set{\tv{r3} \lessdot \type{C2}, \tv{ret} \doteq \type{C2}} \\
\end{align*}
The placeholder $\tv{r1}$ stands for the return type of the first call to \texttt{self} and
$\tv{r2}$ for the return type of the second call and so on.
It is obvious that this constraint set only has two solutions.
The variable \texttt{var} and the return type of the method aswell as all intermediate placeholders $\tv{r1}-\tv{r3}$
get either the type \texttt{C1} or \texttt{C2}.
A first prototype implementation of the \unify{} algorithm simply created the cartesian product of all Or-Constraints,
16 possibilities in this example, and iteratively tried all of them.
This leads to a exponential runtime increase with every added overloaded method call.
Eventhough the current algorithm is equipped with various optimizations as presented in \cite{plue181} and \cite{plue231},
there is still a runtime increas sensible when dealing with many Or-Constraints.
Our global type inference algorithm should construct type solutions in real time.
Then it can effectively used as a Java compiler which can deal with large inputs of untyped Java code.
Another use case scenario is as an editor plugin which helps a Java programmer by enabling him to write untyped Java code
and letting our tool insert the missing type statements.
For both applications a short execution time is vital.
Answer Set programming promises to solve complex search problems in a highly optimized way.
The idea in this paper is to implement the algorithm presented in \cite{TIforGFJ}
as an ASP program and see how well it handles our type unification problem.
\section{Subtyping}
\begin{mathpar}
\inferrule[Refl]{}{
@ -125,6 +216,9 @@ These programs tend to be in exponential time upper bounded by the amount of amb
}
\end{mathpar}
%Subtyping has no bounds for generic type parameters.
% but this is propably not needed
\section{Unify}
Input: Every type placeholder must have an upper bound.
Output: Every $\tv{a} \lessdot \type{T}$ constraint gets a
@ -136,28 +230,22 @@ We have to formulate the unification algorithm with implication rules.
Those can be directly translated to ASP.
\label{sec:implicationRules}
\begin{mathpar}
\inferrule[Subst]{
\tv{a} \doteq \type{N}
}{
\tv{a} \mapsto \type{N}
}
\and
\inferrule[Subst-L]{
\tv{a} \mapsto \type{T}_1 \\
\tv{a} \doteq \type{T}_1 \\
\tv{a} \lessdot \type{T}_2
}{
\type{T}_1 \lessdot \type{T}_2
}
\and
\inferrule[Subst-R]{
\tv{a} \mapsto \type{T}_1 \\
\tv{a} \doteq \type{T}_1 \\
\type{T}_2 \lessdot \tv{a}
}{
\type{T}_2 \lessdot \type{T}_1
}
\and
\inferrule[Subst-Equal]{
\tv{a} \mapsto \type{T}_1 \\
\tv{a} \doteq \type{T}_1 \\
\tv{a} \doteq \type{T}_2
}{
\type{T}_1 \doteq \type{T}_2
@ -176,12 +264,19 @@ Those can be directly translated to ASP.
}
\and
\inferrule[Subst-Param]{
\tv{a} \mapsto \type{S} \\
\tv{a} \doteq \type{S} \\
\type{T} \doteq \exptype{C}{\type{T}_1 \ldots, \tv{a}, \ldots \type{T}_n} \\
}{
\type{T} \doteq \exptype{C}{\type{T}_1, \ldots \type{S}, \ldots \type{T}_n}
}
\and
\inferrule[Subst-Param']{
\tv{a} \doteq \type{S} \\
\tv{b} \lessdot \exptype{C}{\type{T}_1 \ldots, \tv{a}, \ldots \type{T}_n} \\
}{
\tv{b} \lessdot \exptype{C}{\type{T}_1, \ldots \type{S}, \ldots \type{T}_n}
}
\and
\inferrule[S-Object]{}{\tv{a} \lessdot \type{Object}}
\and
\inferrule[Match]{
@ -224,10 +319,10 @@ Those can be directly translated to ASP.
\begin{mathpar}
\text{Apply only once per constraint:}\quad
\inferrule[Super]{
\type{T} \lessdot \tv{a}\\
\type{T} <: \type{N}
\type{N} \lessdot \tv{a}\\
\type{N} <: \type{N}'
}{
\tv{a} \doteq \type{N}
\tv{a} \doteq \type{N}'
}
\end{mathpar}
@ -254,40 +349,11 @@ Those can be directly translated to ASP.
Result:
\begin{mathpar}
% \inferrule[Solution]{
% \tv{a} \doteq \type{N} \\
% \tv{a} \notin \type{N}
% }{
% \sigma(\tv{a}) = \type{N}
% }
\inferrule[Solution]{
\tv{a} \doteq \type{G}
}{
\sigma(\tv{a}) = \type{G}
}
% \and
% \inferrule[Generic]{
% \tv{a} \lessdot \type{N} %, \ldots, \tv{a} \lessdot \exptype{C_n}{\ol{T_n}} \\
% \\
% \text{not}\ \tv{a} \doteq \type{N}'
% }{
% \tv{a} \mapsto \type{A}
% }
% \and
% \inferrule[Solution-Sub]{
% \tv{a} \lessdot \exptype{C_1}{\ol{T_1}}, \ldots, \tv{a} \lessdot \exptype{C_n}{\ol{T_n}} \\
% \forall i: \type{C_m} << \type{C_i} \\
% \text{not}\ \tv{a} \doteq \type{N}
% }{
% \Delta(\type{A}) = \exptype{C_m}{\ol{T_m}} \\ \sigma(\tv{a}) = \type{A}
% }
% \and
% \inferrule[Solution-Gen]{
% \tv{a} \lessdot \type{G}_1, \ldots, \tv{a} \lessdot \type{G}_n \\
% \forall i: \type{G} <: \type{G}_i \\
% }{
% \Delta(\type{A}) = \type{G} \\ \sigma(\tv{a}) = \type{A}
% }
\and
\inferrule[Solution-Gen]{
\tv{a} \lessdot \type{C}_1, \ldots, \tv{a} \lessdot \type{C}_n \\
@ -295,6 +361,22 @@ Result:
}{
\tv{a} \doteq \type{C}_m
}
% \and
% \inferrule[Solution-Gen]{
% \tv{a} \lessdot \type{C}\\
% \sigma(\tv{a}) = \emptyset
% }{
% \tv{a} \doteq \type{A} \\ \sigma'(\tv{a}) = \type{A}
% }
% \and
% \inferrule[Solution-Gen]{
% \tv{a} \lessdot \type{G} \\
% \tv{a} \lessdot \type{G}_1, \ldots, \tv{a} \lessdot \type{G}_n \\
% \forall i: \type{G} << \type{G}_i \\
% \sigma'(\tv{a}) = \type{A}
% }{
% \Delta(\type{A}) = \type{G}
% }
\end{mathpar}
Fail:
@ -314,8 +396,15 @@ Fail:
\emptyset
}
\and
\inferrule[Fail-Generic]{
\type{X} \doteq \type{T}\\
\type{X} \neq \type{T}
}{
\emptyset
}
\and
\inferrule[Fail-Sigma]{
\tv{a} \mapsto \type{N} \\
\tv{a} \doteq \type{N} \\
\tv{a} \in \type{N}
}{
\emptyset
@ -347,14 +436,48 @@ The algorithm terminates if every type placeholder in the input constraint set h
The implication rules defined in chapter \ref{sec:implicationRules} can be translated to an ASP program.
\begin{tabular}{l | r}
$\exptype{C}{\ol{X}}$ & \texttt{type("C", paramX)}\\
$\tv{a}$ & \texttt{tph("a")}\\
$\exptype{C}{}$ & \texttt{type("C", null)}\\
$\exptype{C}{\ol{X}}$ & \texttt{type("C", params(\ldots))}\\
\end{tabular}
\newcommand{\aspify}{\nabla}
\begin{align*}
\aspify(\tv{a}) =& \texttt{tph("a")}\\
\exptype{C}{} =& \texttt{type("C", null)}\\
\exptype{C}{\type{T}_1, \type{T}_2, \ldots} =& \texttt{type("C", params(}\aspify(\type{T}_1), \aspify(\type{T}_1), \ldots\texttt{))}\\
\aspify(\type{T} \doteq \type{T}') =& \texttt{equalcons}(\aspify(\type{T}), \aspify(\type{T}')) \\
\aspify(\type{T} \lessdot \type{T}') =& \texttt{subcons}(\aspify(\type{T}), \aspify(\type{T}')) \\
\aspify{} \left(\inferrule{
c_1, c_2
}{
c
}\right) =& \aspify(c)\ \texttt{:-} \aspify(c_1), \aspify(c_2)
\end{align*}
The S-Class rule contains a substitution and has to be encoded with variables.
Given a extends relation $\texttt{class}\ \exptype{C}{\type{X}} \triangleleft \exptype{D}{\type{X}}$
we generate the ASP code:\\
{\small{\texttt{subtype(type("C", params(X)), type("D", params(X))) :- subtype(type("C", params(X))).}}}
Capital letters like \texttt{X} are variables in ASP and the former statement causes any
literal like \texttt{subtype(type("C", params(tph("a"))))} leads to the literal
\texttt{subtype(type("C", params(tph("a"))), type("D", params(tph("a"))))}.
\section{Proofs}
\begin{lemma}{Substitution}
\begin{lemma}{Substitution:}
For every $\tv{a} \doteq \type{T}$ and every constraint $c$,
there also exists a constraint $[\type{T}/\tv{a}]c$.
\noindent
\normalfont
This lemma proofs that equal constraints lead to a substitution in every other constraint.
For exmaple $\tv{b} \doteq \type{String}$ and $\tv{b} \lessdot \exptype{Comparable}{\tv{b}}$
lead to a constraint $\type{String} \lessdot \exptype{Comparable}{\type{String}}$
\end{lemma}
\textit{Proof:}
%TODO
\section{Termination}
@ -369,6 +492,9 @@ Otherwise there has to be a loop and this woul lead to an incorrect constraint s
The Subst-Param rule can only be applied a finite number of times.
Due to the substitution lemma and the Sigma-Fail rule the Subst-Param rule can only be applied once per type placeholder.
The Subst-Param rule can only be applied once per type placeholder.
If $\type{T} \doteq \exptype{C}{\tv{a}}$ is substituted to $\type{T} \doteq \exptype{C}{\type{N}}$
then there is no $\tv{a} \in \type{N}$.
\section{Completeness}
To proof completeness we have to show that every type can be replaced by a placeholder in a correct constraint set.

5677
martin.bib Executable file

File diff suppressed because it is too large Load Diff

View File

@ -7,7 +7,7 @@
%breaklines=true,
showstringspaces=false,
%breakatwhitespace=true,
basicstyle=\ttfamily\fontsize{8}{9.6}\selectfont, %\footnotesize
basicstyle=\ttfamily, %\footnotesize
escapeinside={(*@}{@*)},
captionpos=t,float,abovecaptionskip=-\medskipamount
}