WLP2024/aspUnify.tex

% This is samplepaper.tex, a sample chapter demonstrating the
% LLNCS macro package for Springer Computer Science proceedings;
% Version 2.21 of 2022/01/12
%

\documentclass[]{ceurart}

%% This command is for the conference information
%% CHOOSE the one corresponding to your conference
\conference{ASPOCP 2024: 17$^{th}$ Workshop on Answer Set Programming and Other Computing Paradigms, October, 2024, Dallas, USA.}

\usepackage{underscore}
\usepackage[T1]{fontenc}
% T1 fonts will be used to generate the final print and online PDFs,
% so please use T1 fonts in your manuscript whenever possible.
% Other font encondings may result in incorrect characters.
%
\usepackage{graphicx}
% Used for displaying a sample figure. If possible, figure files should
% be included in EPS format.
%
% If you use the hyperref package, please uncomment the following two lines
% to display URLs in blue roman font according to Springer's eBook style:
%\usepackage{color}
%\renewcommand\UrlFont{\color{blue}\rmfamily}
%\urlstyle{rm}
%
\include{prolog}
\usepackage{mathpartir}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{enumitem}
\usepackage{xcolor}
%\usepackage{amsthm}

\newcommand{\aspAlgo}[0]{\ensuremath{\Omega}}

\title{Global Type Inference for Java using ASP}

\author[1]{Andreas Stadelmeier}[%
orcid=0009-0007-8414-5977,
email=andi@paulus.haus,
%url=http://www. ,
]
\address[1]{Duale Hochschule Baden-Württemberg, Stuttgart, Germany}
%

%% Rights management information.
%% CC-BY is default license.
\copyrightyear{2024}
\copyrightclause{Copyright for this paper by its authors.
  Use permitted under Creative Commons License Attribution 4.0
  International (CC BY 4.0).}


\sloppy

\begin{document}


\begin{abstract}
Global type inference for Java is able to compute correct types for an input program
which has no type annotations at all, but turns out to be NP-hard.
Former implementations of the algorithm therefore struggled with bad runtime performance.
In this paper we translate the type inference algorithm for Featherweight Generic Java to an
Answer Set Program.
Answer Set Programming (ASP) promises to solve complex computational search problems
as long as they are finite domain.
This paper shows that it is possible to implement global type inference for Featherweight Generic Java (FGJ) as an ASP program.
Afterwards we compared the ASP implementation with our own implementation in Java
exposing promising results.
% We also show soundness, completeness and termination for our ASP implementation.
Another advantage is that the specification of the algorithm can be translated one on one to ASP code
leading to less errors in the respective implementation.
Unfortunately ASP can only be used for type inference for a subset of the Java type system that excludes wildcard types.
\end{abstract}
%

\begin{keywords}
  typeinference, java, programming languages, answer set programming
\end{keywords}

\maketitle              % typeset the header of the contribution
%
%
%
\section{Global Type Inference}
% Every major typed programming language uses some form of type inference.
% Rust, Java, C++, Haskell, etc... %(see: https://en.wikipedia.org/wiki/Type_inference)
% Type inference adds alot of value to a programming language.
% \begin{itemize}
% \item Code is more readable.
% \item Type inference usually does a better job at finding types than a programmer.
% \item Type inference can use types that are not denotable by the programmer.
% \item Better reusability.
% \item Allows for faster development and the programmer to focus on the actual task instead of struggling with the type system.
% \end{itemize}

% Java has adopted more and more type inference features over time.
% %TODO: list type inference additions
% Currently Java only has local type inference.
% We want to bring type inference for Java to the next level.

Java is a strictly typed programming language.
The current versions come with a local type inference feature that allows the programmer to omit
type annotations in some places like argument types of lambda expressions for example.
But methods still have to be fully typed including argument and return types.
We work at a global type inference algorithm for Java which is able to compute correct types
for a Java program with no type annotations at all.

\begin{figure}[h]
\begin{lstlisting}
class C1 {
    C1 self(){
        return this;
    }
}
class C2 {
    C2 self(){
        return this;
    }
}
class Example {
    untypedMethod(var){
        return var.self();   //  (*@$\implies \set{\tv{var} \lessdot \type{C1}, \tv{ret} \doteq \type{C1} }\ ||\ \set{\tv{var} \lessdot \type{C2}, \tv{ret} \doteq \type{C2}}$@*)
    }
}
\end{lstlisting}
\caption{Java program missing type annotations}\label{fig:introExample}
\end{figure}

A possible input for our algorithm is shonw in figure \ref{fig:introExample}.
Here the method \texttt{untypedMethod} is missing its argument type for \texttt{var} and its return type.
Global type inference works in two steps.
First we generate a set of subtype constraints, which are later unified resulting in a type solution consisting of type unifiers $\sigma$.
Every missing type is replaced by a type placeholder.
In this example the type placeholder $\tv{var}$ is a placeholder for the type of the \texttt{var} variable.
The $\tv{ret}$ placeholder represents the return type of the \texttt{untypedMethod} method.
Also shown as a comment behind the respective method call.
There are two types of type constraints.
A subtype constraint like $\tv{var} \lessdot \type{C1}$ meaning that the unification algorithm has to find
a type replacement for $\tv{var}$ which is a subtype of $\type{C1}$.
Due to the Java type system being reflexive one possible solution would be $\sigma(\tv{var}) = \type{C1}$.
A constraint $\tv{var} \doteq \type{C1}$ directly results in $\sigma(\tv{ret}) = \type{C1}$.

Our type inference algorithms for Java are described in a formal manner in
\cite{MartinUnify}, \cite{TIforGFJ}.
The first prototype implementation put the focus on a correct implementation rather than
fast execution times.
Depending on the input it currently takes upto several minutes or even days to compute all or even one possible type solution.
To make them usable in practice we now focus on decreasing the runtime to a feasible level.

\section{Motivation}
The nature of the global type inference algorithm causes the search space of the unification algorithm to
increase exponentially with every ambiguous method call.
Java allows overloaded methods causing our type inference algorithm to create so called Or-Constraints.
This happens if multiple classes implement a method with the same name and the same amount of parameters.
Given the input program in figure \ref{fig:introExample} we do not know, which method \texttt{self} is meant for the method call
\texttt{var.self()}, because there is no type annotation for \texttt{var}.


%\begin{figure}[h]
%\begin{minipage}{0.49\textwidth}
%     \end{minipage}%
%     \hfill
%     \begin{minipage}{0.49\textwidth}
% \begin{lstlisting}[style=constraints]
% (*@$\tv{var} \lessdot \type{C1} | $@*),
% (*@$\sigma(\tv{xp}) = \wctype{\wildcard{A}{\type{String}}{\bot}}{List}{\rwildcard{A}}$@*)
% \end{lstlisting}
% \end{minipage}
%\end{figure}

An Or-Constraint like $\set{\tv{var} \lessdot \type{C1}, \tv{ret} \doteq \type{C1} }\ ||\ \set{\tv{var} \lessdot \type{C2}, \tv{ret} \doteq \type{C2}}$
means that either the the constraint set $\set{\tv{var} \lessdot \type{C1}, \tv{ret} \doteq \type{C1} }$
or $\set{\tv{var} \lessdot \type{C2}, \tv{ret} \doteq \type{C2}}$ has to be satisfied.
If we set the type of \texttt{var} to \texttt{C1}, then the return type of the method will be \texttt{C1} aswell.
If we set it to \texttt{C2} then also the return type will be \texttt{C2}.
There are two possibilities therefore the Or-Constraint.

If we chain multiple overloaded method calls together we end up with multiple Or-Constraints.
The type unification algorithm \unify{} only sees the supplied constraints and has to potentially try all combinations of them.
This is proven in \cite{TIforGFJ} and is the reason why type inference for Featherweight Generic Java is NP-hard.
Let's have a look at the following alternation of our example:

\begin{figure}[h]
\begin{lstlisting}
    untypedMethod(var){
        return var.self().self().self().self();
    }
\end{lstlisting}
\caption{Stacked ambiguous method calls}\label{fig:benchmark}
\end{figure}

Now there are four chained method calls leading to four Or-Constraints:
\begin{align*}
    \set{\tv{var} \lessdot \type{C1}, \tv{r1} \doteq \type{C1} }\ ||\ \set{\tv{var} \lessdot \type{C2}, \tv{r1} \doteq \type{C2}} \\
    \set{\tv{r1} \lessdot \type{C1}, \tv{r2} \doteq \type{C1} }\ ||\ \set{\tv{r1} \lessdot \type{C2}, \tv{r2} \doteq \type{C2}} \\
    \set{\tv{r2} \lessdot \type{C1}, \tv{r3} \doteq \type{C1} }\ ||\ \set{\tv{r2} \lessdot \type{C2}, \tv{r3} \doteq \type{C2}} \\
    \set{\tv{r3} \lessdot \type{C1}, \tv{ret} \doteq \type{C1} }\ ||\ \set{\tv{r3} \lessdot \type{C2}, \tv{ret} \doteq \type{C2}} \\
\end{align*}

The placeholder $\tv{r1}$ stands for the return type of the first call to \texttt{self} and
$\tv{r2}$ for the return type of the second call and so on.
It is obvious that this constraint set only has two solutions.
The variable \texttt{var} and the return type of the method aswell as all intermediate placeholders $\tv{r1}-\tv{r3}$
get either the type \texttt{C1} or \texttt{C2}.
A first prototype implementation of the \unify{} algorithm simply created the cartesian product of all Or-Constraints,
16 possibilities in this example, and iteratively tried all of them.
This leads to a exponential runtime increase with every added overloaded method call.
Eventhough the current algorithm is equipped with various optimizations as presented in \cite{plue181} and \cite{plue231}
there are still runtime issues when dealing with many Or-Constraints.

Our global type inference algorithm should construct type solutions in real time.
Then it can effectively used as a Java compiler which can deal with large inputs of untyped Java code.
Another use case scenario is as an editor plugin which helps a Java programmer by enabling him to write untyped Java code
and letting our tool insert the missing type statements.
For both applications a short execution time is vital.

Answer Set programming promises to solve complex search problems in a highly optimized way.
The idea in this paper is to implement the algorithm presented in \cite{TIforGFJ}
as an ASP program and see how well it handles our type unification problem.

\section{ASP Implementation}

Our ASP implementation replaces the unification step of the type inference algorithm.
So the input is a set of constraints $c$ and the output is a set of unifiers $\sigma$.
An ASP program consists of implication rules and facts.
Here the facts are the input constraints and the rules are the ones depicted in figure \ref{fig:aspUnifyAlgorithm}.
After running the resulting program through an interpreter the output
holds the desired $\sigma(\tv{a}) = \type{T}$ literals.

\begin{figure}[h]
  \center
\begin{tabular}{lcll}
    $c $ &$::=$ & $\type{T} \lessdot \type{T} \mid \type{T} \doteq \type{T}$ & Constraint \\
    $\type{T}, \type{S} $ & $::=$ & $\tv{a} \mid \type{N}$ & Type placeholder or Type \\
    $\ntype{N}$ & $::=$ & $\exptype{C}{\ol{P}}$ & Class Type containing type placeholders\\
    $\gtype{G}$ & $::=$ & $\exptype{C}{\ol{G}}$ & Class Type not containing type placeholders \\
    $\type{P}$ & $::=$ & $\tv{a} \mid \gtype{G}$ & Well-Formed Parameter \\
\end{tabular}
  \caption{Syntax of types and constraints}
  \label{fig:syntax-constraints}
\end{figure}

\begin{figure}
\fbox{
\begin{minipage}{\textwidth}
\textbf{Subtyping:}
\begin{mathpar}
\inferrule[S-Refl]{}{
\type{T} <: \type{T}
}
\and
\inferrule[S-Trans]{\type{T}_1 <: \type{T}_2 \\ \type{T}_2 <: \type{T}_3}{
\type{T}_1 <: \type{T}_3
}
\and
% \inferrule[S-Var]{}{\type{A} <: \Delta(\type{A})}
% \and
\inferrule[S-Class]{\texttt{class}\ \exptype{C}{\ol{X}} \triangleleft \type{N}}{
\exptype{C}{\ol{T}} <: [\ol{T}/\ol{X}]\type{N}
}
\end{mathpar}
% \begin{mathpar}
% \inferrule[N-Refl]{}{
% \type{C} << \type{C}
% }
% \and
% \inferrule[N-Trans]{\type{C}_1 << \type{C}_2 \\ \type{C}_2 << \type{C}_3}{
% \type{C}_1 << \type{C}_3
% }
% \and
% \inferrule[N-Class]{\texttt{class}\ \exptype{C}{\ldots} \triangleleft \exptype{D}{\ldots}}{
% \type{C} << \type{D}
% }
% \end{mathpar}
\textbf{Unify:}

\label{sec:implicationRules}
\begin{mathpar}
\inferrule[Subst-L]{
    \tv{a} \doteq \type{T}_1 \\
    \tv{a} \lessdot \type{T}_2
}{
    \type{T}_1 \lessdot \type{T}_2
}
\and
\inferrule[Subst-R]{
    \tv{a} \doteq \type{T}_1 \\
    \type{T}_2 \lessdot \tv{a}
}{
    \type{T}_2 \lessdot \type{T}_1
}
\and
\inferrule[Subst-Equal]{
    \tv{a} \doteq \type{T}_1 \\
    \tv{a} \doteq \type{T}_2
}{
    \type{T}_1 \doteq \type{T}_2
}
\and
\inferrule[Swap]{
    \type{T}_1 \doteq \type{T}_2
}{
    \type{T}_2 \doteq \type{T}_1
}
\and
% \inferrule[Unfold]{
%     \type{T} \doteq \exptype{C}{\type{T}_1 \ldots \type{T}_n}
% }{
%     \type{T}_i \doteq \type{T}_i
% }
% \and
% \inferrule[Unfold-Right]{
%     \type{T} \lessdot \exptype{C}{\type{T}_1 \ldots \type{T}_n}
% }{
%     \type{T}_i \doteq \type{T}_i
% }
% \and
% \inferrule[Unfold-Left]{
%     \exptype{C}{\type{T}_1 \ldots \type{T}_n} \lessdot \type{T}
% }{
%     \type{T}_i \doteq \type{T}_i
% }
% \and
\inferrule[Subst-Param]{
    \tv{a} \doteq \type{G} \\
    \type{T} \doteq \exptype{C}{\type{P}_1 \ldots, \tv{a}, \ldots \type{P}_n} \\
}{
    \type{T} \doteq \exptype{C}{\type{P}_1, \ldots \type{G}, \ldots \type{P}_n}
}
\and
\inferrule[Subst-Param-Right]{
    \tv{a} \doteq \type{G} \\
    \tv{b} \lessdot \exptype{C}{\type{P}_1 \ldots, \tv{a}, \ldots \type{P}_n} \\
}{
    \tv{b} \lessdot \exptype{C}{\type{P}_1, \ldots \type{G}, \ldots \type{P}_n}
}
\and
\inferrule[Subst-Param-Left]{
    \tv{a} \doteq \type{G} \\
    \exptype{C}{\type{P}_1 \ldots, \tv{a}, \ldots \type{P}_n} \lessdot \tv{b} \\
}{
    \exptype{C}{\type{P}_1, \ldots \type{G}, \ldots \type{P}_n} \lessdot \tv{b}
}
\and
\inferrule[S-Object]{}{\tv{a} \lessdot \type{Object}}
% \and
% \inferrule[Match]{
%     \tv{a} \lessdot \type{N}_1 \\
%     \tv{a} \lessdot \type{N}_2 \\
%     \type{N}_1 << \type{N}_2
% }{
%     \type{N}_1 \lessdot \type{N}_2
% }
% \and
% \inferrule[Adopt]{
%     \tv{a} \lessdot \tv{b} \\
%     \tv{b} \lessdot \type{T}
% }{
%     \tv{a} \lessdot \type{T}
% }
% \and
% \inferrule[Subst-Param]{
%     \tv{a} \doteq \type{N} \\
%     \tv{a} = \type{T}_i \\
%     \exptype{C}{\type{T}_1 \ldots \type{T}_n} \lessdot \type{T} \\
% }{
%     \type{T}_i \doteq \type{N} \\
% }
\and
\inferrule[Adapt]{
    \type{N} \lessdot \exptype{C}{\type{P}_1 \ldots \type{P}_n} \\
    \type{N} <: \exptype{C}{\type{P'}_1 \ldots \type{P'}_n}
}{
    \type{P}_1 \doteq \type{P'}_1 \ \ldots \ \type{P}_n \doteq \type{P'}_n \\
}
\and
\inferrule[Reduce]{
    \exptype{C}{\type{P}_1 \ldots \type{P}_n} \doteq \exptype{C}{\type{P'}_1 \ldots \type{P'}_n} \\
}{
    \type{P}_i \doteq \type{P'}_i \\
}
\and
\inferrule[Super]{
    \type{N} \lessdot \tv{a}\\
    \type{N} <: \type{N}_1, \ldots, \type{N} <: \type{N}_n \\
    \type{N}_1, \ldots, \type{N}_n \ \ok
}{
    \text{for one}\ m \in \set{1, \ldots, n}: \quad  \tv{a} \doteq \type{N}_m
}
\end{mathpar}
\textbf{Solution:}
\begin{mathpar}
\inferrule[Solution]{
    \tv{a} \doteq \type{G}
}{
    \sigma(\tv{a}) = \type{G}
}
\and
\inferrule[Solution-Gen]{
    \tv{a} \lessdot \type{N}
}{
    \tv{a} \doteq \type{N} \ \text{or}\ \tv{a} \lessdot \type{N}
}
\and
\inferrule[Solution-Requirement]{
\tv{a} \lessdot \type{N}_1
%, \tv{a} \doteq \type{N}_2
, \ldots, \tv{a} \lessdot \type{N}_n \\
}{
    \text{there must exist atleast one}: \quad  \tv{a} \doteq \type{N}
}
% \and
% \inferrule[Solution-Gen]{
%     \tv{a} \lessdot \type{C}\\
%     \sigma(\tv{a}) = \emptyset
% }{
%     \tv{a} \doteq \type{A} \\ \sigma'(\tv{a}) = \type{A}
% }
% \and
% \inferrule[Solution-Gen]{
%     \tv{a} \lessdot \type{G} \\
%     \tv{a} \lessdot \type{G}_1, \ldots, \tv{a} \lessdot \type{G}_n \\
%     \forall i: \type{G} << \type{G}_i \\
%     \sigma'(\tv{a}) = \type{A}
% }{
%     \Delta(\type{A}) = \type{G}
% }
\end{mathpar}
\textbf{Fail Conditions:}
\begin{mathpar}
\inferrule[Fail-Subtype]{
    \exptype{C}{\ldots} \lessdot \exptype{D}{\ldots}\\
    \type{C}\ \kw{not extends}\ \type{D}
}{
    \emptyset
}
\and
\inferrule[Fail-Equals]{
    \exptype{C}{\ldots} \doteq \exptype{D}{\ldots}\\
    \type{C} \neq \type{D}
}{
    \emptyset
}
% \and
% \inferrule[Fail-Generic]{
%     \type{X} \doteq \type{T}\\
%     \type{X} \neq \type{T}
% }{
%     \emptyset
% }
\and
\inferrule[Fail-Sigma]{
    \tv{a} \doteq \type{N} \\
    \tv{a} \in \type{N}
}{
    \emptyset
}
% \and
% \inferrule[Fail]{
% \tv{a} \lessdot \type{N}_1 \\
% \tv{a} \lessdot \type{N}_2 \\
% \text{not}\ \type{N}_1 << \type{N}_2 \\
% \text{not}\ \type{N}_2 << \type{N}_1
% }{
%     \emptyset
% }
\end{mathpar}
\end{minipage}
}
\caption{Unify Algorithm}\label{fig:aspUnifyAlgorithm}
\end{figure}


\section{Outcome and Conclusion}
ASP handles Or-Constraints surprisingly well.
We tested our ASP implementation of the unification algorithm with the constraints originating from the program in
figure \ref{fig:benchmark}.
We compared it with the current implementation of the Unification algorithm in Java
\footnote{\url{https://gitea.hb.dhbw-stuttgart.de/JavaTX/JavaCompilerCore}}.
We increased the complexity by adding more stacked method calls up to ten stacked calls and the interpreter clingo \footnote{\url{https://potassco.org/clingo/}}
was still able to handle it easily and finish computation in under 50 milliseconds.
By contrast our Java implementation already takes multiple seconds processing time for the same input.

We are using this example for a fair comparision between the two implementations because it does not include generic type parameters causing the unification algorithm
not to consider wildcard types.
The Java implementation also supports Java wildcard types whereas the ASP implementation does not.
%It is only possible to implement a type inference algorithm for Java as long as we omit wildcard types.
It is unfortunately not possible to implement the unification algorithm including wildcard support with Answer Set Programming.
The reason is that subtype checking in Java is turing complete \cite{javaTuringComplete}.
While the grounding process of the ASP interpreter clingo would not terminate if the problem is turing complete
\cite{kaufmann2016grounding};
the Java implementation is still able to spawn atleast one solution in most cases and only never terminates for specific inputs,
whereas the ASP program never terminates as soon as wildcards are involved.


%\section{Future Work}
% Benchmarks
% Integrating the ASP Unify implementation into existing Java-TX Compiler
% Checking how many programs are abel to be build without wildcards

\bibliographystyle{eptcs}
\bibliography{martin}

\end{document}