Restructure Unify

This commit is contained in:
Andreas Stadelmeier 2024-05-28 00:44:06 +02:00
parent 3dbdce8e29
commit 583b4acd5c

332
unify.tex
View File

@ -6,6 +6,102 @@
% the algorithm only removes wildcards, never adds them
\section{Unify}\label{sec:unify}
%\newcommand{\tw}[1]{\tv{#1}_?}
\begin{description}
\item[Input:] An environment $\Delta'$
and a set of constraints $C = \set{ \type{T} \lessdot \type{T}, \type{T} \lessdotCC \type{T}, \type{T} \doteq \type{T} \ldots}$
\item[Output:]
Set of unifiers $Uni = \set{\sigma_1, \ldots, \sigma_n}$ and an environment $\Delta$
\end{description}
%The transformation steps are not applied all at once but in a specific order:
\unify{} executes the following steps until a type solution is found:
\begin{description}
\item[Step 1:]
Apply the rules depicted in the figures \ref{fig:normalizing-rules}, \ref{fig:reduce-rules} and \ref{fig:wildcard-rules} exhaustively,
starting with the \rulename{circle} rule.
\item[Step 2:]
%If there are no $(\type{T} \lessdot \tv{a})$ constraints remaining in the constraint set $C$
%resume with step 4.
The second step is nondeterministic.
%\unify{} has to pick the right transformation for each constraint of the form $\type{N} \lessdot \tv{a}$.
%The rules in figure \ref{fig:step2-rules} offer three possibilities to deal with constraints $\type{N} \lessdot \tv{a}$.
For every $\type{T} \lessdot \tv{a}$ constraint \unify{} has to pick exactly one transformation from figure \ref{fig:step2-rules}.
The same principle goes for constraints of the form $\tv{a} \lessdot \type{N}, \tv{a} \lessdot \tv{b}$ and the two transformations in figure \ref{fig:step2-rules2}.
%They have to be applied until the constraint set holds no constraints of the form $\tv{a} \lessdot \type{N}, \tv{a} \lessdot \tv{b}$.
If atleast one transformation was applied in this step revert to step 1.
Otherwise proceed with step 3.
%This builds a search tree over multiple possible solutions.
%\unify{} has to try each branch and accumulate the resulting solutions into a solution set.
%$\type{T} \lessdot \ntv{a}$ constraints have three and $\type{T} \lessdot \wtv{a}$ constraints have five possible transformations.
\textit{Hint:}
When implementing this step via backtracking
the rules \rulename{General} and \rulename{Super} should be tried first.
% is the Same rule really necessary?
The \rulename{Settle} and \rulename{Raise} rules should only be used when none of the rules in figure \ref{fig:step2-rules} can be applied.
\item[Step 3:]
Apply the rules in figure \ref{fig:cleanup-rules} exhaustively.
\rulename{Ground} and \rulename{Flatten} deal with constraints containing free variables.
If a type placeholder is solely used as lower bound \rulename{Ground} can replace it with the bottom type.
Otherwise the \rulename{Flatten} rule has to remove the wildcards responsible for the free variable.
\text{Note:} Only one of those rules has to be applied per constraint.
If the constraint set has been changed by one of these rules the algorithm must return to step 1.
But if only the \rulename{SubElim} rule is applied or the constraint set is not changed at all,
the algorithm can proceed with step 4.
The cleanup step prepares the constraint set for the last step by applying the following concepts:
%Two transformations are done which also help to remove unnecessary types from the solution set.
\begin{description}
\item[Bottom type]
The bottom type $\bot$ is used to generate \texttt{? extends} wildcard definitions.
This is the only possible solution when dealing with multiple upper bounds:
$\tv{a} \lessdot \type{T}, \tv{a} \lessdot \type{S}$ is usually not a correct solution (given $\type{S}$ and $\type{T}$ are no subtypes of eachother).
But if $\tv{a}$ is a lower bound of a wildcard it can be set to $\bot$.
Those constraints only stay in the constraint set after the first step if $\type{S}$ and $\type{T}$ do not have a common subtype.
The \rulename{Ground} rule uses this concept to generate \texttt{extends} Wildcards.
\end{description}
\item[Step 4:] \textit{(Generating Result)}
Apply the rules in figure \ref{fig:generation-rules} until $\wildcardEnv = \emptyset$ and $C = \emptyset$.
The resulting $\Delta, \sigma$ is a correct solution.
For this step to succeed there should only be four kinds of constraints left.
\begin{enumerate}
%\item\label{item:3} $\tv{a} \lessdot \tv{b}$ %, with $a$ and $b$ both isolated type variables
\item $\tv{a} \doteq \tv{b}$
%\item $\wtv{a} \doteq \type{G}$
\item\label{item:1} $\tv{a} \lessdot \wctype{\ol{\wtype{W}}}{C}{\ol{\type{T}}}$, with $\text{fv}(\wctype{\ol{\wtype{W}}}{C}{\ol{\type{T}}}) \subseteq \Delta_in$
\item\label{item:2} $\tv{a} \doteq \wctype{\ol{\wtype{W}}}{C}{\ol{\type{T}}}$, with $\tv{a} \notin \ol{\type{T}}$ % and $\text{fv}(\wctype{\ol{\wtype{W}}}{C}{\ol{\type{T}}}) = \emptyset$
\item\label{item:3} $\tv{a} \doteq \rwildcard{X}$
\end{enumerate}
%Each type placeholder $\tv{a}$ must solely appear on the left side of a constraint.
\unify{} fails if there is any $\tv{a} \doteq \type{T}$ such that $\tv{a}$ occurs in $\type{T}$.
For the cases \ref{item:1}, \ref{item:2}, and \ref{item:3} the placeholder $\tv{a}$
cannot appear anywhere else in the constraint set.
Otherwise the generation rules \rulename{GenSigma} and \rulename{GenDelta} will not be able to process every constraint.
% \begin{displaymath}
% \deduction{
% \wildcardEnv \cup \set{\wildcard{B}{\type{G}}{\type{G'}}} \vdash C \implies \Delta, \sigma
% }{
% \wildcardEnv \vdash C \implies \Delta \cup \set{\wildcard{B}{\type{G}}{\type{G'}}}, \sigma
% }\quad \text{tph}(\type{G}) = \emptyset, \text{tph}(\type{G'}) = \emptyset,
% \rwildcard{B} \notin \text{dom}(\Delta)
% \quad \rulename{AddDelta}
% \end{displaymath}
\end{description}
\subsection{Transformation Rules (Step 1)}
Our \unify{} process uses a similar concept as the standard unification by Martelli and Montanari \cite{MM82},
consisting of terms, relations and variables.
Instead of terms we have types of the form $\exptype{C}{\ol{T}}$ and
@ -462,7 +558,75 @@ and every added wildcard gets a fresh unique name.
This ensures the wildcard environment $\wildcardEnv{}$ never
gets the same wildcard twice.
\subsection{Branching Step}
\begin{figure}
\begin{center}
\leavevmode
\fbox{
\begin{tabular}[t]{l@{~}l}
\rulename{Subst} &
$\begin{array}[c]{l}
\wildcardEnv \vdash C \cup \set{\ntv{a} \doteq \type{T}}\\
\hline
[\type{T}/\tv{a}]\wildcardEnv \vdash [\type{T}/\tv{a}]
C \cup \set{\ntv{a} \doteq \type{T}}
\end{array}
\quad \begin{array}{c}
\ntv{a} \notin \type{T} \\
\text{fv}(\type{T}) \subseteq \Delta', \, \text{wtv}(\type{T}) = \emptyset
\end{array}$\\
\\
\rulename{Subst-WC} &$
\begin{array}[c]{l}
\wildcardEnv \vdash C \cup \set{\wtv{a} \doteq \rwildcard{T}}\\
\hline
[\type{T}/\wtv{a}]\wildcardEnv \vdash [\type{T}/\wtv{a}]C
\end{array} \quad \wtv{a} \notin \type{T}
$\\
\\
\rulename{Normalize} &
$\begin{array}[c]{l}
\wildcardEnv \vdash C \cup \set{\ntv{a} \doteq \type{T}}\\
\hline
[\ntv{b}/\wtv{b}]\wildcardEnv \vdash [\ntv{b}/\wtv{b}]
C \cup \set{\ntv{a} \doteq [\ntv{b}/\wtv{b}]\type{T}}
\end{array}
\quad \begin{array}{c}
\wtv{b} \in \text{wtv}(\type{T})\\
\ntv{b} \ \text{fresh}
\end{array}$\\
\\
\rulename{Contract} &
$\begin{array}[c]{l}
\wildcardEnv \cup \set{\wildcard{A}{\type{U}}{\type{L}}} \vdash C \cup \set{\ntv{a} \doteq \type{T}}\\
\hline
[\type{U}/\type{A}]\wildcardEnv \vdash [\type{U}/\type{A}]
C \cup [\type{U}/\type{A}]\set{\ntv{a} \doteq \type{T}, \type{L} \doteq \type{U}}
\end{array}
\quad \begin{array}{c}
\rwildcard{A} \in \text{fv}(\type{T})\\
\end{array}$\\
\end{tabular}}
\end{center}
\caption{Substitution rules}\label{fig:subst-rules}
\end{figure}
\unify{} must not replace normal type placeholders with free variables
except variables initially passed by $\Delta_{in}$.
The \rulename{Subst} rule checks if a type $\type{T}$ contains any
free variables or wildcard placeholders before replacing a normal type placeholder with it.
%This ensures that free variables are never substituted for normal type placeholders.
This ensures that a normal type placeholder is never replaced by a type containing free variables.
A type solution for a normal type placeholder will never contain free variables.
This is needed to guarantee well-formed type solutions and also keep free variables inside their scope
(see challenge \ref{challenge3}).
\rulename{Subst-WC} does not need to do that and can freely replace wildcard placehodlers with types despite their free variables.
We do not keep replacements for wildcard placeholders and they will not show up in the final type solution.
If the \rulename{Subst} rule is not applicable then either the \rulename{Normalize}
or \rulename{Contract} transformation has to be used to remove wildcard placeholders and
wildcards.
\subsection{Branching Step (Step 2)}
\unify{} is described as a nondeterministic algorithm.
Some constraints allow for multiple transformations from which
the algorithm has to pick the right one.
@ -474,7 +638,9 @@ and gathers all possible type solutions.
We skip the definition of this practice, because it is already described in \cite{TIforFGJ}
and only needed for a proof of completeness.
\subsection{Generate Result (Step 4)}
The generation rules defined in figure \ref{fig:generation-rules} are similar to the other transformation rules but contain an additional part,
the result output consisting of a wildcard environment and a set of unifier $\sigma$.
\subsection{Adding Wildcards to the mix}
%\unify{} is able to create wildcard solutions even when the input set of constraints do not contain any wildcard variables.
@ -622,101 +788,6 @@ The \texttt{concat} method takes two lists of the same generic type.
\subsection{Algorithm}
%\newcommand{\tw}[1]{\tv{#1}_?}
\begin{description}
\item[Input:] An environment $\Delta'$
and a set of constraints $C = \set{ \type{T} \lessdot \type{T}, \type{T} \lessdotCC \type{T}, \type{T} \doteq \type{T} \ldots}$
\item[Output:]
Set of unifiers $Uni = \set{\sigma_1, \ldots, \sigma_n}$ and an environment $\Delta$
\end{description}
%The transformation steps are not applied all at once but in a specific order:
\unify{} executes the following steps until a type solution is found:
\begin{description}
\item[Step 1:]
Apply the rules depicted in the figures \ref{fig:normalizing-rules}, \ref{fig:reduce-rules} and \ref{fig:wildcard-rules} exhaustively,
starting with the \rulename{circle} rule.
\item[Step 2:]
%If there are no $(\type{T} \lessdot \tv{a})$ constraints remaining in the constraint set $C$
%resume with step 4.
The second step is nondeterministic.
%\unify{} has to pick the right transformation for each constraint of the form $\type{N} \lessdot \tv{a}$.
%The rules in figure \ref{fig:step2-rules} offer three possibilities to deal with constraints $\type{N} \lessdot \tv{a}$.
For every $\type{T} \lessdot \tv{a}$ constraint \unify{} has to pick exactly one transformation from figure \ref{fig:step2-rules}.
The same principle goes for constraints of the form $\tv{a} \lessdot \type{N}, \tv{a} \lessdot \tv{b}$ and the two transformations in figure \ref{fig:step2-rules2}.
%They have to be applied until the constraint set holds no constraints of the form $\tv{a} \lessdot \type{N}, \tv{a} \lessdot \tv{b}$.
If atleast one transformation was applied in this step revert to step 1.
Otherwise proceed with step 3.
%This builds a search tree over multiple possible solutions.
%\unify{} has to try each branch and accumulate the resulting solutions into a solution set.
%$\type{T} \lessdot \ntv{a}$ constraints have three and $\type{T} \lessdot \wtv{a}$ constraints have five possible transformations.
\textit{Hint:}
When implementing this step via backtracking
the rules \rulename{General} and \rulename{Super} should be tried first.
% is the Same rule really necessary?
The \rulename{Settle} and \rulename{Raise} rules should only be used when none of the rules in figure \ref{fig:step2-rules} can be applied.
\item[Step 3:]
Apply the rules in figure \ref{fig:cleanup-rules} exhaustively.
\rulename{Ground} and \rulename{Flatten} deal with constraints containing free variables.
If a type placeholder is solely used as lower bound \rulename{Ground} can replace it with the bottom type.
Otherwise the \rulename{Flatten} rule has to remove the wildcards responsible for the free variable.
\text{Note:} Only one of those rules has to be applied per constraint.
If the constraint set has been changed by one of these rules the algorithm must return to step 1.
But if only the \rulename{SubElim} rule is applied or the constraint set is not changed at all,
the algorithm can proceed with step 4.
The cleanup step prepares the constraint set for the last step by applying the following concepts:
%Two transformations are done which also help to remove unnecessary types from the solution set.
\begin{description}
\item[Bottom type]
The bottom type $\bot$ is used to generate \texttt{? extends} wildcard definitions.
This is the only possible solution when dealing with multiple upper bounds:
$\tv{a} \lessdot \type{T}, \tv{a} \lessdot \type{S}$ is usually not a correct solution (given $\type{S}$ and $\type{T}$ are no subtypes of eachother).
But if $\tv{a}$ is a lower bound of a wildcard it can be set to $\bot$.
Those constraints only stay in the constraint set after the first step if $\type{S}$ and $\type{T}$ do not have a common subtype.
The \rulename{Ground} rule uses this concept to generate \texttt{extends} Wildcards.
\end{description}
\item[Step 4:] \textit{(Generating Result)}
Apply the rules in figure \ref{fig:generation-rules} until $\wildcardEnv = \emptyset$ and $C = \emptyset$.
The resulting $\Delta, \sigma$ is a correct solution.
For this step to succeed there should only be four kinds of constraints left.
\begin{enumerate}
%\item\label{item:3} $\tv{a} \lessdot \tv{b}$ %, with $a$ and $b$ both isolated type variables
\item $\tv{a} \doteq \tv{b}$
%\item $\wtv{a} \doteq \type{G}$
\item\label{item:1} $\tv{a} \lessdot \wctype{\ol{\wtype{W}}}{C}{\ol{\type{T}}}$, with $\text{fv}(\wctype{\ol{\wtype{W}}}{C}{\ol{\type{T}}}) \subseteq \Delta_in$
\item\label{item:2} $\tv{a} \doteq \wctype{\ol{\wtype{W}}}{C}{\ol{\type{T}}}$, with $\tv{a} \notin \ol{\type{T}}$ % and $\text{fv}(\wctype{\ol{\wtype{W}}}{C}{\ol{\type{T}}}) = \emptyset$
\item\label{item:3} $\tv{a} \doteq \rwildcard{X}$
\end{enumerate}
%Each type placeholder $\tv{a}$ must solely appear on the left side of a constraint.
\unify{} fails if there is any $\tv{a} \doteq \type{T}$ such that $\tv{a}$ occurs in $\type{T}$.
For the cases \ref{item:1}, \ref{item:2}, and \ref{item:3} the placeholder $\tv{a}$
cannot appear anywhere else in the constraint set.
Otherwise the generation rules \rulename{GenSigma} and \rulename{GenDelta} will not be able to process every constraint.
% \begin{displaymath}
% \deduction{
% \wildcardEnv \cup \set{\wildcard{B}{\type{G}}{\type{G'}}} \vdash C \implies \Delta, \sigma
% }{
% \wildcardEnv \vdash C \implies \Delta \cup \set{\wildcard{B}{\type{G}}{\type{G'}}}, \sigma
% }\quad \text{tph}(\type{G}) = \emptyset, \text{tph}(\type{G'}) = \emptyset,
% \rwildcard{B} \notin \text{dom}(\Delta)
% \quad \rulename{AddDelta}
% \end{displaymath}
\end{description}
With \texttt{C} being class names and \texttt{A} being wildcard names.
The wildcard type $\wildcard{X}{U}{L}$ consist of an upper bound $\type{U}$, a lower bound $\type{L}$
and a name $\mathtt{X}$.
@ -820,73 +891,6 @@ which are used for the upper and lower bounds.
% if there are a <. List<x?> constraints remaining in the end, then this can be a sign of a irregular input constraint set.
\begin{figure}
\begin{center}
\leavevmode
\fbox{
\begin{tabular}[t]{l@{~}l}
\rulename{Subst} &
$\begin{array}[c]{l}
\wildcardEnv \vdash C \cup \set{\ntv{a} \doteq \type{T}}\\
\hline
[\type{T}/\tv{a}]\wildcardEnv \vdash [\type{T}/\tv{a}]
C \cup \set{\ntv{a} \doteq \type{T}}
\end{array}
\quad \begin{array}{c}
\ntv{a} \notin \type{T} \\
\text{fv}(\type{T}) \subseteq \Delta', \, \text{wtv}(\type{T}) = \emptyset
\end{array}$\\
\\
\rulename{Subst-WC} &$
\begin{array}[c]{l}
\wildcardEnv \vdash C \cup \set{\wtv{a} \doteq \rwildcard{T}}\\
\hline
[\type{T}/\wtv{a}]\wildcardEnv \vdash [\type{T}/\wtv{a}]C
\end{array} \quad \wtv{a} \notin \type{T}
$\\
\\
\rulename{Normalize} &
$\begin{array}[c]{l}
\wildcardEnv \vdash C \cup \set{\ntv{a} \doteq \type{T}}\\
\hline
[\ntv{b}/\wtv{b}]\wildcardEnv \vdash [\ntv{b}/\wtv{b}]
C \cup \set{\ntv{a} \doteq [\ntv{b}/\wtv{b}]\type{T}}
\end{array}
\quad \begin{array}{c}
\wtv{b} \in \text{wtv}(\type{T})\\
\ntv{b} \ \text{fresh}
\end{array}$\\
\\
\rulename{Contract} &
$\begin{array}[c]{l}
\wildcardEnv \cup \set{\wildcard{A}{\type{U}}{\type{L}}} \vdash C \cup \set{\ntv{a} \doteq \type{T}}\\
\hline
[\type{U}/\type{A}]\wildcardEnv \vdash [\type{U}/\type{A}]
C \cup [\type{U}/\type{A}]\set{\ntv{a} \doteq \type{T}, \type{L} \doteq \type{U}}
\end{array}
\quad \begin{array}{c}
\rwildcard{A} \in \text{fv}(\type{T})\\
\end{array}$\\
\end{tabular}}
\end{center}
\caption{Substitution rules}\label{fig:subst-rules}
\end{figure}
\unify{} must not replace normal type placeholders with free variables
except variables initially passed by $\Delta_{in}$.
The \rulename{Subst} rule checks if a type $\type{T}$ contains any
free variables or wildcard placeholders before replacing a normal type placeholder with it.
%This ensures that free variables are never substituted for normal type placeholders.
This ensures that a normal type placeholder is never replaced by a type containing free variables.
A type solution for a normal type placeholder will never contain free variables.
This is needed to guarantee well-formed type solutions and also keep free variables inside their scope
(see challenge \ref{challenge3}).
\rulename{Subst-WC} does not need to do that and can freely replace wildcard placehodlers with types despite their free variables.
We do not keep replacements for wildcard placeholders and they will not show up in the final type solution.
If the \rulename{Subst} rule is not applicable then either the \rulename{Normalize}
or \rulename{Contract} transformation has to be used to remove wildcard placeholders and
wildcards.
% \begin{example}{Free variables must not leave the scope of the surrounding \texttt{let} statement}
% \noindent