TextSpan SyntaxTree annotation
In my previous post, I explained how to get a SyntaxTree from a U-SQL query.
In future posts, we will see that we can change this tree significantly.
Why change the tree? The main reason is to help the optimizer generate the best possible plan.
One way we do this is constant folding (I will write about it in a later post): the compiler performs some calculations itself, so that the optimizer receives constants instead of expressions.
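The folding idea can be sketched on a toy expression tree. This is a hypothetical Python model, not the U-SQL compiler (which runs on .NET). Expressions are tuples: `("num", v)`, `("var", name)` or `(op, left, right)`. The names `fold` and `env` are illustrative only. As in the division-by-zero example below, dividing by a constant zero is reported at compile time.

```python
import operator

# Binary operators supported by the toy language.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def fold(expr, env):
    """Return a folded copy of expr: variables with a known value become
    constants, and operators whose operands are both constants are evaluated."""
    kind = expr[0]
    if kind == "num":
        return expr
    if kind == "var":
        name = expr[1]
        return ("num", env[name]) if name in env else expr
    left, right = fold(expr[1], env), fold(expr[2], env)
    if left[0] == "num" and right[0] == "num":
        if kind == "/" and right[1] == 0:
            # Reported while folding (compile time), not when the query runs.
            raise ZeroDivisionError("division by zero in a constant expression")
        return ("num", OPS[kind](left[1], right[1]))
    return (kind, left, right)


# (1 + 2) + 1 / @X
expr = ("+", ("+", ("num", 1), ("num", 2)), ("/", ("num", 1), ("var", "@X")))
print(fold(expr, {"@X": 4}))  # ('num', 3.25)
print(fold(expr, {}))         # ('+', ('num', 3), ('/', ('num', 1), ('var', '@X')))
```

When @X is unknown, only the `(1 + 2)` part is folded. The rest of the expression is left for runtime.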
For example, imagine that we write a query like this:
As a smart compiler, U-SQL avoids useless operations at runtime by folding constants, and transforms the query into:
With the following query, however, compilation fails because of a division by zero.
Roslyn gives us the TextSpan of a node: a struct holding the start position of the node in the source text and its length.
But at this point, the expression is no longer “(1 + 2) + 1 / @X” but “3 + 1 / 0”, so its position no longer matches the text the user wrote. To report a meaningful error position to the user (“(1 + 2) + 1 ### / @X”), we need a way to find the position of the original expression and, ideally, a way to get back the original node.
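A small Python sketch shows the error marker and why the folded text cannot be used for it. `Span` is a hypothetical stand-in for Roslyn's TextSpan (start plus length), and `mark_error` is an illustrative helper. A span taken from “3 + 1 / 0” points to the wrong place when applied to the user's original text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Toy stand-in for a TextSpan: a start offset and a length."""
    start: int
    length: int

    @property
    def end(self) -> int:
        return self.start + self.length


def mark_error(source: str, span: Span, marker: str = "### ") -> str:
    """Insert marker into source just before the position where span starts."""
    return source[:span.start] + marker + source[span.start:]


original = "(1 + 2) + 1 / @X"
folded = "3 + 1 / 0"

# Span of "/" measured in the original text: correct position.
print(mark_error(original, Span(original.index("/"), 1)))  # (1 + 2) + 1 ### / @X
# Span of "/" measured in the folded text, applied to the original: wrong position.
print(mark_error(original, Span(folded.index("/"), 1)))    # (1 + 2### ) + 1 / @X
```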
To update the tree, we generally use the visitor pattern.
Roslyn provides the CSharpSyntaxRewriter for this.
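CSharpSyntaxRewriter has one Visit method per node type. By default, it rebuilds a node only when one of its children has changed. The Python sketch below models that pattern under simplified assumptions: `Node`, `Rewriter` and `SubstituteVariables` are hypothetical names, and dispatch uses a `kind` string rather than node types.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True, eq=False)
class Node:
    kind: str                          # "num", "var" or a binary operator such as "/"
    value: object = None               # literal value or variable name
    children: Tuple["Node", ...] = ()


class Rewriter:
    """Toy analogue of a syntax rewriter: a visitor that returns nodes."""

    def visit(self, node: Node) -> Node:
        # Dispatch to visit_<kind> when defined, otherwise rewrite the children.
        handler = getattr(self, "visit_" + node.kind, self.visit_children)
        return handler(node)

    def visit_children(self, node: Node) -> Node:
        children = tuple(self.visit(child) for child in node.children)
        if all(new is old for new, old in zip(children, node.children)):
            return node                # nothing changed: keep the same instance
        return Node(node.kind, node.value, children)


class SubstituteVariables(Rewriter):
    """Replaces variables that have a known constant value."""

    def __init__(self, env: dict):
        self.env = env

    def visit_var(self, node: Node) -> Node:
        return Node("num", self.env[node.value]) if node.value in self.env else node


tree = Node("/", children=(Node("num", 1), Node("var", "@X")))   # 1 / @X
rewritten = SubstituteVariables({"@X": 0}).visit(tree)           # 1 / 0
print(rewritten is tree)                          # False
print(rewritten.children[0] is tree.children[0])  # True: unchanged child reused
```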
Roslyn syntax nodes are immutable.
This means that the BinaryExpressionSyntax for the division “(1.0 + 2 + @X) / @X” cannot be the same instance as the one for “3.0 / 0” once we replace “(1.0 + 2 + @X)” with “3.0” and “@X” with “0”.
Note that this is also true when you update a parent node: in the following sample, e1 and n1 are not the same instances as e2 and n2.
So we cannot simply store extra information in, for example, a dictionary keyed by node.
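A small Python model makes the problem visible. It uses a hypothetical immutable `Node` type that compares by identity, as Roslyn nodes do; it is not Roslyn code.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True, eq=False)  # frozen: immutable; eq=False: hash/equality by identity
class Node:
    kind: str
    value: object = None
    children: Tuple["Node", ...] = ()


paren = Node("paren")  # placeholder for "(1.0 + 2 + @X)"
division = Node("/", children=(paren, Node("var", "@X")))

# Extra information kept in an ordinary dictionary keyed by node.
side_info = {division: "position of the division in the user's text"}

# Folding cannot modify `division` in place; it must build a new node ("3.0 / 0").
folded = replace(division, children=(Node("num", 3.0), Node("num", 0)))

print(folded is division)   # False
print(folded in side_info)  # False: the information did not follow the node
```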
To keep information attached to a SyntaxNode across tree updates, Roslyn provides SyntaxAnnotation.
A SyntaxAnnotation has two string properties: Kind and Data.
So if you want to attach Boolean information, you can use an annotation identified by its Kind alone, considering the value true if the annotation is present and false if it is not.
If you want to attach a value, you can also set its Data.
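Both usages can be modeled in Python. In this sketch, `Annotation` mimics a SyntaxAnnotation as an immutable (kind, data) pair, and `Node` mimics an immutable node. All names are hypothetical. The `with_children` method assumes, as described above, that updating a node during a rewrite keeps its annotations.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    """Toy analogue of a SyntaxAnnotation: an immutable kind plus optional data."""
    kind: str
    data: Optional[str] = None


@dataclass(frozen=True, eq=False)
class Node:
    kind: str
    children: Tuple["Node", ...] = ()
    annotations: FrozenSet[Annotation] = frozenset()

    def with_annotation(self, annotation: Annotation) -> "Node":
        # Nodes are immutable, so annotating returns a new node.
        return Node(self.kind, self.children, self.annotations | {annotation})

    def with_children(self, children) -> "Node":
        # A rewrite builds a new node but carries the annotations over.
        return Node(self.kind, tuple(children), self.annotations)

    def has_annotation(self, kind: str) -> bool:
        return any(a.kind == kind for a in self.annotations)

    def annotation_data(self, kind: str) -> List[Optional[str]]:
        return [a.data for a in self.annotations if a.kind == kind]


node = Node("/").with_annotation(Annotation("IsFolded"))        # Boolean flag: kind only
node = node.with_annotation(Annotation("OriginalStart", "12"))  # value, stored as a string
rewritten = node.with_children([Node("num"), Node("num")])

print(rewritten.has_annotation("IsFolded"))        # True
print(rewritten.annotation_data("OriginalStart"))  # ['12']
```

The string-only data is visible here: the position 12 has to be stored as `"12"`.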
This is very easy to use, but because SyntaxAnnotation is a sealed class and Data is a string, its usage is limited: we cannot subclass it, and any value must be encoded as a string.
As a workaround, we can create our own annotation logic.
Basically, we can use a Singleton holding a Dictionary whose values are lists of our custom annotations.
However, as I wrote above, to keep an annotation on the tree after a transformation, we need to use a Roslyn SyntaxAnnotation.
So we define a string key that serves both as our dictionary key and as the Data of our SyntaxAnnotation.
In our Singleton, we define two methods:
- AddAnnotation, which adds a SyntaxAnnotation holding the key to the node and stores the custom annotation under that key in the dictionary
- GetAnnotations, which reads the key from the node's SyntaxAnnotation and returns the corresponding custom annotations from the dictionary.
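Here is a minimal Python sketch of this singleton, assuming a hypothetical `AnnotationPool` class and a `POOL_KIND` string as the kind of the key annotation. The key is stored on the node as a (kind, data) pair, which stands in for the SyntaxAnnotation. `get_annotations` takes an optional type filter, in the spirit of GetAnnotations<T>.

```python
import itertools
from dataclasses import dataclass
from typing import Tuple

POOL_KIND = "AnnotationPoolKey"  # hypothetical kind of the key annotation


@dataclass(frozen=True, eq=False)
class Node:
    kind: str
    annotations: Tuple[Tuple[str, str], ...] = ()  # (kind, data) pairs


class AnnotationPool:
    """Singleton mapping a string key to the custom annotations of a node."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._store = {}
            cls._instance._next_id = itertools.count()
        return cls._instance

    @staticmethod
    def _key(node):
        return next((data for kind, data in node.annotations if kind == POOL_KIND), None)

    def add_annotation(self, node, annotation):
        key = self._key(node)
        if key is None:
            # First custom annotation: tag the node with a new key (new node).
            key = str(next(self._next_id))
            node = Node(node.kind, node.annotations + ((POOL_KIND, key),))
        self._store.setdefault(key, []).append(annotation)
        return node

    def get_annotations(self, node, of_type=object):
        key = self._key(node)
        return [a for a in self._store.get(key, []) if isinstance(a, of_type)]


pool = AnnotationPool()
first = pool.add_annotation(Node("/"), "folded")
second = pool.add_annotation(first, 42)
print(second is first)                    # True: the node already had a key
print(pool.get_annotations(second))       # ['folded', 42]
print(pool.get_annotations(second, int))  # [42]
```

Because the key is already present, the second call returns the same node. Only the list in the pool grows.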
Then we can define two extension methods to make usage easier: AddAnnotation and GetAnnotations<T>.
Finally, to keep the TextSpan of the original tree, we can define our TextSpanAnnotation and add it to the nodes of our SyntaxTree using a SyntaxRewriter.
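Here is a Python sketch of this step, with hypothetical names. `TextSpanAnnotation` holds a start and a length. `Node` keeps its current position plus a tuple of annotations. `annotate_spans` plays the role of the SyntaxRewriter: it rebuilds the tree bottom-up and puts a span annotation on every node. For brevity, annotations are stored directly on the node instead of going through an annotation pool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TextSpanAnnotation:
    """Position of a node in the original query text."""
    start: int
    length: int


@dataclass(frozen=True, eq=False)
class Node:
    kind: str
    start: int                          # position in the text the node was parsed from
    length: int
    children: Tuple["Node", ...] = ()
    annotations: Tuple[object, ...] = ()


def annotate_spans(node: Node) -> Node:
    """Return a copy of the tree where each node carries its original span."""
    children = tuple(annotate_spans(child) for child in node.children)
    span = TextSpanAnnotation(node.start, node.length)
    return Node(node.kind, node.start, node.length, children, node.annotations + (span,))


# "1 / @X": the division covers the whole text, its operands 0..1 and 4..6.
tree = Node("/", 0, 6, (Node("num", 0, 1), Node("var", 4, 2)))
annotated = annotate_spans(tree)
print(annotated.children[1].annotations)  # (TextSpanAnnotation(start=4, length=2),)
```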
Then we can refine the TextSpan position with specific Visit overrides:
To add our TextSpanAnnotation to the tree nodes, we then just need to run our TextSpanAnnotator:
Then, to get the position of a node in the original SyntaxTree, we can use the following code:
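The specific overrides are not reproduced here. As one plausible refinement, assumed for illustration, a binary expression can be annotated with the span of its operator token instead of the whole expression. That is where the “###” marker in the division-by-zero example points. `op_start` and `span_for` are hypothetical names, and this is a Python stand-in.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TextSpanAnnotation:
    start: int
    length: int


@dataclass(frozen=True, eq=False)
class Node:
    kind: str                        # "num", "var" or an operator such as "/"
    start: int
    length: int
    children: Tuple["Node", ...] = ()
    op_start: Optional[int] = None   # operator position, for binary expressions only


def span_for(node: Node) -> TextSpanAnnotation:
    """Default: the whole node. Binary expressions: just the operator token."""
    if node.op_start is not None:
        return TextSpanAnnotation(node.op_start, len(node.kind))
    return TextSpanAnnotation(node.start, node.length)


source = "(1 + 2) + 1 / @X"
division = Node("/", 10, 6, (Node("num", 10, 1), Node("var", 14, 2)), op_start=12)
span = span_for(division)
print(source[:span.start] + "### " + source[span.start:])  # (1 + 2) + 1 ### / @X
```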
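This self-contained Python sketch puts the pieces together, with hypothetical names and spans stored directly on nodes for brevity. The toy folder copies the span of every node it replaces. So after “(1 + 2) + 1 / @X” has become “3 + 1 / 0”, the division can still be reported at its original position.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TextSpanAnnotation:
    start: int
    length: int


@dataclass(frozen=True, eq=False)
class Node:
    kind: str                                  # "num", "var", "+" or "/"
    value: object = None
    children: Tuple["Node", ...] = ()
    span: Optional[TextSpanAnnotation] = None  # position in the original text


def fold(node: Node, env: dict) -> Node:
    """Toy constant folder: every node it builds inherits the span of the
    node it replaces, so original positions survive the rewrite."""
    if node.kind == "var" and node.value in env:
        return Node("num", env[node.value], span=node.span)
    if node.kind in ("+", "/"):
        left, right = (fold(child, env) for child in node.children)
        if left.kind == right.kind == "num":
            if node.kind == "/" and right.value == 0:
                s = node.span
                raise ValueError(f"division by zero at {s.start}..{s.start + s.length}")
            result = left.value + right.value if node.kind == "+" else left.value / right.value
            return Node("num", result, span=node.span)
        return Node(node.kind, children=(left, right), span=node.span)
    return node


T = TextSpanAnnotation
# "(1 + 2) + 1 / @X", binary expressions annotated with their operator position.
tree = Node("+", span=T(8, 1), children=(
    Node("+", span=T(3, 1), children=(Node("num", 1, span=T(1, 1)), Node("num", 2, span=T(5, 1)))),
    Node("/", span=T(12, 1), children=(Node("num", 1, span=T(10, 1)), Node("var", "@X", span=T(14, 2)))),
))

try:
    fold(tree, {"@X": 0})
except ValueError as error:
    print(error)  # division by zero at 12..13
```

In the approach described in this post, the position would come from the TextSpanAnnotation stored through the annotation pool, not from a field on the node.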
Note that, unlike with Roslyn annotations, adding a new IUSqlAnnotation to a node that already has one does not generate a new node: the node already carries its key, so only the list stored in the pool changes.
This is probably better for performance, but it is sometimes less convenient.
In this post, you saw how we extend the Roslyn annotation logic using our USqlAnnotationPool class.
With it, we can now attach any information we want to our SyntaxTree nodes.
This information is preserved on the immutable syntax nodes across the different transformations we will apply to our tree, and you will see in future posts that we use this mechanism in many different contexts.