I would like to use ANTLR4 in Python to process SQL/PLSQL scripts and extract JOIN conditions.
To do so, I am first trying to understand how these are represented in the parse tree returned by PlSqlParser.sql_script.
I have a simple SQL:
SELECT ta.col1, tb.col5
FROM mytabA ta
JOIN mayTabB tb ON ta.col1 = tb.col2
WHERE ta.col3 = 'AXA';
I use the following Python script to process the SQL script:
from antlr4 import *
from antlr4.tree.Tree import TerminalNodeImpl
#from antlr4.tree.Tree import ParseTree
from antlr4.tree.Trees import Trees
from PlSqlLexer import PlSqlLexer
from PlSqlParser import PlSqlParser
from PlSqlParserListener import PlSqlParserListener

def handleTree(tree, lvl=0):
    for child in tree.getChildren():
        if isinstance(child, TerminalNode):
            print(lvl*'│'+'└─', child)
        else:
            handleTree(child, lvl+1)

class KeyPrinter(PlSqlParserListener):
    def enterSelect_statement(self, ctx):
        handleTree(ctx, 0)

def main():
    with open( "myscript.sql" ) as file:
        filesrc = file.read()
    lexer = PlSqlLexer(InputStream(filesrc))
    tokens = CommonTokenStream(lexer)
    #tokens.fill()
    parser = PlSqlParser(CommonTokenStream(lexer))
    tree = parser.sql_script()
    printer = KeyPrinter()
    walker = ParseTreeWalker()
    walker.walk(printer, tree)

if __name__ == '__main__':
    main()
And the output is:
││││└─ SELECT
│││││││││││││││││││││└─ ta
│││││││││││││││││└─ .
││││││││││││││││││││└─ col1
│││││└─ ,
│││││││││││││││││││││└─ tb
│││││││││││││││││└─ .
││││││││││││││││││││└─ col5
│││││└─ FROM
││││││││││││││└─ mytabA
││││││││││││└─ ta
││││││││└─ JOIN
│││││││││││││││└─ mayTabB
│││││││││││││└─ tb
│││││││││└─ ON
││││││││││││││││││││││││││└─ ta
││││││││││││││││││││││└─ .
│││││││││││││││││││││││││└─ col1
││││││││││││││││└─ =
││││││││││││││││││││││││││└─ tb
││││││││││││││││││││││└─ .
│││││││││││││││││││││││││└─ col2
│││││└─ WHERE
││││││││││││││││││││││└─ ta
││││││││││││││││││└─ .
│││││││││││││││││││││└─ col3
││││││││││││└─ =
│││││││││││││││││││└─ 'AXA'
I changed the handleTree function to also print each child, to try and understand what information can be used to extract what I am after:
def handleTree(tree, lvl=0):
    for child in tree.getChildren():
        print ( child )
        if isinstance(child, TerminalNode):
            print(lvl*'│'+'└─', child)
        else:
            handleTree(child, lvl+1)
The output now is:
[16068 15857 2535 2384]
[16066 16068 15857 2535 2384]
[16256 16066 16068 15857 2535 2384]
[16263 16256 16066 16068 15857 2535 2384]
SELECT
││││└─ SELECT
[16284 16263 16256 16066 16068 15857 2535 2384]
[16309 16284 16263 16256 16066 16068 15857 2535 2384]
[16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[17472 17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[2342 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[19724 2342 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[19747 19724 2342 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[20209 19747 19724 2342 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
ta
│││││││││││││││││││││└─ ta
.
│││││││││││││││││└─ .
[19733 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[19747 19733 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
[20209 19747 19733 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
col1
││││││││││││││││││││└─ col1
,
│││││└─ ,
[16311 16284 16263 16256 16066 16068 15857 2535 2384]
[16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[17472 17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[2342 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[19724 2342 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[19747 19724 2342 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[20209 19747 19724 2342 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
tb
│││││││││││││││││││││└─ tb
.
│││││││││││││││││└─ .
[19733 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[19747 19733 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
[20209 19747 19733 17669 17555 17472 17409 17350 17339 17318 17280 17264 17255 16326 16311 16284 16263 16256 16066 16068 15857 2535 2384]
col5
││││││││││││││││││││└─ col5
[16288 16263 16256 16066 16068 15857 2535 2384]
FROM
│││││└─ FROM
[16320 16288 16263 16256 16066 16068 15857 2535 2384]
[16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[16351 16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[16361 16351 16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17152 16361 16351 16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[19376 17152 16361 16351 16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20207 19376 17152 16361 16351 16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20209 20207 19376 17152 16361 16351 16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
mytabA
││││││││││││││└─ mytabA
[16358 16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[19174 16358 16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20207 19174 16358 16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20209 20207 19174 16358 16340 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
ta
││││││││││││└─ ta
[16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
JOIN
││││││││└─ JOIN
[16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[16351 16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[16361 16351 16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17152 16361 16351 16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[19376 17152 16361 16351 16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20207 19376 17152 16361 16351 16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20209 20207 19376 17152 16361 16351 16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
mayTabB
│││││││││││││││└─ mayTabB
[16358 16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[19174 16358 16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20207 19174 16358 16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20209 20207 19174 16358 16397 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
tb
│││││││││││││└─ tb
[16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
ON
│││││││││└─ ON
[16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[2342 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[19724 2342 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[19747 19724 2342 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20209 19747 19724 2342 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
ta
││││││││││││││││││││││││││└─ ta
.
││││││││││││││││││││││└─ .
[19733 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[19747 19733 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20209 19747 19733 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
col1
│││││││││││││││││││││││││└─ col1
[17342 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
=
││││││││││││││││└─ =
[17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[17669 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[2342 17669 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[19724 2342 17669 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[19747 19724 2342 17669 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20209 19747 19724 2342 17669 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
tb
││││││││││││││││││││││││││└─ tb
.
││││││││││││││││││││││└─ .
[19733 17669 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[19747 19733 17669 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
[20209 19747 19733 17669 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 16414 16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
col2
│││││││││││││││││││││││││└─ col2
[16289 16263 16256 16066 16068 15857 2535 2384]
WHERE
│││││└─ WHERE
[19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[2342 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[19724 2342 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[19747 19724 2342 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[20209 19747 19724 2342 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
ta
││││││││││││││││││││││└─ ta
.
││││││││││││││││││└─ .
[19733 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[19747 19733 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[20209 19747 19733 17669 17555 17472 17409 17350 17339 2070 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
col3
│││││││││││││││││││││└─ col3
[17342 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
=
││││││││││││└─ =
[17343 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17339 17343 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17350 17339 17343 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17409 17350 17339 17343 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[17667 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
[20185 17667 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
'AXA'
│││││││││││││││││││└─ 'AXA'
It seems what I need to do is keep track of the child array values and identify which one(s) signal a table name, column name, JOIN, etc.
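A possible alternative to tracking those bracketed numbers, sketched here under assumptions: as far as I can tell they are parser-internal state numbers for the context and its ancestors, not node types. Generated rule contexts expose getRuleIndex(), and the generated parser carries a ruleNames list, so each non-terminal can be labelled by its grammar rule instead. The FakeToken and FakeRule classes and the two rule names below are hypothetical stand-ins so the snippet runs without the generated PlSql files; with the real parser one would pass the parse tree and parser.ruleNames.

```python
class FakeToken:
    """Hypothetical stand-in for a terminal node (a leaf token)."""
    def __init__(self, text):
        self.text = text

    def getChildCount(self):
        return 0

    def getText(self):
        return self.text


class FakeRule:
    """Hypothetical stand-in for a generated rule context."""
    def __init__(self, rule_index, children):
        self.rule_index = rule_index
        self.children = children

    def getRuleIndex(self):
        return self.rule_index

    def getChildCount(self):
        return len(self.children)

    def getChild(self, i):
        return self.children[i]


def describe_tree(node, rule_names, lvl=0, out=None):
    """Collect one indented line per node: rule name for contexts, quoted text for tokens."""
    out = [] if out is None else out
    if hasattr(node, "getRuleIndex"):
        # Rule context: translate its numeric rule index into the grammar rule name.
        out.append("  " * lvl + rule_names[node.getRuleIndex()])
        for i in range(node.getChildCount()):
            describe_tree(node.getChild(i), rule_names, lvl + 1, out)
    else:
        # Terminal node: show the token text.
        out.append("  " * lvl + repr(node.getText()))
    return out


# Illustrative rule names only; the real list would be parser.ruleNames.
RULE_NAMES = ["join_clause", "join_on_part"]
demo = FakeRule(0, [FakeToken("JOIN"), FakeRule(1, [FakeToken("ON")])])
print("\n".join(describe_tree(demo, RULE_NAMES)))
```

The helper relies only on getRuleIndex(), getChildCount(), getChild() and getText(), so it should apply unchanged to the real tree, though that is untested here.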
Is using Antlr meant to be this complicated?
Is there a way to 'query' the tree structure for particular nodes, and if so what query to get this specific info?
Is there a way to print out the unique node types, such that we can figure out which ones to search for?