Channel: Recent Questions - Stack Overflow

Extract JOIN conditions from SQL using Antlr4 and Python

I would like to use ANTLR4 in Python to process SQL and PL/SQL scripts and extract JOIN conditions.

To do so, I am first trying to understand how these conditions are represented in the parse tree returned by PlSqlParser.sql_script.

I have a simple SQL query:

    SELECT ta.col1, tb.col5
    FROM   mytabA ta
    JOIN  mayTabB tb ON ta.col1 = tb.col2
    WHERE ta.col3 = 'AXA';

I use the following Python script to parse it:

    from antlr4 import CommonTokenStream, InputStream, ParseTreeWalker
    from antlr4.tree.Tree import TerminalNode
    from PlSqlLexer import PlSqlLexer
    from PlSqlParser import PlSqlParser
    from PlSqlParserListener import PlSqlParserListener


    def handleTree(tree, lvl=0):
        # Recursively print every token, indented by its depth below the start node
        for child in tree.getChildren():
            if isinstance(child, TerminalNode):
                print(lvl*'│'+'└─', child)
            else:
                handleTree(child, lvl+1)


    class KeyPrinter(PlSqlParserListener):
        # Called once for each select_statement node during the tree walk
        def enterSelect_statement(self, ctx):
            handleTree(ctx, 0)


    def main():
        with open("myscript.sql") as file:
            filesrc = file.read()
        lexer = PlSqlLexer(InputStream(filesrc))
        tokens = CommonTokenStream(lexer)
        parser = PlSqlParser(tokens)  # reuse the token stream created above
        tree = parser.sql_script()
        printer = KeyPrinter()
        walker = ParseTreeWalker()
        walker.walk(printer, tree)


    if __name__ == '__main__':
        main()

And the output is:

    ││││└─ SELECT
    │││││││││││││││││││││└─ ta
    │││││││││││││││││└─ .
    ││││││││││││││││││││└─ col1
    │││││└─ ,
    │││││││││││││││││││││└─ tb
    │││││││││││││││││└─ .
    ││││││││││││││││││││└─ col5
    │││││└─ FROM
    ││││││││││││││└─ mytabA
    ││││││││││││└─ ta
    ││││││││└─ JOIN
    │││││││││││││││└─ mayTabB
    │││││││││││││└─ tb
    │││││││││└─ ON
    ││││││││││││││││││││││││││└─ ta
    ││││││││││││││││││││││└─ .
    │││││││││││││││││││││││││└─ col1
    ││││││││││││││││└─ =
    ││││││││││││││││││││││││││└─ tb
    ││││││││││││││││││││││└─ .
    │││││││││││││││││││││││││└─ col2
    │││││└─ WHERE
    ││││││││││││││││││││││└─ ta
    ││││││││││││││││││└─ .
    │││││││││││││││││││││└─ col3
    ││││││││││││└─ =
    │││││││││││││││││││└─ 'AXA'

I then changed the handleTree function to also print each child, to try to understand what information I could use to extract what I am after:

    def handleTree(tree, lvl=0):
        for child in tree.getChildren():
            print(child)
            if isinstance(child, TerminalNode):
                print(lvl*'│'+'└─', child)
            else:
                handleTree(child, lvl+1)

The output now is:

    [16068 15857 2535 2384]
    [16066 16068 15857 2535 2384]
    [16256 16066 16068 15857 2535 2384]
    [16263 16256 16066 16068 15857 2535 2384]
    SELECT
    ││││└─ SELECT
    [16284 16263 16256 16066 16068 15857 2535 2384]
    [16309 16284 16263 16256 16066 16068 15857 2535 2384]
    [16326 16309 16284 16263 16256 16066 16068 15857 2535 2384]
    ...
    [16401 16341 16332 16320 16288 16263 16256 16066 16068 15857 2535 2384]
    ON
    │││││││││└─ ON
    ...
    [17667 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
    [20185 17667 17555 17472 17409 17350 17339 17343 17318 17280 17264 17255 17217 19182 16289 16263 16256 16066 16068 15857 2535 2384]
    'AXA'
    │││││││││││││││││││└─ 'AXA'
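The bracketed lists appear to come from calling `str()` on a rule context, which is what `print(child)` does for non-terminal children: in the Python runtime this prints the invoking ATN states from the node up to the root, not a node type, so each list grows by one number per level. Passing rule names should make the output readable, e.g. `child.toString(parser.ruleNames, None)` for one node or `tree.toStringTree(recog=parser)` for a LISP-style dump of the whole tree. The sketch below builds an indented view by hand, mapping `getRuleIndex()` through a list of rule names. The `Rule` and `Terminal` classes and the three rule names are hypothetical stand-ins so the example runs without a generated parser; with a real tree you would pass `PlSqlParser.ruleNames`, and the real PL/SQL tree has many more intermediate levels.

```python
class Terminal:
    """Hypothetical stand-in for ANTLR's TerminalNode: one token's text."""

    def __init__(self, text):
        self.text = text

    def getText(self):
        return self.text


class Rule:
    """Hypothetical stand-in for a generated ParserRuleContext."""

    def __init__(self, rule_index, *children):
        self.rule_index = rule_index
        self.children = list(children)

    def getRuleIndex(self):
        return self.rule_index

    def getChildren(self):
        return iter(self.children)


def format_tree(node, rule_names, depth=0):
    """Return indented lines: rule names for inner nodes, quoted token text for leaves."""
    pad = "  " * depth
    if hasattr(node, "getRuleIndex"):  # rule node; terminals have no rule index
        lines = [pad + rule_names[node.getRuleIndex()]]
        for child in node.getChildren():
            lines.extend(format_tree(child, rule_names, depth + 1))
        return lines
    return [pad + repr(node.getText())]


# Simplified, hypothetical rule names; a generated parser exposes the real list as ruleNames
RULE_NAMES = ["join_on_part", "condition", "column_ref"]
JOIN_ON_PART, CONDITION, COLUMN_REF = range(3)

# Hand-built tree for: ON ta.col1 = tb.col2
tree = Rule(JOIN_ON_PART,
            Terminal("ON"),
            Rule(CONDITION,
                 Rule(COLUMN_REF, Terminal("ta"), Terminal("."), Terminal("col1")),
                 Terminal("="),
                 Rule(COLUMN_REF, Terminal("tb"), Terminal("."), Terminal("col2"))))

if __name__ == "__main__":
    print("\n".join(format_tree(tree, RULE_NAMES)))
```

Printed, this gives `join_on_part` at the top, with `'ON'` and `condition` indented under it, and so on down to the column tokens.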

It seems that what I need to do is keep track of these number lists and work out which of them signal a table name, a column name, a JOIN, etc.
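Tracking those numbers should not be necessary: every rule context knows its rule index, so nodes can be found by rule name and their text read directly. Below is a minimal sketch with the same kind of hypothetical stand-ins (`Rule`, `Terminal`, simplified rule names and a hand-built tree for the query above). It assumes the ON clause is parsed as a `join_on_part` rule wrapping a `condition` rule, which should be checked against `PlSqlParser.ruleNames` for the grammar version in use. In a listener, the equivalent hook would be an `enterJoin_on_part` method next to `enterSelect_statement`.

```python
class Terminal:
    """Hypothetical stand-in for ANTLR's TerminalNode: one token's text."""

    def __init__(self, text):
        self.text = text

    def getText(self):
        return self.text


class Rule:
    """Hypothetical stand-in for a generated ParserRuleContext."""

    def __init__(self, rule_index, *children):
        self.rule_index = rule_index
        self.children = list(children)

    def getRuleIndex(self):
        return self.rule_index

    def getChildren(self):
        return iter(self.children)

    def getText(self):
        # Like ANTLR: concatenated child text, no whitespace
        return "".join(child.getText() for child in self.children)


def find_rules(node, rule_names, wanted):
    """Yield every rule node named `wanted`, depth-first in source order."""
    if not hasattr(node, "getRuleIndex"):  # terminals have no rule index
        return
    if rule_names[node.getRuleIndex()] == wanted:
        yield node
    for child in node.getChildren():
        yield from find_rules(child, rule_names, wanted)


def join_conditions(tree, rule_names):
    """Text of the `condition` directly under each `join_on_part` node."""
    return [child.getText()
            for on_part in find_rules(tree, rule_names, "join_on_part")
            for child in on_part.getChildren()
            if hasattr(child, "getRuleIndex")
            and rule_names[child.getRuleIndex()] == "condition"]


RULE_NAMES = ["select_statement", "join_on_part", "where_clause", "condition", "column_ref"]
SELECT_STATEMENT, JOIN_ON_PART, WHERE_CLAUSE, CONDITION, COLUMN_REF = range(5)


def col(table, column):
    return Rule(COLUMN_REF, Terminal(table), Terminal("."), Terminal(column))


# Hand-built, flattened tree for the query in the question
tree = Rule(SELECT_STATEMENT,
            Terminal("SELECT"), col("ta", "col1"), Terminal(","), col("tb", "col5"),
            Terminal("FROM"), Terminal("mytabA"), Terminal("ta"),
            Terminal("JOIN"), Terminal("mayTabB"), Terminal("tb"),
            Rule(JOIN_ON_PART, Terminal("ON"),
                 Rule(CONDITION, col("ta", "col1"), Terminal("="), col("tb", "col2"))),
            Rule(WHERE_CLAUSE, Terminal("WHERE"),
                 Rule(CONDITION, col("ta", "col3"), Terminal("="), Terminal("'AXA'"))))

if __name__ == "__main__":
    print(join_conditions(tree, RULE_NAMES))
```

Because whitespace tokens are not part of the parse tree, `getText()` returns `ta.col1=tb.col2` without spaces; on a real context, `ctx.start.getInputStream().getText(ctx.start.start, ctx.stop.stop)` should return the original source text instead.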

Is using ANTLR meant to be this complicated?

Is there a way to query the tree for particular nodes, and if so, what query would return this specific information?
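For querying, the Python runtime should also include an XPath-style helper, `antlr4.xpath.XPath.XPath.findAll(tree, path, parser)`, which is worth trying first. To illustrate the idea without the runtime, here is a much smaller hand-rolled version: `//join_on_part/condition` matches every `join_on_part` anywhere in the tree, then steps to its direct `condition` children. The `query` helper, the stand-in classes and the rule names are hypothetical, and only rule names with a leading `//` are supported.

```python
class Terminal:
    """Hypothetical stand-in for ANTLR's TerminalNode: one token's text."""

    def __init__(self, text):
        self.text = text

    def getText(self):
        return self.text


class Rule:
    """Hypothetical stand-in for a generated ParserRuleContext."""

    def __init__(self, rule_index, *children):
        self.rule_index = rule_index
        self.children = list(children)

    def getRuleIndex(self):
        return self.rule_index

    def getChildren(self):
        return iter(self.children)

    def getText(self):
        # Like ANTLR: concatenated child text, no whitespace
        return "".join(child.getText() for child in self.children)


def query(tree, rule_names, path):
    """Tiny XPath-like lookup: '//a/b' = every `a` anywhere, then its direct `b` children."""
    steps = path.lstrip("/").split("/")

    def name(node):
        # Terminals have no rule index, so they never match a step
        return rule_names[node.getRuleIndex()] if hasattr(node, "getRuleIndex") else None

    def walk(node):
        # Pre-order traversal: the node itself, then its descendants
        yield node
        if hasattr(node, "getRuleIndex"):
            for child in node.getChildren():
                yield from walk(child)

    matches = [node for node in walk(tree) if name(node) == steps[0]]
    for step in steps[1:]:
        matches = [child for m in matches for child in m.getChildren() if name(child) == step]
    return matches


RULE_NAMES = ["select_statement", "join_on_part", "where_clause", "condition", "column_ref"]
SELECT_STATEMENT, JOIN_ON_PART, WHERE_CLAUSE, CONDITION, COLUMN_REF = range(5)


def col(table, column):
    return Rule(COLUMN_REF, Terminal(table), Terminal("."), Terminal(column))


# Hand-built tree for the ON and WHERE parts of the query (other tokens omitted)
tree = Rule(SELECT_STATEMENT,
            Rule(JOIN_ON_PART, Terminal("ON"),
                 Rule(CONDITION, col("ta", "col1"), Terminal("="), col("tb", "col2"))),
            Rule(WHERE_CLAUSE, Terminal("WHERE"),
                 Rule(CONDITION, col("ta", "col3"), Terminal("="), Terminal("'AXA'"))))

if __name__ == "__main__":
    for node in query(tree, RULE_NAMES, "//join_on_part/condition"):
        print(node.getText())
```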

Is there a way to print the distinct node types, so that I can figure out which ones to search for?
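The generated parser lists every rule of the grammar in `PlSqlParser.ruleNames`, so printing that list shows all possible node types. To see only the types that occur in a given tree, a sketch like the one below counts rule names during a walk. `rule_counts`, the stand-in classes and the rule names are hypothetical; with the real runtime you would call it on the tree from `parser.sql_script()` together with `parser.ruleNames`.

```python
from collections import Counter


class Terminal:
    """Hypothetical stand-in for ANTLR's TerminalNode (only needs to exist here)."""

    def __init__(self, text):
        self.text = text


class Rule:
    """Hypothetical stand-in for a generated ParserRuleContext."""

    def __init__(self, rule_index, *children):
        self.rule_index = rule_index
        self.children = list(children)

    def getRuleIndex(self):
        return self.rule_index

    def getChildren(self):
        return iter(self.children)


def rule_counts(node, rule_names, counts=None):
    """Count the nodes of each rule type in the tree; terminals are skipped."""
    if counts is None:
        counts = Counter()
    if hasattr(node, "getRuleIndex"):
        counts[rule_names[node.getRuleIndex()]] += 1
        for child in node.getChildren():
            rule_counts(child, rule_names, counts)
    return counts


RULE_NAMES = ["select_statement", "join_on_part", "where_clause", "condition", "column_ref"]
SELECT_STATEMENT, JOIN_ON_PART, WHERE_CLAUSE, CONDITION, COLUMN_REF = range(5)


def col(table, column):
    return Rule(COLUMN_REF, Terminal(table), Terminal("."), Terminal(column))


# Hand-built tree for the ON and WHERE parts of the query (other tokens omitted)
tree = Rule(SELECT_STATEMENT,
            Rule(JOIN_ON_PART, Terminal("ON"),
                 Rule(CONDITION, col("ta", "col1"), Terminal("="), col("tb", "col2"))),
            Rule(WHERE_CLAUSE, Terminal("WHERE"),
                 Rule(CONDITION, col("ta", "col3"), Terminal("="), Terminal("'AXA'"))))

if __name__ == "__main__":
    for rule, n in rule_counts(tree, RULE_NAMES).most_common():
        print(f"{n:3} {rule}")
```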

