
Spark with Hive: dropping a table and recreating one with the same name fails because the table's files still exist in the local spark-warehouse directory


I'm new to Spark. I learned that Spark can use Hive's metastore, so I tried to set that up: I copied hive-site.xml, core-site.xml, and hdfs-site.xml into Spark's conf directory, and put the mysql-connector JAR into Spark's jars directory. After that I could see the existing Hive databases and tables through Spark SQL, and I could query the data in Hive. Then I created a database, created a table, and loaded some data.

When I look at HDFS, there is no trace of the new database, table, or data, but the spark-warehouse directory under SPARK_HOME contains them. My first question is: why is the data stored locally instead of in HDFS, even though I have configured Hive? Second, when I drop the table and then recreate one with the same name, Spark complains that the table's files already exist locally, so the drop operation apparently does not delete the local data. I'll add that after the drop, SHOW TABLES really does not list the dropped table anymore.
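For context, the metastore settings in the hive-site.xml I copied into Spark's conf directory look roughly like this (the host and warehouse path match the values used in my code below; treat the exact file contents as an assumption):

    <configuration>
      <!-- Remote Hive metastore service -->
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node1:9083</value>
      </property>
      <!-- Warehouse root on HDFS used by Hive -->
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
      </property>
    </configuration>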

Here is the relevant snippet of my code:

    from pyspark.sql import SparkSession

    _SPARK_HOST = "local[3]"
    _APP_NAME = "test"

    # Note: 9870 is normally the NameNode web UI port in Hadoop 3;
    # hdfs:// URIs usually point at the RPC port (often 8020).
    spark = SparkSession.builder \
        .master(_SPARK_HOST) \
        .appName(_APP_NAME) \
        .config("spark.sql.shuffle.partitions", "4") \
        .config("spark.sql.warehouse.dir", "hdfs://node1:9870/user/hive/warehouse") \
        .config("hive.metastore.uris", "thrift://node1:9083") \
        .enableHiveSupport() \
        .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    spark.sql("show databases").show()
    spark.sql("use sparkhive").show()
    spark.sql("show tables").show()

    # getOrCreate() returns the session created above, so the configs passed
    # here (including the different warehouse dir) are silently ignored.
    spark = SparkSession.builder.master(_SPARK_HOST).appName(_APP_NAME) \
        .config("spark.sql.shuffle.partitions", "4") \
        .config("spark.sql.warehouse.dir", "hdfs://192.168.150.102:9870/user/hive/warehouse") \
        .config("hive.metastore.uris", "thrift://192.168.150.102:9083") \
        .enableHiveSupport().getOrCreate()

    spark.sql("LOAD DATA LOCAL INPATH '/export/pyworkspace/pyspark_sparksql_chapter3/data/hive/student.csv' INTO TABLE person")
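To see which warehouse location the session actually resolves, and where the table's files really live, I run this quick check (a sketch; sparkhive.person is the table from the snippet above):

    # Effective warehouse root the session resolved at startup
    print(spark.conf.get("spark.sql.warehouse.dir"))
    # The "Location" row shows where the table's files actually live
    spark.sql("DESCRIBE FORMATTED sparkhive.person").show(50, truncate=False)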

The error message is:

SparkRuntimeException: [LOCATION_ALREADY_EXISTS] Cannot name the managed table as `spark_catalog`.`sparkhive`.`person`, as its associated location 'file:/home/spark-3.5.1-bin-hadoop3/spark-warehouse/sparkhive.db/person' already exists. Please pick a different table name, or remove the existing location first.
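As a temporary workaround I can remove the orphaned directory by hand before recreating the table (a sketch; the path is copied verbatim from the error above):

    import shutil

    # Leftover location of the dropped managed table, taken from the error message
    leftover = "/home/spark-3.5.1-bin-hadoop3/spark-warehouse/sparkhive.db/person"
    shutil.rmtree(leftover, ignore_errors=True)
    # After this, recreating sparkhive.person no longer hits LOCATION_ALREADY_EXISTS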

I want to address this issue properly rather than cleaning up files by hand every time. How can I do that?

