This blog walks readers through the prerequisites and steps for reading data from a Hive table and writing it to a Microsoft SQL Server table using Spark.
First, create the source database and table in Hive (note the use statement, so the table is created inside organization_db rather than the default database):

create database organization_db;

use organization_db;

create table employee(id int, name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
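Before loading any data, you can sanity-check the definition from a spark-shell session (the same kind of session used later in this post). A minimal check, assuming the shell has Hive support enabled:

// Confirm Spark can see the new Hive table and its columns
spark.sql("show tables in organization_db").show()
spark.sql("describe organization_db.employee").show()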
Next, load sample data by copying a CSV file directly into the table's HDFS location and confirming it is there:

[~]$ cat emp.csv
1,rohan
2,amar
3,jeet
[~]$ hadoop fs -put emp.csv hdfs://hdp314/warehouse/tablespace/managed/hive/organization_db.db/employee
[~]$ hadoop fs -cat hdfs://hdp314/warehouse/tablespace/managed/hive/organization_db.db/employee/emp.csv
1,rohan
2,amar
3,jeet
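At this point it is worth confirming that Hive actually exposes the file's contents as table rows, not just that the file landed in HDFS. A quick check from spark-shell (any Hive-aware client would do):

// The three CSV rows should now be visible through the table
spark.sql("select count(*) from organization_db.employee").show()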
On the SQL Server side, connect with mssql-cli and create the target database, schema, and table:

mssql-cli -U <username> -P <password> -S <sql server ip/hostname> -d <db_name>

db_name> create database orgdb;
Time: 1.005s (a second)
Commands completed successfully.
db_name> use orgdb;
Time: 0.502s
Commands completed successfully.
orgdb> create schema org;
Time: 0.453s
Commands completed successfully.
orgdb> create table org.employee(id int primary key, name varchar(1024));
Time: 0.453s
Commands completed successfully.
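Before running the Spark job, it can save debugging time to confirm that the edge node can reach SQL Server over JDBC at all. A minimal connectivity smoke test in Scala, assuming the driver jar is on the classpath; the host and credentials below are the same placeholders used in the rest of this post:

import java.sql.DriverManager

// Plain JDBC smoke test against the target database
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")
val conn = DriverManager.getConnection(
  "jdbc:sqlserver://<sql-server-ip>:1433;databaseName=orgdb", "sa", "xxxxx")
println(conn.getMetaData.getDatabaseProductName) // expect: Microsoft SQL Server
conn.close()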
With both sides in place, the Spark code itself is short: read the Hive table into a DataFrame, then write it out through the JDBC data source:

// Read the Hive table into a DataFrame and inspect it
val hiveDf = spark.sql("select * from organization_db.employee")
hiveDf.show()

// Configure a JDBC writer that overwrites the target table
val df = hiveDf.write.format("jdbc").mode(org.apache.spark.sql.SaveMode.Overwrite)

// Point the writer at SQL Server and save
df.option("url", "jdbc:sqlserver://<sql-server-ip>:1433;databaseName=orgdb")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("dbtable", "org.employee")
  .option("user", "sa")
  .option("password", "xxxxx")
  .save()
To run the snippet, launch spark-shell with the SQL Server JDBC driver jar passed via --jars:

spark-shell --master yarn --jars /<path-to-driver>/sqljdbc4-2.0.jar
[~]$ spark-shell --master yarn --jars <driver-path>/sqljdbc4-2.0.jar
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at https://xxxxx:4041
Spark context available as 'sc' (master = yarn, app id = application_1630517835182_0012).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.2.3.1.4.0-315
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_271)

scala> val hiveDf = spark.sql("select * from organization_db.employee")
hiveDf: org.apache.spark.sql.DataFrame = [id: int, name: string]

scala> hiveDf.show()
+---+-----+
| id| name|
+---+-----+
|  1|rohan|
|  2| amar|
|  3| jeet|
+---+-----+

scala> val df = hiveDf.write.format("jdbc").mode(org.apache.spark.sql.SaveMode.Overwrite)
df: org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row] = org.apache.spark.sql.DataFrameWriter@21838ea3

scala> df.option("url","jdbc:sqlserver://xxxxx:1433;databaseName=orgdb").option("driver","com.microsoft.sqlserver.jdbc.SQLServerDriver").option("dbtable", "org.employee").option("user", "sa").option("password","xxxx").save()
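One caveat worth noting: with SaveMode.Overwrite, Spark's JDBC writer drops and recreates the target table by default, so the primary key created earlier on org.employee is lost on each run. If the existing table definition should be kept and only the rows replaced, the writer's truncate option does that; a sketch with the same placeholder connection details:

// Overwrite the rows but keep the existing table definition
// (Spark issues TRUNCATE TABLE instead of DROP + CREATE)
hiveDf.write.format("jdbc")
  .mode(org.apache.spark.sql.SaveMode.Overwrite)
  .option("truncate", "true")
  .option("url", "jdbc:sqlserver://<sql-server-ip>:1433;databaseName=orgdb")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("dbtable", "org.employee")
  .option("user", "sa")
  .option("password", "xxxxx")
  .save()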
Finally, verify from mssql-cli that the rows landed in the SQL Server table:

[amar@crrt-edge ~]$ mssql-cli -U <user_name> -P <password> -S <sql server ip/hostname> -d orgdb
orgdb> select * from org."employee";
Time: 4.674s (4 seconds)
+------+--------+
| id   | name   |
|------+--------|
| 3    | jeet   |
| 1    | rohan  |
| 2    | amar   |
+------+--------+
(3 rows affected)
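The round trip can also be verified from the Spark side by reading the table back over JDBC; a minimal sketch, again with placeholder connection details:

// Read the freshly written SQL Server table back into a DataFrame
val sqlServerDf = spark.read.format("jdbc")
  .option("url", "jdbc:sqlserver://<sql-server-ip>:1433;databaseName=orgdb")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("dbtable", "org.employee")
  .option("user", "sa")
  .option("password", "xxxxx")
  .load()
sqlServerDf.show()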