This blog describes one method of integrating BigInsights with Netezza using Sqoop. I used the Netezza emulator and the BigInsights QSE VM as the reference environment for this article.

1. Prerequisites

1.1 Software

a) Netezza Emulator
b) BigInsights QSE VM Image

1.2 Hardware requirements

a) Workstation with at least 12 GB RAM and 4 cores.

1.3 Business prerequisites

a) Readers of this article are expected to be familiar with database concepts and terminology, DB2 commands, the Netezza nzsql client, and BigInsights components.

2. Steps for importing tables using Sqoop

a) Start the Netezza emulator. Once it is up and running, you should see an IP address in the Host VM field of the emulator window.

b) Start the QSE VM image, then run the Start BigInsights icon on the desktop to start all the services.

c) Copy the JDBC driver from the Netezza emulator to the Sqoop lib directory in the QSE image using the following command:

biadmin@bivm:/opt/ibm/biginsights/sqoop/lib> scp nz@192.168.10.131:/nz/kit/sbin/nzjdbc3.jar .
# replace the IP address with the one in your environment
# the default password for the nz user is nz
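Before moving on, it is worth confirming that the driver actually landed in Sqoop's lib directory and is a readable jar. A minimal check, assuming the default QSE-image path used in this article:

```shell
# Check that nzjdbc3.jar was copied into Sqoop's lib directory.
# The path is the QSE-image default used in this article; adjust if yours differs.
JAR=/opt/ibm/biginsights/sqoop/lib/nzjdbc3.jar
if [ -f "$JAR" ]; then
    # Listing the archive contents confirms the jar is not truncated or corrupt.
    unzip -l "$JAR" | head
else
    echo "driver not found at $JAR - repeat the scp step above"
fi
```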

d) Now import a Netezza table into the BigInsights QSE image as a Hive table. The following command imports the ‘temp’ table from Netezza; the JDBC username and password are admin/password.

biadmin@bivm:/opt/ibm/biginsights/sqoop/bin> ./sqoop import --driver org.netezza.Driver --connect jdbc:netezza://192.168.10.131/system --username admin --password password --table temp --hive-import  --hive-home $BIGINSIGHTS_HOME/hive --create-hive-table --hive-table temp --hive-overwrite -m 1
15/02/25 01:15:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/02/25 01:15:33 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
15/02/25 01:15:33 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
15/02/25 01:15:33 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
15/02/25 01:15:33 INFO manager.SqlManager: Using default fetchSize of 1000
15/02/25 01:15:33 INFO tool.CodeGenTool: Beginning code generation
15/02/25 01:15:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM temp AS t WHERE 1=0
15/02/25 01:15:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM temp AS t WHERE 1=0
15/02/25 01:15:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/ibm/biginsights/IHC
15/02/25 01:15:34 INFO orm.CompilationManager: Found hadoop core jar at: /opt/ibm/biginsights/IHC/hadoop-core.jar
Note: /tmp/sqoop-biadmin/compile/020bb11bbc599637ccb7549518f15550/temp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/02/25 01:15:35 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-biadmin/compile/020bb11bbc599637ccb7549518f15550/temp.jar
15/02/25 01:15:35 INFO mapreduce.ImportJobBase: Beginning import of temp
15/02/25 01:15:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM temp AS t WHERE 1=0
15/02/25 01:15:38 INFO mapred.JobClient: Running job: job_201502250021_0005
15/02/25 01:15:39 INFO mapred.JobClient:  map 0% reduce 0%
15/02/25 01:15:50 INFO mapred.JobClient:  map 100% reduce 0%
15/02/25 01:15:51 INFO mapred.JobClient: Job complete: job_201502250021_0005
15/02/25 01:15:51 INFO mapred.JobClient: Counters: 18
15/02/25 01:15:51 INFO mapred.JobClient:   File System Counters
15/02/25 01:15:51 INFO mapred.JobClient:     FILE: BYTES_WRITTEN=190936
15/02/25 01:15:51 INFO mapred.JobClient:     HDFS: BYTES_READ=87
15/02/25 01:15:51 INFO mapred.JobClient:     HDFS: BYTES_WRITTEN=2
15/02/25 01:15:51 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.JobCounter
15/02/25 01:15:51 INFO mapred.JobClient:     TOTAL_LAUNCHED_MAPS=1
15/02/25 01:15:51 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7974
15/02/25 01:15:51 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
15/02/25 01:15:51 INFO mapred.JobClient:     FALLOW_SLOTS_MILLIS_MAPS=0
15/02/25 01:15:51 INFO mapred.JobClient:     FALLOW_SLOTS_MILLIS_REDUCES=0
15/02/25 01:15:51 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.TaskCounter
15/02/25 01:15:51 INFO mapred.JobClient:     MAP_INPUT_RECORDS=1
15/02/25 01:15:51 INFO mapred.JobClient:     MAP_OUTPUT_RECORDS=1
15/02/25 01:15:51 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
15/02/25 01:15:51 INFO mapred.JobClient:     SPILLED_RECORDS=0
15/02/25 01:15:51 INFO mapred.JobClient:     CPU_MILLISECONDS=280
15/02/25 01:15:51 INFO mapred.JobClient:     PHYSICAL_MEMORY_BYTES=158998528
15/02/25 01:15:51 INFO mapred.JobClient:     VIRTUAL_MEMORY_BYTES=1769664512
15/02/25 01:15:51 INFO mapred.JobClient:     COMMITTED_HEAP_BYTES=1048576000
15/02/25 01:15:51 INFO mapred.JobClient:   File Input Format Counters 
15/02/25 01:15:51 INFO mapred.JobClient:     Bytes Read=0
15/02/25 01:15:51 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.output.FileOutputFormat$Counter
15/02/25 01:15:51 INFO mapred.JobClient:     BYTES_WRITTEN=2
15/02/25 01:15:51 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 14.7424 seconds (0.1357 bytes/sec)
15/02/25 01:15:51 INFO mapreduce.ImportJobBase: Retrieved 1 records.
15/02/25 01:15:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM temp AS t WHERE 1=0
15/02/25 01:15:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM temp AS t WHERE 1=0
15/02/25 01:15:51 INFO hive.HiveImport: Removing temporary files from import process: hdfs://bivm.ibm.com:9000/user/biadmin/temp/_logs
15/02/25 01:15:51 INFO hive.HiveImport: Loading uploaded data into Hive
15/02/25 01:15:54 INFO hive.HiveImport: 
15/02/25 01:15:54 INFO hive.HiveImport: Logging initialized using configuration in file:/opt/ibm/biginsights/hive/conf/hive-log4j.properties
15/02/25 01:16:00 INFO hive.HiveImport: OK
15/02/25 01:16:00 INFO hive.HiveImport: Time taken: 4.775 seconds
15/02/25 01:16:01 INFO hive.HiveImport: Loading data to table default.temp
15/02/25 01:16:01 INFO hive.HiveImport: rmr: DEPRECATED: Please use 'rm -r' instead.
15/02/25 01:16:01 INFO hive.HiveImport: Deleted /biginsights/hive/warehouse/temp
15/02/25 01:16:01 INFO hive.HiveImport: OK
15/02/25 01:16:01 INFO hive.HiveImport: Time taken: 1.398 seconds
15/02/25 01:16:01 INFO hive.HiveImport: Hive import complete.
e) Verify that the ‘temp’ table was imported successfully by querying it from the Hive shell.

biadmin@bivm:/opt/ibm/biginsights/sqoop/bin> ../../hive/bin/hive

Logging initialized using configuration in file:/opt/ibm/biginsights/hive/conf/hive-log4j.properties

hive> show tables;
OK
temp
Time taken: 3.224 seconds, Fetched: 1 row(s)
hive> select * from temp;
OK
2
Time taken: 0.747 seconds, Fetched: 1 row(s)
hive> 
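The export step below assumes a Hive table named ‘test’ already exists in the warehouse; its creation is not shown in this article. One possible way to set up such a table (hypothetical column name and values, chosen to match the verification output later) is:

```shell
# Hypothetical setup for a one-column Hive table named 'test'.
# Older Hive releases, such as the one shipped with BigInsights, do not
# support INSERT ... VALUES, so rows are loaded from a local file instead.
printf '3\n2\n' > /tmp/test_rows.txt
if command -v hive >/dev/null 2>&1; then
    hive -e "CREATE TABLE IF NOT EXISTS test (x INT);
             LOAD DATA LOCAL INPATH '/tmp/test_rows.txt' OVERWRITE INTO TABLE test;"
else
    echo "hive not on PATH - run this on the QSE VM"
fi
```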

f) Now it is time to export a Hive table from BigInsights to Netezza using sqoop-export. The following command exports the ‘test’ table from BigInsights to Netezza; the JDBC username and password are admin/password.

biadmin@bivm:/opt/ibm/biginsights/sqoop/bin> ./sqoop-export --connect jdbc:netezza://192.168.10.131/system --username admin -password password --export-dir hdfs://bivm.ibm.com:9000/biginsights/hive/warehouse/test --table TEST  -m 1 
15/03/02 18:33:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/03/02 18:33:10 INFO manager.SqlManager: Using default fetchSize of 1000
15/03/02 18:33:10 INFO tool.CodeGenTool: Beginning code generation
15/03/02 18:33:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "TEST" AS t WHERE 1=0
15/03/02 18:33:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "TEST" AS t WHERE 1=0
15/03/02 18:33:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/ibm/biginsights/IHC
15/03/02 18:33:10 INFO orm.CompilationManager: Found hadoop core jar at: /opt/ibm/biginsights/IHC/hadoop-core.jar
Note: /tmp/sqoop-biadmin/compile/b8a7f29227416cbb49a476a1e53564bc/TEST.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/03/02 18:33:12 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-biadmin/compile/b8a7f29227416cbb49a476a1e53564bc/TEST.jar
15/03/02 18:33:12 WARN manager.NetezzaManager: It looks like you are exporting to Netezza.
15/03/02 18:33:12 WARN manager.NetezzaManager: This transfer can be faster! Use the --direct
15/03/02 18:33:12 WARN manager.NetezzaManager: option to exercise a Netezza-specific fast path.
15/03/02 18:33:12 WARN manager.NetezzaManager: It looks like you are exporting to Netezza in non-batch 
15/03/02 18:33:12 WARN manager.NetezzaManager: mode.  Still this transfer can be made faster! Use the 
15/03/02 18:33:12 WARN manager.NetezzaManager: --batch option to exercise a Netezza-specific fast path.
15/03/02 18:33:12 WARN manager.NetezzaManager: Forcing records per statement to 1 in non batch mode
15/03/02 18:33:12 INFO mapreduce.ExportJobBase: Beginning export of TEST
15/03/02 18:33:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "TEST" AS t WHERE 1=0
15/03/02 18:33:13 INFO input.FileInputFormat: Total input paths to process : 1
15/03/02 18:33:13 INFO input.FileInputFormat: Total input paths to process : 1
15/03/02 18:33:14 INFO mapred.JobClient: Running job: job_201503012223_0010
15/03/02 18:33:15 INFO mapred.JobClient:  map 0% reduce 0%
15/03/02 18:33:24 INFO mapred.JobClient:  map 100% reduce 0%
15/03/02 18:33:24 INFO mapred.JobClient: Job complete: job_201503012223_0010
15/03/02 18:33:24 INFO mapred.JobClient: Counters: 18
15/03/02 18:33:24 INFO mapred.JobClient:   File System Counters
15/03/02 18:33:24 INFO mapred.JobClient:     FILE: BYTES_WRITTEN=191004
15/03/02 18:33:24 INFO mapred.JobClient:     HDFS: BYTES_READ=129
15/03/02 18:33:24 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.JobCounter
15/03/02 18:33:24 INFO mapred.JobClient:     TOTAL_LAUNCHED_MAPS=1
15/03/02 18:33:24 INFO mapred.JobClient:     DATA_LOCAL_MAPS=1
15/03/02 18:33:24 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3834
15/03/02 18:33:24 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
15/03/02 18:33:24 INFO mapred.JobClient:     FALLOW_SLOTS_MILLIS_MAPS=0
15/03/02 18:33:24 INFO mapred.JobClient:     FALLOW_SLOTS_MILLIS_REDUCES=0
15/03/02 18:33:24 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.TaskCounter
15/03/02 18:33:24 INFO mapred.JobClient:     MAP_INPUT_RECORDS=2
15/03/02 18:33:24 INFO mapred.JobClient:     MAP_OUTPUT_RECORDS=2
15/03/02 18:33:24 INFO mapred.JobClient:     SPLIT_RAW_BYTES=122
15/03/02 18:33:24 INFO mapred.JobClient:     SPILLED_RECORDS=0
15/03/02 18:33:24 INFO mapred.JobClient:     CPU_MILLISECONDS=270
15/03/02 18:33:24 INFO mapred.JobClient:     PHYSICAL_MEMORY_BYTES=150945792
15/03/02 18:33:24 INFO mapred.JobClient:     VIRTUAL_MEMORY_BYTES=1757937664
15/03/02 18:33:24 INFO mapred.JobClient:     COMMITTED_HEAP_BYTES=1048576000
15/03/02 18:33:24 INFO mapred.JobClient:   File Input Format Counters 
15/03/02 18:33:24 INFO mapred.JobClient:     Bytes Read=0
15/03/02 18:33:24 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.output.FileOutputFormat$Counter
15/03/02 18:33:24 INFO mapred.JobClient:     BYTES_WRITTEN=0
15/03/02 18:33:24 INFO mapreduce.ExportJobBase: Transferred 129 bytes in 11.1612 seconds (11.5579 bytes/sec)
15/03/02 18:33:24 INFO mapreduce.ExportJobBase: Exported 2 records.
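The warnings in the log above hint at a faster and safer way to run the same export. A hedged variant that takes them up, using --direct for Sqoop's Netezza-specific fast path and -P to prompt for the password instead of passing it on the command line (same emulator host and paths as above):

```shell
# Same export as above, but via Sqoop's Netezza direct mode.
# --direct uses Netezza external tables instead of per-row JDBC inserts.
# -P prompts for the password interactively, keeping it out of shell history.
if command -v sqoop-export >/dev/null 2>&1; then
    sqoop-export \
      --connect jdbc:netezza://192.168.10.131/system \
      --username admin -P \
      --export-dir hdfs://bivm.ibm.com:9000/biginsights/hive/warehouse/test \
      --table TEST \
      --direct \
      -m 1
else
    echo "sqoop-export not on PATH - run this from the Sqoop bin directory on the QSE VM"
fi
```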

g) Verify that the ‘test’ table in Netezza is populated with the values we exported.

SYSTEM.ADMIN(ADMIN)=> select * from test;
 X
---
 3
 2
(2 rows)

Points to remember:

When using sqoop-export, make sure the destination table already exists in the RDBMS; currently, sqoop-export cannot create tables in the RDBMS.
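For the export above, that means creating the TEST table on Netezza first. A minimal sketch with nzsql, matching the one-column table seen in the verification query (emulator host and credentials as used in this article):

```shell
# Pre-create the destination table on Netezza before running sqoop-export.
# Host and credentials are the emulator defaults used in this article.
if command -v nzsql >/dev/null 2>&1; then
    nzsql -host 192.168.10.131 -d system -u admin -pw password \
          -c "CREATE TABLE TEST (X INTEGER);"
else
    echo "nzsql not on PATH - run this where the Netezza client is installed"
fi
```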

Conclusion:

After reading this article, you should be comfortable importing data from Netezza into BigInsights and exporting it back using Sqoop.
