Mobile Monitoring Solutions


How to integrate Hadoop and Teradata using SQL-H

MMS Founder
MMS RSS

Article originally posted on Data Science Central. Visit Data Science Central

I tried Hadoop Connector for Teradata, Teradata Connector for Hadoop, Teradata Studio Express, Aster SQL-H, and many other cumbersome alternatives before finally achieving Hadoop-Teradata integration without purchasing the current version of QueryGrid. Without QueryGrid, however, you cannot run cross-platform queries, so this article only demonstrates bidirectional data transfer between Teradata and Hadoop. All I needed for Teradata to integrate seamlessly with Hadoop was the following:

  1. Hadoop Sandbox 2.1 for VMware (http://hortonworks.com/hdp/downloads)
  2. Teradata Express 15 for VMware (http://downloads.teradata.com/downloads)
  3. Teradata Connector for Hadoop (TDCH) (http://downloads.teradata.com/downloads)
  4. Teradata Studio (http://downloads.teradata.com/downloads)

I didn't need to connect Teradata Aster, because all I needed was querying and data transfer between Hadoop and Teradata (TD). Here is how it went:

1. I converted the OVA file from the Hortonworks Sandbox download page into a VMX-based virtual machine that VMware Server can run. The conversion command was: ovftool.exe Hortonworks_Sandbox_2.1.ova D:/HDP_2.1_Extracted/HDP_2.1_vmware, where HDP_2.1_vmware is the name of the extracted VMDK file. The extraction took about an hour on a fast server.
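The conversion in step 1 could be wrapped in a small helper like the sketch below. It is not the exact command used above: the function name convert_ova is hypothetical, and the example target path ends in .vmx on the assumption that ovftool picks the output format from the target's extension. It only adds two sanity checks before calling ovftool.

```shell
# Sketch: convert a downloaded OVA into a VMX-based VM with ovftool.
# convert_ova is a hypothetical helper; the paths in the example call are assumptions.
convert_ova() {
    local src="$1"    # path to the downloaded .ova file
    local dest="$2"   # target path for the converted VM

    if [ ! -f "$src" ]; then
        echo "source OVA not found: $src" >&2
        return 1
    fi
    if ! command -v ovftool >/dev/null 2>&1; then
        echo "ovftool is not on PATH" >&2
        return 2
    fi
    # ovftool takes the source first and the target second.
    ovftool "$src" "$dest"
}

# Example call (not executed here; adjust both paths to your machine):
# convert_ova Hortonworks_Sandbox_2.1.ova D:/HDP_2.1_Extracted/HDP_2.1_vmware.vmx
```

Checking for the source file first avoids waiting on a long conversion that was pointed at the wrong path.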

2. I loaded HDP_2.1_vmware.vmdk into VMware Server by adding a new virtual machine. The VMX file was created from the VMDK file as I specified the VM configuration. I chose NAT for the network connection and also chose the USB driver option for the VM. When I powered on the VM, it reported that the SCSI device (USB) was not working and asked whether the VM should boot from IDE. That is the recommended option, so I chose it. The VM started and ran, and I could browse to the Hortonworks Sandbox at http://sandbox.hortonworks.com:8000. I could also use port 50070 to access WebHDFS. The only change I made was to the password for the hue user, in the user admin section of the site at http://sandbox.hortonworks.com:8000.
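To confirm from a shell that WebHDFS answers on port 50070, something like the following sketch may help. The helper name webhdfs_url is hypothetical. The URL layout (/webhdfs/v1/<path>?op=...&user.name=...) follows the WebHDFS REST API, and the hue user is the one mentioned above.

```shell
# Sketch: build a WebHDFS REST URL for a given host, port, HDFS path,
# operation and user. webhdfs_url is a hypothetical helper name.
webhdfs_url() {
    local host="$1" port="$2" path="$3" op="$4" user="$5"
    printf 'http://%s:%s/webhdfs/v1%s?op=%s&user.name=%s\n' \
        "$host" "$port" "$path" "$op" "$user"
}

# Example call (not executed here): list the HDFS root directory as user hue.
# curl -s "$(webhdfs_url sandbox.hortonworks.com 50070 / LISTSTATUS hue)"
```

If the sandbox is up, the curl call should return a JSON listing. A refused connection usually means the VM or its NAT networking is not ready yet.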

3. Next I needed to install Teradata 15 and Teradata Studio and connect the two. This worked well, and there is plenty of documentation for troubleshooting any problem that comes up when connecting TD15 to Teradata Studio. The first time, I could not connect to TD15 and got a “Connection Refused” error in Teradata Administrator. I restarted the SUSE Linux OS on which the TD 15 VM runs, and after that I could connect without trouble.

4. The last part was to install the Teradata Connector for Hadoop (TDCH) RPM file in the Hortonworks Sandbox I had launched in step 2. For this, I used PuTTY to connect to the HDP 2.1 shell. I entered the IP address assigned to sandbox.hortonworks.com in PuTTY and connected on the default port 22. I logged in with username root and password hadoop. Then I went to /usr/lib/, which contained installations of Java 1.7, Hive, Sqoop, and so on. I only needed to check that the Java version was 1.7 or above. Using FileZilla, I transferred the TDCH RPM file to /usr/lib. Then I ran the install command: rpm -Uvh teradata-connector-1.3.2-hdp2.1.noarch.rpm. The -v option makes the installation verbose, so it showed me all the details.
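The Java check in step 4 can be scripted. The sketch below is one possible way to do it, with a hypothetical helper name (java_is_17_or_newer). It assumes version strings of the form "1.7.0_55" (old numbering) or "11.0.2" (newer numbering).

```shell
# Sketch: return 0 if a Java version string is 1.7 or newer, 1 otherwise.
# java_is_17_or_newer is a hypothetical helper name.
java_is_17_or_newer() {
    local ver="$1"
    local major="${ver%%.*}"          # text before the first dot
    local rest="${ver#*.}"            # text after the first dot
    major="${major%%[!0-9]*}"         # keep leading digits only
    local minor="${rest%%[!0-9]*}"    # leading digits of the remainder

    [ -n "$major" ] || return 1
    # Newer numbering (9, 11, 17, ...) is always above 1.7.
    if [ "$major" -gt 1 ]; then
        return 0
    fi
    [ "$major" -eq 1 ] && [ -n "$minor" ] && [ "$minor" -ge 7 ]
}

# Example use on the sandbox (not executed here). java -version writes to
# stderr, so the stream is redirected before the version string is extracted.
# ver="$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"
# java_is_17_or_newer "$ver" && rpm -Uvh teradata-connector-1.3.2-hdp2.1.noarch.rpm
```

Running the rpm command only when the check passes keeps a mismatched Java version from surfacing later as a harder-to-read connector error.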

5. Next I ran the Oozie configuration described in the installation instructions on the Teradata Studio download page. I set namenode to sandbox.hortonworks.com. The WebHDFS hostname and WebHDFS port did not need to be set, because they default to the name node and 50070 respectively, which works here.

6. Now open Teradata Studio and add a new database connection. Specify the Hadoop database credentials, including WebHDFS host name: sandbox.hortonworks.com, WebHDFS port: 50070, username: hue. I then tested the connection. At first the ping failed, but only after a long pause, which suggested that it had been in the middle of processing. The Java exception read “cannot access oozie service”. I had first tried to connect with the username root, so I closed the root session in PuTTY. I also later closed the hue sessions open in the browser on sandbox.hortonworks.com so that the connection would not time out. The ping then succeeded after a pause of about 20 seconds.
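Since the connection test in step 6 only succeeded after a wait, a small retry loop can help tell "still starting" from "not reachable". The sketch below is an illustration, not part of the original setup: retry is a hypothetical helper, and the Oozie URL in the example assumes Oozie's default port 11000 and its admin status endpoint.

```shell
# Sketch: run a command up to N times, sleeping between failed attempts.
# retry is a hypothetical helper: retry <attempts> <delay-seconds> <command...>
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i of $attempts failed" >&2
        i=$((i + 1))
        # Sleep only if another attempt is coming.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example call (not executed here). Port 11000 is an assumption (Oozie default).
# retry 5 10 curl -sf "http://sandbox.hortonworks.com:11000/oozie/v1/admin/status"
```

If the retries never succeed, the problem is more likely the Oozie service or the network than a slow start.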

7. Once both Teradata and the Hadoop Distributed File System were connected to Teradata Studio, I could transfer data between them in both directions. That completes the setup.
