By Ankhimita Paul Choudhury & Sunayan Saikia
As you might already know, Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases (MySQL, Oracle, Netezza, etc.).
While digging into Sqoop, we found an interesting feature: its plugin framework, which lets us create custom tools that behave like Sqoop's built-in tools (commands) such as import, export, create-hive-table and list-tables. A custom tool lets us implement our own logic in Sqoop to suit our needs.
You can use the following steps as a guide to developing your own Sqoop plugin from scratch.
If we want a custom tool in Apache Sqoop (in addition to the tools Sqoop already provides, such as the import, export and create-hive-table tools) to meet our own specific requirements, we must create a Sqoop plugin that provides this tool with all of the features our use case requires.
That basically involves designing our classes according to the conventions of the plugin architecture that Sqoop requires us to follow. The plugin that we create must have a base class (the plugin class), which wraps the fully developed custom tool.
Sqoop 1.4.6, the current stable version, is missing features such as the following:
We can implement anything of that sort using our own custom Sqoop plugin. We are using Sqoop 1.4.6 for our example because, at the time of writing, most Hadoop distros such as CDH, HDP and MapR officially support only Sqoop 1.4.6.
The creation and implementation of a Sqoop plugin is illustrated in the following 6 steps:
Tip: If you are using Maven to build the project and want to install the Sqoop dependency locally, download the Sqoop jar (here’s a link to download) and use the following command:
mvn install:install-file -Dfile=<location-of-the-sqoop-jar> -DgroupId=org.apache.sqoop -DartifactId=sqoop -Dversion=1.4.6 -Dpackaging=jar -DgeneratePom=true
Then we need to put the following dependency into the Maven pom.xml (resolved from either the local Maven repository or the central Maven repository):
<dependency>
    <groupId>org.apache.sqoop</groupId>
    <artifactId>sqoop</artifactId>
    <version>1.4.6</version>
</dependency>
Create a class with a name that ends with the word ‘Tool’ just to provide the context that it is the custom tool class (for example, AbcTool class). This custom tool class will be responsible for performing the functions that you require.
What does the custom tool class need to inherit and which methods must be overridden?
BaseSqoopTool is the base class for all Sqoop Tools. So, if you intend to develop a custom tool, you need to make sure your custom tool class extends from the org.apache.sqoop.tool.BaseSqoopTool class and overrides the run(SqoopOptions options) method:
public int run(SqoopOptions options): This method acts as an entry point for execution for your custom tool.
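To make the shape of such a class concrete, here is a minimal sketch. So that it compiles without the Sqoop jar on the classpath, it declares tiny stand-ins for SqoopOptions and BaseSqoopTool; these stand-ins are our own assumption and model only the pieces the sketch touches. In a real plugin you would delete them and import the Sqoop classes instead. The tool name "abc" and the printed message are hypothetical.

```java
// Stand-ins so this sketch compiles on its own. In a real plugin, remove
// these two classes and import Sqoop's SqoopOptions and
// org.apache.sqoop.tool.BaseSqoopTool instead.
class SqoopOptions { }

abstract class BaseSqoopTool {
    private final String toolName;

    protected BaseSqoopTool(String toolName) {
        this.toolName = toolName;
    }

    public String getToolName() {
        return toolName;
    }

    // Entry point that every tool must implement.
    public abstract int run(SqoopOptions options);
}

// The custom tool: extend the base class and override run().
class AbcTool extends BaseSqoopTool {

    // A public no-argument constructor is assumed here, since Sqoop
    // creates tool instances by reflection.
    public AbcTool() {
        super("abc"); // hypothetical tool name
    }

    @Override
    public int run(SqoopOptions options) {
        // Your custom logic goes here.
        System.out.println("Running custom tool: " + getToolName());
        return 0; // 0 signals success; a non-zero value signals failure
    }
}
```

The return value of run() is used as the status of the command, so returning a non-zero value on failure lets calling scripts detect errors.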
NOTE: The following two points related to the working of the custom tool need careful consideration before proceeding:
Tip: Please note that support for user-defined custom options does not quite work in Sqoop v1.4.6 (a related bug has been resolved in Sqoop v1.4.7, but that fix is not available to us until the CDH, HDP and MapR Hadoop distros ship it). Instead, you will want to leverage the Sqoop generic arguments to pass in your custom values. The values of the generic arguments are set as configuration in Sqoop’s data transfer object and so are available throughout all the classes in Sqoop. You can set the arguments like this: -D <key>=<value>. You do not need these if you have no custom values to pass to Sqoop.
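The idea behind generic arguments can be illustrated with a small self-contained sketch. It is not Sqoop's own parsing code: a plain Map stands in for the Hadoop Configuration, and the class name GenericArgsSketch and the key abc.schema are hypothetical. In a real tool, Hadoop does this parsing for you, and inside run() you would read the value with options.getConf().get("abc.schema").

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: collects "-D key=value" pairs the way generic
// arguments end up as key/value configuration entries.
final class GenericArgsSketch {

    private GenericArgsSketch() { }

    static Map<String, String> parseGenericArgs(String[] args) {
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-D".equals(args[i])) {
                String pair = args[++i]; // the token after -D is key=value
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    conf.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return conf;
    }
}
```

For example, the arguments db-import -D abc.schema=sales --connect <jdbc-url> would yield a single entry mapping abc.schema to sales, which the custom tool can then read as a configuration value.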
You can create as many helper classes as your custom tool class needs.
Create a class with a name ideally ending with the word “Plugin” to signal that it is the user-defined plugin class (for example, the AbcPlugin class). This is the wrapper class that packages the tool class and its dependency classes as the plugin.
What does the user-defined tool plugin class need to inherit and which methods are mandated to be overridden?
ToolPlugin is the base class for plugins. So, your custom tool plugin class should extend org.apache.sqoop.tool.ToolPlugin and override the getTools() method. As already mentioned, the user-defined tool plugin class is needed to wrap the custom tool. After overriding the getTools() method, the plugin implementation should look somewhat like this:
import java.util.Collections;
import java.util.List;

// In Sqoop 1.4.x, getTools() is declared with this ToolDesc type.
import com.cloudera.sqoop.tool.ToolDesc;

public class AbcPlugin extends org.apache.sqoop.tool.ToolPlugin {
    @Override
    public List<ToolDesc> getTools() {
        // One entry per tool: command name, tool class, help description.
        return Collections.singletonList(new ToolDesc(
                "put-name-of-command-here",
                AbcTool.class,
                "Put description of the command here"));
    }
}
This is the final step which involves registering the plugin class with Sqoop.
The following steps will make sure you accomplish this right:
This diagram is provided for the easy visualization of the inter-relations among the classes that we require for the implementation of a Sqoop plugin.
The steps we have followed so far help us create a custom plugin. Another optional but powerful thing we can do is add a Manager Factory, which allows us to use a custom Connection Manager for an RDBMS. This way the Sqoop plugin you create can also enhance Sqoop to do other things such as ‘listing schema’ if required (implementing a listing schema feature is beyond the scope of this article).
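As a rough sketch of how such a factory is typically shaped: Sqoop asks a manager factory whether it can handle the job's connect string through its accept() method, and the factory returns a Connection Manager if it can, or null to decline. The code below uses simplified stand-ins for Sqoop's SqoopOptions, JobData, ConnManager and ManagerFactory so that it compiles on its own; the stand-ins, the AbcConnManager and AbcManagerFactory names, and the jdbc:abcdb: scheme are all assumptions made for illustration.

```java
// Simplified stand-ins so this sketch compiles without the Sqoop jar.
// A real plugin would use Sqoop's own classes of the same names.
class SqoopOptions {
    private final String connectString;

    SqoopOptions(String connectString) {
        this.connectString = connectString;
    }

    String getConnectString() {
        return connectString;
    }
}

class JobData {
    private final SqoopOptions options;

    JobData(SqoopOptions options) {
        this.options = options;
    }

    SqoopOptions getSqoopOptions() {
        return options;
    }
}

abstract class ConnManager { }

abstract class ManagerFactory {
    abstract ConnManager accept(JobData data);
}

// Hypothetical Connection Manager for a hypothetical database.
class AbcConnManager extends ConnManager {
    final SqoopOptions options;

    AbcConnManager(SqoopOptions options) {
        this.options = options;
    }
}

// The factory accepts only connect strings that use its own scheme.
class AbcManagerFactory extends ManagerFactory {
    private static final String SCHEME = "jdbc:abcdb:";

    @Override
    ConnManager accept(JobData data) {
        String connect = data.getSqoopOptions().getConnectString();
        if (connect != null && connect.startsWith(SCHEME)) {
            return new AbcConnManager(data.getSqoopOptions());
        }
        return null; // decline, so that another factory can handle it
    }
}
```

Returning null rather than throwing keeps the factory cooperative: a connect string it does not recognize simply falls through to the other registered factories.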
And you’re done fabricating your own custom Connection Manager. Cool, isn’t it?
If your plugin was registered successfully by following the above steps, typing ‘sqoop help’ in the terminal will list your tool among the available commands, as highlighted in the image below. Here, we are assuming you have named your tool ‘db-import’.