19 changes: 13 additions & 6 deletions README.md
@@ -1,6 +1,6 @@
# CopybookInputFormat

## Overview
This project is a collection of tools that let you read copybook data files directly from HDFS using MapReduce, Hive, or Spark.

Here is what is in this project:
@@ -12,19 +12,22 @@ Here is what is in this project:
* mapreduce.InputFormat & RecordReader: This is a mapreduce implementation of FileInputFormat and RecordReader.
* Spark Example: An example of how to read copybook data with Spark.

## Build
JRecord is not in a public Maven repository, so I have included the JRecord jars. To build, you have to put these jars in your local Maven repository at the following paths:

```bash
~/.m2/repository/net/sf/JRecord/JRecord/0.80/JRecord-0.80.jar

~/.m2/repository/net/sf/cb2xml/cb2xml/1.0/cb2xml-1.0.jar
```
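
The copy step above can be scripted. The following is a minimal sketch, not part of the project: `install_local_jar` is a hypothetical helper, and `M2_REPO` is an assumed override that defaults to `~/.m2/repository`. It only creates the directory layout shown above and copies the jar into it.

```bash
# Copy a jar into a Maven-style local repository layout:
#   <repo>/<group path>/<artifact>/<version>/<artifact>-<version>.jar
# Usage: install_local_jar <source-jar> <group-path> <artifact> <version>
# M2_REPO is a hypothetical override; it defaults to ~/.m2/repository.
install_local_jar() {
  src="$1"; group_path="$2"; artifact="$3"; version="$4"
  repo="${M2_REPO:-$HOME/.m2/repository}"
  dest_dir="$repo/$group_path/$artifact/$version"

  # Fail early if the source jar is missing.
  if [ ! -f "$src" ]; then
    echo "missing jar: $src" >&2
    return 1
  fi

  # Create the target folder and copy the jar under its versioned name.
  mkdir -p "$dest_dir" && cp "$src" "$dest_dir/$artifact-$version.jar"
}
```

For example, `install_local_jar path/to/JRecord-0.80.jar net/sf/JRecord JRecord 0.80` would produce the first path listed above, and `install_local_jar path/to/cb2xml-1.0.jar net/sf/cb2xml cb2xml 1.0` the second. Here `path/to/` is a placeholder for wherever the bundled jars sit in your checkout.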

After that, run `mvn package` and use `target/copybookInputFormat.jar`.

## Credits
Sekou Mckissick, Susan Greslik, Gwen Shapira, Jeremy Beard, and Ted Malaska

## Internal Notes
```bash
java -jar copybookInputFormat.jar GenHiveCreateTable example.cbl createTable.hql exampleTable /user/root/exampleTable /tmp/example.cbl

hive -f createTable.hql
@@ -48,16 +51,20 @@ select * from exampleTable;
select * from exampleTable where user_id > '570'

hadoop jar SparkCopybookExample.jar com.cloudera.sa.copybook.spark.CopybookSparkExample spark://{host}:7077 hdfs://{host}:8020/tmp/example.cbl hdfs://{host}:8020/user/root/exampleTable hdfs://{host}:8020/user/root/op2
```

or

```bash
java -cp SparkCopybookExample.jar com.cloudera.sa.copybook.spark.CopybookSparkExample spark://{host}:7077 hdfs://{host}:8020/tmp/example.cbl hdfs://{host}:8020/user/root/exampleTable hdfs://{host}:8020/user/root/op3
```

## Extra Notes
```xml
<property>
<name>hive.aux.jars.path</name>
<value>hdfs:///user/root/copybook-0.0.1-SNAPSHOT.jar</value>
</property>

```