DE02-NY

Core Java

  • The project name is CCSystem.
  • Open the project in your IDE and be sure to change the database parameters.
    • In the database package, open the DatabaseConnection.java file and change the database name, as well as the username and password if needed.
  • Clean and build the project.
  • Run the main.java file.
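
If you prefer to build and run from a terminal instead of the IDE, a rough sketch is shown below; the Ant targets and jar name are assumptions about the project layout, so adjust them to whatever your IDE actually produces.

```sh
# Assumes an Ant-based NetBeans-style project; targets and output name are assumptions
ant clean jar                  # clean and build the project
java -jar dist/CCSystem.jar    # run the built application (jar name assumed)
```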

RDBMS/MySQL Description

  • To create the tables and insert the data for this project, run the following SQL file: /MappingDocuments/CASESTUDY.sql
  • This file will create a database named CASESTUDY for you.
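
Assuming the standard mysql command-line client and a local MySQL server (the user below is a placeholder), the script can be loaded like this:

```sh
# Creates the CASESTUDY database, its tables, and the sample data
mysql -u root -p < MappingDocuments/CASESTUDY.sql
```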

Hadoop / HDFS / Data Warehousing

  • In HDFS we are using the maria_dev user, so the data transferred from the relational database with Sqoop is stored in /user/maria_dev/Credit_Card_System.
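
Once the Sqoop jobs have run, the imported files can be sanity-checked from the command line, for example:

```sh
# List the import directories Sqoop writes under the maria_dev user
hadoop fs -ls /user/maria_dev/Credit_Card_System
hadoop fs -ls /user/maria_dev/Credit_Card_System/Branch
```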

Hive and Partitioning

  • The Hive imports consist of four tables (Branch, Credit Card, Time, and Customer). Each table is an external table that loads the data files imported by Sqoop.
  • CreateBranchTable.sql
    • This file creates the Branch table, CDW_SAPP_D_BRANCH.
    • Loads data from /user/maria_dev/Credit_Card_System/Branch.
  • CreateCreditCardTable.sql
    • This is where dynamic partitioning is done (see the sketch after this list).
    • Partitioning is done on the TRANSACTION_TYPE field.
    • Two tables are created, CDW_SAPP_F_CREDIT_CARD and TEMP_CDW_SAPP_F_CREDIT_CARD, so that partitioned data can be inserted from the TEMP table into the original one.
    • Loads partitioned data from /user/maria_dev/Credit_Card_System/PartitionedCreditCard.
  • CreateCustomerTable.sql
    • This file creates the Customer table, CDW_SAPP_D_CUSTOMER.
    • Loads data from /user/maria_dev/Credit_Card_System/Customer.
  • CreateTimeTable.sql
    • This file creates the Time table, CDW_SAPP_D_TIME.
    • Loads data from /user/maria_dev/Credit_Card_System/TimeID.
  • IncrementalCreateCreditCard.sql
    • This file inserts new data incrementally.
    • It is used in Oozie with coordinators for incremental updates.
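
As a rough sketch, the scripts can be executed with the Hive CLI as shown below; the HiveImports/ paths match the directory mentioned in the Oozie section, and the --hiveconf flags are only needed if the scripts do not already enable dynamic partitioning themselves.

```sh
# Create the dimension tables from the Sqoop-imported files
hive -f HiveImports/CreateBranchTable.sql
hive -f HiveImports/CreateCustomerTable.sql
hive -f HiveImports/CreateTimeTable.sql

# Create the credit card fact tables; dynamic partitioning on
# TRANSACTION_TYPE needs these settings if the script does not set them itself
hive --hiveconf hive.exec.dynamic.partition=true \
     --hiveconf hive.exec.dynamic.partition.mode=nonstrict \
     -f HiveImports/CreateCreditCardTable.sql
```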

Oozie (Sqoop and Hive)

  • Before running any Sqoop jobs, it is important to start the Sqoop metastore.

  • In the /SqoopImport directory there is a file called sqoopjobs.sh, a shell script that creates all of the Sqoop jobs.

  • Transfer the sqoopjobs.sh file to your local path.

  • Run the following command as the root user: sudo chmod 777 /user/maria_dev/sqoopjobs.sh

  • Run it: ./sqoopjobs.sh

  • Transfer the /OozieWorkflow and /HiveImports directories to both your local path and HDFS:

      • Type: hadoop fs -put OozieWorkflow/ /user/maria_dev/
      • Type: hadoop fs -put HiveImports/ /user/maria_dev/
  • Upload the java-json.jar file:

    • There is a file named java-json.jar.
    • Use Ambari to upload that file to /user/oozie/share/lib/lib_*******/sqoop/
    • Change lib_******* to your directory name.
  • Before running Oozie, import the MySQL database required for this project. The file to import is /MappingDocuments/CASESTUDY.sql.

  • Run this command to start Oozie, which creates the Hive tables and imports the data from the relational database:
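
A typical invocation looks like the sketch below; the Oozie server URL, port, and job.properties path are assumptions and should be adapted to your environment.

```sh
# Submit and start the workflow stored under /user/maria_dev/OozieWorkflow
oozie job -oozie http://localhost:11000/oozie \
          -config OozieWorkflow/job.properties -run
```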

Oozie (Sqoop and Hive optimized)

  • After the initial Oozie workflow has finished, we can run the incremental Oozie workflow for incremental updates.
  • It runs the same Sqoop jobs plus one additional Hive query that updates the data with dynamic partitioning.
  • To run the optimized Oozie workflow, run:
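
As with the first workflow, the command below is only a sketch; the coordinator properties file name is an assumption.

```sh
# Submit the incremental (coordinator-driven) workflow for scheduled updates
oozie job -oozie http://localhost:11000/oozie \
          -config OozieWorkflow/coordinator.properties -run
```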

Visualization

  • In the /HiveVisualization directory there are two files, firstQuery.q and secondQuery.q.
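
These queries can be run directly with the Hive CLI, for example:

```sh
# Run the two visualization queries against the warehouse tables
hive -f HiveVisualization/firstQuery.q
hive -f HiveVisualization/secondQuery.q
```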
