I have this assignment question on databases and have no idea how to solve it. Can anyone please help me?
“Mr. A” and “Mr. B” are data warehousing experts working for the “XYZ” company. They are currently developing an ETL-Validator framework for big-data technology, i.e. validating data between an RDBMS (MySQL / Oracle / DB2) and Hadoop (HDFS / Hive).
The source database (RDBMS) contains millions of records, and all the records from the source have already been migrated to the target database (Hadoop - Hive).
They need your help in implementing the following scenarios:
A. Column-level comparison between the source and target databases (i.e. comparing each column of the source database with each column of the target database).
Now your task is to:
1. Assume a suitable database on the source side and design a table structure (student / retail banking / telecommunication / insurance / any other) for it with at least ten columns.
2. Assuming that buffer size = 500, propose an efficient strategy to reduce the number of comparisons between source and target column records.
3. Write an SQL query for the solution proposed in step #2.
4. Draw the query tree for the query of step #3.
5. Write pseudocode or a program (Java / C#) for the solution proposed in step #2 and step #3.
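One possible direction for steps 2 and 5, offered only as a rough sketch and not as the expected answer: read both tables in primary-key order, hash each chunk of 500 rows, and compare cell by cell only inside chunks whose hashes differ. The class name `ChunkCompare`, the method names, and the in-memory lists (standing in for JDBC/Hive result sets) are all hypothetical. The 32-bit hash is for illustration only; a real validator would probably need a stronger digest to avoid collisions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class ChunkCompare {
    // Buffer size given in the assignment.
    static final int BUFFER_SIZE = 500;

    // Combined hash of rows [from, to); each row is an array of column values.
    static int chunkHash(List<String[]> rows, int from, int to) {
        int h = 1;
        for (int i = from; i < to; i++) {
            h = 31 * h + Arrays.hashCode(rows.get(i));
        }
        return h;
    }

    // Returns "rowIndex:columnIndex" for every differing cell.
    // Assumption: both lists are sorted by primary key and have the same shape.
    static List<String> findMismatches(List<String[]> source, List<String[]> target) {
        if (source.size() != target.size()) {
            throw new IllegalArgumentException("row counts differ");
        }
        List<String> mismatches = new ArrayList<>();
        for (int from = 0; from < source.size(); from += BUFFER_SIZE) {
            int to = Math.min(from + BUFFER_SIZE, source.size());
            // One hash comparison covers up to 500 rows.
            if (chunkHash(source, from, to) == chunkHash(target, from, to)) {
                continue;
            }
            // Hashes differ: drill down to individual cells in this chunk only.
            for (int r = from; r < to; r++) {
                String[] s = source.get(r);
                String[] t = target.get(r);
                for (int c = 0; c < s.length; c++) {
                    if (!Objects.equals(s[c], t[c])) {
                        mismatches.add(r + ":" + c);
                    }
                }
            }
        }
        return mismatches;
    }
}
```

With this approach, a fully matching table of N rows costs about N / 500 hash comparisons instead of N × 10 column comparisons; the per-cell work is paid only for chunks that contain a difference.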
B. As the foreign key constraint is not implemented in the target database (Hadoop – Hive), implement a foreign key validator for the target database.
1. Assuming that the table used in #A.1 is already present in the target DB, construct one more table on the target side which references the primary key of the table used in #A.1.
2. Assuming that buffer size = 500, propose an efficient strategy with the minimum number of comparisons to validate the foreign key constraint.
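For part B, one approach that might be worth considering (again only a sketch under assumptions, not a confirmed solution) is a sort-merge check: read the parent primary keys and the child foreign keys in ascending order and advance through both in a single pass, reporting any child key with no parent. The class `ForeignKeyValidator` and the method `orphans` are hypothetical names, and the lists stand in for sorted key streams that would be fetched 500 at a time.

```java
import java.util.ArrayList;
import java.util.List;

public class ForeignKeyValidator {
    // Returns child keys that have no matching parent key.
    // Assumption: both lists are sorted in ascending order.
    static List<Long> orphans(List<Long> parentKeys, List<Long> childKeys) {
        List<Long> result = new ArrayList<>();
        int p = 0; // cursor into the parent keys; it never moves backwards
        for (long fk : childKeys) {
            // Skip parent keys smaller than the current foreign key.
            while (p < parentKeys.size() && parentKeys.get(p) < fk) {
                p++;
            }
            // No parent left, or the next parent is larger: the key is orphaned.
            if (p == parentKeys.size() || parentKeys.get(p) != fk) {
                result.add(fk);
            }
        }
        return result;
    }
}
```

Because neither cursor moves backwards, the number of comparisons grows with the sum of the two key counts and not with their product, and each side needs only one buffer of keys in memory at a time.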