I want to create a Map function with the following operations:

Step 1:

I have two data sets, R and S. I want to partition each data set into n equal-sized blocks, which can be done by putting every |R|/n (respectively |S|/n) records into one block.
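The block assignment in Step 1 could be sketched as plain Java (this is not a Hadoop API, just the arithmetic; the method name `blockId` and the 0-based record index are assumptions for illustration):

```java
// Sketch: assigning each record of a data set to one of n
// roughly equal-sized blocks, as described in Step 1.
public class BlockPartition {
    // recordIndex: 0-based position of the record in its data set,
    // totalRecords: size of the data set, n: number of blocks.
    // The first ceil(total/n) records go to block 0, the next
    // ceil(total/n) to block 1, and so on.
    static int blockId(long recordIndex, long totalRecords, int n) {
        long blockSize = (totalRecords + n - 1) / n; // ceiling division
        return (int) (recordIndex / blockSize);
    }

    public static void main(String[] args) {
        // 100 records, 4 blocks of 25 records each
        System.out.println(blockId(0, 100, 4));   // block 0
        System.out.println(blockId(24, 100, 4));  // block 0
        System.out.println(blockId(25, 100, 4));  // block 1
        System.out.println(blockId(99, 100, 4));  // block 3
    }
}
```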

After that:

Step 2: Every possible pair of blocks (one from R and one from S) is then assigned to a bucket at the end of the Map phase, so that the Reduce function can take it as input, with an id as the key for each value pair, e.g.
Java
<id:(Sij,Ril)>
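One common way to form that bucket id, assuming blocks are numbered 0..n-1 in each data set, is to flatten the pair (i, j) into a single integer. This is a sketch of the key derivation only (the name `bucketId` is made up for illustration), not the full Map function:

```java
// Sketch: one bucket (reduce key) per (R-block, S-block) pair,
// giving n*n buckets in total.
public class BucketKeys {
    static int bucketId(int rBlock, int sBlock, int n) {
        return rBlock * n + sBlock;
    }

    public static void main(String[] args) {
        int n = 3;
        // An R record in block i must be emitted n times, once per
        // S block; likewise an S record in block j is emitted once
        // per R block, so each bucket sees both sides of its pair.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                System.out.println("<" + bucketId(i, j, n)
                        + " : (S" + j + ", R" + i + ")>");
    }
}
```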


So my questions are:

1) Is there any existing function that I can use for step 1? How do I implement this operation separately for each data set?

2) How can I refer specifically to each data set in step 2, so that I can take one block from R and one from S?
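With the old `org.apache.hadoop.mapred` API (which the `JobConf` below suggests is in use), the mapper can read the property `"map.input.file"` in `configure()` to learn which file the current split came from, and tag each record as R or S accordingly. The sketch below tests only the pure tagging helper; the Hadoop wiring is shown in comments, and the prefix-matching convention is an assumption:

```java
// Sketch: deciding whether a record came from R or S by the path
// of the input file that produced it.
public class SourceTag {
    // Assumed convention: args[0] (the R path) is a prefix of every
    // file name produced from R's input directory.
    static boolean isFromR(String inputFile, String rPathPrefix) {
        return inputFile.startsWith(rPathPrefix);
    }

    /*
    // In the mapper, with the old mapred API:
    private boolean fromR;
    public void configure(JobConf job) {
        String inputFile = job.get("map.input.file");
        fromR = isFromR(inputFile, job.get("join.r.path"));
        // "join.r.path" is a made-up property you would set yourself
        // in the driver, e.g. conf.set("join.r.path", args[0]);
    }
    */
}
```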

Note: In main() I define the two data sets like this:
Java
FileInputFormat.setInputPaths(conf, new Path(args[0]), new Path(args[1]));
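As an alternative to `setInputPaths`, the old mapred API also offers `MultipleInputs`, which binds each path to its own Mapper class, so R and S records never have to be disambiguated at runtime. This is a driver-configuration fragment, not a runnable program; `RMapper` and `SMapper` are hypothetical class names:

```java
// Hypothetical mapper classes, one per data set; each can tag its
// output so the reducer knows which side a record came from.
MultipleInputs.addInputPath(conf, new Path(args[0]),
        TextInputFormat.class, RMapper.class);
MultipleInputs.addInputPath(conf, new Path(args[1]),
        TextInputFormat.class, SMapper.class);
```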
Comments
TorstenH. 4-Apr-14 7:37am    
Are you talking about a geographical map, or are you talking about the datatype Map?
User3490 4-Apr-14 7:44am    
Actually, I'm talking about MapReduce: https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Example%3A+WordCount+v1.0
I'm trying to implement something like this.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
