Hadoop

Great Reads

by Bert O Neill
Query Hadoop using Microsoft-oriented technologies (C#, SSIS, SQL Server, Excel, etc.)
by Fazlur Rahman
Step-by-step procedure to install Hadoop 2.7.3 on the Ubuntu 16.04 operating system
by Suffyan Asad
How to implement Joins in Hadoop Map-Reduce applications during Reduce and Map phases
by Vladimir Dorokhov
Design and development of a simple analytics system using Lambda Architecture principles and the Microsoft Azure cloud

Latest Articles

by Fazlur Rahman
Step-by-step procedure to install Hadoop 2.7.3 on the Ubuntu 16.04 operating system
by YegorDovganich
Following 'Infrastructure as Code' principles, we build a real sample project from scratch that deploys an EMR cluster and runs a Hive script on it. It covers the 'Analyze Big Data with Hadoop' project from the AWS 'Learn to Build' section.
by Mahsa Hassankashi
It is almost everything about big data.
by Michael_Churchman
Alibaba Cloud offers a range of Big Data solutions. This article outlines them and explains which types of Big Data services on the Alibaba Cloud align with various workloads.

All Articles

Hadoop 

29 Dec 2015 by Bert O Neill
Query Hadoop using Microsoft-oriented technologies (C#, SSIS, SQL Server, Excel, etc.)
22 May 2022 by Fazlur Rahman
Step-by-step procedure to install Hadoop 2.7.3 on the Ubuntu 16.04 operating system
29 Jan 2015 by Suffyan Asad
How to implement Joins in Hadoop Map-Reduce applications during Reduce and Map phases
23 Mar 2017 by Vladimir Dorokhov
Design and development of a simple analytics system using Lambda Architecture principles and the Microsoft Azure cloud
26 Mar 2016 by Amit Kumar Tiwari
.NET to Hadoop connection using Keytab file
5 Jun 2019 by YegorDovganich
Following 'Infrastructure as Code' principles, we build a real sample project from scratch that deploys an EMR cluster and runs a Hive script on it. It covers the 'Analyze Big Data with Hadoop' project from the AWS 'Learn to Build' section.
13 Feb 2017 by Fazlur Rahman
Step-by-step procedure to install NetBeans on Ubuntu 16.04 with Hadoop 2.7.3. This may also work for other versions of Hadoop and Ubuntu.
13 Dec 2017 by Michael_Churchman
Alibaba Cloud offers a range of Big Data solutions. This article outlines them and explains which types of Big Data services on the Alibaba Cloud align with various workloads.
11 Jul 2021 by Patrice T
Quote: I don't understand why. Please explain. Simple: you have mixed spaces and tabs differently from the previous line. p="foo foo quux labs foo barquux".split() d={} s=[] count=1 for x in p: if x not in s: d.update({x:count}) # this line...
25 Apr 2016 by Richard MacCutchan
See Visual Studio and Windows SDK Command Prompts[^].
7 Jan 2017 by SrikantSahu
This tip gives the basic commands to import a table from MySQL into the Hadoop file system (HDFS) and to move files from HDFS back into MySQL.
5 May 2019 by Kornfeld Eliyahu Peter
Impala and Spark are two separate SQL engines for use with Hadoop... One cannot use features from the other! So, no: if you use Impala there is no Spark, and if you use Spark there is no Impala...
21 Sep 2014 by Member 11097824
While installing Hadoop on Windows 8.1 Pro, when I was ready to run MapReduce, I got this error message. Unable to make directory, and further errors are mentioned below. -mkdir: java.net.URISyntaxException: Illegal character in hostname at index...
21 Nov 2014 by midhun3600
Hi, I am new to Hadoop. I have managed to install and use Hadoop HDFS and Hive. I am able to fetch data and insert data into Hive using Talend. My problem is that whenever we create a table from Talend (distribution: Apache), it is created in Hive but I am unable to see the same in hive...
3 Dec 2014 by Syncfusion
With the Syncfusion Big Data Platform, you have complete access to the Hadoop environment. By adopting our platform, you are using an industry-tested solution currently employed by companies such as Microsoft, Facebook, Amazon, Adobe, Hulu, LinkedIn, and Yahoo.
12 Dec 2014 by ZurdoDev
The way this site works is we volunteer our time to help people that have gotten stuck on a specific code issue. In this case, you seem to be asking for someone to do everything for you and we don't do that.
31 Dec 2014 by kadriu
If you have JDK 8.x installed, uninstall and install JDK 7.x. This worked for me.
9 Jan 2015 by midhun3600
Hi, I am very new to Hadoop, and somehow we managed to install it with the Apache distribution and a Derby database. My requirement is that multiple users need to access Hive at the same time, but right now only a single user can work at a time. I searched some of the blogs but haven't found the...
14 Jan 2015 by midhun3600
Hi, I am trying to create a Hadoop table and load data into it using Talend. I have successfully created the table but was unable to load data into it. When I execute the Talend job I get the following error. ======================================================== FAILED: Error in semantic...
16 Feb 2015 by Member 11456117
We are getting some warnings in our MapReduce job while reading and writing data from a datanode, although they do not abort the job. The warning comes up at several places in the job. It looks like an issue with the timeout variables in the hdfs-site.xml and hbase-site.xml files. What timeout values should...
7 Mar 2015 by mibetty
I'm trying to install a Hadoop single node. When I run start-all.sh, the name node and job tracker don't start. Can you see in my files what might be wrong to give this result? Result of hadoop jps command: 14878 Jps 14823 TaskTracker 14605 SecondaryNameNode 14456...
12 Mar 2015 by RkRkRkRkk
(CAQuietExec: WINPKG: Unzip of C:\HadoopInstallFiles\HadoopPackages\hdp-2.1.3.0-winpkg.zip to C:\HadoopInstallFiles\HadoopPackages succeeded CAQuietExec: WINPKG: UnzipRoot: C:\HadoopInstallFiles\HadoopPackages\hdp-2.1.3.0-winpkg CAQuietExec: WINPKG:...
20 Mar 2015 by Rabbits Foot
I am struggling to get my HBase shell running. It throws the exception in the subject line. I have checked that hbase-site.xml matches the Hadoop one perfectly. Please help. I have been struggling for 2 days and have a project due. I am attaching the two xml files of hadoop and...
17 Apr 2015 by Flowra white
I want to open hadoop source code as a project in Eclipse for the purpose of developing and studying.
10 May 2015 by Sergey Alexandrovich Kryukov
There is no definition of "good technology". Only you can decide what's good for you. If you only want to choose something, no matter what, just to be on top of things, I'm afraid you are at the wrong forum. This is the forum primarily oriented to professionals (even though some are students at...
10 May 2015 by Afzaal Ahmad Zeeshan
Hello Rajasekhar, I would give you an overview of two paths that you are looking at. First one is the new one that you want to move yourself into. Second path is the one you are already on. So, coming to the first one. If you seriously want to switch your career field, from one to...
15 May 2015 by Member 11694565
I am trying to set up Hadoop as a single-node cluster, and I need to create tables in Hive and HBase in order to handle the tables using C#. I have cygwin, hadoop-1.2.1 and hive-1.1.0 on Windows 7 32-bit. Running Hadoop, it gives "Warning: $HADOOP_HOME is deprecated." Still, it works! But when...
20 Aug 2015 by Sofia Panagiotidi
I have set up a cluster made of two slaves and one master, and I submit a jar (Scala) to the Spark master (192.168.1.64): spark-submit --master spark://spark-master:7077 --class tests.elements target/scala-2.10/zzz-project_2.10-1.0.jar After running just fine for quite some time it stops...
3 Oct 2015 by Mehdi Gholam
Start here : http://harishshan.blogspot.co.uk/2014/10/install-hadoop-251-on-windows-7-64bit.html[^]
3 Oct 2015 by Afzaal Ahmad Zeeshan
Setting up the JAVA_HOME variable is a first step for any application or program that requires the JDK to work. There are many tutorials already available, but I will try to provide the ones that suit your needs and are standards-based. Installing the JDK Software and Setting JAVA_HOME[^] (From...
14 Oct 2015 by Member 12059854
public class MaxMinReducer extends Reducer {int max_sum=0; int mean=0;int count=0;Text max_occured_key=new Text();Text mean_key=new Text("Mean : ");Text count_key=new Text("Count : ");int min_sum=Integer.MAX_VALUE; Text min_occured_key=new Text(); public void reduce(Text key,...
5 Dec 2015 by mohitjain012
I was learning Hadoop and I came across a doubt: every slave node consists of a data node and a task tracker, and every data node consists of data blocks. Suppose we have a data node which has 10 data blocks, each of size 64 MB. How is the data of a data node processed inside a slave node?...
3 Feb 2016 by Member 11726267
Hi, I am trying to install Hadoop on Windows. I am looking for the correct path for downloading the google-gson-2.2.4-release.zip file. I downloaded the file from a couple of sites but am not able to see the jar files in the zip folder. I have only html, java and class files when I extracted the...
18 Feb 2016 by Simon Elliston Ball
How to use NiFi to write to HDFS on the Hortonworks Sandbox
27 Feb 2016 by rehabrish
I have a topology running with parallelism (1,8,1) (spout, logic bolt, write bolt) and the number of ackers set to 12 (12 slots are available in my cluster). The max spout pending is 200 and timeout.secs is 200. I have to process 14 lakh inputs. My cluster consists of 1 nimbus & 3 supervisors (...
27 Feb 2016 by Patrice T
The problem can be anywhere. You have to find where the bottleneck is. Your network can be in a degraded mode because of bad wiring or a bad switch. Computers can be slowed down because of a lack of memory. Your programs can be artificially complicated or not optimized. It can be...
4 Mar 2016 by Chendur Srinivasan
I'm self-learning Hadoop and started off by installing Cloudera QuickStart on a VMware Workstation running CentOS. I was under the impression that the QuickStart VM has most of the configurations predefined. Do I need to set up any other configurations to set up the data and name nodes? Reason being...
20 Jul 2017 by Ailsa Harvey
Hi Buddy, how do we choose the right Hadoop distribution from the numerous options that would serve our purpose? Not all of the Hadoop distributions have the same components (but they all consist of Hadoop’s core capabilities). Thanks. What I have tried: I have tried to choose the...
5 May 2016 by George Jonsson
Here you can find information about different distributions: Welcome to Apache™ Hadoop®![^] Here you have a discussion forum for Hadoop: Discuss Hadoop[^] I guess your specific choice depends on your requirements.
6 May 2016 by Mankuji87
Hi Ailsa, here are some helpful links. I hope they will help you: Spoilt for Choice – How to choose the right Big Data / Hadoop Platform?[^] How to Choose a Hadoop Distribution - For Dummies[^] How to Choose the Right Hadoop Distribution?[^] Top 3 Hadoop distributions, which is right for...
12 Apr 2017 by Sukanya Karri
My input:
Department  Jan_sal  Feb_sal  Mar_sal
civil       1        5        5
mech        2        7        2
civil       3        8        9
mech        6        4        4
mech        5...
12 Apr 2017 by Intel
BigDL is a distributed deep learning library for Apache Spark. With BigDL, users can write their deep learning applications as standard Spark programs, which can run directly on top of existing Spark or Hadoop clusters.
29 May 2017 by Jayaprakash Manchi
For example, Let me explain it in detail. https://i.stack.imgur.com/DIlIT.png Like this data will be there in excel sheet as shown above with n number of rows typically huge data. Now we need to filter the column status with output as in different excel sheets or in same workbook as given...
15 Sep 2017 by Member 13258163
I have set up a single-node Hadoop cluster (2.71.1) on Windows 7 and am now trying to establish its connection with Azure storage (wasb), with no success. I am getting the error: No FileSystem for scheme: wasb I have been following several blogs but was focused on: articles/hadoopAndWasb.md at...
20 Jul 2017 by Eshika Roy
It all depends on your work and working environment. There are 3 most commonly used distributions. Cloudera - you can choose it when you need support from Cloudera. They charge for the service -- partially open source. Hortonworks - fully open source and user friendly (processing speed slow if you...
20 Jul 2017 by Eshika Roy
First you need to follow some steps to enable WASB on Hadoop • We need to create an account on Windows Azure. • Then take service • Then we need to implement Hadoop. Follow this for a better understanding:...
15 Sep 2017 by Leviya bl
Hi, here is the step-by-step process [^] WASB is automatically enabled in HDInsight clusters. But you can also mount a blob storage account manually to a [^]Hadoop Administration instance that lives anywhere, as long as it has Internet access to the blob storage. Here are the steps: I assume...
5 Jan 2018 by Member 13609332
here is the solution for the above problem select d.department, case when (d.maxJan>=d.maxFeb) and (d.maxJan>=d.maxMarch) then 'Jan' when (d.maxFeb>=d.maxJan) and (d.maxFeb>=d.maxMarch) then 'Feb' when (d.maxMarch>=d.maxJan) ...
4 Apr 2018 by Member Hemal
You have to give all of your source files to javac Example: javac -classpath /usr/local/hadoop/hadoop-core-1.0.4.jar -sourcepath src/ -d build/ MyMain.java
28 May 2018 by Bata Omou
I am trying to compile and execute this Java MapReduce code locally in Eclipse, but this problem shows up. Please help: where is the issue? This is the error that shows up: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where...
28 May 2018 by Jochen Arndt
Quote: the line error 63 is about the output format: FileOutputFormat.setOutputPath(conf, new Path(args[1])); and the error message is java.lang.ArrayIndexOutOfBoundsException So there is no second command line argument present when executing the application. You have to execute the...
28 May 2018 by Bata Omou
Yeah, thank you, the problem really was that I hadn't given an outputPath, but it showed me another error, again about the native Hadoop library, and another one: 2018-05-28 16:27:24,687 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load native-hadoop...
16 Jun 2018 by Richard MacCutchan
The data contains an item that is not a number, so you need to strip that out of your list before trying to convert.
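As a rough illustration of that advice, here is a minimal Python sketch that drops non-numeric tokens before converting. The helper name `to_ints`, the comma separator, and the sample inputs are assumptions made for illustration; the original question's data is not shown in this listing.

```python
def to_ints(line, sep=","):
    """Split a line and keep only the tokens that parse as integers."""
    numbers = []
    for token in line.split(sep):
        token = token.strip()
        try:
            numbers.append(int(token))
        except ValueError:
            # Skip blanks, headers, or any other non-numeric item.
            continue
    return numbers


clean = to_ints("1,2,3,4,4,4,4")
dirty = to_ints("1,2,x,4,,5\n")
print(clean)
print(dirty)
```

The same filter-then-convert idea should carry over to a Spark pipeline by filtering tokens before mapping them to integers.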
27 Jul 2018 by anjitaa
"MapReduce sorts the intermediate data (between the mapper and reducer phases) by key by default. If we want the data sorted by value, then we need secondary sorting. There are 2 approaches to achieve this. 1. If reducers will get all the value for a particular key and buffer...
28 Jul 2018 by anjitaa
What do you understand by Word Count implementation via Hadoop framework? Explain in detail What I have tried: I am not able to implement the Word Count implementation via the Hadoop framework?
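For orientation, word count is usually described as three steps: a mapper emits a (word, 1) pair per word, the framework groups pairs by word, and a reducer sums each group. The following is a plain-Python simulation of those steps, not Hadoop API code. The function names, the sample sentences, and the choice to lowercase words and ignore punctuation are all assumptions for illustration.

```python
import re
from collections import defaultdict


def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in re.findall(r"[A-Za-z]+", line)]


def shuffle(pairs):
    """Shuffle: group the emitted counts by word, as the framework would."""
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return grouped


def reduce_counts(grouped):
    """Reducer: sum the grouped counts for each word."""
    return {word: sum(ones) for word, ones in grouped.items()}


def word_count(lines):
    pairs = [pair for line in lines for pair in map_words(line)]
    return reduce_counts(shuffle(pairs))


counts = word_count([
    "This is December Month.",
    "December is the last month of the year.",
])
print(counts)
```

In a real job the mapper and reducer run on different nodes and the shuffle is done by the framework; the sketch only shows the data flow.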
28 Jul 2018 by Bansal himani
"Word Count Implementation will be as follows: For ex: Input File 1 contains data: “This is December Month.” Input File 2 contains data: “December is the last month of the year.” Step 1: Mapper will generate the following below output: Input File 1 output ...
16 Aug 2018 by anjitaa
"To enable the trash feature and to set the time delay for the trash removal in Hadoop, we have to edit the fs.trash.interval property in core-site.xml to the delay (and this has to be in minutes). Ex: if you want users to have 10 hours (600 minutes) to restore a deleted file, you should specify...
20 Aug 2018 by patelsandeep
How we can configure Hadoop to reuse JVM for mappers? What I have tried: I am not able to configure Hadoop to reuse JVM for mappers
20 Aug 2018 by anjitaa
"To configure Hadoop to reuse JVM for mappers, we just need to add entry in the configuration file: $HADOOP_HOME/conf/mapred-site.xml mapred.job.reuse.jvm.num.tasks -1 We need to specify a number value how many times the JVM is to be reused...
5 May 2019 by Jackie Lloyd
Could somebody please help me with this query :). We use Impala to query data, with Sentry to restrict access to data at column level. We use Spark to write code to query data stored in files. My understanding is that Sentry roles cannot control access at column level when used with Spark....
12 Dec 2020 by BedantBiswal
Below is my query which takes around 5k mappers and 1k reducers and time taken is around 2.2 hours to finish. Any scope of optimization in here? What I have tried: SELECT sum(B.item_net_amount) net_amount, sum(B.item_gross_amount) gross_amount,...
25 Jul 2021 by mgjsa
Hi, I have written a hive query language as below. It is giving me error as written in title. the query is : select clnt_nbr, case when clnt_nbr in (select clnt_NBR from crd_master where crd_typ = '198 or crd_typ = '199' ) then 1 else 0 end) as...
3 Aug 2021 by Amel Hadfi
I've been able to use Sqoop & Flume import commands perfectly fine on Ubuntu terminal. But right now, I'm trying to do so on Jupyter notebook. 1) How can I import from MySQL to HDFS using Sqoop command on Jupyter notebook? 2) what is Flume...
25 Mar 2022 by Viswanath Sitaraman
I'm trying to convert a piece of SQL code to HiveQL, and it's not working as expected. Please find below the code snippet in SQL that I'm attempting to convert: SQL Code:UPDATE C SET C.prod_l = P.prod_l, C.numprod = P.numprod, C.prod_cng...
24 Jan 2023 by User 2753469
It's 2023 now and if you have linux and a program called 'alternatives' you can use the cmd $> alternatives --config java to find path to java versions on your machine and this program lets you choose which version you want to use if you...
3 Apr 2019 by Mahsa Hassankashi
27 Dec 2015 by Mallanagouda Patil
This article helps to setup debug environment for hadoop framework on Linux Ubuntu using IntelliJ IDEA
21 Jan 2017 by Fazlur Rahman
What is Big Data, and how was Hadoop introduced to overcome the problems associated with Big Data?
12 Sep 2017 by Mahsa Hassankashi
This article is a comprehensive essay about big data, from the basics to practical use.
16 Jun 2018 by LearningSpark
Hi all, I am new to the Big Data world and need your help. Here is my question: I am reading data from a txt file (1,2,3,4,4,4,4) var file=sc.textFile("file:///home/cloudera/MyData/Lab1/numbers.txt") var number=file.flatMap(line=>line.split(",")) var...
10 Aug 2018 by Richard MacCutchan
enable Trash/Recycle Bin in Hadoop - Google Search[^]
16 Mar 2015 by Suffyan Asad
Implementing joins in Hadoop Map-Reduce applications during Map-phase using MapFiles
11 Jul 2021 by Abhijit Dare
p="foo foo quux labs foo barquux".split() d={} s=[] count=1 for x in p: if x not in s: d.update({x:count}) s.append(x) else: d[x]+=1 print(d) What I have tried: Hello In this program, I intend to count the occurrence of each...
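Because the excerpt above is flattened onto one line, the intended indentation is a guess. A minimal sketch of the counting loop with consistent four-space indentation (no tabs) might look like this; the simplified dictionary logic and the `Counter` cross-check are additions for illustration, not the poster's code.

```python
from collections import Counter

words = "foo foo quux labs foo barquux".split()

# Manual version: every block is indented with exactly four spaces.
counts = {}
for word in words:
    if word not in counts:
        counts[word] = 1
    else:
        counts[word] += 1

# The standard-library Counter gives the same result.
assert counts == dict(Counter(words))
print(counts)  # {'foo': 3, 'quux': 1, 'labs': 1, 'barquux': 1}
```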
25 Apr 2016 by Member 11842305
I have installed the Windows SDK on Windows 10 from here: https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk But I am unable to open the Windows SDK command prompt to run my Maven commands to install Hadoop. I have searched online but didn't find anything useful. Please...
3 Oct 2015 by Member 12029885
I can't run the Hadoop exe file; an error says JAVA_HOME is incorrectly set.
25 Sep 2015 by ravi30713
Problem replicating config (bundle) to search peer 'myserver.com:8089',Reading reply to upload: rv=-2, Receive from=https://myserver.com:8089 timed out; exceeded 60sec, as per=distsearch.conf/[replicationSettings]/sendRcvTimeout
26 Sep 2014 by Member 8899038
While running Recipe.java, getting error that Mapper, Job package is not there.
25 Jul 2018 by anjitaa
Loaded library lib-native-libhadoop.so.1.0.0 might have disabled stack guard. How to resolve it? What I have tried: I have tried loaded library lib-native-libhadoop I think 1.0.0 might have disabled stack guard
1 Aug 2018 by patelsandeep
During execution of MapReduce jobs how to overwrite an existing output file/dir ? What I have tried: I am working on a MapReduce project and need to overwrite an existing output. I'm unaware of the procedure?
30 Dec 2014 by Mansoor Alikhan K
Microsoft HDInsight Emulator for Windows Azure installation via WPI 5.0 reports that the installation was not successful: fatal error. Error logs are here: === Verbose logging started: 06/Dec/14 19:16:34 Build type: SHIP UNICODE 5.00.7601.00 Calling process: C:\Program Files\Microsoft\Web Platform...
6 Mar 2015 by Mehdi_S
Hi, I have been trying to install HDInsight on a Windows platform but without success. I'm wondering if there is a clear procedure to install it, which versions of Windows it is compatible with, and if there is a direct link to download it (without using the Web Platform Installer). Thank you...
6 Mar 2015 by BillWoodruff
http://azure.microsoft.com/en-...
9 Apr 2015 by shivendrapandey
Actually, I want to write code that uses a hash table to store the data just before we process it. I have the Mapper output, but before we process this data I want to store it in a hash table (in the Reducer), but I am not able to write the,
24 Jan 2023 by Member 11622664
Hi, I am unable to set JAVA_HOME for Hadoop during the Hadoop installation.
3 Jul 2015 by Member 11402033
I am trying to manage the database of an Android app. Would it be good to use Hadoop with a MySQL database for the Android app?
22 Sep 2015 by anto_bernad
It's urgent, guys... I need to know how to configure Eclipse for Hadoop on Linux. Can anyone suggest a link to download an Eclipse plugin?
23 Sep 2015 by Justin Zh.
Hi, all! Here is some information: Windows 10 with VMware 12; Ubuntu 14.04.3 LTS with VMware tools; JDK1.8.0_60; HADOOP-2.7.1. It works perfectly when I try to process the job on HDFS of the Pseudo-Distributed Hadoop (without Yarn, and the job is done in several seconds). Once I have set...
18 Oct 2015 by Saman With You
Hello, we are going to start research on data mining in our company. We've chosen Cassandra as our data store. I've heard that the R tool is used for data mining too, but I don't know how to relate these to each other. Would Cassandra be enough to do data mining or we have to use R or any...
3 Nov 2015 by sunny_sharma123
Hello, I am trying to set up a multi-node Hadoop cluster using two systems. Whenever I try to format HDFS, a NullPointerException occurs. I am not happy to see this error again and again. If anyone has a solution for this, then please reply...
25 Jul 2018 by kasliwal aayush
"This error could be due to wrong JDK package. Hadoop runs on 64 bit ..so try to uninstall 32bit JDK and install 64 bit JDK8 Please add following variables to .bashrc environment file, export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_ROOT/lib/native export...
27 Jul 2018 by Bansal himani
How to sort intermediate output based on values in MapReduce ? What I have tried: How to sort intermediate output based on values in MapReduce?
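MapReduce sorts intermediate data by key only, so ordering by value needs extra work. One simple approach is to buffer all values for a key inside the reducer and sort them there. The sketch below simulates that in plain Python rather than with the Hadoop API. The function name and the sample pairs are hypothetical, and the approach assumes each key's values fit in memory.

```python
from itertools import groupby
from operator import itemgetter


def reduce_sorted(pairs):
    """Group (key, value) pairs by key, then sort each key's buffered values."""
    # The framework delivers pairs ordered by key only; emulate that here.
    by_key = sorted(pairs, key=itemgetter(0))
    result = {}
    for key, group in groupby(by_key, key=itemgetter(0)):
        buffered = [value for _, value in group]  # all values held in memory
        result[key] = sorted(buffered)
    return result


pairs = [("b", 7), ("a", 3), ("b", 2), ("a", 9), ("a", 1)]
out = reduce_sorted(pairs)
print(out)
```

When the values for one key are too large to buffer, the usual alternative is a composite key so that the framework's own sort does the ordering.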
31 Jul 2018 by anjitaa
"No, it is not feasible given the distributed architecture of HDFS. If 'n' clients process read/write requests simultaneously, the overhead on the Namenode increases. To avoid these bottlenecks, a distributed system with a master-slave computing architecture is proposed."
16 Aug 2018 by Bansal himani
How can I enable Trash/Recycle Bin in Hadoop? What I have tried: I was not able to enable Trash/Recycle Bin in Hadoop
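Trash is typically controlled by the `fs.trash.interval` property in core-site.xml, whose value is a delay in minutes. As a small illustration of the unit conversion, this Python sketch builds the property element for a retention period given in hours. The helper name `trash_interval_property` is hypothetical, and the 10-hour figure is only an example value.

```python
import xml.etree.ElementTree as ET


def trash_interval_property(hours):
    """Build a <property> element for fs.trash.interval (value in minutes)."""
    minutes = int(hours * 60)
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "fs.trash.interval"
    ET.SubElement(prop, "value").text = str(minutes)
    return prop


# 10 hours of retention corresponds to 600 minutes.
snippet = ET.tostring(trash_interval_property(10), encoding="unicode")
print(snippet)
```

The resulting element would be placed inside the `<configuration>` block of core-site.xml; check the documentation for your Hadoop version before relying on it.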
15 Jul 2021 by Dasisqo
Format your code here: Pythoniter - Pretty Python Online Formatter [python code formatter]