Big Data

Great Reads

by Ilia Reznik, Vladimir Shatalov
How to classify articles on Wikipedia using XML dump
by Sacha Barber
Looking at Spark/Cassandra working together
by Ryan Scott White
Converts past and real-time stock market tick data into time-sliced summaries called Briefs
by Joezer BH
Explains the benefits of using the command line in the large folder delete case and shows an example of the syntax

Latest Articles

by Fazlur Rahman
Step by step procedure to install Hadoop 2.7.3 version on Ubuntu 16.04 operating system
by Ryan Scott White
Converts past and real-time stock market tick data into time-sliced summaries called Briefs
by Ilia Reznik, Vladimir Shatalov
Parser for Wikipedia pages from XML dump is presented. Extraction of biographical data and categories with their parents is shown as an example.
by Ilia Reznik, Vladimir Shatalov
How to classify articles on Wikipedia using XML dump

All Articles

18 Jan 2017 by Alibaba Cloud
This post features a basic introduction to Machine Learning. It will not only help you understand the latest trends in the Internet industry, but also increase your understanding of the technology that plays a major role in many services that make our lives easier.
18 Jan 2017 by Alibaba Cloud
Cloud and the Era of AR/VR Technology: What's Next
18 Jan 2017 by Alibaba Cloud
Connected devices, popularly known as the Internet of Things (IoT) or ubiquitous computing, represent tremendous potential for enhancing social and business life, and a brand new frontier for market growth.
18 Jan 2017 by Alibaba Cloud
Here are five top tips from our expert team to help you maximize the benefits of your cloud infrastructure.
14 Feb 2017 by Alibaba Cloud
This post features a basic introduction to machine learning (ML). You don’t need any prior knowledge about ML to get the best out of this article. Before getting started, let’s address this question: "Is ML so important that I really need to read this post?"
25 Jul 2018 by anjitaa
The loaded library lib-native-libhadoop.so.1.0.0 might have disabled stack guard. How can I resolve it? What I have tried: I loaded the library lib-native-libhadoop; I think version 1.0.0 might have disabled stack guard.
27 Jul 2018 by anjitaa
"MapReduce sorts the intermediate data (between the mapper and reducer phases) by key by default. If we want the data sorted by value, we need secondary sorting. There are two approaches to achieve this. 1. If reducers will get all the values for a particular key and buffer...
28 Jul 2018 by anjitaa
What do you understand by the Word Count implementation via the Hadoop framework? Explain in detail. What I have tried: I am not able to implement Word Count via the Hadoop framework.
31 Jul 2018 by anjitaa
"No, it is not feasible given the distributed architecture of HDFS. If ‘n’ clients process read/write requests simultaneously, the overhead on the NameNode increases. To avoid these bottlenecks, a distributed computing architecture in master-slave fashion is proposed."
16 Aug 2018 by anjitaa
"To enable the trash feature and to set the time delay for the trash removal in Hadoop, we have to edit the fs.trash.interval property in core-site.xml to the delay (and this has to be in minutes). Ex: if you want users to have 10 hours (600 minutes) to restore a deleted file, you should specify...
20 Aug 2018 by anjitaa
"To configure Hadoop to reuse the JVM for mappers, we just need to add an entry to the configuration file $HADOOP_HOME/conf/mapred-site.xml: mapred.job.reuse.jvm.num.tasks = -1. We need to specify a number for how many times the JVM is to be reused...
27 Jul 2018 by Bansal himani
How to sort intermediate output based on values in MapReduce? What I have tried: How to sort intermediate output based on values in MapReduce?
28 Jul 2018 by Bansal himani
"The Word Count implementation will be as follows. For example: Input File 1 contains data: “This is December Month.” Input File 2 contains data: “December is the last month of the year.” Step 1: The mapper will generate the following output: Input File 1 output ...
16 Aug 2018 by Bansal himani
How can I enable Trash/Recycle Bin in Hadoop? What I have tried: I was not able to enable Trash/Recycle Bin in Hadoop.
12 Dec 2020 by BedantBiswal
Below is my query which takes around 5k mappers and 1k reducers and time taken is around 2.2 hours to finish. Any scope of optimization in here? What I have tried: SELECT sum(B.item_net_amount) net_amount, sum(B.item_gross_amount) gross_amount,...
6 Mar 2015 by BillWoodruff
http://azure.microsoft.com/en-...
1 Apr 2016 by Chendur Srinivasan
I have gone through a lot of articles, but I don't seem to get a perfectly clear answer on what exactly BIG DATA is. On one page I saw "any data which is bigger for your usage, is big data i.e. 100 MB is considered big data for your mailbox but not your hard disc". Whereas another article said...
4 Mar 2016 by Chendur Srinivasan
I'm self-learning Hadoop and started off by installing Cloudera QuickStart on VMware Workstation running CentOS. I was under the impression that the QuickStart VM has most of the configurations predefined. Do I need to set up any other configurations to set up the data and name nodes? Reason being...
27 Apr 2016 by Daniel Joubert
Comparing Requirements Engineering for Traditional and BigData Business Intelligence
20 Oct 2018 by DataBytzAI
How to transform raw data into actionable business insights with Azure Data Factory
13 Jul 2022 by E L 2022
I need help plotting some categorical and numerical values in Python. The code is given below: %%time import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns %%time ...
21 Jan 2017 by Fazlur Rahman
What is Big Data, and how was Hadoop introduced to overcome the problems associated with Big Data?
13 Feb 2017 by Fazlur Rahman
Step by step procedure to install NetBeans on Ubuntu 16.04 operating system with Hadoop 2.7.3 version. This may work for any other versions of Hadoop and Ubuntu.
22 May 2022 by Fazlur Rahman
Step by step procedure to install Hadoop 2.7.3 version on Ubuntu 16.04 operating system
7 Jul 2016 by GoodyGoodyGoody
I need advice on implementing a Data Lake. Any good references or examples of how to implement the data lake concept (tutorial), or pointing me in the right direction, will suffice. Thanks in advance. What I have tried: I'm looking to set this up for my organization and I have no idea...
13 Mar 2015 by GoogleMonster
Hi All, I am trying to create OHLC data from un-homogenised data. I have googled and discovered an article at Stack Overflow, "How to group a time series by interval (OHLC bars) with LINQ", which to be honest I have found to be really useful. However, the results I get are not in line with...
3 Jan 2016 by Hadrich Mohamed
Working with Hadoop in the big data domain is very interesting, especially given the growth of data in this era. In this tip, we are going to run our first big data example using Microsoft's well-known tool, HDInsight.
23 Feb 2018 by HusseinAl-haj
I have a question about the correct way to use the SMOTE sampling algorithm. I have read a lot about this algorithm. I am forced to use SMOTE within my code, so I can't use any tools like KNIME or WEKA. After a few days of searching, I can say that there are two implementations of SMOTE, one in R...
10 Apr 2021 by Ilia Reznik, Vladimir Shatalov
Parser for Wikipedia pages from XML dump is presented. Extraction of biographical data and categories with their parents is shown as an example.
9 Apr 2021 by Ilia Reznik, Vladimir Shatalov
How to classify articles on Wikipedia using XML dump