Big Data Analytics with R and Hadoop
UNIT 6: Understanding Big Data Analysis with Machine Learning
In this chapter, we will learn about different machine learning techniques that can be used with R and Hadoop to perform Big Data analytics. We will cover the following topics:
Introduction to machine learning
Types of machine-learning algorithms
Supervised machine-learning algorithms
Unsupervised machine-learning algorithms
Recommendation algorithms
Introduction to machine learning
Machine learning is a branch of artificial intelligence that allows us to make applications intelligent without explicitly programming their behavior. Machine learning concepts enable applications to make decisions based on the available datasets. A combination of machine learning and data mining can be used to develop spam mail detectors, self-driving cars, speech recognition, face recognition, and online transactional fraud detection.
Many popular organizations use machine-learning algorithms so that their services or products can understand the needs of their users and respond according to their behavior. Google uses machine learning in its web search engine, for spam classification in Google Mail, and for news labeling in Google News, while Amazon uses it for recommender systems. Many open source tools are available for developing these types of applications, such as R, Python, Apache Mahout, and Weka.
Types of machine-learning algorithms
There are three different types of machine-learning algorithms for intelligent system development:
Supervised machine-learning algorithms
Unsupervised machine-learning algorithms
Recommender systems
In this chapter, we are going to discuss well-known business problems involving classification, regression, and clustering, as well as how to run these machine-learning techniques over Hadoop to overcome memory limitations.
If you load a dataset that does not fit into your machine's memory and try to run a predictive analysis on it, R will throw a memory-related error such as Error: cannot allocate vector of size 990.1 MB. The solution is to either upgrade the machine's configuration or parallelize the computation across commodity hardware.
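As a rough illustration of why this happens, the following sketch estimates how much memory a numeric matrix needs before it is created. It assumes that R stores each double-precision value in 8 bytes; the helper name estimate_mb and the example dimensions are hypothetical and are not taken from the original text.

```r
# Hypothetical helper: approximate size in MB of an n_rows x n_cols numeric matrix.
# Assumes 8 bytes per double and ignores the small fixed object header.
estimate_mb <- function(n_rows, n_cols) {
  n_rows * n_cols * 8 / 1024^2
}

# A small matrix we can actually allocate, to compare the estimate with reality.
small <- matrix(0, nrow = 1000, ncol = 100)
estimated <- estimate_mb(1000, 100)
actual <- as.numeric(object.size(small)) / 1024^2

# Estimate for a much larger (hypothetical) dataset, without allocating it.
large_estimate <- estimate_mb(13e6, 10)

print(c(estimated = estimated, actual = actual, large = large_estimate))
```

Comparing such an estimate with the memory available on the machine gives an early hint of whether a dataset should be processed in a distributed fashion instead.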
Supervised machine-learning algorithms
In this section, we will be learning about supervised machine-learning algorithms. The algorithms are as follows:
Linear regression
Logistic regression
Linear regression
Linear regression is mainly used for predicting and forecasting values based on historical information. Regression is a supervised machine-learning technique that identifies the linear relationship between a target variable and one or more explanatory variables. In other words, it is used to predict the value of the target variable in numeric form. In the following sections, we will learn about linear regression with R and linear regression with R and Hadoop.
Here, the variable to be predicted is called the target variable, and the variables that help predict it are called explanatory variables. With the linear relationship, we can identify the impact of a change in the explanatory variables on the target variable.
In mathematics, regression can be formulated as follows:
$$y = ax + b$$
Other formulae include:
The slope of the regression line is given by:
$$a = \frac{N\sum xy - (\sum x)(\sum y)}{N\sum x^2 - (\sum x)^2}$$
The intercept point of regression is given by:
$$b = \frac{\sum y - a\sum x}{N}$$
Here, x and y are the variables that form the dataset and N is the total number of values. Suppose we have the data shown in the following table:
x | y
63 | 3.1
64 | 3.6
65 | 3.8
66 | 4
Given a new value of x, we can compute the corresponding value of y using the regression formula.
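To make the prediction step concrete, the following sketch applies the standard least-squares expressions for the slope and intercept of a line y = ax + b to the four data points in the table and then predicts y for a new x. The names a, b, and new_x, and the new value 67, are chosen here for illustration only.

```r
# Data from the table above
x <- c(63, 64, 65, 66)
y <- c(3.1, 3.6, 3.8, 4)
N <- length(x)

# Slope and intercept from the least-squares formulas
a <- (N * sum(x * y) - sum(x) * sum(y)) / (N * sum(x^2) - sum(x)^2)
b <- (sum(y) - a * sum(x)) / N

# Predict y for a new value of x (67 is an illustrative choice)
new_x <- 67
predicted_y <- a * new_x + b

# Cross-check against R's built-in lm()
fit <- lm(y ~ x)
print(c(slope = a, intercept = b, predicted = predicted_y))
print(coef(fit))
```

For this table the slope works out to 0.29 and the intercept to -15.08, so the predicted y at x = 67 is 4.35.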
Applications of linear regression include:
Sales forecasting
Predicting optimum product price
Predicting the next online purchase from various sources and campaigns
Let's look at the statistical technique used to implement the regression model for a given dataset. Assume that we have been given n statistical data units.
Its formula is as follows:
$$Y = e_0 + a_1 x_1 + a_2 x_2 + \dots + a_k x_k$$
Here, Y is the target variable (response variable), x_i are the explanatory variables, a_i are their coefficients, k is the number of explanatory variables, and e_0 is the error term, which can be considered as noise. To get a more accurate prediction, we need to reduce this error as much as possible, which the model-fitting function does by minimizing the sum of squared errors.
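A minimal sketch of what "reducing the error" means in practice: the coefficients are chosen so that the sum of squared errors is as small as possible. The example below uses simulated data and solves the normal equations directly; the variable names and the true coefficients (2, 0.5, -1.5) are assumptions made for illustration, and lm() is used only as a cross-check.

```r
set.seed(42)                      # reproducible simulated data
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
# Target built from known coefficients plus random noise
Y  <- 2 + 0.5 * x1 - 1.5 * x2 + rnorm(n, sd = 0.1)

# Design matrix with a leading column of ones for the intercept
X <- cbind(1, x1, x2)

# Least-squares coefficients from the normal equations (X'X) beta = X'Y
beta <- solve(t(X) %*% X, t(X) %*% Y)

# Sum of squared errors for a given coefficient vector
sse <- function(coefs) sum((Y - X %*% coefs)^2)

# Cross-check against R's built-in lm()
fit <- lm(Y ~ x1 + x2)
print(drop(beta))
print(coef(fit))
```

Any other coefficient vector, for example beta with its first element shifted by 0.1, yields a larger value of sse() than the least-squares solution.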
Linear regression with R
Now we will see how to perform linear regression in R. We can use the built-in lm() function to build a linear regression model with R.
Model <- lm(target ~ ex_var1, data = train_dataset)
This builds a regression model from the provided dataset and stores the variable coefficients and model parameters in the Model object, which can then be used for prediction and for identifying patterns in the data.
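As a sketch of how the stored coefficients can then be used, the example below fits a model with the same call pattern and applies it to new observations. The data frame train_dataset, its columns target and ex_var1, and the simulated values are hypothetical stand-ins that match the placeholder names above.

```r
set.seed(1)
# Hypothetical training data: one explanatory variable and a numeric target
train_dataset <- data.frame(ex_var1 = runif(50, min = 0, max = 10))
train_dataset$target <- 3 + 2 * train_dataset$ex_var1 + rnorm(50, sd = 0.5)

Model <- lm(target ~ ex_var1, data = train_dataset)

# Estimated intercept and slope stored in the model object
print(coef(Model))

# Predict the target for new values of the explanatory variable
new_data <- data.frame(ex_var1 = c(2.5, 7))
predictions <- predict(Model, newdata = new_data)
print(predictions)
```

The new data frame passed to predict() must use the same column names as the explanatory variables in the model formula.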
# Defining data variables: 200 observations of 10 random predictors
X <- matrix(rnorm(2000), ncol = 10)
y <- as.matrix(rnorm(200))
# Bundling data variables into a data frame (columns X1 to X10 and y)
train_data <- data.frame(X, y)
# Training the model for generating predictions
lmodel <- lm(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10, data = train_data)
summary(lmodel)
The following are the various model parameters that can be displayed with the preceding summary command:
RSS: This is the residual sum of squares, equal to $\sum (y_{actual} - y_{predicted})^2$.
Degrees of Freedom (DOF): This is used for identifying the degree of fit for the prediction model, which should be as small as possible (log
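The first two quantities can also be read directly from a fitted model. The sketch below, which uses simulated data with hypothetical names, computes the residual sum of squares by hand, compares it with deviance(), and reads the residual degrees of freedom with df.residual().

```r
set.seed(7)
# Simulated data: 200 observations, 3 explanatory variables (illustrative only)
d <- data.frame(X1 = rnorm(200), X2 = rnorm(200), X3 = rnorm(200))
d$y <- 1 + 0.8 * d$X1 - 0.3 * d$X2 + rnorm(200)

m <- lm(y ~ X1 + X2 + X3, data = d)

# Residual sum of squares: squared differences between actual and fitted values
rss <- sum((d$y - fitted(m))^2)

# For an lm fit, deviance() returns the same quantity
rss_check <- deviance(m)

# Residual degrees of freedom: observations minus estimated coefficients
dof <- df.residual(m)
print(c(RSS = rss, DOF = dof))
```

Here the model estimates four coefficients (the intercept plus three slopes) from 200 observations, so the residual degrees of freedom are 196.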
Design (thesis) title: Research on Data Mining and Its Applications in the Big Data Environment