Data Mining in the Big Data Environment and Its Applications: Foreign-Language Translation Material

 2022-08-14 16:05:16

Big Data Analytics with R and Hadoop

Unit 6: Understanding Big Data Analysis with Machine Learning

In this chapter, we will learn about machine learning techniques that can be used with R and Hadoop to perform Big Data analytics, covering the following topics:

Introduction to machine learning

Types of machine-learning algorithms

Supervised machine-learning algorithms

Unsupervised machine-learning algorithms

Recommendation algorithms

Introduction to machine learning

Machine learning is a branch of artificial intelligence that allows us to make applications intelligent without explicitly programming every rule. Machine learning enables applications to make decisions based on the available datasets. Combined with data mining, it can be used to develop spam mail detectors, self-driving cars, speech recognition, face recognition, and online fraud detection.

Many well-known organizations use machine-learning algorithms so that their services or products understand their users' needs and respond to their behavior. Examples include Google's intelligent web search engine, spam classification in Google Mail, and news labeling in Google News, as well as Amazon's recommender system. Many open source tools are available for developing these types of applications, such as R, Python, Apache Mahout, and Weka.

Types of machine-learning algorithms

There are three different types of machine-learning algorithms for intelligent system development:

Supervised machine-learning algorithms

Unsupervised machine-learning algorithms

Recommender systems

In this chapter, we will discuss well-known business problems involving classification, regression, and clustering, and how to run these machine-learning techniques on Hadoop to overcome memory limitations.

If you load a dataset that won't fit into your machine's memory and try to analyze it, the predictive analysis will fail with a memory-related error, such as `Error: cannot allocate vector of size 990.1 MB`. The solution is either to upgrade the machine's configuration or to parallelize the computation across commodity hardware.
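To see why such an error appears, it helps to estimate how much memory an object needs. The following minimal sketch assumes a 64-bit R build, where each value in a numeric vector takes 8 bytes, and assumes 1 MB = 1024^2 bytes as in R's allocation messages. It converts the 990.1 MB from the error message into a number of values (roughly 130 million) and checks the per-value cost with `object.size()`:

```r
# Assumption: a double-precision value occupies 8 bytes in a numeric vector
bytes_per_double <- 8

# Convert the 990.1 MB from the error message to bytes
# (assuming 1 MB = 1024^2 bytes, as in R's allocation error messages)
requested_bytes <- 990.1 * 1024^2
n_doubles <- requested_bytes / bytes_per_double
cat(sprintf("990.1 MB corresponds to about %.0f million doubles\n", n_doubles / 1e6))

# Check the per-value cost on a smaller vector: one million doubles need
# about 8 million bytes, plus a small fixed header counted by object.size()
small <- numeric(1e6)
print(object.size(small), units = "Mb")
```

A single column of that length is enough to exhaust a small machine, which is why larger datasets are split across several machines instead.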

Supervised machine-learning algorithms

In this section, we will learn about the following supervised machine-learning algorithms:

Linear regression

Logistic regression

Linear regression

Linear regression is mainly used for predicting and forecasting values based on historical information. It is a supervised machine-learning technique that identifies the linear relationship between a target variable and one or more explanatory variables, and it is used to predict numeric values of the target variable. In the following sections, we will learn about linear regression with R, and then with R and Hadoop.

Here, the variable to be predicted is called the target variable, and the variables that help predict it are called explanatory variables. The linear relationship lets us quantify how a change in an explanatory variable affects the target variable.

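In mathematics, simple linear regression can be formulated as follows, where a is the slope of the regression line and b is its intercept:

$$y = ax + b$$

The slope of the regression line is given by:

$$a = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}$$

The intercept of the regression line is given by:

$$b = \frac{\sum y - a\sum x}{N}$$

Here, x and y are the variables that form the dataset and N is the total number of values. Suppose we have the data shown in the following table: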

| x  | y   |
|----|-----|
| 63 | 3.1 |
| 64 | 3.6 |
| 65 | 3.8 |
| 66 | 4   |

Given a new value of x, we can predict the corresponding value of y with the regression formula.
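As a worked example, the following minimal R sketch applies the standard least-squares formulas for the slope a and intercept b of the line y = ax + b to the four points in the table. It then predicts y for a new value, x = 67 (an arbitrary choice for illustration), and cross-checks the coefficients against R's `lm()`:

```r
# Data from the table above
x <- c(63, 64, 65, 66)
y <- c(3.1, 3.6, 3.8, 4.0)
N <- length(x)

# Slope a and intercept b from the closed-form least-squares formulas
a <- (N * sum(x * y) - sum(x) * sum(y)) / (N * sum(x^2) - sum(x)^2)
b <- (sum(y) - a * sum(x)) / N

# Predict y for a new value of x
x_new <- 67
y_new <- a * x_new + b
cat(sprintf("slope = %.2f, intercept = %.2f, prediction at x = 67: %.2f\n", a, b, y_new))

# Cross-check against R's built-in least-squares fit
fit <- lm(y ~ x)
print(coef(fit))   # (Intercept) should equal b, and x should equal a
```

For this data, the formulas give a slope of 0.29 and an intercept of -15.08, so x = 67 yields a predicted y of about 4.35.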

Applications of linear regression include:

Sales forecasting

Predicting optimum product price

Predicting the next online purchase from various sources and campaigns

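Let's look at the statistical technique for fitting a regression model to a given dataset. Assume that we have been given n statistical data units (observations), each with a value of the target variable and of p explanatory variables.

Its formula is as follows:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + e_0$$

Here, $Y$ is the target variable (response variable), the $x_i$ are the explanatory variables, the $\beta_i$ are the coefficients to be estimated ($\beta_0$ is the intercept), and $e_0$ is the error term, which can be considered as noise. To get more accurate predictions, the coefficients are chosen so that the sum of the squared errors over all n observations is as small as possible (the least-squares criterion).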
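Least squares has a closed-form solution, the normal equations $(X^\top X)\hat\beta = X^\top Y$, where X holds a column of ones (for the intercept) followed by the explanatory variables. Both $X^\top X$ and $X^\top Y$ are sums over observations, so they can be computed on separate chunks of the data and then added together. This is one common way to distribute a regression, and it is the kind of computation a Hadoop job can split across machines. The minimal sketch below only simulates this in plain R (no Hadoop is involved). The simulated data, the chunk size of 250 rows, and the helper name `chunk_stats` are illustrative assumptions, and the result is checked against `lm()`:

```r
set.seed(42)

# Simulated data: n observations of p explanatory variables (illustrative only)
n <- 1000
p <- 3
X <- matrix(rnorm(n * p), ncol = p)
Y <- drop(2 + X %*% c(1.5, -0.5, 3)) + rnorm(n, sd = 0.1)

# Design matrix: a leading column of ones for the intercept beta_0
Xd <- cbind(1, X)

# "Map" step (hypothetical helper): partial X'X and X'Y for one chunk of rows
chunk_stats <- function(rows) {
  Xc <- Xd[rows, , drop = FALSE]
  list(xtx = crossprod(Xc), xty = crossprod(Xc, Y[rows]))
}

# Split the row indices into 4 contiguous chunks of 250 rows each
chunks <- split(seq_len(n), ceiling(seq_len(n) / 250))
partials <- lapply(chunks, chunk_stats)

# "Reduce" step: add up the partial statistics, then solve the normal equations
xtx <- Reduce(`+`, lapply(partials, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(partials, `[[`, "xty"))
beta_hat <- drop(solve(xtx, xty))
print(round(beta_hat, 3))

# Cross-check against lm() fitted on the full data in one go
fit <- lm(Y ~ X)
print(round(coef(fit), 3))
```

Only the small (p + 1) x (p + 1) matrix and the (p + 1)-element vector have to be combined, so no single machine ever needs to hold all the rows at once.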

Linear regression with R

Now we will see how to perform linear regression in R. We can use R's built-in lm() function to build a linear regression model:

```r
Model <- lm(target ~ ex_var1, data = train_dataset)
```

This builds a regression model from the provided dataset and stores the estimated coefficients and model parameters, which can then be used for prediction and for identifying patterns in the data.
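Because the fitted model object keeps the estimated coefficients, it can be reused for prediction with `predict()`. The following minimal sketch uses simulated data that assumes a true relationship of target = 3 + 2 * ex_var1 plus noise. The names `target`, `ex_var1`, and `train_dataset` follow the template above, and `new_data` is a hypothetical name introduced for illustration:

```r
set.seed(1)

# Simulated training data with a known linear relationship (illustrative only)
train_dataset <- data.frame(ex_var1 = runif(100, 0, 10))
train_dataset$target <- 3 + 2 * train_dataset$ex_var1 + rnorm(100, sd = 0.5)

# Fit the model; the returned object stores the coefficients and fit details
model <- lm(target ~ ex_var1, data = train_dataset)
print(coef(model))   # estimates should be close to 3 (intercept) and 2 (slope)

# Predict the target for new values of the explanatory variable
new_data <- data.frame(ex_var1 = c(1, 5, 12))
predictions <- predict(model, newdata = new_data)
print(predictions)
```

The column names in `new_data` must match the explanatory variables used in the formula, otherwise `predict()` cannot find them.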

```r
# Defining data variables: 200 observations of 10 explanatory variables
X <- matrix(rnorm(2000), ncol = 10)
y <- rnorm(200)

# Bundling data variables into a data frame (columns X1, ..., X10 and y)
train_data <- data.frame(X, y)

# Training model for generating prediction:
# "y ~ ." regresses y on every other column, i.e. y ~ X1 + X2 + ... + X10
lmodel <- lm(y ~ ., data = train_data)

summary(lmodel)
```

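The following are some of the model statistics behind the output of the preceding summary() command:

RSS: The residual sum of squares, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the model's prediction for observation $i$. summary() reports it indirectly through the residual standard error, $\sqrt{\mathrm{RSS}/\mathrm{DOF}}$.

Degrees of Freedom (DOF): The residual degrees of freedom, $n - p - 1$ for $n$ observations and $p$ explanatory variables; for the preceding model, this is $200 - 10 - 1 = 189$. summary() prints it alongside the residual standard error.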
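These statistics can also be extracted from the fitted object. The following sketch refits a model with the same setup as the preceding example (a seed is added as an assumption, so the run is reproducible). It computes RSS and the residual degrees of freedom directly and relates them to the residual standard error that summary() prints:

```r
set.seed(123)

# Same setup as the preceding example: 200 rows, 10 explanatory variables
X <- matrix(rnorm(2000), ncol = 10)
y <- rnorm(200)
train_data <- data.frame(X, y)
lmodel <- lm(y ~ ., data = train_data)

# RSS: sum of squared residuals (deviance() returns the same value for lm fits)
rss <- sum(residuals(lmodel)^2)

# Residual degrees of freedom: n - p - 1 = 200 - 10 - 1 = 189
dof <- df.residual(lmodel)

# Residual standard error reported by summary(): sqrt(RSS / DOF)
rse <- sqrt(rss / dof)
cat(sprintf("RSS = %.3f, DOF = %d, residual standard error = %.3f\n", rss, dof, rse))
```

The same residual standard error is available as `summary(lmodel)$sigma`.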
