A Hybrid Approach to Sentiment Analysis of News Comments (Foreign-Language Translation Material)

 2022-12-24 17:02:21

A Hybrid Approach to Sentiment Analysis of News Comments

Addlight Mukwazvure¹, K.P Supreethi²

¹,²Department of Computer Science and Engineering, JNTU College of Engineering Hyderabad, Kukatpally, Hyderabad - 500 085, Telangana, India



Abstract— Today, the web hosts a voluminous amount of information. Among such information is user generated content, which plays an important role in analyzing different business aspects. Sentiment analysis therefore becomes an effective way of understanding public opinion. Businesses, particularly in e-commerce, the stock market and social networks, as well as political entities, can use sentiment analysis for decision making. Traditional methods of opinion gathering involved questionnaires and interviews, which depend solely on the goodwill of the people being interviewed. Most research on sentiment analysis has focused on social networks, product reviews and the stock market; less research has covered the analysis of news comments. This research takes a hybrid approach to sentiment analysis of news comments, which uses a sentiment lexicon for polarity detection (polarity is classified as positive, negative or neutral). The results of the lexicon-based method are then used to train machine learning algorithms. The two algorithms employed in this research are the Support Vector Machine (SVM) and K-Nearest Neighbour (kNN). Experimental results show that SVM performs better than kNN on news comments.

Keywords: User generated content, sentiment analysis, sentiment lexicon, polarity, SVM, kNN


The rapid increase in Web 2.0 applications has made a vast amount of information available on the web today. Users can now share their perception of an entity or service on the web, and such user generated content can be of value to various organizations. Finding ways to mine this content therefore becomes vital in this web era. One such way of mining user opinion is known as Sentiment Analysis, also known as Opinion Mining. The two terms have been used interchangeably, but [1] highlights a slight difference between them. Opinion mining can be defined as a means of understanding people's emotions, attitudes and perceptions about a service or entity, whereas sentiment analysis finds opinions, identifies the sentiment expressed in the text and then classifies its polarity. For this reason sentiment analysis has been defined as a classification problem [2].

Sentiment analysis finds its applications in many areas, including business and politics. By understanding public views and feelings about an entity, businesses can tailor their services to meet public demand. Consumers, on the other hand, find it easier to make purchasing decisions. Politicians can also determine the level of support they have and consequently measure the effectiveness of their policies.

Up to now, sentiment analysis has been limited to a single domain, with research on cross-domain sentiment analysis still ongoing. Much previous work on sentiment analysis has concentrated on highly subjective texts such as product reviews, movie reviews and Twitter data; however, sentiment analysis has also found its way into newsrooms. In product reviews and tweets the author of the text is the opinion giver, so classification is somewhat different when dealing with news. News articles are generally objective, and the audience's reaction to and feelings about a particular article are deduced not from the article itself but from the comments that commentators give on the issue it addresses. These comments can tell news agents how the public perceives their coverage, including the quality of their work, the coverage users expect and editorial issues. Automatically classifying comments as positive or negative, instead of manually reading every comment on the web, therefore provides valuable information to the entity in question.

The rest of this paper is organized as follows: Section II presents related work. Section III describes the proposed system framework, while Section IV describes the general system overview of the hybrid approach. Section V presents experimental evaluation and analysis. Section VI concludes the paper.


There are two main approaches to sentiment analysis: the lexicon-based approach and the machine learning approach [3]. The lexicon-based approach, unlike machine learning, does not require the storage of a large corpus of data. It uses lexicons or dictionaries to calculate the semantic orientation of a document. Semantic Orientation (SO) is a measure of subjectivity and opinion in text, and it captures the polarity and strength of words or phrases [2], [3]. Each word's SO determines the overall sentiment orientation of the document [4]. An opinion lexicon can be created either manually or automatically. Machine learning methods consist of supervised and unsupervised learning. Unsupervised learning methods do not require labelled data for classification, while supervised learning algorithms require a labelled corpus for training the classifier [5]. A number of algorithms can be used in supervised learning. The challenge with this method is that well-defined data is not always available.

There is a considerable body of research on sentiment analysis of news, most of it centred on news articles. A supervised machine learning approach was implemented in Sentiment Classification for Online



Title: A Hybrid Approach to Sentiment Analysis of News Comments

Author: Addlight Mukwazvure, K

Publication date: 2015

15 April 2019


Keywords: user generated content, sentiment analysis, sentiment lexicon, polarity, SVM, kNN





2. Related Work



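A sentic computing approach was proposed for sentiment analysis of news articles from the MPQA corpus, using common-sense knowledge bases such as ConceptNet and SenticNet. The aim of that study was sentiment analysis at the sentence level. The opinion engine consists of a semantic parser, a sentiment analyzer and a copy of the SenticNet database. The semantic parser extracts common-sense concepts from each sentence. These concepts are then matched against the sentic vectors in SenticNet. A sentic vector describes only the emotion in a sentence, not its polarity. A polarity measure is then used to convert a sentic vector into a polarity score between -1.0 and +1.0. The sentic vector of each concept is based on the Hourglass of Emotions, which organizes emotions along four dimensions: Pleasantness, Aptitude, Attention and Sensitivity.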




3.1 Opinion Lexicon and Sentiment Computation

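After preprocessing, we use a sentiment lexicon to assign polarity to the text. Rather than annotating the news comments by hand, we use the sentiment lexicon to assign sentiment labels to them. We used the AFINN-111 word list. AFINN is a sentiment lexicon developed by Finn Årup Nielsen. The word list contains 2477 words and phrases, with sentiment scores ranging from -5 to +5. Based on the lexicon scores of the individual words, the sentiment of a comment is determined as follows: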


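Sentiment is the overall sentiment of a comment, len(comments) is the total number of words in the comment, and S is the sentiment score of each word.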
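The definitions above suggest that a comment's sentiment is the sum of its word scores divided by its word count. A minimal Python sketch under that assumption follows; `TOY_LEXICON`, `tokenize` and `comment_sentiment` are hypothetical names, and the scores in the toy lexicon are illustrative values, not entries taken from AFINN-111.

```python
import re

# Hypothetical mini-lexicon for illustration only; these scores are
# NOT taken from the AFINN-111 word list.
TOY_LEXICON = {"terrible": -3, "bad": -3, "good": 3, "great": 3}


def tokenize(comment):
    """Lowercase the comment and split it into word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def comment_sentiment(comment, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the words and divide by the word count.

    Words missing from the lexicon contribute a score of 0.
    Normalizing by the word count is an assumption based on the
    definition of len(comments) in the text.
    """
    words = tokenize(comment)
    if not words:
        return 0.0
    total = sum(lexicon.get(word, 0) for word in words)
    return total / len(words)


score = comment_sentiment("So? Anyway, the movie was terrible.")
```

With this toy lexicon only "terrible" carries a score, so the six-word comment gets a mildly negative value. Swapping in the real word list only requires loading it into a dictionary of the same shape.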


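For example, take the comment: "So? Anyway, the movie was terrible." The sentiment of this comment is the sum of the sentiment scores of each word in the sentence, and len(sentence) = 6. To obtain the sentiment of the document, in this case we sum the sentiment scores of the individual comments and then weigh them by the total number of comments for that particular document, as follows: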


Sentiment is the overall sentiment of an article, S is the sentiment of each comment, and len(comments) is the total number of comments.


1: If sentiment > 1, then the label is +1
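The document-level aggregation and the labelling rule can be sketched as below. This is an illustration rather than the authors' code: `document_sentiment` and `polarity_label` are hypothetical names, and only the positive rule is stated in the text, so the symmetric negative rule and the neutral fallback are assumptions.

```python
def document_sentiment(comment_scores):
    """Sum the per-comment sentiment scores and divide by the comment count."""
    if not comment_scores:
        return 0.0
    return sum(comment_scores) / len(comment_scores)


def polarity_label(sentiment, threshold=1.0):
    """Map a sentiment value to a polarity label (+1, -1 or 0).

    Only the rule "sentiment > 1 gives +1" appears in the text.
    The mirrored negative rule and the neutral fallback (0) are
    assumptions made to complete the example.
    """
    if sentiment > threshold:
        return 1
    if sentiment < -threshold:
        return -1
    return 0


doc_score = document_sentiment([2.0, 1.5, 1.0])
doc_label = polarity_label(doc_score)
```

Keeping the threshold as a parameter makes it easy to check how sensitive the resulting class balance is to the cut-off.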




3.2 Feature Extraction and Selection


Feature weighting methods include presence, term frequency (TF) and TF-IDF.






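TF-IDF is the method we used in this work. It measures how important a word is to a document in a collection or corpus. The importance increases in proportion to the number of times the word appears in the document, but is offset by the frequency of the word in the corpus. The TF-IDF value of a feature is computed as shown in the following equation: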
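One common textbook form of this weighting is tf(t, d) · log(N / df(t)), where tf(t, d) is the count of term t in document d, N is the number of documents and df(t) is the number of documents containing t. The sketch below assumes that form; the exact variant used in this work (smoothing, normalization) may differ, and `tf_idf` is a hypothetical helper name.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed form: weight = tf(t, d) * log(N / df(t)), with tf the raw
    term count. This is a textbook variant, not necessarily the exact
    formula used in the paper.
    """
    n_docs = len(documents)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "news"]]
weights = tf_idf(docs)
```

A term that occurs in every document gets weight 0 under this form, which is the "offset" behaviour described above: frequent corpus-wide words carry little discriminating power.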




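In statistics, χ² is used to test the independence of two events. Independence can be expressed mathematically as P(A|B) = P(A) and P(B|A) = P(B), where A and B denote the occurrence of a term and the occurrence of a class, respectively. While TF-IDF gives the weight of a feature, with χ² we sought to select the best features for the classification model. Using the sklearn feature selection module, we computed the χ² statistic for each class/feature combination.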
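To make the independence test concrete, the following sketch computes the textbook χ² statistic from a 2×2 term/class contingency table. It is a hand-written illustration with a hypothetical function name (`chi_square`); the sklearn routine mentioned above may compute its statistic differently, for example from feature values rather than from explicit document counts.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that do not contain the term
    n00: documents outside the class that do not contain the term
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    # An empty row or column makes the statistic undefined; return 0.0 here.
    return numerator / denominator if denominator else 0.0


independent = chi_square(10, 10, 10, 10)  # term spread evenly across classes
dependent = chi_square(20, 0, 0, 20)      # term occurs only inside the class
```

A value near zero means the term tells us nothing about the class, while a large value marks the term as a strong candidate feature.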

3.3 Classification




5. Conclusion

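Classifying news comments is somewhat challenging because of the informal language they contain. Experiments show that SVM outperforms kNN. The failure of kNN to recognize the third class as k increases can be attributed to the small number of neutral articles. A smaller dataset therefore leads to poorer classifier performance. Most work usually focuses on two classes, positive and negative, so introducing a third class also has a negative effect on the classification results. Of the three news sections discussed, the technology section gave the best classification results for both classifiers. We also observed that classifying with three classes has a large effect on the results. Across the three sections, articles and comments in the neutral class are generally few, so classification of the neutral class is very poor.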




