When a Knowledge Base Is Not Enough: Question Answering over Knowledge Bases with External Text Data (Foreign Literature Translation)



Graduation Design (Thesis)

Foreign Literature Translation

Note:

The English original follows the cover page; the translation follows the English original.

Original text:

Date: 2016

When a Knowledge Base Is Not Enough: Question Answering over Knowledge Bases with External Text Data

ABSTRACT

One of the major challenges for automated question answering over Knowledge Bases (KBQA) is translating a natural language question to the Knowledge Base (KB) entities and predicates. Previous systems have used a limited amount of training data to learn a lexicon that is later used for question answering. This approach does not make use of other potentially relevant text data, outside the KB, which could supplement the available information. We introduce a new system, Text2KB, that enriches question answering over a knowledge base by using external text data. Specifically, we revisit different phases in the KBQA process and demonstrate that text resources improve question interpretation, candidate generation and ranking. Building on a state-of-the-art traditional KBQA system, Text2KB utilizes web search results, community question answering and general text document collection data, to detect question topic entities, map question phrases to KB predicates, and to enrich the features of the candidates derived from the KB. Text2KB significantly improves performance over the baseline KBQA method, as measured on a popular WebQuestions dataset.

1. INTRODUCTION

It has long been recognized that searchers prefer concise and specific answers, rather than lists of document results. In particular, factoid questions have been an active focus of research for decades due to both practical importance and relatively objective evaluation criteria. As an important example, a large proportion of Web search queries are looking for entities or their attributes [19], a setting on which we focus in this work.

Two relatively separate approaches for Question Answering (QA) have emerged: text-centric, or Text-QA, and knowledge base-centric, or KBQA. In the more traditional Text-QA approach, systems use text document collections to retrieve passages relevant to a question and extract candidate answers [14]. Unfortunately, a passage of text provides a limited amount of information about the mentioned entities, which has to be inferred from the context. The KBQA approach, which evolved from the database community, relies on large-scale knowledge bases, such as DBpedia [1], Freebase [9], WikiData [24] and others, which store a vast amount of general knowledge about different kinds of entities. This information, encoded as [subject, predicate, object] RDF triples, can be effectively queried using structured query languages, such as SPARQL.

Both approaches eventually deal with natural language questions, in which information needs are expressed by the users. While question understanding is difficult in itself, this setting is particularly challenging for KBQA systems, as it requires a translation of a text question into a structured query language, which is complicated because of the complexity of a KB schema and the many differences between natural language and knowledge representations. For example, Figure 1 shows a SPARQL query that retrieves the answer to a relatively simple question “who was the president of the Dominican Republic in 2010?” from Freebase.

Figure 1: SPARQL query to retrieve the answer to the question “who was the president of the dominican republic in 2010?” from Freebase

SELECT DISTINCT ?name {
  :m.027rn :government.governmental_jurisdiction.governing_officials ?gov_position .
  ?gov_position :government.government_position_held.basic_title :m.060c4 .
  ?gov_position :government.government_position_held.office_holder ?president .
  ?gov_position :government.government_position_held.from ?from_date .
  ?gov_position :government.government_position_held.to ?to_date .
  FILTER (xsd:date(?from_date) <= "2010"^^xsd:date && xsd:date(?to_date) >= "2010"^^xsd:date)
  ?president :type.object.name ?name
}

KBQA systems must address three challenges, namely question entity identification (to anchor the query process); candidate answer generation; and candidate ranking. We will show that these challenges can be alleviated by the appropriate use of external textual data. Entity identification seeds the answer search process, and therefore the performance of the whole system greatly depends on this stage [28]. Question text is often quite short and may contain typos and other problems that complicate entity linking. Existing approaches are usually based on dictionaries that contain entity names, aliases and some other phrases used to refer to the entities [21]. These dictionaries are noisy and incomplete, e.g., to answer the question “what year did tut became king?” a system needs to detect a mention “tut”, which refers to the entity Tutankhamun. If a dictionary doesn't contain a mapping “tut” → Tutankhamun, as happens for one of the state-of-the-art systems, it will not be able to answer the question correctly. Such less popular name variations are often used along with full names inside text documents, for example, to avoid repetitions. Therefore, we propose to look into web search results to find variations of question entity names.
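The following sketch (in Python; the alias dictionary, the search wrapper and its returned snippets are all invented placeholders, not Text2KB's actual implementation) illustrates the general idea: first try an alias dictionary, and if the question phrase is unknown, fall back to web search result snippets for the question to recover a fuller entity name that co-occurs with the phrase.

```python
# Toy alias dictionary, deliberately incomplete: it lacks the bare "tut" alias.
ALIAS_TO_ENTITY = {
    "tutankhamun": "Tutankhamun",
    "king tut": "Tutankhamun",
}

def search_snippets(question):
    """Placeholder for a web search API call; a real system would query a search engine."""
    return [
        "Tutankhamun, commonly referred to as King Tut, became pharaoh at age nine.",
        "The boy king Tut ascended the throne around 1332 BC.",
    ]

def link_entity(phrase, question):
    """Map a question phrase to a KB entity, using web snippets as a fallback."""
    # 1. Direct dictionary lookup on the question phrase.
    entity = ALIAS_TO_ENTITY.get(phrase.lower())
    if entity:
        return entity
    # 2. Fallback: scan search snippets for the question and return the entity
    #    of any known alias that co-occurs with the unknown phrase.
    for snippet in search_snippets(question):
        lowered = snippet.lower()
        if phrase.lower() in lowered:
            for alias, ent in ALIAS_TO_ENTITY.items():
                if alias in lowered:
                    return ent
    return None

print(link_entity("tut", "what year did tut became king?"))
# -> Tutankhamun (recovered from a snippet mentioning both "tut" and a known alias)
```

In practice the fallback step would also score the recovered candidates, since snippets can mention several entities; this sketch only shows how external text supplies the missing alias.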
