Word Document Format Compliance Checking System: Foreign-Language Translation Material

 2022-08-25 21:24:11

Apache POI - HWPF and XWPF - Java API to Handle Microsoft Word Files

Abstract

HWPF is the name of our port of the Microsoft Word 97(-2007) file format to pure Java. It also provides limited read-only support for the older Word 6 and Word 95 file formats.

The partner to HWPF for the new Word 2007 .docx format is XWPF. Whilst HWPF and XWPF provide similar features, there is not a common interface across the two of them at this time.

Both HWPF and XWPF could be described as 'moderately functional'. For some use cases, especially around text extraction, support is very strong. For others, support may be limited or incomplete, and it may be necessary to dig down into low-level code. Error checking may be missing in places, so it may be possible to accidentally generate invalid files. Enhancements to fix such things are generally very well received!

1 Introduction

Source in the org.apache.poi.hwpf.model tree is the Java representation of the internal Word format structure. This code is 'internal' and should not be used by your code. The org.apache.poi.hwpf.usermodel package is the actual public and (as far as possible) user-friendly API for accessing document parts. Source code in the org.apache.poi.hwpf.extractor tree is a wrapper around this API to facilitate easy extraction of interesting things (e.g. the text), and the org.apache.poi.hwpf.converter package contains Word-to-HTML and Word-to-FO converters (the latter can be used to generate PDF from Word files when used with Apache FOP). There is also a small file-structure-dumping utility in the org.apache.poi.hwpf.dev package, primarily for development purposes.

The main entry point to HWPF is HWPFDocument. Currently it has a lot of references both to internal interfaces (the org.apache.poi.hwpf.model package) and to the public API (the org.apache.poi.hwpf.usermodel package). It is possible that it will be split into two different interfaces (like WordFile and WordDocument) in later versions.

The main entry point to XWPF is XWPFDocument. From there, you can get the paragraphs, pictures, tables, sections, headers etc.

Currently, there are only a handful of example programs using HWPF and XWPF available. They can be found in svn in the examples section, under HWPF and XWPF. Both HWPF and XWPF have fairly high levels of unit test coverage, and the tests double as examples of using the various areas of functionality of both modules. These can be found in svn, under HWPF and XWPF. Contributions of more examples, whether inspired by the unit tests or not, would be most welcome!

2 HWPF Notes

A .doc Word document, as handled by HWPF, can be considered as a very long single text buffer. The HWPF API provides 'pointers' to document parts, like sections, paragraphs and character runs. Usually the user will iterate over the sections of the main document part, over the paragraphs of each section, and over the character runs of each paragraph. Each such interface is a pointer to a subrange of the document text along with additional properties (and they all extend the same Range parent class). There are additional Range implementations like Table, TableRow, TableCell, etc. Some structures like Bookmark or Field can also provide subrange pointers.
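The 'single text buffer plus pointers' idea can be sketched with a toy model. The code below is not the HWPF implementation and uses no POI classes: ToyDocument, ToyParagraph and ToyRange are hypothetical names, and splitting paragraphs on '\n' and runs on spaces is an assumption made only to keep the example short (real character runs are delimited by formatting, not by words).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model: every "range" is a pair of offsets into one shared buffer.
class ToyRange {
    protected final StringBuilder buffer; // the single document text buffer
    protected final int start;            // inclusive offset
    protected final int end;              // exclusive offset

    ToyRange(StringBuilder buffer, int start, int end) {
        this.buffer = buffer;
        this.start = start;
        this.end = end;
    }

    // Text covered by this range, read straight from the shared buffer.
    String text() {
        return buffer.substring(start, end);
    }
}

// A paragraph is itself a range and can hand out narrower child ranges.
class ToyParagraph extends ToyRange {
    ToyParagraph(StringBuilder buffer, int start, int end) {
        super(buffer, start, end);
    }

    // Assumption: runs are split on spaces purely for illustration.
    List<ToyRange> runs() {
        List<ToyRange> runs = new ArrayList<>();
        int runStart = start;
        for (int i = start; i <= end; i++) {
            if (i == end || buffer.charAt(i) == ' ') {
                if (i > runStart) {
                    runs.add(new ToyRange(buffer, runStart, i));
                }
                runStart = i + 1;
            }
        }
        return runs;
    }
}

class ToyDocument {
    private final StringBuilder buffer;

    ToyDocument(String text) {
        this.buffer = new StringBuilder(text);
    }

    // The overall range spans the whole buffer.
    ToyRange overallRange() {
        return new ToyRange(buffer, 0, buffer.length());
    }

    // Assumption: '\n' separates paragraphs in this toy model.
    List<ToyParagraph> paragraphs() {
        List<ToyParagraph> result = new ArrayList<>();
        int paraStart = 0;
        for (int i = 0; i <= buffer.length(); i++) {
            if (i == buffer.length() || buffer.charAt(i) == '\n') {
                if (i > paraStart) {
                    result.add(new ToyParagraph(buffer, paraStart, i));
                }
                paraStart = i + 1;
            }
        }
        return result;
    }
}
```

Iterating `new ToyDocument(text).paragraphs()` and then `runs()` on each paragraph mirrors the nesting described above: each level only narrows the offsets, while the text itself lives in one place.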

Changing file content usually requires a lot of synchronized changes in those structures, like updating property boundaries, position handlers, etc. Because of that, the HWPF API should be considered not thread-safe. In addition, there is a 'one pointer' rule for changing content: you should not use two different Range instances at one time. More precisely, if you change file content through some range pointer, all other range pointers except its parents become invalid. For example, if you obtain the overall range (1), a paragraph range (2) from the overall range and a character run range (3) from the paragraph range, and then change the text of the paragraph, the character run range is now invalid and should not be used, but the overall range pointer is still valid. Each time you obtain a range (pointer), a new instance is created. This means that if you obtain two range pointers and change the document text using the first one, the second one becomes invalid.
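The 'one pointer' rule can be illustrated with a small, purely hypothetical model (VersionedDoc and VersionedRange are not POI classes). It stamps each range with a document version: an edit re-stamps the edited range and its parents, and every other range is left stale. The explicit staleness check is an addition made for this sketch; the text above does not say whether HWPF detects the use of an invalid range, so do not rely on it doing so.

```java
// Hypothetical toy model of the 'one pointer' rule; not the HWPF implementation.
class VersionedDoc {
    final StringBuilder text;
    int version = 0; // bumped on every content change

    VersionedDoc(String initial) {
        this.text = new StringBuilder(initial);
    }

    // Each call creates a new, independent range instance.
    VersionedRange overall() {
        return new VersionedRange(this, null, 0, text.length());
    }
}

class VersionedRange {
    private final VersionedDoc doc;
    private final VersionedRange parent; // null for an overall range
    private final int start;             // inclusive offset
    private int end;                     // exclusive offset, moves when text changes
    private int seenVersion;             // document version this range is in sync with

    VersionedRange(VersionedDoc doc, VersionedRange parent, int start, int end) {
        this.doc = doc;
        this.parent = parent;
        this.start = start;
        this.end = end;
        this.seenVersion = doc.version;
    }

    boolean isValid() {
        return seenVersion == doc.version;
    }

    // Child range; 'from' and 'to' are offsets relative to this range.
    VersionedRange sub(int from, int to) {
        requireValid();
        if (from < 0 || to < from || start + to > end) {
            throw new IndexOutOfBoundsException("sub-range outside parent");
        }
        return new VersionedRange(doc, this, start + from, start + to);
    }

    String text() {
        requireValid();
        return doc.text.substring(start, end);
    }

    // Replace the text of this range; only this range and its parents stay valid.
    void replaceText(String replacement) {
        requireValid();
        int delta = replacement.length() - (end - start);
        doc.text.replace(start, end, replacement);
        doc.version++;
        for (VersionedRange r = this; r != null; r = r.parent) {
            r.end += delta;              // keep the boundaries in sync
            r.seenVersion = doc.version; // re-stamp as valid
        }
    }

    private void requireValid() {
        if (!isValid()) {
            throw new IllegalStateException("stale range: content changed through another range");
        }
    }
}
```

With an overall range, a paragraph obtained via `overall.sub(...)` and a run obtained via `paragraph.sub(...)`, calling `paragraph.replaceText(...)` leaves the overall range and the paragraph usable, while the run and any second `doc.overall()` instance obtained earlier report `isValid() == false`.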

XWPF Patches Required!

At the moment, XWPF covers many common use cases for reading and writing .docx files. Whilst this is a great thing, it does mean that XWPF does everything that the current POI committers need it to do, and so none of the committers are actively adding new features.

If you come across a feature that you need and that isn't currently in XWPF, please do send in a patch to add the extra functionality! More details on contributing patches are available on the 'Contribution to POI' page.

HWPF Patches Required!

At the moment we unfortunately do not have anyone taking care of HWPF and fostering its development. What we need is someone to stand up, take this project under their wing and push it forward. Ryan Ackley, who put a lot of effort into HWPF, is no longer on board, so HWPF is an orphan child waiting to be adopted.

If you are interested in becoming the new HWPF pointman, you should look into the Microsoft Word internals. A good starting point seems to be Ryan Ackley's overview.

