xyliufeng

PDFBox example

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;



public class Tpdf2 {
	public static void main(String[] arg)
	{
		System.out.println(Tpdf2.getTextFromPDF("E:/person/pdf/sp.pdf"));
	}

	/**
	 * Simply reads all the text from a PDF file.
	 * You have to deal with the format of the output text yourself.
	 * 2008-2-25
	 * @param pdfFilePath file path
	 * @return all text in the PDF file, or null if the file could not be read
	 */
	public static String getTextFromPDF(String pdfFilePath) {
		String result = null;
		FileInputStream is = null;
		PDDocument document = null;
		try {
			is = new FileInputStream(pdfFilePath);
			PDFParser parser = new PDFParser(is);
			parser.parse();
			document = parser.getPDDocument();
			PDFTextStripper stripper = new PDFTextStripper();
			result = stripper.getText(document);
		} catch (FileNotFoundException e) {
			// The file does not exist or cannot be opened; result stays null
			e.printStackTrace();
		} catch (IOException e) {
			// Parsing or text extraction failed; result stays null
			e.printStackTrace();
		} finally {
			if (is != null) {
				try {
					is.close();
				} catch (IOException e) {
					// Closing the input stream failed; report it and carry on
					e.printStackTrace();
				}
			}
			if (document != null) {
				try {
					document.close();
				} catch (IOException e) {
					// Closing the document failed; report it and carry on
					e.printStackTrace();
				}
			}
		}
		return result;
	}
}
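
The Javadoc above notes that formatting the extracted text is left to the caller. As a rough illustration of what that might involve, here is a minimal sketch of one possible cleanup step. The class name ExtractedTextCleaner and its clean method are hypothetical helpers, not part of PDFBox, and the rules chosen (collapse runs of spaces, tabs, and non-breaking spaces; squeeze consecutive blank lines into one) are only an assumption about what "dealing with the format" means for a given document.

```java
public class ExtractedTextCleaner {

    /**
     * Hypothetical helper (not a PDFBox API): tidies text returned by a
     * text stripper. Returns an empty string for null input, which is what
     * getTextFromPDF yields when reading fails.
     */
    public static String clean(String raw) {
        if (raw == null) {
            return "";
        }
        // Keep trailing empty strings so blank lines at the end are seen
        String[] lines = raw.split("\\r?\\n", -1);
        StringBuilder out = new StringBuilder();
        boolean previousBlank = true; // true at start: drops leading blank lines
        for (String line : lines) {
            // Collapse runs of spaces, tabs, and non-breaking spaces
            String trimmed = line.replaceAll("[ \\t\\u00A0]+", " ").trim();
            if (trimmed.isEmpty()) {
                // Emit at most one blank line between blocks of text
                if (!previousBlank) {
                    out.append('\n');
                }
                previousBlank = true;
            } else {
                out.append(trimmed).append('\n');
                previousBlank = false;
            }
        }
        // Remove the trailing newline(s) left by the loop
        return out.toString().trim();
    }
}
```

It could be applied to the result of the method above, for example ExtractedTextCleaner.clean(Tpdf2.getTextFromPDF(path)). Real documents may need different rules, such as preserving column alignment or rejoining hyphenated words.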



PDFBox download: http://pdfbox.apache.org/download.html
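Comments
#1 antony102201 2011-12-12
When I read files with this code, a few PDFs cannot be read. Is there something in the program that needs improving, or is something wrong with the PDF files themselves? These PDFs open and display normally. If the files themselves are the problem, how can I check?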
