Python PDF processing

  •   Ewig · 2019-01-14 11:12:50 +08:00 · 998 views
    I am processing some data inside a PDF and get the error below. Does it mean I need to change the library's source code, from int to bytes?


    File being read: /home/shenjianlin/pdf_file/qimingpian_pdf/无线医疗白皮书-12 页.pdf
    Traceback (most recent call last):
      File "remove_water_mark.py", line 90, in <module>
        remove_water_mark().read_content()
      File "remove_water_mark.py", line 52, in read_content
        for i in range(0, pdf.getNumPages()):
      File "/usr/lib64/python3.4/site-packages/PyPDF2/pdf.py", line 1155, in getNumPages
        self._flatten()
      File "/usr/lib64/python3.4/site-packages/PyPDF2/pdf.py", line 1505, in _flatten
        catalog = self.trailer["/Root"].getObject()
      File "/usr/lib64/python3.4/site-packages/PyPDF2/generic.py", line 511, in __getitem__
        return dict.__getitem__(self, key).getObject()
      File "/usr/lib64/python3.4/site-packages/PyPDF2/generic.py", line 178, in getObject
        return self.pdf.getObject(self).getObject()
      File "/usr/lib64/python3.4/site-packages/PyPDF2/pdf.py", line 1599, in getObject
        idnum, generation = self.readObjectHeader(self.stream)
      File "/usr/lib64/python3.4/site-packages/PyPDF2/pdf.py", line 1667, in readObjectHeader
        return int(idnum), int(generation)
    ValueError: invalid literal for int() with base 10: b'bj'
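
    A note on the question, offered as a sketch rather than a confirmed diagnosis: in Python 3, int() accepts bytes as long as they hold ASCII digits, so the int-versus-bytes type is probably not the problem. The value b'bj' looks like the tail of the keyword "obj", which suggests the reader began parsing an object header at the wrong byte offset, for example because the file's cross-reference data is damaged. The function read_object_header below is a simplified, hypothetical stand-in and is not PyPDF2's actual code; the sample bytes are invented for illustration.

```python
import io


def read_object_header(stream):
    """Simplified stand-in for a PDF object-header parser (NOT PyPDF2's code).

    Reads two whitespace-delimited tokens ("<id> <generation>") and
    converts both to int.
    """
    def read_token():
        ch = stream.read(1)
        # Skip leading whitespace.
        while ch and ch.isspace():
            ch = stream.read(1)
        token = b""
        # Collect bytes until the next whitespace or end of stream.
        while ch and not ch.isspace():
            token += ch
            ch = stream.read(1)
        return token

    idnum = read_token()
    generation = read_token()
    return int(idnum), int(generation)


# Invented sample: one PDF object starting at byte offset 0.
data = b"12 0 obj\n<< /Type /Catalog >>\nendobj\n"

# int() handles bytes made of digits without any source change.
digits_ok = int(b"12")

# Correct offset: the header parses cleanly.
ok = read_object_header(io.BytesIO(data))

# Wrong offset (6 bytes in): the parser lands inside the word "obj".
shifted = io.BytesIO(data)
shifted.seek(6)
try:
    read_object_header(shifted)
    err_msg = None
except ValueError as exc:
    err_msg = str(exc)

print(ok)
print(err_msg)
```

    If this reading is right, editing int() in the library would not help; the file itself (or the offsets stored in it) would need to be checked or repaired.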
    Addendum 1  ·  2019-01-14 13:20:07 +08:00
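    import PyPDF2

    # Assumed setup, not shown in the original post ('input.pdf' is a placeholder path).
    pdf = PyPDF2.PdfFileReader(open('input.pdf', 'rb'))
    pdf_output = PyPDF2.PdfFileWriter()

    for i in range(0, pdf.getNumPages()):
        if i < 3:
            # Only the first three pages are checked for a watermark.
            Num_page_content = pdf.getPage(i)
            print(Num_page_content)
            # Assumption: flag resets on every page; the post never initializes it.
            flag = False
            if Num_page_content.get('/Resources'):
                page_resource = Num_page_content['/Resources']
                if page_resource.get('/XObject'):
                    xobject = page_resource['/XObject']
                    form = None
                    # Pick the first XObject whose name starts with '/FormXob'.
                    for item in xobject:
                        if item.startswith('/FormXob'):
                            if not flag:
                                flag = True
                                form = item
                    if form:
                        print('remove water mark in page: {}'.format(i))
                        xobject.pop(form)
            pdf_output.addPage(Num_page_content)
        else:
            # Remaining pages are copied unchanged.
            pdf_output.addPage(pdf.getPage(i))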
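
    Since the traceback fails while resolving /Root, before any page is touched, one thing worth checking is whether the file's startxref offset really points at cross-reference data. The helper below is a rough, standard-library-only sketch; check_startxref is a hypothetical name, the sample bytes are invented, and it only tests the last startxref entry, so it is a quick sanity check and not a full validator.

```python
import re


def check_startxref(data):
    """Return (offset, looks_valid) for the last startxref entry in PDF bytes.

    Hypothetical helper: looks_valid is True when the bytes at the offset
    begin with the 'xref' keyword (classic table) or with an object header
    such as '5 0 obj' (cross-reference stream).
    """
    matches = re.findall(rb"startxref\s+(\d+)", data)
    if not matches:
        return None, False
    offset = int(matches[-1])
    target = data[offset:offset + 32]
    looks_valid = (
        target.startswith(b"xref")
        or re.match(rb"\d+\s+\d+\s+obj", target) is not None
    )
    return offset, looks_valid


# Invented miniature file, just enough structure to exercise the check.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_at = len(body)
tail = b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Root 1 0 R >>\n"

good = body + tail + b"startxref\n" + str(xref_at).encode() + b"\n%%EOF\n"
bad = body + tail + b"startxref\n" + str(xref_at + 3).encode() + b"\n%%EOF\n"

good_result = check_startxref(good)
bad_result = check_startxref(bad)
print(good_result, bad_result)
```

    For a real file, the bytes could be read with open(path, 'rb').read() and passed to the helper; a False result would point at a damaged or shifted cross-reference section rather than at the page-processing loop.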
    No replies yet