Installing tesserocr, a Python Module for CAPTCHA Recognition

Technology, 2025-05-26

Reference: https://www.jianshu.com/p/4c15bdda55d9


1. Download the whl file from GitHub

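As described in the official documentation at https://github.com/sirfz/tesserocr, a prebuilt whl file can be downloaded from GitHub. Follow the link there and download the whl file that matches your Python version and system architecture. Mine is Python 3.6, 64-bit. Note the tesseract version number shown at the top right, because you will need it shortly. Once the download finishes, install the whl file with pip from the directory where it was saved. Here is a CAPTCHA image to test with: https://img-blog.csdnimg.cn/20201008111402909.jpg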
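To decide which whl file to pick, it helps to ask the interpreter itself for its version and bitness. The sketch below is only an illustration: the helper name wheel_tags is hypothetical, and it assumes a Windows build where the whl file name contains a tag such as cp36 plus either win_amd64 or win32.

```python
import struct
import sys


def wheel_tags(version_info=sys.version_info, pointer_bytes=struct.calcsize("P")):
    """Return (python_tag, platform_tag) to look for in a Windows whl file name.

    Hypothetical helper: it only formats the tags, it does not download anything.
    """
    # CPython 3.6 becomes "cp36", CPython 3.7 becomes "cp37", and so on.
    python_tag = "cp{}{}".format(version_info[0], version_info[1])
    # An 8-byte pointer means a 64-bit interpreter, otherwise assume 32-bit.
    platform_tag = "win_amd64" if pointer_bytes == 8 else "win32"
    return python_tag, platform_tag


if __name__ == "__main__":
    print(wheel_tags())
```

With both tags in hand, pick the whl file whose name contains them and run pip install on that file from the download directory.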

Running the example from the official documentation at this point raises an error:

import tesserocr
from PIL import Image

print(tesserocr.tesseract_version())  # print tesseract-ocr version
print(tesserocr.get_languages())  # prints tessdata path and list of available languages
image = Image.open('20201008111402909.jpg')
print(tesserocr.image_to_text(image))  # print ocr text from image
# or
print(tesserocr.file_to_text('20201008111402909.jpg'))

Output:

RuntimeError: Failed to init API, possibly an invalid tessdata path: d:\python\python36\/tessdata/

This happens because the language data files are missing, so the next step is to install tesseract itself.

2. Install tesseract

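The installers are available at https://digi.bib.uni-mannheim.de/tesseract/. Builds labeled dev are development versions, and those labeled alpha or beta are test versions, so it is best to choose a stable build with none of these labels in its name. Be sure to pick the build that matches your system, 32-bit or 64-bit.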

Download the tesseract version that matches the version number noted earlier for the whl file you chose.
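If you did not write the version number down, the string printed by tesserocr.tesseract_version() can be used to recover it. The sketch below is a rough illustration: the helper name parse_tesseract_version and the sample string are assumptions, not output copied from a real run.

```python
import re


def parse_tesseract_version(text):
    """Extract a version like '4.0.0' from a tesseract version string.

    Hypothetical helper; returns None when no version number is found.
    """
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", text)
    return match.group(1) if match else None


# Assumed sample; in practice pass in the result of tesserocr.tesseract_version().
sample = "tesseract 4.0.0\n leptonica-1.76.0"

if __name__ == "__main__":
    print(parse_tesseract_version(sample))
```

The extracted number is what to look for in the installer file names on the download page.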

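During installation, tick Additional language data (download) so that the installer also downloads the tessdata folder, which contains all the language packs. Then copy the language packs from the tesseract installation directory into the Python installation path (the path named in the error message, d:\python\python36\ in my case). Run the code again and the error no longer appears.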
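The copy step can also be scripted. The following is a minimal sketch under stated assumptions: the helper name copy_tessdata is hypothetical, both directories are supplied by you, and as a simplification it copies only the top-level .traineddata files rather than the whole folder.

```python
import shutil
from pathlib import Path


def copy_tessdata(tesseract_dir, python_dir):
    """Copy *.traineddata files from <tesseract_dir>/tessdata to <python_dir>/tessdata.

    Hypothetical helper; returns the sorted list of copied file names.
    """
    src = Path(tesseract_dir) / "tessdata"
    dst = Path(python_dir) / "tessdata"
    if not src.is_dir():
        raise FileNotFoundError("no tessdata folder under {}".format(tesseract_dir))
    # Create the target folder if it is missing; keep it if it already exists.
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.glob("*.traineddata")):
        shutil.copy2(str(item), str(dst / item.name))
        copied.append(item.name)
    return copied


# Example call; the first path is an assumption, use your own install location:
# copy_tessdata(r"C:\Program Files\Tesseract-OCR", r"d:\python\python36")
```

Copying by hand in the file manager works just as well. The script is mainly handy when the same setup has to be repeated on several machines.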

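The official tesseract-ocr organization at https://github.com/tesseract-ocr also provides several sets of language data to choose from, such as tessdata_best, tessdata_fast, and tessdata, all of which can be found among its repositories: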

https://github.com/tesseract-ocr/tessdata_best
https://github.com/tesseract-ocr/tessdata_fast
https://github.com/tesseract-ocr/tessdata
