Recognizing CAPTCHAs with Google's Gemini Large Model

2024-02-18 码农

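When writing a crawler, you will run into sites that require a CAPTCHA to be solved before a page can be opened. The purpose is simple: to tell human visitors apart from automated crawlers. Solving CAPTCHAs looks easy, but reaching a high accuracy rate is not.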

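Amid the current wave of large models, Gemini 1.5 and OpenAI Sora have drawn the most attention. Gemini's advantages are a free API and the ability to handle very long contexts. Its image recognition is also strong: it can pick up and interpret information from fine textures as well as complex scenes.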

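Applied to the CAPTCHA problem above, Gemini can recognize common image CAPTCHAs, reason its way past noise such as interference lines, and work out the answer to arithmetic CAPTCHAs directly. All of this helps considerably with CAPTCHA recognition.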

Add the dependency

A Spring Boot Starter built on the Gemini REST API ^[1]

<dependency>
    <groupId>io.springboot.plugin</groupId>
    <artifactId>gemini-spring-boot3-starter</artifactId>
    <version>1.0.0</version>
</dependency>

Configure the Gemini parameters

Version 1.0 is currently open, so you can apply for an API key ^[2] directly. The newly released long-context version 1.5 requires joining a waitlist.

gemini:
  api-key: key
  proxy-host: ip
  proxy-port: port
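The `proxy-host` and `proxy-port` values presumably route the starter's HTTP calls through a forward proxy. How the starter applies them internally is not shown in this post. As a rough sketch of the idea, using only the JDK's `java.net.http.HttpClient`: the class name `ProxyClientSketch` and the 10-second timeout are illustrative assumptions, not part of the starter.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.http.HttpClient;
import java.time.Duration;

// Hypothetical helper, not part of gemini-spring-boot3-starter.
final class ProxyClientSketch {

    // Builds a JDK HttpClient whose HTTP/HTTPS requests go through the given proxy.
    static HttpClient build(String proxyHost, int proxyPort) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)))
                .connectTimeout(Duration.ofSeconds(10)) // assumed value, tune as needed
                .build();
    }
}
```

A client built this way sends every request via the configured proxy, which is the effect the two configuration keys appear to be aiming for.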

Text model test

@Autowired
private GeminiClient client;

@Test
void generate() {
    // Prompt that asks the model to polish the text
    String prompt = """
            As a writing improvement assistant, your task is to improve the spelling, grammar, clarity,
            concision, and overall readability of the text provided, while breaking down long sentences,
            reducing repetition, and providing suggestions for improvement.
            Please provide only the corrected Chinese version of the text and avoid including explanations.
            Please begin by editing the following text:
            """;

    // Append the Chinese text to be polished and build a text-only request
    Generate.Request request = Generate.creatTextChart(prompt + """
            通过此技术，前端可以定制任何数据、任何结构。后端接收到请求再也不用写 Java controller 各类接口、实体代码，可直接操作数据库获取目标结果
            """);

    // Call Gemini and extract the answer text from the response
    Generate.Response response = client.generate(request);
    String answer = Generate.toAnswer(response);
    System.out.println(answer);
}

The polished text returned by the model (in Chinese, as the prompt requests):

通过本技术，前端可以定制任何数据、任何结构。后端接收到请求无需再编写 Java controller、实体代码等接口，可直接操作数据库获取目标结果

Image model test

Get the raw text of the CAPTCHA image

@Test
void generateVision() throws IOException {
    // Prompt that asks the model to read the text inside the CAPTCHA
    String prompt = """
            I will provide you with an image CAPTCHA, please recognize the content inside the CAPTCHA and output the text
            """;

    // Build a request that carries both the prompt and the local image file
    Generate.Request request = Generate.creatImageChart(prompt, new File("/Users/lengleng/Downloads/1.png"));

    // Call Gemini and extract the answer text from the response
    Generate.Response response = client.generate(request);
    String answer = Generate.toAnswer(response);
    System.out.println(answer);
}

9+8=?
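For readers curious about what an image request like this boils down to at the REST level: the public Gemini API's `generateContent` method accepts a JSON body whose `parts` mix a `text` entry with a Base64-encoded `inline_data` entry. The sketch below builds such a body by hand. The class `VisionRequestSketch` and its methods are hypothetical helpers written for illustration, and whether the starter constructs its requests in exactly this way is an assumption.

```java
import java.util.Base64;

// Hypothetical helper, not part of gemini-spring-boot3-starter.
final class VisionRequestSketch {

    // Minimal JSON string escaper for quotes, backslashes, and control characters.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') sb.append("\\\"");
            else if (c == '\\') sb.append("\\\\");
            else if (c == '\n') sb.append("\\n");
            else if (c == '\r') sb.append("\\r");
            else if (c == '\t') sb.append("\\t");
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.toString();
    }

    // Builds a generateContent body with one text part and one inline image part.
    static String buildBody(String prompt, byte[] imageBytes, String mimeType) {
        String data = Base64.getEncoder().encodeToString(imageBytes);
        return "{\"contents\":[{\"parts\":["
                + "{\"text\":\"" + escapeJson(prompt) + "\"},"
                + "{\"inline_data\":{\"mime_type\":\"" + escapeJson(mimeType)
                + "\",\"data\":\"" + data + "\"}}"
                + "]}]}";
    }
}
```

In a real project a JSON library would normally replace the hand-written escaping. It is spelled out here only to keep the sketch dependency-free.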

Get the computed result of the CAPTCHA image

I will provide you with an image CAPTCHA. Please recognize the content inside the CAPTCHA and output the text. If the text is a mathematical calculation, please directly output the result
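Even with this prompt, a model may occasionally echo the expression (for example `9+8=?`) instead of the result. One defensive option is to post-process the answer locally. The sketch below is an illustration only: the class `CaptchaAnswerSketch` is hypothetical, and it assumes arithmetic CAPTCHAs take the simple two-operand form `a op b`, optionally followed by `=` and `?`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing helper, not part of the starter.
final class CaptchaAnswerSketch {

    // Matches "a op b", optionally followed by "=" and "?"; x, X and \u00d7 mean multiply, \u00f7 means divide.
    private static final Pattern EXPR = Pattern.compile(
            "^\\s*(\\d{1,9})\\s*([+\\-*/xX\u00d7\u00f7])\\s*(\\d{1,9})\\s*=?\\s*\\??\\s*$");

    // Returns the computed result if the answer is a simple expression, else the trimmed answer.
    static String normalize(String answer) {
        Matcher m = EXPR.matcher(answer);
        if (!m.matches()) {
            return answer.trim();
        }
        long a = Long.parseLong(m.group(1));
        long b = Long.parseLong(m.group(3));
        char op = m.group(2).charAt(0);
        long result;
        if (op == '+') {
            result = a + b;
        } else if (op == '-') {
            result = a - b;
        } else if (op == '/' || op == '\u00f7') {
            // Leave the answer untouched when the division is undefined or not exact.
            if (b == 0 || a % b != 0) {
                return answer.trim();
            }
            result = a / b;
        } else {
            result = a * b;
        }
        return Long.toString(result);
    }
}
```

Calling `CaptchaAnswerSketch.normalize(answer)` on the model's reply leaves an already computed value such as `17` unchanged and turns `9+8=?` into `17`.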