Python,学霸
Reading guide
Installation
Introduction
Example
Output
Installation
pip install mechanicalsoup
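After installing, it can be handy to confirm that the package is visible to the interpreter you intend to run the example with. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of mechanicalsoup.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("mechanicalsoup"):
    print("mechanicalsoup is available")
else:
    print("mechanicalsoup is missing; run: pip install mechanicalsoup")
```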
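Introduction
Hello everyone! Today's post is a simple example of scraping Gitee repository search results with mechanicalsoup. The number of result pages to fetch is configurable.
Example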
import mechanicalsoup


def fetch_repo_info(keyword, pages=2):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
    }
    base_url = "https://search.gitee.com/?skin=rec&type=repository&q={}&pageno={}"
    all_results = []  # collects the results from every page

    # Reuse one browser (and its HTTP session) for all pages
    browser = mechanicalsoup.StatefulBrowser()
    browser.session.headers.update(headers)

    for page in range(1, pages + 1):
        url = base_url.format(keyword, page)
        browser.open(url)  # visit the search URL for this page
        page_content = browser.get_current_page()  # parsed HTML of the current page
        items = page_content.find_all(class_="item")
        for item in items:
            link_element = item.find('a', href=True)
            if link_element is None:
                continue  # no link means no title either, so skip this item
            title = link_element.text.strip()
            link = link_element['href']
            desc_element = item.find(class_="desc")
            description = desc_element.text.strip() if desc_element else "No description found"
            all_results.append({
                "title": title,
                "link": link,
                "description": description
            })
    return all_results


# Run a search: keyword "PYTHON", first 3 pages
print(fetch_repo_info("PYTHON", 3))
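The example inserts the keyword into the URL unchanged, which works for a plain ASCII keyword such as "PYTHON". For keywords containing spaces or non-ASCII characters, percent-encoding the keyword first is likely safer. The following is a minimal sketch, assuming the same URL template as the example; the helper name build_search_urls is hypothetical and not part of mechanicalsoup.

```python
from urllib.parse import quote

# Same URL template as in the example.
BASE_URL = "https://search.gitee.com/?skin=rec&type=repository&q={}&pageno={}"


def build_search_urls(keyword, pages=2):
    """Return one search URL per page, with the keyword percent-encoded.

    Hypothetical helper for illustration; not part of mechanicalsoup.
    """
    encoded = quote(keyword, safe="")
    return [BASE_URL.format(encoded, page) for page in range(1, pages + 1)]


for url in build_search_urls("python 爬虫", pages=2):
    print(url)
```

Each URL produced this way could be passed to browser.open() in place of the base_url.format(keyword, page) call.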
Output
[{'title': 'OpenHarmony', 'link': 'https://gitee.com/openharmony?_from=gitee_search', 'description': 'OpenHarmony 是开放原子开源基金会(OpenAtom Foundation)旗下开源项目,定位是一款面向全场景的开源分布式操作系统。'},
 {'title': '小柒2012/从零学Python', 'link': 'https://gitee.com/52itstyle/Python?_from=gitee_search', 'description': '从零学Python,各种开发案例,不定期更新。'},
 {'title': '编程语言算法集/Python', 'link': 'https://gitee.com/TheAlgorithms/Python?_from=gitee_search', 'description': 'Python 算法集'},
 {'title': '程序员晚枫/python-office', 'link': 'https://gitee.com/CoderWanFeng/python-office?_from=gitee_search', 'description': 'Python自动化办公的第三方库:pip install python-office'},
 {'title': 'YuHong-LDU/Python-AI', 'link': 'https://gitee.com/yuhong-ldu/python-ai?_from=gitee_search', 'description': 'Python与人工智能实践 (鲁东大学信电学院人工智能教研室)'},
 {'title': '码多多AI/likeadmin(Python版)', 'link': 'https://gitee.com/likeadmin/likeadmin_python?_from=gitee_search', 'description': '🚀🚀🚀likeadmin是一套快速开发管理后台,使用流行的技术栈Python3、FastAPI、TypeScript、Vue3、vite2、Element Plus1.2(ElementUI)。 后台管理系统、后台管理框架、Python管理后台、FastApi管理后台、前后端分离管理后台、Vue3管理后台、Vue'},
 {'title': 'awesome-lib/awesome-python', 'link': 'https://gitee.com/awesome-lib/awesome-python?_from=gitee_search', 'description': 'awesome-python 的中文版'},
 {'title': 'Gitee 极速下载/jackfrued-Python-100-Days', 'link': 'https://gitee.com/mirrors/jackfrued-Python-100-Days?_from=gitee_search', 'description': 'Python - 100天从新手到大师'},
 {'title': 'mktime/python-learn', 'link': 'https://gitee.com/mktime/python-learn?_from=gitee_search', 'description': 'GPT对话,Python基础编程示例:Excel读写追加处理,XML解析、JSON解析、FLV与MP4转换,PyQT界面应用程序开发示例等,https证书到期检测,糗百爬虫,pdf和图片互相转换,socket使用,百度OCR调用例子,IP及端口快速扫描。'},
 {'title': '非空/QrF.Python.FaceRecognition', 'link': 'https://gitee.com/QR/QrF.Python.FaceRecognition?_from=gitee_search', 'description': 'Python 人脸识别技术'},
 {'title': '天勤量化(TqSdk)/tqsdk-python', 'link': 'https://gitee.com/tianqin_quantification_tqsdk/tqsdk-python?_from=gitee_search', 'description': '简单但强大的Python量化开发包'},
 {'title': 'EliteQuant/EliteQuant_Python', 'link': 'https://gitee.com/EliteQuant/EliteQuant_Python?_from=gitee_search', 'description': 'Python量化投资交易平台。基于Python3的多线程并发式高频交易平台, 提供一致的回测和实时交易解决方案。它遵循现代设计模式,例如事件驱动, 服务器/客户端架构和松散耦合的强大稳定的分布式系统。它遵循与其他EliteQuant产品线相同的结构和绩效'},
 {'title': 'OpenHarmony', 'link': 'https://gitee.com/openharmony?_from=gitee_search', 'description': 'OpenHarmony 是开放原子开源基金会(OpenAtom Foundation)旗下开源项目,定位是一款面向全场景的开源分布式操作系统。'},
 {'title': '火鸟/Python开源扫雷游戏PyMine', 'link': 'https://gitee.com/jerryshensjf/PyMine?_from=gitee_search', 'description': 'Python WxPython开源扫雷游戏PyMine为开源扫雷游戏PyMine 使用Python语言和WxPython UI框架。本例移植自本人开源例程JMine 请在程序所在目录使用python PyMine.py启动例程需要先安装Python 3'},
 {'title': '武沛齐/python_course', 'link': 'https://gitee.com/wupeiqi/python_course?_from=gitee_search', 'description': 'Python全栈开发课件 & 源码 & 题目 & 答案'},
 {'title': '唐佐林/Python for OpenHarmony', 'link': 'https://gitee.com/delphi-tang/python-for-hos?_from=gitee_search', 'description': '在鸿蒙设备上使用 Python 编程。'},
 {'title': 'Gitee Community/Python 贪吃蛇魔改大赛', 'link': 'https://gitee.com/gitee-community/Adapted-game?_from=gitee_search', 'description': 'Python 「贪吃蛇」 魔改大赛,是 Gitee 面向 Python 爱好者举办的一场创意编程比赛,旨在鼓励喜爱 Python 编程或有丰富想象力、创新力的小伙伴积极参与开源,并为其提供竞技和展示的舞台,将天马行空的想象化为一行行代码,为经典的「贪吃蛇」小游戏焕发新的生命力!'},
 {'title': 'Python自动化办公社区/python_auto_office', 'link': 'https://gitee.com/zhaofeng092/python_auto_office?_from=gitee_search', 'description': '关注公众号:Python自动化办公社区,发送:1109,领取【47页PPT-Python如何进行自动化办公?】。'},
 {'title': 'keijack/python-simple-http-server', 'link': 'https://gitee.com/keijack/python-simple-http-server?_from=gitee_search', 'description': '一个超轻量级的 HTTP Server,支持线程和协程模式,源生支持 websocket 哦!你也可以非常容易的将其嵌入到 WSGI 与 ASGI 的服务器里。并且支持分布式 Session!'},
 {'title': 'andyham/Python_junior', 'link': 'https://gitee.com/andyham_andy.ham/Python_junior?_from=gitee_search', 'description': 'The foundation of financial risk model programming'},
 {'title': '耿直的小爬虫/Python爬虫', 'link': 'https://gitee.com/testp2y/python_reptilian?_from=gitee_search', 'description': '大数据时代 让爬虫爬取我们所需'},
 {'title': 'vn.py官方/vn.py', 'link': 'https://gitee.com/vnpy/vnpy?_from=gitee_search', 'description': '基于Python的开源量化交易平台开发框架'},
 {'title': '30秒学代码/30-seconds-of-python-code', 'link': 'https://gitee.com/seconds-of-code/30-seconds-of-python-code?_from=gitee_search', 'description': 'Python 语言版的 30 秒学代码'},
 {'title': 'src-openEuler/fuse-python', 'link': 'https://gitee.com/src-openeuler/fuse-python?_from=gitee_search', 'description': 'Python bindings for FUSE - filesystem in userspace.'},
 {'title': 'OpenHarmony', 'link': 'https://gitee.com/openharmony?_from=gitee_search', 'description': 'OpenHarmony 是开放原子开源基金会(OpenAtom Foundation)旗下开源项目,定位是一款面向全场景的开源分布式操作系统。'},
 {'title': 'src-openEuler/python-docker', 'link': 'https://gitee.com/src-openeuler/python-docker?_from=gitee_search', 'description': 'A Python library for the Docker Engine API'},
 {'title': 'keijack/python-eureka-client', 'link': 'https://gitee.com/keijack/python-eureka-client?_from=gitee_search', 'description': '一个 Python 编写的 eureka 客户端,同时支持注册与发现服务,能使得你的代码非常方便地接入 spring cloud 中。'},
 {'title': 'src-openEuler/python-importlib-metadata', 'link': 'https://gitee.com/src-openeuler/python-importlib-metadata?_from=gitee_search', 'description': 'Read metadata from Python packages'},
 {'title': '6tail/lunar-python', 'link': 'https://gitee.com/6tail/lunar-python?_from=gitee_search', 'description': '日历、公历(阳历)、农历(阴历、老黄历)、佛历、道历,支持节假日、星座、儒略日、干支、生肖、节气、节日、彭祖百忌、每日宜忌、吉神宜趋凶煞宜忌、吉神(喜神/福神/财神/阳贵神/阴贵神)方位、胎神方位、冲煞、纳音、星宿、八字、五行、十神、建除十二值星、青龙'},
 {'title': 'src-openEuler/python-texttable', 'link': 'https://gitee.com/src-openeuler/python-texttable?_from=gitee_search', 'description': 'Python module to generate a formatted text table, using ASCII characters'},
 {'title': '百晓通客栈/BXT-AR4Python', 'link': 'https://gitee.com/Lindor_L/BXT-AR4Python?_from=gitee_search', 'description': '百晓通客栈-增强现实开发库(with Python)'},
 {'title': 'src-oepkgs/python-kafka-python', 'link': 'https://gitee.com/src-oepkgs/python-kafka-python?_from=gitee_search', 'description': 'Pure Python client for Apache Kafka'},
 {'title': 'src-oepkgs/python-python-gitlab', 'link': 'https://gitee.com/src-oepkgs/python-python-gitlab?_from=gitee_search', 'description': 'Python module for interacting with the GitLab API'},
 {'title': 'ni1o1/pygeo-tutorial', 'link': 'https://gitee.com/ni1o1/pygeo-tutorial?_from=gitee_search', 'description': 'Tutorial of geospatial data processing using python 用python分析时空数据的教程(in Chinese and English )'},
 {'title': 'src-openEuler/python-meson-python', 'link': 'https://gitee.com/src-openeuler/python-meson-python?_from=gitee_search', 'description': 'Meson Python build backend (PEP 517)'},
 {'title': 'wilson_yin/Zero basics Python', 'link': 'https://gitee.com/wilsonyin/zero-basics-python?_from=gitee_search', 'description': '零基础学Python'}]
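The OpenHarmony entry appears three times in this output, once for each of the three pages fetched. If repeated entries are unwanted, the list returned by fetch_repo_info can be filtered by link. The following is a minimal sketch; dedupe_by_link is a hypothetical helper, and the sample list is a shortened stand-in for the real results with the descriptions left empty.

```python
def dedupe_by_link(results):
    """Keep the first entry for each distinct link, preserving order."""
    seen = set()
    unique = []
    for entry in results:
        if entry["link"] in seen:
            continue  # this link was already kept from an earlier page
        seen.add(entry["link"])
        unique.append(entry)
    return unique


# Shortened sample shaped like the entries returned by fetch_repo_info;
# descriptions are omitted here for brevity.
sample = [
    {"title": "OpenHarmony", "link": "https://gitee.com/openharmony?_from=gitee_search", "description": ""},
    {"title": "vn.py官方/vn.py", "link": "https://gitee.com/vnpy/vnpy?_from=gitee_search", "description": ""},
    {"title": "OpenHarmony", "link": "https://gitee.com/openharmony?_from=gitee_search", "description": ""},
]

print([entry["title"] for entry in dedupe_by_link(sample)])
```

Filtering on the link rather than the title avoids dropping two different repositories that happen to share a display name.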