
Scraping Gitee Search Results with Python

2024-03-06 · 码农 (Coder)

Python, 学霸 (Top Student)

  • Reading guide

  • Introduction

  • Installation

  • Example

  • Output

Installation

    pip install mechanicalsoup

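Introduction

Hello everyone! Today's post is a simple example of scraping Gitee search results with mechanicalsoup. The number of result pages to fetch is configurable.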

Example


    import mechanicalsoup


    def fetch_repo_info(keyword, pages=2):
        """Collect title, link, and description for Gitee search results."""
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
        }
        base_url = "https://search.gitee.com/?skin=rec&type=repository&q={}&pageno={}"
        all_results = []  # results collected from every page
        for page in range(1, pages + 1):
            url = base_url.format(keyword, page)
            browser = mechanicalsoup.StatefulBrowser()
            browser.session.headers.update(headers)
            browser.open(url)  # request the search results page
            page_content = browser.get_current_page()  # parsed page (BeautifulSoup)
            items = page_content.find_all(class_="item")
            for item in items:
                link_element = item.find("a", href=True)
                if link_element is None:
                    continue  # skip items that carry no link
                title = link_element.text.strip()
                link = link_element["href"]
                desc_element = item.find(class_="desc")
                description = desc_element.text.strip() if desc_element else "No description found"
                all_results.append({
                    "title": title,
                    "link": link,
                    "description": description,
                })
        return all_results


    # Search for "PYTHON" across the first 3 result pages
    print(fetch_repo_info("PYTHON", 3))
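The example drops the keyword into the query string with `base_url.format(keyword, page)`, unescaped. That works for `PYTHON`, but keywords containing spaces, `&`, or non-ASCII text may produce a malformed URL. Below is a minimal sketch of building the same URL with the standard library's `urllib.parse.urlencode`. The helper name `build_search_url` is hypothetical, the query parameters are copied from the example's `base_url`, and it assumes the search endpoint accepts standard form encoding (spaces as `+`).

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://search.gitee.com/"


def build_search_url(keyword, page):
    """Return the search URL for `keyword` and 1-based `page` (hypothetical helper)."""
    params = {
        "skin": "rec",
        "type": "repository",
        "q": keyword,    # urlencode percent-escapes reserved and non-ASCII characters
        "pageno": page,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


print(build_search_url("PYTHON", 1))
# https://search.gitee.com/?skin=rec&type=repository&q=PYTHON&pageno=1
print(build_search_url("machine learning", 2))
# https://search.gitee.com/?skin=rec&type=repository&q=machine+learning&pageno=2
```

Inside `fetch_repo_info`, the line `url = base_url.format(keyword, page)` could then become `url = build_search_url(keyword, page)`.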



Output

    [{'title': 'OpenHarmony', 'link': 'https://gitee.com/openharmony?_from=gitee_search', 'description': 'OpenHarmony 是开放原子开源基金会(OpenAtom Foundation)旗下开源项目,定位是一款面向全场景的开源分布式操作系统。'},
     {'title': '小柒2012/从零学Python', 'link': 'https://gitee.com/52itstyle/Python?_from=gitee_search', 'description': '从零学Python,各种开发案例,不定期更新。'},
     {'title': '编程语言算法集/Python', 'link': 'https://gitee.com/TheAlgorithms/Python?_from=gitee_search', 'description': 'Python 算法集'},
     {'title': '程序员晚枫/python-office', 'link': 'https://gitee.com/CoderWanFeng/python-office?_from=gitee_search', 'description': 'Python自动化办公的第三方库:pip install python-office'},
     {'title': 'YuHong-LDU/Python-AI', 'link': 'https://gitee.com/yuhong-ldu/python-ai?_from=gitee_search', 'description': 'Python与人工智能实践 (鲁东大学信电学院人工智能教研室)'},
     {'title': '码多多AI/likeadmin(Python版)', 'link': 'https://gitee.com/likeadmin/likeadmin_python?_from=gitee_search', 'description': '🚀🚀🚀likeadmin是一套快速开发管理后台,使用流行的技术栈Python3、FastAPI、TypeScript、Vue3、vite2、Element Plus1.2(ElementUI)。 后台管理系统、后台管理框架、Python管理后台、FastApi管理后台、前后端分离管理后台、Vue3管理后台、Vue'},
     {'title': 'awesome-lib/awesome-python', 'link': 'https://gitee.com/awesome-lib/awesome-python?_from=gitee_search', 'description': 'awesome-python 的中文版'},
     {'title': 'Gitee 极速下载/jackfrued-Python-100-Days', 'link': 'https://gitee.com/mirrors/jackfrued-Python-100-Days?_from=gitee_search', 'description': 'Python - 100天从新手到大师'},
     {'title': 'mktime/python-learn', 'link': 'https://gitee.com/mktime/python-learn?_from=gitee_search', 'description': 'GPT对话,Python基础编程示例:Excel读写追加处理,XML解析、JSON解析、FLV与MP4转换,PyQT界面应用程序开发示例等,https证书到期检测,糗百爬虫,pdf和图片互相转换,socket使用,百度OCR调用例子,IP及端口快速扫描。'},
     {'title': '非空/QrF.Python.FaceRecognition', 'link': 'https://gitee.com/QR/QrF.Python.FaceRecognition?_from=gitee_search', 'description': 'Python 人脸识别技术'},
     {'title': '天勤量化(TqSdk)/tqsdk-python', 'link': 'https://gitee.com/tianqin_quantification_tqsdk/tqsdk-python?_from=gitee_search', 'description': '简单但强大的Python量化开发包'},
     {'title': 'EliteQuant/EliteQuant_Python', 'link': 'https://gitee.com/EliteQuant/EliteQuant_Python?_from=gitee_search', 'description': 'Python量化投资交易平台。基于Python3的多线程并发式高频交易平台, 提供一致的回测和实时交易解决方案。它遵循现代设计模式,例如事件驱动, 服务器/客户端架构和松散耦合的强大稳定的分布式系统。它遵循与其他EliteQuant产品线相同的结构和绩效'},
     {'title': 'OpenHarmony', 'link': 'https://gitee.com/openharmony?_from=gitee_search', 'description': 'OpenHarmony 是开放原子开源基金会(OpenAtom Foundation)旗下开源项目,定位是一款面向全场景的开源分布式操作系统。'},
     {'title': '火鸟/Python开源扫雷游戏PyMine', 'link': 'https://gitee.com/jerryshensjf/PyMine?_from=gitee_search', 'description': 'Python WxPython开源扫雷游戏PyMine为开源扫雷游戏PyMine 使用Python语言和WxPython UI框架。本例移植自本人开源例程JMine 请在程序所在目录使用python PyMine.py启动例程需要先安装Python 3'},
     {'title': '武沛齐/python_course', 'link': 'https://gitee.com/wupeiqi/python_course?_from=gitee_search', 'description': 'Python全栈开发课件 & 源码 & 题目 & 答案'},
     {'title': '唐佐林/Python for OpenHarmony', 'link': 'https://gitee.com/delphi-tang/python-for-hos?_from=gitee_search', 'description': '在鸿蒙设备上使用 Python 编程。'},
     {'title': 'Gitee Community/Python 贪吃蛇魔改大赛', 'link': 'https://gitee.com/gitee-community/Adapted-game?_from=gitee_search', 'description': 'Python 「贪吃蛇」 魔改大赛,是 Gitee 面向 Python 爱好者举办的一场创意编程比赛,旨在鼓励喜爱 Python 编程或有丰富想象力、创新力的小伙伴积极参与开源,并为其提供竞技和展示的舞台,将天马行空的想象化为一行行代码,为经典的「贪吃蛇」小游戏焕发新的生命力!'},
     {'title': 'Python自动化办公社区/python_auto_office', 'link': 'https://gitee.com/zhaofeng092/python_auto_office?_from=gitee_search', 'description': '关注公众号:Python自动化办公社区,发送:1109,领取【47页PPT-Python如何进行自动化办公?】。'},
     {'title': 'keijack/python-simple-http-server', 'link': 'https://gitee.com/keijack/python-simple-http-server?_from=gitee_search', 'description': '一个超轻量级的 HTTP Server,支持线程和协程模式,源生支持 websocket 哦!你也可以非常容易的将其嵌入到 WSGI 与 ASGI 的服务器里。并且支持分布式 Session!'},
     {'title': 'andyham/Python_junior', 'link': 'https://gitee.com/andyham_andy.ham/Python_junior?_from=gitee_search', 'description': 'The foundation of financial risk model programming'},
     {'title': '耿直的小爬虫/Python爬虫', 'link': 'https://gitee.com/testp2y/python_reptilian?_from=gitee_search', 'description': '大数据时代 让爬虫爬取我们所需'},
     {'title': 'vn.py官方/vn.py', 'link': 'https://gitee.com/vnpy/vnpy?_from=gitee_search', 'description': '基于Python的开源量化交易平台开发框架'},
     {'title': '30秒学代码/30-seconds-of-python-code', 'link': 'https://gitee.com/seconds-of-code/30-seconds-of-python-code?_from=gitee_search', 'description': 'Python 语言版的 30 秒学代码'},
     {'title': 'src-openEuler/fuse-python', 'link': 'https://gitee.com/src-openeuler/fuse-python?_from=gitee_search', 'description': 'Python bindings for FUSE - filesystem in userspace.'},
     {'title': 'OpenHarmony', 'link': 'https://gitee.com/openharmony?_from=gitee_search', 'description': 'OpenHarmony 是开放原子开源基金会(OpenAtom Foundation)旗下开源项目,定位是一款面向全场景的开源分布式操作系统。'},
     {'title': 'src-openEuler/python-docker', 'link': 'https://gitee.com/src-openeuler/python-docker?_from=gitee_search', 'description': 'A Python library for the Docker Engine API'},
     {'title': 'keijack/python-eureka-client', 'link': 'https://gitee.com/keijack/python-eureka-client?_from=gitee_search', 'description': '一个 Python 编写的 eureka 客户端,同时支持注册与发现服务,能使得你的代码非常方便地接入 spring cloud 中。'},
     {'title': 'src-openEuler/python-importlib-metadata', 'link': 'https://gitee.com/src-openeuler/python-importlib-metadata?_from=gitee_search', 'description': 'Read metadata from Python packages'},
     {'title': '6tail/lunar-python', 'link': 'https://gitee.com/6tail/lunar-python?_from=gitee_search', 'description': '日历、公历(阳历)、农历(阴历、老黄历)、佛历、道历,支持节假日、星座、儒略日、干支、生肖、节气、节日、彭祖百忌、每日宜忌、吉神宜趋凶煞宜忌、吉神(喜神/福神/财神/阳贵神/阴贵神)方位、胎神方位、冲煞、纳音、星宿、八字、五行、十神、建除十二值星、青龙'},
     {'title': 'src-openEuler/python-texttable', 'link': 'https://gitee.com/src-openeuler/python-texttable?_from=gitee_search', 'description': 'Python module to generate a formatted text table, using ASCII characters'},
     {'title': '百晓通客栈/BXT-AR4Python', 'link': 'https://gitee.com/Lindor_L/BXT-AR4Python?_from=gitee_search', 'description': '百晓通客栈-增强现实开发库(with Python)'},
     {'title': 'src-oepkgs/python-kafka-python', 'link': 'https://gitee.com/src-oepkgs/python-kafka-python?_from=gitee_search', 'description': 'Pure Python client for Apache Kafka'},
     {'title': 'src-oepkgs/python-python-gitlab', 'link': 'https://gitee.com/src-oepkgs/python-python-gitlab?_from=gitee_search', 'description': 'Python module for interacting with the GitLab API'},
     {'title': 'ni1o1/pygeo-tutorial', 'link': 'https://gitee.com/ni1o1/pygeo-tutorial?_from=gitee_search', 'description': 'Tutorial of geospatial data processing using python 用python分析时空数据的教程(in Chinese and English )'},
     {'title': 'src-openEuler/python-meson-python', 'link': 'https://gitee.com/src-openeuler/python-meson-python?_from=gitee_search', 'description': 'Meson Python build backend (PEP 517)'},
     {'title': 'wilson_yin/Zero basics Python', 'link': 'https://gitee.com/wilsonyin/zero-basics-python?_from=gitee_search', 'description': '零基础学Python'}]
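
In the output, the `OpenHarmony` entry shows up once for each of the three pages fetched, so the combined list contains duplicates. Here is a minimal sketch of dropping repeats by `link` while keeping first-seen order. `dedupe_by_link` is a hypothetical helper, not part of the original script, and the sample list is a shortened excerpt of the output.

```python
def dedupe_by_link(results):
    """Return results without repeated 'link' values, keeping the first occurrence."""
    seen = set()
    unique = []
    for repo in results:
        if repo["link"] in seen:
            continue  # this repository was already collected from an earlier page
        seen.add(repo["link"])
        unique.append(repo)
    return unique


# Shortened excerpt of the scraped output; OpenHarmony repeats across pages.
openharmony = {
    "title": "OpenHarmony",
    "link": "https://gitee.com/openharmony?_from=gitee_search",
    "description": "OpenHarmony 是开放原子开源基金会(OpenAtom Foundation)旗下开源项目,定位是一款面向全场景的开源分布式操作系统。",
}
sample = [
    dict(openharmony),
    {
        "title": "编程语言算法集/Python",
        "link": "https://gitee.com/TheAlgorithms/Python?_from=gitee_search",
        "description": "Python 算法集",
    },
    dict(openharmony),
    {
        "title": "vn.py官方/vn.py",
        "link": "https://gitee.com/vnpy/vnpy?_from=gitee_search",
        "description": "基于Python的开源量化交易平台开发框架",
    },
]

unique = dedupe_by_link(sample)
print(len(sample), "->", len(unique))  # 4 -> 3
```

Applied to the script, this would be `dedupe_by_link(fetch_repo_info("PYTHON", 3))`.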