
Scraping the Latest Movie Trailer Information with Python

2024-01-30 | Coders

Hi everyone! Today's example scrapes the latest movie trailer listings from IMDb, with a newly added translation feature.

Install the required libraries:

pip install pyhttpx translate beautifulsoup4

Complete example:

import pyhttpx
from bs4 import BeautifulSoup
from translate import Translator

# Browser-like User-Agent so the request resembles a normal browser visit
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}

# Fetch the IMDb trailers page
session = pyhttpx.HttpSession()
res = session.get(url='https://www.imdb.com/trailers/', headers=headers)

# Parse the HTML and collect the title links on the poster cards
soup = BeautifulSoup(res.text, 'html.parser')
links = soup.find_all('a', class_='ipc-poster-card__title')

# Translate each title from English to Chinese and print it with its link
translator = Translator(from_lang="en", to_lang="zh")
for link in links:
    href = link['href']
    text = link.get_text()
    translated_text = translator.translate(text)
    print(f"Movie title: {text}({translated_text})")
    print(f"Link: https://www.imdb.com{href}")
    print()
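The parsing step of the complete example can be tried offline, without hitting IMDb. The sketch below is not the article's method: it swaps BeautifulSoup for the standard library's html.parser so that it runs with no extra dependencies. The class name `ipc-poster-card__title` comes from the example above and may change whenever IMDb updates its markup; `PosterLinkParser`, `extract_links`, and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class PosterLinkParser(HTMLParser):
    """Collect (title, href) pairs from <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.results = []
        self._href = None  # href of the matching <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        # Skip anchors without the class or without an href instead of raising
        if self.css_class in classes and attr_map.get("href"):
            self._href = attr_map["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html, css_class="ipc-poster-card__title"):
    """Return a list of (title, href) tuples found in the HTML string."""
    parser = PosterLinkParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up HTML that imitates the structure the scraper looks for
sample = (
    '<div><a class="ipc-poster-card__title other" href="/title/tt0000001/">'
    'Sample Movie</a><a href="/x">skip</a></div>'
)
print(extract_links(sample))
```

If `extract_links` returns an empty list for a real page, the class name has probably changed or the page was not served as expected, which is worth checking before calling the translator.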

Output of the complete example:

Movie title: Despicable Me 4(卑鄙的我4)
Link: https://www.imdb.com/title/tt7510222/?ref_=vi_tr_tr_tt_0

Movie title: Bob Marley: One Love(鲍勃·马利:同一份爱)
Link: https://www.imdb.com/title/tt8521778/?ref_=vi_tr_tr_tt_1

Movie title: Avatar: The Last Airbender(降世神通:最後的氣宗)
Link: https://www.imdb.com/title/tt9018736/?ref_=vi_tr_tr_tt_2

Movie title: If(如果)
Link: https://www.imdb.com/title/tt11152168/?ref_=vi_tr_tr_tt_3

Movie title: Immaculate(聖母無染原罪)
Link: https://www.imdb.com/title/tt23137390/?ref_=vi_tr_tr_tt_4

Movie title: Shôgun(Shôgun)
Link: https://www.imdb.com/title/tt2788316/?ref_=vi_tr_tr_tt_5

Movie title: Kung Fu Panda 4(功夫熊猫)
Link: https://www.imdb.com/title/tt21692408/?ref_=vi_tr_tr_tt_6

Movie title: The Walking Dead: The Ones Who Live(行尸走肉:活着的人)
Link: https://www.imdb.com/title/tt9859436/?ref_=vi_tr_tr_tt_7

Movie title: Ripley(蕾普利)
Link: https://www.imdb.com/title/tt11016042/?ref_=vi_tr_tr_tt_8

Movie title: Ghostbusters: Frozen Empire(捉鬼敢死队:冰封帝国)
Link: https://www.imdb.com/title/tt21235248/?ref_=vi_tr_tr_tt_9

Movie title: Constellation(配置)
Link: https://www.imdb.com/title/tt19395018/?ref_=vi_tr_tr_tt_10

Movie title: Tracker(轨道)
Link: https://www.imdb.com/title/tt13875494/?ref_=vi_tr_tr_tt_11

Movie title: Lisa Frankenstein(Lisa Frankenstein)
Link: https://www.imdb.com/title/tt21188080/?ref_=vi_tr_tr_tt_12

Movie title: Godzilla x Kong: The New Empire(哥斯拉x金刚:新帝国)
Link: https://www.imdb.com/title/tt14539740/?ref_=vi_tr_tr_tt_13

Movie title: The Gentlemen(The Gentlemen)
Link: https://www.imdb.com/title/tt13210838/?ref_=vi_tr_tr_tt_14

Movie title: Furiosa: A Mad Max Saga(Furiosa :A Mad Max Saga)
Link: https://www.imdb.com/title/tt12037194/?ref_=vi_tr_tr_tt_15

Movie title: The Penguin(企鹅)
Link: https://www.imdb.com/title/tt15435876/?ref_=vi_tr_tr_tt_16

Movie title: Shaitaan(Shaitaan)
Link: https://www.imdb.com/title/tt27744786/?ref_=vi_tr_tr_tt_17

Movie title: Inside Out 2(由内而外2)
Link: https://www.imdb.com/title/tt22022452/?ref_=vi_tr_tr_tt_18

Movie title: Kingdom of the Planet of the Apes(人猿星球王国)
Link: https://www.imdb.com/title/tt11389872/?ref_=vi_tr_tr_tt_19

Movie title: Abigail(阿比盖尔)
Link: https://www.imdb.com/title/tt27489557/?ref_=vi_tr_tr_tt_20

Movie title: Fallout(尘降)
Link: https://www.imdb.com/title/tt12637874/?ref_=vi_tr_tr_tt_21

Movie title: Sleeping Dogs(睡狗)
Link: https://www.imdb.com/title/tt8542964/?ref_=vi_tr_tr_tt_22

Movie title: Imaginary(虚拟)
Link: https://www.imdb.com/title/tt26658104/?ref_=vi_tr_tr_tt_23

Movie title: The Garfield Movie(加菲猫电影)
Link: https://www.imdb.com/title/tt5779228/?ref_=vi_tr_tr_tt_24

Movie title: The New Look(全新造型)
Link: https://www.imdb.com/title/tt18177528/?ref_=vi_tr_tr_tt_25

Movie title: Back to Black(本真之黑)
Link: https://www.imdb.com/title/tt21261712/?ref_=vi_tr_tr_tt_26

Movie title: Beverly Hills Cop: Axel F(比佛利山庄警察:Axel F)
Link: https://www.imdb.com/title/tt3083016/?ref_=vi_tr_tr_tt_27

Movie title: Cabrini(Cabrini)
Link: https://www.imdb.com/title/tt14351082/?ref_=vi_tr_tr_tt_28

Movie title: Rebel Moon - Part Two: The Scargiver(叛军月亮-第二部分:疤痕者)
Link: https://www.imdb.com/title/tt23137904/?ref_=vi_tr_tr_tt_29

Movie title: Mary & George(Mary & George)
Link: https://www.imdb.com/title/tt26246248/?ref_=vi_tr_tr_tt_30

Movie title: Teri Baaton Mein Aisa Uljha Jiya(Teri Baaton Mein Aisa Uljha Jiya)
Link: https://www.imdb.com/title/tt27459160/?ref_=vi_tr_tr_tt_31

Movie title: One Day(1天)
Link: https://www.imdb.com/title/tt16283804/?ref_=vi_tr_tr_tt_32

Movie title: The First Omen(他也成为第一个赢得该奖杯的北爱尔兰球手)
Link: https://www.imdb.com/title/tt5672290/?ref_=vi_tr_tr_tt_33

Movie title: KD - The Devil(KD - The Devil)
Link: https://www.imdb.com/title/tt15295368/?ref_=vi_tr_tr_tt_34

Movie title: The Sympathizer(同情者)
Link: https://www.imdb.com/title/tt14404618/?ref_=vi_tr_tr_tt_35

Movie title: Elsbeth(Elsbeth)
Link: https://www.imdb.com/title/tt26591110/?ref_=vi_tr_tr_tt_36

Movie title: MaXXXine(MaXXXine)
Link: https://www.imdb.com/title/tt22048412/?ref_=vi_tr_tr_tt_37
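Two things stand out in the output: several titles come back from the translator unchanged (Shôgun, Cabrini, Elsbeth), and every link carries a `ref_` query parameter. The sketch below shows one possible way to post-process the results. It assumes that `ref_` is only a referral tag and that the title page loads without it; `clean_link` and `label_title` are hypothetical helper names, not part of the original script.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

BASE_URL = "https://www.imdb.com"


def clean_link(href):
    """Resolve href against BASE_URL and drop the query string and fragment."""
    parts = urlsplit(urljoin(BASE_URL, href))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def label_title(title, translated):
    """Show the translation only when it actually differs from the original title."""
    if not translated or translated.strip().casefold() == title.strip().casefold():
        return f"{title} (no translation)"
    return f"{title} ({translated})"


# Rows copied from the output above: (title, translated title, href)
rows = [
    ("Despicable Me 4", "卑鄙的我4", "/title/tt7510222/?ref_=vi_tr_tr_tt_0"),
    ("Cabrini", "Cabrini", "/title/tt14351082/?ref_=vi_tr_tr_tt_28"),
]
for title, translated, href in rows:
    print(label_title(title, translated))
    print(clean_link(href))
    print()
```

This check only catches results that are identical to the input. Near-copies such as "Furiosa :A Mad Max Saga" and outright mistranslations such as the one for "The First Omen" still pass through and need a manual look.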

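A brief overview of pyhttpx parameters:

  • url: the URL to request.
  • method: the HTTP method, such as GET or POST.
  • params: URL query parameters; a dict or a string.
  • headers: request headers; a dict.
  • cookies: cookies to send; a dict.
  • data: data sent in the request body; a string or a dict.
  • json: request body data sent as JSON; a dict.
  • files: files to upload; a dict.
  • timeout: request timeout; an int or a float.
  • proxies: proxy server settings; a dict.
  • verify: whether to verify the SSL certificate.
  • cert: path to the client certificate.
  • allow_redirects: whether to follow redirects.
  • stream: whether to stream the response.
  • auth: HTTP authentication; a tuple (username, password).
  • trust_env: whether to use environment variables for proxy and SSL configuration.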
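To make the params entry concrete, the sketch below shows what passing a dict of URL parameters amounts to: the key-value pairs are URL-encoded and appended to the address as a query string. It uses only the standard library's urllib.parse and does not call pyhttpx, whose own encoding may differ in detail. `build_url` is a hypothetical helper, and the search URL and its `q` and `s` parameters are assumptions chosen purely for illustration.

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Append params to base as a URL-encoded query string."""
    if not params:
        return base
    # Use "&" if the base URL already has a query string, otherwise start one
    separator = "&" if "?" in base else "?"
    return f"{base}{separator}{urlencode(params)}"


# Illustrative URL and parameter names, not taken from the original script
print(build_url("https://www.imdb.com/find/", {"q": "Kung Fu Panda 4", "s": "tt"}))
```

Passing the dict through params instead of building the string by hand leaves the escaping of spaces and special characters to the library.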