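Hi everyone! Today's post shows how to scrape the latest movie trailer listings from IMDb, now with an added translation feature.
Install the required libraries: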
pip install pyhttpx translate beautifulsoup4
Complete example:
import pyhttpx
from bs4 import BeautifulSoup
from translate import Translator

# Send a browser-like User-Agent so the request looks like a normal page visit
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}

# Fetch the IMDb trailers page
session = pyhttpx.HttpSession()
res = session.get(url='https://www.imdb.com/trailers/', headers=headers)

# Each trailer card title is an <a> tag with this class
soup = BeautifulSoup(res.text, 'html.parser')
links = soup.find_all('a', class_='ipc-poster-card__title')

# Translate titles from English to Chinese
translator = Translator(from_lang="en", to_lang="zh")

for link in links:
    href = link['href']      # relative path such as /title/tt.../
    text = link.get_text()   # original English title
    translated_text = translator.translate(text)
    print(f"Title: {text} ({translated_text})")
    print(f"Link: https://www.imdb.com{href}")
    print()
Output (blank lines between entries omitted):
Title: Despicable Me 4 (卑鄙的我4)
Link: https://www.imdb.com/title/tt7510222/?ref_=vi_tr_tr_tt_0
Title: Bob Marley: One Love (鲍勃·马利:同一份爱)
Link: https://www.imdb.com/title/tt8521778/?ref_=vi_tr_tr_tt_1
Title: Avatar: The Last Airbender (降世神通:最後的氣宗)
Link: https://www.imdb.com/title/tt9018736/?ref_=vi_tr_tr_tt_2
Title: If (如果)
Link: https://www.imdb.com/title/tt11152168/?ref_=vi_tr_tr_tt_3
Title: Immaculate (聖母無染原罪)
Link: https://www.imdb.com/title/tt23137390/?ref_=vi_tr_tr_tt_4
Title: Shôgun (Shôgun)
Link: https://www.imdb.com/title/tt2788316/?ref_=vi_tr_tr_tt_5
Title: Kung Fu Panda 4 (功夫熊猫)
Link: https://www.imdb.com/title/tt21692408/?ref_=vi_tr_tr_tt_6
Title: The Walking Dead: The Ones Who Live (行尸走肉:活着的人)
Link: https://www.imdb.com/title/tt9859436/?ref_=vi_tr_tr_tt_7
Title: Ripley (蕾普利)
Link: https://www.imdb.com/title/tt11016042/?ref_=vi_tr_tr_tt_8
Title: Ghostbusters: Frozen Empire (捉鬼敢死队:冰封帝国)
Link: https://www.imdb.com/title/tt21235248/?ref_=vi_tr_tr_tt_9
Title: Constellation (配置)
Link: https://www.imdb.com/title/tt19395018/?ref_=vi_tr_tr_tt_10
Title: Tracker (轨道)
Link: https://www.imdb.com/title/tt13875494/?ref_=vi_tr_tr_tt_11
Title: Lisa Frankenstein (Lisa Frankenstein)
Link: https://www.imdb.com/title/tt21188080/?ref_=vi_tr_tr_tt_12
Title: Godzilla x Kong: The New Empire (哥斯拉x金刚:新帝国)
Link: https://www.imdb.com/title/tt14539740/?ref_=vi_tr_tr_tt_13
Title: The Gentlemen (The Gentlemen)
Link: https://www.imdb.com/title/tt13210838/?ref_=vi_tr_tr_tt_14
Title: Furiosa: A Mad Max Saga (Furiosa :A Mad Max Saga)
Link: https://www.imdb.com/title/tt12037194/?ref_=vi_tr_tr_tt_15
Title: The Penguin (企鹅)
Link: https://www.imdb.com/title/tt15435876/?ref_=vi_tr_tr_tt_16
Title: Shaitaan (Shaitaan)
Link: https://www.imdb.com/title/tt27744786/?ref_=vi_tr_tr_tt_17
Title: Inside Out 2 (由内而外2)
Link: https://www.imdb.com/title/tt22022452/?ref_=vi_tr_tr_tt_18
Title: Kingdom of the Planet of the Apes (人猿星球王国)
Link: https://www.imdb.com/title/tt11389872/?ref_=vi_tr_tr_tt_19
Title: Abigail (阿比盖尔)
Link: https://www.imdb.com/title/tt27489557/?ref_=vi_tr_tr_tt_20
Title: Fallout (尘降)
Link: https://www.imdb.com/title/tt12637874/?ref_=vi_tr_tr_tt_21
Title: Sleeping Dogs (睡狗)
Link: https://www.imdb.com/title/tt8542964/?ref_=vi_tr_tr_tt_22
Title: Imaginary (虚拟)
Link: https://www.imdb.com/title/tt26658104/?ref_=vi_tr_tr_tt_23
Title: The Garfield Movie (加菲猫电影)
Link: https://www.imdb.com/title/tt5779228/?ref_=vi_tr_tr_tt_24
Title: The New Look (全新造型)
Link: https://www.imdb.com/title/tt18177528/?ref_=vi_tr_tr_tt_25
Title: Back to Black (本真之黑)
Link: https://www.imdb.com/title/tt21261712/?ref_=vi_tr_tr_tt_26
Title: Beverly Hills Cop: Axel F (比佛利山庄警察:Axel F)
Link: https://www.imdb.com/title/tt3083016/?ref_=vi_tr_tr_tt_27
Title: Cabrini (Cabrini)
Link: https://www.imdb.com/title/tt14351082/?ref_=vi_tr_tr_tt_28
Title: Rebel Moon - Part Two: The Scargiver (叛军月亮-第二部分:疤痕者)
Link: https://www.imdb.com/title/tt23137904/?ref_=vi_tr_tr_tt_29
Title: Mary & George (Mary & George)
Link: https://www.imdb.com/title/tt26246248/?ref_=vi_tr_tr_tt_30
Title: Teri Baaton Mein Aisa Uljha Jiya (Teri Baaton Mein Aisa Uljha Jiya)
Link: https://www.imdb.com/title/tt27459160/?ref_=vi_tr_tr_tt_31
Title: One Day (1天)
Link: https://www.imdb.com/title/tt16283804/?ref_=vi_tr_tr_tt_32
Title: The First Omen (他也成为第一个赢得该奖杯的北爱尔兰球手)
Link: https://www.imdb.com/title/tt5672290/?ref_=vi_tr_tr_tt_33
Title: KD - The Devil (KD - The Devil)
Link: https://www.imdb.com/title/tt15295368/?ref_=vi_tr_tr_tt_34
Title: The Sympathizer (同情者)
Link: https://www.imdb.com/title/tt14404618/?ref_=vi_tr_tr_tt_35
Title: Elsbeth (Elsbeth)
Link: https://www.imdb.com/title/tt26591110/?ref_=vi_tr_tr_tt_36
Title: MaXXXine (MaXXXine)
Link: https://www.imdb.com/title/tt22048412/?ref_=vi_tr_tr_tt_37
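Two things stand out in the results above: every link carries a ref_ tracking suffix, and the translation step depends on an external service that may fail or return nothing useful. The following is a minimal sketch of how the results could be post-processed, using only the standard library. The helper names clean_link and safe_translate are hypothetical and not part of pyhttpx or translate; the sketch assumes the translator is passed in as a plain callable (for example translator.translate).

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

BASE_URL = "https://www.imdb.com"


def clean_link(href, base=BASE_URL):
    """Build an absolute URL from href and drop its query string and fragment."""
    absolute = urljoin(base, href)
    parts = urlsplit(absolute)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def safe_translate(text, translate_fn, cache):
    """Translate text with translate_fn, falling back to the original text.

    The fallback is used when translate_fn raises or returns an empty result.
    Results (including fallbacks) are stored in cache, so a repeated title
    is not sent to the translation service again.
    """
    if text in cache:
        return cache[text]
    try:
        result = translate_fn(text)
    except Exception:
        result = None
    if isinstance(result, str) and result.strip():
        translated = result.strip()
    else:
        translated = text
    cache[text] = translated
    return translated


if __name__ == "__main__":
    # Stand-in translator for demonstration only; a real run would pass
    # translator.translate from the example above.
    def fake_translate(text):
        if text == "If":
            return "如果"
        raise RuntimeError("service unavailable")

    cache = {}
    print(safe_translate("If", fake_translate, cache))       # 如果
    print(safe_translate("Elsbeth", fake_translate, cache))  # Elsbeth
    print(clean_link("/title/tt7510222/?ref_=vi_tr_tr_tt_0"))
    # https://www.imdb.com/title/tt7510222/
```

Note that this only guards against errors and empty results. It cannot detect a translation that is fluent but wrong, such as the one returned for "The First Omen" above.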
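Brief summary of pyhttpx parameters:
url: the URL to request.
method: the HTTP method, such as GET or POST.
params: URL query parameters, given as a dict or a string.
headers: request headers, given as a dict.
cookies: cookies to send, given as a dict.
data: data to send in the request body, given as a string or a dict.
json: request body data to send as JSON, given as a dict.
files: files to upload, given as a dict.
timeout: request timeout, given as an int or a float.
proxies: proxy server settings, given as a dict.
verify: whether to verify the SSL certificate.
cert: path to the client certificate.
allow_redirects: whether to follow redirects.
stream: whether to stream the response.
auth: HTTP authentication, given as a (username, password) tuple.
trust_env: whether to use environment variables for proxy and SSL configuration.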
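To make the params entry more concrete, here is a small standard-library sketch of what "URL parameters as a dict or a string" amounts to: the values end up encoded in the query string of the request URL. This illustrates the general idea only and is not pyhttpx's actual implementation; add_params is a hypothetical helper and the example.com URL is a placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_params(url, params):
    """Return url with params appended to its query string.

    params may be a dict or an already-encoded string such as "q=If&s=tt".
    Any query parameters already present in url are kept.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if isinstance(params, str):
        query += parse_qsl(params, keep_blank_values=True)
    else:
        query += list(params.items())
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


if __name__ == "__main__":
    # Placeholder URL; spaces in values are encoded as "+"
    print(add_params("https://example.com/search", {"q": "Kung Fu Panda 4"}))
    # https://example.com/search?q=Kung+Fu+Panda+4
    print(add_params("https://example.com/search?x=1", "q=If&s=tt"))
    # https://example.com/search?x=1&q=If&s=tt
```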