Help request: extracting hotel detail information programmatically

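st800820 posted on 2025-5-20 10:23:01

Last edited by st800820 on 2025-5-20 10:24

I need to call hotels to introduce our product and look for partnerships, but collecting phone numbers by hand is very tedious. Could the experts here tell me whether this can be automated with a program? Thanks! The details are below. If this breaks the forum rules, please let me know and I will delete the post. :handshake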

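1. Open the Chrome browser. Once it has fully loaded, type the name of the city to collect into the hotel search box and click Search.
2. Show a pop-up asking whether the user needs to log in on the current page. When the user is done, they click "Continue scraping" and the script carries on with the steps below.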

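3. After the user clicks "Continue scraping", scroll down the way a person would to load the list, until the "Search more hotels" button appears at the bottom:
XPath for "Search more hotels": //*[@id="ibu_hotel_container"]/div/section/div/ul/div/div/span
CSS selector for "Search more hotels": #ibu_hotel_container > div > section > div.list-content > ul > div.list-btn-more > div > span
4. Click that button, then scroll down again to load more of the list. When the "Search more hotels" button is detected again, click it and keep scrolling.
5. Repeat the click-and-scroll cycle until the "Search more hotels" button can no longer be found. Then show a pop-up saying: Hotel list fully loaded, click "Scrape hotel details" to continue. The pop-up shows a 120-second countdown. If the button is not clicked within those 2 minutes, move on to the next step automatically.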
6. Open the hotels in the list one by one.
XPath for a hotel in the list: //*[@id="ibu_hotel_container"]/div/section/div/ul/li/div/div/div/div/div/span
or the CSS selector: #ibu_hotel_container > div > section > div.list-content > ul > li:nth-child(5) > div > div.right-card > div.hotel-info > div.hotel-head.mgb-6 > div > span.hotelName
7. On the newly opened hotel detail page, extract the required information, mainly: hotel name, address, opening date, number of rooms, and hotel phone number.
XPaths of the relevant page elements:
Hotel name: //*[@id="ibu-hotel-detail-head"]/div/div/div/h1
Address: //*[@id="ibu-hotel-detail-head"]/div/div/div/div/span/span
Opening date: //*[@id="detail-hotel-description"]/div/div/div/ul/li
Number of rooms: //*[@id="detail-hotel-description"]/div/div/div/ul/li
Hotel phone: //*[@id="detail-hotel-description"]/div/div/div/div/div/div
CSS selectors of the relevant page elements:
Hotel name: #ibu-hotel-detail-head > div.detail-headline_container > div.detail-headline_base > div.detail-headline_title > h1
Address: #ibu-hotel-detail-head > div.detail-headline_container > div.detail-headline_base > div.detail-headline_address > div.detail-headline_position > span > span.detail-headline_position_text
Opening date: #detail-hotel-description > div.m-hotel-desc > div > div.m-hoteldesc_basic.basicInfo > ul > li:nth-child(1)
Number of rooms: #detail-hotel-description > div.m-hotel-desc > div > div.m-hoteldesc_basic.basicInfo > ul > li:nth-child(2)
Hotel phone: #detail-hotel-description > div.m-hotel-desc > div > div.m-hoteldesc_basic.basicInfo > div > div:nth-child(2) > div
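8. Write all collected information to a CSV file named after the scrape start time.
9. After a hotel has been scraped, close its detail page and open the next hotel in the list, then repeat step 7 (extract the hotel details) and step 8 (append the data to the CSV file).
10. Once every hotel in the list has been scraped, show a pop-up saying the scrape is complete, with details such as elapsed time and the number of hotels scraped. The pop-up has an "Exit" button that closes the program when clicked.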
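
A minimal sketch of steps 3 to 9, assuming `driver` is a Selenium WebDriver that already shows the search results. The selectors are shortened forms of the ones quoted above and are untested against the live page; `li:nth-child(5)` is dropped so that every list item matches rather than only the fifth. It also assumes that clicking a hotel name opens the detail page in a new tab. The function names and the `hotels_...csv` file name pattern are made up for illustration.

```python
import csv
import time
from datetime import datetime
from pathlib import Path

CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR

# Shortened forms of the selectors quoted in the post (assumed, unverified).
MORE_BTN = "#ibu_hotel_container div.list-btn-more span"
HOTEL_LINKS = "#ibu_hotel_container ul > li span.hotelName"
BASIC = "#detail-hotel-description div.m-hoteldesc_basic.basicInfo"
FIELDS = {
    "name": "#ibu-hotel-detail-head div.detail-headline_title > h1",
    "address": "#ibu-hotel-detail-head span.detail-headline_position_text",
    "opened": BASIC + " > ul > li:nth-child(1)",
    "rooms": BASIC + " > ul > li:nth-child(2)",
    "phone": BASIC + " > div > div:nth-child(2) > div",
}


def csv_name(started=None):
    """Step 8: file name built from the scrape start time."""
    started = started or datetime.now()
    return started.strftime("hotels_%Y%m%d_%H%M%S.csv")


def load_full_list(driver, pause_s=1.0, idle_limit=3, max_rounds=500):
    """Steps 3-5: scroll and click "Search more hotels" until it is gone."""
    clicks = idle = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_s)
        shown = [b for b in driver.find_elements(CSS, MORE_BTN) if b.is_displayed()]
        if shown:
            shown[0].click()
            clicks, idle = clicks + 1, 0
        else:
            idle += 1
            if idle >= idle_limit:  # several scrolls in a row with no button
                break
    return clicks


def read_detail(driver):
    """Step 7: read the five fields; a missing field becomes an empty string."""
    row = {}
    for key, selector in FIELDS.items():
        found = driver.find_elements(CSS, selector)
        row[key] = found[0].text.strip() if found else ""
    return row


def append_row(csv_path, row):
    """Step 8: append one hotel, writing the header only for a new file."""
    is_new = not Path(csv_path).exists()
    # utf-8-sig helps Excel display Chinese text correctly
    with open(csv_path, "a", newline="", encoding="utf-8-sig") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(FIELDS))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def scrape_all(driver, csv_path, pause_s=1.0):
    """Steps 6 and 9: open each hotel, save its row, close the tab, go back."""
    list_tab = driver.current_window_handle
    count = 0
    for link in driver.find_elements(CSS, HOTEL_LINKS):
        before = set(driver.window_handles)
        link.click()
        time.sleep(pause_s)
        new_tabs = [h for h in driver.window_handles if h not in before]
        if not new_tabs:
            continue  # the click did not open a detail tab
        driver.switch_to.window(new_tabs[0])
        append_row(csv_path, read_detail(driver))
        driver.close()
        driver.switch_to.window(list_tab)
        count += 1
    return count
```

With a real driver the calls would be `load_full_list(driver)` followed by `scrape_all(driver, csv_name())`. Writing each row as soon as it is read means a crash halfway through still leaves the hotels scraped so far in the file.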
Environment notes:
11. Chrome has been added to the environment variables. Paths:

"C:\chromedriver-win64\chromedriver.exe"
"C:\Program Files\Google\Chrome\Application\chrome.exe"
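Additional notes:
12. Logging, to help troubleshoot problems and keep a complete record of every operation.
13. Because the page changes while the script runs, please design a floating window that shows what is happening at each stage and, by interacting with the user, makes the script more reliable.
14. I am a beginner, so I am not sure whether the XPaths and CSS selectors above are correct.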
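
A rough sketch of items 11 to 13, assuming Selenium 4 and the two paths listed above. `make_driver`, `setup_logging`, and `confirm_or_timeout` are hypothetical helper names. The window is a plain tkinter dialog that blocks until the user clicks or the countdown ends, which is one simple way to get the "floating prompt with a 120-second timeout" behaviour; it is not the only possible design.

```python
import logging

CHROMEDRIVER = r"C:\chromedriver-win64\chromedriver.exe"
CHROME_BINARY = r"C:\Program Files\Google\Chrome\Application\chrome.exe"


def make_driver():
    """Item 11: start a visible Chrome (not headless, since login is manual)."""
    # Imported here so the other helpers load even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.binary_location = CHROME_BINARY
    options.add_argument("--start-maximized")
    service = Service(executable_path=CHROMEDRIVER)
    return webdriver.Chrome(service=service, options=options)


def setup_logging(log_path):
    """Item 12: log every step to both a file and the console."""
    logger = logging.getLogger("hotel_scraper")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called twice
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.FileHandler(log_path, encoding="utf-8"),
                    logging.StreamHandler()):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


def confirm_or_timeout(message, seconds=120, button_text="Continue"):
    """Item 13: topmost window with a countdown.

    Returns True if the button was clicked, False if the time ran out
    or the window was closed.
    """
    import tkinter as tk  # imported here so logging works without a display

    root = tk.Tk()
    root.title("Scraper")
    root.attributes("-topmost", True)  # keep it floating above the browser
    label = tk.Label(root, text="", wraplength=320, padx=12, pady=8)
    label.pack()
    state = {"clicked": False, "job": None}

    def on_click():
        state["clicked"] = True
        if state["job"] is not None:
            root.after_cancel(state["job"])
        root.destroy()

    def tick(left):
        if left <= 0:
            root.destroy()
            return
        label.config(text=f"{message}\n{left}s left")
        state["job"] = root.after(1000, tick, left - 1)

    tk.Button(root, text=button_text, command=on_click).pack(pady=8)
    tick(seconds)
    root.mainloop()
    return state["clicked"]
```

A script could call `confirm_or_timeout("Log in if needed, then continue")` before loading the list, and log the returned value so the record shows whether the user clicked or the countdown expired.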

akraja posted on 2025-5-23 16:32:11

Don’t overload servers or scrape sites that prohibit it

Requirements
Python 3.x

Google Chrome

ChromeDriver (same version as your Chrome browser)

Install Python packages:

pip install selenium

Sample Code: Scraping Hotel Info via Google Search

import time
from urllib.parse import quote_plus

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Set up Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless")  # Run in the background
chrome_options.add_argument("--disable-gpu")

# Provide the path to chromedriver
service = Service(executable_path='path/to/chromedriver')  # Update this

# Initialize the browser
driver = webdriver.Chrome(service=service, options=chrome_options)

# Replace with your target hotel
hotel_query = "Taj Mahal Hotel Mumbai"

# Google Search URL (the query is URL-encoded so spaces are handled)
driver.get(f"https://www.google.com/search?q={quote_plus(hotel_query)}")

time.sleep(2)  # Wait for the page to load

# Placeholder XPaths: the last three are identical and match the same
# element, so replace each with a locator specific to that field.
fields = {
    "Hotel Name": '//div[@data-attrid="title"]/span',
    "Address": '//span/following-sibling::span',
    "Contact Number": '//span/following-sibling::span',
    "Working Hours": '//span/following-sibling::span',
}

# Look up each field separately so one missing element does not hide the rest
for label, xpath in fields.items():
    try:
        print(f"{label}:", driver.find_element(By.XPATH, xpath).text)
    except NoSuchElementException:
        print(f"{label}: could not be found")

# Close the browser
driver.quit()
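
The advice at the top of this reply (do not overload servers or scrape sites that prohibit it) can be partly put into code. The sketch below uses only the standard library: `urllib.robotparser` to check robots.txt rules and a random pause between page loads. `is_allowed` and `polite_pause` are hypothetical helper names, and the 2 to 5 second range is an arbitrary assumption, not a recommended value for any particular site.

```python
import random
import time
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


def polite_pause(min_s=2.0, max_s=5.0):
    """Sleep for a random interval so page loads are spaced out."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay
```

For a live site, the robots.txt text would be downloaded first and passed in as `text.splitlines()`, and `polite_pause()` would be called before each `driver.get(...)` or click that loads a new page. robots.txt is only one signal; a site's terms of service may still prohibit scraping.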

st800820 posted on 2025-5-26 08:50:35

akraja posted on 2025-5-23 16:32
Don’t overload servers or scrape sites that prohibit it

Requirements

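Thanks. Does it run on your machine? I can't get it to run here. I previously had ChatGPT generate and revise code over and over, and it never reached a usable result. :'(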

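mikiwei posted on 2025-5-28 11:18:33

I once saw someone on Twitter build something similar for his girlfriend, but he did it through Amap (高德地图) or Baidu Maps rather than through a browser.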
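
A very rough sketch of the map-service route, written from memory of Amap's Web Service place text search and therefore to be checked against the official documentation before use. The endpoint, the parameter names (`key`, `keywords`, `city`, `citylimit`, `offset`, `page`), and the response fields (`pois`, `name`, `address`, `tel`) are all assumptions here, as are the helper names. An API key from the provider would be required.

```python
from urllib.parse import urlencode

# Assumed endpoint of Amap's place text search; verify before relying on it.
AMAP_PLACE_TEXT = "https://restapi.amap.com/v3/place/text"


def build_url(api_key, city, keywords="酒店", page=1, page_size=20):
    """Build the request URL for one page of hotel results in a city."""
    params = {
        "key": api_key,
        "keywords": keywords,
        "city": city,
        "citylimit": "true",  # assumed flag: restrict results to that city
        "offset": page_size,
        "page": page,
    }
    return AMAP_PLACE_TEXT + "?" + urlencode(params)


def _as_text(value):
    """Missing fields may come back as an empty list instead of a string."""
    return value if isinstance(value, str) else ""


def parse_pois(payload):
    """Pull name, address, and phone out of a decoded JSON response."""
    rows = []
    for poi in payload.get("pois", []):
        rows.append({
            "name": _as_text(poi.get("name")),
            "address": _as_text(poi.get("address")),
            "phone": _as_text(poi.get("tel")),
        })
    return rows
```

The URL from `build_url` would be fetched with any HTTP client, the body decoded with `json.loads`, and the result passed to `parse_pois`. Compared with driving a browser, this avoids fragile selectors, but fields such as opening date and room count may not be available from a map service.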

小猫小狗 posted on 2025-5-30 13:24:13

What is this for?

st800820 posted on 2025-6-5 10:02:26

小猫小狗 posted on 2025-5-30 13:24
What is this for?

Cold-calling to sell a product; it is for finding customers.

小猫小狗 posted on 2025-6-6 09:07:40

st800820 posted on 2025-6-5 10:02
Cold-calling to sell a product; it is for finding customers.

:handshake


爱好论坛's Archiver
