Help request: extracting hotel details programmatically

st800820 · Posted on 2025-5-20 10:23:01

This post was last edited by st800820 on 2025-5-20 10:24

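I've recently needed to phone hotels to introduce our product and look for partnerships, but collecting the phone numbers by hand is very tedious. Could anyone here suggest a way to do this programmatically? Thanks! The description is below. If this breaks the forum rules, please let me know and I'll delete the post.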

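1. Open the Chrome browser. Once it has fully loaded, type the name of the city to collect into the hotel search box and click Search.
2. Show a dialog asking the user whether the current page needs a login. Once the login is done, the user clicks "Continue scraping" and the script goes on with the steps below.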

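3. After detecting the click on "Continue scraping", simulate a person scrolling down with the mouse to load the list, until the "Search more hotels" button appears at the bottom:
"Search more hotels" XPath: //*[@id="ibu_hotel_container"]/div/section/div[2]/ul/div[2]/div/span
"Search more hotels" CSS selector: #ibu_hotel_container > div > section > div.list-content > ul > div.list-btn-more > div > span
4. Click the button, then scroll down again to load more of the list. When the "Search more hotels" button is detected again, click it again and keep scrolling.
5. Repeat the click-and-scroll cycle until the "Search more hotels" button can no longer be detected, then show a dialog: "Hotel list fully loaded. Click 'Scrape hotel details' to continue." The dialog shows a 120-second countdown; if the button is not clicked within 2 minutes, the script moves on to the next step automatically.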
6. Open the hotels in the list one by one.
XPath of a hotel in the list: //*[@id="ibu_hotel_container"]/div/section/div[2]/ul/li[5]/div/div[2]/div[1]/div[1]/div/span[1]
or CSS selector: #ibu_hotel_container > div > section > div.list-content > ul > li:nth-child(5) > div > div.right-card > div.hotel-info > div.hotel-head.mgb-6 > div > span.hotelName
7. On the newly opened hotel detail page, extract the required information, mainly: hotel name, address, opening date, number of rooms, hotel phone.
XPath of the relevant page elements:
Hotel name: //*[@id="ibu-hotel-detail-head"]/div[1]/div[1]/div[1]/h1
Address: //*[@id="ibu-hotel-detail-head"]/div[1]/div[1]/div[2]/div[1]/span/span[1]
Opening date: //*[@id="detail-hotel-description"]/div[2]/div/div[1]/ul/li[1]
Number of rooms: //*[@id="detail-hotel-description"]/div[2]/div/div[1]/ul/li[3]
Hotel phone: //*[@id="detail-hotel-description"]/div[2]/div/div[1]/div/div[2]/div
CSS selectors of the relevant page elements:
Hotel name: #ibu-hotel-detail-head > div.detail-headline_container > div.detail-headline_base > div.detail-headline_title > h1
Address: #ibu-hotel-detail-head > div.detail-headline_container > div.detail-headline_base > div.detail-headline_address > div.detail-headline_position > span > span.detail-headline_position_text
Opening date: #detail-hotel-description > div.m-hotel-desc > div > div.m-hoteldesc_basic.basicInfo > ul > li:nth-child(1)
Number of rooms: #detail-hotel-description > div.m-hotel-desc > div > div.m-hoteldesc_basic.basicInfo > ul > li:nth-child(2)
Hotel phone: #detail-hotel-description > div.m-hotel-desc > div > div.m-hoteldesc_basic.basicInfo > div > div:nth-child(2) > div
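8. Write all the collected information to a CSV file named after the time the run started.
9. After a hotel has been scraped, close its detail page and open the next hotel in the list, then repeat step 7 (scrape the details) and step 8 (append the scraped data to the file).
10. Once every hotel in the list has been scraped, show a dialog saying the run is complete, with details such as elapsed time and number of hotels scraped. The dialog has an "Exit" button that closes the program.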
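
Steps 3 to 6 and 9 could be sketched roughly as follows, assuming Selenium 4 and assuming the selectors in this post are right (they have not been checked against the live page). `load_full_list` and `scrape_in_new_tab` are made-up helper names, the shortened CSS selector is only a guess derived from the one quoted in step 3, and the locator strategy is passed as the plain string "css selector", which is the value behind Selenium's `By.CSS_SELECTOR`. The login dialog, the countdown and the floating window are left out.

```python
import time

# Shortened from the selector quoted in step 3; unverified against the live page.
MORE_BUTTON_CSS = "#ibu_hotel_container div.list-btn-more span"


def load_full_list(driver, pause=1.5, max_rounds=500):
    """Scroll and click "Search more hotels" until the list stops growing.

    Returns how many times the button was clicked.
    """
    clicks = 0
    last_height = -1
    for _ in range(max_rounds):
        # Jump to the bottom so the page lazy-loads the next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        # find_elements returns an empty list instead of raising when nothing matches.
        buttons = driver.find_elements("css selector", MORE_BUTTON_CSS)
        if buttons:
            buttons[0].click()
            clicks += 1
            continue
        height = driver.execute_script("return document.body.scrollHeight;")
        if height == last_height:
            break  # no button and no new content: treat the list as complete
        last_height = height
    return clicks


def scrape_in_new_tab(driver, link, scrape):
    """Click a list entry, run scrape(driver) in the tab it opens, then close that tab.

    Returns whatever scrape returns, or None if the click opened no new tab.
    A real page may need a short wait between the click and the handle check.
    """
    list_tab = driver.current_window_handle
    before = set(driver.window_handles)
    link.click()
    new_tabs = [h for h in driver.window_handles if h not in before]
    if not new_tabs:
        return None
    driver.switch_to.window(new_tabs[0])
    try:
        return scrape(driver)
    finally:
        # Always close the detail tab and return to the list, even on errors.
        driver.close()
        driver.switch_to.window(list_tab)
```

With a real browser, `driver` would be the `webdriver.Chrome(...)` instance, `link` one hotel-name element from the list, and `scrape` any function that reads the detail page and returns a row.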
About my setup:
11. Chrome has been added to the PATH environment variable. Paths:

"C:\chromedriver-win64\chromedriver.exe"
"C:\Program Files\Google\Chrome\Application\chrome.exe"
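Other notes:
12. A logging feature that records the full sequence of operations, for troubleshooting.
13. Because the page changes while the script is running, please design a floating window that shows the current step and, by interacting with the user, makes the script more reliable.
14. I'm a beginner, so I'm not sure whether the XPath and CSS selectors above pick out the right elements.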
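
For step 8 and note 12, a minimal standard-library sketch might look like this. The `hotels_` file prefix, the column names and the helper names (`csv_path_for`, `append_row`, `setup_logging`) are all placeholders chosen for illustration, not anything required by the site or by Selenium.

```python
import csv
import logging
from datetime import datetime
from pathlib import Path

# Column names are placeholders for the five fields listed in step 7.
FIELDS = ["name", "address", "opened", "rooms", "phone"]


def csv_path_for(start, folder="."):
    """Build a CSV path named after the time the run started."""
    return Path(folder) / f"hotels_{start:%Y%m%d_%H%M%S}.csv"


def append_row(path, row):
    """Append one hotel to the CSV, writing the header only for a new file."""
    new_file = not path.exists()
    # A BOM on the first write lets Excel detect UTF-8 for Chinese text;
    # later appends use plain UTF-8 so no second BOM lands mid-file.
    encoding = "utf-8-sig" if new_file else "utf-8"
    with path.open("a", newline="", encoding=encoding) as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)


def setup_logging(log_path):
    """Send timestamped messages to a log file and return the logger."""
    logger = logging.getLogger("hotel_scraper")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `append_row` right after each hotel is scraped, rather than once at the end, means a crash halfway through still leaves the rows collected so far on disk.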

akraja · Posted 7 days ago

Don’t overload servers or scrape sites that prohibit it

Requirements
Python 3.x

Google Chrome

ChromeDriver (same version as your Chrome browser)

Install Python packages:

pip install selenium

Sample Code: Scraping Hotel Info via Google Search

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import time

# Setup Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless")  # Run in background
chrome_options.add_argument("--disable-gpu")

# Provide path to chromedriver
service = Service(executable_path='path/to/chromedriver')  # Update this

# Initialize the browser
driver = webdriver.Chrome(service=service, options=chrome_options)

# Replace with your target hotel
hotel_query = "Taj Mahal Hotel Mumbai"

# Google Search URL
driver.get(f"https://www.google.com/search?q={hotel_query}")

time.sleep(2)  # Wait for page to load

try:
    hotel_name = driver.find_element(By.XPATH, '//div[@data-attrid="title"]/span').text
    address = driver.find_element(By.XPATH, '//span[contains(text(),"Address")]/following-sibling::span').text
    phone = driver.find_element(By.XPATH, '//span[contains(text(),"Phone")]/following-sibling::span').text
    hours = driver.find_element(By.XPATH, '//span[contains(text(),"Hours")]/following-sibling::span').text

    print("Hotel Name:", hotel_name)
    print("Address:", address)
    print("Contact Number:", phone)
    print("Working Hours:", hours)

except Exception as e:
    print("Some details could not be found:", e)

# Close the browser
driver.quit()
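
In the sample above, one missing element sends the whole lookup to the `except` branch, so the fields that were present are lost too. A possible alternative, sketched here under the assumption that the CSS selectors from the first post are correct (the poster was not sure, and the room-count XPath and CSS selector there point at different list items), is to look up each field on its own and leave it blank when nothing matches. `extract_fields` and the dictionary keys are made-up names; "css selector" is the string value of Selenium's `By.CSS_SELECTOR`.

```python
# Selectors copied from the first post; unverified against the live page.
DETAIL_CSS = {
    "name": "#ibu-hotel-detail-head > div.detail-headline_container > "
            "div.detail-headline_base > div.detail-headline_title > h1",
    "address": "#ibu-hotel-detail-head > div.detail-headline_container > "
               "div.detail-headline_base > div.detail-headline_address > "
               "div.detail-headline_position > span > span.detail-headline_position_text",
    "opened": "#detail-hotel-description > div.m-hotel-desc > div > "
              "div.m-hoteldesc_basic.basicInfo > ul > li:nth-child(1)",
    "rooms": "#detail-hotel-description > div.m-hotel-desc > div > "
             "div.m-hoteldesc_basic.basicInfo > ul > li:nth-child(2)",
    "phone": "#detail-hotel-description > div.m-hotel-desc > div > "
             "div.m-hoteldesc_basic.basicInfo > div > div:nth-child(2) > div",
}


def extract_fields(driver, selectors=DETAIL_CSS):
    """Read each field independently; a missing element yields an empty string."""
    row = {}
    for field, css in selectors.items():
        # find_elements returns [] rather than raising when nothing matches.
        found = driver.find_elements("css selector", css)
        row[field] = found[0].text.strip() if found else ""
    return row
```

The returned dictionary can be written straight to a CSV row, and blank values make it easy to see afterwards which selector failed on which hotel.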

st800820 · Posted 4 days ago

akraja posted on 2025-5-23 16:32
Don’t overload servers or scrape sites that prohibit it

Requirements

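Thanks. Does it run on your machine? I can't get it to run here. Before this I had ChatGPT generate and revise a script over and over, and it never got to a satisfactory result.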

mikiwei · Posted the day before yesterday at 11:18

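I once saw an expert on Twitter build something similar for his girlfriend, but he did it through Amap (高德地图) or Baidu Maps rather than through a browser.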
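
A rough sketch of what the map-API route could look like, using only the standard library. Everything specific here is an assumption to check against the official documentation: the endpoint and parameter names are written from memory of Amap's Web Service keyword search, the response is assumed to carry a `pois` list with `name`, `address` and `tel` entries, and `build_search_url`, `parse_pois` and `fetch_page` are made-up helper names. An API key from the provider is required, and a map search would probably not return opening date or room count.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as recalled from Amap's Web Service docs; verify before use.
AMAP_TEXT_SEARCH = "https://restapi.amap.com/v3/place/text"


def build_search_url(api_key, city, page=1, keywords="酒店"):
    """Build a keyword-search URL for hotels in one city (parameter names assumed)."""
    params = {
        "key": api_key,
        "keywords": keywords,
        "city": city,
        "citylimit": "true",  # keep results inside the given city
        "offset": 20,          # results per page
        "page": page,
        "output": "json",
    }
    return AMAP_TEXT_SEARCH + "?" + urlencode(params)


def _as_text(value):
    """Normalize a field that may be a string, missing, or an empty list."""
    if isinstance(value, list):
        return ";".join(str(v) for v in value)
    return value or ""


def parse_pois(payload):
    """Reduce a decoded response to name/address/phone rows."""
    return [
        {
            "name": _as_text(poi.get("name")),
            "address": _as_text(poi.get("address")),
            "phone": _as_text(poi.get("tel")),
        }
        for poi in payload.get("pois", [])
    ]


def fetch_page(api_key, city, page=1):
    """Download one result page and return its rows (needs network access)."""
    with urlopen(build_search_url(api_key, city, page), timeout=10) as resp:
        return parse_pois(json.load(resp))
```

Looping `fetch_page` over increasing `page` values until it returns an empty list would cover a whole city, subject to whatever quota the provider applies to the key.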
