XPath Practice Lab — HKBU Comm

XPath Practice Pages
Each page is a realistic scenario. Open it, press F12 → Console, then type $x("...") to test.
Level 1
Basic Selection
RTHK News Listing
Practice //tag, /child, @attr, and text(): extract headlines, links, and images from a news page.
//h1/text()
//nav//a/@href
//img/@src
//article//h3/text()
Open Practice Page →
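Once the four expressions above work in the console, the same strings can be run from Python. The sketch below assumes lxml is the library behind the tree.xpath("...") call, and it uses a small made-up HTML snippet as a stand-in; the real practice page's markup may differ.

```python
from lxml import html

# Hypothetical stand-in for the news listing page (not the real markup).
PAGE = """
<html><body>
  <h1>RTHK News</h1>
  <nav><a href="/local">Local</a><a href="/world">World</a></nav>
  <article><img src="/img/a.jpg"><h3>Headline one</h3></article>
  <article><img src="/img/b.jpg"><h3>Headline two</h3></article>
</body></html>
"""

tree = html.fromstring(PAGE)

# Each call returns a list, even when only one node matches.
title = tree.xpath("//h1/text()")               # text of every <h1>
nav_links = tree.xpath("//nav//a/@href")        # href of every link inside <nav>
image_urls = tree.xpath("//img/@src")           # src of every image
headlines = tree.xpath("//article//h3/text()")  # text of every <h3> inside an <article>

print(title, nav_links, image_urls, headlines)
```

Expressions ending in text() or @attr return strings, while expressions ending in a tag name return element objects.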
Level 2
Predicates & Filtering
Research Database
Use [ ] predicates to filter by class, attribute, or position while scraping academic papers.
[@class='featured']
[position()<=3]
[not(@disabled)]
[contains(@class,'x')]
Open Practice Page →
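The difference between these predicates is easiest to see side by side. The following is a minimal sketch using lxml on invented markup (the element names, classes, and paper titles are assumptions, not the practice page's actual HTML).

```python
from lxml import html

# Hypothetical paper list; class values chosen to show exact vs. partial matching.
PAGE = """
<html><body>
  <ul>
    <li class="paper featured"><a href="/p/1">Paper A</a></li>
    <li class="paper"><a href="/p/2">Paper B</a></li>
    <li class="featured"><a href="/p/3">Paper C</a></li>
    <li class="paper"><a href="/p/4">Paper D</a></li>
  </ul>
  <button disabled="disabled">Prev</button>
  <button>Next</button>
</body></html>
"""

tree = html.fromstring(PAGE)

# @class='featured' compares the WHOLE attribute value, so "paper featured" is skipped.
exact = tree.xpath("//li[@class='featured']/a/text()")

# position() counts among sibling <li> elements of the same parent.
first_three = tree.xpath("//li[position()<=3]/a/text()")

# not(@disabled) keeps elements that lack the attribute entirely.
enabled = tree.xpath("//button[not(@disabled)]/text()")

# contains() is a substring test, so it also matches "paper featured".
loose = tree.xpath("//li[contains(@class,'featured')]/a/text()")

print(exact, first_three, enabled, loose)
```

Because contains() is a plain substring test, contains(@class,'feat') would match the same items here; pick a fragment specific enough for the page at hand.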
Level 3
Real Website Structure
HKBU Comm Staff Directory
Scrape staff names, titles, emails, and phone numbers from a realistic university directory page.
//a[contains(@href,'mailto:')]
//td[@class='staff-name']
//tr[@data-rank='professor']
//span[@class='research-tag']
Open Practice Page →
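For a directory table, one common pattern is to select each row first and then run relative XPath (starting with "." or a bare tag name) inside it, so that a name, email, and tags stay paired per person. The sketch below assumes lxml and an invented two-row table; the names, addresses, and exact cell layout are placeholders.

```python
from lxml import html

# Hypothetical directory table; all people and addresses are placeholders.
PAGE = """
<html><body><table>
  <tr data-rank="professor">
    <td class="staff-name">Prof. Example A</td>
    <td><a href="mailto:a@example.edu">a@example.edu</a></td>
    <td><span class="research-tag">Journalism</span> <span class="research-tag">Data</span></td>
  </tr>
  <tr data-rank="lecturer">
    <td class="staff-name">Dr. Example B</td>
    <td><a href="/profile/b">Profile</a></td>
    <td><span class="research-tag">Film</span></td>
  </tr>
</table></body></html>
"""

tree = html.fromstring(PAGE)

staff = []
for row in tree.xpath("//tr[@data-rank]"):
    # Relative expressions are evaluated against this row only.
    name = row.xpath("string(td[@class='staff-name'])").strip()
    emails = row.xpath(".//a[contains(@href,'mailto:')]/@href")
    staff.append({
        "name": name,
        "rank": row.get("data-rank"),
        # Strip the scheme; rows without a mailto link get None.
        "email": emails[0].replace("mailto:", "", 1) if emails else None,
        "tags": row.xpath(".//span[@class='research-tag']/text()"),
    })

# A single absolute expression also works when only one field is needed.
professor_names = tree.xpath(
    "//tr[@data-rank='professor']/td[@class='staff-name']/text()"
)

print(staff, professor_names)
```

Writing "//a" inside the loop instead of ".//a" would search the whole document again on every row, which is a frequent source of duplicated results.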
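LIHKG
REAL STRUCTURE
LIHKG 吹水台 (Chit-chat board)
lihkg.com/category/1
Practice XPath on HTML that has exactly the same structure as the real LIHKG. Once your expressions work here, you can use them to scrape the real site directly.
contains(@class,"wQ4Ran")
contains(@class,"CxY4XDSSI")
contains(@class,"_20jopXBF")
contains(@class,"_3VRxq3mC")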
Open Practice Page →
💡 Workflow: Open a practice page → F12 → Console tab → type $x("//your/xpath") → press Enter → see highlighted results. Once your XPath works here, copy it directly into your Python tree.xpath("...") call.

⚠️ LIHKG Note: LIHKG uses React with hashed class names. Classes such as wQ4Ran7ySbKd8PdMeHZZR may change after site updates. Always use contains(@class, "...") with the stable prefix rather than matching the exact class.
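The note above can be illustrated with a small comparison. This sketch assumes lxml, and the "after update" class name is made up purely for illustration; whether the prefix really survives a given LIHKG update is the page's claim, not something this code verifies.

```python
from lxml import html

# Markup using the class name quoted in the note above.
OLD = ('<html><body><div class="wQ4Ran7ySbKd8PdMeHZZR">'
       '<span>Thread title</span></div></body></html>')

# Hypothetical markup after a site update: same prefix, invented suffix.
NEW = ('<html><body><div class="wQ4RanXXXXXXXXXXXXXXX">'
       '<span>Thread title</span></div></body></html>')

EXACT = "//div[@class='wQ4Ran7ySbKd8PdMeHZZR']//span/text()"
PREFIX = "//div[contains(@class,'wQ4Ran')]//span/text()"


def run(xpath, page):
    """Parse the page and return whatever the XPath expression matches."""
    return html.fromstring(page).xpath(xpath)


for label, page in (("old", OLD), ("new", NEW)):
    print(label, "exact:", run(EXACT, page), "prefix:", run(PREFIX, page))
```

In this sketch the exact match returns nothing for the changed markup, while the contains() version still finds the title in both.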