
XPath Practice Pages

Each page is a realistic scenario. Open it, press F12, switch to the Console tab, and type $x("...") to test your XPath.

Level 1

Basic Selection

RTHK News Listing

Practice //tag, /child, @attr, text() — extract headlines, links, images from a news page.

//h1/text()
//nav//a/@href
//img/@src
//article//h3/text()
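The four expressions above can be tried outside the browser as well. The following is a minimal sketch, assuming the third-party lxml package is installed; the sample markup is invented for illustration and is not the actual practice page.

```python
from lxml import html

# Hypothetical news-listing markup; the real practice page may differ.
SAMPLE = """
<html><body>
  <h1>RTHK News</h1>
  <nav><a href="/local">Local</a><a href="/world">World</a></nav>
  <article><h3>Headline one</h3><img src="/img/1.jpg"></article>
  <article><h3>Headline two</h3><img src="/img/2.jpg"></article>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# text() returns the text nodes, @attr returns the attribute values.
page_title = tree.xpath("//h1/text()")
nav_links = tree.xpath("//nav//a/@href")
image_urls = tree.xpath("//img/@src")
headlines = tree.xpath("//article//h3/text()")

print(page_title, nav_links, image_urls, headlines)
```

Each call returns a list, even when only one node matches, so index or loop over the result before using it.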
Level 2

Predicates & Filtering

Research Database

Use [ ] predicates to filter by class, attribute, position — scrape academic papers.

[@class='featured']
[position()<=3]
[not(@disabled)]
[contains(@class,'x')]
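To see how these predicates behave, here is a small sketch using lxml (assumed to be installed) on made-up markup. It also shows a common surprise: [@class='featured'] compares the whole attribute string, so it misses elements that carry more than one class.

```python
from lxml import html

# Hypothetical paper listing; element and class names are illustrative only.
SAMPLE = """
<html><body>
<ul>
  <li class="paper featured"><a href="/p/1">Paper A</a></li>
  <li class="paper"><a href="/p/2">Paper B</a></li>
  <li class="paper featured"><a href="/p/3">Paper C</a></li>
  <li class="paper"><a href="/p/4">Paper D</a></li>
</ul>
<button disabled>Prev</button><button>Next</button>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Exact match fails here because the attribute value is "paper featured".
exact = tree.xpath("//li[@class='featured']")

# contains() matches a substring of the attribute value.
featured = tree.xpath("//li[contains(@class,'featured')]/a/text()")

# position() counts among sibling <li> elements of the same parent.
first_three = tree.xpath("//li[position()<=3]/a/text()")

# not(@disabled) keeps elements that lack the attribute entirely.
enabled = tree.xpath("//button[not(@disabled)]/text()")

print(exact, featured, first_three, enabled)
```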
Level 3

Real Website Structure

HKBU Comm Staff Directory

Scrape staff names, titles, emails, phone numbers from a realistic university directory page.

//a[contains(@href,'mailto:')]
//td[@class='staff-name']
//tr[@data-rank='professor']
//span[@class='research-tag']
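When scraping a directory, it usually helps to select each row first and then run relative XPaths (starting with a dot) inside it, so that names, emails, and tags stay paired. The sketch below assumes lxml is installed; the table markup, staff names, and addresses are hypothetical.

```python
from lxml import html

# Hypothetical directory table; names and addresses are placeholders.
SAMPLE = """
<html><body><table>
  <tr data-rank="professor">
    <td class="staff-name">Prof. Chan</td>
    <td><a href="mailto:chan@example.edu">Email</a></td>
    <td><span class="research-tag">Journalism</span>
        <span class="research-tag">AI</span></td>
  </tr>
  <tr data-rank="lecturer">
    <td class="staff-name">Dr. Lee</td>
    <td>No email listed</td>
    <td></td>
  </tr>
</table></body></html>
"""

tree = html.fromstring(SAMPLE)

records = []
for row in tree.xpath("//tr[@data-rank]"):
    # The leading dot keeps each query scoped to the current row.
    name = row.xpath("string(.//td[@class='staff-name'])").strip()
    hrefs = row.xpath(".//a[contains(@href,'mailto:')]/@href")
    email = hrefs[0].split("mailto:", 1)[1] if hrefs else None
    tags = row.xpath(".//span[@class='research-tag']/text()")
    records.append(
        {"rank": row.get("data-rank"), "name": name, "email": email, "tags": tags}
    )

professors = [r["name"] for r in records if r["rank"] == "professor"]
print(records)
print(professors)
```

Checking that the mailto list is non-empty before indexing avoids an IndexError on rows without an email link.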
LIHKG: Real Structure

LIHKG Chit-Chat Board (吹水台)

lihkg.com/category/1

Practice XPath on HTML that mirrors the real LIHKG structure exactly, so expressions that work here can be used on the live site afterwards.

contains(@class,"wQ4Ran")
contains(@class,"CxY4XDSSI")
contains(@class,"_20jopXBF")
contains(@class,"_3VRxq3mC")
💡 Workflow: Open a practice page → F12 → Console tab → type $x("//your/xpath") → press Enter → inspect the returned array of matching nodes (hovering over a node in the Console highlights it on the page). Once your XPath works here, copy it directly into your Python tree.xpath("...") call.

⚠️ LIHKG Note: LIHKG uses React with hashed class names. Class names such as wQ4Ran7ySbKd8PdMeHZZR may change after site updates. Always use contains(@class, "...") with the stable prefix, not exact class matching.
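The difference between exact and prefix matching can be sketched as follows, assuming lxml is installed. The "before" class name is the one quoted above; the "after" class name is invented to simulate a site update, and the assumption that the wQ4Ran prefix survives such an update comes from the note above rather than from anything verified here.

```python
from lxml import html

# Class name quoted in the note above.
BEFORE = '<div><div class="wQ4Ran7ySbKd8PdMeHZZR">Thread A</div></div>'
# Hypothetical class name after a site update (suffix is made up).
AFTER = '<div><div class="wQ4RanHYPOTHETICAL123">Thread A</div></div>'

EXACT = "//div[@class='wQ4Ran7ySbKd8PdMeHZZR']/text()"
PREFIX = "//div[contains(@class,'wQ4Ran')]/text()"


def extract(markup, xpath):
    """Parse the markup and return whatever the XPath selects."""
    return html.fromstring(markup).xpath(xpath)


exact_before = extract(BEFORE, EXACT)
exact_after = extract(AFTER, EXACT)    # breaks once the suffix changes
prefix_before = extract(BEFORE, PREFIX)
prefix_after = extract(AFTER, PREFIX)  # still matches via the prefix

print(exact_before, exact_after, prefix_before, prefix_after)
```

Note that contains() matches the substring anywhere in the attribute value, so a short prefix may also match unrelated classes; checking the result count in the Console with $x first is a reasonable safeguard.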