Website Change Detection: How to Monitor a Website for Changes
"Monitor a website for changes" sounds like one problem. It isn't — it's four different problems with four different right answers.
- Find new pages that didn't exist yesterday — the core of competitor monitoring.
- Detect content changes on a specific known page (pricing, homepage copy, docs).
- Detect visual changes on a page (layout, imagery, CSS).
- Watch a single value inside a page (a price number, a stock status, a headline).
If you pick the wrong method for your job, you'll get either silence when something changes or a flood of useless alerts when nothing really did. Here's how to pick.
Method 1: Sitemap diffing (best for finding new pages)
Almost every website publishes a sitemap.xml — a plain XML file listing every URL the site wants indexed. You can fetch it, diff it against yesterday's version, and see exactly which URLs were added or removed.
How to set it up end to end
Find the sitemap URL. Ninety percent of sites put it at one of three locations:
- https://example.com/sitemap.xml
- https://example.com/sitemap_index.xml
- https://example.com/sitemap.xml.gz
If none of those return XML, open https://example.com/robots.txt and look for a Sitemap: line, e.g. Sitemap: https://example.com/sitemap.xml.
Parse the XML. Each <url> entry has a <loc> (the URL) and often a <lastmod>. Pull out the <loc> values into a set. Ten lines of Python, five of Ruby.
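The parsing step really is short. A minimal sketch using only the standard library (the namespace URI is the standard sitemap schema from sitemaps.org):

```python
import xml.etree.ElementTree as ET

# Every sitemap declares this namespace, so queries must be namespace-aware.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text: str) -> set[str]:
    """Extract the set of <loc> URLs from a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iterfind(".//sm:loc", NS)}
```

The same function works on a sitemap index file too, since index entries also use `<loc>`; you'd then fetch and parse each child sitemap it lists.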
Store yesterday's snapshot. Save the full URL list — text file, SQLite row, Postgres JSONB column. For a 5,000-page site the snapshot is under 500KB.
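The snapshot-and-diff step is equally small. A sketch using a JSON file as the store (any of the storage options above works the same way):

```python
import json
from pathlib import Path

def diff_against_snapshot(urls: set[str], snapshot_path: Path) -> tuple[set[str], set[str]]:
    """Compare today's URL set with yesterday's snapshot, then overwrite the snapshot."""
    previous = set(json.loads(snapshot_path.read_text())) if snapshot_path.exists() else set()
    added, removed = urls - previous, previous - urls
    snapshot_path.write_text(json.dumps(sorted(urls)))
    return added, removed
```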
Run on a daily cron. Most sites regenerate their sitemap between 2am and 6am local time. Fetch around 10am to avoid catching a half-written file.
Categorize the URLs. A raw diff of 30 URLs is noise. Bucketed by path prefix — /blog/*, /solutions/*, /customers/*, /careers/*, /changelog/* — it becomes signal. For URLs that don't match a known prefix, pass the URL + page title to an LLM with eight category choices. Claude Haiku does this for about $0.00005 per URL.
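The prefix-bucketing part needs no LLM at all. A sketch (the prefix list mirrors the examples above; anything unmatched falls into "other", which is what you'd hand off to the LLM):

```python
from collections import Counter
from urllib.parse import urlparse

KNOWN_PREFIXES = ("/blog/", "/solutions/", "/customers/", "/careers/", "/changelog/")

def bucket(urls: set[str]) -> Counter:
    """Count new URLs per path-prefix bucket; unmatched URLs go to 'other'."""
    counts: Counter = Counter()
    for url in urls:
        path = urlparse(url).path
        prefix = next((p for p in KNOWN_PREFIXES if path.startswith(p)), "other")
        counts[prefix] += 1
    return counts
```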
Aggregate into a weekly digest. Daily alerts become background noise within a week. A weekly summary — "47 blog posts, 3 pricing changes, 11 new careers listings across your 8 tracked sites" — is dramatically more useful.
What can go wrong
- Sitemap flakiness. Some sites regenerate non-atomically; you'll occasionally fetch a half-written file. Retry once on parse error before treating URLs as "removed."
- Pagination. Large blogs put posts into paginated sitemaps (sitemap-posts-1.xml). Follow the index.
- Bot blocking. Some sites behind Cloudflare Bot Fight Mode serve a challenge page to non-browser User-Agents. A generic Mozilla UA fixes most of them.
- "Fake" removals. A URL dropping out doesn't always mean the page was deleted — sometimes the site just marked it noindex. Still signal, just not what you assume.
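The retry-before-declaring-removed rule from the first bullet can be sketched as a small wrapper (the `fetch` callable is a placeholder for however you download the file):

```python
import time
import xml.etree.ElementTree as ET

def fetch_sitemap_with_retry(fetch, retries: int = 1, delay: float = 30.0) -> str:
    """Fetch sitemap XML; retry on parse error so a half-written file
    isn't mistaken for a wave of removed URLs."""
    for attempt in range(retries + 1):
        text = fetch()
        try:
            ET.fromstring(text)  # validate only; parse URLs separately
            return text
        except ET.ParseError:
            if attempt == retries:
                raise
            time.sleep(delay)  # give the site time to finish regenerating
```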
Method 2: Content hashing (best for pricing, homepage copy, docs)
When you want to know if the content of a specific page changed — not just whether a byte moved — the right approach is to fetch the HTML, strip the scripts and styles, extract the text, and hash it.
On the next check, re-fetch and compare. If the hash differs, the meaningful content changed. Store the old text alongside the new one so you can diff them (or feed both to an LLM for a semantic summary of what changed).
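A stdlib-only sketch of the strip-extract-hash pipeline (a real setup might use a proper HTML library, but the idea fits in one parser subclass; whitespace is normalized so reflowed markup doesn't trigger false alerts):

```python
import hashlib
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.skip = 0
        self.chunks: list[str] = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not self.skip and text:
            self.chunks.append(text)

def content_hash(html: str) -> str:
    """Hash of the meaningful text content of a page."""
    parser = TextExtractor()
    parser.feed(html)
    return hashlib.sha256(" ".join(parser.chunks).encode()).hexdigest()
```

Two fetches with different inline scripts but identical copy produce the same hash; a one-character price change produces a different one.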
This is the right method for:
- Pricing pages (catch price changes) — see how to automate pricing change tracking
- Homepage hero copy (catch positioning shifts)
- Terms of service / privacy policy (catch legal changes)
- Changelogs
Don't use it for: finding new pages — hashing only tells you about URLs you already know.
Method 3: Visual/screenshot diffs (best for layout and design changes)
Tools like Visualping and Distill take a screenshot of the page on a schedule and run a pixel-level comparison. You get a before/after image highlighting exactly which regions changed.
This is overkill for text changes (content hashing is faster and quieter) and useless for finding new pages. It's the right tool when the change you care about is visual — a redesign, a hero image swap, a new CTA button, a layout shift.
Expect noise from any page with rotating banners, dynamic testimonials, or live counters. Mask those regions or skip the method.
Method 4: Selector polling (best for single values)
When you want to watch a single value — "has the price of this item dropped?", "is this product back in stock?", "did the job listing change status?" — you don't need whole-page monitoring. You need CSS-selector polling.
Fetch the page, evaluate a CSS selector (or XPath), extract the text or attribute, compare against yesterday's value. Distill and custom scripts using Playwright or Puppeteer both handle this.
The real work is picking a selector that won't break on the next site redesign. Prefer stable attributes (data-testid, itemprop) over class names.
Which method for which job
| Job | Best method |
|---|---|
| Find new pages on a competitor site | Sitemap diffing |
| Track pricing changes on a known page | Content hashing + LLM diff |
| Catch homepage messaging shifts | Content hashing |
| Detect design changes or redesigns | Visual diffing |
| Watch a specific product price or stock | Selector polling |
| Monitor 5+ sites broadly | Sitemap diffing + content hashing for top pages |
The trap most teams fall into: they use one method for all four jobs. A visual-diff tool pointed at a blog section will spam them; a sitemap monitor pointed at a pricing page will say nothing useful. Match the method to the signal.
RivalPages combines sitemap diffing (to discover new pages) with content hashing plus an LLM-powered semantic diff (to explain what changed on watched pages) — the two methods that matter for competitive intelligence. Visual diffs and selector polling are better served by purpose-built tools like Visualping or Distill, and we link to them where appropriate.
Track this yourself in 30 seconds
RivalPages watches competitor sitemaps, pricing pages, and homepage messaging — and sends you a weekly digest of what actually matters. Free during early access.
Start tracking competitors