Website Change Detection: How to Monitor a Website for Changes
"Monitor a website for changes" sounds like one problem. It isn't — it's four different problems with four different right answers.
- Find new pages that didn't exist yesterday — the core of competitor monitoring.
- Detect content changes on a specific known page (pricing, homepage copy, docs).
- Detect visual changes on a page (layout, imagery, CSS).
- Watch a single value inside a page (a price number, a stock status, a headline).
If you pick the wrong method for your job, you'll get either silence when something changes or a flood of useless alerts when nothing really did. Here's how to pick.
Method 1: Sitemap diffing (best for finding new pages)
Almost every website publishes a sitemap.xml — a plain XML file listing every URL the site wants indexed. You can fetch it, diff it against yesterday's version, and see exactly which URLs were added or removed.
How to set it up end to end
Find the sitemap URL. Ninety percent of sites put it at one of three locations:
- https://example.com/sitemap.xml
- https://example.com/sitemap_index.xml
- https://example.com/sitemap.xml.gz
If none of those return XML, open https://example.com/robots.txt and look for a Sitemap: line, e.g. Sitemap: https://example.com/sitemap.xml.
Parse the XML. Each <url> entry has a <loc> (the URL) and often a <lastmod>. Pull out the <loc> values into a set. Ten lines of Python, five of Ruby.
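The parsing step really is short. A minimal sketch using only the standard library (the namespace URI is the standard sitemap schema from sitemaps.org):

```python
import xml.etree.ElementTree as ET

# Every sitemap declares this namespace, so queries must be namespace-aware.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text: str) -> set[str]:
    """Extract the set of <loc> URLs from a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iterfind(".//sm:loc", NS)}
```

The same function works on a sitemap index file too, since index entries also use `<loc>`; you'd then fetch and parse each child sitemap it lists.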
Store yesterday's snapshot. Save the full URL list — text file, SQLite row, Postgres JSONB column. For a 5,000-page site the snapshot is under 500KB.
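The snapshot-and-diff step is equally small. A sketch using a JSON file as the store (any of the storage options above works the same way):

```python
import json
from pathlib import Path

def diff_against_snapshot(urls: set[str], snapshot_path: Path) -> tuple[set[str], set[str]]:
    """Compare today's URL set with yesterday's snapshot, then overwrite the snapshot."""
    previous = set(json.loads(snapshot_path.read_text())) if snapshot_path.exists() else set()
    added, removed = urls - previous, previous - urls
    snapshot_path.write_text(json.dumps(sorted(urls)))
    return added, removed
```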
Run on a daily cron. Most sites regenerate their sitemap between 2am and 6am local time. Fetch around 10am to avoid catching a half-written file.
Categorize the URLs. A raw diff of 30 URLs is noise. Bucketed by path prefix — /blog/*, /solutions/*, /customers/*, /careers/*, /changelog/* — it becomes signal. For URLs that don't match a known prefix, pass the URL + page title to an LLM with eight category choices. Claude Haiku does this for about $0.00005 per URL.
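The prefix-bucketing part needs no LLM at all. A sketch (the prefix list mirrors the examples above; anything unmatched falls into "other", which is what you'd hand off to the LLM):

```python
from collections import Counter
from urllib.parse import urlparse

KNOWN_PREFIXES = ("/blog/", "/solutions/", "/customers/", "/careers/", "/changelog/")

def bucket(urls: set[str]) -> Counter:
    """Count new URLs per path-prefix bucket; unmatched URLs go to 'other'."""
    counts: Counter = Counter()
    for url in urls:
        path = urlparse(url).path
        prefix = next((p for p in KNOWN_PREFIXES if path.startswith(p)), "other")
        counts[prefix] += 1
    return counts
```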
Aggregate into a weekly digest. Daily alerts become background noise within a week. A weekly summary — "47 blog posts, 3 pricing changes, 11 new careers listings across your 8 tracked sites" — is dramatically more useful.
What can go wrong
- Sitemap flakiness. Some sites regenerate non-atomically; you'll occasionally fetch a half-written file. Retry once on parse error before treating URLs as "removed."
- Pagination. Large blogs put posts into paginated sitemaps (sitemap-posts-1.xml). Follow the index.
- Bot blocking. Some sites behind Cloudflare Bot Fight Mode serve a challenge page to non-browser User-Agents. A generic Mozilla UA fixes most of them.
- "Fake" removals. A URL dropping out doesn't always mean the page was deleted — sometimes the site just marked it noindex. Still signal, just not what you assume.
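The retry-before-declaring-removed rule from the first bullet can be sketched as a small wrapper (the `fetch` callable is a placeholder for however you download the file):

```python
import time
import xml.etree.ElementTree as ET

def fetch_sitemap_with_retry(fetch, retries: int = 1, delay: float = 30.0) -> str:
    """Fetch sitemap XML; retry on parse error so a half-written file
    isn't mistaken for a wave of removed URLs."""
    for attempt in range(retries + 1):
        text = fetch()
        try:
            ET.fromstring(text)  # validate only; parse URLs separately
            return text
        except ET.ParseError:
            if attempt == retries:
                raise
            time.sleep(delay)  # give the site time to finish regenerating
```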
Method 2: Content hashing (best for pricing, homepage copy, docs)
When you want to know if the content of a specific page changed — not just whether a byte moved — the right approach is to fetch the HTML, strip the scripts and styles, extract the text, and hash it.
On the next check, re-fetch and compare. If the hash differs, the meaningful content changed. Store the old text alongside the new one so you can diff them (or feed both to an LLM for a semantic summary of what changed).
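A stdlib-only sketch of the strip-extract-hash pipeline (a real setup might use a proper HTML library, but the idea fits in one parser subclass; whitespace is normalized so reflowed markup doesn't trigger false alerts):

```python
import hashlib
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.skip = 0
        self.chunks: list[str] = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not self.skip and text:
            self.chunks.append(text)

def content_hash(html: str) -> str:
    """Hash of the meaningful text content of a page."""
    parser = TextExtractor()
    parser.feed(html)
    return hashlib.sha256(" ".join(parser.chunks).encode()).hexdigest()
```

Two fetches with different inline scripts but identical copy produce the same hash; a one-character price change produces a different one.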
This is the right method for:
- Pricing pages (catch price changes) — see how to automate pricing change tracking
- Homepage hero copy (catch positioning shifts)
- Terms of service / privacy policy (catch legal changes)
- Changelogs
Don't use it for: finding new pages — hashing only tells you about URLs you already know.
Method 3: Visual/screenshot diffs (best for layout and design changes)
Tools like Visualping and Distill take a screenshot of the page on a schedule and run a pixel-level comparison. You get a before/after image highlighting exactly which regions changed.
This is overkill for text changes (content hashing is faster and quieter) and useless for finding new pages. It's the right tool when the change you care about is visual — a redesign, a hero image swap, a new CTA button, a layout shift.
Expect noise from any page with rotating banners, dynamic testimonials, or live counters. Mask those regions or skip the method.
Method 4: Selector polling (best for single values)
When you want to watch a single value — "has the price of this item dropped?", "is this product back in stock?", "did the job listing change status?" — you don't need whole-page monitoring. You need CSS-selector polling.
Fetch the page, evaluate a CSS selector (or XPath), extract the text or attribute, compare against yesterday's value. Distill and custom scripts using Playwright or Puppeteer both handle this.
The real work is picking a selector that won't break on the next site redesign. Prefer stable attributes (data-testid, itemprop) over class names.
Which method for which job
| Job | Best method |
|---|---|
| Find new pages on a competitor site | Sitemap diffing |
| Track pricing changes on a known page | Content hashing + LLM diff |
| Catch homepage messaging shifts | Content hashing |
| Detect design changes or redesigns | Visual diffing |
| Watch a specific product price or stock | Selector polling |
| Monitor 5+ sites broadly | Sitemap diffing + content hashing for top pages |
The trap most teams fall into: they use one method for all four jobs. A visual-diff tool pointed at a blog section will spam them; a sitemap monitor pointed at a pricing page will say nothing useful. Match the method to the signal.
RivalPages combines sitemap diffing (to discover new pages) with content hashing plus an LLM-powered semantic diff (to explain what changed on watched pages) — the two methods that matter for competitive intelligence. Visual diffs and selector polling are better served by purpose-built tools like Visualping or Distill, and we link to them where appropriate.
Track this yourself in 30 seconds
RivalPages watches competitor sitemaps, pricing pages, and homepage messaging — and sends you a weekly digest of what actually matters. Free during early access.
Start tracking competitors