changedetection.io: Simple web-change monitoring tool

changedetection.io: self-hosted web-change monitoring with selectors and multi-channel alerts for price, restock and document/API tracking.

GitHub: dgtlmoon/changedetection.io | Updated: 2025-10-07 | Branch: main | Stars: 28.6K | Forks: 1.6K

Docker, Playwright/Browser Automation, Website Change Monitoring, Price & Restock Tracking, Notification Integrations, Self-hosted, Chrome Extension, PDF/JSON Monitoring, Proxy Support

💡 Deep Analysis

What concrete problems does this project solve, and in which scenarios should I choose changedetection.io?

Core Analysis

Project Positioning: changedetection.io aims to provide a self-hostable, feature-rich platform for monitoring changes in web pages and documents, covering static HTML, JS-rendered dynamic pages, interaction-driven content, PDFs, and JSON APIs.

Technical Features

Dual fetcher approach: lightweight HTTP fetch for static pages and Chromium-based Playwright/WebDriver for JS execution and interactions when needed.
Rich extraction/filter pipeline: supports XPath/CSS/JSONPath/jq/regex and ignore/remove rules to reduce noise and precisely target content.
File and structured data support: built-in detection logic for PDFs (text/checksum/size) and JSON APIs.
Notifications and integrations: uses the Apprise library to support many channels (Slack, email, Discord, webhooks, etc.).
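
As a rough illustration of the final stage of such a pipeline, the sketch below compares two already-filtered text snapshots and reports which lines were added and removed. It assumes extraction and cleaning have already run and uses Python's standard difflib; the function name changed_lines is hypothetical and is not changedetection.io's internal API.

```python
import difflib


def changed_lines(previous: str, current: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) lines between two filtered text snapshots.

    Hypothetical helper for illustration; not changedetection.io's own code.
    """
    added: list[str] = []
    removed: list[str] = []
    # ndiff prefixes each line with "+ ", "- ", "  " (unchanged) or "? " (hints)
    for line in difflib.ndiff(previous.splitlines(), current.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
    return added, removed


if __name__ == "__main__":
    old = "Widget\nPrice: $19.99\nIn stock"
    new = "Widget\nPrice: $17.49\nIn stock"
    added, removed = changed_lines(old, new)
    print("added:", added)      # added: ['Price: $17.49']
    print("removed:", removed)  # removed: ['Price: $19.99']
```

Diffing line by line is what makes the earlier filtering steps matter: every line that survives filtering is a candidate for a change notification.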

Usage Recommendations

Suitable scenarios: Choose changedetection.io if you need self-hosted monitoring of prices, restocks, regulatory updates, or announcements, especially when pages require JS rendering or login. For these cases it is more capable than simple HTML diff tools.
Configure conservatively: Use the non-JS fetcher by default to save resources, and enable Playwright only for URLs that truly require it. Use dedicated PDF/JSON rules to avoid false positives.
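
One way to apply the "non-JS by default" rule is a simple escalation check: fetch the raw HTML first and move a URL to the browser fetcher only if the content you care about is missing from it. The sketch below assumes you already have the static HTML as a string; choose_fetcher and the labels "plain_http" and "browser" are hypothetical names, not changedetection.io settings.

```python
def choose_fetcher(static_html: str, expected_text: str) -> str:
    """Hypothetical heuristic: prefer the cheap HTTP fetcher and escalate to a
    browser only when the wanted content is absent from the raw HTML."""
    if expected_text.casefold() in static_html.casefold():
        # Content is server-rendered: a plain HTTP fetch is enough
        return "plain_http"
    # Content missing: it is probably injected by JavaScript after page load
    return "browser"


if __name__ == "__main__":
    print(choose_fetcher("<p>Price: $5.00</p>", "Price"))    # plain_http
    print(choose_fetcher('<div id="root"></div>', "Price"))  # browser
```

The check is deliberately crude: it cannot tell whether the text would appear only after login or interaction, which still calls for Browser Steps.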

Important Notes

Resources & frequency: Extensive Playwright use increases CPU and memory consumption; high-frequency checks may require proxies and more powerful hosts.
Anti-scraping & compliance: Frequent automated interactions may violate site terms; adjust check frequency and follow applicable legal and compliance constraints.
Tuning required: Complex pages require debugging browser steps and selectors.

Important Notice: For pages with heavy bot defenses (CAPTCHAs, behavior-based detection), the platform alone often cannot bypass them; you’ll need additional solutions or manual intervention.

Summary: changedetection.io is high-value for controlled, self-hosted monitoring of dynamic pages and document types, but assess resource overhead and anti-bot constraints for high-scale or protected targets.

How to use Browser Steps and Visual Selector reliably for pages requiring interaction? What common misconfigurations cause failures?

Core Analysis

Core question: How to build a stable monitoring flow for pages that require login or interaction to reach the target content?

Technical Analysis

Role of Browser Steps: Simulates user actions (navigate, fill, click, submit) to achieve the final rendered state.
Role of Visual Selector: Visually selects the element/area to monitor within the post-interaction DOM, reducing unrelated noise.
Key point: Interaction scripts must be repeatable and idempotent; waiting conditions should be reliable (element visible/text present/network idle) rather than fixed sleeps.
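
The difference between a fixed sleep and an explicit wait is easiest to see as a polling loop: instead of sleeping for a fixed time, the step re-checks a condition until it holds or a deadline passes. The generic helper below is a hypothetical sketch in plain Python; in Playwright itself the built-in equivalent is page.wait_for_selector(selector, state="visible").

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.25) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Hypothetical helper. Returns True as soon as the condition holds and
    False on timeout, so the caller can fail the check explicitly.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    started = time.monotonic()
    # Simulated page state: the "element" becomes ready about 0.3 s after start
    print(wait_until(lambda: time.monotonic() - started > 0.3,
                     timeout=2.0, interval=0.05))  # True
```

Returning False on timeout, rather than carrying on, means a slow page produces a failed check instead of a snapshot of a half-loaded page.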

Practical Recommendations

Replay and validate steps locally first: Manually run the login/navigation flow in a browser until stable.
Use explicit waits: In Browser Steps wait for specific elements/text instead of fixed timeouts.
Pick robust selectors: In Visual Selector prefer stable CSS/XPath and avoid selectors that reference dynamic IDs or random classes.
Session & credential management: Store credentials securely and handle session expiry with retries; for 2FA/CAPTCHA sites consider manual or alternative flows.
Stepwise validation: After changes, run a check and inspect snapshots to ensure no regressions or false positives.
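
To catch fragile selectors before they cause failures, a quick lint can flag #id and .class tokens that look auto-generated. The sketch below is a hypothetical heuristic: the patterns (long digit runs, short-prefix hash-like classes containing a digit, UUID fragments) are assumptions to adjust per site, and it only inspects simple CSS selectors, not XPath.

```python
import re

# Hypothetical heuristics for selector tokens that tend to change per build or session
FRAGILE_PATTERNS = [
    re.compile(r"\d{4,}"),  # long digit runs, e.g. id="item-839201"
    re.compile(r"^[a-z]{1,3}-?(?=[0-9a-f]*\d)[0-9a-f]{5,}$", re.I),  # hash-like classes, e.g. "sc-a1b2c3d"
    re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-", re.I),  # UUID fragments
]


def fragile_tokens(selector: str) -> list[str]:
    """Return the #id and .class names in a CSS selector that look auto-generated."""
    names = re.findall(r"[#.]([A-Za-z0-9_-]+)", selector)
    return [name for name in names if any(p.search(name) for p in FRAGILE_PATTERNS)]


if __name__ == "__main__":
    print(fragile_tokens("div.product-price > span.amount"))  # []
    print(fragile_tokens("#item-839201 .sc-a1b2c3d"))  # ['item-839201', 'sc-a1b2c3d']
```

Requiring a digit in the hash-like pattern keeps ordinary words made of hex letters (such as "facade") from being flagged.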

Common misconfigurations & impacts

Using fixed sleeps instead of explicit waits: causes flakiness under varying network conditions.
Overbroad selectors including dynamic content: increases noise and false alerts.
Ignoring session expiry: checks return login pages instead of target content.
Neglecting anti-bot defenses (CAPTCHA/behavioral checks): leads to blocked checks or additional verification steps.
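
Session expiry can be guarded against by sanity-checking each snapshot before it is diffed. The sketch below assumes you know a piece of text that should always be present on the target page; looks_like_login_page and the LOGIN_MARKERS strings are hypothetical and should be tuned per site.

```python
# Hypothetical login-page cues; tune per site (not shipped with changedetection.io)
LOGIN_MARKERS = ("sign in", "log in", "forgot your password", 'type="password"')


def looks_like_login_page(snapshot: str, expected_text: str) -> bool:
    """True when the expected content is missing but login cues are present,
    meaning the session has probably expired and the diff should be skipped."""
    text = snapshot.casefold()
    if expected_text.casefold() in text:
        return False
    return any(marker in text for marker in LOGIN_MARKERS)


if __name__ == "__main__":
    expired = '<form><input type="password" name="pw"></form><a>Sign in</a>'
    print(looks_like_login_page(expired, "Order status"))  # True
    print(looks_like_login_page("Order status: shipped", "Order status"))  # False
```

A page that lacks the expected text but shows no login cues (for example an error page) returns False here; that case is worth alerting on separately rather than treating it as a content change.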

Important Notice: Browser Steps are powerful but not a cure-all. For CAPTCHAs and advanced defenses you’ll need manual steps or third-party services.

Summary: Treat Browser Steps as a programmable browser—combine explicit waits with robust selectors and validate after each change to improve reliability and reduce false positives.

How to reduce false positives (noise) in practice and increase change-detection precision? Which filtering strategies are most effective?

Core Analysis

Core question: How to reduce false positives while retaining meaningful changes?

Technical Analysis

Prioritize targeting: Use the Visual Selector or precise CSS/XPath to narrow monitoring to the exact DOM fragment of interest, avoiding full-page diffs.
Field cleaning: Apply Remove/Ignore rules, regex replacements, or jq for JSON to eliminate timestamps, dynamic IDs, ads, and other noise.
Conditional triggers: For numeric data (price, stock) use thresholds or percent-change triggers; for text use keyword/regex triggers to ignore minor formatting changes.
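
Field cleaning usually amounts to a handful of regex replacements applied before diffing. The sketch below normalizes ISO-style timestamps, UUIDs, and whitespace runs; the NOISE_RULES list is an assumed example, not a rule set that ships with changedetection.io, and the patterns should mirror whatever noise you actually observe.

```python
import re

# Assumed example rules: (pattern, replacement), applied in order to every line
NOISE_RULES = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?"),
     "<timestamp>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I),
     "<uuid>"),
    (re.compile(r"[ \t]+"), " "),  # collapse runs of spaces and tabs
]


def normalize(text: str) -> str:
    """Apply the noise rules line by line and drop blank lines before diffing."""
    cleaned = []
    for line in text.splitlines():
        for pattern, replacement in NOISE_RULES:
            line = pattern.sub(replacement, line)
        line = line.strip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)
```

Two snapshots that differ only in a timestamp or request ID normalize to the same text, while a real price change still shows up. Replacing noise with a placeholder, rather than deleting it, keeps the line structure intact for the diff.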

Practical Recommendations

Prefer precise extraction: If the Visual Selector can target the content, avoid page-level monitoring; a smaller scope picks up fewer unrelated changes.
Maintain an ignore list: Identify known noise (dates, UUIDs, ad divs) and remove/replace them via rules.
Use JSONPath/jq for structured data: Extract specific fields instead of the whole payload and pair with conditional triggers.
Use thresholds/percentage changes: For price monitoring, a minimum change threshold avoids alerts on tiny price fluctuations.
Iterate: Run for a period, inspect snapshots/alerts and refine rules based on observed noise.
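
The threshold idea can be sketched as follows. It assumes prices have already been extracted as text, uses Decimal to avoid floating-point rounding on money, and treats both the 5% default and the below limit as illustrative parameters; parse_price and should_alert are hypothetical helpers, and the parser assumes "." as the decimal separator.

```python
import re
from decimal import Decimal
from typing import Optional

_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[Decimal]:
    """Return the first number in text such as 'Now $1,299.00' (assumes '.' decimals)."""
    match = _NUMBER.search(text)
    return Decimal(match.group().replace(",", "")) if match else None


def should_alert(old: Decimal, new: Decimal,
                 min_change_pct: Decimal = Decimal("5"),
                 below: Optional[Decimal] = None) -> bool:
    """Alert when the price drops under `below`, or moves by at least
    `min_change_pct` percent in either direction; smaller moves are ignored."""
    if below is not None and new < below <= old:
        return True  # crossed the lower limit on this check
    if old == 0:
        return new != 0
    change_pct = abs(new - old) / old * 100
    return change_pct >= min_change_pct
```

With these defaults a move from 100 to 98 stays quiet, a move from 100 to 90 alerts, and setting below=Decimal("99") makes the 100 to 98 drop alert because it crosses the limit.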

Notes

Over-cleaning risk: Aggressive replacements may hide meaningful changes; be cautious in production.
Highly dynamic content: Content that client-side scripts re-render frequently may require manual assessment or a lower check frequency.

Important Notice: Treat change detection as an iterative task—start permissive then progressively tighten filters and triggers.

Summary: Combining precise selectors, field cleaning, and conditional triggers is the most effective way to reduce false positives; noisy pages require continuous tuning.

How does changedetection.io implement PDF and JSON monitoring, and what are the limitations of these features?

Core Analysis

Core question: How does changedetection.io detect changes in PDFs and JSON, and what are the limitations?

Technical Analysis

PDF monitoring implementation: The PDF is typically downloaded and compared by its extracted text or by file-level metrics (size/checksum). If the PDF has a text layer (i.e. it is not a scanned image), text comparison can detect edits accurately.
JSON monitoring implementation: JSONPath/jq expressions extract specific fields from the response, historical values are stored, and notifications fire when those fields differ; regex or hashes can be applied to individual segments.
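
A minimal sketch of field-level JSON comparison follows. It uses a simplified dotted path (data.items.0.price) as a stand-in for real JSONPath or jq expressions, which this sketch does not implement; extract and field_changes are hypothetical names. A path that no longer exists raises an error, which is the desired behavior when an API restructures its payload.

```python
import json
from typing import Any


def extract(doc: Any, path: str) -> Any:
    """Follow a dotted path like 'data.items.0.price' (a simplified stand-in for
    JSONPath/jq). Raises (e.g. KeyError or IndexError) if the path no longer exists."""
    node = doc
    for part in path.split("."):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def field_changes(old_body: str, new_body: str,
                  paths: list[str]) -> dict[str, tuple[Any, Any]]:
    """Compare only the watched fields of two JSON responses, ignoring the rest."""
    old_doc, new_doc = json.loads(old_body), json.loads(new_body)
    changes: dict[str, tuple[Any, Any]] = {}
    for path in paths:
        before, after = extract(old_doc, path), extract(new_doc, path)
        if before != after:
            changes[path] = (before, after)
    return changes
```

Fields outside the watched paths, such as a per-request "generated" timestamp, never reach the comparison, so they cannot trigger notifications.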

Limitations

Scanned PDFs (image-based): Require OCR to extract text; OCR adds error/noise and may cause false positives/negatives.
Frequently restructured JSON: If the API changes field paths or versions often, extraction rules break and require maintenance.
Large files & performance: Big PDFs/JSON increase CPU/memory; you may need to limit extraction scope.
Formatting vs semantic changes: Minor formatting edits (whitespace, layout) may register as changes in raw text diffs—use cleaning rules or smarter diffing.

Practical Recommendations

For text-based PDFs, use text diffs with ignore/regex rules to strip timestamps or footers; for scanned PDFs consider OCR with expectations for higher noise.
For JSON, target stable JSONPath/jq expressions and validate rules during API release cycles; use numeric thresholds or type checks for key fields.
For large payloads, limit extraction scope or use hashing to detect if a deeper comparison is needed.
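
The hashing approach can be sketched as a cheap gate in front of the expensive step: compute the size and SHA-256 digest of the downloaded bytes, and run text extraction and diffing only when that fingerprint changes. fingerprint, compare_if_changed, and the deep_compare callback are hypothetical names; deep_compare stands in for whatever PDF-text or JSON comparison you actually run.

```python
import hashlib
from typing import Callable, Optional, Tuple

Fingerprint = Tuple[int, str]


def fingerprint(payload: bytes) -> Fingerprint:
    """File-level metrics: size in bytes plus SHA-256 hex digest."""
    return len(payload), hashlib.sha256(payload).hexdigest()


def compare_if_changed(previous: Fingerprint, payload: bytes,
                       deep_compare: Callable[[bytes], Optional[str]]) -> Optional[str]:
    """Run the expensive comparison (e.g. PDF text extraction plus diff) only when
    the cheap fingerprint differs from the one stored at the previous check."""
    if fingerprint(payload) == previous:
        return None  # byte-identical to last time: skip extraction entirely
    return deep_compare(payload)
```

A changed checksum does not guarantee a meaningful change, since regenerated PDFs often differ only in embedded metadata; that is why the gate triggers a deeper comparison rather than an alert.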

Important Notice: PDF and JSON support broaden use cases significantly, but scanned documents and unstable APIs require extra engineering to achieve reliable results.

Summary: changedetection.io’s PDF/JSON features are valuable for text-extractable PDFs and structured APIs; scanned PDFs, heavy OCR needs, or frequently changing API schemas require additional handling and maintenance.

✨ Highlights

Feature-rich with many notification channels and visual selectors
Supports fast Docker deployment and self-hosting
Focused on price, restock and document/API change detection
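License information was not captured in the repository metadata used here, creating compliance and usage uncertainty until the repository's LICENSE file is checked
Contributor, release and commit metadata was not captured, so community-activity information here is incomplete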

🔧 Engineering

Supports visual selectors, Playwright, and Browser Steps for complex, interaction-driven fetches
Multiple notification channels: Email, Discord, Slack, Telegram, Webhooks, etc.
Price and restock monitoring with conditional triggers, upper/lower limits and percentage thresholds
Supports PDF/JSON monitoring, custom JS execution, screenshot notifications, and per-watch proxy configuration

⚠️ Risks

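No license was captured in the collected metadata; verify the repository's license before commercial use or redistribution to avoid legal risk
Project metadata (contributors, releases, commits) was not available for this analysis, so maintenance activity could not be assessed from it and should be reviewed directly in the repository
Scraping is exposed to anti-bot measures and CAPTCHAs and may incur proxy costs, requiring additional operational effort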

👥 For who?

Sysadmins and self-hosting enthusiasts needing continuous website change monitoring
E-commerce product managers and buyers tracking price and restock changes
Compliance/legal teams and researchers monitoring legal texts, PDFs and API changes