changedetection.io: Simple web-change monitoring tool
changedetection.io: self-hosted web-change monitoring with selectors and multi-channel alerts for price, restock and document/API tracking.
GitHub: dgtlmoon/changedetection.io | Updated: 2025-10-07 | Branch: main | Stars: 28.6K | Forks: 1.6K
Tags: Docker, Playwright/Browser Automation, Website Change Monitoring, Price & Restock Tracking, Notification Integrations, Self-hosted, Chrome Extension, PDF/JSON Monitoring, Proxy Support

💡 Deep Analysis

What concrete problems does this project solve, and in which scenarios should I choose changedetection.io?

Core Analysis

Project Positioning: changedetection.io aims to provide a self-hostable, feature-rich monitoring platform for web/document changes covering static HTML, JS-rendered dynamic pages, interaction-driven content, and PDF/JSON changes.

Technical Features

  • Dual fetcher approach: lightweight HTTP fetch for static pages and Chromium-based Playwright/WebDriver for JS execution and interactions when needed.
  • Rich extraction/filter pipeline: supports XPath/CSS/JSONPath/jq/regex and ignore/remove rules to reduce noise and precisely target content.
  • File and structured data support: built-in detection logic for PDFs (text/checksum/size) and JSON APIs.
  • Notifications and integrations: uses apprise to support many channels (Slack/Email/Discord/Webhook, etc.).

Usage Recommendations

  1. Match scenarios: Choose changedetection.io if you need self-hosted monitoring for prices, restocks, regulatory updates, or announcements—especially when pages require JS rendering or login. It’s more capable than simple HTML diff tools for these cases.
  2. Prioritize configuration: Use non-JS fetcher by default to save resources; enable Playwright only for URLs that truly require it. Use dedicated PDF/JSON rules to avoid false positives.
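
The "non-JS fetcher by default" advice can be checked per URL before a watch is created. The sketch below is not part of changedetection.io: `choose_fetcher` and its `"http"`/`"browser"` labels are hypothetical names, and it assumes you already have the plain HTTP response body and know a piece of text that must appear in the monitored content.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def choose_fetcher(static_html: str, expected_text: str) -> str:
    """Return "http" if the plain response already shows the text to watch,
    otherwise "browser" (the content is probably rendered by JavaScript)."""
    parser = _TextCollector()
    parser.feed(static_html)
    parser.close()
    visible = " ".join(parser.chunks)
    return "http" if expected_text in visible else "browser"
```

If the function returns `"browser"`, that URL is a candidate for the Playwright fetcher; everything else can stay on the cheaper HTTP fetch.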

Important Notes

  • Resources & frequency: Heavy Playwright use raises CPU and memory consumption; high-frequency checks may additionally need proxies and a more powerful host.
  • Anti-scraping & compliance: Frequent automated interactions may violate site terms—adjust frequency and follow legal/compliance constraints.
  • Tuning required: Complex pages require debugging browser steps and selectors.

Important Notice: For pages with heavy bot defenses (CAPTCHAs, behavior-based detection), the platform alone often cannot bypass them; you’ll need additional solutions or manual intervention.

Summary: changedetection.io is high-value for controlled, self-hosted monitoring of dynamic pages and document types, but assess resource overhead and anti-bot constraints for high-scale or protected targets.

How can Browser Steps and Visual Selector be used reliably on pages that require interaction, and which common misconfigurations cause failures?

Core Analysis

Core question: How do you build a stable monitoring flow for pages that require login or interaction to reach the target content?

Technical Analysis

  • Role of Browser Steps: Simulates user actions (navigate, fill, click, submit) to reach the final rendered state.
  • Role of Visual Selector: Visually selects the element/area to monitor within the post-interaction DOM, reducing unrelated noise.
  • Key point: Interaction scripts must be repeatable and idempotent; wait conditions should be tied to an observable state (element visible, text present, network idle) rather than fixed sleeps.

Practical Recommendations

  1. Replay and validate steps locally first: Manually run the login/navigation flow in a browser until stable.
  2. Use explicit waits: In Browser Steps wait for specific elements/text instead of fixed timeouts.
  3. Pick robust selectors: In Visual Selector prefer stable CSS/XPath and avoid selectors that reference dynamic IDs or random classes.
  4. Session & credential management: Store credentials securely and handle session expiry with retries; for 2FA/CAPTCHA sites consider manual or alternative flows.
  5. Stepwise validation: After changes, run a check and inspect snapshots to ensure no regressions or false positives.
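
To illustrate why an explicit wait is more robust than a fixed sleep, here is a minimal, library-independent sketch. `wait_until` is a hypothetical helper, not a changedetection.io or Playwright API; the `condition` callable stands in for "element visible" or "text present".

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.25) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds, so fast pages are not
    slowed down; returns False on timeout so the caller can fail loudly.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A fixed sleep either wastes time on fast responses or is too short on slow ones; polling a condition with a deadline avoids both failure modes.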

Common misconfigurations & impacts

  • Using fixed sleeps instead of explicit waits: causes flakiness under varying network conditions.
  • Overbroad selectors including dynamic content: increases noise and false alerts.
  • Ignoring session expiry: checks return login pages instead of target content.
  • Neglecting anti-bot defenses (CAPTCHA/behavioral checks): leads to blocked checks or additional verification steps.
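
The session-expiry failure mode can be caught by inspecting a snapshot before it is diffed. The following heuristic is only a sketch: `looks_like_login_page` is a hypothetical helper, and the assumption that a password field or a "sign in"/"log in" title marks a login page will not hold for every site.

```python
import re

# Assumption: a password input or a login-style <title> means the check
# was redirected to a login form instead of the target content.
_PASSWORD_INPUT = re.compile(r"<input\b[^>]*\btype\s*=\s*[\"']?password", re.I)
_LOGIN_TITLE = re.compile(r"<title>[^<]*\b(sign in|log ?in)\b", re.I)


def looks_like_login_page(html: str) -> bool:
    """Return True if the snapshot resembles a login page."""
    return bool(_PASSWORD_INPUT.search(html) or _LOGIN_TITLE.search(html))
```

Snapshots flagged this way are better treated as a failed check than as a content change.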

Important Notice: Browser Steps are powerful but not a cure-all. For CAPTCHAs and advanced defenses you’ll need manual steps or third-party services.

Summary: Treat Browser Steps as a programmable browser—combine explicit waits with robust selectors and validate after each change to improve reliability and reduce false positives.

How can false positives (noise) be reduced in practice to increase change-detection precision, and which filtering strategies are most effective?

Core Analysis

Core question: How do you reduce false positives while retaining meaningful changes?

Technical Analysis

  • Prioritize targeting: Use the Visual Selector or precise CSS/XPath to narrow monitoring to the exact DOM fragment of interest, avoiding full-page diffs.
  • Field cleaning: Apply Remove/Ignore rules, regex replacements, or jq for JSON to eliminate timestamps, dynamic IDs, ads, and other noise.
  • Conditional triggers: For numeric data (price, stock) use thresholds or percent-change triggers; for text use keyword/regex triggers to ignore minor formatting changes.
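
The field-cleaning idea can be sketched with plain regular expressions. This is an illustration of the technique, not changedetection.io's own filter implementation; the patterns and the `strip_noise` name are assumptions chosen for the example.

```python
import re

# Assumed noise patterns: ISO-like timestamps and UUIDs.
NOISE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?\b"), "<timestamp>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
                re.I), "<uuid>"),
]


def strip_noise(text: str) -> str:
    """Replace known-volatile tokens with placeholders and collapse whitespace."""
    for pattern, placeholder in NOISE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return " ".join(text.split())
```

Comparing `strip_noise(old)` with `strip_noise(new)` ignores timestamp and request-ID churn while a real edit, such as a different price, still shows up as a difference.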

Practical Recommendations

  1. Prefer precise extraction: If Visual Selector can target it, avoid page-level monitoring. Smaller scope reduces unrelated changes.
  2. Maintain an ignore list: Identify known noise (dates, UUIDs, ad divs) and remove/replace them via rules.
  3. Use JSONPath/jq for structured data: Extract specific fields instead of the whole payload and pair with conditional triggers.
  4. Use thresholds/percentage changes: For price monitoring, avoid alerts on tiny formatting changes.
  5. Iterate: Run for a period, inspect snapshots/alerts and refine rules based on observed noise.
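
A percentage threshold, as in recommendation 4, can be expressed in a few lines. This is a sketch under stated assumptions: `parse_price` and `should_alert` are hypothetical names, prices are assumed to use "," as a thousands separator and "." as the decimal point, and the 1% default is arbitrary.

```python
import re
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Extract the first number from text such as "$1,299.00"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return Decimal(match.group().replace(",", ""))


def should_alert(old_text: str, new_text: str,
                 min_percent: Decimal = Decimal("1")) -> bool:
    """Alert only when the price moved by at least `min_percent` percent."""
    old, new = parse_price(old_text), parse_price(new_text)
    if old == 0:
        return new != 0
    change = abs(new - old) / old * 100
    return change >= min_percent
```

With this rule, a reformatting from "$1,299.00" to "$1299" produces no alert, whereas a drop from "$100.00" to "$95.00" does.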

Notes

  • Over-cleaning risk: Aggressive replacements may hide meaningful changes; be cautious in production.
  • Highly dynamic content: Content that is frequently re-rendered on the client side may require manual assessment or a lower check frequency.

Important Notice: Treat change detection as an iterative task: start permissive, then progressively tighten filters and triggers.

Summary: Combining precise selectors, field cleaning, and conditional triggers is the most effective way to reduce false positives; noisy pages require continuous tuning.

How does changedetection.io implement PDF and JSON monitoring, and what are the limitations of these features?

Core Analysis

Core question: How does changedetection.io detect changes in PDFs and JSON, and what are the limitations?

Technical Analysis

  • PDF monitoring implementation: Typically downloads the PDF and compares either the extracted text or file-level metrics (size/checksum). If the PDF has a text layer (that is, it is not a scanned image), text comparison can detect edits accurately.
  • JSON monitoring implementation: Uses JSONPath/jq to extract specific fields from the response, stores historical values, and triggers notifications when a field differs; regex or hashes can be applied to individual segments.
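
The field-level JSON comparison can be sketched with the standard library alone. Note the substitution: a simple dotted path stands in for real JSONPath/jq expressions, and `extract` and `changed_fields` are hypothetical helpers, not the project's API.

```python
import json


def extract(payload, path: str):
    """Follow a dotted path such as "product.price" or "items.0.sku"."""
    node = payload
    for key in path.split("."):
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def changed_fields(old_json: str, new_json: str, paths: list) -> dict:
    """Return {path: (before, after)} for every watched path that differs."""
    old, new = json.loads(old_json), json.loads(new_json)
    changes = {}
    for path in paths:
        before, after = extract(old, path), extract(new, path)
        if before != after:
            changes[path] = (before, after)
    return changes
```

Because only the listed paths are compared, volatile fields elsewhere in the payload (for example a server timestamp) never trigger a notification.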

Limitations

  1. Scanned PDFs (image-based): Require OCR to extract text; OCR adds error/noise and may cause false positives/negatives.
  2. Frequently restructured JSON: If the API changes field paths or versions often, extraction rules break and require maintenance.
  3. Large files & performance: Big PDFs/JSON increase CPU/memory; you may need to limit extraction scope.
  4. Formatting vs semantic changes: Minor formatting edits (whitespace, layout) may register as changes in raw text diffs—use cleaning rules or smarter diffing.

Practical Recommendations

  1. For text-based PDFs, use text diffs with ignore/regex rules to strip timestamps or footers; for scanned PDFs, consider OCR and expect more noise.
  2. For JSON, target stable JSONPath/jq expressions and validate rules during API release cycles; use numeric thresholds or type checks for key fields.
  3. For large payloads, limit extraction scope or use hashing to detect if a deeper comparison is needed.
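
Recommendation 3 amounts to a two-stage check: a cheap checksum first, then a whitespace-insensitive text diff only when the bytes differ. The sketch below assumes the text has already been extracted from the document; `fingerprint`, `text_diff`, and `detect_change` are hypothetical names.

```python
import difflib
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 of the raw file, used as a cheap 'did anything change?' gate."""
    return hashlib.sha256(data).hexdigest()


def _normalize(text: str) -> list:
    # Collapse runs of whitespace and drop blank lines so that layout-only
    # edits do not register as changes.
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def text_diff(old_text: str, new_text: str) -> list:
    """Return only the added ("+ ") and removed ("- ") lines."""
    diff = difflib.ndiff(_normalize(old_text), _normalize(new_text))
    return [line for line in diff if line.startswith(("+ ", "- "))]


def detect_change(old_bytes: bytes, new_bytes: bytes,
                  old_text: str, new_text: str) -> list:
    """Skip the expensive diff entirely when the checksums match."""
    if fingerprint(old_bytes) == fingerprint(new_bytes):
        return []
    return text_diff(old_text, new_text)
```

An empty result with differing checksums indicates a formatting-only change, which is the case limitation 4 warns about.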

Important Notice: PDF and JSON support broaden use cases significantly, but scanned documents and unstable APIs require extra engineering to achieve reliable results.

Summary: changedetection.io’s PDF/JSON features are valuable for text-extractable PDFs and structured APIs; scanned PDFs, heavy OCR needs, or frequently changing API schemas require additional handling and maintenance.


✨ Highlights

  • Feature-rich with many notification channels and visual selectors
  • Supports fast Docker deployment and self-hosting
  • Focused on price, restock and document/API change detection
  • License: the repository ships an Apache-2.0 LICENSE file; review it, along with any additional commercial terms in the repository, before commercial use or redistribution
  • Contributor, release and commit metadata missing; community activity information is incomplete

🔧 Engineering

  • Supports visual selectors, Playwright, and Browser Steps for complex fetches
  • Multiple notification channels: Email, Discord, Slack, Telegram, Webhooks, etc.
  • Price and restock monitoring with conditional triggers, upper/lower limits and percentage thresholds
  • Supports PDF/JSON monitoring, custom JS execution, screenshot notifications, and per-watch proxy configuration

⚠️ Risks

  • License terms (Apache-2.0 per the repository's LICENSE file, plus any additional commercial terms) should be reviewed before commercial use and redistribution
  • Project metadata (contributors, releases, commits) appears missing, indicating higher maintenance risk
  • Scraping is susceptible to anti-bot measures, CAPTCHAs and proxy costs, requiring additional operational effort

👥 Who is it for?

  • Sysadmins and self-hosting enthusiasts needing continuous website change monitoring
  • E-commerce product managers and buyers tracking price and restock changes
  • Compliance/legal teams and researchers monitoring legal texts, PDFs and API changes