Nanobrowser: Local, privacy-first AI web automation

Local multi-agent LLM browser extension delivering privacy-first web automation and interactive workflows while using your own API keys and models.

GitHub: nanobrowser/nanobrowser | Updated: 2025-09-18 | Branch: master | Stars: 10.6K | Forks: 1.1K

Tags: TypeScript, Chrome Extension, Multi-agent, Automation, Privacy-first

💡 Deep Analysis

What concrete web-automation problems does Nanobrowser solve, and how does it implement these capabilities inside the browser?

Core Analysis

Project Positioning: Nanobrowser is an open-source Chrome/Edge extension designed to provide controllable, privacy-friendly AI-driven web automation directly in the browser. It implements a multi-agent setup where a Planner handles semantic task decomposition and a Navigator performs DOM interactions, allowing different LLMs (including local models) to be assigned to each agent to execute complex, multi-step web tasks locally.

Technical Features

In-browser execution: Uses extension content scripts and background scripts to manipulate the DOM and a sidebar UI for real-time visibility, avoiding third-party cloud intermediaries and keeping sensitive data local.
Multi-agent division: Planner for strategy and task breakdown; Navigator for clicks, data extraction, and form fills. Splitting the roles reduces the errors that arise when a single model must both plan and act.
Modular LLM adapters: Supports OpenAI, Anthropic, Gemini, Ollama, etc., enabling high-capacity models for planning and lightweight models for frequent execution.
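
The division of labor described above can be sketched roughly as follows. This is a minimal illustration, not Nanobrowser's actual code: the `LLMAdapter` interface, the `Step` shape, the class methods, and the stub model names are all assumptions made for the example, and the Navigator here returns text instead of touching a real DOM.

```typescript
// Hypothetical adapter interface; Nanobrowser's real adapter API may differ.
interface LLMAdapter {
  readonly model: string;
  complete(prompt: string): Promise<string>;
}

type Step = { action: "click" | "extract" | "fill"; target: string };

// Planner: turns a task description into an ordered list of steps.
class Planner {
  private readonly llm: LLMAdapter;
  constructor(llm: LLMAdapter) {
    this.llm = llm;
  }

  async plan(task: string): Promise<Step[]> {
    const raw = await this.llm.complete(`Return JSON steps for: ${task}`);
    return JSON.parse(raw) as Step[];
  }
}

// Navigator: carries out one step at a time (stubbed; no real DOM access).
class Navigator {
  private readonly llm: LLMAdapter;
  constructor(llm: LLMAdapter) {
    this.llm = llm;
  }

  async execute(step: Step): Promise<string> {
    return this.llm.complete(`${step.action}:${step.target}`);
  }
}

// Plans once, then executes each step in order and collects the results.
async function runTask(task: string, planner: Planner, navigator: Navigator): Promise<string[]> {
  const results: string[] = [];
  for (const step of await planner.plan(task)) {
    results.push(await navigator.execute(step));
  }
  return results;
}

// Stub adapters standing in for a large cloud model and a small local model.
const plannerModel: LLMAdapter = {
  model: "large-cloud-model",
  complete: async () =>
    JSON.stringify([
      { action: "click", target: "#search" },
      { action: "extract", target: "table.results" },
    ]),
};
const navigatorModel: LLMAdapter = {
  model: "small-local-model",
  complete: async (prompt) => `done ${prompt}`,
};
```

Because each agent only depends on the adapter interface, swapping a cloud model for a local one is a matter of passing a different adapter to the constructor.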

Usage Recommendations

Fit scenarios: Data scraping (tabular extraction), cross-page comparison tasks, bulk form filling, and orchestrated repetitive workflows—especially where privacy and cost control matter.
Deployment advice: Install the extension and iterate in a controlled test environment to fine-tune Planner prompts and Navigator behaviors before scaling.

Caveats

The extension requires page access permissions. Understand what they cover and limit them to reduce security risk. Improper API key handling can leak credentials.
Pages with CAPTCHAs, MFA, or strict CSPs may block automation; browser resource limits constrain long-running or parallel agents.

Important: Allocate high-capacity models to the Planner and lightweight models to the Navigator to balance accuracy and cost.

Summary: Nanobrowser brings semantic, multi-step automation into the browser, ideal for privacy-sensitive and cost-aware web automation. It does not replace server-based, long-running automation and cannot handle pages protected by anti-bot measures, but it offers a practical in-browser alternative for everyday information workers and low-code builders.

87.0%

Regarding privacy and local models, how does Nanobrowser protect data and API keys? What are the recommended best practices?

Core Analysis

Core Issue: Nanobrowser offers two privacy-protecting design choices: running automation locally in the browser and supporting local models (e.g., Ollama). These reduce or eliminate the need to send sensitive page contents to third-party clouds. However, actual security depends on how API keys are stored, extension permissions, and whether users choose local endpoints.

Technical Basis

Local execution: Content scripts and sidebar run in the user’s browser, cutting out intermediary services.
Local model support: Configuring Ollama or other local endpoints allows inference to remain on the machine.
Open-source auditability: Apache-2.0 license and TypeScript sources enable code review for trust.

Best Practices

Prefer local models for sensitive data: Configure Planner (or the full pipeline) to run on local endpoints when possible to avoid remote calls.
Secure API key storage: If using remote providers, store keys only in local browser storage (avoid syncing/backups) and rotate keys periodically.
Principle of least privilege: Grant the extension only necessary domain access; do not allow global access to untrusted sites.
Audit dependencies and updates: Use the open-source nature to monitor releases and dependencies for vulnerabilities.
Rate and quota controls: Set call limits per task to prevent key abuse or unexpected billing.
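
The least-privilege item above can be illustrated with an application-level allowlist check. This is only a sketch: the function name `isAllowedUrl` and the example domains are hypothetical, and a check like this would complement, not replace, the host permissions the browser itself enforces for the extension.

```typescript
// Hypothetical helper: decides whether automation may run on a given URL.
// An entry such as "example.com" matches that host and its subdomains.
function isAllowedUrl(url: string, allowedDomains: readonly string[]): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparsable URLs are rejected
  }
  return allowedDomains.some((domain) => {
    const d = domain.toLowerCase();
    // Require an exact match or a dot-separated subdomain, so that
    // "notexample.com" does not match "example.com".
    return host === d || host.endsWith(`.${d}`);
  });
}

// Illustrative allowlist; real deployments would list their own domains.
const allowlist = ["example.com", "intranet.example.org"];
```

Matching on the parsed hostname rather than on the raw URL string avoids accepting look-alike addresses such as `https://example.com.evil.test/`.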

Important: Even if the extension runs locally, using remote LLMs sends requests (and possibly partial data) to providers. For strict compliance or highly sensitive data, use local models or self-hosted enterprise endpoints.

Summary: Nanobrowser can provide a high level of privacy if users opt for local models and follow key storage and permission best practices. Those steps are essential to minimize data exfiltration risks.

86.0%

How should one choose appropriate LLMs (cloud vs local) for different agents to balance cost and performance, and what rate-limiting and monitoring strategies should be used?

Core Analysis

Core Issue: In Nanobrowser’s multi-agent architecture, model choice strongly affects cost, latency, and reliability. Assigning cloud vs local models appropriately, and enforcing rate limits and monitoring, is essential to balance cost and performance.

Model Assignment Rules

Planner (low-frequency, high-value): Use high-capacity cloud models for strategy, recovery, and complex prompt parsing. These calls are infrequent but need quality.
Navigator (high-frequency execution): Prefer lightweight or local models (e.g., Ollama, small Llama variants) to minimize API calls, latency, and cost.
Hybrid approach: Send only critical decision steps to cloud models; let local models or deterministic logic handle repetitive DOM actions.
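
One way to read the assignment rules above is as a small routing function. The sketch below is an assumption-laden illustration: the `StepInfo` fields and the three route names are invented for the example and do not correspond to Nanobrowser settings.

```typescript
type Agent = "planner" | "navigator";
type Route = "cloud" | "local" | "deterministic";

// Hypothetical description of a unit of work to be routed.
interface StepInfo {
  agent: Agent;
  // True when the step needs judgment (for example, recovery after a failure).
  critical: boolean;
  // True when a fixed selector is already known, so no model call is needed.
  hasKnownSelector: boolean;
}

// Planner work and critical decisions go to the cloud model; repetitive
// DOM actions use deterministic logic when possible, else a local model.
function routeStep(step: StepInfo): Route {
  if (step.agent === "planner" || step.critical) return "cloud";
  if (step.hasKnownSelector) return "deterministic";
  return "local";
}
```

Keeping the rule in one function makes it easy to adjust the policy later, for instance after monitoring shows that a local model fails too often on a class of steps.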

Rate-limiting & Budget Controls

Per-task budget: Set a max API call count and spend threshold per workflow; pause and alert users if exceeded.
Global throttling: Enforce minute/hour limits to prevent cost spikes from concurrent tasks.
Batching calls: When possible, combine multiple Navigator requests into a single call to reduce per-request overhead, as long as the batched call produces the same outcome as the separate calls would.
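
The per-task budget and global throttle described above could look something like the sketch below. The class names `TaskBudget` and `SlidingWindowLimiter` are hypothetical, cost figures are caller-supplied estimates, and the injectable clock exists only to make the behavior easy to test.

```typescript
// Per-task budget: caps both the number of LLM calls and the estimated spend.
class TaskBudget {
  private calls = 0;
  private spendUsd = 0;
  private readonly maxCalls: number;
  private readonly maxSpendUsd: number;

  constructor(maxCalls: number, maxSpendUsd: number) {
    this.maxCalls = maxCalls;
    this.maxSpendUsd = maxSpendUsd;
  }

  // Returns false (and records nothing) if the call would exceed the budget.
  tryCharge(estimatedCostUsd: number): boolean {
    if (this.calls + 1 > this.maxCalls) return false;
    if (this.spendUsd + estimatedCostUsd > this.maxSpendUsd) return false;
    this.calls += 1;
    this.spendUsd += estimatedCostUsd;
    return true;
  }
}

// Global throttle: at most `limit` calls in any sliding window of `windowMs`.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  tryAcquire(): boolean {
    const t = this.now();
    // Drop timestamps that have aged out of the window.
    this.timestamps = this.timestamps.filter((ts) => t - ts < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(t);
    return true;
  }
}
```

A caller would check both `tryCharge` and `tryAcquire` before each LLM request, and pause the workflow and alert the user when either returns false.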

Observability & Monitoring

Call logs: Capture each LLM call (timestamp, agent, model, token/estimated cost) and export for cost analytics.
Failure metrics: Monitor Planner retries and Navigator DOM failures to guide model reassignments or prompt tuning.
Alerts: Notify users on budget breaches or abnormal failure rates, and automatically scale down or pause the affected tasks.
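
The call log and failure metrics above can be sketched as a record type plus a small aggregation step. Everything here is illustrative: the `CallRecord` fields mirror the ones listed above, while the function names and the idea of a numeric failure threshold are assumptions.

```typescript
// One entry per LLM call, matching the fields suggested for call logs.
interface CallRecord {
  timestamp: number;
  agent: "planner" | "navigator";
  model: string;
  tokens: number;
  estimatedCostUsd: number;
  ok: boolean;
}

interface AgentSummary {
  calls: number;
  tokens: number;
  costUsd: number;
  failureRate: number; // failed calls divided by total calls
}

// Aggregates the log per agent for cost analytics.
function summarize(records: readonly CallRecord[]): Map<string, AgentSummary> {
  const totals = new Map<string, { calls: number; tokens: number; costUsd: number; failures: number }>();
  for (const r of records) {
    const t = totals.get(r.agent) ?? { calls: 0, tokens: 0, costUsd: 0, failures: 0 };
    t.calls += 1;
    t.tokens += r.tokens;
    t.costUsd += r.estimatedCostUsd;
    if (!r.ok) t.failures += 1;
    totals.set(r.agent, t);
  }
  const summary = new Map<string, AgentSummary>();
  totals.forEach((t, agent) => {
    summary.set(agent, {
      calls: t.calls,
      tokens: t.tokens,
      costUsd: t.costUsd,
      failureRate: t.failures / t.calls,
    });
  });
  return summary;
}

// Lists agents whose failure rate exceeds the given threshold (0 to 1).
function agentsOverFailureThreshold(summary: Map<string, AgentSummary>, threshold: number): string[] {
  const flagged: string[] = [];
  summary.forEach((s, agent) => {
    if (s.failureRate > threshold) flagged.push(agent);
  });
  return flagged;
}
```

The flagged list is the kind of signal that could trigger an alert, a pause, or a model reassignment for the affected agent.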

Important: Assign expensive models to high-value, low-frequency Planner tasks and lightweight/local models to frequent Navigator calls to reduce cost. Continuously monitor real usage and tune accordingly.

Summary: With clear Planner/Navigator responsibilities, per-task/global budgets, and monitoring, users can retain automation quality while keeping model costs and latency manageable.

85.0%

Why does Nanobrowser use a multi-agent (Planner and Navigator) design with modular LLM adapters? What are the advantages and limitations of this choice?

Core Analysis

Architectural Intent: Nanobrowser separates tasks into Planner (semantic planning and strategy) and Navigator (page interactions and DOM operations) agents and uses a modular LLM adapter layer to allow appropriate models to be assigned to each role. This design optimizes the trade-offs between capability, cost, and privacy by enabling local or remote model mixes.

Technical Advantages

Cost–capability separation: High-capability models handle complex planning infrequently, while lightweight models handle frequent execution tasks, reducing overall API costs.
Increased robustness: The Planner can detect failures and re-plan, mitigating cascading failures from single-step errors.
Flexible model strategy: Modular adapters enable hybrid cloud/local usage for privacy and performance tuning.
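
The robustness point above, where the Planner detects a failure and re-plans, can be sketched as a bounded retry loop around planning. This is a simplified illustration, not the project's control flow: the `Agents` interface, the string-based steps, and the `failureHint` parameter are assumptions for the example.

```typescript
interface StepResult {
  ok: boolean;
  error?: string;
}

// Hypothetical callbacks standing in for the Planner and Navigator agents.
interface Agents {
  plan(task: string, failureHint?: string): Promise<string[]>;
  execute(step: string): Promise<StepResult>;
}

// Executes a plan step by step; on the first failed step, asks the planner
// for a new plan (passing along what failed), up to `maxReplans` times.
async function runWithReplan(task: string, agents: Agents, maxReplans: number): Promise<boolean> {
  let hint: string | undefined;
  for (let attempt = 0; attempt <= maxReplans; attempt++) {
    const steps = await agents.plan(task, hint);
    let failed = false;
    for (const step of steps) {
      const result = await agents.execute(step);
      if (!result.ok) {
        hint = `step "${step}" failed: ${result.error ?? "unknown error"}`;
        failed = true;
        break; // stop this plan so one bad step does not cascade
      }
    }
    if (!failed) return true;
  }
  return false;
}
```

Bounding the number of re-plans keeps a persistently failing task from consuming unlimited calls to the more expensive planning model.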

Limitations & Challenges

Latency and switching costs: Frequent interaction between agents can add latency, especially when Planner uses a remote large model and Navigator uses local models.
Browser resource constraints: Running multiple agents long-term is limited by browser memory/CPU and may affect stability.
Execution limits on complex pages: Navigator can still be blocked by CSPs, dynamic loading, anti-bot measures, or session requirements.

Practical Recommendations

Model assignment: Reserve high-latency/costly models for Planner; use lightweight/local models for Navigator to minimize frequent network calls and costs.
Reduce switching: Reuse the same model instance or endpoint during a task to lower initialization overhead.
Limit concurrency: Cap concurrent agents and monitor browser resource usage to avoid instability.
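
The concurrency cap recommended above can be sketched with a small worker-pool helper. The name `runLimited` is hypothetical and the jobs are generic async functions; nothing here reflects how Nanobrowser schedules agents internally.

```typescript
// Runs async jobs with at most `limit` in flight at once and returns the
// results in the same order as the input jobs.
async function runLimited<T>(jobs: ReadonlyArray<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0;

  // Each worker repeatedly claims the next unclaimed job until none remain.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, jobs.length));
  const workers = Array.from({ length: workerCount }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Since JavaScript runs these workers on a single thread, claiming a job with `next++` needs no locking; the cap only limits how many awaited operations overlap.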

Important: Multi-agent architecture improves control and auditability but does not replace server-side engines for long-running, high-concurrency, or highly reliable automation.

Summary: The architecture is well-suited for cost-aware, privacy-sensitive, interactive in-browser automation. It trades off some latency and execution reliability on protected or long-running tasks, which may require complementary backend solutions.

84.0%

For non-engineering users, what is the learning curve and common pitfalls of using Nanobrowser? How can they lower onboarding friction and improve success rates?

Core Analysis

Core Issue: Nanobrowser provides a visual sidebar and conversational interface that makes installing and trying simple automations relatively easy for non-engineering users. However, achieving stable, high-success, and cost-controlled complex workflows requires understanding model selection, extension permissions, and page execution boundaries—making the learning curve moderately steep.

Common Pitfalls (Evidence-based)

Permissions & security concerns: The extension requires page access, which can make users who don’t understand permission scopes uneasy, and improper API key handling risks leakage.
Model & cost misconfiguration: Using expensive models for high-frequency Navigator calls drives up cost; using weak models for planning causes repeated failures.
Page execution failures: Dynamic loading, CSPs, anti-bot protections, or login/CAPTCHA pages can block automation.
Resource & stability issues: Long-running or concurrent agents in the browser can be limited by memory/CPU, causing instability.

Practical Steps to Lower Onboarding Friction

Start in a controlled test environment: Iterate on prompts and actions on test pages and record failure patterns.
Define model roles: Use high-capability models for Planner and lightweight/local models for Navigator to reduce API calls/costs.
Scale gradually: Begin with single-page automations, then extend to multi-page/process chains.
Minimize permissions: Grant the extension only necessary domain access, store API keys locally, and rotate keys regularly when possible.
Monitor and throttle: Set call/throughput limits to prevent unexpected billing spikes.

Important: For tasks requiring logins, CAPTCHAs, or complex session handling, partner with a developer or use server-side automation for reliability.

Summary: With staged testing, proper model assignment, and permission management, non-engineering users can quickly gain value from Nanobrowser. Complex scenarios, however, still benefit from developer support or hybrid architectures for robust execution.

83.0%

What are Nanobrowser's practical limitations on complex or protected pages (dynamic loading, CSP, CAPTCHA, MFA), and what mitigations are feasible?

Core Analysis

Core Issue: Navigator runs as a browser extension and is constrained by page security policies and runtime behavior. CAPTCHAs and MFA are designed to block automation, while CSP and dynamic loading can break it as a side effect, so Nanobrowser cannot reliably or legally bypass them using semantic planning alone.

Specific Limitations (Evidence-based)

CSP/script injection: Some sites’ CSPs block scripts injected into the page or loaded from external sources, preventing expected DOM operations.
Dynamic/async rendering: Unpredictable element timing can cause click/extraction failures.
CAPTCHA & MFA: These protect against automation. LLMs cannot legitimately bypass them, so these steps require human intervention or specialized services (used with legality in mind).

Mitigations

Split tasks & require human steps: Design workflows where automation pauses for required logins or CAPTCHAs and resumes after user action.
Backend assistance: Use server-side Playwright/Selenium for complex session management, then hand sanitized pages or results to Nanobrowser for semantic work.
Increase robustness: Add explicit waits, retries, and visibility checks in Navigator to handle dynamic loading.
Compliant CAPTCHA handling: If CAPTCHA solving is necessary, use compliant third-party services or manual solving; do not attempt to circumvent the challenge.
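
The "explicit waits, retries, and visibility checks" mitigation above can be sketched as a polling helper with exponential backoff. The `waitFor` function and its `RetryOptions` are hypothetical names for this example; the injectable `sleep` is there so the timing can be tested without real delays.

```typescript
interface RetryOptions {
  attempts: number; // total number of checks before giving up
  initialDelayMs: number; // wait after the first failed check
  backoffFactor: number; // delay multiplier after each further failed check
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `probe` until it returns a non-null value or the attempts run out.
// Returns the value, or null if the element never appeared.
async function waitFor<T>(probe: () => T | null, options: RetryOptions): Promise<T | null> {
  const sleep = options.sleep ?? defaultSleep;
  let delay = options.initialDelayMs;
  for (let i = 0; i < options.attempts; i++) {
    const value = probe();
    if (value !== null) return value;
    // Only wait if another attempt will follow.
    if (i < options.attempts - 1) {
      await sleep(delay);
      delay *= options.backoffFactor;
    }
  }
  return null;
}
```

In a content script, the probe might be something like `() => document.querySelector("#results")`, so that a click or extraction only runs once the dynamically loaded element exists. A `null` result is the natural point to pause and hand control back to the user.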

Important: Do not use Nanobrowser to evade site protections. For CAPTCHAs or MFA, prefer legal and authorized solutions.

Summary: Nanobrowser is effective for ordinary scraping and form automation, but for protected pages adopt a hybrid approach: use Nanobrowser for automatable parts and rely on backend or human steps for authentication and anti-bot challenges.

82.0%

✨ Highlights

Free and runs locally in the browser, which helps keep user data private
Supports multiple LLM providers and per-agent model assignment
Official support only for Chrome/Edge; other browsers have limited compatibility
Requires storing API keys in the extension; compromise could expose keys

🔧 Engineering

Local multi-agent system that supports task delegation and dynamic planning
Supports OpenAI/Anthropic/Gemini/Ollama and custom OpenAI-compatible endpoints
Provides an interactive side panel, conversation history, and follow-up queries for visual workflows

⚠️ Risks

Relatively few contributors (~10), creating uncertainty around long-term maintenance and security updates
The extension manages user API keys and browser permissions; misuse or compromise presents high risk

👥 For whom?

Developers, researchers, and advanced users who prioritize privacy and local execution
Product and engineering teams needing cross-site automation, repetitive task orchestration, or customizable AI workflows