SiYuan: Privacy-first self‑hosted personal knowledge manager and ecosystem

SiYuan is a privacy‑first, feature‑rich self‑hosted knowledge management platform focused on block‑level references and Markdown WYSIWYG, suited for users and teams who require data control.

GitHub: siyuan-note/siyuan | Updated: 2025-09-25 | Branch: main | Stars: 37.7K | Forks: 2.3K

Tags: TypeScript, Golang, Self‑hosted, Personal Knowledge Management, Markdown WYSIWYG, Block‑level references, Docker deployment, Mobile apps

💡 Deep Analysis

What are common operational challenges when self-hosting SiYuan with Docker? How to avoid data loss and permission issues?

Core Analysis

Core Issue: When self-hosting SiYuan (especially via Docker), common problems include file permission mismatches preventing read/write access, lack of persistent volume mounts causing data loss, and misconfigured reverse proxies/auth leading to availability or security issues.

Technical Analysis

Permissions & Mounts: The container's internal UID/GID must match the ownership of the mounted host directory. The README recommends setting the entrypoint script's PUID/PGID environment variables so the runtime user matches that ownership, which resolves permission errors on host-mounted directories.
Persistence & Data Loss: Always mount the workspace as a host persistent volume (-v workspace_dir_host:workspace_dir_container) to avoid storing data inside container layers that vanish on container removal.
Reverse Proxy & Real-time: Ensure your reverse proxy (Nginx/Caddy) correctly forwards WebSocket and preserves headers; otherwise real-time sync/desktop connections may fail.
Auth & Access Control: Configure --accessAuthCode properly and protect the service via HTTPS and IP controls to avoid exposure or accidental denial of access.
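
The ownership and mount points above can be checked before the container starts. The following is a minimal sketch, not an official tool: `ownership_matches` and `docker_run_args` are hypothetical helper names, and the image name `b3log/siyuan`, the port `6806`, the container path `/siyuan/workspace`, and the `--workspace` flag are assumptions that should be verified against the README for your version.

```python
import os
from pathlib import Path


def ownership_matches(workspace: Path, puid: int, pgid: int) -> bool:
    """Return True if the host workspace directory is owned by puid:pgid."""
    st = os.stat(workspace)
    return st.st_uid == puid and st.st_gid == pgid


def docker_run_args(host_workspace: Path, puid: int, pgid: int, auth_code: str) -> list[str]:
    """Build a `docker run` argument list with a bind mount and PUID/PGID set.

    Assumed values (check the README): image name, port, container path,
    and the --workspace flag.
    """
    container_workspace = "/siyuan/workspace"  # assumed container path
    return [
        "docker", "run", "-d",
        "-v", f"{host_workspace}:{container_workspace}",  # persist data on the host
        "-p", "6806:6806",  # assumed default port
        "-e", f"PUID={puid}",
        "-e", f"PGID={pgid}",
        "b3log/siyuan",  # assumed image name
        f"--workspace={container_workspace}",
        f"--accessAuthCode={auth_code}",
    ]
```

Calling `ownership_matches` first and refusing to start on a mismatch turns a confusing runtime permission error into an explicit pre-flight failure.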

Practical Recommendations

Before starting, either chown the host workspace to the PUID:PGID the container will run as, or pass PUID/PGID environment variables to docker run that match the directory's existing owner.
Use bind mounts or named volumes for workspace persistence and implement a backup/versioning strategy.
Test reverse proxy WebSocket forwarding and secure the endpoint with HTTPS and a strong access code.
Test plugin and import/export workflows in a staging instance before upgrading the production workspace.
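
One way to test WebSocket forwarding through a reverse proxy is to inspect the handshake response it returns. The sketch below only validates a response that has already been captured (for example with a generic HTTP client); it does not talk to SiYuan itself. The function names are hypothetical, while the accept-key computation follows RFC 6455.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
_WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a compliant server must return."""
    digest = hashlib.sha1((sec_websocket_key + _WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_was_forwarded(status_code: int, headers: dict[str, str], sec_websocket_key: str) -> bool:
    """Return True if the response looks like a completed WebSocket upgrade."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return (
        status_code == 101
        and lowered.get("upgrade", "").lower() == "websocket"
        and "upgrade" in lowered.get("connection", "").lower()
        and lowered.get("sec-websocket-accept") == expected_accept(sec_websocket_key)
    )
```

A proxy that drops the `Upgrade` and `Connection` headers typically answers with a plain 200 or 400 instead of 101, which this check reports as a failure.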

Caveats

The Docker image notes some feature limitations (e.g., desktop/mobile connections and some import/export flows); verify required features before production deployment.

Summary: Proper UID/GID alignment, persistent volume mounting, correct reverse-proxy configuration, and access control reduce the main self-hosting risks for SiYuan.

91.0%

Why does SiYuan use a hybrid architecture (TypeScript/Electron frontend + Go kernel)? What are the advantages and potential drawbacks?

Core Analysis

Core Question: SiYuan’s hybrid architecture (TypeScript/Electron frontend + Go kernel) aims to balance a rich client UI with backend efficiency and system-level capabilities, while enabling multi-client access and extensibility.

Technical Analysis

Advantages:
Cross-platform UI & Rapid Development: TypeScript and Electron speed up building visual WYSIWYG editors and desktop apps using frontend ecosystems (npm, UI components).
Performance & System Access: Go as a kernel is suitable for concurrency, file I/O, subprocesses (e.g., invoking Tesseract OCR), and providing a stable HTTP/WebSocket interface.
Clear Separation: HTTP/WebSocket and the siyuan:// protocol decouple UI from kernel, facilitating third-party clients, mobile access, and plugin APIs (petal).
Binary/Container Friendly: Go’s static binaries are beneficial for server/container deployment; entrypoint handling of PUID/PGID improves volume mount compatibility.
Potential Drawbacks:
Resource Usage: Electron often consumes more memory/disk, posing issues on low-resource devices.
IPC & Compatibility Complexity: Frontend-backend communication requires protocol compatibility across upgrades; plugin runtimes must account for both kernel and frontend contexts.
Feature Divergence: README notes Docker image limitations (e.g., some exports/desktop connections restricted), indicating inconsistent experiences across deployment modes.
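
To illustrate how an HTTP interface decouples clients from the kernel, the sketch below builds (but does not send) a JSON POST request that a third-party client might issue. The endpoint path, the `Token` authorization scheme, the port, and the payload shape used in the example are assumptions for illustration; consult the project's API documentation for the actual contract.

```python
import json
import urllib.request


def build_kernel_request(base_url: str, endpoint: str, payload: dict, token: str) -> urllib.request.Request:
    """Build a JSON POST request for a kernel HTTP endpoint without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",  # assumed auth scheme
        },
        method="POST",
    )


# Hypothetical usage: the endpoint and payload keys are assumptions.
request = build_kernel_request(
    "http://127.0.0.1:6806", "/api/query/sql", {"stmt": "SELECT 1"}, "example-token"
)
```

Because any client that can speak HTTP can build such a request, the UI layer can be replaced or supplemented without touching the kernel, which is also why protocol compatibility across upgrades matters.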

Practical Recommendations

If a desktop-like experience is required and hardware allows, use the official Electron desktop package for full features.
When deploying in containers/servers, test critical features (export, desktop connections, OCR) and prepare rollback plans for API/protocol upgrades.

Caveats

For low-resource environments or memory-sensitive deployments, weigh Electron’s overhead; for long-running services, prioritize kernel API stability and implement logging/monitoring.

Summary: The hybrid architecture offers strong functionality and extensibility but requires attention to resource consumption, cross-layer compatibility, and feature consistency across deployment modes.

88.0%

How portable are SiYuan's data and export capabilities? What should be considered when backing up, migrating, or interoperating with other tools?

Core Analysis

Core Question: Assess SiYuan’s data portability and export capabilities for backups, migration, and interoperability with other tools.

Technical Analysis

File-based workspace advantages:
Workspace folders make it easy to copy, place under VCS, or script backups—resources and Markdown content can be exported together for archival.
Semantic & dynamic content challenges:
Block-level references: When exporting to standard Markdown you must decide whether to expand references or preserve them as placeholders; behavior differs in target systems.
SQL embeds & dynamic views: Embedded queries produce dynamic content; exporting statically loses query capability—materialize results before export if needed.
Plugins & custom extensions: Plugin-generated content or external dependencies (AI outputs, third-party storage) may require extra steps during migration.
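
The choice between expanding block references and keeping only their anchor text can be sketched as a small post-processing step over exported Markdown. This is an illustration under assumptions, not SiYuan's exporter: the reference syntax `((<id> "anchor"))`, the ID shape, and the `flatten_refs` helper are all assumed here and should be checked against real export output.

```python
import re

# Assumed reference syntax: ((<block-id> "anchor text")) or ((<block-id> 'anchor text')).
# Assumed ID shape: 14-digit timestamp, a hyphen, then 7 lowercase alphanumerics.
BLOCK_REF = re.compile(r"""\(\((\d{14}-[0-9a-z]{7})\s+(["'])(.*?)\2\)\)""")


def flatten_refs(markdown: str, blocks: dict[str, str], expand: bool = False) -> str:
    """Replace block references with anchor text, or with block content if expand=True."""

    def _sub(match: re.Match) -> str:
        block_id, anchor = match.group(1), match.group(3)
        if expand and block_id in blocks:
            return blocks[block_id]  # inline the referenced block's content
        return anchor  # keep only the human-readable anchor text

    return BLOCK_REF.sub(_sub, markdown)
```

Keeping anchors preserves readability in tools that do not understand block IDs, while expansion produces self-contained documents at the cost of duplicating content.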

Practical Recommendations

Treat the workspace as the atomic backup unit: regularly copy/version this directory and take full backups before upgrades.
Conduct migration rehearsals: validate that block refs, SQL embeds, charts, and plugin outputs reproduce correctly in the target system.
Materialize important query results into static tables before export to preserve readability.
Avoid storing workspace inside container layers—mount it as a host persistent volume to prevent loss on container recreation.
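
Treating the workspace directory as the atomic backup unit can be as simple as archiving it with a timestamp. The following is a minimal sketch using only the Python standard library; `backup_workspace` is a hypothetical helper, and it assumes the application is stopped (or otherwise not writing) while the archive is created.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def backup_workspace(workspace: Path, backup_dir: Path) -> Path:
    """Create a timestamped .tar.gz of the whole workspace and return its path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = backup_dir / f"{workspace.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the workspace folder.
        tar.add(workspace, arcname=workspace.name)
    return archive
```

Keep `backup_dir` outside the workspace so that archives are not recursively included in later backups.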

Caveats

README notes Docker image modes may limit desktop/mobile connections and import/export functions—verify your deployment’s capabilities before migrating/backing up.

Summary: The file-based workspace supports portability, but preserving block semantics and dynamic query behavior through migration requires a deliberate export strategy and validation in the destination environment.

88.0%

How does SiYuan's block-level referencing with Markdown WYSIWYG work in practice? What are its advantages and limitations?

Core Analysis

Core Question: Evaluate SiYuan’s block-level referencing and Markdown WYSIWYG in writing, referencing, and export workflows.

Technical Analysis

Advantages:
Block reuse & bidirectional links: Enables referencing/embedding of paragraphs or items as independent units, ideal for reusable knowledge nodes and networks.
WYSIWYG Markdown: Lowers the entry barrier—users see formatted output while retaining Markdown exportability (preserving portability).
Large-document & outline support: Features like list outline, block zoom-in, and million-word editing suit long-form writing and research notes.
Limitations:
Data model complexity: Block IDs, references, and nesting add complexity, raising consistency and compatibility concerns during export and sync.
Export variance: README notes standard Markdown export exists but certain Docker deployments limit features; complex embeds (SQL, plugins) may require extra handling to become plain Markdown.
Editing habit mismatch: Power Markdown users often prefer raw text control; WYSIWYG may be less flexible for complex code blocks or custom syntax.

Practical Recommendations

Plan a block strategy before structuring notes (decide what should be independent blocks vs. continuous text) to reduce refactoring.
Test exports for important publication/archive workflows to ensure SQL embeds, charts, etc., serialize correctly.
Maintain some pure Markdown files or switch to source view before releases if precise textual control is needed.

Caveats

Block-level editing brings strong reuse capabilities but increases management complexity; for high consistency/export demands, validate workflows in a test environment first.

Summary: SiYuan delivers a powerful visual block-based experience suitable for heavy users needing block reuse and graphical interaction, but users with strict export or raw-text control requirements should plan and test accordingly.

87.0%

How does SiYuan's SQL query embed make notes a lightweight queryable knowledge base? What are the use cases and limitations?

Core Analysis

Core Question: Evaluate the practicality of SQL query embed for turning notes into a queryable knowledge base, typical use cases, and limitations.

Technical Analysis

Value: SQL embeds let you run structured queries directly inside notes against blocks with custom attributes or tabular content, enabling dynamic table views and built-in reports via filtering, sorting, and aggregation.
Common Use Cases:
Literature management: aggregate references by author/year/tag and show statistics.
Task management: build dashboards that group tasks by status, date, or priority.
Research notes: filter experimental logs to produce data subsets for analysis/export.
Limitations & Risks:
Performance: Local engine capacity limits heavy, complex queries; frequent large joins/aggregations may cause latency.
Security: In shared/plugin scenarios, protect against unauthorized SQL execution and injection.
Reproducibility: Query results rely on workspace integrity; after export/migration, the SQL environment/data schema must match for reproducibility.
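
The security point about unauthorized SQL execution can be illustrated with a read-only connection. This sketch assumes a SQLite-backed store and operates on a copy of a database file; it does not describe how SiYuan executes embeds internally, and `open_read_only` is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite database so that any write attempt raises an error."""
    # SQLite's URI syntax accepts mode=ro to forbid writes on this connection.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With such a connection, a query that tries to `INSERT`, `UPDATE`, or `DELETE` fails with `sqlite3.OperationalError` instead of silently modifying data.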

Practical Recommendations

Add clear custom attributes and lightweight indexing (consistent field names, tag taxonomy) to queried data to improve performance and readability.
For heavy queries, run them offline and cache results to avoid full scans in the UI.
Restrict SQL permissions in team/plugin contexts; use read-only or sandboxed execution where possible.
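
Running a heavy query offline and caching its result can be sketched as rendering the rows into a static Markdown table that is then pasted or written into a note. The table and column names in the usage example are hypothetical and do not reflect SiYuan's actual schema; `materialize_as_markdown` is an illustrative helper.

```python
import sqlite3


def materialize_as_markdown(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Run a query once and render its result as a static Markdown table."""
    cursor = conn.execute(sql, params)
    headers = [col[0] for col in cursor.description]
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in cursor:
        # Escape pipes so cell content cannot break the table layout.
        cells = [str(value).replace("|", r"\|") for value in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hypothetical usage against an in-memory database with an invented schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (title TEXT, status TEXT)")
conn.executemany("INSERT INTO tasks VALUES (?, ?)", [("Write draft", "open"), ("Review", "done")])
table = materialize_as_markdown(
    conn, "SELECT title, status FROM tasks WHERE status = ? ORDER BY title", ("open",)
)
```

The resulting text survives export and migration unchanged, at the cost of no longer updating when the underlying data changes.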

Caveats

Test complex queries on a sample dataset or staging environment first to measure performance and consistency.

Summary: SQL embeds elevate SiYuan into a queryable knowledge base ideal for attribute-driven management and dynamic views, but you must manage performance and security via indexing, caching, and permission controls.

86.0%

How usable are SiYuan's integrated OCR (Tesseract) and AI (OpenAI) features? What are the limitations in a local-first/offline scenario?

Core Analysis

Core Question: Assess SiYuan’s bundled Tesseract OCR and OpenAI-based AI features in terms of usability, privacy, and offline capabilities.

Technical Analysis

Tesseract OCR (local):
Pros: Runs fully locally, suitable for sensitive data and privacy-first workflows; open-source and configurable with language packs.
Cons: Accuracy depends on image quality, language models, and preprocessing; struggles with complex layouts (multi-column, tables) and handwriting.
OpenAI Integration (cloud API):
Pros: Offers powerful generation, summarization, and Q&A capabilities, rapidly improving writing workflows.
Cons: Requires network and API keys, involves data leaving the local environment and recurring cost; unavailable in offline scenarios.
Combined View: SiYuan leverages Tesseract for local OCR and OpenAI for advanced generation. Fully local AI requires community or self-hosted models or plugins, which add hardware and maintenance costs.

Practical Recommendations

For privacy-first needs, use Tesseract and improve OCR accuracy via preprocessing (denoising, resolution enhancements) and manual validation.
If using OpenAI, configure API keys carefully and establish policies to avoid sending sensitive data.
For strictly offline/compliant environments, evaluate self-hosted LLMs or offline inference options, and budget for hardware and ops costs.
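
A policy of not sending sensitive data to a cloud API can be partly enforced in code by redacting text before it leaves the machine. The sketch below is a generic pre-processing step, not a SiYuan feature; the patterns and the `redact` helper are hypothetical and deliberately narrow, so they should be extended to whatever counts as sensitive in your environment.

```python
import re

# Hypothetical patterns; extend them to match your own definition of sensitive data.
_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}


def redact(text: str) -> str:
    """Replace matches of each sensitive pattern with a labeled placeholder."""
    for label, pattern in _PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text
```

Pattern-based redaction catches only what it is told to look for, so it complements rather than replaces a written policy and manual review.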

Caveats

Review data exfiltration risks and costs when enabling OpenAI; for bulk OCR workflows, consider performance and concurrency impacts on resources.

Summary: Tesseract serves local OCR needs but with limited accuracy; OpenAI integration delivers powerful capabilities but is cloud-dependent. Strictly offline or privacy-constrained users should plan for self-hosted model alternatives.

86.0%

✨ Highlights

Privacy‑first, fully self‑hosted personal knowledge management solution
Supports block‑level references, Markdown WYSIWYG and multi‑platform (desktop/mobile/Docker) deployment
Feature-rich: export options, OCR, AI writing, table/database view and plugin marketplace
License information is unclear, which may affect commercial use and downstream development compliance
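Repository metadata shows zero contributors and no releases, which conflicts with the 37.7K stars and the 2025-09-25 update listed above; the data is likely incomplete, so verify maintenance activity on GitHub before judging reliability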

🔧 Engineering

Block‑based document model with bidirectional links enabling fine‑grained content reuse and references
Comprehensive import/export, OCR, AI‑assisted writing and multi‑platform support — suitable for deep note taking and knowledge base construction

⚠️ Risks

Unknown license type may introduce legal uncertainty for enterprise adoption and redistribution
Provided data shows zero contributors and commits; if accurate, this indicates risks around community activity and maintenance continuity

👥 For who?

Individuals, knowledge workers and small teams prioritizing privacy and control
Users or organizations with basic ops ability (supporting Docker, desktop/mobile deployment and local data storage)