WireGuard FPGA: Low-cost wire-speed hardware implementation

Implements an auditable WireGuard prototype in SystemVerilog on a low‑cost Artix‑7 FPGA, targeting wire speed, with an open toolchain and an on‑chip RISC‑V core. It is currently a proof of concept with unclear licensing and limited maintenance: suitable for research and teaching, not production.

GitHub: chili-chips-ba/wireguard-fpga | Updated: 2025-10-15 | Branch: main | Stars: 1.0K | Forks: 19

FPGA WireGuard SystemVerilog Low-cost hardware VPN

💡 Deep Analysis

What core problem does this project solve? How does it implement a wire-speed WireGuard data plane in hardware?

Core Analysis

Project Positioning: The project addresses the gap where software WireGuard cannot reach wire‑speed and commercial hardware is expensive/closed. It implements a WireGuard data plane on a low‑cost Artix‑7 FPGA in open HDL, hardware‑accelerating core crypto to reduce CPU load and aiming for multi‑port 1GbE throughput.

Technical Features

HW Crypto: ChaCha20‑Poly1305 implemented in RTL to lower per‑packet crypto overhead and increase forwarding throughput.
HW/SW Split: On‑chip RISC‑V runs control and key management while the data plane is handled by pipelined hardware — balancing flexibility vs performance.
Open Reuse: Builds on Corundum, other open FPGA IP and open toolchains for auditability and reproducible builds.

Usage Recommendations

Measure achievable throughput: Start with board‑level Ethernet loopback and benchmarks to determine attainable throughput at target frequencies.
Integrate incrementally: Validate Ethernet switching first, then add crypto, then the RISC‑V control plane to reduce integration risk.
Watch toolchain gaps: Cross‑check OpenXC7 vs Vivado for synthesis/implementation differences and capture constraint workarounds.
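
Before benchmarking, it can help to know the theoretical ceilings that measurements will be compared against. The sketch below is only an illustration: it assumes untagged Ethernet II frames, an IPv4 outer header, and one inner packet per WireGuard data message. The function names are made up for this example and are not part of the project.

```python
# Per-frame cost on the wire that is not part of the frame itself:
# 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
ETH_WIRE_OVERHEAD = 20
# Ethernet header (14 B) + FCS (4 B); assumes no VLAN tag.
ETH_HEADER_FCS = 18
# WireGuard data message over IPv4: outer IPv4 header (20 B) + UDP (8 B)
# + WireGuard data header (16 B) + Poly1305 tag (16 B).
WG_IPV4_OVERHEAD = 60


def line_rate_pps(frame_bytes: int, link_bps: float = 1e9) -> float:
    """Maximum frames per second for a frame size that includes header and FCS."""
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8)


def wg_goodput_bps(inner_ip_bytes: int, link_bps: float = 1e9) -> float:
    """Inner-packet goodput when each inner IP packet travels in one tunnel frame."""
    # WireGuard zero-pads the plaintext to a multiple of 16 bytes.
    padded = -(-inner_ip_bytes // 16) * 16
    frame = padded + WG_IPV4_OVERHEAD + ETH_HEADER_FCS
    return line_rate_pps(frame, link_bps) * inner_ip_bytes * 8


if __name__ == "__main__":
    for size in (64, 512, 1518):
        print(f"{size:5d} B frames: {line_rate_pps(size):12.0f} frames/s")
    print(f"goodput at 1420 B inner packets: {wg_goodput_bps(1420) / 1e6:.1f} Mbit/s")
```

Under these assumptions, minimum-size 64-byte frames correspond to roughly 1.49 million frames per second per 1GbE port, and 1420-byte inner packets top out near 933 Mbit/s of goodput. Measured numbers well below these ceilings point at the design rather than at the link.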

Important Notice: Phase 1 is a PoC with limited features (static channels, minimal management) and is not production ready.

Summary: By hardening core crypto in FPGA and delegating control to RISC‑V, the project offers a path to near‑wire‑speed WireGuard on low‑cost hardware, but is constrained by Artix‑7 resources, toolchain maturity, and its current PoC scope.


For developers or educational labs, what is the learning curve and common pitfalls? What best practices reduce integration risk?

Core Analysis

Core Issue: The project requires cross‑domain skills (HDL, FPGA flows, network protocols, crypto, embedded SW), giving it a steep learning curve. Typical pitfalls are timing closure, toolchain differences, HW/SW interface bugs and crypto correctness.

Common Pitfalls

Timing closure failures on Artix‑7 under dense logic or high frequency.
Toolchain discrepancies between OSS tools and Vivado causing synthesis/implementation differences.
Complex HW/SW synchronization between RISC‑V and the data plane (buffers, concurrency, error handling).
Underestimated crypto risks: RTL correctness and side‑channel exposure.

Best Practices (Actionable)

Integrate in stages: board bring‑up → Ethernet verification → data‑plane forwarding → crypto offload → control‑plane integration.
Automate DV: Use cocotb for regression and run at‑speed hardware regressions for edge cases.
Profile early: Identify performance hotspots (hashes, Curve25519) to decide HW/SW split early.
Define key boundary: Specify key storage/update/access policies and isolate sensitive paths from day one.
Reproducible builds: Capture bitstream generation scripts and tool versions for audits.
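
One possible way to make the reproducible-build practice above auditable is to write a small manifest next to each bitstream. The sketch below is an assumption about how that could look, not something the project provides: the function names, the file names in the usage comment and the shape of the `tool_versions` dictionary are all hypothetical, and the caller is expected to collect tool versions however its flow allows.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path) -> str:
    """Hash a build artifact in chunks so large bitstreams do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifacts, tool_versions) -> dict:
    """Combine artifact hashes and caller-supplied tool versions into one record."""
    return {
        "tools": dict(sorted(tool_versions.items())),
        "artifacts": {Path(p).name: sha256_file(p) for p in sorted(map(str, artifacts))},
    }


def write_manifest(manifest: dict, out_path) -> None:
    """Write the manifest as stable, diff-friendly JSON."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")


# Hypothetical usage (file names and version strings are placeholders):
#   manifest = build_manifest(["build/top.bit"], {"synth": "x.y", "pnr": "x.y"})
#   write_manifest(manifest, "build/manifest.json")
```

Two builds that claim the same sources and tools but produce different artifact hashes are then easy to spot in review.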

Important Notice: Treat this project as a development platform, not a drop‑in product — plan for substantial debug and verification effort.

Summary: The learning curve is steep but manageable: staged integration, automated testing, early profiling and clear security boundaries let teams adopt the platform safely within a reasonable timeframe.


What are the performance and security advantages and risks of hardware‑implementing ChaCha20‑Poly1305? How should one balance them in implementation?

Core Analysis

Core Issue: Moving ChaCha20‑Poly1305 from software to FPGA yields substantial performance gains but creates implementation correctness and side‑channel risks that must be balanced through careful design and verification.

Performance Benefits

Pipelining & Parallelism: Hardware can process blocks in parallel, reducing per‑packet latency and increasing throughput.
CPU Offload: Offloading crypto frees the RISC‑V for control tasks and improves overall forwarding capacity.
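
To make the pipelining argument concrete, the following back-of-the-envelope sketch estimates how many clock cycles are available per 64-byte ChaCha20 block. It assumes the roughly 100 MHz core clock cited in the Risks section, ignores Ethernet framing overhead and Poly1305, and models a hypothetical iterative core that completes one of ChaCha20's 20 rounds per cycle. None of these figures are taken from the project's RTL.

```python
CHACHA20_BLOCK_BYTES = 64  # ChaCha20 produces keystream in 64-byte blocks
CHACHA20_ROUNDS = 20       # 10 double rounds


def cycles_per_block_budget(core_hz: float, link_bps: float, ports: int = 1) -> float:
    """Clock cycles available per keystream block if every port runs at line rate."""
    blocks_per_second = ports * link_bps / (CHACHA20_BLOCK_BYTES * 8)
    return core_hz / blocks_per_second


def core_throughput_bps(core_hz: float, cycles_per_block: float) -> float:
    """Sustained throughput of one core that needs `cycles_per_block` cycles per block."""
    return core_hz / cycles_per_block * CHACHA20_BLOCK_BYTES * 8


if __name__ == "__main__":
    core_hz = 100e6  # assumption: ~100 MHz core clock
    for ports in (1, 4):
        budget = cycles_per_block_budget(core_hz, 1e9, ports)
        print(f"{ports} x 1GbE: {budget:.1f} cycles per block")
    # Assumption: one round per cycle, so 20 cycles per block.
    print(f"iterative core: {core_throughput_bps(core_hz, CHACHA20_ROUNDS) / 1e9:.2f} Gbit/s")
```

Under these assumptions one port leaves about 51 cycles per block and four ports about 13, while a 20-cycle iterative core sustains about 2.56 Gbit/s. That would cover one or two ports but not four, which is the kind of gap that unrolling or replicating the core is meant to close.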

Security Risks

Implementation Bugs: RTL bugs can break correctness or interoperability; fixes on FPGA are costlier than software patches.
Side‑Channel Leakage: Timing/power/EM side channels are hard to mitigate once a design is deployed and require design‑time countermeasures (constant‑time operation, masking).

Practical Recommendations

Golden‑model diffing: Compare hardware outputs against a trusted software implementation (e.g. Linux WireGuard) for many vectors.
DV & at‑speed regression: Use cocotb plus real‑hardware regression to cover edge cases and error paths.
Side‑channel testing: Perform lab power/timing analysis and employ masking or time‑hardening if needed.
Updatability: Ensure the control plane supports safe roll‑out of bitstream/config updates when fixes are required.
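
Golden-model diffing as recommended above can start very small. The sketch below is a minimal illustration, not the project's test bench: it implements the ChaCha20 quarter round from RFC 8439 in plain Python as the reference and compares it with a hypothetical `dut_fn` callable, which stands in for whatever reads results back from simulation or hardware.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a: int, b: int, c: int, d: int):
    """ChaCha20 quarter round as specified in RFC 8439, section 2.1."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


def diff_against_golden(vectors, dut_fn):
    """Return every input vector where the device under test disagrees with the model."""
    return [v for v in vectors if tuple(dut_fn(*v)) != quarter_round(*v)]


if __name__ == "__main__":
    # Test vector from RFC 8439, section 2.1.1.
    rfc_in = (0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    rfc_out = (0xEA2A92F4, 0xCB1CF8CE, 0x4581472E, 0x5881C4BB)
    assert quarter_round(*rfc_in) == rfc_out
    # Using the model as its own DUT must report no mismatches.
    print(diff_against_golden([rfc_in], quarter_round))
```

The same pattern scales up to whole blocks and full AEAD packets; checking the reference against published RFC vectors first keeps the golden model itself from becoming the source of false mismatches.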

Important Notice: Hardware improves performance but amplifies security responsibilities — complete functional and side‑channel validation is mandatory before sensitive deployment.

Summary: Hardening ChaCha20‑Poly1305 is essential for wire‑speed, but only acceptable if paired with rigorous verification and side‑channel mitigations to avoid turning performance wins into security liabilities.


What specific scenarios is this project suitable for? When is it not recommended to use it?

Core Analysis

Suitable Use Cases:

Teaching & Labs: Low‑cost hardware and open RTL are ideal for courses on HW/SW co‑design, crypto in hardware and networking.
Security/Audit Research: The transparent design enables backdoor and implementation audits.
FPGA Prototyping & Edge Acceleration: Useful for proving concepts or accelerating WireGuard at ~1 Gbps per board in edge devices or custom NICs.

Not Recommended For:

High‑throughput production gateways: 10 Gbps+ or many concurrent tunnels are out of scope for the current PoC.
Projects needing clear commercial licenses/support: License and delivery processes are unclear, posing legal/maintenance risk.
SLA‑critical, high‑reliability deployments: The PoC lacks production management, redundancy and hardened maintenance paths.

Practical Advice

Use as a research/teaching platform: Employ it for experiments, papers or curricula rather than as a direct production replacement.
Plan for commercial scaling: If commercial use is required, plan migration to higher‑end FPGAs, commercial IP and clarify licensing/support.

Important Notice: Verify licensing and complete independent security/side‑channel assessments before any sensitive deployment.

Summary: The project is an effective low‑cost, auditable prototype suited to education and research; it is not yet a production replacement for high‑bandwidth or compliance‑sensitive scenarios.


How do you build an effective verification and test flow to ensure functional correctness and near‑wire‑speed performance?

Core Analysis

Goal: Create a layered verification flow that covers functional correctness, interoperability, timing and performance so you can validate near‑wire‑speed behavior with limited lab resources and uncover security issues.

Recommended Verification Layers

Unit & Protocol Simulation (RTL): Use cocotb to run extensive vector tests covering normal, boundary and error cases.
ISS / SW–HW Co‑simulation: Use a RISC‑V ISS (e.g. VProc) to validate control plane interactions and concurrency.
Post‑synthesis & Timing Verification: Synthesize with the target toolchain and run gate‑level or realistic timing checks to expose closure issues.
Hardware at‑speed Regression: Use lab traffic generators and measurement tools to exercise real links and measure throughput, loss, latency and resource usage.
Security & Side‑channel Testing: Do differential comparisons against a software golden model and run power/timing side‑channel probes on the crypto cores.

Tools & Pragmatic Tips

Automate cocotb regression and include hardware testbeds in CI if board access is possible.
Create reproducible traffic scripts (tcpreplay/hardware generators) and standardized measurement templates to quantify wire‑speed behavior.
Profile performance hotspots early (hashes, Curve25519) to decide HW/SW partitioning.
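
A standardized measurement template, as suggested above, can be as simple as a function that turns raw frame counters into comparable metrics. The sketch below is one possible shape for it. The `RunCounters` fields and the `summarize` function are hypothetical names for this example, and it assumes fixed-size, untagged Ethernet frames with the standard 20 bytes of preamble, start-of-frame delimiter and inter-frame gap per frame.

```python
from dataclasses import dataclass


@dataclass
class RunCounters:
    """Raw counters from one fixed-frame-size test run (hypothetical layout)."""
    tx_frames: int      # frames sent by the traffic generator
    rx_frames: int      # frames received back after the device under test
    frame_bytes: int    # frame size including Ethernet header and FCS
    duration_s: float   # measurement window in seconds


def summarize(run: RunCounters, link_bps: float = 1e9) -> dict:
    """Derive loss, frame rate and link utilization from the raw counters."""
    wire_bits_per_frame = (run.frame_bytes + 20) * 8  # add preamble, SFD and gap
    rx_fps = run.rx_frames / run.duration_s
    return {
        "loss_ratio": (run.tx_frames - run.rx_frames) / run.tx_frames,
        "rx_fps": rx_fps,
        "line_utilization": rx_fps * wire_bits_per_frame / link_bps,
    }


if __name__ == "__main__":
    run = RunCounters(tx_frames=1_000_000, rx_frames=990_000, frame_bytes=64, duration_s=1.0)
    for name, value in summarize(run).items():
        print(f"{name}: {value:.4f}")
```

Recording the same fields for every frame size and every build makes regressions in throughput or loss visible as a simple diff between runs.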

Important Notice: Pure RTL simulation cannot reveal at‑speed timing behaviors — real hardware regression or shared lab facilities are essential.

Summary: A layered strategy (unit → co‑simulation → post‑synthesis → hardware regression) combined with differential and side‑channel testing provides the best coverage to validate correctness and near‑wire‑speed performance during PoC.


Why choose Artix‑7 and an open-source toolchain? What are the trade-offs in performance and auditability?

Core Analysis

Reason for Choice: The project opts for Artix‑7 and open toolchains to achieve cost‑effective, fully auditable hardware that lowers the bar for academia and researchers.

Technical Trade-offs

Auditability & Transparency (Pro): Open toolchains and RTL allow third‑party inspection of bitstream generation, helpful for backdoor and correctness reviews.
Cost & Accessibility (Pro): Artix‑7 is inexpensive and widely available, ideal for teaching and PoC work.
Performance & Resource Limits (Con): Artix‑7’s logic, BRAM and IO ceilings limit concurrent tunnels and maximum forwarding rates; timing closure at high frequencies is challenging.
Toolchain Compatibility Risk (Con): Open toolchains may lack support for vendor‑specific primitives or constraints, requiring additional RTL wrappers or workarounds.

Practical Recommendations

Match platform to use case: Use this setup for audits, education or low‑speed edge acceleration; choose higher‑end FPGAs or commercial IP for 10 Gbps+ needs.
Prepare compatibility layers: Make RTL substitutes for vendor primitives and document Vivado vs OSS differences.
Budget resources early: Estimate LUT/BRAM/IO needs and timing margins early to avoid redesign.
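
Early resource budgeting can be kept honest with a tiny script that sums per-block estimates against device capacity. The sketch below is illustrative only: every number in it is a placeholder rather than a figure from the project or a datasheet, the block names are hypothetical, and the 70% utilization ceiling is a common rule of thumb for leaving routing and timing margin, not a project requirement.

```python
def check_budget(capacity: dict, estimates: dict, max_utilization: float = 0.7) -> dict:
    """Sum per-block resource estimates and flag resources above the utilization ceiling."""
    totals = {}
    for block_usage in estimates.values():
        for resource, amount in block_usage.items():
            totals[resource] = totals.get(resource, 0) + amount
    report = {}
    for resource, used in totals.items():
        fraction = used / capacity[resource]
        report[resource] = {
            "used": used,
            "fraction": fraction,
            "ok": fraction <= max_utilization,
        }
    return report


if __name__ == "__main__":
    # Placeholder capacities: replace with the datasheet values for the actual part.
    capacity = {"lut": 60_000, "bram36": 130}
    # Placeholder estimates: replace with synthesis reports as blocks become available.
    estimates = {
        "ethernet_macs": {"lut": 8_000, "bram36": 16},
        "crypto_core": {"lut": 20_000, "bram36": 8},
        "riscv_soc": {"lut": 6_000, "bram36": 32},
    }
    for resource, row in check_budget(capacity, estimates).items():
        status = "ok" if row["ok"] else "OVER BUDGET"
        print(f"{resource}: {row['used']} ({row['fraction']:.0%}) {status}")
```

Rerunning the check whenever a block's synthesis report changes shows early when a design is drifting toward a utilization level at which timing closure typically becomes difficult.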

Important Notice: Auditability is not automatic security — the bitstream build path and side‑channel vectors still require explicit validation.

Summary: Artix‑7 with open tooling is a deliberate trade‑off favoring transparency and cost; it fits PoC, research and teaching, but reaching high‑end production throughput would require a larger FPGA and likely a vendor toolchain.


✨ Highlights

WireGuard implemented in open-source FPGA logic, targeting wire speed
Low-cost Artix-7 platform with four 1GbE ports
Proof-of-concept only; not production-ready
Low community activity and unknown license — substantial adoption risk

🔧 Engineering

WireGuard datapath implemented in SystemVerilog on FPGA, targeting wire‑speed processing
Self-contained board approach with on-chip RISC‑V control and focus on open-toolchain compatibility

⚠️ Risks

No clear license and zero contributors — legal compliance and long-term maintenance risks are high
Artix‑7 frequency and I/O limits (core ≈100 MHz, I/O ≤600 MHz) constrain performance scaling
Missing releases, CI, automated tests and benchmarks — verification and integration costs are high

👥 For who?

Targeted at FPGA and networking hardware engineers with HDL and embedded experience
Suitable for security researchers and educational labs for audit, teaching and prototyping