System Prompts Leaks: Curated multi-platform system prompts and examples

Provides a browsable collection of public chatbot system prompts for researchers and prompt engineers to compare and analyze; however, absence of a declared license and potential sensitive source texts require legal, privacy, and maintenance risk assessment before reuse.

GitHub asgeirtj/system_prompts_leaks Updated 2025-08-28 Branch main Stars 18.3K Forks 3.0K

JavaScript System prompts collection Prompt engineering / comparative analysis Security & compliance assessment

💡 Deep Analysis

Why does the project use a Git + Markdown architecture? What are the advantages and limitations of this technical choice?

Core Analysis ¶

Why Git+Markdown: This setup delivers auditability, decentralized collaboration, and broad toolchain compatibility. Researchers can git diff history, contribute via PRs, and read Markdown easily.

Technical Features and Advantages ¶

Advantage 1 — Auditability/Versioning: Git commits provide provenance and change tracking.
Advantage 2 — Low integration cost: Markdown/plain text can be parsed by any language or CI.
Advantage 3 — Decentralized/offline use: No runtime service required; users can fork/clone for local analysis.

Limitations and Risks ¶

Lack of structured metadata: No enforced fields (source, timestamp, trust) affecting reproducibility.
Unclear licensing/compliance: Legal reuse may be restricted.
Search and scalability: Text directories are less efficient than databases for large corpora.

Practical Recommendations ¶

Add metadata: In derived repos, attach source, date, evidence_url as JSON/YAML alongside Markdown.
Verification process: Require origin evidence in PR reviews and store snapshots.
Hybrid architecture: Use the text repo as raw layer and sync to an indexed DB for enterprise use cases.

Important Notice: Git+Markdown is transparent and lightweight but does not equate to authoritative or verified data.

Summary: The choice is excellent for research and sharing; for rigorous audits or large-scale search, you must add structure and compliance controls.

85.0%

As a security researcher, how can I effectively use this repository for prompt-injection or adversarial testing in practice?

Core Analysis ¶

Core Issue: Using raw system prompts from the repo for prompt-injection or adversarial tests carries risks: unclear provenance, missing context, and poor reproducibility. To produce high-quality tests, you must systematize the data.

Technical Analysis ¶

High sample accessibility: Markdown text is easy to extract into test vectors.
Automatable: Scripts can batch-convert directories into test cases for frameworks (e.g., pytest + simulator).
Verification gap: Missing timestamps, source URLs, and trust ratings reduce confidence in results.

Practical Recommendations (Steps)¶

Sample and group: Tag by vendor/model/use-case (security, moderation, assistant policy) and sample accordingly.
Provenance validation: Record evidence.md locally per sample (screenshots, publish date, PR author) and cite in reports.
Standardize: Convert Markdown to JSON test schemas: {id, vendor, model, system_prompt, source_url, collected_date, confidence}.
Build injection scenarios: Use the system prompt as target context and craft injection payloads; iterate tests.
Record reproducibility: Store git commit IDs, scripts, and results in CI for reproducibility.

Important Notice: Respect legal/ethical boundaries; do not run adversarial tests on unauthorized production systems.

Summary: The repo is a convenient raw sample pool; for rigorous security research, add provenance checks, structured workflows, and reproducible pipelines.

85.0%

In which scenarios is this project most suitable? What are its clear limitations or scenarios where it's not appropriate?

Core Analysis ¶

Suitable Scenarios: The repo’s form and content make it most valuable for:

Academic research and reproduction: Real text samples for behavior analysis and cross-model comparisons.
Security and red-team prep: Source of prompts for prompt-injection and baseline tests.
Engineering reference for integration: Engineers can use it to simulate third-party system instructions.

Clear Limitations and Unsuitable Use Cases ¶

Not authoritative: It is not vendor-published and shouldn’t be used as final configuration.
Legal/ethical risks: Leaked/copied prompts may raise vendor term or privacy/copyright issues.
Reproducibility/audit constraints: Missing timestamps, provenance, and license limit audit use.
Maintenance concerns: No releases; updates depend on community PRs and may be stale.

Practical Recommendations ¶

Treat as raw material: Use for building test suites and preliminary analysis, not final evidence.
Add verification for compliance: Augment samples with provenance and legal review before audit use.
Alternatives: For authoritative or production needs, prefer vendor docs, official APIs, or curated datasets.

Important Notice: Use this repo as a research aid, not as a direct source for production configurations.

Summary: Good for research, testing, and engineering reference; exercise caution for compliance or production use and add verification measures.

85.0%

How can the repository be enhanced to meet enterprise audit or compliance needs? What technical and process improvements are required?

Core Analysis ¶

Core Issue: The repo currently lacks provenance, timestamps, and explicit licensing required for enterprise audits. To become an enterprise-grade asset, both technical and governance upgrades are necessary.

Tech and Process Improvements ¶

Structured metadata: Add JSON/YAML per prompt (source_url, collected_date, evidence_hash, collector, confidence_score).
Evidence archival: Store original screenshots/captures in controlled object storage and reference their hashes.
Commit & signature policy: Use GPG-signed commits or timestamping services for critical commits.
PR review templates: Enforce source proof in PR templates and CI checks for completeness.
License & legal review: Clarify repository license or establish internal usage policies.
Sync & backup: Move verified data into an internal indexed datastore for search and retention.

Implementation Steps (Priority)¶

Add contributing guidelines and PR templates requiring source metadata.
Create CI checks to validate metadata and evidence URLs.
Consult legal to define license/usage terms and document them.
Add signatures and evidence archival, and back up to enterprise storage.

Important Notice: Technical enhancements improve auditability, but legal review is still essential; some sources may remain restricted.

Summary: With structured metadata, evidence storage, signing, and governance, the repo can meet most enterprise audit needs—but it requires organizational investment and legal backing.

85.0%

What common pitfalls exist in user experience for this project? How can learning cost be reduced and usability improved?

Core Analysis ¶

Common Pitfalls:

No onboarding examples: README lacks demonstrations of converting texts into test cases or validating provenance.
No automation tools: Although JavaScript is the main language, the repo doesn’t include parsing/conversion scripts.
Loose contribution process: PRs don’t enforce source or evidence fields, risking low-quality submissions.

Improvements to Reduce Learning Curve ¶

Add quickstart examples: Create examples/ with:
- scripts/parse_prompts.js: sample Markdown-to-JSON parser.
- examples/test_case.json: how to use a prompt in a test harness.
PR templates & CI checks: Require source_url and evidence in PRs; CI validates metadata presence.
Add compliance guidance: Include a Legal & Ethics section in README advising provenance checks.
Document common workflows: Full flow from git clone to generating test suites and recording reproducibility.

Practical usage snippet (brief)¶

git clone https://github.com/asgeirtj/system_prompts_leaks.git
Run node scripts/parse_prompts.js to emit prompts.json.
Use prompts.json to generate test inputs and run in CI.

Important Notice: Even with scripts, you must augment each prompt with provenance and legal review locally.

Summary: Adding parsing scripts, workflow examples, and stricter PR templates will lower onboarding friction and improve data quality and usability.

85.0%

✨ Highlights

High community attention with a notable star count
Aggregates system prompts from multiple public chatbots
No license specified — reuse may entail legal risk

🔧 Engineering

A system-prompt collection aimed at researchers and prompt engineers, including examples from several chatbots
Primarily Markdown-based, easy to browse and extend via pull requests

⚠️ Risks

License not declared — potential copyright or usage restriction risks
Contains potentially sensitive system-prompt texts that may raise privacy and compliance concerns
Few maintainers and no releases — long-term maintenance and quality guarantees are limited

👥 For who?

A reference repository for prompt engineers, model researchers, and security/compliance analysts
Suitable for developers and academics comparing system instructions and generation behaviors across chatbots