Project Name: Last30Days — 30-day multi-source community trends briefing generator

Last30Days is a multi-source trend briefing tool. It searches and scores signals from social platforms, news, and prediction markets over the last 30 days, then synthesizes cited narratives. It is designed for researchers, prompt engineers, and product teams tracking short-term trends.

GitHub: mvanhorn/last30days-skill | Updated: 2026-03-25 | Branch: main | Stars: 50.8K | Forks: 4.2K

Tags: data-collection, social-media-analysis, intelligence-research, CLI/plugin, multi-source-aggregation, trend-insights

💡 Deep Analysis

What exact problem does this project solve, and what is its core value and workflow?

Core Analysis

Project Positioning: This tool automates cross-platform research briefs over a short, 30-day window. Its core value is to parallelize heterogeneous signal collection (social media, short video, forums, prediction markets), normalize and score signals, deduplicate results, and use LLMs to produce citation-backed, savable briefings, replacing slow, error-prone manual searches.

Technical Features

Parallel multi-source retrieval: Covers Reddit, X, Bluesky, YouTube transcripts, TikTok, Instagram, Hacker News, Polymarket, and web sources, reducing total search time and increasing coverage.
Composite scoring pipeline: Uses bidirectional similarity, synonym expansion, engagement velocity normalization, source weighting, cross-source convergence detection, and time decay to balance textual relevance with community dynamics.
Polymarket specialized ranking: Treats monetary bets as a primary signal and ranks markets by a 5-factor weighted score.
Auditable output & persistence: Auto-saves briefings to Markdown (~/Documents/Last30Days/*.md) and supports local SQLite watchlists for later retrieval.
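
The factors named in the composite scoring bullet can be pictured with a minimal sketch. This is not the project's implementation: the source weights, the 7-day half-life, the 0.25 convergence boost, and every name below are assumptions chosen only to show how relevance, engagement velocity, source weighting, cross-source convergence, and time decay might multiply into one score.

```python
import math
from dataclasses import dataclass

# Hypothetical source weights; the project's actual values are not documented here.
SOURCE_WEIGHTS = {"reddit": 1.0, "hackernews": 1.1, "x": 0.9}


@dataclass
class Signal:
    source: str
    relevance: float       # text relevance in [0, 1]
    engagement: float      # likes, upvotes, comments, ...
    age_days: float        # days since posting, within the 30-day window
    sources_seen: int = 1  # distinct sources carrying the same story


def composite_score(sig: Signal, half_life_days: float = 7.0) -> float:
    """Combine the factors multiplicatively (an assumed, illustrative formula)."""
    velocity = sig.engagement / max(sig.age_days, 1.0)   # engagement per day
    velocity_factor = 1.0 + math.log1p(velocity)         # dampens very large counts
    decay = 0.5 ** (sig.age_days / half_life_days)       # exponential time decay
    convergence = 1.0 + 0.25 * (sig.sources_seen - 1)    # boost for cross-source agreement
    weight = SOURCE_WEIGHTS.get(sig.source, 1.0)
    return weight * sig.relevance * velocity_factor * decay * convergence
```

Because the factors multiply, a weak value in any one of them (stale age, low relevance) pulls the whole score down, which is one plausible way to keep high-engagement but off-topic posts out of a briefing.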

Usage Recommendations

Use --quick for exploration to save time, then run deep mode for full reports once the topic is settled.
Centralize credentials (SCRAPECREATORS_API_KEY, BSKY_APP_PASSWORD, X cookies) in .claude/last30days.env or ~/.config and restrict permissions (chmod 600).
Validate scraping backends regularly (ScrapeCreators, bird-search, xAI) to reduce blind spots.
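
The credential recommendation can be sketched as follows. The KEY=value file layout and the helper name write_env_file are assumptions for illustration; only the variable names and the .claude/last30days.env path come from the text above.

```python
import os
from pathlib import Path


def write_env_file(path: Path, values: dict[str, str]) -> None:
    """Write KEY=value lines to `path`, readable and writable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0o600 from the start so the secrets are
    # never briefly readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        for key, value in values.items():
            fh.write(f"{key}={value}\n")
    # Also tighten permissions if the file already existed with a looser mode.
    os.chmod(path, 0o600)
```

Calling write_env_file(Path(".claude/last30days.env"), {...}) has the same effect on POSIX systems as writing the file and then running chmod 600 on it.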

Important Notice: The quality of outputs depends heavily on the captured public data; if a platform fails or is inaccessible, conclusions may be sparse or biased.

Summary: For producing cross-platform, citation-backed research within a 30-day window, this project materially reduces time-to-insight and improves auditability.


For a typical researcher or product manager, what is the onboarding cost? What common issues arise and what are best practices?

Core Analysis

Core Question: Assess onboarding difficulty, common pitfalls, and practical tips to help non-engineering users decide whether to adopt or how to deploy the tool.

Technical Analysis (Onboarding Cost)

Moderately high learning curve: Setup requires several credentials (SCRAPECREATORS_API_KEY, X cookies, BSKY_APP_PASSWORD) and a Node.js/Python environment; LLM plugin installs are optional.
Interaction latency: Deep runs typically take 2–8 minutes, reducing real-time interactivity.
Credential & stability issues: X cookies can expire or lack permissions, causing fallback to inferior sources; third-party scrapers (ScrapeCreators) are a point of fragility.

Common Issues & Best Practices

Common issues:
Auth failures (X cookie / app-password)
Scraper/backend changes causing missing sources
Noise/false positives and context misinterpretation
Best practices:
1. Centralize credentials in .claude/last30days.env or ~/.config and set chmod 600.
2. Explore with --quick, then run full depth once topics are narrowed.
3. Regularly validate scrapers (ScrapeCreators test, bird-search whoami health checks).
4. Use watchlists and scheduled runs to build a local SQLite-backed repository for trend analysis.
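
A local SQLite-backed repository of the kind described in practice 4 might look like the sketch below. The table name, columns, and helper functions are hypothetical; the project's actual watchlist schema is not described in this document.

```python
import sqlite3


def open_watchlist(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a findings table keyed by topic, run date, and URL."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS findings (
               topic    TEXT NOT NULL,
               run_date TEXT NOT NULL,  -- ISO date of the run
               url      TEXT NOT NULL,
               score    REAL NOT NULL,
               PRIMARY KEY (topic, run_date, url)
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, topic: str, run_date: str, url: str, score: float) -> None:
    """Store one scored item; re-running the same day overwrites the old row."""
    conn.execute(
        "INSERT OR REPLACE INTO findings VALUES (?, ?, ?, ?)",
        (topic, run_date, url, score),
    )
    conn.commit()


def top_findings(conn: sqlite3.Connection, topic: str, limit: int = 5) -> list[tuple]:
    """Return the highest-scoring items recorded for a topic across all runs."""
    rows = conn.execute(
        "SELECT run_date, url, score FROM findings "
        "WHERE topic = ? ORDER BY score DESC, run_date DESC LIMIT ?",
        (topic, limit),
    )
    return rows.fetchall()
```

Keeping the run date in the primary key lets scheduled runs accumulate history for the same URL, which is what makes longitudinal trend queries possible.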

Important Notice: If you lack engineering/ops support, consider enabling a smaller set of manageable sources (e.g., ScrapeCreators covering Reddit/TikTok/Instagram) to reduce setup complexity.

Summary: Onboarding is manageable for users with basic engineering skills and offers high ROI; non-engineering users should rely on simplified configs or team support.


How can this tool be integrated into automated monitoring (watchlist/CI/cron) workflows? What configurations and precautions are needed?

Core Analysis

Core Question: What steps, configurations, and risk controls are needed to reliably integrate the tool into automated monitoring (watchlist/CI/cron) workflows?

Technical Analysis (Integration Essentials)

Credential management: Store SCRAPECREATORS_API_KEY, X cookie, BSKY_APP_PASSWORD in CI secrets and write them into ~/.claude/last30days.env or project .claude/last30days.env during jobs.
SessionStart validation: Use the SessionStart config check to verify config completeness before runs to avoid silent failures.
Scheduling & resources: Schedule deep runs during off-peak hours and limit parallelism to control bandwidth and CPU usage.
Persistence & aggregation: Auto-saved Markdown and SQLite can be uploaded to central storage (S3, artifact store) for audit and longitudinal analysis.
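
A pre-run completeness check in the spirit of the SessionStart validation could be sketched like this. It is a stand-in, not the project's own check: which keys count as required, the KEY=value file layout, and the function names are all assumptions.

```python
# Assumed set of required keys; adjust to the sources actually enabled.
REQUIRED_KEYS = ("SCRAPECREATORS_API_KEY", "BSKY_APP_PASSWORD")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required: tuple[str, ...] = REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent or set to an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

A CI job can fail fast when missing_keys(...) returns a non-empty list, which turns a silent fallback into an explicit error before any source is queried.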

Implementation Steps (example)

Store credentials in CI secrets and write to .claude/last30days.env with chmod 600 at job start.
Run last30days --watchlist [topic] or custom scripts that alternate --quick and full-depth runs.
After the job, upload the generated Markdown and SQLite files to shared storage and trigger alerts for significant diffs (new high-scoring Polymarket items, cross-source convergence).
Schedule weekly health checks for scraper modules; on failure, alert and automatically switch to fallback scraping backends.
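
The "alert on significant diffs" step can be illustrated with a small comparison between two runs. The representation of a run as a mapping from item id to score, the 0.8 threshold, and the function name are assumptions made for this sketch.

```python
def new_high_scoring(
    previous: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.8,
) -> list[str]:
    """Return ids that score at or above `threshold` now but did not before."""
    alerts = []
    for item_id, score in current.items():
        was_high = previous.get(item_id, 0.0) >= threshold
        if score >= threshold and not was_high:
            alerts.append(item_id)
    return sorted(alerts)
```

Items that were already above the threshold in the previous run are not reported again, so a scheduled job only alerts on genuinely new developments.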

Important Notice: Ensure strict access control to credentials in CI, set expiry reminders for credentials, and include source coverage info in reports for auditability.

Summary: Integrating the project into automation yields efficient periodic intelligence collection, provided credential security, scraper redundancy, and runtime health monitoring are implemented.


How is the multi-source retrieval and composite scoring pipeline designed? What are its technical advantages and potential limitations?

Core Analysis

Core Question: The pipeline aims to combine textual relevance with community behavior signals to surface topics that are both semantically relevant and community-endorsed. The main technical challenge is normalizing heterogeneous metrics and ensuring stability of external scrapers.

Technical Analysis

Hybrid similarity: Uses trigram-token Jaccard alongside bidirectional text similarity plus synonym expansion, enabling matches at both lexical and semantic levels to reduce keyword misses.
Behavioral normalization: Engagement velocity normalization captures growth rate rather than absolute counts, allowing emerging topics to surface earlier.
Cross-source convergence: Increases confidence when similar content appears independently across sources, reducing single-source noise.
Polymarket specialized scoring: Treats trading volume, liquidity, and price movement as strong quantitative signals distinct from social engagement.
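
To make the hybrid similarity idea concrete, here is a minimal sketch that blends character-trigram Jaccard with a two-way token overlap and a tiny synonym table. The equal 0.5/0.5 blend, the synonym entries, and the reading of "bidirectional" as query-to-document plus document-to-query coverage are assumptions, not the project's documented method.

```python
# Hypothetical synonym table; entries are listed in both directions.
SYNONYMS = {"js": {"javascript"}, "javascript": {"js"}}


def trigrams(text: str) -> set[str]:
    """Character trigrams of the lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def _matched(token: str, others: set[str]) -> bool:
    """A token matches if it, or one of its synonyms, appears in `others`."""
    return token in others or bool(SYNONYMS.get(token, set()) & others)


def bidirectional_overlap(query: str, doc: str) -> float:
    """Average of query-covered-by-doc and doc-covered-by-query token ratios."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    if not q or not d:
        return 0.0
    forward = sum(_matched(t, d) for t in q) / len(q)
    backward = sum(_matched(t, q) for t in d) / len(d)
    return (forward + backward) / 2


def similarity(query: str, doc: str) -> float:
    """Blend lexical (trigram) and token-level (synonym-aware) similarity."""
    return 0.5 * jaccard(trigrams(query), trigrams(doc)) + 0.5 * bidirectional_overlap(query, doc)
```

The trigram term tolerates small spelling differences, while the synonym-aware overlap catches matches such as "js" versus "javascript" that share almost no trigrams.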

Advantages & Limitations

Advantages: Balances semantics and behavior to detect trends endorsed across communities; applicable to both short-video and long-text signals; offers interpretability via factor weights and time decay.
Limitations: Metrics across platforms are not directly comparable (likes vs. bets); scraping failures or third-party API changes degrade score reliability; the scoring model requires periodic blind testing and tuning to avoid drift.

Practical Recommendations

Blind-test scoring regularly using 5–10 known topics (the README notes 455+ tests).
Layer failure notifications: If a source fails, include coverage gaps in the report for manual review.
Tune time decay for slow-burn events to prevent suppression by short spikes.
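
The effect of the time-decay setting on slow-burn events can be checked with a quick worked example. It assumes an exponential half-life decay, which is a common choice but not confirmed as the project's formula; the scores and ages are made up for illustration.

```python
def decayed(score: float, age_days: float, half_life_days: float) -> float:
    """Apply exponential decay: the score halves every `half_life_days`."""
    return score * 0.5 ** (age_days / half_life_days)


# A slow-burn item (strong, two weeks old) versus a short spike (weaker, one day old).
slow_burn = {"score": 0.9, "age_days": 14.0}
spike = {"score": 0.5, "age_days": 1.0}


def ranks_slow_burn_first(half_life_days: float) -> bool:
    """True if the slow-burn item outranks the spike under this half-life."""
    return decayed(slow_burn["score"], slow_burn["age_days"], half_life_days) > decayed(
        spike["score"], spike["age_days"], half_life_days
    )
```

With a 3-day half-life the two-week-old item decays to roughly 0.04 and loses to the spike at roughly 0.40; with a 30-day half-life it keeps about 0.65 against the spike's 0.49 and ranks first.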

Important Notice: Composite scoring enhances signal quality but cannot fully eliminate context misinterpretations — human review remains necessary.

Summary: The composite scoring pipeline is the project’s core strength, improving cross-source trend detection accuracy, but it depends on robust scraping and ongoing calibration.


What architectural advantages does the project have? What are the trade-offs in scalability and maintainability?

Core Analysis

Core Question: Evaluate whether the architecture supports long-term maintenance, adding new sources, and integration into automation while considering deployment complexity and resource cost.

Architectural Strengths

Modular multi-source retrieval: Each data source is a pluggable module, so adding or swapping backends (e.g., replacing ScrapeCreators with native APIs) doesn’t force changes to scoring or synthesis layers.
Unified scoring & dedupe layer: Standardizes heterogeneous inputs into comparable outputs with explainable factor weights.
Local-first & pluggable auth: Supports ~/.config and per-project .claude/last30days.env, enabling controlled deployments and auditability for CI environments.
Persistence strategy: Auto Markdown archiving and SQLite watchlists support building a longitudinal research library.
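
The pluggable-module idea can be sketched with a small interface that every source implements, so the collection step never needs to know which backend is behind it. The Source protocol, StaticSource stand-in, and collect function are hypothetical names, not the project's actual classes.

```python
from typing import Protocol


class Source(Protocol):
    """Anything with a name and a search method can act as a data source."""

    name: str

    def search(self, topic: str) -> list[dict]: ...


class StaticSource:
    """Stand-in backend that filters canned items instead of calling a real API."""

    def __init__(self, name: str, items: list[dict]) -> None:
        self.name = name
        self._items = items

    def search(self, topic: str) -> list[dict]:
        return [
            dict(item, source=self.name)  # tag each hit with its origin
            for item in self._items
            if topic.lower() in item["title"].lower()
        ]


def collect(sources: list[Source], topic: str) -> list[dict]:
    """Gather results from every source; scoring and synthesis happen downstream."""
    results: list[dict] = []
    for src in sources:
        results.extend(src.search(topic))
    return results
```

Swapping one backend for another then means providing a different object with the same two members, leaving collect and everything after it unchanged.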

Key Trade-offs

Deployment complexity vs flexibility: Pluggable auth and multiple backends increase flexibility but also initial setup and credential management burden (cookies, API keys, app passwords).
Resource usage from concurrent retrievals: Parallelism speeds up runs but demands bandwidth and compute; a deep run can take 2–8 minutes, not ideal for constrained environments.
Maintenance of external dependencies: Reliance on third-party scrapers and platform APIs requires ongoing monitoring and updates.
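
The parallelism trade-off can be illustrated with a bounded thread pool, where max_workers caps how many sources are queried at once. The fetch_all name and the shape of the fetcher callables are assumptions for this sketch; it says nothing about how the project schedules its own requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_all(
    fetchers: dict[str, Callable[[str], list]],
    topic: str,
    max_workers: int = 4,
) -> dict[str, list]:
    """Run every fetcher concurrently, with at most `max_workers` in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, topic) for name, fn in fetchers.items()}
        # result() blocks until that fetcher finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}
```

Lowering max_workers trades run time for lower peak bandwidth and CPU use, which is the knob a constrained CI runner would want.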

Practical Recommendations

Module-level health checks: Run a check for each scraper (bird-search whoami, ScrapeCreators key tests).
Staged deployment: Use --quick mode for development; schedule deep runs in CI during off-peak windows.
Credential hygiene: Store credentials in a project-level .env with restricted permissions, and keep change logs for audit.

Important Notice: The architecture favors extensibility and auditability but presumes engineering capability to manage credentials and external dependencies.

Summary: The architecture excels in extensibility and auditability and suits teams that can handle the operational overhead.


How do third-party scraping dependencies (e.g., ScrapeCreators, vendored Bird client) affect result reliability, and what mitigation strategies exist?

Core Analysis

Core Question: Assess how third-party scraping dependencies affect output integrity and reliability, and provide actionable mitigation steps.

Impact Analysis

Single-point failure risk: Relying on services like ScrapeCreators or vendored Bird clients can cause simultaneous loss of multiple platform scrapes if those services break or change.
Reduced consistency & reproducibility: Changes in third-party services can alter data distributions over time, affecting cross-time comparisons.
Auth-related degradation: For example, X requires cookie tokens; expired credentials can cause silent fallback or degraded search results, skewing conclusions.

Mitigation Strategies (Actionable)

Multi-backend fallbacks: Configure primary (ScrapeCreators) and fallback (bird-search / xAI / native web-scrape) for key platforms to auto-degrade gracefully.
Module health checks: Periodically validate scraper modules (key tests, sample queries) and log failure states to surface coverage gaps in reports.
Coverage disclosure: Explicitly state which sources were successfully retrieved and which were degraded in each briefing for auditability.
Regression & blind testing: Run known-topic tests periodically to detect scraping quality or scoring drift (the README's 455+ tests are a good precedent).
Credential expiry alerts: Track credential lifetimes (cookies, app-passwords) and alert before expiry to prevent silent failures.
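
The fallback and coverage-disclosure strategies can be combined in one small routine: try backends in priority order and return a coverage record alongside the results. The backend callables, the status labels, and the function name are assumptions for illustration; real backends such as ScrapeCreators or bird-search would be wrapped behind callables like these.

```python
from typing import Callable, Optional

Backend = Callable[[str], list[str]]


def fetch_with_fallback(
    topic: str, backends: list[tuple[str, Backend]]
) -> tuple[list[str], dict]:
    """Try each backend in order; report which one answered and which failed."""
    failures: list[str] = []
    for name, backend in backends:
        try:
            items = backend(topic)
        except Exception as exc:  # sketch only: real code should catch narrower errors
            failures.append(f"{name}: {exc}")
            continue
        status = "ok" if not failures else "degraded"
        return items, {"used": name, "status": status, "failures": failures}
    used: Optional[str] = None
    return [], {"used": used, "status": "failed", "failures": failures}
```

Writing the returned coverage record into each briefing makes a degraded run visible to the reader instead of letting a fallback happen silently.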

Important Notice: Even with mitigations, scraping stability is subject to platform policy changes. For high-stakes decisions, supplement with official data or secondary verification.

Summary: Third-party scrapers speed development and coverage but require fallback strategies, health checks, and transparency to preserve result reliability.


✨ Highlights

Aggregates 30-day signals from Reddit, X, YouTube and more
Supports parallel retrieval, comparative mode, and auto-saving outputs
Requires multiple third-party API keys and cookies; configuration overhead is non-trivial
Repository metadata shows no commits and no contributors, and the license is unknown; maintenance and compliance are uncertain

🔧 Engineering

Performs parallel retrieval across up to 10 signal sources and synthesizes cited briefings using multi-factor scoring and deduplication
Provides comparative mode, per-project .env config, and automatic run-level saving to a local document library

⚠️ Risks

Heavily reliant on third-party scraping services and site authentication (API keys, cookies); interface or policy changes may break functionality
Repository snapshot shows no active commits, zero contributors, and an unknown license, posing long-term maintenance and legal compliance risks
Execution can be slow (2–8 minutes), making it less suitable for real-time use cases or high-concurrency workflows

👥 For whom?

Prompt engineers, AI researchers, and product/social-media analysts who are comfortable managing API keys and CLI tools
Suitable for technical users and small research teams needing short-term trend monitoring, prompt research, or quick topic briefings