Depix: PoC for recovering plaintext from pixelated screenshots using a linear box filter

Depix is a PoC that uses linear box-filtering and De Bruijn search images to recover plaintext from pixelated screenshots, intended for security validation and research.

GitHub spipm/Depixelization_poc Updated 2025-10-29 Branch main Stars 3.6K Forks 274

Image Processing Depixelization Security Research Python Scripts PoC De Bruijn Search Image

💡 Deep Analysis

What specific problem does Depix solve, and how does it engineer a recovery of plaintext from pixelized screenshots?

Core Analysis

Project Positioning: Depix addresses whether plaintext can be recovered when sensitive text is pixelized using a linear box filter. It implements a reproducible PoC that generates a search image covering character combinations and performs block-level comparisons to recover plaintext.

Technical Analysis

Block-independence assumption: A linear box filter averages each block independently, reducing block content to an average color; this enables per-block matching.
Search-image strategy (De Bruijn): A De Bruijn sequence covers every character combination of a fixed length in minimal rendered space, reducing rendering and matching workload.
Pixel-level comparison and geometric consistency propagation: Single-match blocks are used as anchors; matches are propagated to neighbors to disambiguate multi-match blocks.
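
The block-independence property can be illustrated with a minimal sketch of a box filter on a grayscale image. The function name `pixelate` and the toy image are assumptions made for illustration; this is not Depix's own code, and it assumes image dimensions that are multiples of the block size.

```python
def pixelate(image, block):
    """Replace each block x block tile of a grayscale image with its mean.

    `image` is a list of rows of numbers. Width and height are assumed to be
    multiples of `block` (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Collect the pixels of one tile and average them.
            tile = [image[y][x]
                    for y in range(top, top + block)
                    for x in range(left, left + block)]
            mean = sum(tile) / len(tile)
            # Every pixel of the tile receives the same averaged value.
            for y in range(top, top + block):
                for x in range(left, left + block):
                    out[y][x] = mean
    return out


image = [
    [0, 0, 255, 255],
    [0, 255, 255, 255],
    [10, 20, 0, 0],
    [30, 40, 0, 100],
]
pixelated = pixelate(image, 2)
print(pixelated[0][0], pixelated[0][2], pixelated[2][0], pixelated[3][3])
# 63.75 255.0 25.0 25.0
```

Changing a pixel inside one tile changes only that tile's output value, which is why each pixelated block can be matched on its own.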

Practical Recommendations

Prepare environment: Render the search image in the same editor/font/size/color and display settings as the target screenshot.
Parameter selection: Choose --averagetype (gamma vs linear) and --backgroundcolor to match the pixelizer behavior and improve accuracy.
Verification: Use tool_show_boxes.py to confirm block detection and adjust cropping as needed.

Important Notice: The technique relies on strict assumptions (linear averaging, no compression, pixel alignment). If these are violated, recovery success falls sharply.

Summary: Depix is an engineering PoC for reverse-engineering pixelized text under linear box-filter assumptions — reproducible and explainable when its preconditions hold.

How to construct an effective search image to maximize Depix's recovery capability, and why use a De Bruijn sequence?

Core Analysis

Problem Core: The search image is Depix’s dictionary. Its rendering must match the target image as closely as possible, and its character set must cover the target's characters, so that block-level comparisons can reproduce the target's block averages.

Why use a De Bruijn sequence

Optimal coverage: A De Bruijn sequence covers all fixed-length substrings over a character set with minimal length, ensuring any local character combination appears.
Engineering efficiency: It requires far less rendering and fewer screenshots than rendering every character combination separately, which lowers matching cost.
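
As an illustration of the coverage property, here is a minimal sketch that generates a De Bruijn sequence with the standard recursive (Lyndon-word) construction. The names `de_bruijn` and `linearize` are hypothetical; this is not the generator used to produce the repository's search images.

```python
def de_bruijn(alphabet, n):
    """Return a cyclic De Bruijn sequence of order n over `alphabet`."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            # Emit the current prefix when its period divides n.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def linearize(sequence, n):
    """Append the first n-1 symbols so wrap-around substrings appear in plain text."""
    return sequence + sequence[:n - 1]


cyclic = de_bruijn("ab", 2)
print(cyclic)                # aabb
print(linearize(cyclic, 2))  # aabba
```

The cyclic sequence has length k^n for an alphabet of k characters. Because a rendered line of text does not wrap around, the linearized form repeats the first n-1 characters so that every length-n combination appears as a contiguous substring.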

Steps and Key Points to Build a Search Image

Define the character set: Limit to expected characters (lower/upper/digits/symbols) to reduce space.
Match rendering environment: Paste the De Bruijn sequence in the same editor/system/font/size/color as the target and screenshot (including antialiasing/subpixel settings).
Crop precisely: Ensure the screenshot contains full character rows and correct pixel alignment; avoid extra background noise.
Match color averaging: Select --averagetype (gamma vs linear) according to the pixelizer; use --backgroundcolor if editor background must be filtered.
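
To see why the averaging mode matters, the sketch below compares averaging the stored (gamma-encoded) sRGB values directly with averaging in linear light and re-encoding. It uses the standard sRGB transfer functions; the function names are hypothetical, and how these two modes map onto the values accepted by --averagetype is an assumption that should be checked against the tool's help output.

```python
def srgb_to_linear(value):
    """Convert an 8-bit sRGB channel value to linear light in [0, 1]."""
    c = value / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(light):
    """Convert linear light in [0, 1] back to an sRGB channel value in [0, 255]."""
    if light <= 0.0031308:
        c = 12.92 * light
    else:
        c = 1.055 * light ** (1 / 2.4) - 0.055
    return c * 255.0


def average_encoded(values):
    """Average the stored (gamma-encoded) channel values directly."""
    return sum(values) / len(values)


def average_linear_light(values):
    """Average in linear light, then re-encode the result as sRGB."""
    mean_light = sum(srgb_to_linear(v) for v in values) / len(values)
    return linear_to_srgb(mean_light)


block = [0, 0, 255, 255]  # half black, half white
print(average_encoded(block))       # 127.5
print(average_linear_light(block))  # roughly 187.5
```

The same block of pixels produces visibly different averages under the two modes, so a search image pixelated with the wrong mode is unlikely to yield exact block matches.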

Important Notice: If it’s impossible to perfectly match rendering, generate multiple candidate search images (varying antialiasing/size/micro-offsets) and test them in parallel.

Summary: A correctly built search image (De Bruijn rendered in a matching environment) is decisive for Depix — it determines whether block matches will produce comparable pixel averages.

Why does the project choose a De Bruijn sequence plus block-level matching approach instead of ML/black-box methods, and what architectural advantages does this selection bring?

Core Analysis

Positioning and Rationale: Depix uses a deterministic De Bruijn + block-level comparison method rather than ML because pixelization here (linear box filter) makes the problem amenable to exhaustive, exact comparisons. The choice favors explainability, low engineering overhead, and reproducibility.

Technical Characteristics and Architectural Advantages

Explainability and Traceability: Each recovered character stems from a clear block match that can be audited, unlike black-box ML outputs.
Low dependency and modularity: Script-based implementation (depix.py etc.) is easy to read, reproduce, and debug; the toolchain allows validating pixelization, block detection and recovery separately.
Efficiency and minimal search space: De Bruijn sequences cover character combinations with minimal rendering, reducing compute and removing the need for training data.
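
The auditable nature of the approach can be sketched as a plain lookup followed by a neighbor check. This is a simplified illustration on toy block averages, not Depix's actual matching code; the function names, the block-coordinate dictionaries, and the single-step neighbor rule are all assumptions.

```python
from collections import defaultdict


def index_search_blocks(search_blocks):
    """Map each block color to the search-image positions that produce it."""
    index = defaultdict(list)
    for position, color in search_blocks.items():
        index[color].append(position)
    return index


def match_blocks(target_blocks, index):
    """Split target blocks into single-match anchors and multi-match blocks."""
    anchors, ambiguous = {}, {}
    for position, color in target_blocks.items():
        candidates = index.get(color, [])
        if len(candidates) == 1:
            anchors[position] = candidates[0]
        elif len(candidates) > 1:
            ambiguous[position] = candidates
    return anchors, ambiguous


def resolve_with_neighbors(anchors, ambiguous):
    """Keep the candidate that sits next to an adjacent anchor's match."""
    resolved = {}
    for (tx, ty), candidates in ambiguous.items():
        for (ax, ay), (sx, sy) in anchors.items():
            if abs(ax - tx) + abs(ay - ty) != 1:
                continue  # only directly adjacent anchors are considered
            expected = (sx + (tx - ax), sy + (ty - ay))
            if expected in candidates:
                resolved[(tx, ty)] = expected
                break
    return resolved


# Toy data: (x, y) block coordinates mapped to averaged RGB colors.
search = {(0, 0): (10, 10, 10), (1, 0): (50, 50, 50),
          (2, 0): (90, 90, 90), (3, 0): (50, 50, 50)}
target = {(0, 0): (90, 90, 90), (1, 0): (50, 50, 50)}

anchors, ambiguous = match_blocks(target, index_search_blocks(search))
print(anchors)                                     # {(0, 0): (2, 0)}
print(ambiguous)                                   # {(1, 0): [(1, 0), (3, 0)]}
print(resolve_with_neighbors(anchors, ambiguous))  # {(1, 0): (3, 0)}
```

Every decision here is a dictionary lookup or a coordinate comparison, so each recovered block can be traced back to the search-image position that produced it.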

Practical Recommendations

Use as a baseline: Run Depix first when linear average pixelization is suspected for fast, auditable verification.
Hybridize if needed: If Depix fails due to compression/antialiasing, consider ML/statistical methods as a fallback.

Important Notice: Deterministic methods are extremely sensitive to environment matching (font/rendering/alignment). Where those priors are missing, ML may be more robust but less interpretable.

Summary: The choice leverages problem structure for an efficient, reproducible baseline suitable for privacy assessment and forensic reproduction.

What common failure modes occur when using Depix in practice, and how can concrete operations and parameter tuning increase success rates?

Core Analysis

Problem Core: Depix’s success depends heavily on consistency between input and search images in rendering/alignment/averaging. Common failure modes and tuning steps are below.

Common Failure Modes and Technical Recommendations

Block detection or cropping inaccuracies: Misaligned boundaries break matches.
Recommendation: Use tool_show_boxes.py for visualization; manually adjust cropping or implement fixed-size cropping.
Color averaging mismatch (gamma vs linear): Different pixelizers average in different color spaces.
Recommendation: Try --averagetype linear or the default gamma mode; compare outputs to select the right one.
Font/size/rendering differences (antialiasing/subpixel): If the search image’s renderer doesn’t match the target, block colors differ.
Recommendation: Render De Bruijn in the same editor/system when possible; if unknown, create multiple candidate search images and try them in parallel.
Lossy compression or resampling: JPEG and resampling ruin precise color values.
Recommendation: Avoid lossy inputs; if unavoidable, apply mild denoising or compression-artifact reduction, or relax exact color thresholds. Lossy compression cannot be truly reversed, so expect lower accuracy.
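
Relaxing the exact color threshold could look like the sketch below, which accepts a match when every channel differs by at most a chosen tolerance. The helper names and the `tolerance` parameter are hypothetical and are not options of Depix itself.

```python
def color_distance(a, b):
    """Largest per-channel difference between two RGB colors."""
    return max(abs(x - y) for x, y in zip(a, b))


def find_matches(target_color, search_colors, tolerance=0):
    """Return indices of search colors within `tolerance` of the target."""
    return [i for i, color in enumerate(search_colors)
            if color_distance(target_color, color) <= tolerance]


search_colors = [(120, 120, 120), (200, 10, 10), (123, 118, 121)]
target_color = (121, 119, 120)  # slightly perturbed, e.g. by compression

print(find_matches(target_color, search_colors, tolerance=0))  # []
print(find_matches(target_color, search_colors, tolerance=1))  # [0]
print(find_matches(target_color, search_colors, tolerance=2))  # [0, 2]
```

The example shows the trade-off: an exact comparison finds nothing once colors are perturbed, a small tolerance recovers the match, and a larger one starts admitting additional candidates that then need to be disambiguated.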

Operational Steps (in priority order)

Validate block detection with tool_show_boxes.py.
Prepare the search image: render the De Bruijn sequence with matching font, size, and color.
Try different --averagetype and --backgroundcolor values.
If recovery still fails, batch-generate multiple search images or move to ML/statistical fallbacks.

Important Notice: If the image underwent complex post-processing (antialiasing, compression, blur), Depix’s deterministic approach may fail and more robust alternatives should be considered.

Summary: Systematic visualization and parameter tuning are key to improving Depix success; preprocessing and multi-candidate strategies materially improve real-world results.

How well does Depix perform and scale, and how should one optimize runtime efficiency for large batches or high-resolution screenshots?

Core Analysis

Problem Core: Depix’s computation grows roughly with the number of target blocks (B) times search-image blocks (S); naive block-by-block comparisons become a bottleneck for large or high-resolution inputs.

Performance Bottlenecks

Main cost: Pixelizing each search-image block and comparing it to every target block (≈ O(B * S)).
Memory/IO: Large caches of search blocks may consume significant memory and disk I/O.

Optimization Strategies (Practical Recommendations)

Reduce search space (S): Limit the character set, use shorter De Bruijn sequences, or test by character classes in batches.
Process only ROIs (reduce B): Pre-detect/crop text regions and run Depix only on those blocks.
Precompute and cache: Pixelize and cache all search-image blocks in advance to avoid recomputation.
Vectorize and parallelize: Use NumPy for batch comparisons and multi-process/thread parallelism per image or block group.
Resolution and approximation: Run at block-level resolution first (compare average colors or center pixels), then refine verified matches.
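
The vectorization idea can be sketched with NumPy broadcasting, comparing all B target block averages against all S precomputed search block averages in one operation. This is an illustrative assumption about how such a comparison might be written, not code from the repository; `match_all` and `tolerance` are hypothetical names.

```python
import numpy as np


def match_all(target_colors, search_colors, tolerance=0):
    """Compare every target block average against every search block average.

    target_colors has shape (B, 3) and search_colors has shape (S, 3).
    Returns a (B, S) boolean matrix whose entry [b, s] is True when the two
    colors differ by at most `tolerance` in every channel.
    """
    target = np.asarray(target_colors, dtype=np.int32)
    search = np.asarray(search_colors, dtype=np.int32)
    # Broadcasting builds a (B, S, 3) array of per-channel differences.
    diff = np.abs(target[:, None, :] - search[None, :, :])
    return (diff <= tolerance).all(axis=2)


target = [(90, 90, 90), (50, 50, 50)]
search = [(10, 10, 10), (50, 50, 50), (90, 90, 90), (50, 50, 50)]

matches = match_all(target, search)
candidates = [np.flatnonzero(row).tolist() for row in matches]
print(candidates)  # [[2], [1, 3]]
```

The work is still proportional to B * S, but it runs in compiled array code rather than a Python loop. The intermediate (B, S, 3) array grows with both inputs, so very large search images may need to be processed in chunks.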

Important Notice: Speed-up approximations (e.g., comparing only averages) may increase false matches; in forensic contexts keep auditable logs and intermediate data.

Summary: By combining search-space reduction, caching, vectorized comparisons and parallelism, Depix can scale to moderate batch sizes; extremely large S or complex pixelization still forces a tradeoff between accuracy and resource usage.

✨ Highlights

Can recover pixelated text under specific conditions
Simple implementation provided as runnable Python scripts
Depends on exact font, pixel alignment and a suitable search image
Highly sensitive to compression and sub-pixel rendering; recovery can fail when either is present

🔧 Engineering

Recovers character blocks by matching linear box-filtered blocks against De Bruijn search images
Includes utility scripts to cut, show and generate pixelated images for experimentation

⚠️ Risks

Requires known or reproducible font and display settings; non-trivial setup cost
Not robust to image compression, color perturbation, or sub-pixel positioning; applicability is limited
Poses ethical and legal risks (privacy breaches, misuse); assess legal responsibility before use

👥 For whom?

Security researchers and forensic analysts for demonstrating pixelation weaknesses or reproducing experiments
Developers and educators can use it for teaching or PoC validation, with attention to compliance and data sensitivity