radioastronomyio/logmask-python-library

🔒 logmask


Deterministic, offline, map-based anonymization of IT infrastructure data in text files.

logmask is a Python CLI tool designed for MSP operational security. When engineers need to paste logs, configs, or transcripts into external tools (Claude, vendor support portals, community forums), they need to strip infrastructure identifiers first. logmask scans files for those identifiers, builds a persistent translation map, and performs single-pass replacement using an Aho-Corasick automaton. The tool is bidirectional: anonymize out, reveal back.


🔭 Overview

The Problem

MSP engineers routinely share diagnostic data with third parties. Every log file, every config export, every support ticket risks leaking internal infrastructure topology: IP ranges, hostnames, domain names, user principal names, Active Directory SIDs. Manual redaction is slow, inconsistent, and error-prone.

The Approach

logmask treats anonymization as a deterministic mapping problem rather than a search-and-destroy exercise. A persistent CSV map links each real identifier to a fake replacement. The same input with the same map always produces byte-identical output. Maps are human-readable and auditable: engineers can open them in Excel, hand-edit entries, and share them across teams.

The replacement engine uses an Aho-Corasick automaton for single-pass, longest-match-wins substitution. This means overlapping identifiers (a hostname embedded in a FQDN, a subnet containing individual IPs) are handled correctly without multiple passes or ordering dependencies.
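The longest-match-wins behaviour described above can be illustrated with a small pure-Python sketch. This is not the project's replacer and not a real Aho-Corasick automaton: it is a plain trie scan used as a stand-in so the example runs without pyahocorasick. The function names and the fake values are hypothetical.

```python
from typing import Dict, Optional, Tuple

_END = object()  # sentinel key marking "a pattern ends here" in the trie


def build_trie(mapping: Dict[str, str]) -> dict:
    """Build a character trie whose terminal nodes hold the replacement."""
    root: dict = {}
    for original, fake in mapping.items():
        node = root
        for ch in original:
            node = node.setdefault(ch, {})
        node[_END] = fake
    return root


def longest_match(trie: dict, text: str, start: int) -> Optional[Tuple[int, str]]:
    """Return (end_index, replacement) for the longest pattern at `start`."""
    node, best = trie, None
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break
        if _END in node:
            best = (i + 1, node[_END])  # keep going: a longer match may follow
    return best


def replace_all(text: str, mapping: Dict[str, str]) -> str:
    """Scan left to right, always taking the longest match at each position."""
    trie = build_trie(mapping)
    out, i = [], 0
    while i < len(text):
        hit = longest_match(trie, text, i)
        if hit is None:
            out.append(text[i])
            i += 1
        else:
            end, fake = hit
            out.append(fake)
            i = end  # jump past the match so replaced text is never rescanned
    return "".join(out)


# Hypothetical map: a NetBIOS name that is also a prefix of an FQDN.
mapping = {
    "SQL-PROD-03": "HOST-0001",
    "SQL-PROD-03.contoso.local": "host-0001.example.test",
}
line = "Connect to SQL-PROD-03.contoso.local, then ping SQL-PROD-03."
print(replace_all(line, mapping))
```

Because the FQDN is consumed as a whole before the shorter hostname can match inside it, the order of entries in the map does not affect the result.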

Critical Constraints

| Constraint | Detail |
| --- | --- |
| No build toolchain on endpoints | All dependencies install via pip install from pre-built wheels. No C/Rust compilation. No admin elevation. |
| Windows-first | Primary target: Entra-joined Windows 10/11 endpoints. Works in standard user context. |
| Offline execution | Zero network calls at runtime. No cloud APIs, no telemetry, no update checks. |
| Deterministic | Same input + same map = byte-identical output. Every time. |
| Human-readable maps | CSV format. Engineers can open, audit, and hand-edit maps in Excel/Notepad. |
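The determinism constraint, combined with the anonymize/reveal pairing, lends itself to a hash-based check. The sketch below is an illustration under assumptions, not the project's test code: it uses a regex alternation from the standard library instead of the Aho-Corasick engine, and the helper names and sample map values are hypothetical.

```python
import hashlib
import re
from typing import Dict


def substitute(text: str, mapping: Dict[str, str]) -> str:
    """Replace every key of `mapping` in one regex pass, longest key first."""
    if not mapping:
        return text
    # Sort by length (descending), then alphabetically, so the order is explicit.
    keys = sorted(mapping, key=lambda k: (-len(k), k))
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def sha256(text: str) -> str:
    """Hash the UTF-8 bytes of `text` so outputs can be compared byte for byte."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical map entries; real maps live in CSV files.
real_to_fake = {
    "10.0.1.50": "10.255.0.1",
    "jsmith@contoso.com": "user0001@example.test",
}
fake_to_real = {fake: real for real, fake in real_to_fake.items()}

original = "login jsmith@contoso.com from 10.0.1.50\n"
masked_a = substitute(original, real_to_fake)
masked_b = substitute(original, real_to_fake)
restored = substitute(masked_a, fake_to_real)

assert sha256(masked_a) == sha256(masked_b)   # same input + same map
assert sha256(restored) == sha256(original)   # anonymize, then reveal
print(masked_a, end="")
```

The round trip only holds when every fake value is unique and does not already occur in the input, which is one reason to generate fakes from ranges that never appear in real data.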

📊 Project Status

| Milestone | Status | Description |
| --- | --- | --- |
| Core build | ✅ Complete | All modules implemented, parsers working |
| Unit tests | ✅ Complete | Parsers, map engine, replacer, roundtrip |
| Post-review fixes | ✅ Complete | Bug documentation, code review applied |
| Known bug fixes | ⬜ Planned | Replacer state corruption, IPv4 octet validation, dead code cleanup |
| Scanner/CLI tests | ⬜ Planned | Unit test coverage for scanner.py and cli.py |
| Real-world validation | ⬜ Planned | Testing against production log corpus |
| PyPI release | ⬜ Future | Package and publish |

Current Capabilities (v0.1.0)

The tool processes text files with comprehensive identifier detection across eight pattern types. The core replacement engine is correct and deterministic; known issues are documented in AGENTS.md and marked inline in source.


🎯 Identifier Types

| Type | Pattern Target | Example |
| --- | --- | --- |
| ipv4 | RFC1918 private IPs | 10.0.1.50, 192.168.100.10 |
| cidr | Subnet notation | 192.168.1.0/24, 10.0.0.0/16 |
| hostname | NetBIOS and FQDNs | SQL-PROD-03, server.contoso.local |
| upn | User Principal Names | jsmith@contoso.com |
| guid | Entra object IDs, Azure GUIDs | a1b2c3d4-e5f6-7890-abcd-ef1234567890 |
| sid | Windows Security Identifiers | S-1-5-21-123-456-789-1001 |
| mac | MAC addresses | AA:BB:CC:11:22:33, 11-22-33-44-55-66 |
| unc | UNC paths | \\FILESVR\Finance$ |
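As an illustration of the ipv4 row, the following sketch finds dotted-quad candidates with a regex and keeps only those that parse as valid addresses inside the three RFC 1918 ranges. It is a minimal example under assumptions, not the project's ipv4.py; the function name and the regex are hypothetical.

```python
import ipaddress
import re
from typing import List

# The three private ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Four dot-separated runs of 1-3 digits that are not part of a longer
# dotted number. A trailing sentence period is allowed.
CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\d|\.\d)")


def find_private_ipv4(text: str) -> List[str]:
    """Return RFC 1918 addresses in order of appearance (duplicates kept)."""
    found = []
    for match in CANDIDATE.finditer(text):
        try:
            # Rejects out-of-range octets such as 999.
            addr = ipaddress.IPv4Address(match.group(0))
        except ipaddress.AddressValueError:
            continue
        if any(addr in net for net in RFC1918_NETWORKS):
            found.append(match.group(0))
    return found


sample = "src=10.0.1.50 dst=8.8.8.8 bad=10.0.1.999 gw=192.168.100.10."
print(find_private_ipv4(sample))
```

Delegating octet validation to the ipaddress module keeps the regex simple. The explicit network list is used rather than `is_private`, which also covers loopback and link-local ranges.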

πŸ—οΈ Architecture

Five modules, no framework. Parsers are internal callables in a dictionary registry.


Components

| Component | Module | Purpose |
| --- | --- | --- |
| CLI | cli.py | argparse CLI with 6 commands (init, scan, anonymize, reveal, map show, map add) |
| Scanner | scanner.py | Discovery engine: runs parsers, deduplicates, filters collisions |
| Parsers | parsers/ | Registry of detection functions, one per identifier type |
| Map Engine | map_engine.py | CSV map CRUD, scope merge (global + project), fake value generation |
| Replacer | replacer.py | Aho-Corasick automaton build + single-pass replace + reveal |
| Models | models.py | Frozen dataclasses: DetectedIdentifier, MapEntry, Config |
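A dictionary registry of parser callables, driven by a scanner that deduplicates results, might look roughly like the sketch below. Only the name DetectedIdentifier comes from the table above. Its fields, the two sample parsers, their regexes, and the `scan` function are assumptions made for illustration, and collision filtering is not shown.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass(frozen=True)
class DetectedIdentifier:
    # Fields are assumed for this sketch; frozen makes instances hashable.
    kind: str
    value: str


def find_sids(text: str) -> Iterator[str]:
    """Yield domain-style SIDs such as S-1-5-21-123-456-789-1001."""
    pattern = r"\bS-1-5-21(?:-\d+){3,4}\b"
    return (m.group(0) for m in re.finditer(pattern, text))


def find_macs(text: str) -> Iterator[str]:
    """Yield MAC addresses written with colon or hyphen separators."""
    pattern = r"\b[0-9A-Fa-f]{2}(?:[:-][0-9A-Fa-f]{2}){5}\b"
    return (m.group(0) for m in re.finditer(pattern, text))


# The registry: identifier type name -> detection callable.
PARSERS: Dict[str, Callable[[str], Iterator[str]]] = {
    "sid": find_sids,
    "mac": find_macs,
}


def scan(text: str) -> List[DetectedIdentifier]:
    """Run every registered parser and drop duplicates, keeping first-seen order."""
    seen = set()
    results = []
    for kind, parser in PARSERS.items():
        for value in parser(text):
            item = DetectedIdentifier(kind, value)
            if item not in seen:
                seen.add(item)
                results.append(item)
    return results


log = "user S-1-5-21-123-456-789-1001 nic AA:BB:CC:11:22:33 again AA:BB:CC:11:22:33"
for item in scan(log):
    print(item.kind, item.value)
```

Adding a new identifier type then means writing one function and adding one dictionary entry, with no change to the scanner loop.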

Map Architecture

Translation maps are CSV files with two scope levels that merge at runtime:

| Scope | Location | Purpose |
| --- | --- | --- |
| Global | %USERPROFILE%\.logmask\global_map.csv | MSP-wide constants (jump servers, monitoring hosts, corporate domain) |
| Project | ./.logmask/project_map.csv | Client-specific identifiers for this diagnostic bundle |

When both maps define the same original_value, the project map entry overrides the global one. The merge happens when the maps are loaded at runtime and never mutates either source file.
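The override rule can be sketched with the standard library alone. The project itself lists pandas for CSV handling, so this is a simplified stand-in. Only the original_value column name comes from the text above. The fake_value column, the sample rows, and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def load_map(handle: TextIO) -> Dict[str, dict]:
    """Read a CSV map and index its rows by original_value."""
    return {row["original_value"]: row for row in csv.DictReader(handle)}


def merge_scopes(global_map: Dict[str, dict], project_map: Dict[str, dict]) -> Dict[str, dict]:
    """Return a new merged map; project entries win on key collision."""
    merged = dict(global_map)   # shallow copy, so the inputs are not mutated
    merged.update(project_map)  # later update overrides earlier keys
    return merged


# In-memory stand-ins for global_map.csv and project_map.csv.
GLOBAL_CSV = "original_value,fake_value\nJUMP01,HOST-0001\ncontoso.com,example.test\n"
PROJECT_CSV = "original_value,fake_value\nSQL-PROD-03,HOST-0002\nJUMP01,HOST-0099\n"

global_map = load_map(io.StringIO(GLOBAL_CSV))
project_map = load_map(io.StringIO(PROJECT_CSV))
merged = merge_scopes(global_map, project_map)

for original, row in sorted(merged.items()):
    print(original, "->", row["fake_value"])
```

Because the merge builds a new dictionary, the loaded global and project maps can still be written back to their own files unchanged.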


πŸ“ Repository Structure

logmask/
├── 📂 assets/                      # Repository images
├── 📂 docs/                        # Documentation
│   └── logmask-buidl-spec-v1.md    # Authoritative build specification
├── 📂 src/logmask/                 # Source (src layout)
│   ├── cli.py                      # argparse CLI, 6 commands
│   ├── scanner.py                  # Discovery engine
│   ├── map_engine.py               # CSV map CRUD, scope merge, fake generation
│   ├── replacer.py                 # Aho-Corasick automaton + single-pass replace
│   ├── models.py                   # Frozen dataclasses (data contracts)
│   └── 📂 parsers/                 # Detection registry
│       ├── ipv4.py                 # RFC1918 private IPs
│       ├── cidr.py                 # Subnet/CIDR notation
│       ├── hostname.py             # NetBIOS + FQDN (structural heuristics)
│       ├── identity.py             # UPNs, Entra GUIDs, Windows SIDs
│       └── network.py              # MAC addresses, UNC paths
├── 📂 tests/                       # Unit tests
│   ├── conftest.py                 # Synthetic log fixtures
│   ├── test_parsers.py
│   ├── test_map_engine.py
│   ├── test_replacer.py
│   └── test_roundtrip.py           # Anonymize → reveal → hash compare
├── AGENTS.md                       # Agent context (KiloCode, Claude Code)
├── CLAUDE.md                       # Claude Code context
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE                         # MIT (code)
├── LICENSE-DATA                    # CC BY 4.0 (documentation, data)
├── SECURITY.md
├── pyproject.toml                  # PEP 621 project config
└── README.md

🚀 Getting Started

Prerequisites

  • Python 3.10 or higher
  • pip (no admin elevation required)
  • Windows 10/11 (primary target) or any OS with Python 3.10+

Installation

# Clone the repository
git clone https://github.com/radioastronomyio/logmask.git
cd logmask

# Install in development mode
pip install -e ".[dev]"

All dependencies have pre-built Windows wheels on PyPI, so no compiler toolchain is required:

| Package | Purpose |
| --- | --- |
| pyahocorasick >= 2.3.0 | Aho-Corasick automaton (C extension) |
| pandas | CSV map load/merge/write |
| rich | Terminal table output |

Quick Start

# Initialize a project (creates .logmask/ with empty map)
logmask init --client "Acme Corp"

# Scan files for infrastructure identifiers
logmask scan ./logs --ext .log .txt .json

# Anonymize: replace real values with fakes
logmask anonymize ./logs --out ./anonymized_logs

# Reveal: reverse the anonymization
logmask reveal ./anonymized_logs --out ./revealed_logs

# Inspect the translation map
logmask map show --scope project
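
For readers curious how the commands above could map onto argparse, here is a rough sketch of a subcommand layout. It is not the project's cli.py. The command and flag names are taken from the Quick Start, while which flags are required, the defaults, and the `--scope` choices are assumptions. Arguments for `map add` are not documented here, so none are defined.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with init, scan, anonymize, reveal, map show, map add."""
    parser = argparse.ArgumentParser(prog="logmask")
    commands = parser.add_subparsers(dest="command", required=True)

    init = commands.add_parser("init", help="create .logmask/ with an empty map")
    init.add_argument("--client", required=True)  # assumed to be required

    scan = commands.add_parser("scan", help="discover identifiers in files")
    scan.add_argument("path")
    scan.add_argument("--ext", nargs="+", default=[])  # e.g. .log .txt .json

    # anonymize and reveal share the same shape: input path plus --out.
    for name in ("anonymize", "reveal"):
        sub = commands.add_parser(name)
        sub.add_argument("path")
        sub.add_argument("--out", required=True)  # assumed to be required

    # `map` is itself a command group with its own subcommands.
    map_cmd = commands.add_parser("map", help="inspect or edit the translation map")
    map_actions = map_cmd.add_subparsers(dest="map_command", required=True)
    show = map_actions.add_parser("show")
    show.add_argument("--scope", choices=["global", "project"], default="project")
    map_actions.add_parser("add")  # arguments intentionally left undefined

    return parser


args = build_parser().parse_args(["scan", "./logs", "--ext", ".log", ".txt", ".json"])
print(args.command, args.path, args.ext)
```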

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src/logmask

# Run specific test file
pytest tests/test_parsers.py

📄 License

Code is licensed under the MIT License; see LICENSE for details.

Documentation and non-code content are licensed under CC BY 4.0; see LICENSE-DATA for details.


πŸ™ Acknowledgments

  • pyahocorasick β€” Efficient multi-pattern matching
  • Anthropic β€” Claude and the agent ecosystem that motivated this tool

Last Updated: 2026-03-01 | v0.1.0 Alpha | Core Build Complete
