A Distributed File Storage System Simulation with Base85 Encoding & Batch Rotation
A complete simulation of a stealthy, distributed file storage system that mimics GitHub's repository structure while implementing advanced chunking, encoding, and batch rotation mechanisms.
- Distributed Chunking: Files split into 4MB chunks and encoded with Base85
- Batch Rotation: 100 repositories organized into batches with intelligent rotation
- Stealth System: Multiple layers of obscurity to avoid detection patterns
- Real-time Preview: Image, PDF, and text file preview in browser
- Health Monitoring: Automated system checks and maintenance
- Base85 Encoding: ~25% size increase with ASCII85 compatibility
- Repository Management: 1GB max per repo, 15 active repos at any time
- Background Tasks: Automated cleanup, health checks, and batch rotation
- Metadata Backup: Automatic backup system with versioning
- Error Recovery: Graceful degradation and self-healing mechanisms
project/
├── app.py # Main Flask backend
├── static/
│ ├── index.html # Upload & management interface
│ └── viewer.html # File viewer interface
├── simulated_github_repos/ # 100 simulated repositories
│ ├── repo_000/
│ ├── repo_001/
│ └── ... (100 repos total)
├── system_metadata/ # System metadata and backups
├── temp_files/ # Temporary download files
└── logs/ # System logs
File Upload → Chunking (4MB) → Base85 Encoding →
Distribute to Repositories → Update Metadata →
Store in Multiple Repositories → Batch Rotation
# Python 3.8 or higher required
python --version
# Install dependencies
pip install flask flask-cors# Clone or download the project
git clone <repository-url>
cd drive-simulator
# Run the server
python app.py
# Access at http://localhost:5000| Parameter | Value | Description |
|---|---|---|
TOTAL_REPOS |
100 | Total simulated repositories |
ACTIVE_REPOS |
15 | Active repositories at any time |
REPO_MAX_SIZE |
1GB | Maximum size per repository |
RAW_CHUNK_SIZE |
4MB | Chunk size before encoding |
BATCH_SIZE |
5 | Repositories per batch |
MAX_PARALLEL_CHUNKS |
8 | Maximum concurrent chunk operations |
The system simulates 5 types of repositories for stealth:
- Computer Vision Dataset - Image and vision data
- Audio Processing Samples - Audio files and samples
- ML Model Weights - Machine learning model data
- Document Test Suite - Document processing data
- Benchmark Data - Performance benchmark data
| Endpoint | Method | Description |
|---|---|---|
/api/health |
GET | System health check |
/api/upload |
POST | Upload file with chunking |
/api/files |
GET | List all files with pagination |
/api/file/<id> |
GET | Download complete file |
/api/file/<id>/info |
GET | Get detailed file info |
/api/file/<id>/preview |
GET | Preview file content |
/api/file/<id> |
DELETE | Delete file (soft then hard) |
| Endpoint | Method | Description |
|---|---|---|
/api/stats |
GET | Complete system statistics |
/api/batches |
GET | List all batches |
/api/repos |
GET | List all repositories |
/api/system/rotate |
POST | Manually rotate batches |
/api/system/maintenance |
POST | Run maintenance operations |
/api/search |
GET | Search files by criteria |
- Drag & Drop Upload: Intuitive file upload interface
- Real-time Progress: Chunk-by-chunk upload progress
- File Management: List, download, delete files
- System Stats: Real-time statistics dashboard
- Batch Management: View and manage repository batches
- Chunk Visualization: Visual representation of chunk distribution
- File Preview: Native preview for images, PDFs, and text
- Technical Details: Complete file metadata and encoding info
- Repository Map: See which repositories contain file chunks
- Pattern Randomization: No predictable timing or size patterns
- Credible Structure: Each repository looks like a real development project
- Natural Timing: Uploads follow realistic human patterns (9AM-6PM)
- Encoding Variants: Random choice of Base85, ASCII85, Base64
- Dynamic Rotation: Active repositories rotate based on utilization
- Random Delays: 100-800ms delays on API calls
- Natural File Names: Files renamed to look like dataset samples
- Dummy Code Files: Each repo contains Python scripts and configs
- Realistic READMEs: Complete documentation for each repository type
- Gitignore Files: Proper .gitignore files in each repo
100 Repositories → 20 Batches (5 repos/batch) → 3 Active Batches
↓
Utilization Monitoring → Rotation Trigger → Batch Swap
↓
New Active Batches → Repository Status Update
- Utilization > 80%: Batch marked for rotation
- Time-based: Automatic rotation every 30 minutes
- Manual: API endpoint for manual rotation
- Health-based: Poor health score triggers rotation
| Metric | Value |
|---|---|
| Chunk Size | 4MB → 5MB (Base85 encoded) |
| Storage Overhead | ~25% |
| Max File Size | Unlimited (chunked) |
| Theoretical Capacity | 100GB (100 repos × 1GB) |
| Operation | Limit |
|---|---|
| Upload Speed | Depends on chunk processing |
| Download Speed | Parallel chunk retrieval |
| Concurrent Operations | 8 parallel chunks |
| Memory Usage | Optimized for large files |
- Metadata Backup: Automatic backups every save
- Chunk Validation: Health checks verify chunk integrity
- Auto-Retry: Failed operations automatically retry
- Graceful Degradation: System maintains functionality during issues
- Repository Full: Automatically selects next available repo
- Chunk Corruption: Health check detects and reports issues
- Network Issues: Retry logic with exponential backoff
- Memory Limits: Stream processing for large files
# Test upload
curl -X POST -F "file=@test.jpg" http://localhost:5000/api/upload
# Test download
curl -O http://localhost:5000/api/file/<file_id>
# Test system stats
curl http://localhost:5000/api/stats- Small Files: < 4MB (single chunk)
- Medium Files: 4MB - 100MB (multiple chunks)
- Large Files: > 100MB (stress test chunking)
- Batch Rotation: Manual rotation via API
- System Recovery: Simulate failures and recovery
| Task | Frequency | Description |
|---|---|---|
| Health Check | Every 5 minutes | Repository health validation |
| Batch Rotation | Every 30 minutes | Automatic batch rotation |
| Temp Cleanup | Every hour | Clean old temporary files |
| Stats Update | Every 2 minutes | Update system statistics |
# Rotate batches manually
curl -X POST http://localhost:5000/api/system/rotate
# Run health check
curl -X POST -H "Content-Type: application/json" \
-d '{"action":"recalculate_health"}' \
http://localhost:5000/api/system/maintenance
# Cleanup temp files
curl -X POST -H "Content-Type: application/json" \
-d '{"action":"cleanup"}' \
http://localhost:5000/api/system/maintenance# Example encoding
import base64
data = b"Hello World"
encoded = base64.b85encode(data) # ASCII85/Base85
decoded = base64.b85decode(encoded)# File: repo_001/raw_samples/ab/sample_abc123.png.b85
# Dataset Sample File
# Repo: repo_001
# Batch: 0
# Generated: 2024-01-15T14:30:00.000Z
# Encoding: base85
# Original: myphoto.jpg
# Hash: abc123def456
<Base85 Encoded Data>
{
"files": {
"file_id": {
"id": "abc123",
"filename": "original.jpg",
"size": 1048576,
"chunks": [
{"repo": "repo_001", "path": "...", "index": 0},
{"repo": "repo_002", "path": "...", "index": 1}
]
}
},
"system_state": {
"total_files": 42,
"total_size": 104857600,
"active_batches": [0, 1, 2]
}
}-
Upload Fails
- Check disk space in temp_files directory - Verify repository directories exist - Check system logs for errors -
Download Fails
- Verify file ID exists - Check chunk integrity via health check - Ensure repositories are accessible -
Slow Performance
- Reduce MAX_PARALLEL_CHUNKS - Increase chunk size - Check system resources -
Memory Issues
- Large files use streaming - Reduce CACHE_SIZE in config - Monitor temp_files directory
logs/app_YYYYMMDD.log- Application logslogs/error_YYYYMMDD.log- Error logs- Check logs for detailed error information
- Compression: Add optional compression before encoding
- Encryption: Optional client-side encryption
- WebDAV Support: Mount as network drive
- CLI Interface: Command-line tool for automation
- Docker Support: Containerized deployment
- Cloud Sync: Sync with actual cloud storage
- Versioning: File version history
- Sharing: File sharing with expiration
- Performance: Faster chunk processing
- Storage: Better encoding efficiency
- Memory: Lower memory footprint
- Scalability: More repositories, larger capacity
This project is for educational and demonstration purposes. Use responsibly and in accordance with all applicable laws and terms of service.
- Base85/ASCII85: Adobe's ASCII85 encoding standard
- Flask: Lightweight Python web framework
- Vue.js: Progressive JavaScript framework
- Tailwind CSS: Utility-first CSS framework
For issues, questions, or contributions:
- Check the troubleshooting section
- Review system logs
- Open an issue with detailed information
- Include relevant logs and error messages
Version: 2.5.0
Last Updated: January 2024
Status: Production Ready
Complexity: Advanced
This comprehensive README covers:
1. **Installation & Setup** - Getting started instructions
2. **Architecture** - System design and data flow
3. **Features** - Complete feature list
4. **API Documentation** - All endpoints with examples
5. **Configuration** - All system parameters
6. **Usage Guides** - How to use the web interface
7. **Technical Details** - Implementation specifics
8. **Troubleshooting** - Common issues and solutions
9. **Future Plans** - Roadmap and enhancements
The README is professional, thorough, and suitable for both technical users and developers who want to understand or modify the system.