A streamlined CLI tool for profiling Tenstorrent's TT-Metal tests and extracting device kernel performance metrics
- Automated Profiling: Seamlessly runs Tenstorrent's TT-Metal profiler with pytest
- CSV Analysis: Automatically extracts and parses performance CSV files
- Real-time Output: Shows profiling progress in real-time
- Performance Metrics: Calculates total DEVICE KERNEL DURATION
- Simple CLI: Easy-to-use command-line interface
- Flexible: Supports named profiles and various test paths
- Operation-based Profiling: Profile specific operations by name (e.g., ttperf add)
- Dynamic Configuration: Customize tensor shape, dtype, and layout for operations
- Config File Support: Set defaults via ~/.ttperf.yaml or ./.ttperf.yaml
- CI-friendly: The --quiet flag suppresses decorative output; --verbose enables debug logging
# Install from PyPI (recommended)
pip install ttperf
# With YAML config file support
pip install "ttperf[yaml]"
Or install from source:
git clone https://github.com/Aswincloud/ttperf.git
cd ttperf
pip install -e .
ttperf automatically searches for your TT-Metal installation using a simple two-step process:
# Option 1: Set PYTHONPATH to your tt-metal location
export PYTHONPATH=/path/to/your/tt-metal
ttperf add
# Option 2: Run from within tt-metal directory (or any subdirectory)
cd /path/to/your/tt-metal
ttperf relu
tt-metal Path Search Order:
- PYTHONPATH environment variable (if specified)
- Current working directory (walks up the directory tree to find the tt-metal root)
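The search order above can be sketched in a few lines of Python. This is only an illustration of the two-step lookup, not ttperf's actual code: the names `find_tt_metal_root` and `looks_like_tt_metal_root` are hypothetical, and the rule used to recognize a tt-metal root (a `tt_metal` subdirectory) is an assumption.

```python
import os
from pathlib import Path
from typing import Optional


def looks_like_tt_metal_root(path: Path) -> bool:
    # Assumption: a tt-metal checkout contains a "tt_metal" subdirectory.
    return (path / "tt_metal").is_dir()


def find_tt_metal_root(env=None, cwd=None) -> Optional[Path]:
    """Return the first tt-metal root found, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)

    # Step 1: check every entry of PYTHONPATH, if the variable is set.
    for entry in env.get("PYTHONPATH", "").split(os.pathsep):
        if entry and looks_like_tt_metal_root(Path(entry)):
            return Path(entry).resolve()

    # Step 2: check the working directory, then each parent directory in turn.
    cwd = cwd.resolve()
    for candidate in [cwd, *cwd.parents]:
        if looks_like_tt_metal_root(candidate):
            return candidate
    return None
```

Checking PYTHONPATH first means an explicit setting wins over whatever directory the command happens to be run from.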
# Run profiling on a specific test
ttperf test_performance.py
# Run with a custom profile name
ttperf my_profile pytest test_performance.py
# Run on a specific test method
ttperf tests/test_ops.py::test_matmul
# Profile specific operations by name
ttperf add
ttperf relu
ttperf matmul
# Custom tensor configuration
ttperf add --shape 1,1,32,32 --dtype bfloat16 --layout tile
ttperf relu --shape 1,1,64,64 --dtype float32 --layout row_major
# Memory options
ttperf add --dram # Use DRAM memory (default)
ttperf relu --l1 # Use L1 memory
# CI-friendly (no emoji/decorative output)
ttperf --quiet add
# Copy CSV output to a directory
ttperf add --output-dir ./results/
# Enable verbose debug logging
ttperf --verbose add
ttperf [OPTIONS] [PROFILE_NAME] [pytest] <test_path_or_operation>
Options:
  --version            Show version information
  --help, -h           Show this help message
  --list-ops, -l       List all supported operations
  --debug, -d          Show real-time profiler output
  --verbose, -v        Enable verbose logging (debug messages)
  --quiet, -q          Suppress decorative/emoji output (for CI)
  --shape SHAPE        Tensor shape, e.g. 1,1,32,32 (default: 1,1,32,32)
  --dtype DTYPE        Data type: bfloat16/bf16, float32/fp32/f32, int32/i32 (default: bfloat16)
  --layout LAYOUT      Memory layout: tile, row_major/rm (default: tile)
  --memory-config      Memory configuration: dram, l1 (default: dram)
  --dram               Use DRAM memory (default)
  --l1                 Use L1 memory
  --output-dir DIR     Copy generated CSV to this directory
Create ~/.ttperf.yaml (global) or ./.ttperf.yaml (project-local) to set defaults:
# ~/.ttperf.yaml
shape: 1,1,32,32
dtype: bfloat16
layout: tile
memory_config: dram
output_dir: ./results
CLI flags always override config file values.
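The precedence rule can be illustrated with plain dictionaries. This is a minimal sketch, not ttperf's implementation: `resolve_settings` is a hypothetical name, unset CLI flags are assumed to be represented as `None`, and letting the project-local file override the global one is an assumption (the text above only states that CLI flags win). The default values are the ones listed in the options.

```python
# Built-in defaults, taken from the documented option defaults.
DEFAULTS = {
    "shape": "1,1,32,32",
    "dtype": "bfloat16",
    "layout": "tile",
    "memory_config": "dram",
    "output_dir": None,
}


def resolve_settings(global_cfg, local_cfg, cli_flags):
    """Merge settings; later layers override earlier ones (hypothetical helper)."""
    settings = dict(DEFAULTS)
    # Assumption: ./.ttperf.yaml (local) overrides ~/.ttperf.yaml (global).
    for layer in (global_cfg, local_cfg):
        settings.update({k: v for k, v in layer.items() if k in DEFAULTS})
    # CLI flags always win, but only the ones the user actually passed.
    settings.update({k: v for k, v in cli_flags.items() if v is not None})
    return settings


merged = resolve_settings(
    global_cfg={"dtype": "float32", "output_dir": "./results"},
    local_cfg={"layout": "row_major"},
    cli_flags={"dtype": "bfloat16", "layout": None},
)
print(merged)
```

Here the CLI value for `dtype` replaces the one from the global file, while `layout` keeps the project-local value because no flag was passed for it.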
ttperf test_conv.py
ttperf conv_benchmark pytest test_conv.py
ttperf tests/ops/test_matmul.py::test_basic_matmul
# Basic operations
ttperf add
ttperf subtract
ttperf multiply
# Activation functions
ttperf relu
ttperf sigmoid
ttperf tanh
ttperf gelu
# Mathematical operations
ttperf sqrt
ttperf exp
ttperf log
# Comparison operations
ttperf gt
ttperf lt
ttperf eq
# Reduction operations
ttperf max
ttperf min
ttperf sum
# Backward operations
ttperf add_bw
ttperf relu_bw
ttperf add --shape 1,1,32,32
ttperf relu --shape 2,3,64,128
ttperf add --dtype float32
ttperf add --layout row_major
ttperf add --shape 1,1,64,64 --dtype float32 --layout row_major
ttperf add --dram --shape 1,1,128,128
ttperf relu --l1 --dtype float32
ttperf --list-ops
# or
ttperf -l
Auto-generated profile name: temp_test_add
Running test...
============================================================
TEST SUMMARY
============================================================
Test: add
Status: PASSED
Configuration: shape=(1, 1, 32, 32), dtype=bfloat16, layout=tile, memory_config=dram (custom)
CSV Path: /path/to/profile_results.csv
DEVICE KERNEL DURATION [ns] total: 1234567.89 ns
============================================================
- Command Parsing: Analyzes input arguments to determine profile name and test path/operation
- Config Loading: Reads ~/.ttperf.yaml or ./.ttperf.yaml for defaults (CLI flags take priority)
- Operation Detection: If an operation name is provided, maps it to the corresponding test method
- Dynamic Configuration: If custom configuration is provided, sets environment variables for the test
- Profile Execution: Runs Tenstorrent's TT-Metal profiler with the specified test
- Output Monitoring: Streams profiling output in real-time (with --debug)
- CSV Extraction: Parses the output to find the generated CSV file path and verifies that the file exists
- Performance Analysis: Reads the CSV and calculates total device kernel duration
- Output Copy: Optionally copies the CSV to --output-dir if specified
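The Output Monitoring step can be illustrated with the standard library alone. This is a minimal sketch that assumes the profiler runs as a child process; `run_and_stream` is a hypothetical helper, and the command used below is a stand-in, not the real profiler invocation.

```python
import subprocess
import sys


def run_and_stream(cmd, echo=True):
    """Run cmd, echo each output line as it arrives, return (exit_code, lines)."""
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    ) as proc:
        for line in proc.stdout:
            line = line.rstrip("\n")
            if echo:
                print(line, flush=True)
            lines.append(line)  # keep a copy for later parsing
    # Leaving the with-block waits for the child, so returncode is set here.
    return proc.returncode, lines


# Stand-in command: a child Python process that prints two lines.
code, output = run_and_stream(
    [sys.executable, "-c", "print('step 1'); print('step 2')"]
)
```

Keeping the captured lines alongside the live echo lets a later step search the same output, for example for the path of the generated CSV file.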
The tool extracts the following key metrics:
- DEVICE KERNEL DURATION [ns]: Total time spent in device kernels
- CSV Path: Location of the detailed profiling results
- Real-time Progress: Live output during profiling (with --debug)
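The total is a sum over one CSV column. The sketch below shows the idea with Python's built-in `csv` module so that it runs without extra dependencies; since pandas is a listed requirement, the tool itself presumably uses pandas. The function name `total_kernel_duration_ns`, the skipping of blank cells, and the sample rows (including the `op` column) are assumptions made up for illustration.

```python
import csv
import io

COLUMN = "DEVICE KERNEL DURATION [ns]"


def total_kernel_duration_ns(csv_file):
    """Sum the kernel-duration column of an open CSV file (hypothetical helper)."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or COLUMN not in reader.fieldnames:
        raise KeyError(f"column {COLUMN!r} not found")
    total = 0.0
    for row in reader:
        value = (row[COLUMN] or "").strip()
        if value:  # assumption: rows with an empty cell are skipped
            total += float(value)
    return total


# Made-up sample data; real profiler CSVs contain many more columns.
sample = io.StringIO(
    "op,DEVICE KERNEL DURATION [ns]\n"
    "add,1200\n"
    "relu,\n"
    "add,800.5\n"
)
print(total_kernel_duration_ns(sample))  # prints 2000.5
```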
- Format: Comma-separated integers (e.g., 1,1,32,32)
- Default: 1,1,32,32
- Example: --shape 2,3,64,128
- Valid Options: bfloat16 (or bf16), float32 (or fp32/f32), int32 (or i32)
- Default: bfloat16
- Example: --dtype float32
- Valid Options: tile, row_major (or rm)
- Default: tile
- Example: --layout row_major
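The accepted values above could be validated along the following lines. The alias tables come from the options listed here; everything else is an assumption for illustration, including the helper names `parse_shape` and `normalize`, the case-insensitive matching, and the rejection of non-positive dimensions.

```python
# Alias tables built from the documented valid options.
DTYPE_ALIASES = {
    "bfloat16": "bfloat16", "bf16": "bfloat16",
    "float32": "float32", "fp32": "float32", "f32": "float32",
    "int32": "int32", "i32": "int32",
}
LAYOUT_ALIASES = {"tile": "tile", "row_major": "row_major", "rm": "row_major"}


def parse_shape(text):
    """Turn '1,1,32,32' into (1, 1, 32, 32) (hypothetical helper)."""
    try:
        dims = tuple(int(part) for part in text.split(","))
    except ValueError:
        raise ValueError(
            f"invalid shape {text!r}: expected comma-separated integers"
        ) from None
    # Assumption: every dimension must be a positive integer.
    if any(d <= 0 for d in dims):
        raise ValueError(f"invalid shape {text!r}: dimensions must be positive")
    return dims


def normalize(value, aliases, kind):
    """Map an alias such as 'bf16' or 'rm' to its canonical name."""
    key = value.strip().lower()  # assumption: matching is case-insensitive
    if key not in aliases:
        valid = ", ".join(sorted(set(aliases.values())))
        raise ValueError(f"invalid {kind} {value!r}: expected one of {valid}")
    return aliases[key]


print(parse_shape("2,3,64,128"))
print(normalize("bf16", DTYPE_ALIASES, "dtype"))
print(normalize("rm", LAYOUT_ALIASES, "layout"))
```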
- Python 3.8+
- pandas
- Tenstorrent's TT-Metal development environment
- pytest
- PyYAML (optional, for config file support)
ttperf/
├── ttperf/
│   ├── __init__.py
│   ├── ttperf.py          # Main CLI implementation
│   └── data/
│       ├── operation_configs.json
│       └── test_eltwise_operations.py
├── tests/
│   └── test_ttperf.py     # Unit tests
├── pyproject.toml
├── README.md
└── .gitignore
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is an independent utility that interfaces with Tenstorrent's TT-Metal profiling tools. It is not affiliated with or endorsed by Tenstorrent Inc. The tool serves as a convenience wrapper around existing TT-Metal profiling infrastructure.
If you encounter any issues, please create an issue on GitHub.
Aswin Z
- GitHub: @Aswincloud
- Portfolio: aswincloud.com
- Tenstorrent's TT-Metal development team for the profiling tools
- Python community for excellent libraries like pandas