[go: up one dir, main page]

Skip to content

Cycle-level, trace-driven, parallel GPU simulator for NVIDIA Pascal.

License

Notifications You must be signed in to change notification settings

romnn/microgpusim

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

box

Prerequisites

Install the latest CUDA 11 toolkit.

wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
# this will not attempt to also install the CUDA driver
sudo sh cuda_11.8.0_520.61.05_linux.run --toolkit --silent --override

Building

cargo build --release --workspace --all-targets
cargo build -p trace --release # single package

Note: To speed up build across workspaces, we strongly recommend to use sccache. Instructions for installing and setting up sccache can be found here.

Trace an application

# using our box memory tracer
LD_PRELOAD=./target/release/libtrace.so <executable> [args]
LD_PRELOAD=./target/release/libtrace.so ./test-apps/vectoradd/vectoradd 100 32

# using the accelsim tracer
./target/release/accelsim-trace ./test-apps/vectoradd/vectoradd 100 32

See the accelsim instructions.

Profile an application

cargo build --release --workspace --all-targets
sudo ./target/release/profile <executable> [args]
sudo ./target/release/validate ./test-apps/simple_matrixmul/matrixmul 32 32

./accelsim/gtx1080/accelsim_mem_debug_trace.txt

Run simulation

cargo run -- --path test-apps/vectoradd/traces/vectoradd-100-32-trace/

Python package

python setup.py develop --force

Testing

cargo test --workspace -- --test-threads=1

Performance profiling

First, configure permissions for running perf on linux.

Check this on how to setup flamegraphs. Here is a TLDR for x86 linux:

sudo apt install linux-tools-common linux-tools-generic linux-tools-$(uname -r)
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
cargo install flamegraph
cargo flamegraph --bin=gpucachesim -- --path ./results/vectorAdd/vectorAdd-10000-32/trace
cargo install cargo-criterion
cargo criterion -- vectoradd
valgrind --tool=drd --exclusive-threshold=10 ./target/release/gpucachesim --parallel --non-deterministic 2 ./results/vectorAdd/vectorAdd-dtype-32-length-100/trace/commands.json

Coverage

# install coverage tooling
rustup component add llvm-tools-preview
cargo install grcov

# collect code coverage in tests (todo)
cargo xtask coverage

cargo xtask accelsim convert-config -c ./accelsim/gtx1080/gpgpusim.config -c ./accelsim/gtx1080/gpgpusim.trace.config -o output.config

Publishing traces (used by CI)

rclone sync ./results drive:gpucachesim

Missing features and current limitations

  • only traces and executes memory instructions and exit instructions
    • note: divergent control flow is still captured during the trace by the active thread mask
  • currently lacks write hit handlers
  • currently lacks a cycle accurate interconnect
  • currently lacks texture and constant caches (will panic on the latter instructions)

Goals

  • step 1: we want to count memory accesses to L1, L2, DRAM
  • step 2: we want to count cache hits and misses