converted to 1.6 markup
|Deletions are marked like this.||Additions are marked like this.|
|Line 3:||Line 3:|
|The vmtrace package can be downloaded from Marcelo's [http://hera.kernel.org/~marcelo/mm/vmtrace/ kernel.org] page.||The vmtrace package can be downloaded from Marcelo's [[http://hera.kernel.org/~marcelo/mm/vmtrace/|kernel.org]] page.|
|Line 7:||Line 7:|
|Date: Sat, 12 Nov 2005 14:27:20 -0200[[BR]]
From: Marcelo Tosatti <>[[BR]]
To: linux-mm, linux-kernel[[BR]]
Cc: Peter Zijlstra <>, Rik van Riel <>[[BR]]
Subject: vmtrace v0.0[[BR]]
|Date: Sat, 12 Nov 2005 14:27:20 -0200<<BR>>
From: Marcelo Tosatti <><<BR>>
To: linux-mm, linux-kernel<<BR>>
Cc: Peter Zijlstra <>, Rik van Riel <><<BR>>
Subject: vmtrace v0.0<<BR>>
|Line 17:||Line 17:|
|From the previous email to linux-mm:[[BR]]||From the previous email to linux-mm:<<BR>>|
Vmtrace is a kernel patch and set of utilities to analyze memory access patterns. This tool may help Linux kernel developers determine exactly what the page replacement algorithm should do, something which has not been done recently.
The vmtrace package can be downloaded from Marcelo's kernel.org page.
Date: Sat, 12 Nov 2005 14:27:20 -0200
From: Marcelo Tosatti <>
To: linux-mm, linux-kernel
Cc: Peter Zijlstra <>, Rik van Riel <>
Subject: vmtrace v0.0
I've been working on updating the vmtrace patch to current 2.6-git along with some post-processing utilities.
From the previous email to linux-mm:
"The sequence of pages which a given process or workload accesses during its lifetime, a.k.a. "reference trace", is very important information. It has been used in the past for comparison of page replacement algorithms and other optimizations. We've been talking on IRC on how to generate reference traces for memory accesses, and a suggestion came up to periodically unmap all present pte's of a given process. The following patch implements a "kptecleaner" thread which is woken at a certain interval (hardcoded to HZ/2 at present)."
At the moment the most interesting tool is average IRF calculation, which can be used to identify workloads in which LRU policy underperforms due to lack of frequency information.
The tarball can be found at
Thanks to Peter Zijlstra and Rik van Riel for comments and incentive.
* Capture daemon * vmtrace-capture.c: Captures data from relayfs channel and writes it into a file. To be used at workload execution. * Post processing tools * vmtrace-print.c: Example to iterate over all trace entries. vmtrace-reorder.c: Makes sure the trace is ordered by sequence number. vmtrace-split.c: Splits a single vmtrace entry into per-mapping entries. vmtrace-irp.c: Calculates Inter Reference Period between accesses to a mapping's pages. This data is used to calculate per-page "average IRF" (Inter Reference Frequency), as follows: page-avg-IRF = (sum d(i, i+1)) <i=1...i=nr_accesses> --------------- nr_accesses where d(i, i+1) is the inverse of the delta between the current access and the next access to the page. and average IRF of the entire mapping: mapping-avg-IRF = sum (i's avg.irf) <i=1...i=nr_pages> ----------------- nr_pages vmtrace-relation.c: Calculates the numerical relation between accesses to two different mappings. This is an attempt to estimate how interleaved the accesses are.
Example of parsing mdb randomic query bench trace (test explained in more detail at http://www.linux-mm.org/PageReplacementTesting): # file captured with "vmtrace-capture": $ ls -la /tmp/vmt.txt -rwxrwxrwT 1 root root 1338372 2005-12-12 18:29 /tmp/vmt.txt # split it into per-mapping traces $ ./vmtrace-split /tmp/vmt.txt /tmp/mdb-rand/ # format is <inode_nr,bdev> $ ls /tmp/mdb-rand/ 0-0 65c72-0 7ec2-0 aa54c-0 d9fd7-0 65c71-0 65c75-0 aa527-0 aaaf9-0 e5d01-0 $ for i in /tmp/mdb-rand/* ; do ./vmtrace-irp $i ; done | grep IRF /tmp/mdb-rand/0-0: avg. IRF of all reaccessed pages(33): 12.949683 /tmp/mdb-rand/65c71-0: avg. IRF of all reaccessed pages(1808): 0.011185 /tmp/mdb-rand/65c72-0: avg. IRF of all reaccessed pages(176): 0.092401 /tmp/mdb-rand/65c75-0: avg. IRF of all reaccessed pages(16): 1.171296 /tmp/mdb-rand/7ec2-0: avg. IRF of all reaccessed pages(1): 0.000079 /tmp/mdb-rand/aa527-0: avg. IRF of all reaccessed pages(65): 0.038615 /tmp/mdb-rand/e5d01-0: avg. IRF of all reaccessed pages(6): 0.878549 The most interesting numbers here are: 65c71 is largedb.dat (7440358 bytes) 65c72 is largedb.idx (720896 bytes) /tmp/mdb-rand/65c71-0: avg. IRF of all reaccessed pages(1808): 0.011185 /tmp/mdb-rand/65c72-0: avg. IRF of all reaccessed pages(176): 0.092401 Which means that the index file is accessed about 9 times more frequently than the data file itself. The text mapping of the "mdb" binary (at address 0) is much more frequently accessed than both database files: /tmp/mdb-rand/0-0: avg. IRF of all reaccessed pages(33): 12.949683
- Create a Makefile!
- Add versioning to "struct vm_trace_entry"!
- Use dcache cookies to transform inode+bdev into path (much easier to work with). At the moment, if files get deleted during the workload you're doomed, having no way to identify deleted inode numbers.
- vmtrace-split currently splits in per "inode+bdev" type only, should work on uid, pid, pwd, etc, etc.
- vmtrace-split opens an unlimited amount of files, it should cap at 1024 open fd's and reuse those on demand.
- vmtrace-phase: recognize and classify common access patterns.
- Generate visual information such as avg.IRF histograms, address versus time graphs, etc, etc.