Bluegene hugepages

Implementing huge pages on a Blue Gene/L I/O node

Satya Popuri

Graduate student, University of Illinois at Chicago

Blue Gene/L architecture

Blue Gene/L is a supercomputer developed by IBM in partnership with Lawrence Livermore National Laboratory. The computer is organized as an array of nodes interconnected by high-speed networks. Some of the nodes are designated as "compute" nodes, which are responsible for the number crunching. Others are called "I/O nodes" and are responsible for performing all I/O operations. Each node consists of a custom-made chip that integrates two PowerPC 440 cores. Refer to http://en.wikipedia.org/wiki/Blue_Gene for more details about the architecture. Blue Gene/L ships with a modified version of Linux on the I/O nodes and an IBM proprietary kernel running on the compute nodes. This article is about implementing huge pages for the Blue Gene I/O node (ION) kernel.

Huge Page support in PowerPC 440

The PowerPC 440 chip is a 32-bit implementation of the IBM Book E architecture (http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600682CC7). The PPC 440 MMU is always active, so address translation is enabled right from boot. The most striking difference I found between x86 and PowerPC is the way address translation is handled. In the x86 case you tell the processor where your page directory is, i.e., you load the physical address of the page directory into a special control register, CR3. The processor then walks the page tables and loads the TLB on its own. When that walk finds no valid mapping, the processor raises a page fault exception to the OS. In the case of PowerPC, the processor doesn't care where your page tables are. Instead, it lets you load TLB entries directly. Here, you get a TLB fault (or TLB miss) instead of a page fault: when a virtual address needs to be translated and there is no TLB entry corresponding to it, a TLB miss exception is thrown at you. The OS then walks the page tables, finds the translation and loads it into the TLB. It may happen that the page tables do not have a translation entry either. This is a page fault, and the page fault handler is called to create or load the missing page. As you can see, this call to the page fault handler is a software call, not a hardware exception.
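The TLB miss flow described above can be sketched as a small user-space simulation. This is not kernel code and not the actual 44x handler: the TLB size, the single-level page table, the round-robin replacement policy and every name in it (`soft_tlb`, `handle_tlb_miss`, `translate`, ...) are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12u   /* 4K pages, chosen only for this example */
#define TLB_ENTRIES  4u    /* tiny on purpose so evictions are easy to see */
#define PT_ENTRIES   16u   /* toy single-level page table (64K address space) */

struct tlb_entry { bool valid; uint32_t vpn; uint32_t pfn; };
struct pte       { bool present; uint32_t pfn; };

struct tlb_entry soft_tlb[TLB_ENTRIES];
struct pte page_table[PT_ENTRIES];
unsigned next_victim;               /* round-robin replacement (assumption) */
unsigned tlb_misses, page_faults;   /* counters for the demo */

/* Stand-in for the page fault handler: "allocates" a frame for the page. */
void handle_page_fault(uint32_t vpn)
{
    page_faults++;
    page_table[vpn].present = true;
    page_table[vpn].pfn = 0x100u + vpn;   /* made-up frame number */
}

/* Stand-in for the TLB miss handler: walk the page table, fall back to
 * the page fault handler in software if needed, then load the TLB. */
void handle_tlb_miss(uint32_t vpn)
{
    tlb_misses++;
    if (!page_table[vpn].present)
        handle_page_fault(vpn);           /* plain function call, not a trap */
    soft_tlb[next_victim] = (struct tlb_entry){ true, vpn, page_table[vpn].pfn };
    next_victim = (next_victim + 1u) % TLB_ENTRIES;
}

/* What the MMU does: consult the TLB only, and raise a miss otherwise. */
uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1u);

    assert(vpn < PT_ENTRIES);
    for (;;) {
        for (unsigned i = 0; i < TLB_ENTRIES; i++)
            if (soft_tlb[i].valid && soft_tlb[i].vpn == vpn)
                return (soft_tlb[i].pfn << PAGE_SHIFT) | offset;
        handle_tlb_miss(vpn);             /* then retry the access */
    }
}
```

Calling `translate()` on an untouched page takes one TLB miss and one page fault. Touching the same page again hits in the TLB, and touching it after its entry has been evicted takes a TLB miss without a page fault, because the translation is still in the page table.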

PowerPC 440 TLB entries are 64 bits long and contain all the information needed to access any given virtual address. One of the important parameters is the page size. A TLB entry reserves 4 bits to represent the page size, so in theory it could encode 16 different page sizes. Only eight of them are supported by this processor: 1K, 4K, 16K, 64K, 256K, 1M, 16M and 256M. By loading the base virtual address, the base physical address of a page frame, and the size into a TLB entry, we can have the processor translate every virtual address in that page.
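As a rough illustration of how a size field plus two base addresses is enough to translate a whole page, here is a minimal sketch. It assumes the encoding in which a size value n stands for a page of 4^n KB, which is consistent with the eight sizes listed above (n = 0, 1, 2, 3, 4, 5, 7, 9). The struct layout and the `tlb440_*` names are hypothetical and do not mirror the hardware entry format. Physical addresses are kept to 32 bits for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified TLB entry: field names and layout are illustrative only. */
struct tlb440_entry {
    uint32_t epn;    /* base virtual address of the page */
    uint32_t rpn;    /* base physical address of the page frame */
    uint8_t  size;   /* 4-bit page size field */
};

/* Size values assumed valid: 1K, 4K, 16K, 64K, 256K, 1M, 16M, 256M. */
bool tlb440_size_supported(uint8_t size)
{
    return size <= 5 || size == 7 || size == 9;
}

/* Page size in bytes for size field value n, assuming 4^n KB. */
uint32_t tlb440_page_bytes(uint8_t size)
{
    assert(tlb440_size_supported(size));
    return UINT32_C(1024) << (2u * size);
}

/* Translate vaddr through one entry. Returns false if the entry does
 * not cover the address (which would be a TLB miss for this entry). */
bool tlb440_translate(const struct tlb440_entry *e, uint32_t vaddr,
                      uint32_t *paddr)
{
    uint32_t mask = tlb440_page_bytes(e->size) - 1u;   /* offset bits */

    if ((vaddr & ~mask) != (e->epn & ~mask))
        return false;
    *paddr = (e->rpn & ~mask) | (vaddr & mask);
    return true;
}
```

With a 16M entry (size value 7) mapping virtual 0x10000000 to physical 0x30000000, every address from 0x10000000 to 0x10FFFFFF is translated by that single entry, where 4K pages would need 4096 entries to cover the same range.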

Huge page support in the Linux Kernel

Hugetlb page support is a fairly recent addition to the Linux kernel. It has a generic, architecture-independent layer (mm/hugetlb.c) and an architecture-specific implementation of a few functions used by the generic code (usually arch/$ARCHNAME/mm/hugetlbpage.c). Physical memory for huge pages must be reserved before they are used. This is done by writing the desired number of huge pages to /proc/sys/vm/nr_hugepages. The sysctl handler then tries to allocate that many huge pages. Each huge page needs physically contiguous memory, so the handler may end up with fewer pages than requested. After this step, application programs can allocate huge pages in one of two ways: System V shared memory or hugetlbfs. See Documentation/vm/hugetlbpage.txt for more details. In this document, we'll deal with the arch-specific pieces of the hugetlb page implementation on the ppc44x architecture.
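From the application side, the System V shared memory route might look roughly like the sketch below. It assumes huge pages have already been reserved through /proc/sys/vm/nr_hugepages and that the caller knows the huge page size. The helper names (`round_up_to_huge_page`, `huge_shm_alloc`, `huge_shm_free`) are made up for this example, and the fallback definition of `SHM_HUGETLB` is only there in case the libc headers do not expose it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000   /* Linux value, used if the headers hide it */
#endif

/* Round len up to a multiple of the huge page size (a power of two). */
size_t round_up_to_huge_page(size_t len, size_t huge_page_size)
{
    assert(huge_page_size != 0 &&
           (huge_page_size & (huge_page_size - 1)) == 0);
    return (len + huge_page_size - 1) & ~(huge_page_size - 1);
}

/* Try to create and attach a private System V segment backed by huge
 * pages. Returns the mapped address, or NULL if the segment cannot be
 * created or attached (for example when no huge pages are reserved). */
void *huge_shm_alloc(size_t len, size_t huge_page_size, int *shmid_out)
{
    size_t size = round_up_to_huge_page(len, huge_page_size);
    int shmid = shmget(IPC_PRIVATE, size, SHM_HUGETLB | IPC_CREAT | 0600);
    void *addr;

    if (shmid < 0)
        return NULL;
    addr = shmat(shmid, NULL, 0);
    if (addr == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);   /* do not leak the segment */
        return NULL;
    }
    *shmid_out = shmid;
    return addr;
}

/* Detach the segment and mark it for removal. */
void huge_shm_free(void *addr, int shmid)
{
    shmdt(addr);
    shmctl(shmid, IPC_RMID, NULL);
}
```

A caller would first reserve pages as root (for example by writing 20 to /proc/sys/vm/nr_hugepages), then call something like `huge_shm_alloc(len, 16u << 20, &id)`. The 16M value here is only a placeholder. The actual huge page size of the running kernel is reported on the Hugepagesize line of /proc/meminfo.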

LinuxMM: Bluegene hugepages (last edited 2007-08-17 18:43:29 by satya)