It's not clear to kernel developers what current and potential users of large pages want. In an effort to find out, we are conducting email interviews with Linux users, asking what their experiences have been and what functionality they would find useful.

== The Questions ==

The following survey text will be sent to all participants to encourage consistency in the responses.

----

Thanks in advance for answering our questions. This information will be shared with many Linux developers, so please be as detailed as you can in your answers. We don't need to know the name of your organization or application, but we would appreciate it if you could tell us that as well.

If you want your response to be discussed at the 2006 OLS large pages BOF (see http://www.linuxsymposium.org/2006/view_abstract.php?content_key=289), please return your answers by July 10, 2006. Also, please consider attending the large pages BOF if you will be at OLS. Thanks!

 1. What does your organization do?
 2. What does your application do?
 3. How are large pages useful to you?
 4. What is your experience with the Linux hugetlbfs implementation of large page support?
 5. Does Linux's large page support affect your decision whether or not to use Linux? If so, how?
 6. What would make large page support in Linux better for your application?

----

== The Answers ==

=== The UK Astrophysical Fluids Facility (UKAFF) ===

'''A1''' The UK Astrophysical Fluids Facility (UKAFF) is a national supercomputing facility for theoretical astrophysics research: http://www.ukaff.ac.uk

'''A2''' There are a variety of user-written applications, but probably the most relevant for a discussion of large pages are those that use a technique called Smoothed Particle Hydrodynamics (SPH). This is a particle-based (rather than grid-based) method of computing fluid dynamics, which was originally developed for astrophysics by Joe Monaghan and his collaborators in 1977.
For further information about the specifics of SPH, this article is probably a good starting point: [http://ukads.nottingham.ac.uk/cgi-bin/nph-iarticle_query?1992ARA%26A..30..543M&data_type=PDF_HIGH&type=PRINTER&filetype=.pdf]

The Gadget-2 code used for the large "Millennium" cosmology simulation at Max Planck has some SPH elements to it. SPH is used at UKAFF for a variety of astrophysics simulations, such as this one: http://www.ukaff.ac.uk/starcluster/

'''A3''' One problem with large SPH simulations of astrophysics problems, such as those run on the UKAFF systems, is the rate at which particles are mixed up. By this I mean that a particle's nearest neighbours change on a short timescale, unlike, for example, a smooth, steady flow of water. Watch the movies of the Star Cluster simulation (URL above) or the Neutron Star Mergers at http://www.ukaff.ac.uk/movies/nsmerger/ and you'll probably get the idea.

This makes memory accesses very random: you need to calculate interactions between neighbouring particles, and these get mixed up. Sorting the particles regularly to reduce the randomness tends to carry a very high CPU time overhead. Partial sorts or less frequent sorting are often used to balance time spent sorting against time spent doing science, but this still leaves a significant level of randomness in the memory accesses. These accesses then frequently miss both the cache and the TLB, which adds high latency to each memory request. Increasing the page size from 4KB to 16MB significantly reduces this problem because each TLB entry then covers 4096 times as much memory, so the number of TLB misses drops. Typically this reduces runtimes by 25-30%, but in an extreme case I've seen an SPH code run 3x faster simply by enabling large pages.

'''A4''' We have no real experience with it, as it's completely unusable for us. Our scientists write their own codes, often in Fortran 77, rather than using a standard code, so there's no sensible way for them to implement large pages in their applications.
Previously our users worked on an SGI system running IRIX, where large pages were enabled simply by setting kernel options on the system; users then set runtime environment variables. No code changes were required.

The current Linux implementation requires memory to be reserved explicitly for large pages. This causes problems for any large page application that doesn't fit within the reservation, and for any small page application that doesn't fit within the unreserved memory. Furthermore, as different applications need different amounts of large-page (or small-page) memory, in a production environment we would need an efficient way to change the amount of memory reserved for large pages. For an application that will use almost all of the 32GB on the system, it's virtually impossible to reserve enough memory dynamically, and a reboot is necessary. That is fine in a lab on a development system, but totally impractical for a production system.

'''A5''' That depends on the hardware. On our pSeries systems we were told that we could use large pages on Linux, but this has turned out to be incorrect. The choice to use Linux was therefore wrong, as we cannot get good enough performance from these expensive systems. If I were buying a standard x86-64 based server, it probably wouldn't affect my decision, as the much lower cost outweighs the loss of the extra performance. Having said that, if UKAFF's next system were x86-64 based, I'd certainly look at whether Solaris has usable large page support if Linux hasn't improved by then.
'''A6'''

 * large page support that works without code changes for Fortran 77 & 90, C and C++
 * an IRIX-like implementation where there is no need to explicitly reserve memory for large pages
 * the ability to effectively coalesce small pages into large pages
 * if an explicit reservation is still required, a large page application which "overflows" the reservation should be able to use small pages for the overflowing part
 * this overflow must not be implemented by moving the entire application to small pages. That is useless when, for example, 85% of memory is reserved for large pages and the application needs 90% of memory: letting 5% use small pages is better than trying to fit the entire application into the 15% of unreserved memory.
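The arithmetic behind the last point can be sketched in a few lines. The Python below is a toy model, not kernel code: it assumes memory is split into exactly two fixed pools sized in whole percent of RAM, and it ignores memory used by the kernel and by other processes. The names `Placement`, `all_or_nothing` and `spill_over` are hypothetical. Using the 85%/90% figures from the answer, it compares a fallback that moves the whole application to small pages with one that spills only the overflow.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Where a request lands (hypothetical units: percent of physical RAM)."""
    large: int  # portion backed by large pages
    small: int  # portion backed by small pages
    fits: bool  # whether the small-page portion fits in the unreserved pool


def all_or_nothing(request: int, large_pool: int, small_pool: int) -> Placement:
    """Hypothetical fallback: if the request exceeds the large-page
    reservation, the *entire* request is moved to small pages."""
    if request <= large_pool:
        return Placement(large=request, small=0, fits=True)
    return Placement(large=0, small=request, fits=request <= small_pool)


def spill_over(request: int, large_pool: int, small_pool: int) -> Placement:
    """Hypothetical policy asked for in A6: fill the large-page reservation
    first and place only the overflowing part in small pages."""
    large = min(request, large_pool)
    small = request - large
    return Placement(large=large, small=small, fits=small <= small_pool)


if __name__ == "__main__":
    # Figures from the answer: 85% reserved for large pages, 15% unreserved,
    # and an application that needs 90% of memory.
    reserved, unreserved, need = 85, 15, 90
    print(all_or_nothing(need, reserved, unreserved))
    print(spill_over(need, reserved, unreserved))
```

With these figures the whole-application fallback does not fit (90% requested against 15% of small-page memory), printing `Placement(large=0, small=90, fits=False)`, while spilling places 85% in large pages and 5% in small pages, printing `Placement(large=85, small=5, fits=True)`.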