LinuxMM:

Nitin Gupta

<nitingupta910 AT gmail DOT com>


VStore manages storage for variable-sized data objects.

It is designed especially for embedded devices.



This page describes the problem it tries to solve and its design details.

Problem Statement
Allocating arbitrarily sized objects with kmalloc()/vmalloc() normally wastes a lot of space to internal fragmentation. When memory is at a premium, these variable-sized data items need tight storage, which is what VStore provides. This comes at some cost in speed.

How to use it?
To store data:

vstore_write(objectID /* out */, data_to_store, len, alloc_flags)

To restore this data:

vstore_read(objectID, buffer, len /* out */)

NOTE: In vstore_read(), make sure the buffer is big enough to hold the data associated with object 'objectID'; the 'len' (out) parameter tells how much data was read into the buffer. To guarantee a sufficient buffer size, either know the maximum size of the objects stored (the more common case) or track the size of each stored object so that a buffer of the right size can be provided.
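
The calling convention above can be sketched in userspace. The real VStore is kernel code; toy_vstore_write() and toy_vstore_read() below are hypothetical stand-ins (a fixed table plus malloc) that only mimic the interface, and TOY_SLOTS and MAX_OBJ_SIZE are assumptions. alloc_flags is left out because the toy has nothing comparable to pass it to.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SLOTS     16    /* assumption: capacity of the toy store */
#define MAX_OBJ_SIZE  256   /* assumption: caller-known upper bound on object size */

/* Hypothetical stand-in for VStore's internal state. */
static struct { void *data; unsigned int len; } toy_slots[TOY_SLOTS];

/* Mimics vstore_write(): copies len bytes in and returns a handle via obj_id. */
static int toy_vstore_write(unsigned long *obj_id, const void *data, unsigned int len)
{
        assert(obj_id && data);
        if (len == 0 || len > MAX_OBJ_SIZE)
                return -EINVAL;
        for (unsigned long i = 0; i < TOY_SLOTS; i++) {
                if (toy_slots[i].data)
                        continue;               /* slot already in use */
                toy_slots[i].data = malloc(len);
                if (!toy_slots[i].data)
                        return -ENOMEM;
                memcpy(toy_slots[i].data, data, len);
                toy_slots[i].len = len;
                *obj_id = i;
                return 0;
        }
        return -ENOMEM;                         /* no free slot left */
}

/* Mimics vstore_read(): copies the object into buf, reports its length via
 * len, then releases the object's storage. buf must hold MAX_OBJ_SIZE bytes. */
static int toy_vstore_read(unsigned long obj_id, void *buf, unsigned int *len)
{
        assert(buf && len);
        if (obj_id >= TOY_SLOTS || !toy_slots[obj_id].data)
                return -EINVAL;
        memcpy(buf, toy_slots[obj_id].data, toy_slots[obj_id].len);
        *len = toy_slots[obj_id].len;
        free(toy_slots[obj_id].data);
        toy_slots[obj_id].data = NULL;
        return 0;
}
```

A caller that knows the maximum object size can declare its buffer as char buf[MAX_OBJ_SIZE] and rely on the returned len to learn how many bytes are valid.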

Design Details


Design Goals


Each stored data item is split into multiple chunks (small physically contiguous regions), and these chunks can be spread across multiple physical pages. The metadata associated with each chunk is stored at the beginning of the chunk itself, with padding added as necessary to satisfy alignment constraints.

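Chunks can be one of three types:

- Free chunk
- Busy chunk type-1 (next chunk in same page)
- Busy chunk type-2 (next chunk in different page)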

The metadata layout for each of these chunk types is given below.

Some jargon first:

M     = Metadata - size has to be n * ALIGN_BYTES
M(A)  = address of next chunk relative to page start (PAGE_SHIFT - ALIGN_SHIFT bits)
M(S)  = size of chunk (same width as M(A))
F     = flags (2 bits)
        F(1): last chunk
        F(2): next chunk in different page
FP    = prev chunk's flags
FN    = next chunk's flags
PFN   = page frame number of the page holding the next chunk (BITS_PER_LONG - PAGE_SHIFT bits)
OD(i) = i-th optional metadata field. Optional fields are enclosed in []

- Free chunk

M(A) | F | M(S) | pad(align) | <free space> | pad(align)

- Busy chunk type-1 (next chunk in same page)

M(A) | F | M(S) | [ M(S) of prev chunk ] | pad(align) | <data> | pad(align)
        - OD(1) exists if F(2) is not set

- Busy chunk type-2 (next chunk in different page)

M(A) | F | PFN | [ M(S) of prev chunk ] | [ M(S) of this chunk ] | pad(align) | <data> | pad(align)
        - only OD(1) exists if          FP(2) && !F(1)
        - only OD(2) exists if          F(1) && !FP(2)
        - both OD(1) and OD(2) exist if FP(2) && F(1)
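
To get a feel for the header sizes these layouts imply, the bit widths can be added up and rounded to a multiple of ALIGN_BYTES. This is only a sketch: it assumes 4 KiB pages, a 64-bit build, and fields packed back to back with no padding between them. The helper names are hypothetical.

```c
#include <assert.h>

#define PAGE_SHIFT     12   /* assumption: 4 KiB pages */
#define ALIGN_SHIFT    2
#define ALIGN_BYTES    (1u << ALIGN_SHIFT)
#define BITS_PER_LONG  64   /* assumption: 64-bit build */

#define MA_BITS   (PAGE_SHIFT - ALIGN_SHIFT)
#define MS_BITS   MA_BITS
#define F_BITS    2
#define PFN_BITS  (BITS_PER_LONG - PAGE_SHIFT)

/* Round a bit count up to whole bytes, then up to a multiple of ALIGN_BYTES. */
static unsigned int meta_bytes(unsigned int bits)
{
        unsigned int bytes = (bits + 7) / 8;

        return (bytes + ALIGN_BYTES - 1) & ~(ALIGN_BYTES - 1);
}

/* Free chunk: M(A) | F | M(S) */
static unsigned int free_meta_bytes(void)
{
        return meta_bytes(MA_BITS + F_BITS + MS_BITS);
}

/* Busy type-1: M(A) | F | M(S) | [M(S) of prev chunk] */
static unsigned int busy1_meta_bytes(int has_od1)
{
        return meta_bytes(MA_BITS + F_BITS + MS_BITS + (has_od1 ? MS_BITS : 0));
}

/* Busy type-2: M(A) | F | PFN | [M(S) of prev chunk] | [M(S) of this chunk] */
static unsigned int busy2_meta_bytes(int has_od1, int has_od2)
{
        return meta_bytes(MA_BITS + F_BITS + PFN_BITS +
                          (has_od1 ? MS_BITS : 0) + (has_od2 ? MS_BITS : 0));
}
```

Under these assumptions a free or type-1 header fits in 4 bytes, while a type-2 header needs 8 bytes without optional fields and 12 bytes with either or both of them.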

NOTE:

Chunk management

                 Freelists
       +---------+---------+---------+---------+---------+---------+---------+---------+
 Order |    4    |    5    |    6    |   --- --- --- --- |    10   |   11    |    12   |
       +---------+---------+---------+---------+---------+---------+---------+---------+
              ^                  ^                                     ^
              | page->lru        |----|                                |
              V                       V                                V
          +----------+  offset -> +----------+               pages with [2^11, 2^12)
          |    1     |            |  | F     |                     bytes free
page->    |  | M(A)  |      +---> +--|-------+
 offset-->+--|-------+      |     |  | 2   | |
          |  | F  |  +      |     +--|-----|-+
          +--|----|--+      |     |  | 3   | |
          |  | 2  |  |------+     +--|-----|-+
          +--|----|--+ PFN + M(A) |  |     | |
          |  | F  V  |            |  V F   | |
          +--|-------+            +--------|-+
          |  V 1     |            |    2   V |
          +----------+            +----------+
              ^
              | page->lru
              V
          more pages
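
One way to read the diagram is that freelist order k holds pages with [2^k, 2^(k+1)) bytes free, so the order is floor(log2(free_bytes)). The sketch below encodes that reading. The names VS_MIN_ORDER, VS_MAX_ORDER and freelist_order() are hypothetical, and clamping at the top order and rejecting anything below 2^4 bytes are assumptions, not something the diagram states.

```c
#include <assert.h>

#define VS_MIN_ORDER  4    /* lowest freelist order shown in the diagram */
#define VS_MAX_ORDER  12   /* highest freelist order shown in the diagram */

/*
 * Map an amount of free space to a freelist order.
 * Returns floor(log2(free_bytes)) clamped to VS_MAX_ORDER,
 * or -1 if the space is too small for the lowest freelist.
 */
static int freelist_order(unsigned int free_bytes)
{
        int order = -1;

        while (free_bytes) {            /* count bits: floor(log2(free_bytes)) */
                free_bytes >>= 1;
                order++;
        }
        if (order < VS_MIN_ORDER)
                return -1;
        return order > VS_MAX_ORDER ? VS_MAX_ORDER : order;
}
```

For example, 2048 and 4095 free bytes both map to order 11, matching the "[2^11, 2^12) bytes free" label in the diagram.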

NOTE:


Implementation

Now, we can represent these three chunk types as:

#define ALIGN_SHIFT    2
#define ALIGN_BYTES    (1 << ALIGN_SHIFT)
#define MA_SIZE        (PAGE_SHIFT - ALIGN_SHIFT)
#define MS_SIZE        MA_SIZE

- Representing Free chunk

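struct free_chunk {
        unsigned long MA: MA_SIZE;      /* M(A): offset of next chunk in page */
        unsigned long flags: 2;         /* F */
        unsigned long MS: MS_SIZE;      /* M(S): size of this chunk */
} __attribute__ ((packed));

- Representing Busy chunk type-1 (next chunk in same page)

/* Field order follows the type-1 layout given earlier. */
struct busy_type1_chunk {
        unsigned long MA: MA_SIZE;      /* M(A) */
        unsigned long flags: 2;         /* F */
        unsigned long MS1: MS_SIZE;     /* M(S) of this chunk */
        unsigned long MS2: MS_SIZE;     /* OD(1): M(S) of prev chunk (optional) */
} __attribute__ ((packed));

- Representing Busy chunk type-2 (next chunk in different page)

/* Field order follows the type-2 layout given earlier. */
struct busy_type2_chunk {
        unsigned long MA: MA_SIZE;      /* M(A) */
        unsigned long flags: 2;         /* F */
        unsigned long PFN: BITS_PER_LONG - PAGE_SHIFT;  /* page frame of next chunk */
        unsigned long MS1: MS_SIZE;     /* OD(1): M(S) of prev chunk (optional) */
        unsigned long MS2: MS_SIZE;     /* OD(2): M(S) of this chunk (optional) */
} __attribute__ ((packed));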
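
Bit-field layout is compiler dependent, so it can help to see the same packing written with explicit shifts and masks. The sketch below does this for the free chunk header only. It assumes 4 KiB pages, M(A) in the lowest bits, and offsets and sizes stored in ALIGN_BYTES units (inferred from the field width PAGE_SHIFT - ALIGN_SHIFT). The flag bit values and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12   /* assumption: 4 KiB pages */
#define ALIGN_SHIFT  2
#define ALIGN_MASK   ((1u << ALIGN_SHIFT) - 1)
#define MA_SIZE      (PAGE_SHIFT - ALIGN_SHIFT)
#define MS_SIZE      MA_SIZE
#define FIELD_MASK   ((1u << MA_SIZE) - 1)

#define F_LAST       0x1u  /* F(1): last chunk (bit position is an assumption) */
#define F_NEXT_DIFF  0x2u  /* F(2): next chunk in different page (assumption) */

/* Pack M(A) | F | M(S). Byte values must be multiples of ALIGN_BYTES. */
static uint32_t pack_free_hdr(unsigned int next_off, unsigned int flags,
                              unsigned int size)
{
        assert((next_off & ALIGN_MASK) == 0 && (size & ALIGN_MASK) == 0);
        return ((next_off >> ALIGN_SHIFT) & FIELD_MASK) |
               ((flags & 3u) << MA_SIZE) |
               (((size >> ALIGN_SHIFT) & FIELD_MASK) << (MA_SIZE + 2));
}

/* Byte offset of the next chunk, relative to the page start. */
static unsigned int hdr_next_off(uint32_t h)
{
        return (h & FIELD_MASK) << ALIGN_SHIFT;
}

/* The two flag bits. */
static unsigned int hdr_flags(uint32_t h)
{
        return (h >> MA_SIZE) & 3u;
}

/* Chunk size in bytes. */
static unsigned int hdr_size(uint32_t h)
{
        return ((h >> (MA_SIZE + 2)) & FIELD_MASK) << ALIGN_SHIFT;
}
```

Storing offsets in ALIGN_BYTES units is what lets a PAGE_SHIFT - ALIGN_SHIFT bit field address any aligned position in a page.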

VStore operations

/*
 * Args (IN):
 *      obj_id: handle of the required object (as returned by vstore_write)
 *      data: buffer space to read object data into
 * Args (OUT)
 *      len: length of data read into buffer
 *
 * Return:
 *      0 on success, <0 on error (should NEVER occur)
 *      Errors:
 *              -ENOMEM
 *
 * NOTE: Make sure that buffer 'data' has enough space for object 'obj_id'.
 */
int vstore_read(unsigned long obj_id, void *data, unsigned int *len);

/*
 * Args (IN):
 *      data: pointer to data to be copied to vstore
 *      len: data length (in bytes)
 *      alloc_flags: allocation flags used when we need to allocate more chunks to store this data (e.g. GFP_KERNEL)
 * Args (OUT):
 *      obj_id: handle to object stored
 *
 * Return:
 *      0 on success, <0 on error
 *      Errors:
 *              -ENOMEM
 */
int vstore_write(unsigned long *obj_id, void *data, unsigned int len, unsigned long alloc_flags);

vstore_read
Time: O(num_chunks)

Sub ops:

  1. Collect data from all chunks into given buffer - O(num_chunks)
  2. Free these chunks and merge them with adjacent free chunks - O(num_chunks)
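
The first sub-op can be sketched as a single walk over the chunk chain. struct sim_chunk and sim_gather() are hypothetical simplifications: a plain pointer stands in for the M(A) / PFN link, and the freeing and merging of sub-op 2 is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified chunk. The real layout encodes the link to the
 * next chunk as M(A) (plus PFN) and the flags inside the chunk header. */
struct sim_chunk {
        const unsigned char *data;      /* payload of this chunk */
        unsigned int size;              /* payload bytes, as M(S) would give */
        int last;                       /* F(1): last chunk of the object */
        struct sim_chunk *next;         /* stands in for M(A) / PFN + M(A) */
};

/*
 * Copy every chunk's payload into buf in chain order.
 * One pass over the chain, so O(num_chunks).
 * Returns the total number of bytes copied; buf must be large enough.
 */
static unsigned int sim_gather(const struct sim_chunk *c, void *buf)
{
        unsigned char *out = buf;
        unsigned int total = 0;

        assert(buf);
        while (c) {
                memcpy(out + total, c->data, c->size);
                total += c->size;
                if (c->last)            /* stop at the chunk flagged as last */
                        break;
                c = c->next;
        }
        return total;
}
```

The returned total corresponds to the 'len' out parameter of vstore_read().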

vstore_write
Time: O(num_chunks) (if enough free chunks are already in freelists)

Sub ops:

  1. Allocate the required number of chunks from the freelists - O(num_chunks)
  2. If sufficient free chunks are not available, allocate additional chunks (allocate an order-0 page and add it as a single chunk to the highest-order freelist) - might sleep depending on alloc_flags.
  3. Copy data to these chunks - O(num_chunks)
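
The copy step can be sketched as scattering one buffer across a set of free chunks. struct sim_free_chunk and sim_scatter() are hypothetical. Where the real code would allocate another page, this sketch simply returns -ENOMEM, and the up-front capacity check is a design choice of the sketch rather than something the description above specifies.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical view of a free chunk taken from a freelist. */
struct sim_free_chunk {
        unsigned char *space;   /* start of usable space */
        unsigned int cap;       /* usable bytes */
        unsigned int used;      /* bytes filled by sim_scatter() */
};

/*
 * Spread len bytes of data over the chunks in fc[0..nr-1], in order.
 * Returns the number of chunks consumed, or -ENOMEM if the chunks cannot
 * hold the data (the real code would allocate an order-0 page instead).
 */
static int sim_scatter(struct sim_free_chunk *fc, int nr,
                       const void *data, unsigned int len)
{
        const unsigned char *in = data;
        unsigned int total = 0, done = 0;
        int i;

        assert(fc && data);
        /* Check capacity first so a failed write leaves the chunks untouched. */
        for (i = 0; i < nr; i++)
                total += fc[i].cap;
        if (total < len)
                return -ENOMEM;

        for (i = 0; i < nr && done < len; i++) {
                unsigned int left = len - done;
                unsigned int n = left < fc[i].cap ? left : fc[i].cap;

                memcpy(fc[i].space, in + done, n);
                fc[i].used = n;
                done += n;
        }
        return i;
}
```

Both loops visit each chunk at most once, which matches the O(num_chunks) bound given above.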

LinuxMM: VStore (last edited 2017-12-30 01:05:10 by localhost)