Friday, April 27, 2012

Virtual File System / zlib integration

I'm currently trying to optimize resource loading before starting a full level design swing on my 3D shooter.

The main problem is that disk access is slow. This is true on virtually all platforms, including the ones I care about: iPhone and Windows. Several factors contribute: disk storage is typically abundant but slow, and it can be shared by several processes, which makes it a costly resource to access.

Data packing and partitioning


So, to optimize resource loading, I made sure the disk is accessed in a managed and efficient way. Firstly, the game data is now packed into a single "data.fs" file instead of a free-for-all of loose files. Secondly, this data file is organized into "partitions". The main idea is to pre-load a group of files into RAM and then access them very quickly later.

But since RAM is itself a scarce resource, partitions must be carefully managed: loaded, and disposed of when no longer needed. In particular, partitions should not be too big, so they don't bloat RAM or take ages to load. Also, the first partition to be loaded must be small so that the game boots quickly, which gives an impression of speed. Here is a quick description of my partitions:
  • Boot Partition: data needed to boot the game (fonts, main menu graphics, ..); loaded in the foreground thread as soon as the game starts.
  • Common Partition: data common to all game levels (player data, HUD data, ..); loaded in the background as soon as the game starts.
  • Partition1: data for Level1; loaded in the background when Level1 is requested.
  • Partition2: data for Level2; loaded in the background when Level2 is requested.
  • Partition3: data for Level3; loaded in the background when Level3 is requested.
  • etc.
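The load/dispose policy above could be sketched as follows, assuming C++11 threads via std::async for background loading. PartitionManager, its methods and the Loader callback (which would read one partition's bytes out of data.fs) are hypothetical names for illustration, not the Shoot engine's API:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical partition manager: keeps loaded partitions in RAM, keyed by name.
class PartitionManager
{
public:
    using Bytes = std::vector<uint8_t>;
    // Reads the raw bytes of one partition (e.g. from data.fs); supplied by the caller.
    using Loader = std::function<Bytes(const std::string&)>;

    explicit PartitionManager(Loader loader) : m_Loader(std::move(loader)) {}

    // Foreground load: blocks until the partition is in RAM (e.g. the boot partition).
    void LoadNow(const std::string& name)
    {
        Bytes data = m_Loader(name);              // slow disk read happens outside the lock
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Loaded[name] = std::move(data);
    }

    // Background load: returns immediately; the partition is available once the future is ready.
    std::future<void> LoadAsync(const std::string& name)
    {
        return std::async(std::launch::async, [this, name] { LoadNow(name); });
    }

    // Disposes of a partition that is no longer needed, freeing its RAM.
    void Unload(const std::string& name)
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Loaded.erase(name);
    }

    bool IsLoaded(const std::string& name) const
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        return m_Loaded.count(name) != 0;
    }

private:
    Loader m_Loader;
    mutable std::mutex m_Mutex;
    std::map<std::string, Bytes> m_Loaded;
};
```

At startup, the boot partition would go through LoadNow() and the common partition through LoadAsync(); a level's partition would be requested with LoadAsync() and released with Unload() once the player leaves that level.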
To facilitate development, the data files are accessed directly in Debug mode (the data.fs pack is not used). However, the data files still need to be organized into sub-folders, one per partition.

In Release mode, data.fs is used, so I have a build step that prepares it. I wrote a tool that reads the structure of the data folder and outputs the corresponding pack in a binary format. The pack starts with a metadata header describing the files: the offset of each partition in the pack, and the offset of each file relative to its partition. After the header, the content of each partition is appended.
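To make the pack layout concrete, here is a sketch of such a build tool together with the matching header parser. The exact binary format is an assumption: little-endian 32-bit integers, partitions identified by index, and a size stored next to every offset (the post only mentions offsets). DataFolder, BuildPack and ReadHeader are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;
// Hypothetical tool input: one map (file path -> contents) per partition sub-folder, in order.
using DataFolder = std::vector<std::map<std::string, Bytes>>;

struct FileEntry { uint32_t offset, size; };             // offset relative to its partition
struct PartitionEntry
{
    uint32_t offset = 0, size = 0;                       // offset relative to the start of the pack
    std::map<std::string, FileEntry> files;
};

static void PutU32(Bytes& out, uint32_t v)               // little-endian
{
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

Bytes BuildPack(const DataFolder& folder)
{
    // Pass 1: lay out the partition contents (offsets relative to the body for now)
    // and measure the header size.
    std::vector<PartitionEntry> header(folder.size());
    Bytes body;
    size_t headerSize = 4;                               // partition count
    for (size_t p = 0; p < folder.size(); ++p)
    {
        header[p].offset = static_cast<uint32_t>(body.size());
        headerSize += 12;                                // offset, size, file count
        for (const auto& [path, data] : folder[p])
        {
            headerSize += 4 + path.size() + 8;           // path length, path, offset, size
            header[p].files[path] = { static_cast<uint32_t>(body.size()) - header[p].offset,
                                      static_cast<uint32_t>(data.size()) };
            body.insert(body.end(), data.begin(), data.end());
        }
        header[p].size = static_cast<uint32_t>(body.size()) - header[p].offset;
    }
    // Pass 2: write the header with absolute partition offsets, then append the contents.
    Bytes pack;
    PutU32(pack, static_cast<uint32_t>(header.size()));
    for (const PartitionEntry& part : header)
    {
        PutU32(pack, static_cast<uint32_t>(headerSize + part.offset));
        PutU32(pack, part.size);
        PutU32(pack, static_cast<uint32_t>(part.files.size()));
        for (const auto& [path, file] : part.files)
        {
            PutU32(pack, static_cast<uint32_t>(path.size()));
            pack.insert(pack.end(), path.begin(), path.end());
            PutU32(pack, file.offset);
            PutU32(pack, file.size);
        }
    }
    pack.insert(pack.end(), body.begin(), body.end());
    return pack;
}

// Parses the header back into the file map the engine keeps in RAM.
std::vector<PartitionEntry> ReadHeader(const Bytes& pack)
{
    size_t pos = 0;
    auto u32 = [&]() {
        if (pos + 4 > pack.size()) throw std::runtime_error("truncated data.fs header");
        uint32_t v = 0;
        for (int i = 0; i < 4; ++i) v |= uint32_t(pack[pos + i]) << (8 * i);
        pos += 4;
        return v;
    };
    const uint32_t partitionCount = u32();
    std::vector<PartitionEntry> header(partitionCount);
    for (PartitionEntry& part : header)
    {
        part.offset = u32();
        part.size = u32();
        for (uint32_t fileCount = u32(); fileCount > 0; --fileCount)
        {
            const uint32_t len = u32();
            if (pos + len > pack.size()) throw std::runtime_error("truncated data.fs header");
            std::string path(reinterpret_cast<const char*>(pack.data()) + pos, len);
            pos += len;
            const uint32_t offset = u32();
            part.files[path] = { offset, u32() };
        }
    }
    return header;
}
```

Writing the header in two passes is one way to handle the fact that absolute partition offsets depend on the size of the header itself.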


Reading files from memory


The last step is to actually use this file system :) In Release mode, every time the engine requests file data, the request is transparently redirected to the file data in RAM (part of a previously loaded partition) instead of reading from disk. A file map, loaded at startup from the data.fs header, maps each file path to the file's offset in RAM.
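The path lookup could be sketched as below, assuming each file-map entry records the partition index, the offset inside that partition and the file size. FileMap, FileLocation and the OnPartitionLoaded/OnPartitionUnloaded hooks are illustrative names; a null result is where a real engine would wait for the partition to finish loading or report an error:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical file-map entry, as it might be read from the data.fs header.
struct FileLocation
{
    uint32_t partition;   // index of the partition that contains the file
    uint32_t offset;      // offset of the file inside that partition
    uint32_t size;        // file size in bytes (assumed to be stored in the header)
};

class FileMap
{
public:
    void Add(const std::string& path, const FileLocation& location) { m_Files[path] = location; }

    // Hooks the partition manager would call; 'data' holds the partition bytes in RAM.
    void OnPartitionLoaded(uint32_t partition, const std::vector<uint8_t>* data) { m_Partitions[partition] = data; }
    void OnPartitionUnloaded(uint32_t partition) { m_Partitions.erase(partition); }

    // Resolves a path to its bytes in RAM. Returns nullptr if the path is unknown
    // or if its partition is not loaded.
    const uint8_t* Resolve(const std::string& path, uint32_t& size) const
    {
        auto file = m_Files.find(path);
        if (file == m_Files.end()) return nullptr;
        auto partition = m_Partitions.find(file->second.partition);
        if (partition == m_Partitions.end()) return nullptr;
        size = file->second.size;
        return partition->second->data() + file->second.offset;
    }

private:
    std::map<std::string, FileLocation> m_Files;
    std::map<uint32_t, const std::vector<uint8_t>*> m_Partitions;
};
```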

This integration is completely transparent to engine code, so when libpng or tinyxml calls shoot::File::Read (the Shoot equivalent of fread), the library gets the corresponding data from RAM very quickly.
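One way to give libraries fread-like behavior over RAM is a small memory-backed file, sketched here. MemoryFile is a hypothetical class, not Shoot's actual File implementation; like fread with an element size of 1, its Read() clamps at the end of the data and returns the number of bytes actually copied:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical read-only file over a block of RAM (e.g. a file inside a loaded partition).
class MemoryFile
{
public:
    MemoryFile(const uint8_t* data, size_t size) : m_Data(data), m_Size(size) {}

    // Copies at most 'size' bytes, advances the cursor, returns the number of bytes copied.
    size_t Read(void* dest, size_t size)
    {
        size_t count = std::min(size, m_Size - m_Pos);  // clamp at end of file
        std::memcpy(dest, m_Data + m_Pos, count);
        m_Pos += count;
        return count;
    }

    // Moves the cursor; fails if the position is past the end of the data.
    bool Seek(size_t pos)
    {
        if (pos > m_Size) return false;
        m_Pos = pos;
        return true;
    }

    bool EndOfFile() const { return m_Pos == m_Size; }

private:
    const uint8_t* m_Data;
    size_t m_Size;
    size_t m_Pos = 0;
};
```

Because no disk access is involved, a callback such as libpng's read function ends up doing little more than a memcpy.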

Compression using zlib


Since I have a lot of data in XML format (levels, resource descriptors, entity templates, ..), I decided to optimize things further and compress each partition using zlib. zlib was already in the engine as part of the libpng integration, so I just had to use it. zlib is very easy to use; here is a quick example:
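// dataCompressed must be at least compressBound(originalDataSize) bytes
uLongf compressedDataSize = compressBound(originalDataSize);
int result = compress(dataCompressed, &compressedDataSize, data, originalDataSize);
// result == Z_OK on success; compressedDataSize now holds the actual compressed size

// the uncompressed size must be known up front (e.g. stored next to the partition)
uLongf uncompressedDataSize = originalDataSize; // in: capacity of 'data', out: bytes written
result = uncompress(data, &uncompressedDataSize, dataCompressed, compressedDataSize);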

Monday, April 23, 2012

libPNG integration

I decided to integrate libPNG while I was tracking down a memory leak that seemed to originate from the SOIL image library. It later turned out that the leak was caused by my engine code, not by SOIL, but integrating libPNG was still the right thing to do. Not only did the integration go very smoothly, it also worked on iPhone on the first try.

The main advantage of libPNG over SOIL is that it is designed to let you customize the memory allocation and file I/O functions. Thanks to that, I did not need to modify a single line of libPNG code to make it use my custom allocator and File class. Here is my full PNGLoader class, which returns the texture data needed by glTexImage2D. It is largely inspired by the PNG loading code from Morten Nobel's blog.


/* 

Amine Rehioui
Created: April 22nd 2012

*/

#include "Precompiled.h"

#include "PNGLoader.h"

#include "File.h"

#include "png.h"

namespace shoot
{
    //! PNG malloc
    png_voidp PNGMalloc(png_structp png_ptr, png_size_t size)
    {
        return snew u8[size];
    }

    //! PNG free
    void PNGFree(png_structp png_ptr, png_voidp ptr)
    {
        delete[] (u8*)ptr;
    }

    //! PNG error
    //! libpng requires this callback not to return (it should longjmp or throw),
    //! so the assert here must be fatal
    void PNGError(png_structp png_ptr, png_const_charp msg)
    {
        SHOOT_ASSERT(false, msg);
    }

    //! PNG warning
    void PNGWarning(png_structp png_ptr, png_const_charp msg)
    {
        SHOOT_WARNING(false, msg);
    }

    //! PNG read: libpng pulls its input through the engine's File class
    void PNGRead(png_structp png_ptr, png_bytep data, png_size_t length)
    {
         File* pFile = (File*)png_get_io_ptr(png_ptr);
         pFile->Read(data, length);
    }

    //! loads a texture
    u8* PNGLoader::Load(const char* strPath, s32& width, s32& height, s32& channels)
    {
         File* pFile = File::Create(strPath);
         pFile->Open(File::M_ReadBinary);

         png_structp png_ptr = png_create_read_struct_2(PNG_LIBPNG_VER_STRING, 
         /*error_ptr*/NULL,
         PNGError,
         PNGWarning,
         /*mem_ptr*/NULL, 
         PNGMalloc,
         PNGFree);

         SHOOT_ASSERT(png_ptr, "png_create_read_struct_2 failed");

         png_infop info_ptr = png_create_info_struct(png_ptr);
         SHOOT_ASSERT(info_ptr, "png_create_info_struct failed");

         png_set_read_fn(png_ptr, pFile, PNGRead);

         u32 sig_read = 0; // no signature bytes have been read yet
         png_set_sig_bytes(png_ptr, sig_read);

         // expand palette and low bit-depth images to 8 bits per channel, strip 16-bit channels to 8
         png_read_png(png_ptr, info_ptr, PNG_TRANSFORM_STRIP_16 | PNG_TRANSFORM_PACKING | PNG_TRANSFORM_EXPAND, NULL);

         // use the accessor functions instead of reading the internal png_info fields directly
         width = png_get_image_width(png_ptr, info_ptr);
         height = png_get_image_height(png_ptr, info_ptr);
         switch (png_get_color_type(png_ptr, info_ptr))
         {
         case PNG_COLOR_TYPE_RGBA:
             channels = 4;
         break;

         case PNG_COLOR_TYPE_RGB:
             channels = 3;
         break;

         default: SHOOT_ASSERT(false, "Unsupported PNG format");
         }

         u32 row_bytes = png_get_rowbytes(png_ptr, info_ptr);        
         png_bytepp row_pointers = png_get_rows(png_ptr, info_ptr);

         u8* data = snew u8[row_bytes * height];  
         for (s32 i=0; i<height; ++i)
         {
              memcpy(data+(row_bytes*i), row_pointers[i], row_bytes);
         }

         png_destroy_read_struct(&png_ptr, &info_ptr, NULL);
         pFile->Close(); 
         delete pFile;
         return data;
    }
}

The drawback is that libPNG obviously supports only PNG, unlike SOIL, which can also load JPG, BMP and TGA. But the smooth integration and the fact that it worked on iPhone right away make it well worth it. Also, PNG covers all the needs of my 3D game for now.

Rendering Strategy

I recently optimized the rendering in Shoot, and came up with something faster, yet with a very simple implementation.

My goal was to minimize graphics driver state changes. This typically involves grouping entities by material and geometry, so that entities sharing the same properties are rendered back to back, with the material and vertex buffer set up only once per group (each entity still issues its own draw call). This gives much better performance when the entity count is high. I could push it further using geometry instancing, where the world transforms of several entities are "pushed" along with a single draw call, but for now I'm doing without it.
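To illustrate what the grouping buys, here is a toy model that counts bind operations instead of calling a real graphics API. Entity, BindCounts, RenderNaive and RenderGrouped are hypothetical names; RenderGrouped mirrors the material, then vertex buffer nesting of the render map, while RenderNaive binds whatever the next entity in submission order needs:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Minimal model of an entity: which material and vertex buffer it uses (hypothetical IDs).
struct Entity { uint32_t materialID; uint32_t vertexBufferID; };

struct BindCounts { int materialBinds = 0; int vertexBufferBinds = 0; int draws = 0; };

// Submission order: rebind only when the next entity needs something different.
BindCounts RenderNaive(const std::vector<Entity>& entities)
{
    BindCounts c;
    int64_t boundMaterial = -1, boundVB = -1;            // -1: nothing bound yet
    for (const Entity& e : entities)
    {
        if (e.materialID != boundMaterial) { boundMaterial = e.materialID; ++c.materialBinds; }
        if (e.vertexBufferID != boundVB) { boundVB = e.vertexBufferID; ++c.vertexBufferBinds; }
        ++c.draws;
    }
    return c;
}

// Grouped order: material -> vertex buffer -> entities, like the render map.
BindCounts RenderGrouped(const std::vector<Entity>& entities)
{
    std::map<uint32_t, std::map<uint32_t, int>> groups;  // material -> vertex buffer -> entity count
    for (const Entity& e : entities) ++groups[e.materialID][e.vertexBufferID];

    BindCounts c;
    for (const auto& [material, buffers] : groups)
    {
        ++c.materialBinds;                               // material set up once per material
        for (const auto& [vertexBuffer, count] : buffers)
        {
            ++c.vertexBufferBinds;                       // buffer set up once per (material, buffer) pair
            c.draws += count;                            // still one draw call per entity
        }
    }
    return c;
}
```

With six entities alternating between two material/vertex buffer pairs, the naive order binds a material and a vertex buffer six times each, while the grouped order binds each only twice. The number of draw calls stays at six either way, which is the part geometry instancing would reduce.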

Without further ado, here is roughly how it works:
//! vertex info
struct VertexInfo
{
    VertexBuffer* pVertexBuffer;
    std::vector<Matrix44> aTransforms;
};

typedef std::map< u32, VertexInfo > VertexMap;

//! render info
struct RenderInfo
{
    Material* pMaterial;
    VertexMap m_VertexMap;
};

typedef std::map< u32, RenderInfo > RenderMap;

//! adds an entity to a render map
void EntityRenderer::AddToRenderMap(RenderMap& renderMap, RenderableEntity* pEntity)
{
    Material* pMaterial = pEntity->GetMaterial();
    VertexBuffer* pVertexBuffer = pEntity->GetVertexBuffer();

    // look up (or create) the material group, then the vertex buffer group inside it
    RenderInfo& renderInfo = renderMap[pMaterial->GetID()];
    renderInfo.pMaterial = pMaterial;
    VertexInfo& vertexInfo = renderInfo.m_VertexMap[pVertexBuffer->GetID()];
    vertexInfo.pVertexBuffer = pVertexBuffer;
    vertexInfo.aTransforms.push_back(pEntity->GetTransformationMatrix());
}

//! renders from a render map
void EntityRenderer::Render(RenderMap& renderMap)
{
    for(RenderMap::iterator it = renderMap.begin(); it != renderMap.end(); ++it)
    {
        Material* pMaterial = (*it).second.pMaterial;
        pMaterial->Begin();

        for(VertexMap::iterator it2 = (*it).second.m_VertexMap.begin();
            it2 != (*it).second.m_VertexMap.end();
            ++it2)
        {
            VertexInfo& vertexInfo = (*it2).second;
            vertexInfo.pVertexBuffer->Begin();

            std::vector<Matrix44>& aTransforms = vertexInfo.aTransforms;
            for(u32 i=0; i<aTransforms.size(); ++i)
            {
                GraphicsDriver::Instance()->SetTransform(GraphicsDriver::TS_World, aTransforms[i]);
                vertexInfo.pVertexBuffer->Draw();
            }
            vertexInfo.pVertexBuffer->End();
        }

        pMaterial->End();
    }
}

//! renders the entities (Pseudo code)
void EntityRenderer::Render()
{
    // ... set up the 3D view

    Render(m_SkyBoxMap);

    Render(m_Solid3DMap);

    Render(m_Transparent3DMap);

    // ... set up the 2D view

    Render(m_Solid2DMap);

    Render(m_Transparent2DMap);
}