Managing Sparse Files on Windows

19 Jan 2010 | CPOL | 7 min read
This article helps users understand Sparse files on Windows and how to create and manipulate these files from within their Windows applications.

Introduction

This article shows how a developer can write functions to manipulate Sparse files from within a Windows application. The file system manages Sparse files in a special way: such a file can contain several regions that are unallocated and/or zeroed out. The file system reduces on-disk consumption by persisting only the allocated regions and by tracking the unallocated regions and the Sparse (zeroed-out) regions in metadata.

Background

Sparse files are files that contain large regions that either have no data stored in them or are explicitly zeroed out. In a normal file, zeroed-out regions still consume the same amount of disk space as they would if they held non-zero data. By using a Sparse file, the user tells the Operating System and the file system that this is a special file and that the zeroed-out regions are just empty space. File systems like NTFS typically optimize the way they store a Sparse file and allocate disk space only for the allocated regions within the file.

For each range that needs to be marked as sparse, the user has to ask the file system to zero it through a dedicated IOCTL. NTFS then updates its internal metadata to mark the range as sparse and does not allocate any disk space for the zeroes.

The decision to use Sparse files is specific to an application, so the application needs to be aware of the APIs that allow it to query, manage, and manipulate a Sparse file. An application may also decide to turn a normal file into a Sparse file. This is allowed, but it is then the responsibility of the application to scan the file for regions of zeroes and explicitly mark them as Sparse. A normal file is made Sparse by sending an IOCTL to the file system that sets the file's internal Sparse attribute.
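The scan for regions of zeroes mentioned above can be sketched in portable C++. This is only an illustration under stated assumptions: `FindZeroRuns` and its `minRun` threshold are hypothetical names that do not come from the Windows SDK, and the data is assumed to be already loaded into memory (a real application would read the file in chunks).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of any SDK): returns (offset, length) pairs
// for every run of zero bytes in `data` that is at least `minRun` bytes long.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
FindZeroRuns(const std::vector<unsigned char>& data, std::size_t minRun)
{
    std::vector<std::pair<std::uint64_t, std::uint64_t>> runs;
    std::size_t i = 0;
    while (i < data.size())
    {
        if (data[i] != 0)
        {
            ++i;            // non-zero byte, keep looking
            continue;
        }

        // Found the start of a zero run; advance to its end
        std::size_t begin = i;
        while (i < data.size() && data[i] == 0)
            ++i;

        // Short runs are not worth marking, so report only long ones
        if (i - begin >= minRun)
            runs.emplace_back(begin, i - begin);
    }
    return runs;
}
```

Each reported pair is a candidate range for the zeroing IOCTL. Choosing `minRun` is a design decision: very short runs are unlikely to free any disk space, so a threshold of at least one allocation unit is a reasonable starting point.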

Sparse file related operations in Windows

The first thing that a developer or an application needs to do is to check if the volume on which the Sparse file is going to be created actually supports Sparse files. Essentially, this call is to verify whether the file system supports Sparse files or not. The Win32 API, GetVolumeInformation, is used to get back the various attributes of the volume from which we can use the FILE_SUPPORTS_SPARSE_FILES flag to check for the volume's ability to support Sparse files. The code snippet below can be used to simply assess whether Sparse files are supported or not.

C++
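BOOL SparseFileSupported(LPCTSTR lpVolRootPath)
{
    DWORD dwVolFlags = 0;

    // Only the file system flags are needed, so the name buffers are
    // NULL and their sizes are 0
    if (!GetVolumeInformation(
            lpVolRootPath,   // e.g. _T("C:\\"), with the trailing backslash
            NULL, 
            0, 
            NULL, 
            NULL,
            &dwVolFlags, 
            NULL, 
            0))
        return FALSE;        // query failed, treat as not supported

    return (dwVolFlags & FILE_SUPPORTS_SPARSE_FILES) ? TRUE : FALSE;
}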

Now that we have checked whether the volume can host a Sparse file, the next step is to create a Sparse file on that volume. Creating a Sparse file does not follow a separate path: the user creates the file as he or she normally would (using CreateFile) and, once the file has been created successfully, uses the file system control code FSCTL_SET_SPARSE to mark it as Sparse. If the user does not issue this code, the file remains a normal file. The code snippet below shows how to create a file and mark it as Sparse.

C++
HANDLE CreateSparseFile(LPCTSTR lpSparseFileName)
{
    // Use CreateFile as you would normally: create the file with whatever 
    // access flags and share modes work for you
    DWORD dwTemp;

    HANDLE hSparseFile = CreateFile(lpSparseFileName, 
                                    GENERIC_READ|GENERIC_WRITE, 
                                    FILE_SHARE_READ|FILE_SHARE_WRITE, 
                                    NULL, 
                                    CREATE_ALWAYS, 
                                    FILE_ATTRIBUTE_NORMAL, 
                                    NULL);

    if (hSparseFile == INVALID_HANDLE_VALUE) 
        return hSparseFile;

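    // Mark the file as Sparse; without this it stays a normal file
    if (!DeviceIoControl(hSparseFile, FSCTL_SET_SPARSE, 
                         NULL, 0, NULL, 0, &dwTemp, NULL))
    {
        // Marking failed: close the handle and report failure, keeping
        // the error code from DeviceIoControl for the caller
        DWORD dwError = GetLastError();
        CloseHandle(hSparseFile);
        SetLastError(dwError);
        return INVALID_HANDLE_VALUE;
    }
    return hSparseFile;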
}

In the above code snippet, we create the file as we normally would. Once the file has been created successfully, we set the file as Sparse. If the file is not set as Sparse, the file system explicitly allocates all regions, whether they are zero or not. Only for a Sparse file does the file system optimize by not allocating zeroed-out areas.

Once we have created the Sparse file, the next operation is to mark certain regions as Sparse. Sparse regions are marked using the file system control code FSCTL_SET_ZERO_DATA. This code tells the file system to treat the specified range as zeroes; internally, the range is just recorded in the file system metadata and no disk space is allocated for it. If the file has not been set as Sparse, this command instead physically writes zeroes over the specified range. The code snippet below can be used to mark specific ranges as Sparse using FSCTL_SET_ZERO_DATA.

C++
DWORD SetSparseRange(HANDLE hSparseFile, LONGLONG start, LONGLONG size)
{
    // Specify the starting and the ending address (not the size) of the 
    // sparse zero block
    FILE_ZERO_DATA_INFORMATION fzdi;
    fzdi.FileOffset.QuadPart = start;
    fzdi.BeyondFinalZero.QuadPart = start + size;

    // Mark the range as sparse zero block
    DWORD dwTemp;
    BOOL bStatus = DeviceIoControl(hSparseFile, 
                                   FSCTL_SET_ZERO_DATA, 
                                   &fzdi, 
                                   sizeof(fzdi), 
                                   NULL, 
                                   0, 
                                   &dwTemp, 
                                   NULL);
    if (bStatus)
        return ERROR_SUCCESS;

    return GetLastError(); // return the Win32 error code
}

The FILE_ZERO_DATA_INFORMATION structure (FZDI for short) specifies the range that needs to be zeroed out or marked Sparse. Note that it holds the starting offset and the offset of the first byte beyond the range, so the start and size passed to this function have to be converted before they are loaded into the FZDI structure. If the DeviceIoControl call succeeds, the file system has marked the specified range as a Sparse zero range.
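One practical detail when picking ranges: file systems generally release disk space only in whole allocation units, so the partial units at the edges of a range are likely to be zero-filled on disk instead of being deallocated. The sketch below is a hypothetical, portable helper (`ByteRange` and `AlignInward` are illustrative names, not SDK types) that shrinks a range to the part made of whole units. The unit size is an assumption that the caller must supply; 64 KB is a value often quoted for NTFS with 4 KB clusters, but verify it for your volume.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper type: the half-open byte range [begin, end).
// `end` plays the same role as BeyondFinalZero in the FZDI structure.
struct ByteRange
{
    std::int64_t begin;
    std::int64_t end;
};

// Shrinks [start, start + size) to the largest sub-range whose two ends
// are multiples of `unit`. Returns {0, 0} when the input is invalid or
// when no whole unit fits inside the range.
ByteRange AlignInward(std::int64_t start, std::int64_t size, std::int64_t unit)
{
    const ByteRange empty{0, 0};
    if (start < 0 || size <= 0 || unit <= 0)
        return empty;
    if (size > std::numeric_limits<std::int64_t>::max() - start)
        return empty;                       // start + size would overflow

    const std::int64_t end = start + size;

    // Distance from `start` up to the next unit boundary
    const std::int64_t rem = start % unit;
    const std::int64_t pad = (rem == 0) ? 0 : unit - rem;
    if (pad >= size)
        return empty;                       // boundary lies at or past the end

    const std::int64_t alignedBegin = start + pad;
    const std::int64_t alignedEnd = end - (end % unit);   // round down
    if (alignedEnd <= alignedBegin)
        return empty;

    return ByteRange{alignedBegin, alignedEnd};
}
```

If the result is not empty, `begin` and `end - begin` can be passed as the `start` and `size` arguments of SetSparseRange. Zeroing the unaligned edges as well is still correct; it just may not free any additional space.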

You now know how to query the file system for Sparse file support, create a Sparse file, and mark a region as Sparse. This should get you started with building Sparse file support into your application. A few more helper functions are still worth adding: they help you get the size of a Sparse file (the allocated size as well as the full size), get the allocated ranges in a Sparse file, and determine whether a file is a Sparse file or not.

We will start with finding out whether a file is Sparse or not. This is quite easy: pass the handle of the file to the Win32 API GetFileInformationByHandle(). This function takes two parameters: the handle of the file that we need information about, and a pointer to a BY_HANDLE_FILE_INFORMATION structure (BHFI for short). The BHFI structure contains a dwFileAttributes bit mask that can be tested for the Sparse file attribute, FILE_ATTRIBUTE_SPARSE_FILE.

C++
BOOL IsSparseFile(LPCTSTR lpFileName)
{
    // Open the file for read
    HANDLE hFile = CreateFile(lpFileName,
                              GENERIC_READ,
                              FILE_SHARE_READ, 
                              NULL, 
                              OPEN_EXISTING, 
                              FILE_ATTRIBUTE_NORMAL, 
                              NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return FALSE;

    // Get file information
    BY_HANDLE_FILE_INFORMATION bhfi;
    BOOL bStatus = GetFileInformationByHandle(hFile, &bhfi);
    CloseHandle(hFile);

    // Treat a failed query as "not sparse"
    if (!bStatus)
        return FALSE;

    return (bhfi.dwFileAttributes & FILE_ATTRIBUTE_SPARSE_FILE) ? TRUE : FALSE;
}

Now, let's find out the size of our Sparse file. There are two sizes associated with a Sparse file. One is the file size, which is the sum of both the allocated and the unallocated ranges. The other is the size on disk, which counts only the allocated regions. Both values give us important information about our Sparse file and help us manage the allocations within it better. Remember, for those who have set quotas on their directories, that the quota is based on the full size of the Sparse file and not just the allocated size on disk. Two Win32 API functions are useful for getting our file sizes: GetFileSizeEx() returns the full size and GetCompressedFileSize() returns the size on disk. The code snippet below illustrates a function that takes the name of the file and prints out both the full size and the on-disk size of the file.

C++
BOOL GetSparseFileSize(LPCTSTR lpFileName)
{
    // Retrieves the size of the specified file, in bytes. The size includes 
    // both allocated ranges and sparse ranges.
    HANDLE hFile = CreateFile(lpFileName, 
                              GENERIC_READ,
                              FILE_SHARE_READ, 
                              NULL,
                              OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL,
                              NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return FALSE;

    LARGE_INTEGER liSparseFileSize;
    BOOL bStatus = GetFileSizeEx(hFile, &liSparseFileSize);
    CloseHandle(hFile);
    if (!bStatus)
        return FALSE;

    // Retrieves the file's actual size on disk, in bytes. The size does not 
    // include the sparse ranges.
    ULARGE_INTEGER uliSizeOnDisk;
    uliSizeOnDisk.LowPart = GetCompressedFileSize(lpFileName, 
                                                  &uliSizeOnDisk.HighPart);

    // INVALID_FILE_SIZE can also be a valid low part, so confirm the
    // failure with GetLastError
    if (uliSizeOnDisk.LowPart == INVALID_FILE_SIZE && GetLastError() != NO_ERROR)
        return FALSE;

    // Print the result
    _tprintf(_T("\nFile total size: %I64uKB\nActual size on disk: %I64uKB\n"), 
             (ULONGLONG)liSparseFileSize.QuadPart / 1024, 
             uliSizeOnDisk.QuadPart / 1024);
    return TRUE;
}
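GetCompressedFileSize() hands back the size on disk as two 32-bit halves, which is why the function above fills the low and high parts separately. The arithmetic can be sketched in portable C++ as follows. `CombineSize` and `AllocatedPercent` are hypothetical helper names introduced only for this illustration.

```cpp
#include <cstdint>

// Joins the low and high 32-bit halves of a size into one 64-bit value,
// the same way the LowPart/HighPart members map onto QuadPart.
std::uint64_t CombineSize(std::uint32_t low, std::uint32_t high)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Percentage of the full (logical) file size that is backed by disk space.
// Returns 0 for an empty file to avoid dividing by zero.
double AllocatedPercent(std::uint64_t sizeOnDisk, std::uint64_t logicalSize)
{
    if (logicalSize == 0)
        return 0.0;
    return 100.0 * static_cast<double>(sizeOnDisk) /
           static_cast<double>(logicalSize);
}
```

For example, a file with a full size of 1024 bytes and 256 bytes on disk gives `AllocatedPercent(256, 1024)`, which evaluates to 25. The result can exceed 100 for small files, because the size on disk is rounded up to whole allocation units.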

Finally, some applications will need to find out all the allocated ranges within the Sparse file. This lets an application plan its allocations around the vacant ranges that remain in the Sparse file. We use the file system control code FSCTL_QUERY_ALLOCATED_RANGES to get all the allocated ranges. When DeviceIoControl is issued on the file with this code, the output buffer is filled with an array of FILE_ALLOCATED_RANGE_BUFFER (FARB) structures, one for each allocated range found in the queried part of the Sparse file. If the buffer is too small to hold all of them, the call fails with ERROR_MORE_DATA and the query has to be repeated from where the last returned range ended. The code snippet below shows how the Sparse file can be queried for its allocated ranges until all of them have been obtained.

C++
BOOL GetSparseRanges(LPCTSTR lpFileName)
{
    // Open the file for read
    HANDLE hFile = CreateFile(lpFileName, 
                              GENERIC_READ,
                              FILE_SHARE_READ,
                              NULL,
                              OPEN_EXISTING, 
                              FILE_ATTRIBUTE_NORMAL, 
                              NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return FALSE;

    LARGE_INTEGER liFileSize;
    if (!GetFileSizeEx(hFile, &liFileSize))
    {
        CloseHandle(hFile);
        return FALSE;
    }

    // Range to be examined (the whole file)
    FILE_ALLOCATED_RANGE_BUFFER queryRange;
    queryRange.FileOffset.QuadPart = 0;
    queryRange.Length = liFileSize;

    // Allocated areas info
    FILE_ALLOCATED_RANGE_BUFFER allocRanges[1024];

    DWORD nbytes;
    BOOL bFinished;
    _putts(_T("\nAllocated ranges in the file:"));
    do
    {
        bFinished = DeviceIoControl(hFile, FSCTL_QUERY_ALLOCATED_RANGES, 
            &queryRange, sizeof(queryRange), allocRanges, 
            sizeof(allocRanges), &nbytes, NULL);

        if (!bFinished)
        {
            DWORD dwError = GetLastError();

            // ERROR_MORE_DATA is the only error that is normal
            if (dwError != ERROR_MORE_DATA)
            {
                _tprintf(_T("DeviceIoControl failed w/err 0x%08lx\n"), dwError);
                CloseHandle(hFile);
                return FALSE;
            }
        }

        // Calculate the number of records returned
        DWORD dwAllocRangeCount = nbytes / 
            sizeof(FILE_ALLOCATED_RANGE_BUFFER);

        // Print each allocated range
        for (DWORD i = 0; i < dwAllocRangeCount; i++)
        {
            _tprintf(_T("allocated range: [%I64u] [%I64u]\n"), 
                allocRanges[i].FileOffset.QuadPart, 
                allocRanges[i].Length.QuadPart);
        }

        // Set starting address and size for the next query
        if (!bFinished && dwAllocRangeCount > 0)
        {
            queryRange.FileOffset.QuadPart = 
                allocRanges[dwAllocRangeCount - 1].FileOffset.QuadPart + 
                allocRanges[dwAllocRangeCount - 1].Length.QuadPart;
            
            queryRange.Length.QuadPart = liFileSize.QuadPart - 
                queryRange.FileOffset.QuadPart;
        }
        else if (!bFinished)
        {
            // More data was reported but no record was returned: stop
            // instead of repeating the same query forever
            break;
        }

    } while (!bFinished);

    CloseHandle(hFile);
    return TRUE;
}
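Once the allocated ranges are known, the vacant ranges are simply the gaps between them. The sketch below shows that computation in portable C++. `Range` and `FindHoles` are hypothetical names standing in for the FARB offset and length pair, and the input is assumed to be sorted by offset and non-overlapping, which is how the ranges were collected in the loop above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the offset/length pair of a FARB entry
struct Range
{
    std::uint64_t offset;
    std::uint64_t length;
};

// Returns the gaps between the allocated ranges inside [0, fileSize).
// Assumes `allocated` is sorted by offset and does not overlap.
std::vector<Range> FindHoles(const std::vector<Range>& allocated,
                             std::uint64_t fileSize)
{
    std::vector<Range> holes;
    std::uint64_t cursor = 0;   // first byte not yet accounted for

    for (const Range& r : allocated)
    {
        if (r.offset >= fileSize)
            break;              // nothing beyond the end of the file counts

        // Anything between the cursor and this range is a hole
        if (r.offset > cursor)
            holes.push_back({cursor, r.offset - cursor});

        const std::uint64_t end = r.offset + r.length;
        if (end > cursor)
            cursor = end;
    }

    // Trailing hole between the last allocated range and the end of file
    if (cursor < fileSize)
        holes.push_back({cursor, fileSize - cursor});

    return holes;
}
```

For a 131072-byte file with allocated ranges at offsets 0 and 65536, each 4096 bytes long, this reports two holes of 61440 bytes starting at offsets 4096 and 69632.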

End note

Sparse files have many interesting uses, including databases, snapshots, file-based volumes, and persistent storage of sparse matrices for math applications. The above set of functions can be used to integrate Sparse file support and management into your application. I have found them very useful, especially while fixing defects and testing a third-party file system for Sparse file support on Windows.

Reference

MS SDK documentation and samples.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Engineer Hewlett Packard
India
