15,039,359 members
Articles / General Programming / Algorithms
Article
Posted 10 Jul 2010

52K views
50 bookmarked

# Iterative Implementation of Recursively Enumerating Files and Sub Folders

Rate me:
Yet another implementation to enumerate files

The demo application, QuickOpenFiles.

## Introduction

Microsoft provided a sample code for Listing Files in a Directory, but that example code cannot be used to list files in the sub-directory, so how can we do that?

Enumerating/searching files and sub folders is in fact a rather basic skill, people have posted many articles on that, you can see the other examples here. However, many of those implementations enumerate files in a recursive way. As a beginner in programming, I just wonder how to do that non-recursively.

## Recursive Implementation

In many cases, recursive implementation can be really elegant, if you already know how to do this recursively, then you may just simply skip this part.

To search files in Windows, we can use the APIs `FindFirstFile`/`FindNextFile` (or `SHGetDataFromIDList`, or the `FileSystem `library in boost), and a typical implementation is to do this recursively:

C++
```#include "strsafe.h"

bool EnumerateFolder(LPCTSTR lpcszFolder, int nLevel = 0)
{
WIN32_FIND_DATA ffd;
TCHAR szDir[MAX_PATH];
HANDLE hFind = INVALID_HANDLE_VALUE;

StringCchCopy(szDir, MAX_PATH, lpcszFolder);
StringCchCat(szDir, MAX_PATH, TEXT("\\*"));

// Find the first file in the directory.

hFind = FindFirstFile(szDir, &ffd);

if (INVALID_HANDLE_VALUE == hFind)
{
return false;
}

// List all the files in the directory with some info about them.

TCHAR szOutLine[MAX_PATH] = {0};
for (int ii = 0; ii < nLevel; ++ii)
szOutLine[ii] = _T('\t');

do
{
if (ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
{
if ( _T('.') != ffd.cFileName[0] )
{
_tprintf(TEXT("%s%s   <DIR>\n"),
szOutLine, ffd.cFileName);
StringCchCopy(szDir+_tcslen(lpcszFolder)+1,
MAX_PATH, ffd.cFileName);

EnumerateFolder(szDir, nLevel+1);    // Here's the recursive call!
}
}
else
{
_tprintf(TEXT("%s%s\n"), szOutLine, ffd.cFileName);
}
}
while (FindNextFile(hFind, &ffd) != 0);

FindClose(hFind);
return true;
}```

## The Issues of Recursion

The readability of recursive implementation is usually good, but I started to think about the potential Stack Overflow problem.

In Windows, each thread is granted a default stack size of 1 MB, and that might be big enough in most cases.

In the above function `EnumerateFolder()`, it will take around 580 bytes for the local variables (ignore the `szOutLine`). That means, in theory, we can enumerate up to 1808 levels of subdirectories.

However, since the buffer size (`MAX_PATH`) of the path is limited to 260 characters, it is not possible to run into that amount of levels.

In fact, Microsoft does provide the mechanism to break that limitation.

In the ANSI version of this function, the name is limited to `MAX_PATH` characters. To extend this limit to 32,767 widecharacters, call the Unicode version of the function and prepend "\\?\" to the path.

Although some people said that 260 characters is good enough, and the stack might not that easily overflow, I guess avoiding recursion when possible can't be a bad idea; am I correct? This implementation is more like a practice for beginners like me, so let's just have fun coding it.

## Iterative Implementation

In general, there are two ways to do a searching iteratively:

Since Raymond Chen explained that breadth-first searching is better for file system tree walking, I decided to post the BFS implementation snippet here.

For those who still want to know how to implement this using DFS, you can dig into the demo code in this page and open the source file FileEnumerator.h, then you will see that I have prepared some `#ifdef `switch there, you can simply try comment out/uncomment this macros:

C++
```//#define FILEENUMERATOR_RECURSION
//#define FILEENUMERATOR_DOCOUNTING

#ifndef FILEENUMERATOR_RECURSION
#define FILEENUMERATOR_BFS
#endif```

Now let's write a BFS implementation. First, we need a class that encapsulates the basic operations:

C++
```class CFileFinder
{
public:
CFileFinder(LPCTSTR lpcszInitDir, FindFileData& fileInfo)
: m_fileInfo(fileInfo)
{
Init(lpcszInitDir);
}

public:
inline bool FindFirst()
{
return EnumCurDirFirst();
}

inline bool FindCurDirNext()
{
bool bRet = ::FindNextFile(m_hFind, &m_fileInfo) != FALSE;
if ( bRet )
{
m_szPathBuffer.resize(m_nFolderLen);
m_szPathBuffer += m_fileInfo.cFileName;
}
else
{
::FindClose(m_hFind);
m_hFind = INVALID_HANDLE_VALUE;
}
return bRet;
}

virtual bool Finish() const
{ return INVALID_HANDLE_VALUE == m_hFind; }

inline LPCTSTR GetPath() const
{return STRBUFFER(m_szPathBuffer) + EXTEND_FILE_PATH_PREFIX_LEN;}

inline const FindFileData& GetFileInfo() const	{ return m_fileInfo; }

inline bool IsDirectory() const
{ return (m_fileInfo.dwFileAttributes &
FILE_ATTRIBUTE_DIRECTORY) == FILE_ATTRIBUTE_DIRECTORY; }
inline bool IsDot() const
{ return (m_fileInfo.cFileName[0] == '.') &&
((m_fileInfo.cFileName[1] == '.') ||
(m_fileInfo.cFileName[1] == '\0')); }
protected:
virtual bool EnumCurDirFirst()
{
m_szPathBuffer.resize(m_nFolderLen+2);
m_szPathBuffer[m_nFolderLen++]	= _T('\\');
m_szPathBuffer[m_nFolderLen]	= _T('*');

HANDLE hFind = ::FindFirstFile(STRBUFFER(m_szPathBuffer), &m_fileInfo);
bool bRet = hFind != INVALID_HANDLE_VALUE;
if ( bRet )
{
m_hFind			= hFind;
m_szPathBuffer.resize(m_nFolderLen);
m_szPathBuffer += m_fileInfo.cFileName;
}
return bRet;
}

void Init(LPCTSTR lpcszInitDir)
{
m_nFolderLen = _tcslen(lpcszInitDir);
m_szPathBuffer = lpcszInitDir;

if ( m_szPathBuffer[m_nFolderLen-1] == _T('\\') )
{
m_szPathBuffer.erase(m_nFolderLen-1);
--m_nFolderLen;
}
}

protected:
FindFileData&	m_fileInfo;
tstring			m_szPathBuffer;
UINT			m_nFolderLen;
HANDLE			m_hFind;
};```

Then we need to put the pending sub-directories into a queue:

C++
```#include <boost/shared_ptr.hpp>
typedef boost::shared_ptr<CFileFinder>	FileFindPtr;

typedef std::queue<FileFindPtr>				FileFindQueue;

bool CFileEnumeratorBase::EnumerateBFS
( LPCTSTR lpcszInitDir, FindFileData& findFileData, HANDLE hStopEvent /*= NULL*/ )
{
FileFindPtr finder = NULL;
try
{
finder = new CFileFinder(lpcszInitDir, findFileData);
}
{
CFE_ASSERT(0);
return false;
}

bool bRet = true;
FileFindQueue finderQueue;

if ( !finder->FindFirst() )
{
m_dwLastError = ::GetLastError();
OnError(finder->GetPath());
return false;
}
else
{
while( !finder->Finish() && !IsStopEventSignaled() )
{
const FindFileData& fileInfo = finder->GetFileInfo();

if( finder->IsDirectory() )
{
if ( !finder->IsDot() )
{
if ( CheckUseDir
(finder->GetPath(), fileInfo) )
{
HandleDir(finder->GetPath(),
fileInfo);

FileFindPtr newFinder = NULL;
try
{
newFinder =
new CFileFinder
(finder->GetPath(),
findFileData);
finderQueue.push
(newFinder);
}
{
CFE_ASSERT(0);
}
}
}
}
else
{
if ( CheckUseFile(finder->GetPath(), fileInfo) )
{
HandleFile(finder->GetPath(), fileInfo);
}
}
if ( !finder->FindCurDirNext() )
{
FinishedDir( finder->GetPath() );
if ( finderQueue.empty() )
break;
else
{
while ( !IsStopEventSignaled() )
{
FileFindPtr nextFinder =
finderQueue.front();
finderQueue.pop();

finder = nextFinder;

if ( !finder->FindFirst() )
{
m_dwLastError =
::GetLastError();
if ( !OnError
(finder->GetPath()) )
{
return false;
}
}
else
break;
}
}
}
}
}
return bRet;
}```

## Using the Code

The initial design of classes was inspired by Andreas Saurwein Franci Gonçalves' article, please read his article first! Then you need to implement your own class derived from either `CFileEnumeratorBase` or `CFilteredFileEnumerator`.

`CFileEnumeratorBase` is provided for basic file enumeration, and `CFilteredFileEnumerator` is provided for pattern filtering enumeration.

Following is the example class derived from `CFilteredFileEnumerator`:

C++
```class CFileInfoEnumerator : public CFilteredFileEnumerator
{
public:
CFileInfoEnumerator();
virtual ~CFileInfoEnumerator();
public:
enum FileInfoType
{
FIT_INVALID        = -1,
FIT_FIRST,
FIT_FILENAME    = FIT_FIRST,
FIT_FILEPATH,
FIT_FILEEXT,
FIT_MODIFIED,

FIT_SIZE,
};

struct FileInfo
{
std::string sFileInfo[FIT_SIZE];
};
public:
int                GetFileCount() const    { return m_arrFileInfo.size(); }

const FileInfo*    GetFileInfo(int nIndex);
protected:
virtual void    HandleFile(LPCTSTR lpcszPath, const FindFileData& ffd);
protected:
typedef std::vector<fileinfo>    FileInfoArray;
FileInfoArray    m_arrFileInfo;
};

#define DEFAULT_SORT_ASCENDING true

CFileInfoEnumerator::CFileInfoEnumerator()
: m_nSortInfoType(FIT_INVALID)
, m_bSortAscending(DEFAULT_SORT_ASCENDING)
{

}

CFileInfoEnumerator::~CFileInfoEnumerator()
{

}

void CFileInfoEnumerator::HandleFile( LPCTSTR lpcszPath, const FindFileData& ffd )
{
FileInfo fileInfo;
fileInfo.sFileInfo[FIT_FILENAME] = ffd.cFileName;

fileInfo.sFileInfo[FIT_FILEPATH] = lpcszPath;

std::string::size_type nDotPos = fileInfo.sFileInfo[FIT_FILENAME].rfind(_T('.'));
if ( nDotPos >= 0 )
fileInfo.sFileInfo[FIT_FILEEXT] =
fileInfo.sFileInfo[FIT_FILENAME].c_str()+nDotPos+1;

SYSTEMTIME st;
#define FTIME_BUFFER_SIZE    255

TCHAR szModified[FTIME_BUFFER_SIZE+FTIME_BUFFER_SIZE],
szLocalDate[FTIME_BUFFER_SIZE], szLocalTime[FTIME_BUFFER_SIZE];
FILETIME ft = ffd.ftLastWriteTime;

FileTimeToLocalFileTime( &ft, &ft );
FileTimeToSystemTime( &ft, &st );
GetDateFormat( LOCALE_USER_DEFAULT, DATE_SHORTDATE,
&st, NULL, szLocalDate, FTIME_BUFFER_SIZE );
GetTimeFormat( LOCALE_USER_DEFAULT, TIME_NOSECONDS,
&st, NULL, szLocalTime, FTIME_BUFFER_SIZE );

_sntprintf(szModified, FTIME_BUFFER_SIZE+FTIME_BUFFER_SIZE,
_T("%s %s"), szLocalDate, szLocalTime);
fileInfo.sFileInfo[FIT_MODIFIED] = szModified;

m_arrFileInfo.push_back(fileInfo);
}

const CFileInfoEnumerator::FileInfo* CFileInfoEnumerator::GetFileInfo( int nIndex )
{
if ( 0 <= nIndex && nIndex < (int)m_arrFileInfo.size() )
{
return &m_arrFileInfo.at(nIndex);
}
return NULL;
}```

And to use it:

C++
```CFileInfoEnumerator enumerator;
TCHAR chDir[MAX_PATH];
GetCurrentDirectory(MAX_PATH, chDir);

enumerator.SetFilterPatterns(_T("*.exe;*.dll;*.h;*.c;*.cpp;"));
enumerator.Enumerate(chDir);```

I wrote a console application to demonstrate the use of the source code.

## History

• 2010-7-10: Initial release
• 2010-11-25: Provided the DFS implementation
• 2010-12-29: Skip the junction (reparse point) to avoid infinite loop

## Share

 Software Developer China
No Biography provided

 First Prev Next
 My vote of 4 Rozis30-Dec-10 12:10 Rozis 30-Dec-10 12:10
 nice Pranay Rana30-Dec-10 7:57 Pranay Rana 30-Dec-10 7:57
 have 5 For any question : http://pranayamr.blogspot.com/ vote my article : Learn SQL to LINQ ( Visual Representation ) Calling WCF Services using jQuery
 system("dir /b /s /a-d c:\\*.* >c:\\allfiles_in_c.txt"); Member 448140729-Nov-10 21:15 Member 4481407 29-Nov-10 21:15
 Becareful Junction and Invalid entry! Member 448140729-Nov-10 21:08 Member 4481407 29-Nov-10 21:08
 Re: Becareful Junction and Invalid entry! yonken30-Nov-10 0:58 yonken 30-Nov-10 0:58
 Re: Becareful Junction and Invalid entry! Member 448140730-Nov-10 20:31 Member 4481407 30-Nov-10 20:31