I've got a frustrating problem, I hope folks here can help out. I have an Excel file (.xlsx format) that I have to read with a C++ application. I have no control over the Excel file itself. This used to work when the Excel file was done with Office 2003 (.xls) but the producer of the file is now using Office 2007 (.xlsx) and this problem cropped up.
The interesting columns in the data file are the first two, a 'part number' and a 'support band'. A sample of the first few records is below.
<small>201493-B21 699<br />
201723R-B22 5J0<br />
201723R-B22 6FG<br />
218231R-B22 562<br />
218231R-B22 699</small>
When I read this, I no longer get the second field as a string, I get "699.0" as if it was interpreted as a number. The 2nd record does not return "5J0" but a blank field "" (presumably because it is an illegal number). I've tried lots of things but it always works out this way.
I access the file with
sDsn.Format("ODBC;DRIVER={%s};DSN=\"\";DBQ=%s;ReadOnly=True", sDriver, FileHelper::GetFullPathspec(filename));
for (ii=0; ii<dim(ExcelTabs); ii++)
{
try
{
database.Open(NULL, false, false, sDsn);
CRecordset recset(&database);
sSql.Format("SELECT * FROM [%s$]", ExcelTabs[ii]);
recset.Open(CRecordset::forwardOnly, sSql, CRecordset::readOnly);
recset.GetFieldValue((short)0, thePart);
recset.GetFieldValue((short)1, theBand);
...
... yada yada yada
and I've tried using the CBVariant records but the string fields there are not right either.
So, without changing the content of the .xlsx file, which comes from an internal web site, how can I force the interpretation of the 2nd field to be text or characters intead of numeric? Is there something I can do with the SQL statement that fetches the data? Or to the database.Open()? I'm at a loss here. Google / Bing / Yahoo have been some help, particularly with using the newer xlsx ODBC driver, but no other clues.
Thanks in advance,
------------------------------------------------
Edit 10/14
------------------------------------------------
After some investigation and experimentation, it appears that the xlsx ODBC driver ("Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)") is actually honoring the cell formatting for the column in question. I have asked the person who creates the file to format the cells as 'text' in their master copy and that should correct the problem.
Just FYI, the reason for struggling to read the Excel data as it comes from the supplier rather than just save it in a differnent format or other modifications, like using CSV files and reading the data however I see fit, is that these files are part of a very large (over 100 input files, some text, some Excel) and complex process that is automated. If I were to add changing data formats to the 'setup step' of the process, that would introduce places where mistakes could be made. I am the architect / implementor of the process an applications but an intern or other such 'administrative assistant' type person would be doing the work and complicating their instructions / steps is not desirible.
So, I'm going to stop chasing this for a while and see if the producer can make the format changes.
Thanks.