Introduction
This project is not earth-shattering or revolutionary. It is simply a way for
me to come to terms with ASP.NET and C# development and, I hope, to pass some
knowledge and ideas on to others.
This project began with the need to create an intranet portal that contained,
among other things, the local weather forecast. The forecast information was
to look just like a local TV station's web site. Since I could not use their
site directly, and paying for a service to provide the information was not an
option, I decided that screen scraping the local TV site would be a good
solution. It also seemed like a good introduction to the .NET world, so I
used ASP.NET with C# as the coding language.
WARNING
It should go without saying that screen scraping is not the best solution in
many cases. You are completely at the mercy of the third-party site: if the
layout changes, you must rework your solution. It may also raise legal
questions about your right to use someone else's work.
Details
The first step in the design was to call up the providing site,
http://www.pittsburgh.com/partners/wpxi/weather/ in this case, and look
at the HTML to find the information needed. In my case I was able to search for
the heading
<B>Current Conditions for Pittsburgh</B>
The weather information was found in two tables so it was just a matter of
searching the HTML text and extracting the tables. I could then pass this as
the innerHTML content for a table on my webpage.
<TABLE id="Table1" width="100%" border="0">
<TR>
<TD align="middle" colSpan="2"><STRONG>Local Weather Forecast</STRONG></TD>
</TR>
<TR>
<TD><%=GetWeather()%></TD>
<TD><%=GetForecast()%></TD>
</TR>
<TR>
<TD align="middle" colSpan="2">information provided by WPXI</TD>
</TR>
</TABLE>
Acquire the HTML
Using the .NET library it was easy to acquire the HTML from the site. As can be
seen, we just need to obtain a WebResponse object and feed its ResponseStream
into an instance of StreamReader. From there I read it line by line to remove
the empty lines and assign the result to a string. I'm using the
StringBuilder.Append method instead of appending to the string, based on the
recommendation of Charles Petzold in Programming Microsoft Windows with C#,
where he demonstrates that using StringBuilder is 1000x faster than appending
to a string.
// Request the page and wrap the response stream in a reader.
WebRequest req = WebRequest.Create(strURL);
StreamReader stream = new StreamReader(req.GetResponse().GetResponseStream());
System.Text.StringBuilder sb = new System.Text.StringBuilder();
string strLine;
// Copy every non-empty line into the buffer.
while( (strLine = stream.ReadLine()) != null )
{
    if(strLine.Length > 0 )
        sb.Append(strLine);
}
stream.Close();
m_strSite = sb.ToString();
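As a variation on the listing above, the following sketch wraps the response and the reader in using blocks so that both are closed even if reading fails. The class and method names (PageFetcher, ReadNonEmptyLines, Fetch) are made up for illustration and are not part of the original project. Note also that recent .NET versions mark WebRequest as obsolete in favor of HttpClient, so treat this as a sketch for the framework the article targets.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, not part of the original project.
public static class PageFetcher
{
    // Reads every non-empty line from the reader and joins them into one
    // string, using StringBuilder rather than repeated string concatenation.
    public static string ReadNonEmptyLines(TextReader reader)
    {
        StringBuilder sb = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length > 0)
                sb.Append(line);
        }
        return sb.ToString();
    }

    // Downloads the page at 'url'. The using blocks dispose the response
    // and the reader whether or not an exception is thrown while reading.
    public static string Fetch(string url)
    {
        WebRequest req = WebRequest.Create(url);
        using (WebResponse resp = req.GetResponse())
        using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
        {
            return ReadNonEmptyLines(reader);
        }
    }
}
```

Splitting the line filtering from the download also makes it possible to try the filtering on a StringReader without touching the network.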
Extract the tables
After the text has been acquired it is simply a matter of extracting and
returning the substring. To fix the relative path of the images I run the
substring through another method to insert the absolute path before returning
it.
private string FindWeatherTable()
{
    int nIndexStart = 0;
    int nIndexEnd = 0;
    int nIndex = 0;
    try
    {
        // Locate the heading that precedes the weather tables.
        if( (nIndex = Find("Current Conditions for Pittsburgh", 0)) > 0 )
        {
            // The block starts at the first <TABLE after the heading.
            nIndexStart = Find("<TABLE", nIndex);
            if(nIndexStart > 0 )
            {
                // Skip the first closing tag after the heading...
                nIndex = Find("</TABLE>", nIndex);
                if(nIndex > 0 )
                {
                    // ...and end the block at the second one.
                    nIndexEnd = Find("</TABLE>", nIndex+1);
                    if(nIndexEnd > 0 )
                        nIndexEnd += 8;   // length of "</TABLE>"
                }
            }
        }
        return CorrectImgPath(m_strSite.Substring(nIndexStart,
            nIndexEnd - nIndexStart));
    }
    catch(Exception e)
    {
        return e.Message;
    }
}
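private string CorrectImgPath(string s)
{
    int nIndex = 0;
    try
    {
        string strInsert = "http://www.pittsburgh.com";
        // Start at the beginning of the string so that short input does not
        // throw and an early "/images/" is not skipped.
        while( (nIndex = s.IndexOf("/images/", nIndex)) >= 0 )
        {
            s = s.Insert(nIndex, strInsert);
            // Move past the inserted host name and the leading slash so the
            // same path is not matched again.
            nIndex += strInsert.Length + 1;
        }
        return s;
    }
    catch(Exception e)
    {
        return e.Message;
    }
}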
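The listings above call a Find helper that is not shown in this article. The sketch below assumes it is a thin wrapper around String.IndexOf and uses it in a simplified extractor that returns only the first table after a marker, which is not the same as the two-table logic in FindWeatherTable and does not handle nested tables. The names TableScraper and ExtractTableAfter are hypothetical. It returns an empty string when the marker or table is missing, which is what you would see if the third-party layout changed.

```csharp
using System;

// Hypothetical helper class, not part of the original project.
public static class TableScraper
{
    // Assumed stand-in for the article's Find helper: index of 'text' at or
    // after 'start', ignoring case, or -1 when it is not present.
    public static int Find(string html, string text, int start)
    {
        return html.IndexOf(text, start, StringComparison.OrdinalIgnoreCase);
    }

    // Returns the first <TABLE>...</TABLE> block that follows 'marker',
    // or an empty string if the marker or the table cannot be found.
    public static string ExtractTableAfter(string html, string marker)
    {
        const string closeTag = "</TABLE>";

        int markerPos = Find(html, marker, 0);
        if (markerPos < 0) return string.Empty;

        int start = Find(html, "<TABLE", markerPos);
        if (start < 0) return string.Empty;

        int end = Find(html, closeTag, start);
        if (end < 0) return string.Empty;

        // Include the closing tag itself in the returned block.
        return html.Substring(start, end + closeTag.Length - start);
    }
}
```

A caller can then test the result's Length and show a short "weather unavailable" message instead of an exception message when nothing was extracted.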
Conclusion
The complete site used ADO.NET to connect to a SQL Server database and
provide viewers with schedule and appointment information as well
as corporate information. Viewers could also add events to their
calendar. For simplicity I chose not to include these features in this sample.
I just wanted to share a beginner's C# and ASP.NET exploration to give others
some ideas.