Introduction
This article introduces a program that extracts message headers from newsgroups and exports them to a text file or a Microsoft Excel file.
Background
For a long time, I looked for a tool that could extract information from a newsgroup server to help analyze which topics people focus on, who is interested in what content, where posters are from, who the most active members of a newsgroup are, and so on. Unfortunately, I could not find a free one on the Internet; the usable tools I found all required payment. So I decided to write the program myself.
NNTP Wrapper Class
There are already some articles on The Code Project describing NNTP commands. The one I referred to most is the article written by TY Lee. I extended a few of its classes to make them better suited to driving a user interface.
Basically, the classes follow the NNTP definition and use a network stream to communicate with the newsgroup server. The NewsgroupClient class wraps the methods that send commands to the server and retrieve the corresponding data. For example, the ListGroup() method lists the newsgroups available on the server, the SelectGroup() method selects a group as the active one and retrieves its article range, and the DownloadHeaders() method gets the article headers from the currently active group.
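To make the command and reply exchange more concrete, here is a minimal sketch of how a GROUP reply and a dot-terminated multi-line reply (such as the one returned by LIST) could be parsed. It is not the code of NewsgroupClient: the names GroupInfo, NntpSketch, ParseGroupResponse, and ReadMultiLine are hypothetical, and only the reply formats are taken from the NNTP specification.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper type for illustration; not one of the article's classes.
public sealed class GroupInfo
{
    public string Name;
    public long Count;
    public long LowID;
    public long HighID;
}

public static class NntpSketch
{
    // Parses a GROUP reply such as "211 1234 3000234 3002322 misc.test".
    // The fields are: status code, estimated article count, low ID, high ID, group name.
    public static GroupInfo ParseGroupResponse(string line)
    {
        string[] parts = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 5 || parts[0] != "211")
            throw new FormatException("Unexpected GROUP reply: " + line);

        GroupInfo info = new GroupInfo();
        info.Count = long.Parse(parts[1]);
        info.LowID = long.Parse(parts[2]);
        info.HighID = long.Parse(parts[3]);
        info.Name = parts[4];
        return info;
    }

    // Reads the body of a multi-line reply until the terminating "." line,
    // undoing dot-stuffing for lines that start with "..".
    public static List<string> ReadMultiLine(TextReader reader)
    {
        List<string> lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line == ".") break;
            if (line.StartsWith("..", StringComparison.Ordinal)) line = line.Substring(1);
            lines.Add(line);
        }
        return lines;
    }
}
```

In a real client the TextReader would typically be a StreamReader wrapped around the connection's NetworkStream; a StringReader is enough to try the parsing in isolation.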
The connections are made on the UI thread, so Application.DoEvents() is called at several places in the code to make sure that the user can still operate the application while data is being transferred.
When each article header is retrieved, an event is fired so that the information can be displayed immediately.
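The per-header notification can be sketched with a standard .NET event. The names HeaderEventArgs, HeaderSource, and HeaderReceived below are assumptions made for illustration; the event and argument types in the actual project may differ.

```csharp
using System;

// Hypothetical event payload; the article's real argument type may carry other fields.
public sealed class HeaderEventArgs : EventArgs
{
    public readonly string Subject;
    public readonly string From;

    public HeaderEventArgs(string subject, string from)
    {
        Subject = subject;
        From = from;
    }
}

public sealed class HeaderSource
{
    // Raised once per retrieved header so a UI can append a row right away.
    public event EventHandler<HeaderEventArgs> HeaderReceived;

    // Stand-in for the download loop: raises the event for each { subject, from } pair
    // and returns the number of headers processed.
    public int Download(string[][] rawHeaders)
    {
        int count = 0;
        foreach (string[] h in rawHeaders)
        {
            // Copy the delegate first so an unsubscribe during the loop cannot cause a null call.
            EventHandler<HeaderEventArgs> handler = HeaderReceived;
            if (handler != null) handler(this, new HeaderEventArgs(h[0], h[1]));
            count++;
        }
        return count;
    }
}
```

A form would subscribe to HeaderReceived once and add a grid row in the handler, which is what lets headers appear while the download is still running.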
User Interface
The user interface of this program is divided into two parts: the newsgroups tree on the left side of the form, and the article list on the right. I use SpringSys OrchidGrid to build the main parts of the UI because it can work in tree mode and supports exporting data to Excel.
The left tree view has two levels: the top level holds the server nodes, and the second level displays the newsgroups on each server.
After a newsgroup server is added, a NewsgroupClient object is created and stored in the server node. At the same time, all the newsgroups on the server are listed by calling the ListGroup() method. One NewsgroupClient corresponds to one newsgroup server and is responsible for all later network communication with it.
The code below retrieves the Newsgroups from a server and adds them to the server node:
private bool _updating = false;
// Lists the newsgroups on the server behind the given node and inserts them as rows below it.
private void UpdateGroups(GridTreeNode node)
{
    NewsgroupClient ngClient = node.Tag as NewsgroupClient;
    if (ngClient == null) return;
    _updating = true;
    this.StartProgress();
    try
    {
        if (!ngClient.Connected)
        {
            string server = node.Data as string;
            this.lblMsg.Text = string.Format("Connecting to server {0} ......", server);
            if (!ngClient.Connect(server))
            {
                MessageBox.Show("Error connecting to server " + server);
                return;
            }
        }
        node.ClearChildren();
        int index = node.Row.Index + 1;
        this.lblMsg.Text = "Retrieving groups from server......";
        string[] newsgroups = ngClient.ListGroup();
        // Suspend redrawing while the rows are inserted.
        ogServer.Redraw = false;
        for (int i = 0; i < newsgroups.Length; i++)
        {
            Row row = this.ogServer.Rows.Insert(index + i, newsgroups[i]);
            row.TreeImage = this.imageList1.Images[1];
        }
        ogServer.Redraw = true;
        this.lblMsg.Text = "Completed!";
        this.ogServer.AutoSizeColWidth();
    }
    finally
    {
        // Always reset the busy state, even if a network call throws;
        // otherwise _updating would stay true and block later downloads.
        this.StopProgress();
        _updating = false;
    }
}
The tree data is persisted to a text file when the application is closed. The next time the program runs, the data is restored from that file. This is done by the PersistGroups() and LoadGroups() methods.
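As an illustration of this kind of persistence, the sketch below saves and restores a two-level server and group structure. The file layout (one server per unindented line, its groups on tab-indented lines) and the names GroupStore, Persist, and Load are assumptions; the format actually used by PersistGroups() and LoadGroups() is not described here and may be different.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; the layout it reads and writes is an assumption for this sketch.
public static class GroupStore
{
    // Writes each server on an unindented line, followed by its groups on tab-indented lines.
    public static void Persist(TextWriter writer, IDictionary<string, List<string>> servers)
    {
        foreach (KeyValuePair<string, List<string>> server in servers)
        {
            writer.WriteLine(server.Key);
            foreach (string group in server.Value)
                writer.WriteLine("\t" + group);
        }
    }

    // Rebuilds the server-to-groups map from the layout written by Persist.
    public static Dictionary<string, List<string>> Load(TextReader reader)
    {
        Dictionary<string, List<string>> servers = new Dictionary<string, List<string>>();
        List<string> current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;
            if (line[0] == '\t')
            {
                // A group line belongs to the most recently read server.
                if (current != null) current.Add(line.Substring(1));
            }
            else
            {
                current = new List<string>();
                servers[line] = current;
            }
        }
        return servers;
    }
}
```

Passing a StreamWriter or StreamReader opened on the settings file gives the save-on-close and load-on-start behavior described above.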
By checking the nodes, we can specify which groups to explore before downloading the article headers. Here we use the check box node feature of the grid. The line below makes sure that checking a server node also checks all the group nodes belonging to that server.
this.ogServer.Tree.CheckAction = TreeCheckAction.Children;
The following code browses each row of the tree grid and downloads only for the nodes that are checked.
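private bool _downloading = false;
// Downloads headers for every checked group that has not been visited yet.
private void btnDownload_Click(object sender, EventArgs e)
{
    // Ignore the click while another network operation is running.
    if (_downloading || _updating) return;
    try
    {
        _downloading = true;
        this.StartProgress();
        foreach (Row row in ogServer.Rows)
        {
            if (row.IsNode) continue;            // skip node rows
            if (row.UserData != null) continue;  // skip groups already marked "Visited"
            if (row.TreeChecked == CheckState.Checked)
                DownLoadHeaders(row);
        }
    }
    catch (Exception ex)
    {
        // Report the failure instead of silently swallowing it.
        this.lblMsg.Text = "Download failed: " + ex.Message;
    }
    finally
    {
        this.StopProgress();
        _downloading = false;
        if (_currentClient != null)
            _currentClient._forceStop = false;
    }
}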
NewsgroupClient _currentClient;
// Downloads all article headers of the group shown in the given row.
private void DownLoadHeaders(Row row)
{
    GridTreeNode node = row.Node;
    NewsgroupClient ngClient = node.Tag as NewsgroupClient;
    if (ngClient == null) return;
    // Remember the active client so that a stop request can reach it.
    _currentClient = ngClient;
    _currentClient._forceStop = false;
    if (!ngClient.Connected)
    {
        string server = node.Data as string;
        this.lblMsg.Text = string.Format("Connecting to server {0} ......", server);
        if (!ngClient.Connect(server))
        {
            // The _updating flag belongs to UpdateGroups() and is not touched here.
            this.StopProgress();
            MessageBox.Show("Error connecting to server " + server);
            return;
        }
    }
    string group = row[0] as string;
    if (group == null) return;
    this.lblMsg.Text = string.Format("Downloading from group {0} ......", group);
    try
    {
        ngClient.SelectGroup(group);
        // Request the full article range reported for the selected group.
        ArrayList headers = ngClient.DownloadHeaders
            (ngClient.CurrentGroup.LowID, ngClient.CurrentGroup.HighID);
        if (headers == null)
        {
            this.lblMsg.Text = "Downloading message headers failed";
        }
        else
        {
            this.lblMsg.Text = "Success!";
            // Mark the row so that it is skipped by the next download.
            row.Style = _visitedStyle;
            row.UserData = "Visited";
        }
    }
    catch
    {
        this.lblMsg.Text = string.Format
            ("An error occurred while downloading from group {0}", group);
    }
}
Our goal is not only to display the article headers in a friendly user interface but, more importantly, to export the data to a text file or an Excel file for later processing. Fortunately, OrchidGrid has built-in support for data export through its ExportToDelimitedFile() and ExportToExcel() methods, so we don't need to write extra code for this functionality. Please look at the source code for details.
You can also write your own export code if you like. The application contains some commented-out code that exports only the email addresses of the article authors to a text file.
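A custom export of this kind could look like the sketch below, which pulls the address out of each “From” value and keeps only distinct addresses. It is not the commented-out code from the project: EmailExport, ExtractAddress, and DistinctAddresses are hypothetical names, and the regular expression is deliberately simple, so it will not match every address form that a header may legally contain.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class EmailExport
{
    // Simplified pattern: local part, "@", domain with at least one dot.
    private static readonly Regex AddressPattern =
        new Regex(@"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}");

    // Returns the first address found in a From value, or null if there is none.
    public static string ExtractAddress(string fromHeader)
    {
        if (fromHeader == null) return null;
        Match m = AddressPattern.Match(fromHeader);
        return m.Success ? m.Value : null;
    }

    // Collects distinct addresses (compared case-insensitively) in first-seen order.
    public static List<string> DistinctAddresses(IEnumerable<string> fromHeaders)
    {
        List<string> result = new List<string>();
        Dictionary<string, bool> seen =
            new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase);
        foreach (string from in fromHeaders)
        {
            string address = ExtractAddress(from);
            if (address != null && !seen.ContainsKey(address))
            {
                seen[address] = true;
                result.Add(address);
            }
        }
        return result;
    }
}
```

The resulting list can then be written out with File.WriteAllLines(path, addresses.ToArray()), where path is whatever file the user picked.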
During network operations, the progress bar and the status message are updated to indicate progress. You can also stop or cancel an operation at any moment.
If you need other header data in addition to “Subject”, “From”, and “Date”, you can modify the ArticleHeader class and adjust the code for your project.
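One way to pick up additional fields is to parse the raw header lines into a name and value map and read whatever you need from it. The sketch below assumes the raw lines of one article's header section are available as strings; HeaderParser and Parse are hypothetical names, and how the real ArticleHeader class stores its fields is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the article's classes.
public static class HeaderParser
{
    // Parses "Name: value" lines into a case-insensitive map.
    // Lines starting with a space or tab continue the previous header (folded headers).
    // If a header name repeats, the last value wins.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        Dictionary<string, string> headers =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string lastName = null;
        foreach (string line in lines)
        {
            if (line.Length == 0) break; // a blank line ends the header section
            if ((line[0] == ' ' || line[0] == '\t') && lastName != null)
            {
                headers[lastName] += " " + line.Trim();
                continue;
            }
            int colon = line.IndexOf(':');
            if (colon <= 0) continue; // not a header line; skip it
            lastName = line.Substring(0, colon).Trim();
            headers[lastName] = line.Substring(colon + 1).Trim();
        }
        return headers;
    }
}
```

With such a map, adding a field like “Organization” or “Message-ID” to the grid is a matter of reading one more key and adding one more column.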
Try a Sample Server
Let’s try a newsgroup server as an example: “msnews.microsoft.com”. It hosts a large number of newsgroups, and some contain thousands of articles.
Enter the server address “msnews.microsoft.com” and press the Enter key; the server and its newsgroups are listed in the tree on the left. Check the newsgroups you are interested in and click the “Download Message Headers” button to get all the headers in the selected newsgroups. Then you can export the headers to a text or Excel file.
Please see the screen shot of this application at the top of this page.
I hope you like this tool and find it useful.
History
- 9th September, 2006: Initial post