[Obsolete] Image Capture Whole Web Page using C#

22 Jun 2005 | CPOL | 4 min read
[Obsolete] Capture whole web pages as a single image using C#.

Sample Image - capture.gif

Introduction

[Obsolete] - I am updating this to say that the below code is obsolete now that modern browsers all have tons of page capture extensions.  I am leaving the code for historical/hysterical sake, but don't waste time trying to implement it.  - Doug

--------------------------------------------------------------------------------------------------------------------------

This article presents a C# routine for capturing an entire web page as an image. Many capture examples show how to grab a screen shot, but do not show how to gather information that is below the scrolling region of an application. The most common example of a scrolling problem or “run-over” program is a web page.

This application grabs the page, plus, as a bonus, it demonstrates how to let the client adjust the size of the image and the quality of the JPEG. It shows how to write the name of the webpage onto the image, draw Standard Resolution Guides, save a bitmap as a JPEG and open the directory where the captures are stored.

Background

In a recent application, I wanted to provide our Quality Assurance testers the ability to capture an entire web page. I wanted them to do this by clicking a button from within a BHO (Browser Helper Object) that is used for another testing task. I also wanted to reduce the size of the capture, because the images are e-mailed and can quickly fill up our mailbox quotas.

Using the code

The easiest way to use this code is to download the source and trim out the functions that are not wanted (quality of capture, size of image, URL writing, guides, or the open directory function). Once the code is trimmed down and compiles without errors, copy the source and its dependencies into the desired project.

The first issue to face when copying the source code into a project is the need to reference SHDocVw.dll and MSHTML.dll. In Visual Studio, go to Project, Add Reference, and then select the COM tab. Scroll down to the Microsoft entries, select "Microsoft Internet Controls", and then select "Microsoft HTML Object Library" (see the above image).

After adding the references, add these using directives to the project. (A few other directives are needed if the code is not placed in a form.)

C#
using System.Text;
using System.Runtime.InteropServices;
using System.Diagnostics;
using System.IO;
using System.Drawing.Imaging;
using SHDocVw;
using mshtml;

Import user32 functions

C#
[DllImport("user32.dll", CharSet=CharSet.Auto)]
public static extern IntPtr FindWindowEx(IntPtr parent /*HWND*/, 
  IntPtr next /*HWND*/, string sClassName, IntPtr sWindowTitle);

[DllImport("user32.dll", ExactSpelling=true, CharSet=CharSet.Auto)] 
public static extern IntPtr GetWindow(IntPtr hWnd, int uCmd); 

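// Note: the native GetClassName takes an HWND and returns an int. This
// declaration passes the handle as an int, which assumes a 32-bit process.
[DllImport("user32.Dll")]
public static extern void GetClassName(int h, StringBuilder s, int nMaxCount);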

[DllImport("user32.dll")]
private static extern bool PrintWindow(IntPtr hwnd, IntPtr hdcBlt, uint nFlags);

public const int GW_CHILD = 5; 
public const int GW_HWNDNEXT = 2;
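
The declarations above pass the window handle to GetClassName as an int. A minimal sketch of equivalent declarations that use IntPtr for every handle is shown here. The class name NativeMethods and the helper ClassNameOf are assumptions for illustration and are not part of the original source.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical wrapper class (not in the original source).
internal static class NativeMethods
{
    public const uint GW_HWNDNEXT = 2;
    public const uint GW_CHILD = 5;

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    public static extern IntPtr FindWindowEx(IntPtr parent, IntPtr childAfter,
        string className, string windowTitle);

    [DllImport("user32.dll")]
    public static extern IntPtr GetWindow(IntPtr hWnd, uint uCmd);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    public static extern int GetClassName(IntPtr hWnd, StringBuilder className,
        int maxCount);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool PrintWindow(IntPtr hwnd, IntPtr hdcBlt, uint flags);

    // Hypothetical helper: returns the window's class name, or "" on failure.
    public static string ClassNameOf(IntPtr hWnd)
    {
        StringBuilder sb = new StringBuilder(256);
        int length = GetClassName(hWnd, sb, sb.Capacity);
        return length > 0 ? sb.ToString() : string.Empty;
    }
}
```

With declarations like these, the handle-walking code can compare hwnd against IntPtr.Zero instead of converting it to an int first.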

Find an open browser window and get its document.

C#
SHDocVw.WebBrowser m_browser = null;
SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();

//Find the first available browser window.
//The application can easily be modified to loop through and
//capture all open windows.
string filename;
foreach (SHDocVw.WebBrowser ie in shellWindows)
{
    filename = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();
    if (filename.Equals("iexplore"))
    {
        m_browser = ie;
        break;
    }
}
if (m_browser == null)
{
    MessageBox.Show("No Browser Open");
    return;
}

//Assign Browser Document
mshtml.IHTMLDocument2 myDoc = (mshtml.IHTMLDocument2)m_browser.Document;

The width and height of the full web page must be determined, along with the size of the browser's visible client area.

C#
//Set scrolling on.
myDoc.body.setAttribute("scroll", "yes", 0);

//Get the full (scrollable) page size
int heightsize = (int)myDoc.body.getAttribute("scrollHeight", 0);
int widthsize = (int)myDoc.body.getAttribute("scrollWidth", 0);

//Get the size of the visible client area
int screenHeight = (int)myDoc.body.getAttribute("clientHeight", 0);
int screenWidth = (int)myDoc.body.getAttribute("clientWidth", 0);

To capture the whole web page, fragments of the page are grabbed and stitched together. After a fragment is captured, the browser is scrolled to the position of the next one. As the fragments are captured, they are drawn into a target bitmap, and the process repeats until the whole page is covered. For pages that are wider than the client's screen, the page is scrolled horizontally and the process is repeated for each column.
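
The scrolling plan described here can be sketched as a small pure function that returns the scroll offsets for one axis. This is only an illustration under stated assumptions: TilePlanner and Offsets are hypothetical names, and pinning the last tile to the end of the page is a design choice of this sketch, not something the original source does. The reason for that choice is that counting pages by the full viewport size while scrolling by the viewport size minus an overlap can leave a few rows at the end of a long page uncaptured.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not in the original source).
static class TilePlanner
{
    // Returns the scroll offsets needed to cover `total` pixels with a
    // viewport of `viewport` pixels, overlapping neighbouring tiles by
    // `overlap` pixels.
    public static List<int> Offsets(int total, int viewport, int overlap)
    {
        if (overlap < 0 || viewport <= overlap)
            throw new ArgumentException("viewport must be larger than a non-negative overlap");

        List<int> offsets = new List<int>();
        // The browser cannot scroll past this position.
        int maxOffset = Math.Max(0, total - viewport);
        int step = viewport - overlap;

        for (int pos = 0; pos < maxOffset; pos += step)
        {
            offsets.Add(pos);
        }
        // The final tile is pinned to the bottom (or right) edge.
        offsets.Add(maxOffset);
        return offsets;
    }
}
```

For a 6000-pixel page, a 600-pixel viewport and a 5-pixel overlap, this yields 11 offsets, the last of which is 5400. The same function can be called once with the heights and once with the widths.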

C#
//Get bitmap to hold screen fragment.
Bitmap bm = new Bitmap(screenWidth, screenHeight,
   System.Drawing.Imaging.PixelFormat.Format16bppRgb555);

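//Create a target bitmap to draw into.
//URLExtraLeft, URLExtraHeight and trimHeight are not declared in this
//excerpt; they are assumed to be margin values set elsewhere in the source.
Bitmap bm2 = new Bitmap(widthsize + URLExtraLeft, heightsize +
   URLExtraHeight - trimHeight,
   System.Drawing.Imaging.PixelFormat.Format16bppRgb555);
Graphics g2 = Graphics.FromImage(bm2);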

Graphics g = null;
IntPtr hdc;
Image screenfrag = null;
int brwTop = 0;
int brwLeft = 0;
int myPage = 0;
IntPtr myIntptr = (IntPtr)m_browser.HWND;

//Get inner browser window.
int hwndInt = myIntptr.ToInt32();
IntPtr hwnd = myIntptr;
hwnd = GetWindow(hwnd, GW_CHILD);
StringBuilder sbc = new StringBuilder(256);

//Get Browser "Document" Handle:
//walk the browser's child windows until "Shell DocObject View" is found,
//then take its "Internet Explorer_Server" child, which renders the page.
while (hwndInt != 0)
{
    hwndInt = hwnd.ToInt32();
    GetClassName(hwndInt, sbc, 256);

    if(sbc.ToString().IndexOf("Shell DocObject View", 0) > -1)
    {
        hwnd = FindWindowEx(hwnd, IntPtr.Zero,
            "Internet Explorer_Server", IntPtr.Zero);
        break;
    }
    hwnd = GetWindow(hwnd, GW_HWNDNEXT);
}

//Count the viewport-high pages needed to cover the page height
//(the fragments are drawn bottom-up below).
while ((myPage * screenHeight) < heightsize)
{
    myDoc.body.setAttribute("scrollTop", (screenHeight - 5) * myPage, 0);
    ++myPage;
}

//Rollback the page count by one
--myPage;

int myPageWidth = 0;
while ((myPageWidth * screenWidth) < widthsize)
{
    myDoc.body.setAttribute("scrollLeft", (screenWidth - 5) * myPageWidth, 0);
    brwLeft = (int)myDoc.body.getAttribute("scrollLeft", 0);
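    for (int i = myPage; i >= 0; --i)
    {
        //Shoot visible window
        g = Graphics.FromImage(bm);
        hdc = g.GetHdc();
        myDoc.body.setAttribute("scrollTop", (screenHeight - 5) * i, 0);
        brwTop = (int)myDoc.body.getAttribute("scrollTop", 0);
        PrintWindow(hwnd, hdc, 0);
        g.ReleaseHdc(hdc);
        g.Flush();
        g.Dispose(); //Release the per-fragment Graphics object.

        //Draw the fragment at its scrolled position in the target bitmap.
        //Note: GetHbitmap allocates a GDI handle that is never freed here;
        //production code should release it with DeleteObject (gdi32.dll).
        screenfrag = Image.FromHbitmap(bm.GetHbitmap());
        g2.DrawImage(screenfrag, brwLeft + URLExtraLeft, brwTop +
           URLExtraHeight);
        screenfrag.Dispose();
    }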
    ++myPageWidth;
}

Finally, save the target bitmap to a time-stamped JPEG file.
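
The saving step is not shown in the article, so the following is only a sketch of one way to do it with the standard System.Drawing encoder API. The class name JpegSaver, the "capture_" file name prefix and the time stamp format are all assumptions, not taken from the original source.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Globalization;
using System.IO;

// Hypothetical helper (not in the original source).
static class JpegSaver
{
    // Builds a name such as "capture_20050622_143005.jpg".
    // The prefix and format are assumptions for illustration.
    public static string TimeStampedName(DateTime when)
    {
        return "capture_" +
            when.ToString("yyyyMMdd_HHmmss", CultureInfo.InvariantCulture) +
            ".jpg";
    }

    // Saves `image` as a JPEG with the given quality (0 to 100)
    // and returns the full path of the file that was written.
    public static string Save(Image image, string directory, long quality, DateTime when)
    {
        ImageCodecInfo jpegCodec = null;
        foreach (ImageCodecInfo codec in ImageCodecInfo.GetImageEncoders())
        {
            if (codec.MimeType == "image/jpeg")
            {
                jpegCodec = codec;
                break;
            }
        }
        if (jpegCodec == null)
            throw new InvalidOperationException("No JPEG encoder found.");

        string path = Path.Combine(directory, TimeStampedName(when));
        using (EncoderParameters parameters = new EncoderParameters(1))
        {
            parameters.Param[0] = new EncoderParameter(
                System.Drawing.Imaging.Encoder.Quality, quality);
            image.Save(path, jpegCodec, parameters);
        }
        return path;
    }
}
```

A call such as JpegSaver.Save(bm2, folder, 60L, DateTime.Now) would write the stitched bitmap at a reduced quality, where folder is whatever capture directory the application uses. Lower quality values trade detail for a smaller file.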

Points of Interest

I had a lot of fun and suffered a lot of frustration with this project. The captures are really nice. Try it out on one of the "Code Project" pages.

Not shown in this article, but available in the source, is the code that saves the file as a JPEG. I tried GIF and bitmap, but settled on JPEG for size. The main goal was to be able to e-mail these files without taking up a lot of our mailbox quota.

In the actual application, I have an option to copy the file to the clipboard. I was never able to get the clipboard image into a "device-dependent bitmap" state that didn't take up much space. I would copy the image and then paste it into my Outlook e-mail, only to have the e-mail grow to about 1 MB. When I opened the JPEG in Photoshop, selected it, copied it and pasted it into Outlook, the Adobe device-dependent bitmap was under 100 KB. The same happened with the simple Windows Paintbrush application.

Because of time constraints, I settled on just copying the JPEG file to Outlook. Any solutions on how to turn a large device independent bitmap into a bitmap with a small memory footprint would be welcomed.
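
As a rough illustration of why a pasted bitmap can be so much larger than the JPEG, the size of uncompressed pixel data depends only on the image dimensions and bit depth. The sketch below shows that arithmetic; DibSize is a hypothetical name, and this is an explanation of the sizes involved rather than a solution to the clipboard question.

```csharp
using System;

// Hypothetical helper (not in the original source).
static class DibSize
{
    // Bytes per scan line: rows in a DIB are padded to a multiple of 4 bytes.
    public static int Stride(int width, int bitsPerPixel)
    {
        return ((width * bitsPerPixel + 31) / 32) * 4;
    }

    // Size of the raw pixel data (headers and any palette excluded).
    public static long PixelBytes(int width, int height, int bitsPerPixel)
    {
        return (long)Stride(width, bitsPerPixel) * height;
    }
}
```

For example, a 1024 x 768 capture at 16 bits per pixel needs 1,572,864 bytes of pixel data regardless of what the page looks like, while a JPEG of the same capture is compressed according to its content and quality setting.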

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Web Developer
United States
My experience with programming began with Turbo Pascal while working on my Physics degree back in 1989.

After getting out of school, I used pre-VBA Excel macros to write some really fancy applications to help with the job I was doing. This inspired me to try to write "Windows" programs and to search out Visual Basic 3.0.

I wrote a bunch of small applications and ran them against Access and FoxPro. However, this still wasn't my primary job.

In 1994, I went on my first contract, a 3-month deal that turned into 3 years. I learned a lot more about development. Development was in VB3, VB4 and ASP. I got a chance to admin NT4 and SQL Server 6 and 6.5.

After moving on to another company, I spent another 2 years with VB and then 5 years with Java and JSP.

In March of 2004, I installed Visual Studio 2003. I tasted C#, and became hopelessly addicted.

My other interests are my 3 sons, my wife :), metal detecting, yard work, travel and learning new things.

update: Now love SharePoint, PowerShell, JQuery, C#, PHP

location: Atlanta, Georgia

Comments and Discussions

 
General: Very nice feature. Can you please do same thing for Winforms
Madhavaraok11-Jul-07 2:13
Question: Can you provide same thing for Windows application
Madhavaraok11-Jul-07 2:10
Answer: Re: Can you provide same thing for Windows application
Ravi Sant20-Jun-11 1:34
General: myDoc.body.getAttribute("clientHeight", 0) - return value
rmandel11-Jun-07 5:37
General: Knowing when loading is complete
mikemcmeekin4-Apr-07 7:08
General: Re: Knowing when loading is complete
Shuriken8717-May-09 18:15
General: Actually dont worry I appear to have this working now
Sacha Barber19-Feb-07 5:43
General: A similar idea that I need some help with PLEASE [modified]
Sacha Barber16-Feb-07 23:50
My name is Sacha Barber. I am working on a project (an article for CodeProject) and am 90% done, but I am having a little trouble with the last 10%, and it is actually a similar problem to this one.

Would you mind if I asked for your help with it? It is as follows:

The problem is that I have a panel control (that I want to print) that contains other child controls. This panel is on a form, but is too large to show all its contents, so it has scrollbars.

The panel contains child controls that in some cases are owner drawn (OnPaint stuff).

Is it possible to print all of the control's contents, both the area outside the viewable area (past the scroll bars) and the area visible on the form?

I have tried the standard control solution

Control.SavaAsBitmap


And also tried using the


[DllImport("GDI32.DLL", EntryPoint="BitBlt", SetLastError=true,
CharSet=CharSet.Unicode, ExactSpelling=true,
CallingConvention=CallingConvention.StdCall)]
static extern bool BitBlt(IntPtr hdcDest, int nXDest, int nYDest,
int nWidth, int nHeight, IntPtr hdcSrc,
int nXSrc, int nYSrc, System.Int32 dwRop);


All have the same effect, which is that they simply print the area of the panel shown within the scrollbars and nothing more.

There is loads more that should be saved as part of the bitmap (the stuff past the scrollbars, basically).

Perhaps what I'm trying to do is just not possible.


You see, I can print a datagrid with only 4 lines shown (that has another 64 outside the viewable area, only viewable if scrolled into view) and this works fine. But it's all one control, so that could explain why this saves/prints just fine.

I should be able to do the same with one large control that contains several other controls. Shouldn't I?



Do you see the problem?


I have also tried (in vain) the following code examples

http://www.codeproject.com/csharp/FormPrintPackage.asp?df=100&forumid=369774&noise=5&select=1898593&msg=1898593

http://www.codeproject.com/csharp/imagecapture.asp
http://www.bobpowell.net/capture.htm


I know that this can be done, I just don't have the skills to do it, whilst you obviously do.

I also know that I can perform scrolling like you have done, using AutoScrollPosition = new Point(x,y) for my scrollable control. So I'm nearly there, I could just do with some help from you.


If you could just find it in your heart to help me with this issue, I would be extremely grateful.
You would not only get very mucho respect from me, but I would also mention your assistance in my article and would be happy to include a link to this article in my new article.

Basically I am trying to bribe you, by the offer of generally highly praising your excellent efforts (this one and possible help), and your comradery, and just what a fine fellow you are for helping out another codeprojecter in his time of need.

Did I mention I also gave this article a 5 vote score?

Have I got your help yet? Plead, beg, whimper, whine etc etc

I also think my new article may do OK, so it could be good for both of us, no??

PLEASE PLEASE PLEASE PLEASE HELP ME
PLEASE PLEASE PLEASE PLEASE HELP ME
PLEASE PLEASE PLEASE PLEASE HELP ME
PLEASE PLEASE PLEASE PLEASE HELP ME


Another thing: although it's similar to this code, I have been looking and looking and can't find a single example that allows a standard scrollable Windows Forms control to be saved to an image, like you are showing here. So this could be another very good article for you to actually publish here.


Failing all this, I'm going to finish the article, post it, and ask someone with more GDI+ experience to take it on.






-- modified at 6:33 Saturday 17th February, 2007


-- modified at 6:40 Saturday 17th February, 2007


-- modified at 6:57 Saturday 17th February, 2007


-- modified at 7:04 Saturday 17th February, 2007

sacha barber

General: Capture Web Image into Excel_Form VB
B.L.Praveen11-Jan-07 22:54
General: Re: Capture Web Image into Excel_Form VB
Douglas M. Weems13-Jan-07 10:56
General: Internet Explorer -> doesn't work for this pages
Dorian Darius4-Dec-06 22:09
General: Re: Internet Explorer -> doesn't work for this pages
Not_Possible19-Dec-06 0:55
General: Re: Internet Explorer -> doesn't work for this pages [modified]
givemempower824-Jan-07 7:02
General: Re: Internet Explorer -> doesn't work for this pages
Sire40418-Jun-07 23:09
Question: Problem in getting ScreenShot of client computer screen (asp.net)
jitendra kumar rajput1-Dec-06 2:25
Answer: Re: Problem in getting ScreenShot of client computer screen (asp.net)
Recep Guzel19-Jan-07 4:30
Question: Very Nice Work.
David Strickland / Swingvote28-Nov-06 3:54
General: Firefox
Dorian Darius24-Nov-06 0:11
General: Re: Firefox
Douglas M. Weems24-Nov-06 14:39
General: Re: Firefox [modified]
Dorian Darius30-Nov-06 22:41
General: Re: Firefox
Dorian Darius2-Dec-06 4:56
General: Re: Firefox
Not_Possible19-Dec-06 1:00
General: Re: Firefox
omar galeano3-Aug-08 12:44
General: Doesn't work in Win2K
jheckmann24-Sep-06 12:28
Question: Screen capture tool for outlook express
arvindhr15-Sep-06 19:37