Hi everybody,

I have often used a StreamReader to fetch the entire content of an .html page from the web and then work with it, using the following method:

VB
Private Function GetPageText() As String
    Dim inputStr As String = ""
    Dim thiClient As New Net.WebClient
    Dim respStream As IO.Stream = Nothing
    Dim stmRd As IO.StreamReader = Nothing

    ' site and filePath are not declared in the post; presumably defined elsewhere
    respStream = thiClient.OpenRead("http://" & site & filePath & "/TopPage_7500.html")

    ' Only read the page if the stream yields at least one byte
    If respStream.ReadByte() <> -1 Then
        stmRd = New IO.StreamReader(respStream)
        inputStr = stmRd.ReadToEnd
    End If

    '****
    Return inputStr
End Function


This has always worked for me until today; now all I get is this garbage: �<
Could anybody offer guidance on this matter? The page has 102 KB of data. Here is the beginning of it:

XML
<html xmlns:axsl="http://localhost" lang="es">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Inicio</title>
    <style type="text/css"></style>
    <meta http-equiv="Content-Script-Type" content="text/javascript">
    <meta http-equiv="Cache-Control" content="no-cache">
    <meta http-equiv="Pragma" content="no-cache">
    <meta http-equiv="Expires" content="-1">
    <script language="JavaScript" src="/scripts/common.js" type="text/javascript"></script>
    <script language="JavaScript" src="/scripts/reload.js" type="text/javascript"></script>
    <script language="JavaScript" src="/scripts/config.js" type="text/javascript"></script>
    <link href="/css/common.css" type="text/css" rel="stylesheet">
    <script language="Javascript" type="text/javascript">
        function reloadPage(){
            location.reload();
        }
    </script>
    <script language="javascript">
        var wsMenu_jumpUrl_control = ""

;

Thank you for your time and help.

Best regards to all,

Alex.
Updated 11-Sep-10 11:51am
Comments
Sandeep Mewara 12-Sep-10 0:38am    
Are you sure the URL '"http://" & site & filePath & "/TopPage_7500.html"' is formed correctly?

1 solution

Hi Sandeep Mewara,

Thank you for your help. I am 100% sure of it since I have another file in the same path that is read as expected.

respStream = thiClient.OpenRead("http://" & site & filePath & "/TopPage_2500.html")

Here is the beginning of it:
XML
<html lang="es">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <style type="text/css">body {background-image:url('/images/bkg03.gif');background-repeat:repeat-x;}</style>
    <style type="text/javascript">
ids.iL.position = "absolute";
ids.iL.visibility = "hidden";
tags.td.color="white";
</style>
    <script language="javascript" type="text/javascript">
//??????????????
var brw_v = navigator.appVersion.charAt(0);
var brw_n = navigator.appName.charAt(0);
var iIE4 = false;
var iNN4 = false;
var iNN6 = false;
if((brw_v >= 4)&&(brw_n == "M"))iIE4 = true;
if((brw_v >= 4)&&(brw_v < 5)&&(brw_n == "N"))iNN4 = true;
if((brw_v >= 5)&&(brw_n == "N"))iNN6 = true;
var Laymax = 4; //?????????
var layX = 250; //?????????
var layY = 0;   //?????????
var layW = 250; //???????
var layH = 100; //????????
var apos = "'"


Actually, here is the complete path to both files, perhaps you will be able to see what escapes my eye.

Working one:
http://200.49.137.1/abraun/PrintTest/TopPage_2075

Non-working one:
http://200.49.137.1/abraun/PrintTest/TopPage_7500

Please feel free to access these pages at any time and as many times as you see fit. One thing I just noticed: the copy of the _7500.html page stored on my local hard drive opens properly in any browser when double-clicked, but the version downloaded from the web returns pure garbage. Chrome even says it is in Chinese (Simplified Han). This may very well be the issue; I just do not know how to solve it.

Once again, thank you for your time Sandeep.

Alex.
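
A page that opens fine from disk but is shown as Chinese text when downloaded is often a sign that the bytes are being decoded with the wrong encoding. One way to check is to look at the first few raw bytes for a byte order mark (BOM). The sketch below is only an illustration: `DescribeBom` is a hypothetical helper that does not appear in the original post, and the sample bytes are made up for the demonstration rather than taken from the real page.

```vb
Imports System

Module BomCheck
    ' Hypothetical helper (not from the original post): names the byte order
    ' mark, if any, found at the start of a buffer.
    Public Function DescribeBom(head As Byte()) As String
        If head.Length >= 3 AndAlso head(0) = &HEF AndAlso head(1) = &HBB AndAlso head(2) = &HBF Then
            Return "UTF-8"
        ElseIf head.Length >= 2 AndAlso head(0) = &HFF AndAlso head(1) = &HFE Then
            Return "UTF-16 little-endian"
        ElseIf head.Length >= 2 AndAlso head(0) = &HFE AndAlso head(1) = &HFF Then
            Return "UTF-16 big-endian"
        End If
        Return "no BOM"
    End Function

    Sub Main()
        ' Assumption: made-up sample bytes standing in for the start of a page.
        Dim sample As Byte() = {&HFF, &HFE, &H3C, &H0}
        Console.WriteLine(BitConverter.ToString(sample)) ' prints FF-FE-3C-00
        Console.WriteLine(DescribeBom(sample))           ' prints UTF-16 little-endian
    End Sub
End Module
```

For the real page, the buffer could come from `WebClient.DownloadData`, which returns the response as a `Byte()` without decoding it.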


Sandeep,

I have played with the encoding used by the StreamReader, and with System.Text.Encoding.BigEndianUnicode I get this: ︼html xmlns:axsl="http://localhost" lang="es">

This is obviously not the entire content of the page, but it tells me that it is an encoding issue, which unfortunately I have no idea how to solve. Could you offer some advice?

Thanks Again.
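
The `︼` character in that output is U+FE3C, that is, the bytes FE 3C. One reading consistent with this (an assumption, not verified against the server) is that the page is now served as UTF-16 little-endian with a BOM (FF FE), and that the `ReadByte()` check in `GetPageText` consumes the FF. The remaining bytes FE 3C 00 68 ... would then pair up as big-endian characters, and decoded as UTF-8 they would show `�<` followed by a null character. A minimal sketch of reading the stream without consuming any bytes first, and letting `StreamReader` pick the encoding from the BOM, might look like this. `ReadAllText` is a hypothetical helper, and the in-memory page is a stand-in for the real download.

```vb
Imports System
Imports System.IO
Imports System.Text

Module BomAwareRead
    ' Hypothetical helper (not from the original post): decodes the whole
    ' stream without reading any bytes from it beforehand.
    Public Function ReadAllText(source As Stream) As String
        ' UTF-8 is the fallback; the True flag lets a BOM override it.
        Using reader As New StreamReader(source, Encoding.UTF8, True)
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Assumption: simulate a UTF-16 little-endian page that starts with a BOM.
        Dim preamble As Byte() = Encoding.Unicode.GetPreamble()
        Dim body As Byte() = Encoding.Unicode.GetBytes("<html lang=""es"">")
        Dim page As New MemoryStream()
        page.Write(preamble, 0, preamble.Length)
        page.Write(body, 0, body.Length)
        page.Position = 0

        Console.WriteLine(ReadAllText(page)) ' prints <html lang="es">
    End Sub
End Module
```

With a `WebClient`, the stream returned by `OpenRead` could be passed straight to `ReadAllText`. An empty response simply yields an empty string from `ReadToEnd`, so a separate `ReadByte()` check should not be needed.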
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
