
November 2nd, 2003, 10:39 AM
Screen scraping between two points
I have permission to scrape a Norwegian page, but I only need the part between two points in its source code. I'm able to do this in regular ASP, but then it won't read Norwegian characters properly. I have found a solution for that in ASP.NET using UTF-7 encoding. My problem is that this code scrapes the entire page, and I need it to read only the text between two points. Can anyone help rewrite this code so that it scrapes only between two points? I don't know ASP.NET myself, so I'm not able to do it.
Code:
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>
<script language="VB" runat="server">
Sub Page_Load(Src As Object, E As EventArgs)
    myPage.Text = readHtmlPage("http://www.yourdomain.no")
End Sub

Function readHtmlPage(url As String) As String
    Dim objResponse As WebResponse
    Dim objRequest As WebRequest
    Dim result As String
    objRequest = System.Net.HttpWebRequest.Create(url)
    objResponse = objRequest.GetResponse()
    'use UTF7 encoding to read special characters and the Norwegian alphabet
    Dim sr As New StreamReader(objResponse.GetResponseStream(), System.Text.Encoding.UTF7)
    'ReadToEnd already returns a String, so no extra conversion is needed
    result = sr.ReadToEnd()
    'clean up StreamReader (this also closes the underlying response stream)
    sr.Close()
    Return result
End Function
</script>
<html>
<body>
<asp:literal id="myPage" runat="server"/>
</body>
</html>
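To make the request concrete, here is a minimal sketch of what "reading between two points" could look like. It is only an illustration under assumptions: the helper name GetTextBetween and the two marker strings are hypothetical and would have to be replaced with text that really occurs in the target page's source. It relies only on String.IndexOf and String.Substring, and it would sit inside the same script block as readHtmlPage.

```vb
' Hypothetical helper (not part of the original code): returns the text found
' between startMarker and endMarker, or an empty string if either is missing.
Function GetTextBetween(source As String, startMarker As String, endMarker As String) As String
    ' Locate the first occurrence of the start marker
    Dim startPos As Integer = source.IndexOf(startMarker)
    If startPos < 0 Then
        Return String.Empty
    End If

    ' Move past the start marker so the marker itself is not included
    startPos = startPos + startMarker.Length

    ' Look for the end marker only after the start marker
    Dim endPos As Integer = source.IndexOf(endMarker, startPos)
    If endPos < 0 Then
        Return String.Empty
    End If

    Return source.Substring(startPos, endPos - startPos)
End Function
```

Page_Load would then pass the downloaded page through the helper, for example myPage.Text = GetTextBetween(readHtmlPage("http://www.yourdomain.no"), "<!-- start -->", "<!-- end -->"), where both comment markers are placeholders. The markers themselves are excluded from the result, and an empty string comes back if either one cannot be found.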