28 Aug 2013

Reading HTML Data from any website

Hi,

This is how I solved the problem of extracting data from an HTML table.
If a site presents its data in a table or list, you can read that data with the HTML Agility Pack.


Download HtmlAgilityPack.dll and add a reference to it in your project.



Here is the sample code:

HTML
<BODY>
<TABLE>
<TR>
<TD>Row 0, Col 0</TD>
<TD>Row 0, Col 1</TD>
</TR>
<TR>
<TD>Row 1, Col 0</TD>
<TD>Row 1, Col 1</TD>
</TR>
</TABLE>
</BODY>

Code
using System;
using HtmlAgilityPack;

// Load the HTML document
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://myServer/myTable.htm");

// Get all tables in the document.
// The parser stores tag names in lowercase, so the XPath uses lowercase too.
// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
if (tables == null) return;

// Iterate all rows in the first table
HtmlNodeCollection rows = tables[0].SelectNodes(".//tr");
if (rows == null) return;
for (int i = 0; i < rows.Count; ++i) {

    // Iterate all columns in this row
    HtmlNodeCollection cols = rows[i].SelectNodes(".//td");
    if (cols == null) continue;
    for (int j = 0; j < cols.Count; ++j) {

        // Get the value of the column and print it
        string value = cols[j].InnerText;
        Console.WriteLine(value);
    }
}

Result
Row 0, Col 0
Row 0, Col 1
Row 1, Col 0
Row 1, Col 1
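
If you want to try this without a web server, the same idea can be sketched against an HTML string. This is only a rough variation, not the one true way to do it: TableReader and ReadFirstTable are names I made up for illustration (they are not part of the HTML Agility Pack), and it assumes the first table in the markup is the one you want. It uses LoadHtml instead of HtmlWeb.Load, returns the cells grouped by row, and decodes entities such as &amp;amp; with HtmlEntity.DeEntitize.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TableReader
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the cell
    // text of the first table in the markup, one inner list per row.
    public static List<List<string>> ReadFirstTable(string html)
    {
        var result = new List<List<string>>();

        // Parse the markup directly from a string instead of a URL
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches
        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
        if (table == null) return result;

        // SelectNodes also returns null (not an empty collection) on no match
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        if (rows == null) return result;

        foreach (HtmlNode row in rows)
        {
            var cells = new List<string>();

            // "./td" only picks cells that are direct children of this row
            HtmlNodeCollection cols = row.SelectNodes("./td");
            if (cols != null)
            {
                foreach (HtmlNode col in cols)
                {
                    // Decode HTML entities and drop surrounding whitespace
                    cells.Add(HtmlEntity.DeEntitize(col.InnerText).Trim());
                }
            }
            result.Add(cells);
        }
        return result;
    }

    public static void Main()
    {
        string html =
            "<body><table>" +
            "<tr><td>Row 0, Col 0</td><td>Row 0, Col 1</td></tr>" +
            "<tr><td>Row 1, Col 0</td><td>Row 1, Col 1</td></tr>" +
            "</table></body>";

        // Print one line per row, cells separated by " | "
        foreach (List<string> row in ReadFirstTable(html))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

Returning a list of rows instead of printing inside the loop makes it easier to reuse the data afterwards, for example to fill a DataTable or write a CSV file.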
