HTML Parsing Help Needed


Feb 2, 2013
I am trying to write a small app that gets some data from a website. I am using the HtmlAgilityPack to parse the HTML.
I can get most of the data I need because it is formatted in a way that makes it easy to locate. A few pieces of data aren't formatted so well, and this is where I need some help. Below is the HTML that I need to parse. The div class="product-details" tag has the data I need, but I need to find a way to separate each line. When I get the data, it is all jumbled together into a single string. Does anyone know how to separate it out
so that I have, for example:
"Bearing for Ariens 54063 / 45-035" by itself
"Ariens 54063" by itself
"Outer Dimension: 1-3/4" by itself
"Inner Dimension: 3/4" by itself
"Width: 1/2" by itself
and so on.

Here is the HTML:

<div class="product-details" style="margin-bottom:20px;">
        <h1 style="font-style:italic; border-bottom:1px solid #4C0000;">Product Description</h1>
<p>Bearing for Ariens 54063 / 45-035</p>
<b><u>Replaces OEM: </u></b>
 Ariens 54063
<b><u>Size: </u></b>
Outer Dimension: 1-3/4"
Inner Dimension: 3/4"
Width: 1/2"
Snowthrower Input Shaft Support Bearing
 Models 824, 932 and 1032. Model Numbers: 924022, 024, 026,
 028, 029, 032, 046, 048, 050, 072, 074, 075, and 079.
<br>Oregon Part Number: 45-035

Here is the code I use to get the data:

HtmlWeb page = new HtmlWeb();
HtmlAgilityPack.HtmlDocument webPage = page.Load(testWeb);
HtmlNodeCollection headerList = webPage.DocumentNode.SelectNodes("//*[@id='location']/a");
lastBreadCrumb = webPage.DocumentNode.SelectSingleNode("//*[@id='location']/font").InnerText;
partTitle = webPage.DocumentNode.SelectSingleNode("//*[@id='center-main']/h1").InnerText;
partNumber = webPage.DocumentNode.SelectSingleNode("//*[@class='property-value']").InnerText;
listPrice = webPage.DocumentNode.SelectSingleNode("//*[@class='property-value product-taxed-price']").InnerText;
ourPrice = webPage.DocumentNode.SelectSingleNode("//*[@class='product-price-value']").InnerText;
partDesc = webPage.DocumentNode.SelectSingleNode("//*[@id='center-main']/div[1]/div/div[2]").InnerText;
string[] descParts = partDesc.Split('\n');
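
One direction I have been considering, as a rough sketch only: instead of taking InnerText of the whole div, walk the text nodes under it and split each one on both '\r' and '\n', trimming as I go. The class name ProductDetailsParser and the method ExtractLines are names I made up for illustration. The XPath assumes the class attribute is exactly "product-details", and I have not confirmed this against the live page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ProductDetailsParser
{
    // Hypothetical helper (not part of my existing code): returns every
    // non-empty line of text under the "product-details" div as its own entry.
    public static List<string> ExtractLines(string html)
    {
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        var lines = new List<string>();

        // Assumption: the class attribute is exactly "product-details".
        HtmlNode details = doc.DocumentNode.SelectSingleNode("//div[@class='product-details']");
        if (details == null)
        {
            return lines; // div not found, nothing to split
        }

        foreach (HtmlNode node in details.Descendants())
        {
            // Only raw text nodes carry the data; element nodes are skipped.
            if (node.NodeType != HtmlNodeType.Text)
            {
                continue;
            }

            // Decode entities such as &quot; or &amp; before splitting.
            string decoded = HtmlEntity.DeEntitize(node.InnerText);

            // Split on both CR and LF so Windows-style line endings are handled.
            string[] pieces = decoded.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string piece in pieces)
            {
                string trimmed = piece.Trim();
                if (trimmed.Length > 0)
                {
                    lines.Add(trimmed);
                }
            }
        }

        return lines;
    }
}
```

If this works the way I expect, labels such as "Replaces OEM:" and "Size:" would also come back as their own entries, so they would still need to be filtered out or paired with the values that follow them.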

Thank you for any help.
