Answered WebBrowser how to find wfd-id

CrashedCrash · Aug 24, 2020

Dear Community, I need your help.
I'm struggling to find an element in the HTML source by its wfd-id so that I can copy its plain text.
The HTML code is in the attachment. Maybe one of you has a solution :'(
If IE11 loaded the wfd-ids I could grab it easily, but it doesn't.

Example:

            if (WebBrowser.ReadyState == WebBrowserReadyState.Complete)
            {
                foreach (HtmlElement Element in WebBrowser.Document.GetElementsByTagName("span"))
                {
                    if (Element.GetAttribute("wfd-id") == "136")
                    {
                        WebBrowser.Document.ExecCommand("Copy", false, null);
                        label1.Text = Clipboard.GetText();
                    }
                }
            }

NoUserHere · Aug 24, 2020

Welcome to the forums.

May I ask why you are scraping data?
Is an API not available for the website?
If it's your website, why don't you write an API for API calls?

You're close to achieving this. I must have answered this question a thousand times over the years. Shockingly, people are still scraping sites with this control instead of calling an API.

What you are looking for is GetElementById: HtmlDocument.GetElementById(String) Method (System.Windows.Forms)

JohnH · Aug 24, 2020

All you need to do is to get Element.InnerText

JohnH · Aug 24, 2020

Sheepings said:
What you are looking for is GetElementByID

The element has no id.

NoUserHere · Aug 24, 2020

Should be able to use wfd-id as the ID. I've not tried it. Long long time since I scraped with the WBC.

You can also use GetElementsByTagName, as they are doing, but additional filtering will be required if there are many elements with the same tag.

@CrashedCrash can you provide the URI of the source page so I can try it?

NoUserHere · Aug 24, 2020

I know what you're saying though, @JohnH. It should be an actual id=

Skydiver · Aug 24, 2020

There is also the HTML Agility Pack, which may make this a little easier.

NoUserHere · Aug 25, 2020

Sheepings said:
Should be able to use wfd-id as the ID. I've not tried it. Long long time since I scraped with the WBC.

Well, that was a silly suggestion. I wasn't sure whether GetElementById strictly requires an actual id="myid" attribute. Anyway...
For the sake of example, assume the HTML is:

HTML:

<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="utf-8" />
    <title></title>
</head>
<body>
    <span>I like TextB</span>
    <table border="0">
        <tr>
            <td>
                <span data-toggle="tooltip" data-placement="top" title="Click To Copy" wfd-id="130">TextA</span>
            </td>
        </tr>
        <tr>
            <td>
                <span data-toggle="tooltip" data-placement="top" title="Click To Copy" wfd-id="135">TextB</span>
            </td>
        </tr>
        <tr>
            <td>
                <span data-toggle="tooltip" data-placement="top" title="Click To Copy" wfd-id="135">TextB</span>
            </td>
        </tr>
    </table>
</body>
</html>

And some sample C# code :

C#:

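            HtmlElementCollection htmlElementCollection = webBrowser1.Document.GetElementsByTagName("span");
            if (htmlElementCollection != null)
            {
                foreach (HtmlElement obj in htmlElementCollection)
                {
                    // InnerText can be null (e.g. for an empty element), so guard before calling Contains.
                    if (obj.InnerText != null && obj.InnerText.Contains("TextB"))
                    {
                        /* Reached for every span whose text contains "TextB" */
                    }
                }
            }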

Checking only the inner text isn't very helpful if the same phrase appears, even partially, more than once on the page. In that case you'd end up with more than one element meeting your condition. So I can understand why people prefer to get a specific element by tag or ID instead of just reading the inner text.

This is why it's important to have the correct HTML, and if the website you are scraping is your own, you really should add an actual id="myid" to your source as already suggested; then you can get the element with GetElementById, as mentioned above.

But what if you don't have access to the html source code to change it? Then you need to go through alternative routes, just as you are. This example will do what you need :

C#:

        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            string match = IsAMatch();
            if (match != null)
            {
                /* Do something with value */
            }
        }

        // Returns the text of the first span with wfd-id="135", or null if none is found.
        private string IsAMatch()
        {
            foreach (HtmlElement html_Obj in webBrowser1.Document.GetElementsByTagName("span"))
            {
                // Compare the whole value so that e.g. "1135" or "1350" does not match.
                if (html_Obj.GetAttribute("wfd-id") == "135")
                {
                    return html_Obj.InnerText;
                }
            }
            return null;
        }

If you wanted to go the manual parsing route instead, obj.OuterHtml will give you : "<SPAN title=\"Click To Copy\" wfd-id=\"135\" data-placement=\"top\" data-toggle=\"tooltip\">TextB</SPAN>"

First check that the OuterHtml contains wfd-id=\"135\". Then use IndexOf to find the first > (the end of the opening span tag) and the next < after it (the start of the closing span tag). A Substring between those two positions gives you your text value.
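
For anyone who wants to see the manual route spelled out, here is a rough sketch of it as a plain string function. ExtractText and OuterHtmlText are hypothetical names made up for this example, and it assumes the OuterHtml holds a single element with no nested tags and no > character inside an attribute value.

```csharp
using System;

static class OuterHtmlText
{
    // Hypothetical helper, not part of any library.
    // Returns the text between the opening and closing tag, or null if the
    // wfd-id marker is missing or the markup is not in the expected shape.
    public static string ExtractText(string outerHtml, string wfdId)
    {
        if (string.IsNullOrEmpty(outerHtml))
            return null;

        // Step 1: make sure this is the element we want.
        string marker = "wfd-id=\"" + wfdId + "\"";
        if (outerHtml.IndexOf(marker, StringComparison.OrdinalIgnoreCase) < 0)
            return null;

        // Step 2: the first '>' closes the opening tag.
        int start = outerHtml.IndexOf('>');
        if (start < 0)
            return null;

        // Step 3: the next '<' after it starts the closing tag.
        int end = outerHtml.IndexOf('<', start + 1);
        if (end < 0)
            return null;

        // Step 4: everything in between is the text.
        return outerHtml.Substring(start + 1, end - start - 1);
    }
}
```

Calling OuterHtmlText.ExtractText(obj.OuterHtml, "135") on the OuterHtml string quoted above returns "TextB". It breaks as soon as the span contains child tags, which is one reason to prefer InnerText or a real parser.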

This is obviously easier if using something like the html agility pack linked by Skydiver.
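
As a rough idea of what that could look like, here is a minimal sketch using the HtmlAgilityPack NuGet package. WfdIdLookup and FindSpanText are hypothetical names for this example, and it assumes you already have the page HTML as a string (for instance from webBrowser1.DocumentText) and that the wfd-id attribute is actually present in that string.

```csharp
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack", not part of .NET itself

static class WfdIdLookup
{
    // Hypothetical helper: returns the text of the first span whose wfd-id
    // equals the given value, or null if there is no such span.
    public static string FindSpanText(string html, string wfdId)
    {
        // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // XPath: any span whose wfd-id attribute is exactly wfdId.
        // Assumes wfdId contains no single quote.
        var node = doc.DocumentNode.SelectSingleNode("//span[@wfd-id='" + wfdId + "']");
        return node == null ? null : node.InnerText;
    }
}
```

With the sample HTML above, WfdIdLookup.FindSpanText(html, "130") should give you TextA. The XPath does the filtering in one step, so there is no loop over every span.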
