Answered WebBrowser how to find wfd-id

CrashedCrash (New member, joined Aug 24, 2020, programming experience: 1-3)
Dear Community, I need your help.
I'm struggling to find an element in the HTML source by its wfd-id so that I can copy the plain text from it.
You'll find the HTML code in the attachments. Maybe one of you has a solution :'(
If IE11 loaded the wfd-ids I could grab it easily, but it doesn't.

Example:
            if (WebBrowser.ReadyState == WebBrowserReadyState.Complete)
            {
                foreach (HtmlElement Element in WebBrowser.Document.GetElementsByTagName("span"))
                {
                    if (Element.GetAttribute("wfd-id") == "136")
                    {
                        WebBrowser.Document.ExecCommand("Copy", false, null);
                        label1.Text = Clipboard.GetText();
                    }
                }
            }
 

Attachments: TEXT-C2P.PNG (4.4 KB), MORE.PNG (27.3 KB)
Welcome to the forums.

May I ask why you are scraping data?
Is an API not available for the website?
If it's your website, why don't you write an API for API calls?

You're close to achieving this. I must have answered this question a thousand times over the years. Shockingly, people are still scraping sites with this control instead of calling an API.

What you are looking for is GetElementById: see HtmlDocument.GetElementById(String) Method (System.Windows.Forms).
 
All you need to do is to get Element.InnerText
 
Should be able to use wfd-id as the ID. I've not tried it. Long long time since I scraped with the WBC.

You can also use GetElementsByTagName, as they are doing. But additional filtering will be required if there are many of the same tags.

@CrashedCrash can you provide the URI of the source page so I can try it?
 
There is also the HTML Agility Pack, which may make this a little easier.
 
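"Should be able to use wfd-id as the ID. I've not tried it. Long long time since I scraped with the WBC."
Well, that was a silly suggestion. I wasn't sure whether GetElementById was strict about wanting an actual ID="myid". It is: GetElementById looks elements up by their ID, and wfd-id is just a custom attribute, so it won't be found that way. Anyway...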
Assume, for example's sake, that the HTML is:
HTML:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="utf-8" />
    <title></title>
</head>
<body>
    <span>I like TextB</span>
    <table border="0">
        <tr>
            <td>
                <span data-toggle="tooltip" data-placement="top" title="Click To Copy" wfd-id="130">TextA</span>
            </td>
        </tr>
        <tr>
            <td>
                <span data-toggle="tooltip" data-placement="top" title="Click To Copy" wfd-id="135">TextB</span>
            </td>
        </tr>
        <tr>
            <td>
                <span data-toggle="tooltip" data-placement="top" title="Click To Copy" wfd-id="135">TextB</span>
            </td>
        </tr>
    </table>
</body>
</html>
And some sample C# code:
C#:
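            HtmlElementCollection htmlElementCollection = webBrowser1.Document.GetElementsByTagName("span");
            if (htmlElementCollection != null)
            {
                foreach (HtmlElement obj in htmlElementCollection)
                {
                    // InnerText can be null for an empty element, so guard before calling Contains.
                    if (obj.InnerText != null && obj.InnerText.Contains("TextB"))
                    {
                        // Reached for every span whose text contains "TextB",
                        // including the unrelated "I like TextB" span.
                    }
                }
            }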
Checking only the inner text isn't very helpful if the same phrase appears, even partially, more than once on the page. In that case, more than one element would satisfy your condition. So I can understand why people prefer to get a specific element by tag or ID instead of just reading the inner text.

This is why it's important to have the correct HTML, and if the website you are scraping is your own, you really should add an actual ID="myid" to your source as already suggested; then you can get the element by ID with GetElementById, as mentioned above.

But what if you don't have access to the HTML source to change it? Then you need to take an alternative route, just as you are doing. This example will do what you need:
C#:
        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            string match = IsAMatch();
            if (match != null)
            {
                /* Do something with value */
            }
        }

        private string IsAMatch()
        {
            foreach (HtmlElement html_Obj in webBrowser1.Document.GetElementsByTagName("span"))
            {
                // Compare for equality: Contains("135") would also match ids such as "1350" or "2135".
                if (html_Obj.GetAttribute("wfd-id") == "135")
                {
                    return html_Obj.InnerText;
                }
            }
            return null;
        }

If you wanted to go the manual parsing route, obj.OuterHtml will give you: "<SPAN title=\"Click To Copy\" wfd-id=\"135\" data-placement=\"top\" data-toggle=\"tooltip\">TextB</SPAN>"

First check that the OuterHtml contains wfd-id=\"135\". Then use IndexOf to find the first >, which ends the opening span tag, and find the < that starts the closing span tag. Substring on everything between those two positions will give you your text value.
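
The manual route described above could be sketched roughly like this. The class name SpanText and the method name Extract are made up for illustration, and the sketch assumes the OuterHtml of a single element with no > inside an attribute value and no nested tags (nested tags would be returned as part of the text):

```csharp
using System;

public static class SpanText
{
    // Hypothetical helper: returns the text between the opening and closing
    // tag of one element's OuterHtml, or null if the marker is not present.
    public static string Extract(string outerHtml, string marker)
    {
        if (outerHtml == null || !outerHtml.Contains(marker))
        {
            return null;
        }

        // The first '>' ends the opening tag; the last "</" starts the closing tag.
        int start = outerHtml.IndexOf('>');
        int end = outerHtml.LastIndexOf("</", StringComparison.Ordinal);
        if (start < 0 || end <= start)
        {
            return null;
        }

        return outerHtml.Substring(start + 1, end - start - 1);
    }
}
```

Inside the foreach loop you would call it as SpanText.Extract(obj.OuterHtml, "wfd-id=\"135\""). Reading InnerText directly is still the simpler option whenever you already hold the HtmlElement.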

This is obviously easier if you use something like the HTML Agility Pack linked by Skydiver.
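
For comparison, a minimal sketch of the same lookup with the HTML Agility Pack, assuming the HtmlAgilityPack NuGet package is referenced. The class and method names are hypothetical, and the type is written out as HtmlAgilityPack.HtmlDocument because a WinForms project also has System.Windows.Forms.HtmlDocument:

```csharp
using HtmlAgilityPack;

public static class AgilityLookup
{
    // Hypothetical helper: returns the text of the first span whose wfd-id
    // attribute equals wfdId, or null if no such span exists.
    public static string FindTextByWfdId(string html, string wfdId)
    {
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // XPath attribute test: an exact match on the attribute value.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[@wfd-id='" + wfdId + "']");
        return node == null ? null : node.InnerText;
    }
}
```

You would pass it the page markup, for example webBrowser1.DocumentText. That only helps if the wfd-id attributes are actually present in the markup you hand over.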
 