Resolved Regex Failures beyond my understanding

ConsKa

I am trying to match a word and replace with a capitalised version, and I don't understand why the word is not being seen?

I have used:

C#:
@"\bor\b", "OR"

Which I add to a dictionary as key and value, with the regex being the key.

My understanding, and all my reading seems to suggest, that this will search through a string looking for a non-word character, then an 'o' and then an 'r' and if it then finds a non-word character after it matches those two, it will consider that a full match.
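A small sketch of what `\b` does may help here. Strictly speaking, `\b` does not consume a non-word character; it is a zero-width assertion that matches the position between a word character and a non-word character, including the very start or end of the string. The class and method names below are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // \b matches a position, not a character, so "or" is replaced only
    // when it stands alone as a whole word.
    public static string UpperCaseOr(string input) =>
        Regex.Replace(input, @"\bor\b", "OR");

    public static void Main()
    {
        Console.WriteLine(UpperCaseOr("ConsKa or Conske")); // ConsKa OR Conske
        Console.WriteLine(UpperCaseOr("Trevor or"));        // Trevor OR
        Console.WriteLine(UpperCaseOr("for Trevor"));       // for Trevor
    }
}
```

The "or" inside "Trevor" and "for" is left alone because it has a word character directly before it, so there is no boundary at that position.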

I do a check to see if the key exists in the string so as to avoid an exception error:

C#:
foreach (KeyValuePair<string, string> entry in dict)
{
    if (item.Contains(entry.Key))
    {
        var outPut2 = Regex.Replace(item, string.Join("|", dict.Keys.Select(k => k.ToString()).ToArray()), m => dict[m.Value]);
        strOut.Text += outPut2 + Environment.NewLine;
    }
}

Item is the phrase: "ConsKa or Conske"

The KeyValuePair does not match the @"\bor\b" just skips right over it.

I don't really understand why it isn't matching?

There is no issue with the dictionary, nor the Linq, as those terms where I have not had to use Regex (characters that never appear in the middle of a word) are found and replaced with no issue. It is when I come to deal with characters that could appear in the middle of a word where I want to rely on the regex that this issue arises.
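Two things in the snippet above look likely to bite. `item.Contains(entry.Key)` searches for the literal characters `\bor\b`, which never appear in the phrase. And if the body were reached, `m.Value` would be the matched text ("or"), not the pattern, so `dict[m.Value]` would throw a `KeyNotFoundException` for a dictionary keyed by `\bor\b`. One possible way to keep the combined-pattern approach is to key the dictionary by the plain words and build the pattern from them. This is only a sketch; `Words`, `Pattern` and `Apply` are hypothetical names, not anything from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CombinedPatternDemo
{
    // Hypothetical map keyed by the plain word rather than by a regex pattern.
    private static readonly Dictionary<string, string> Words =
        new Dictionary<string, string>
        {
            ["or"] = "OR",
            ["and"] = "AND",
        };

    // Builds \b(?:or|and)\b; Regex.Escape keeps every key literal.
    private static readonly string Pattern =
        @"\b(?:" + string.Join("|", Words.Keys.Select(k => Regex.Escape(k))) + @")\b";

    // m.Value is the matched text ("or" or "and"), which is a key in Words.
    public static string Apply(string input) =>
        Regex.Replace(input, Pattern, m => Words[m.Value]);

    public static void Main()
    {
        Console.WriteLine(Apply("ConsKa or Conske and more")); // ConsKa OR Conske AND more
    }
}
```

This only works for rules that are whole-word literals; rules that need their own regex syntax are easier to apply one pattern at a time.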
 
Solution
If the regex expressions don't depend on line boundaries, you can do this to replace all matching expressions in the text:
C#:
var input = InputTextbox.Text;
foreach (var entry in dict)
    input = Regex.Replace(input, entry.Key, entry.Value);
OutputTextbox.Text = input;
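For anyone who wants to try the loop above outside a UI, here is a self-contained sketch. The rules in the dictionary are assumptions (the thread never shows the full dictionary), and the sample lines are the ones quoted later in the thread. `ReplaceAllDemo`, `Rules` and `ApplyAll` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ReplaceAllDemo
{
    // Assumed rules: regex pattern as key, replacement text as value.
    public static readonly Dictionary<string, string> Rules =
        new Dictionary<string, string>
        {
            [@"\bw/2\b"] = "W/2",
            [@"\bor\b"] = "OR",
            [@"\band\b"] = "AND",
            ["\""] = "", // strip double quotes
        };

    // Feeds the result of each replacement into the next one.
    public static string ApplyAll(string input)
    {
        var text = input;
        foreach (var rule in Rules)
            text = Regex.Replace(text, rule.Key, rule.Value);
        return text;
    }

    public static void Main()
    {
        var input = string.Join(Environment.NewLine,
            "david w/2 wilkins",
            "david or davie",
            "william and williams",
            "\"Trevor Phillips\"");

        // The whole multi-line text goes through every rule in one pass each.
        Console.WriteLine(ApplyAll(input));
    }
}
```

Because none of these patterns use `^` or `$`, it makes no difference that the input contains line breaks.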
So "Contains" cannot match a Regex?

Could I then, reverse that so that

C#:
if (result.Contains(entry.Value.ToLower()))

So that it is checking the value, as the only change here is capitalisation - I can then send it into the Regex.Replace that will find the regex.

Given it is the same word.

Yep, that gets me into the loop and annoyingly....I have to do @"\bor\b" in the dictionary for it to work.....

man, what an absolutely mind bending treat this was...
 
So "Contains" cannot match a Regex?
No, that is a string function.
And as I said, neither do you need a Contains or Match.
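To make the difference concrete, here is a minimal sketch comparing the two calls on the phrase from the first post. The wrapper names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContainsVersusIsMatch
{
    // string.Contains looks for the pattern text literally, backslashes and all.
    public static bool LiteralSearch(string input, string pattern) =>
        input.Contains(pattern);

    // Regex.IsMatch interprets the same text as a regular expression.
    public static bool RegexSearch(string input, string pattern) =>
        Regex.IsMatch(input, pattern);

    public static void Main()
    {
        const string item = "ConsKa or Conske";
        Console.WriteLine(LiteralSearch(item, @"\bor\b")); // False
        Console.WriteLine(RegexSearch(item, @"\bor\b"));   // True
    }
}
```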
 
No, that is a string function.
And as I said, neither do you need a Contains or Match.
I think I do because otherwise wouldn't my output be full of the same string untouched for every other regex test that I am doing in the Dictionary which doesn't apply?

So if I do 4 Regex tests through the foreach var entry in dict...I would have:

dave or dave
dave OR dave
dave or dave
dave or dave

As my output? One would be the one the Regex replace acted on, the other 3 would be ones it just passed through.

The match/contains test, only puts the string through the regex when it matches the dictionary regex and therefore only adds it to the output, once it has been changed?

I can add an else statement to pass a single untouched string if no Regex applies.

Unless you had another thought on this?
 
Look at post 13, text is processed through all regex expressions, and finally shown in UI control.
 
The input text is an array, so I need to loop over each item in that array, test it against each regex in the dictionary, and then create an output.

This does it:

C#:
foreach (var result in test)
{
    foreach (var item in dict)
    {
        if (Regex.IsMatch(result, item.Key))
        {
            outPut = Regex.Replace(result, item.Key, item.Value);
        }
    }
    strOut.Text += outPut + Environment.NewLine;
}

Creates a 4 string output, which is the 4 strings that were input that were changed. This doesn't help me though when I add the 5th string which doesn't need changing, but would still like to keep in the list. Will think on that.

Just swapping out the Contains for the proper Regex.IsMatch gets me where I think I need to be.

This is tested against a 4 string array....I have to test it against a 15,000 string array and see what type of performance hit it takes to do all these loops.

So if you have a better way? I am here to learn.
 
It is only an array because you split the string to an array.
strInput.Text.Split('\r', '\n');
which is also pointless, since you could just get the Lines property from textbox. But why do you need to process each line individually?
 
Why do you still think you need to Regex.Match before Regex.Replace? Regex.Replace will do a match and replace if there is a match. Why add yet another match? Pointless as I said, you're just adding cpu cycles that has no effect.
Regex.Replace Method said:

Returns: String

A new string that is identical to the input string, except that the replacement string takes the place of each matched string. If pattern is not matched in the current instance, the method returns the current instance unchanged.
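The documented behaviour is easy to check. In this sketch (names are illustrative only) a line with no match comes back with the same content, so no prior `IsMatch` test is needed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NoMatchDemo
{
    public static string UpperCaseAnd(string input) =>
        Regex.Replace(input, @"\band\b", "AND");

    public static void Main()
    {
        var line = "dave or dave";

        // No whole-word "and" here, so the text comes back unchanged.
        Console.WriteLine(UpperCaseAnd(line) == line);           // True

        // Only the standalone "and" is replaced, not the one inside "williams".
        Console.WriteLine(UpperCaseAnd("william and williams")); // william AND williams
    }
}
```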
 
It is only an array because you split the string to an array.

which is also pointless, since you could just get the Lines property from textbox. But why do you need to process each line individually?
C#:
string[] tempArray = strInput.Lines;

I don't think this works with WPF? MSDN lists the Lines property under the Windows Forms namespace, and I get an error when I try it on a WPF TextBox.

I know I posted this in C# General; that is because the question wasn't specific to WPF, it was more syntax related.
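If separate lines are still wanted without the `Lines` property, one common approach is to split the text on all three line-break styles. This is a sketch on a plain string; in the real app the argument would presumably be `strInput.Text`, and `SplitLines` is a hypothetical helper.

```csharp
using System;

public static class LineSplitDemo
{
    // "\r\n" is listed first so a Windows line break counts as one separator
    // instead of producing an empty entry between '\r' and '\n'.
    public static string[] SplitLines(string text) =>
        text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);

    public static void Main()
    {
        var lines = SplitLines("david or davie\r\nwilliam and williams");
        Console.WriteLine(lines.Length); // 2
    }
}
```

By contrast, `Split('\r', '\n')` treats the two characters as separate delimiters, so every `"\r\n"` yields an extra empty string.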

Why do you still think you need to Regex.Match before Regex.Replace? Regex.Replace will do a match and replace if there is a match. Why add yet another match? Pointless as I said, you're just adding cpu cycles that has no effect.

I will look at it again, but it isn't simply a matter of removing the regex.match.

It may be down to where I have the output being gathered, I will look at it, but currently this:

C#:
foreach (var result in test)
{
    foreach (var item in dict)
    {
        outPut = Regex.Replace(result, item.Key, item.Value);
    }
    strOut.Text += outPut + Environment.NewLine;
}

put in:

david w/2 wilkins
david or davie
william and williams
"Trevor Phillips"

get back:

david w/2 wilkins
david or davie
william and williams
"Trevor Phillips"

If I wrap the outPut = Regex.Replace in an if statement:

C#:
if (Regex.IsMatch(result, item.Key))
{
    outPut = Regex.Replace(result, item.Key, item.Value);
}

put in:

david w/2 wilkins
david or davie
william and williams
"Trevor Phillips"

get back:

david W/2 wilkins
david OR davie
william AND williams
Trevor Phillips

Which matches my Regex.
 
I know why it is doing that: on every pass of the dict loop, result is still the original string. Without the if statement, Regex.Replace looks at result again, finds nothing to change for that pattern, and the unchanged original is assigned to outPut, overwriting any earlier replacement.

As it isn't changing the result when it gets a match, it is changing the outPut.

You can't make it result = Regex.Replace - because result is part of the foreach loop and it won't let you do that.
 
@JohnH Thank you for being patient...

Yes, I re-read everything you said, I felt I was missing something.

I can just put the inputString in as a single string, no requirement to foreach loop over each line - is what you were trying to tell me and I was too dense to get.

Thank you for continuing to pound away.

Not only is that so much easier, but it is ridiculously fast.

Apologies it took me more than one read through.
 
You can't make it result = Regex.Replace - because result is part of the foreach loop and it won't let you do that.
There is a simple solution for that and it is called a variable :) Assign the loop variable to a separate variable and work with that.
inputString in as a single string
Not only is that so much easier, but it is ridiculously fast.
Great!
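For cases where per-line processing really is needed, the "separate variable" suggestion could look roughly like this. It is a sketch under assumptions: `ApplyPerLine` is a hypothetical helper, and the rules passed in `Main` are made up for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PerLineDemo
{
    public static List<string> ApplyPerLine(
        IEnumerable<string> lines, IDictionary<string, string> rules)
    {
        var output = new List<string>();
        foreach (var line in lines)
        {
            // The foreach variable is read-only, so work on a copy.
            var current = line;
            foreach (var rule in rules)
                current = Regex.Replace(current, rule.Key, rule.Value);

            // Lines that no rule touched are still kept in the output.
            output.Add(current);
        }
        return output;
    }

    public static void Main()
    {
        var rules = new Dictionary<string, string>
        {
            [@"\bor\b"] = "OR",
            [@"\band\b"] = "AND",
        };
        var result = ApplyPerLine(
            new[] { "david or davie", "william and williams", "no change here" },
            rules);
        Console.WriteLine(string.Join(Environment.NewLine, result));
    }
}
```

Each replacement builds on the previous one, so no `IsMatch` guard is needed and unchanged lines pass straight through.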
 
Why are you doubling the work? First you call Match() and then if there is a match, you are calling Replace(). Knowing that Replace() won't change anything if there is no match, you can just call Replace() directly.
 
Why are you doubling the work? First you call Match() and then if there is a match, you are calling Replace(). Knowing that Replace() won't change anything if there is no match, you can just call Replace() directly.
Yeah it took me a while to understand that you can enter a string of 16,000 lines from the textbox and the regex.replace will work just fine across all of it and give you the output you expect.

I just assumed you had to run each line through it for it to work, but nope it is far better than that.
 