Resolved Regex Failures beyond my understanding

ConsKa

Well-known member
Joined
Dec 11, 2020
Messages
140
Programming Experience
Beginner
I am trying to match a word and replace it with a capitalised version, and I don't understand why the word is not being seen.

I have used:

C#:
@"\bor\b", "OR"

Which I add to a dictionary as key and value, with the regex being the key.

My understanding, and all my reading seems to suggest, that this will search through a string looking for a non-word character, then an 'o' and then an 'r' and if it then finds a non-word character after it matches those two, it will consider that a full match.
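One detail in that description is worth pinning down: `\b` does not consume a non-word character. It is a zero-width assertion that matches the position between a word character and a non-word character, or the start or end of the string. A minimal sketch of what that means in practice (the class and method names are hypothetical, not from the thread):

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: upper-cases the standalone word "or".
    public static string Capitalise(string text) =>
        Regex.Replace(text, @"\bor\b", "OR");

    public static void Main()
    {
        // \b matches the position between the space and 'o', and between 'r' and the space.
        Console.WriteLine(Capitalise("ConsKa or Conske"));  // ConsKa OR Conske

        // No surrounding characters are needed: the start and end of the
        // string count as boundaries too.
        Console.WriteLine(Capitalise("or"));                // OR

        // "or" inside "Trevor" and "Gregory" is preceded by a word character,
        // so there is no boundary in front of it and it is left alone.
        Console.WriteLine(Capitalise("Trevor or Gregory")); // Trevor OR Gregory
    }
}
```

Because the boundary itself is zero-width, the spaces around the word are never part of the match and survive the replacement untouched.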

I do a check to see if the key exists in the string so as to avoid an exception error:

C#:
foreach (KeyValuePair<string, string> entry in dict)
{
    if (item.Contains(entry.Key))
    {
        var outPut2 = Regex.Replace(item, string.Join("|", dict.Keys.Select(k => k.ToString()).ToArray()), m => dict[m.Value]);
        strOut.Text += outPut2 + Environment.NewLine;
    }
}

Item is the phrase: "ConsKa or Conske"

The loop does not match the @"\bor\b" key; it just skips right over it.

I don't really understand why it isn't matching.

There is no issue with the dictionary, nor the Linq, as those terms where I have not had to use Regex (characters that never appear in the middle of a word) are found and replaced with no issue. It is when I come to deal with characters that could appear in the middle of a word where I want to rely on the regex that this issue arises.
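A likely explanation for the skip, sketched under the assumption that `item` and the dictionary are exactly as described above: `string.Contains` searches for the key as literal text, so it looks for the six characters `\bor\b`, which never appear in the phrase. Keys without regex syntax pass that check, which would explain why the plain terms work. The sketch also shows a second trap in the original lambda: `m.Value` is the matched text (`or`), not the pattern, so `dict[m.Value]` has no such key. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ContainsVersusRegex
{
    // Literal substring search: the pattern is treated as plain characters.
    public static bool LiteralContains(string text, string key) => text.Contains(key);

    // Regex search: the same key is interpreted as a pattern.
    public static bool PatternMatches(string text, string key) => Regex.IsMatch(text, key);

    public static void Main()
    {
        string item = "ConsKa or Conske";
        string key = @"\bor\b";

        Console.WriteLine(LiteralContains(item, key)); // False
        Console.WriteLine(PatternMatches(item, key));  // True

        // The dictionary is keyed by pattern, but a match yields the matched text.
        var dict = new Dictionary<string, string> { [key] = "OR" };
        string matched = Regex.Match(item, key).Value;
        Console.WriteLine(matched);                    // or
        Console.WriteLine(dict.ContainsKey(matched));  // False
    }
}
```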
 
Solution
If the regular expressions don't depend on line boundaries, you can do this to replace all matching expressions in the text:
C#:
var input = InputTextbox.Text;
foreach (var entry in dict)
    input = Regex.Replace(input, entry.Key, entry.Value);
OutputTextbox.Text = input;
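A self-contained version of the same idea, for trying outside the UI. The four rules below are assumptions guessed from the sample input and output that appear later in the thread; the real dictionary is never shown. `ReplaceAllDemo` and `ReplaceAll` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ReplaceAllDemo
{
    // Applies every pattern/replacement pair to the whole text, feeding the
    // result of one replacement into the next.
    public static string ReplaceAll(string input, IDictionary<string, string> rules)
    {
        foreach (var entry in rules)
            input = Regex.Replace(input, entry.Key, entry.Value);
        return input;
    }

    public static void Main()
    {
        // Assumed rules (not taken from the original code).
        var dict = new Dictionary<string, string>
        {
            [@"\bor\b"] = "OR",
            [@"\band\b"] = "AND",
            [@"\bw/2\b"] = "W/2",
            ["\""] = "",          // strip double quotes
        };

        // The whole multi-line text is passed in as one string.
        string input = "david w/2 wilkins\ndavid or davie\nwilliam and williams\n\"Trevor Phillips\"";
        Console.WriteLine(ReplaceAll(input, dict));
    }
}
```

With these particular rules the order in which the dictionary is enumerated does not affect the result, because no replacement produces text that another pattern matches. If rules could interact, an ordered list of pairs would be a safer container than a dictionary.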

JohnH

C# Forum Moderator
Staff member
Joined
Apr 23, 2011
Messages
1,197
Location
Norway
Programming Experience
10+
It is only an array because you split the string to an array:
strInput.Text.Split('\r', '\n');
which is also pointless, since you could just get the Lines property from the textbox. But why do you need to process each line individually?
 

JohnH

Why do you still think you need to Regex.Match before Regex.Replace? Regex.Replace will do a match and replace if there is a match. Why add yet another match? Pointless, as I said: you're just adding CPU cycles that have no effect.
Regex.Replace Method said:
Returns: String
A new string that is identical to the input string, except that the replacement string takes the place of each matched string. If pattern is not matched in the current instance, the method returns the current instance unchanged.
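The quoted behaviour can be checked directly. A minimal sketch, with hypothetical names, comparing a guarded call against a plain `Regex.Replace` on a line that matches and a line that does not:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GuardDemo
{
    // Plain call: Replace does its own matching.
    public static string Unguarded(string line, string pattern, string replacement) =>
        Regex.Replace(line, pattern, replacement);

    // Guarded call: scans the input once for IsMatch and again for Replace.
    public static string Guarded(string line, string pattern, string replacement) =>
        Regex.IsMatch(line, pattern) ? Regex.Replace(line, pattern, replacement) : line;

    public static void Main()
    {
        string[] lines = { "david or davie", "david w/2 wilkins" };
        foreach (string line in lines)
        {
            string a = Unguarded(line, @"\bor\b", "OR");
            string b = Guarded(line, @"\bor\b", "OR");
            // Both approaches agree, whether or not the pattern matches.
            Console.WriteLine($"{a} | same result: {a == b}");
        }
    }
}
```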
 

ConsKa

JohnH said:
It is only an array because you split the string to an array.

which is also pointless, since you could just get the Lines property from textbox. But why do you need to process each line individually?
C#:
string[] tempArray = strInput.Lines;

I don't think this works in WPF. MSDN lists the Lines property under the WinForms namespace (System.Windows.Forms), and I get an error.

I know this is the C# General forum; I posted here because the question wasn't specific to WPF, it was more syntax related.
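If per-line access is still wanted where no Lines property is available, one option is to split the text on whole line endings rather than on individual characters. A minimal sketch; `SplitLines` is a hypothetical helper, and `text` stands in for something like `strInput.Text`:

```csharp
using System;

public static class LineSplitDemo
{
    // Splits on "\r\n" first so a Windows line ending counts as one break.
    public static string[] SplitLines(string text) =>
        text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

    public static void Main()
    {
        string text = "david or davie\r\nwilliam and williams";

        // Splitting on the two characters separately leaves an empty entry
        // between '\r' and '\n'.
        Console.WriteLine(text.Split('\r', '\n').Length); // 3

        Console.WriteLine(SplitLines(text).Length);       // 2
    }
}
```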

JohnH said:
Why do you still think you need to Regex.Match before Regex.Replace? Regex.Replace will do a match and replace if there is a match. Why add yet another match? Pointless as I said, you're just adding cpu cycles that has no effect.

I will look at it again, but it isn't simply a matter of removing the regex.match.

It may be down to where I have the output being gathered, I will look at it, but currently this:

C#:
foreach (var result in test)
{
    foreach (var item in dict)
    {
        outPut = Regex.Replace(result, item.Key, item.Value);
    }
    strOut.Text += outPut + Environment.NewLine;
}

put in:

david w/2 wilkins
david or davie
william and williams
"Trevor Phillips"

get back:

david w/2 wilkins
david or davie
william and williams
"Trevor Phillips"

If I wrap the outPut = Regex.Replace in an if statement:

C#:
if (Regex.IsMatch(result, item.Key))
{
    outPut = Regex.Replace(result, item.Key, item.Value);
}

put in:

david w/2 wilkins
david or davie
william and williams
"Trevor Phillips"

get back:

david W/2 wilkins
david OR davie
william AND williams
Trevor Phillips

Which matches my Regex.
 

ConsKa

I know why it is doing that: on the second pass through the dict loop, result is still the original line. Without the if statement, Regex.Replace looks at result again, finds nothing to replace for that key, and returns result unchanged, so outPut is overwritten with the original line.

It isn't changing result when it gets a match; it is changing outPut.

You can't make it result = Regex.Replace - because result is part of the foreach loop and it won't let you do that.
 

ConsKa

@JohnH Thank you for being patient...

Yes, I re-read everything you said, I felt I was missing something.

What you were trying to tell me, and I was too dense to get, is that I can just put the inputString in as a single string, with no need to foreach over each line.

Thank you for continuing to pound away.

Not only is that so much easier, but it is ridiculously fast.

Apologies it took me more than one read through.
 

JohnH

ConsKa said:
You can't make it result = Regex.Replace - because result is part of the foreach loop and it won't let you do that.

There is a simple solution for that, and it is called a variable :) Assign the loop item to a separate variable and work with that.

ConsKa said:
inputString in as a single string
Not only is that so much easier, but it is ridiculously fast.

Great!
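The "separate variable" suggestion can be sketched as follows. `Overwriting` mirrors the per-line loop from earlier in the thread, where every rule is applied to the untouched line; `Accumulating` copies the line into its own variable and feeds each result into the next rule. The names and the two rules are hypothetical, and an array of pairs is used instead of the dictionary so the rule order is fixed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LoopVariableDemo
{
    // Assumed rules, applied in this order.
    static readonly (string Pattern, string Replacement)[] Rules =
    {
        (@"\bor\b", "OR"),
        (@"\band\b", "AND"),
    };

    // Each pass starts again from the original line, so only the last rule's
    // result survives.
    public static string Overwriting(string line)
    {
        string outPut = line;
        foreach (var rule in Rules)
            outPut = Regex.Replace(line, rule.Pattern, rule.Replacement);
        return outPut;
    }

    // A working copy carries each replacement forward to the next rule.
    public static string Accumulating(string line)
    {
        string current = line;
        foreach (var rule in Rules)
            current = Regex.Replace(current, rule.Pattern, rule.Replacement);
        return current;
    }

    public static void Main()
    {
        Console.WriteLine(Overwriting("david or davie"));  // david or davie
        Console.WriteLine(Accumulating("david or davie")); // david OR davie
    }
}
```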
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
3,386
Location
Chesapeake, VA
Programming Experience
10+
Why are you doubling the work? First you call Match() and then if there is a match, you are calling Replace(). Knowing that Replace() won't change anything if there is no match, you can just call Replace() directly.
 

ConsKa

Skydiver said:
Why are you doubling the work? First you call Match() and then if there is a match, you are calling Replace(). Knowing that Replace() won't change anything if there is no match, you can just call Replace() directly.

Yeah, it took me a while to understand that you can pass in a string of 16,000 lines from the textbox and Regex.Replace will work just fine across all of it and give you the output you expect.

I just assumed you had to run each line through it for it to work, but nope, it is far better than that.
 