Resolved Regex Failures beyond my understanding

ConsKa

I am trying to match a word and replace with a capitalised version, and I don't understand why the word is not being seen?

I have used:

C#:
@"\bor\b", "OR"

Which I add to a dictionary as key and value, with the regex being the key.

My understanding, and all my reading seems to suggest, that this will search through a string looking for a non-word character, then an 'o' and then an 'r' and if it then finds a non-word character after it matches those two, it will consider that a full match.
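One nuance worth checking here: `\b` is a zero-width word boundary, so it does not consume a non-word character, and it also matches at the very start or end of the string. A minimal sketch using the phrase from this post, written as a top-level program (C# 9 or later is assumed):

```csharp
using System;
using System.Text.RegularExpressions;

string item = "ConsKa or Conske";

// \b asserts a position between a word character and a non-word character
// (or the start/end of the string); it matches no characters itself.
Match m = Regex.Match(item, @"\bor\b");

Console.WriteLine(m.Success); // True
Console.WriteLine(m.Index);   // 7
Console.WriteLine(m.Length);  // 2 (only "or", the spaces are not part of the match)

// No boundary on either side of "or" inside a longer word.
Console.WriteLine(Regex.IsMatch("daore", @"\bor\b")); // False

// Start and end of the string count as boundaries.
Console.WriteLine(Regex.IsMatch("or", @"\bor\b"));    // True
```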

I do a check to see if the key exists in the string so as to avoid an exception error:

C#:
foreach (KeyValuePair<string, string> entry in dict)
{
    if (item.Contains(entry.Key))
    {
        var outPut2 = Regex.Replace(item, string.Join("|", dict.Keys.Select(k => k.ToString()).ToArray()), m => dict[m.Value]);
        strOut.Text += outPut2 + Environment.NewLine;
    }
}

Item is the phrase: "ConsKa or Conske"

The loop does not match the @"\bor\b" key; it just skips right over it.

I don't really understand why it isn't matching?

There is no issue with the dictionary, nor the Linq, as those terms where I have not had to use Regex (characters that never appear in the middle of a word) are found and replaced with no issue. It is when I come to deal with characters that could appear in the middle of a word where I want to rely on the regex that this issue arises.
 
So as I do, I worked on this some more.

I thought maybe it was this failure:

C#:
if (item.Contains(entry.Key))

Changed the if statement by doing the following:

C#:
Regex foundWords = new Regex(@"\bor\b");

Match ted = foundWords.Match(item);
if (ted.Success)

This matches no problem, and enters into the loop.

However, the dictionary still says Key not found. Despite my entering the key exactly as I entered above for foundWords.
 
This:

C#:
Regex foundWords = new Regex(@"\bor\b");

Comes up colour coded, the \b is in bright pink, to indicate that it is a Regex I suspect, a bit of IntelliSense at work and it works.

But this:

C#:
dict.Add("\bor\b", "OR");

Is what is required for the dictionary key to recognise the regex.....why no @ sign? Why an @ sign sometimes, but not other times?

"string pattern = @"\w+ # Matches all the characters in a word.";"

From here:

 
The @ preceding a string literal indicates that it is a verbatim string literal. In a verbatim string literal, the backslash (\) character is treated as a literal character, rather than as an escape character. How you want backslashes treated is what decides whether you use the @ symbol or not. In the first code snippet in post #3, you want the backslash characters treated literally because it is the Regex itself that will turn them into escape characters when it parses the pattern provided. In the second code snippet, you want the string to contain '\b' characters so the backslashes need to be treated as escape characters. @"\bor\b" is equivalent to "\\bor\\b" and that's what we had to write before verbatim string literals were a thing. It makes code harder to read, which is why the new feature was introduced.
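The difference between the literal forms can be checked directly. The sketch below is only an illustration (the variable names are made up for it, and it is written as a top-level program, so C# 9 or later is assumed); it compares the three spellings and shows what the Regex engine does with each:

```csharp
using System;
using System.Text.RegularExpressions;

// Regular literal: the compiler turns each \b into a backspace character (U+0008).
string regular = "\bor\b";
// Verbatim literal: the backslashes are kept, so Regex sees \b (word boundary).
string verbatim = @"\bor\b";
// Escaped regular literal: the same six characters as the verbatim form.
string escaped = "\\bor\\b";

Console.WriteLine(regular.Length);         // 4
Console.WriteLine(verbatim.Length);        // 6
Console.WriteLine(verbatim == escaped);    // True
Console.WriteLine(regular[0] == '\u0008'); // True

// The verbatim pattern matches the whole word "or".
Console.WriteLine(Regex.IsMatch("dave or dave", verbatim)); // True
// The regular-literal pattern looks for backspace + "or" + backspace, which is not in the input.
Console.WriteLine(Regex.IsMatch("dave or dave", regular));  // False
```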
 
Thanks JMC, but I was aware of what the @ and \\ does for string literals.

My issue is that when you read on this stuff, the @ is usually included in the regex.

I have done a little testing and other regex appear to show an unrecognised character - if you do not include the @ or \\ before a \ for example:

"\-|\,"

Will throw an IntelliSense problem, and you need to @ this regex.

I am wondering whether \b is recognised by C# - whereas other regex are not. Now, the articles I am finding of people using this are like 9 years old, and they are saying No, it is not and you need to @ or \\ the regex \b - but things change.

I am going to put my code below, because I do not understand why it simply isn't working.

There are two problems:

1. It does not recognise "dave or dave" as containing a Key.

2. On the odd occasion when I get it to recognise that "dave or dave" has a key - by using different code, it doesn't change to the value and the output is "dave or dave"

C#:
Dictionary<string, string> dict = new Dictionary<string, string>();
dict.Add("\bor\b", "OR"); // I have tried @"\bor\b" and @"\\bor\\b" none of them work

string[] test = strInput.Text.Split('\r', '\n');

// the previous line creates entries of "" this removes them
test = test.Where(x => !string.IsNullOrEmpty(x.Trim())).ToArray();

foreach (var result in test) // result is dave or dave
{
    foreach (var entry in dict) // the entry appears as {[or, OR]}
    {
        if (result.Contains(entry.Key)) // entry.Key view = \bor\b in text visualiser or in html visualiser or
        {
            var outPut = Regex.Replace(result, entry.Key, entry.Value);
            strOut.Text += outPut + Environment.NewLine;
        }
    }
}

I can't even get this to enter the loop anymore. I do not know why, I am assuming that this: or is simply the way that VS visualises the test whitespace or whitespace - which is the regex expression.

Any help here? As I thought my understanding of the regex and of the test above was correct, and break pointing through it appears to show me exactly what I would expect to see. There is no error, it simply skips it as not containing the Key.

I have noticed that the code on the page has changed what I typed slightly.

"in text visualiser or"

The or here has a box on either side that I do not appear to be able to replicate.
 
Keep it simple when trying to figure things out:
C#:
var input = "dave or dave";
var pattern = @"\bor\b";
var replacement = "OR";
var output = Regex.Replace(input, pattern, replacement);
 
Yep that works, as I kind of expected it to.

What I don't understand is why putting that into a dictionary, so the pattern is the key, and the replacement is the value doesn't work.

Even simplifying it right down:

C#:
Dictionary<string, string> dict = new Dictionary<string, string>();
dict.Add(@"\bor\b", "OR"); // doing ("\\bor\\b", "OR") or ("\bor\b", "OR") makes no difference here

var output = Regex.Replace(input, dict.Keys.ToString(), dict.Values.ToString());

Break points show the key as \\bor\\b and the value as OR.

Output is dave or dave.

All the reading I have done seems to suggest the above should work.
 
What does dict.Keys.ToString() return?

By the way, actually using an entry Key/Value produces same result as in post 6. I would say 'of course', because a string is a string, and the argument to the Replace function is just a string.
C#:
var entry = dict.ElementAt(0);
var output = Regex.Replace(input, entry.Key, entry.Value);
 
This works:

C#:
foreach (var d in dict)
{
    var matches = Regex.Matches(input, d.Key);

    foreach (Match match in matches)
    {
        var output = Regex.Replace(input, match.Value, d.Value);
        strOut.Text = output;
    }
}

Which seems to suggest that Match can match a Regex in a dictionary, but Replace cannot match a Regex when in a dictionary?

That doesn't seem right, as it isn't like I just decided to do this in a dictionary, I did a lot of reading on people doing it in dictionaries.
 
Break points show the dict.Keys.ToString() as \\bor\\b

Seems to show that regardless of how you enter into the dictionary (@, \, \\)

The output is just dave or dave - as it doesn't recognise the or in the dict.Key part of the function, so doesn't do anything.

If I wrap it in an if statement - dave or dave contains dict.Key - then it just skips it as being false.
 
You're inspecting the wrong thing in your breakpoint. You may be inspecting dict, but recall that what you are passing into the Replace() call is dict.Keys.ToString(). Let's go see what dict.Keys.ToString() returns:
Test code:
C#:
var dict = new Dictionary<string, string>();
dict.Add(@"\bor\b", "OR");
Console.WriteLine(dict.Keys.ToString());

Output:
Code:
System.Collections.Generic.Dictionary`2+KeyCollection[System.String,System.String]

So how are you expecting "dave or dave" to match "System.Collections.Generic.Dictionary`2+KeyCollection[System.String,System.String]" ?
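If the intent was to combine every key into one alternation pattern, as the first post did, string.Join over dict.Keys produces the pattern text, whereas ToString() on the key collection only returns its type name. A minimal sketch, assuming a hypothetical second entry (\band\b) added purely for illustration and written as a top-level program (C# 9 or later):

```csharp
using System;
using System.Collections.Generic;

var dict = new Dictionary<string, string>
{
    [@"\bor\b"] = "OR",
    [@"\band\b"] = "AND", // hypothetical second entry, not from the thread
};

// ToString() on the key collection returns the collection's type name.
string typeName = dict.Keys.ToString();

// string.Join concatenates the keys themselves, separated by the regex alternation bar.
// Dictionary enumeration order is not guaranteed, so the order of the alternatives may vary.
string pattern = string.Join("|", dict.Keys);

Console.WriteLine(typeName);
Console.WriteLine(pattern);
```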
 
Which seems to suggest that Match can match a Regex in a dictionary, but Replace cannot match a Regex when in a dictionary?
No, the problem in post 5 is due to this:
if (result.Contains(entry.Key))
Key is \bor\b and result does not contain that.

Dictionary has nothing to do with this, the dictionary just stores the strings.
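To make the difference concrete, here is a small sketch (variable names chosen for illustration, written as a top-level program, so C# 9 or later is assumed) comparing a substring check with a regex check on the same key:

```csharp
using System;
using System.Text.RegularExpressions;

string result = "dave or dave";
string key = @"\bor\b";

// String.Contains searches for the six literal characters \bor\b as a substring.
Console.WriteLine(result.Contains(key));       // False

// Regex.IsMatch interprets the same string as a pattern.
Console.WriteLine(Regex.IsMatch(result, key)); // True
```

If a pre-check is wanted at all, Regex.IsMatch is the one that agrees with what Regex.Replace will do.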
Break points show the dict.Keys.ToString() as \\bor\\b
Then you're not looking in the right place.
Immediate Window said:
?dict.Keys.ToString()
"System.Collections.Generic.Dictionary`2+KeyCollection[System.String,System.String]"
Anyway, checking with Contains or Matches first is pointless, because if the regex doesn't match, Replace won't change anything either.
Assigning to strOut.Text inside the loop is also not a good idea, since only the last assignment (and its replacement) is shown.
 
If the regex patterns don't depend on line boundaries, you can do this to replace all matching expressions in the text:
C#:
var input = InputTextbox.Text;
foreach (var entry in dict)
    input = Regex.Replace(input, entry.Key, entry.Value);
OutputTextbox.Text = input;
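For anyone wanting to try this outside a form, a self-contained sketch of the same loop follows. The input text and the second dictionary entry are assumptions made up for the example, and it is written as a top-level program (C# 9 or later):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pattern -> replacement. The "and" rule is hypothetical, added only to show a second entry.
var dict = new Dictionary<string, string>
{
    [@"\bor\b"] = "OR",
    [@"\band\b"] = "AND",
};

// Two lines of text in one string; no splitting is needed.
string input = "daore or daore\r\ndave and dave";

// Apply each rule to the running result; patterns that do not match leave the text unchanged.
foreach (var entry in dict)
    input = Regex.Replace(input, entry.Key, entry.Value);

Console.WriteLine(input);
// daore OR daore
// dave AND dave
```

Because of the \b anchors, the "or" inside "daore" is left alone.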
 
Solution
Understood, but look back at the original code I wrote.

It doesn't have ToString().

I added that to get it to run as IntelliSense was saying you couldn't have dict.Value in the code that was developed without a ToString();

The reason it is wrapped in an if statement is that an exception pops up if the key is not found in the word (a "key doesn't exist" error), though I accept this is likely due to the way in which the loops were being written.

Lastly, yes, if you look at the original code it was += I didn't bother doing the += when we simplified it down to a single entry.

The problem with not using \b is:

daore or daore = daORe OR daORe

Which I want to hopefully avoid.

Honestly, I do not understand why this doesn't work:

C#:
foreach (var result in test)
{
    foreach (var entry in dict)
    {
        if (result.Contains(entry.Key))
        {
            var outPut = Regex.Replace(result, entry.Key, entry.Value.ToUpper());
            strOut.Text += outPut + Environment.NewLine;
        }
    }
}

Everything I have read says that should work. I added ToUpper() just in case it was replacing or with or and being case insensitive....it makes no difference, as it simply isn't finding a match.
 
Well, that's because result has the value "ConsKa or Conske", but on line 5 you are checking whether the literal text "\\bor\\b" is in it. Since Contains() is going to return false, lines 7-8 will not execute.
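Separately from the Contains() check, the "key doesn't exist" exception mentioned earlier is consistent with how the evaluator in the first post, m => dict[m.Value], does its lookup: m.Value is the matched text ("or"), while the dictionary key is the pattern (\bor\b). One possible way around that, sketched below under the assumption that every key is a plain word, is to key the dictionary by the word itself and build the pattern from the keys. The words dictionary and the "and" entry are hypothetical, and the code is written as a top-level program (C# 9 or later):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Keys are the literal words to replace, not regex patterns.
// The "and" entry is hypothetical, added only to show more than one rule.
var words = new Dictionary<string, string>
{
    ["or"] = "OR",
    ["and"] = "AND",
};

// Build a pattern of the form \b(?:or|and)\b, escaping each word in case it
// contains regex metacharacters.
string pattern = @"\b(?:" + string.Join("|", words.Keys.Select(Regex.Escape)) + @")\b";

string item = "ConsKa or Conske";

// m.Value is the matched text ("or"), which is a valid key in this dictionary.
string output = Regex.Replace(item, pattern, m => words[m.Value]);

Console.WriteLine(output); // ConsKa OR Conske
```

This does the replacement in a single pass; keys that start or end with non-word characters would need a different boundary check than \b.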
 