Regular expression for replacing particular symbols in text: ® and ™

Socks93

Active member | Joined: Oct 22, 2021 | Messages: 30 | Programming Experience: Beginner
Hi,

I want to create a regular expression that takes a string and, if the string contains the symbols ® or ™, wraps those symbols in a <sup> tag.

e.g. "hello® this is a test™"

Expected outcome: hello<sup>®</sup> this is a test<sup>™</sup>

Can anyone help?

Thanks
 
And how is this related to web services? Moving to general C#.
 
Why does it need to be a regex? Won't a simple loop and string replacement work?

Recall that, as concise as a regex may look, it actually has to build and run a whole state machine to do its work. Also, regular expressions are often hard to read, even for experienced programmers, unless you invest a lot of time parsing them or simply trust that the comments near the code are accurate (assuming someone documented the regular expression and what it does at all).
 
Anyway, here's the regular expression that you wanted:
C#:
[®™]
Simply match the characters that you want to surround.

Since you only asked for the regular expression, not the replacement string, here is the replacement string as well:
C#:
using System;
using System.Text.RegularExpressions;
                    
public class Program
{
    public static void Main()
    {
        var input = "hello® this is a test™";
        var output = Regex.Replace(input, "[®™]", "<sup>$1</sup>");
        
        Console.WriteLine(input);
        Console.WriteLine(output);
    }
}
 

@Skydiver thank you for the prompt response, and apologies for posting the question in the wrong forum!

I've used your code above; however, the output is:

hello<sup>$1</sup> this is a test<sup>$1</sup>

Shouldn't it be: hello<sup>®</sup> this is a test<sup>™</sup>?

Thanks
 
LOL! I missed the grouping parentheses since I was just typing from memory. You can't reference a group in the replacement string if the pattern never creates a capture group.

C#:
using System;
using System.Text.RegularExpressions;
                  
public class Program
{
    public static void Main()
    {
        var input = "hello® this is a test™";
        var output = Regex.Replace(input, "([®™])", "<sup>$1</sup>");
      
        Console.WriteLine(input);
        Console.WriteLine(output);
    }
}
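To see the difference the parentheses make, here is a small sketch that runs both patterns side by side. The class and method names (GroupDemo, WithoutGroup, WithGroup) are hypothetical. In .NET, a $number substitution that does not name a group defined in the pattern is copied into the result as literal text, which is why the first version printed $1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // No parentheses: the pattern defines no capture group 1, so .NET
    // copies "$1" into the result literally instead of substituting anything.
    public static string WithoutGroup(string input) =>
        Regex.Replace(input, "[®™]", "<sup>$1</sup>");

    // Parentheses create capture group 1, so "$1" becomes the matched symbol.
    public static string WithGroup(string input) =>
        Regex.Replace(input, "([®™])", "<sup>$1</sup>");

    public static void Main()
    {
        var input = "hello® this is a test™";
        Console.WriteLine(WithoutGroup(input)); // hello<sup>$1</sup> this is a test<sup>$1</sup>
        Console.WriteLine(WithGroup(input));    // hello<sup>®</sup> this is a test<sup>™</sup>
    }
}
```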
 

Lol, thanks, it works fine now, dude. Thank you! :)
 
You're welcome.

Part of forum etiquette is not to quote a huge post and then add just a one-liner. Just post your message. If there is something specific you want to talk about, quote only that section.
 
You could also have used $0 in this case, i.e. the entire match, without parentheses in the pattern.
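A minimal sketch of that suggestion, using the same input string; WholeMatchDemo and Superscript are hypothetical names. $0 (or its synonym $&) stands for the whole match, so the character class alone is enough and no capture group is needed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeMatchDemo
{
    // "$0" substitutes the entire match, so each ® or ™ matched by the
    // character class is wrapped as-is, without any parentheses in the pattern.
    public static string Superscript(string input) =>
        Regex.Replace(input, "[®™]", "<sup>$0</sup>");

    public static void Main()
    {
        Console.WriteLine(Superscript("hello® this is a test™"));
        // hello<sup>®</sup> this is a test<sup>™</sup>
    }
}
```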
 
Quite possible. Worth trying. This is why I mentioned that regexes tend to require extra brainpower to read, understand, and reason about.
 
In the spirit of not using a regular expression:
C#:
using System;
using System.Text;
                   
public class Program
{
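    // Copies input into a StringBuilder, surrounding every character found in
    // chars with prefix and suffix. IndexOfAny jumps straight to the next match,
    // so the unmatched runs in between are appended as whole substrings.
    public static string WrapCharacters(char [] chars, string prefix, string suffix, string input)
    {
        int length = input.Length;
        var sb = new StringBuilder();
        int lastIndex = 0;   // start of the text not yet copied
        int index = 0;       // position of the next character to wrap
        while (lastIndex < length && (index = input.IndexOfAny(chars, lastIndex)) >= 0)
        {
            sb.Append(input[lastIndex..index]);               // text before the match
            sb.Append($"{prefix}{input[index]}{suffix}");     // the wrapped character
            lastIndex = index + 1;
        }
        sb.Append(input.Substring(lastIndex, length - lastIndex)); // remaining tail
        return sb.ToString();
    }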

    public static string SuperScriptCharacters(char [] chars, in string input)
        => WrapCharacters(chars, "<sup>", "</sup>", input);

    public static void Main()
    {
        char [] superScriptChars = { '®', '™' };
        var input = "hello® this is a test™";
        Console.WriteLine(input);
        Console.WriteLine(SuperScriptCharacters(superScriptChars, input));       
    }
}
 
Or, a much simpler version of WrapCharacters() from post #12 that is easier to understand:
C#:
public static string WrapCharacters(char [] chars, string prefix, string suffix, string input)
{
    foreach(char ch in chars)
        input = input.Replace($"{ch}", $"{prefix}{ch}{suffix}");
    return input;
}
 
For comparison, which of the following bits of code would you rather look at when it's 3 AM and you are trying to track down some mysterious bug that is mangling your strings?

C#:
static string WrapUsingReplace(string input)
{
    char [] superScriptChars = { '®', '™' };
    foreach(char ch in superScriptChars)
        input = input.Replace($"{ch}", $"<sup>{ch}</sup>");
    return input;
}

static string WrapUsingRegularExpressions(string input)
{
    return Regex.Replace(input, "[®™]", "<sup>$0</sup>");
}
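Before weighing readability or speed, it is worth confirming that the two approaches really are interchangeable. The sketch below re-declares both functions inside a hypothetical CompareWrappers class so it compiles on its own, and runs them over a few hypothetical edge-case inputs (empty string, no symbols, adjacent symbols).

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompareWrappers
{
    static readonly char[] SuperScriptChars = { '®', '™' };

    // Replace-based version: one string.Replace call per symbol.
    public static string WrapUsingReplace(string input)
    {
        foreach (char ch in SuperScriptChars)
            input = input.Replace($"{ch}", $"<sup>{ch}</sup>");
        return input;
    }

    // Regex version: $0 is the whole match.
    public static string WrapUsingRegularExpressions(string input) =>
        Regex.Replace(input, "[®™]", "<sup>$0</sup>");

    public static void Main()
    {
        // Hypothetical sample inputs covering a few edge cases.
        string[] samples = { "", "no symbols", "hello® this is a test™", "®™®" };
        foreach (var s in samples)
        {
            var a = WrapUsingReplace(s);
            var b = WrapUsingRegularExpressions(s);
            Console.WriteLine($"{(a == b ? "same" : "DIFFERENT")}: \"{a}\"");
        }
    }
}
```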
 
My only real beef with the first is the number of string allocations it requires.


As for which I'd choose, it may depend on the performance targets of the production system:

[Image: benchmark results comparing the two methods]

Replace is a lot faster in this instance, but burns far more memory on string allocations.

If both were critical, I'd maybe look at a StringBuilder, Span, or some other technique.
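One possible shape for the StringBuilder option, sketched with a hypothetical SinglePassWrapper class and not benchmarked: return the input untouched when it contains neither symbol, otherwise copy it into a StringBuilder in a single pass. The extra capacity of 32 characters is an arbitrary guess, not a tuned value.

```csharp
using System;
using System.Text;

public static class SinglePassWrapper
{
    static readonly char[] Symbols = { '®', '™' };

    // Wraps each ® or ™ in <sup>...</sup> in a single pass over the input.
    public static string Superscript(string input)
    {
        int first = input.IndexOfAny(Symbols);
        if (first < 0)
            return input; // fast path: nothing to wrap, no new string

        // Each wrapped symbol adds 11 characters ("<sup>" + "</sup>");
        // reserve some headroom (arbitrary) so the builder rarely has to grow.
        var sb = new StringBuilder(input.Length + 32);
        sb.Append(input, 0, first); // copy everything before the first symbol

        for (int i = first; i < input.Length; i++)
        {
            char ch = input[i];
            if (ch == '®' || ch == '™')
                sb.Append("<sup>").Append(ch).Append("</sup>");
            else
                sb.Append(ch);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Superscript("hello® this is a test™"));
        Console.WriteLine(Superscript("plain text"));
    }
}
```

Compared with the Replace loop, this produces one result string per call (plus the builder's internal buffer) instead of up to one new string per symbol type; whether it actually beats the regex version would need to be measured. A Span-based approach such as string.Create could avoid the builder as well, at the cost of more code.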
 
