Tokenize a string

Sarkazm

New member · Joined Sep 10, 2017 · Messages: 2 · Programming Experience: Beginner
I'm trying to get a driver to accept user input and then print each word and each delimiter on its own line, leaving out spaces, like this:

User Input:
"The quick, brown!fox jumps+over"

Output:
1. The
2. quick
3. ,
4. brown
5. !
6. fox
7. jumps
8. +
9. over

The code I have works with a hardcoded string, except that it leaves out all of the delimiter characters (, + and so on). When I try it with user input, it doesn't output anything at all.
What I need help with:
1) Understanding why user input doesn't produce output (there's some kind of endless loop that doesn't affect the hardcoded string, which I can't figure out)
2) Understanding why the delimiter characters aren't being output at all

Any other suggestions that might make this a bit cleaner are appreciated. I'm also trying to make the top line that prompts for input red and the rest of the text blue. After the user enters a string and the output is displayed, I want a top line that says something simple like "Output". I know this can be done by setting the red font where I want it, following that with the blue font, and then doing the same again on the output screen. I just didn't know if there's a cleaner way to do that without repeating the code. It's not a must, just something I'm curious about as a way to make things cleaner and simpler.
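
On the color question, one way to avoid repeating the set-color lines is a small helper method. This is only a minimal sketch: `ConsoleHelper` and `WriteLineInColor` are hypothetical names, not part of the posted project.

```csharp
using System;

public static class ConsoleHelper
{
    // Hypothetical helper: writes one line in the given color,
    // then restores whatever foreground color was active before.
    public static void WriteLineInColor(string text, ConsoleColor color)
    {
        ConsoleColor previous = Console.ForegroundColor;
        Console.ForegroundColor = color;
        Console.WriteLine(text);
        Console.ForegroundColor = previous;
    }
}
```

With this in place, the driver could set blue once and call `ConsoleHelper.WriteLineInColor("Output", ConsoleColor.Red);` for each heading line.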

Driver.cs
using System;

namespace TokenizerProject
{
    class Driver
    {
 
        public static void Main()
        {
            Setup();
            String startingString;
            String delimiters = "!@#$%^&*()-=_+{}|\\][:\"';?><,./ ";
 
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine("Please enter text to be processed:\n");
 
            Console.ForegroundColor = ConsoleColor.Blue;
            startingString = Console.ReadLine();
 
            Console.WriteLine(startingString);
            Console.Clear();
            //Console.ReadKey();
 
            //startingString = "Sarah, Jonathan, and Ross went to the game;";
 
 
            String[] tokens = Tools.Tokenize(startingString, delimiters);
            PrintTokens(tokens);
 
            Console.ReadLine();
        }
 
        public static void PrintTokens(String[] tokens)
        {
            int i = 1;
 
            foreach (String token in tokens)
            {
                Console.WriteLine(i + ".    " + token);
                i++;
            }
        }
 
        public static void Setup()
        {
            Console.Title = "String tokenizer";
            Console.BackgroundColor = ConsoleColor.White;
            Console.Clear();
        }
    }
}


Tools.cs
using System;
using System.Collections.Generic;

namespace TokenizerProject
{
    public static class Tools
    {
 
        public static String[] Tokenize(string line, string delims)
        {
            Char[] delimiters = delims.ToCharArray();
            String trimmedLine = line.Trim();
            int stringLength = trimmedLine.Length;
            List<String> tokenizedList = new List<string>();
 
            String subString;
            int startPos = 0;
            int endPos = 0;
            string[] tokens;
 
            while (startPos < stringLength)
            {
                endPos = trimmedLine.IndexOfAny(delimiters, startPos);
                if (endPos - startPos > 0)
                {
                    subString = trimmedLine.Substring(startPos, endPos - startPos);
                    tokenizedList.Add(subString);
                }
                else if (endPos - startPos == 0)
                {
                    subString = trimmedLine.Substring(startPos, 1);
                    tokenizedList.Add(subString);
                }
                startPos = endPos + 1;
            }
 
            tokens = tokenizedList.ToArray();
 
            return tokens;
 
        }
 
 
    }
}
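
A note on the two symptoms, offered as a reading of the posted loop rather than a definitive diagnosis. `IndexOfAny` returns -1 when no delimiter occurs at or after the start index. In that case `endPos - startPos` is negative, neither branch runs, and `startPos = endPos + 1` sets `startPos` back to 0, so the scan starts over from the beginning. The hardcoded string ends in `;`, so it never reaches that case, while the typed input ends in the word "over". The missing delimiters come from the same line: `startPos = endPos + 1` steps past the delimiter that ended the word, so it is only added when it happens to sit exactly at `startPos`. The sketch below isolates that behavior; `IndexOfAnyDemo` and its method names are hypothetical.

```csharp
using System;

public static class IndexOfAnyDemo
{
    // Index of the next delimiter at or after startPos, or -1 if there is none.
    public static int NextDelimiter(string line, string delims, int startPos)
    {
        return line.IndexOfAny(delims.ToCharArray(), startPos);
    }

    // Mirrors the update step of the posted loop: startPos = endPos + 1.
    public static int NextStart(string line, string delims, int startPos)
    {
        int endPos = NextDelimiter(line, delims, startPos);
        // When endPos is -1 this returns 0, i.e. back to the start of the line.
        return endPos + 1;
    }
}
```

For the sample input, `NextDelimiter` called at the position of "over" gives -1, and `NextStart` gives 0.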
 
Here is a regex suggestion. It matches either:
  • a single character that is NOT a letter (including international letters), a number, or a space
  • one or more letters/numbers
// Requires: using System; using System.Text.RegularExpressions;
var input = "The quick, brown!fox jumps+over";
var pattern = "[^\\p{L}\\p{N} ]|[\\p{L}\\p{N}]+";
foreach (Match token in Regex.Matches(input, pattern)) {
    Console.WriteLine(token.Value);
}
 
I'm not permitted to use Regex because it's not something we've 'officially' covered, and we were told that we can't use Split because it won't work for this.

What we were told: The Split method in the String class will not work for this purpose because it discards the delimiters it finds. You will need to write your own similar method using methods and properties of the String class such as Substring, IndexOf, IndexOfAny, IsNullOrEmpty, Empty, PadLeft, PadRight, Remove, Trim, and so forth.
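
Within those restrictions, a tokenizer could look roughly like the sketch below. It uses only `IndexOfAny`, `Substring`, and `IsNullOrEmpty`, keeps each non-space delimiter as its own token, and treats "no delimiter left" as the final word. This is one possible approach, not the assignment's reference solution, and `TokenizerSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class TokenizerSketch
{
    public static string[] Tokenize(string line, string delims)
    {
        var tokens = new List<string>();
        if (string.IsNullOrEmpty(line))
        {
            return tokens.ToArray();
        }

        char[] delimiters = delims.ToCharArray();
        int startPos = 0;

        while (startPos < line.Length)
        {
            int endPos = line.IndexOfAny(delimiters, startPos);

            // No delimiter left: the rest of the line is the final word.
            if (endPos == -1)
            {
                tokens.Add(line.Substring(startPos));
                break;
            }

            // Any text between startPos and the delimiter is a word.
            if (endPos > startPos)
            {
                tokens.Add(line.Substring(startPos, endPos - startPos));
            }

            // Keep the delimiter itself as a token unless it is a space.
            string delimiter = line.Substring(endPos, 1);
            if (delimiter != " ")
            {
                tokens.Add(delimiter);
            }

            startPos = endPos + 1;
        }

        return tokens.ToArray();
    }
}
```

The space must be part of `delims` for words to be separated on it; it is matched like any other delimiter but never added to the result.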
 
Sorry to hear that, but you should have explained these restrictions in the first post. It's not my cup of tea unfortunately, but others may feel compelled to look into your code :)
 
Your delimiter(s) is a single string, so it will try to match that exact string. What you want is a character array, so it will match any character in that array. You can also greatly simplify your Tokenize method: you are doing a lot of work that just doesn't need to be done (plus it won't even work). Try something like this for your Driver class code. I removed some of the console input stuff, so put that back in:

C#:
String startingString = "The quick, brown!fox jumps+over";
char[] delimiters =
    new char[] { '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '-', '=', '_', '+', '{', '}', '|', '\\', ']', '[', ':', '"', '\'', ';', '?', '>', '<', ',', '.', '/' };

Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine("Please enter text to be processed:\n");

String[] tokens = Tokenize(startingString, delimiters);
PrintTokens(tokens);

Console.ReadLine();

Then your Tokenize method would just look like this:

C#:
public static String[] Tokenize(string line, char[] delims)
{
    return line.Split(delims);
}
 
"Your delimiter(s) is a single string so it will try to match that exact string."
No, delims.ToCharArray() is called.
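
To illustrate the difference, here is a small sketch (the class and method names are hypothetical). Passing `delims.ToCharArray()` to `IndexOfAny` matches any one of the characters, whereas `IndexOf` with the string itself looks for the whole string as one exact substring.

```csharp
using System;

public static class DelimiterMatchDemo
{
    // Index of the first character of line that appears anywhere in delims, or -1.
    public static int FirstAnyOf(string line, string delims)
    {
        return line.IndexOfAny(delims.ToCharArray());
    }

    // Index of the first occurrence of delims as one exact substring, or -1.
    public static int FirstExact(string line, string delims)
    {
        return line.IndexOf(delims, StringComparison.Ordinal);
    }
}
```

For the line "brown!fox" and the delimiters ",!+", `FirstAnyOf` finds the `!` while `FirstExact` finds nothing.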

"The Split method in the String class will not work for this purpose because it discards the delimiters it finds."
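
A quick way to see this, as a minimal sketch with a hypothetical wrapper class: `Split` returns only the text between delimiters, so the delimiters never appear in the result, and adjacent delimiters such as ", " produce an empty entry.

```csharp
using System;

public static class SplitDemo
{
    // Thin wrapper around String.Split, for illustration only.
    public static string[] SplitOnDelimiters(string line, char[] delims)
    {
        return line.Split(delims);
    }
}
```

Splitting "The quick, brown!fox" on `,`, `!`, and space gives the words plus one empty string, with no `,` or `!` tokens.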
 