How to determine the ASCII code for a character

peterhw

Member
Joined
Jan 29, 2019
Messages
8
Location
Scotland
Programming Experience
10+
I'm not sure the question is phrased quite correctly, but I have a 'non-printing' character in a string. If I look at the string in Excel, the character has a numeric code of 160 (=CODE(mychar)).

Using 160 doesn't appear to work in C# (Visual Studio).
Equally, statements like the following don't isolate the character:

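C#:
// Regex.Replace returns a new string, so the result must be assigned back to line.
// Neither pattern replaced the character.
line = Regex.Replace(line, @"\s+", "^");    // \s: any whitespace
line = Regex.Replace(line, @"\p{Z}", "^");  // \p{Z}: any Unicode separator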

I have managed to isolate the character by taking the character at its known position in the string.
If I then cast it to an integer, I get a value of 65533. Working backwards from that, I eventually succeeded with the following and now see the '^' character in my string:


C#:
char mychar = (char)65533;          // U+FFFD, the Unicode replacement character
line = line.Replace(mychar, '^');
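To answer the title question more generally: a C# `char` converts implicitly to `int`, which gives its UTF-16 code unit (65533 here, 160 for a non-breaking space). The sketch below is a console program with top-level statements (C# 9 or later); the helper name `NonAsciiChars` and the sample line are hypothetical. It lists every character outside printable ASCII with its position and code, so you don't need to know the position in advance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample: "Date:" + U+FFFD + "31/12/2018", as seen in the file.
string line = "Date:\uFFFD31/12/2018";

foreach (var (index, value) in NonAsciiChars(line))
{
    // Prints: index 5: U+FFFD (65533)
    Console.WriteLine($"index {index}: U+{value:X4} ({value})");
}

// Returns the position and numeric code of every char outside printable ASCII (0x20..0x7E).
static List<(int Index, int Code)> NonAsciiChars(string s)
{
    var result = new List<(int Index, int Code)>();
    for (int i = 0; i < s.Length; i++)
    {
        int unit = s[i];            // implicit char -> int conversion yields the UTF-16 code unit
        if (unit < 0x20 || unit > 0x7E)
            result.Add((i, unit));
    }
    return result;
}
```

Characters outside the Basic Multilingual Plane are stored as two UTF-16 code units, so they would be reported as two entries.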

But... what is this code, and why does Regex not match it as whitespace?
 

JohnH

C# Forum Moderator
Staff member
Joined
Apr 23, 2011
Messages
868
Location
Norway
Programming Experience
10+
There seems to be something mixed up: character 160 is the non-breaking space, and it does match the regex \s. It also converts to and from the numeric value 160.
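A quick check of that claim, as a sketch (console program with top-level statements, C# 9 or later; the sample line is made up to mirror the date in the thread). In .NET, `\s` includes the Unicode space separators, so U+00A0 is matched by both `\s` and `\p{Z}`, and `char.IsWhiteSpace` agrees.

```csharp
using System;
using System.Text.RegularExpressions;

char nbsp = (char)160;             // U+00A0 NO-BREAK SPACE
int code = nbsp;                   // converts back to 160
string line = "Date:" + nbsp + "31/12/2018";

Console.WriteLine(code);                                // 160
Console.WriteLine(char.IsWhiteSpace(nbsp));             // True
Console.WriteLine(Regex.Replace(line, @"\s+", "^"));    // Date:^31/12/2018
Console.WriteLine(Regex.Replace(line, @"\p{Z}", "^"));  // Date:^31/12/2018
```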

65533 (hex FFFD) is the replacement character substituted for a value that can't be decoded; you typically see <?> symbols when text is read with the wrong encoding.
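How 160 can turn into 65533, sketched under an assumption: that the file stores the non-breaking space as the single byte 0xA0, as ISO-8859-1 and Windows-1252 do. Decoded as UTF-8, that lone byte is invalid and becomes U+FFFD, which Unicode classifies as a symbol rather than whitespace, so `\s` cannot match it. The byte values below are a made-up sample, not taken from the actual bank file.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Assumed bytes: "Date:" + 0xA0 + "31", as a single-byte (Latin-1 style) export would store them.
byte[] bytes = { 0x44, 0x61, 0x74, 0x65, 0x3A, 0xA0, 0x33, 0x31 };

string asUtf8 = Encoding.UTF8.GetString(bytes);                         // wrong guess
string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(bytes);  // matching encoding

Console.WriteLine((int)asUtf8[5]);    // 65533: 0xA0 on its own is not valid UTF-8
Console.WriteLine((int)asLatin1[5]);  // 160: decoded as a no-break space

// U+FFFD is a symbol, not whitespace, so \s skips it.
Console.WriteLine(char.GetUnicodeCategory(asUtf8[5]));  // OtherSymbol
Console.WriteLine(Regex.IsMatch(asUtf8, @"\s"));        // False
Console.WriteLine(Regex.IsMatch(asLatin1, @"\s"));      // True
```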
 

peterhw

A non-breaking space

John,
Many thanks for the quick response.


Looks like a ? mark inside a diamond - like this - Date:?31/12/2018 (this showed as white question mark inside black diamond before posting)
The data is a text file output from a well known banking organisation (in the UK / Europe)

Obviously, now that I know what it is, I can remove it, but I'm puzzled.

You state that character 160 is a non-breaking space, so I would have expected \s to remove it as whitespace. Is there any other regex combination that removes or replaces such a character?

What is the best way to deal with such a character?

Thanks again
 

JohnH

\s would be suitable for nbsp (160), or \xa0 to target that character specifically. But since you can match 65533, the problem seems to be that you are reading the text file with the wrong encoding and getting replacement characters in the text.
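A sketch contrasting the two cases (the sample strings are made up; console program with top-level statements, C# 9 or later). On correctly decoded text, `\xa0` replaces only the no-break space and leaves ordinary spaces alone. On mis-decoded text the only thing left to match is `\uFFFD`, which also stands for any other byte that failed to decode, so matching it is a workaround rather than a fix.

```csharp
using System;
using System.Text.RegularExpressions;

string good = "Date:\u00A031/12/2018";  // decoded with the right encoding
string bad = "Date:\uFFFD31/12/2018";   // decoded as UTF-8 by mistake

// \xa0 (equivalently \u00A0) targets the no-break space only.
Console.WriteLine(Regex.Replace(good, @"\xa0", "^"));   // Date:^31/12/2018

// The original byte is already lost here; \uFFFD matches the replacement character itself.
Console.WriteLine(Regex.Replace(bad, @"\uFFFD", "^"));  // Date:^31/12/2018

// \s leaves the mis-decoded string unchanged, because U+FFFD is not whitespace.
Console.WriteLine(Regex.Replace(bad, @"\s", "^") == bad);  // True
```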
 

peterhw


Many thanks again, but how should I read the file correctly, and how would I know which encoding to use?
I read the file with a StreamReader, using the code below:
C#:
            OpenFileDialog myNewFileDialog = new OpenFileDialog();
            myNewFileDialog.InitialDirectory = "E:\\MyFiles";               // where to start from
            myNewFileDialog.FileName = "To_31_Dec_2018_TextFile.txt";       // default file name
            myNewFileDialog.Filter = "txt files (*.txt)|*.txt";              // optional filter
            myNewFileDialog.ShowDialog();
            string myFname = myNewFileDialog.FileName;
            // File.OpenText assumes the file is UTF-8 encoded
            using (StreamReader sr = File.OpenText(myFname))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    line = line.Replace(": ", "");
                    // ... rest of the processing (not shown)
                }
            }
 

JohnH

File.OpenText "Opens an existing UTF-8 encoded text file for reading". If the file is not UTF-8, you should first find out which encoding it uses. Sometimes you can try Encoding.Default (the system's active code page); otherwise you need to know which encoding to use. File.ReadAllText/ReadAllLines can be used to read a text file with a given encoding.
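One way to act on this, as a heuristic sketch rather than a reliable detector: attempt a strict UTF-8 read (a `UTF8Encoding` constructed with `throwOnInvalidBytes: true` throws `DecoderFallbackException` instead of inserting U+FFFD) and fall back to ISO-8859-1 if that fails. The helper name `ReadLinesGuessingEncoding` and the ISO-8859-1 fallback are assumptions. A bank export could equally be Windows-1252, which agrees with ISO-8859-1 for 0xA0 but, on .NET Core and .NET 5+, needs `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` before `Encoding.GetEncoding(1252)` works. Note also that `Encoding.Default` is the system's ANSI code page only on .NET Framework; on .NET Core and .NET 5+ it is UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

// Demo: a temp file holding "Date:" + 0xA0 + "31/12/2018", as a single-byte export might.
string path = Path.GetTempFileName();
File.WriteAllBytes(path, Encoding.GetEncoding("iso-8859-1").GetBytes("Date:\u00A031/12/2018"));

foreach (string line in ReadLinesGuessingEncoding(path))
{
    Console.WriteLine((int)line[5]);  // 160, not 65533
}
File.Delete(path);

// Hypothetical helper: strict UTF-8 first, ISO-8859-1 if the bytes are not valid UTF-8.
static string[] ReadLinesGuessingEncoding(string path)
{
    // throwOnInvalidBytes: true makes invalid input throw instead of producing U+FFFD.
    var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
    try
    {
        return File.ReadAllLines(path, strictUtf8);
    }
    catch (DecoderFallbackException)
    {
        // Not valid UTF-8: assume a single-byte code page, where 0xA0 maps to U+00A0.
        return File.ReadAllLines(path, Encoding.GetEncoding("iso-8859-1"));
    }
}
```

A file that happens to be valid UTF-8 is read as UTF-8, so this cannot prove which encoding was intended; if the bank documents the export format, passing that encoding directly to `File.ReadAllLines` (or to the `StreamReader` constructor instead of `File.OpenText`) is the safer choice.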
 

peterhw


Many thanks, I'll have a 'play' with this option.
 