Question Unicode (utf16) to win-1251

ant555

New member
Joined
Jun 28, 2021
Messages
4
Programming Experience
Beginner
Hi.
I'm trying to decode a string from Unicode (UTF-16) to Windows-1251.
C#:
string b = "Ó382ÍÎ76";
Encoding win1251 = Encoding.GetEncoding(1251);
byte[] uniByte = Encoding.Unicode.GetBytes(b);
textBox1.Text = win1251.GetString(uniByte);
The Microsoft docs say that "GetString(Byte[]): When overridden in a derived class, decodes all the bytes in the specified byte array into a string," but I get only the first character of the b variable, or a single character when I specify an index.
What am I doing wrong?
 

jmcilhinney

C# Forum Moderator
Staff member
Joined
Apr 23, 2011
Messages
4,060
Location
Sydney, Australia
Programming Experience
10+
If the bytes you provide as the argument contain a representation of a null character, then whatever consumes the string may only use the characters up to that null. The other characters will still be in the string object but won't be displayed. As a test, try calling GetChars instead and then examine each character in the result to see if there's a null character in there.
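The test suggested above could look roughly like the sketch below. It assumes the string from the first post, and it assumes the code page provider has to be registered, which is the case on .NET Core and .NET 5+ (on .NET Framework that line can normally be dropped). The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

static class NullCharCheck
{
    // Hypothetical helper: encodes the text as UTF-16LE, then decodes those
    // bytes as Windows-1251, which is what the code in the first post does.
    public static char[] DecodeUtf16BytesAs1251(string text)
    {
        // Assumption: required on .NET Core / .NET 5+ to make code page 1251 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding win1251 = Encoding.GetEncoding(1251);
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(text);
        return win1251.GetChars(utf16Bytes);
    }

    public static void Main()
    {
        // "\u00D3382\u00CD\u00CE76" is "Ó382ÍÎ76" written with escapes.
        char[] chars = DecodeUtf16BytesAs1251("\u00D3382\u00CD\u00CE76");
        for (int i = 0; i < chars.Length; i++)
        {
            // Print code points rather than glyphs so null characters are visible.
            Console.WriteLine($"{i}: U+{(int)chars[i]:X4}");
        }
    }
}
```

Every second entry comes out as U+0000, because each 00 byte in the UTF-16 data decodes to a null character in Windows-1251.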
 

ant555

C#:
string b = "Ó382ÍÎ76";
Encoding win1251 = Encoding.GetEncoding(1251);
byte[] uniByte = Encoding.Unicode.GetBytes(b);
textBox1.Text = BitConverter.ToString(uniByte);
I get D3-00-33-00-38-00-32-00-CD-00-CE-00-37-00-36-00 - there is no null character.
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
3,555
Location
Chesapeake, VA
Programming Experience
10+
The problem is that in this line you are passing in the wrong input:
C#:
win1251.GetString(uniByte);
The call to GetString() expects bytes that are encoded in Windows-1251, but you are passing in bytes encoded as UTF-16 by Encoding.Unicode.GetBytes() on the line before.
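For comparison, here is a small sketch in which the bytes passed to GetString() really are Windows-1251 bytes, because the same encoding produced them. The Cyrillic sample string is the value the OP says they expect. The class and method names are hypothetical, and the RegisterProvider call is an assumption that applies to .NET Core and .NET 5+.

```csharp
using System;
using System.Text;

static class MatchingEncodings
{
    // Hypothetical helper: string -> Windows-1251 bytes.
    public static byte[] Encode1251(string text)
    {
        // Assumption: required on .NET Core / .NET 5+; harmless if repeated.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1251).GetBytes(text);
    }

    // Hypothetical helper: Windows-1251 bytes -> string.
    public static string Decode1251(byte[] bytes)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1251).GetString(bytes);
    }

    public static void Main()
    {
        string cyrillic = "\u0423382\u041D\u041E76"; // "У382НО76"
        byte[] bytes = Encode1251(cyrillic);
        Console.WriteLine(BitConverter.ToString(bytes)); // D3-33-38-32-CD-CE-37-36
        Console.WriteLine(Decode1251(bytes) == cyrillic); // True
    }
}
```

Encoding and decoding with the same Encoding object is what keeps the round trip lossless; the UTF-16 bytes never enter the picture.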
 

ant555

So I am getting the first character decoded correctly? I thought that in this case I would get a wrong character rather than the one I expected.

I have also tried the Encoding.Convert method. It returns the whole array but with the wrong values: O382II76. The correct value is У382НО76.

C#:
string b = "Ó382ÍÎ76";
Encoding win1251 = Encoding.GetEncoding(1251);
byte[] uniByte = Encoding.Unicode.GetBytes(b);
byte[] win1251Bytes = Encoding.Convert(Encoding.Unicode, win1251, uniByte, 0, uniByte.Length);
textBox1.Text = win1251.GetString(win1251Bytes);
 

jmcilhinney

I missed it the first time, but Skydiver is right. What are you actually expecting the output to be? The idea with encodings is that the same characters are represented by different binary values. You start with a string, so you already have the characters. What are you actually trying to achieve? The UTF-16 encoding is completely irrelevant in this case. The only way it could be relevant is if you were starting with bytes in one encoding and wanted to end with bytes in the other. If you're starting with a string and ending with a string, then what possible use could you have for two encodings? This:
"I'm trying to decode string from unicode (utf-16) to windows-1251."
is a nonsensical statement. Instead of this contrived example that doesn't demonstrate anything useful, please explain what you're actually trying to achieve. Are you trying to save text to a file in a specific encoding? Are you trying to read a file that is in a specific encoding? Converting a string from one encoding to another is nonsense because ALL .NET strings are Unicode.
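To illustrate the file scenarios mentioned above, here is a minimal sketch in which the encoding only matters at the point where the string becomes bytes on disk and back. The helper name and the temp file are assumptions made for the example, as is the RegisterProvider call (needed on .NET Core and .NET 5+).

```csharp
using System;
using System.IO;
using System.Text;

static class FileEncodingDemo
{
    // Hypothetical helper: writes text as Windows-1251 bytes and reads it back.
    public static string RoundTripThroughFile(string text, string path)
    {
        // Assumption: required on .NET Core / .NET 5+ to make code page 1251 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding win1251 = Encoding.GetEncoding(1251);

        // The string itself is always UTF-16 in memory; the encoding
        // only decides which bytes end up in the file.
        File.WriteAllText(path, text, win1251);
        return File.ReadAllText(path, win1251);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        string text = "\u0423382\u041D\u041E76"; // "У382НО76"
        string readBack = RoundTripThroughFile(text, path);
        Console.WriteLine(readBack == text);
        Console.WriteLine(new FileInfo(path).Length); // one byte per character
        File.Delete(path);
    }
}
```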
 

NoUserHere

Well-known member
Joined
Sep 5, 2018
Messages
2,138
Programming Experience
10+
Mother May... why is this under VS.Net?

And why are you encoding twice?

What do you want to do with b?

What is your aim with line 2? Explaining each line of your code and what you hope to achieve on each line would be helpful.

Would it not be better to use Encoding.GetEncoding("windows-1251");? See Encoding.GetEncoding Method (System.Text).

Also, if someone has a link to a list of the code page numbers that can be used here, could they post it?
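One way to see the available code page numbers without a link is to ask the runtime itself. The sketch below is only an illustration: the class and helper names are made up, and it assumes a runtime where registered providers show up in Encoding.GetEncodings() (on older .NET Core versions the list may contain only the built-in encodings).

```csharp
using System;
using System.Text;

static class CodePageList
{
    // Hypothetical helper: resolves an encoding name to its code page number.
    public static int LookupCodePage(string name)
    {
        // Assumption: required on .NET Core / .NET 5+ for the Windows code pages.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(name).CodePage;
    }

    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Lists code page number, IANA-style name, and display name.
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            Console.WriteLine($"{info.CodePage}\t{info.Name}\t{info.DisplayName}");
        }

        // The name and the number refer to the same code page.
        Console.WriteLine(LookupCodePage("windows-1251"));
    }
}
```

Whether to pass the name or the number is mostly a readability choice; both overloads of GetEncoding resolve to the same encoding.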

I think you should tell what you are trying to do and why. You might get better answers.

Something tells me their opening post is not in line with what they are trying to do. This doesn't make a lot of sense. Are they trying to get the Cyrillic code page, or specific Cyrillic values, or something else?
 

Skydiver

Part of the problem here is that the OP seems to be under the mistaken impression that the character `Ó` (U+00D3) in Unicode is going to map to Windows-1251 'У' (0xD3). Unfortunately, that impression is incorrect. The call to Encoding.Convert() maps U+00D3 to Windows-1251 'O' (0x4F). Similar mappings happen for the accented 'I's (`Í` and `Î`), which simply get mapped to a plain 'I' (0x49).
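The mapping the OP hoped for, where the character value 0xD3 becomes the byte 0xD3, is what ISO-8859-1 (Latin-1) does for characters up to U+00FF. If the string in the first post is really Windows-1251 text that was decoded with the wrong encoding somewhere upstream (an assumption; the thread does not confirm where the string came from), a sketch of undoing that could look like this. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    // Hypothetical helper: treats each char (U+0000..U+00FF) as a raw byte
    // via Latin-1, then decodes those bytes as Windows-1251.
    public static string ReinterpretLatin1As1251(string garbled)
    {
        // Assumption: required on .NET Core / .NET 5+ to make code page 1251 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Encoding win1251 = Encoding.GetEncoding(1251);

        byte[] rawBytes = latin1.GetBytes(garbled); // 'Ó' (U+00D3) -> 0xD3
        return win1251.GetString(rawBytes);         // 0xD3 -> 'У' (U+0423)
    }

    public static void Main()
    {
        // "\u00D3382\u00CD\u00CE76" is "Ó382ÍÎ76" written with escapes.
        string repaired = ReinterpretLatin1As1251("\u00D3382\u00CD\u00CE76");
        Console.WriteLine(repaired == "\u0423382\u041D\u041E76"); // True, i.e. "У382НО76"
    }
}
```

This only papers over the earlier mistake. If the bytes were originally read from a file or a stream, decoding them as Windows-1251 at that point would avoid the detour entirely.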
 
Solution

Skydiver

Unfortunately, it looks like that converter is assuming the incoming string is 8-bit, not Unicode. Although it does produce the OP's expected results from post #5, try putting in the OP's string and letting it convert to UTF-16. I would expect the output to be the same as the input, but it looks like it treats each of the characters as if it were an 8-bit character, and the end result is some junk string.
 
