Question Unicode (utf16) to win-1251

ant555 · Jun 28, 2021

Hi.
I'm trying to decode string from unicode (utf-16) to windows-1251.

C#:

string b = "Ó382ÍÎ76";
Encoding win1251 = Encoding.GetEncoding(1251);
byte[] uniByte = Encoding.Unicode.GetBytes(b);
textBox1.Text = win1251.GetString(uniByte);

The Microsoft docs say that "GetString(Byte[]): When overridden in a derived class, decodes all the bytes in the specified byte array into a string." But I get only the first symbol of the b variable, or a single symbol when I specify an index number.
What am I doing wrong?

jmcilhinney · Jun 28, 2021

If the bytes you provide as the argument contain a representation of a null character then using the string object will only use the characters up to that null. The other characters will still be in the string object but won't be used. As a test, try calling GetChars instead and then examine each character in the result to see if there's a null character in there.

ant555 · Jun 28, 2021

jmcilhinney said:
If the bytes you provide as the argument contain a representation of a null character then using the string object will only use the characters up to that null. The other characters will still be in the string object but won't be used. As a test, try calling GetChars instead and then examine each character in the result to see if there's a null character in there.

C#:

string b = "Ó382ÍÎ76";
Encoding win1251 = Encoding.GetEncoding(1251);
byte[] uniByte = Encoding.Unicode.GetBytes(b);
textBox1.Text = BitConverter.ToString(uniByte);

I get D3-00-33-00-38-00-32-00-CD-00-CE-00-37-00-36-00 - there is no null character.

Skydiver · Jun 28, 2021

The problem is that in this line you are passing in the wrong input:

C#:

win1251.GetString(uniByte);

The call to GetString() expects bytes that are encoded in Windows-1251, but you are passing in bytes encoded as UTF-16 by Encoding.Unicode.GetBytes() on the line before.
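
To make the mismatch visible, here is a minimal console sketch of what that call does with the UTF-16 bytes. It assumes the source file is saved as UTF-8. The RegisterProvider line is only an assumption about the runtime: it is needed on .NET Core / .NET 5+, while on .NET Framework code page 1251 is available without it and the line should be removed.

```csharp
using System;
using System.Text;

class Program
{
    static void Main()
    {
        // Assumption: .NET Core / .NET 5+. Remove this line on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string b = "Ó382ÍÎ76";
        Encoding win1251 = Encoding.GetEncoding(1251);

        // UTF-16LE: two bytes per character here, and the second byte is always 0x00.
        byte[] uniByte = Encoding.Unicode.GetBytes(b);

        // Windows-1251 is a single-byte encoding, so every byte becomes its own
        // character, including each 0x00, which decodes to U+0000.
        string decoded = win1251.GetString(uniByte);

        Console.WriteLine(decoded.Length);                     // 16, not 8
        Console.WriteLine(((int)decoded[0]).ToString("X4"));   // 0423 (Cyrillic У)
        Console.WriteLine(decoded[1] == '\0');                 // True
    }
}
```

Every second character of the result is a null character, which would explain why a TextBox shows only the first symbol: the 0x00 bytes in the BitConverter output above are exactly those nulls once they are read as single-byte Windows-1251.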

ant555 · Jun 28, 2021

Skydiver said:
The problem is that in this line you are passing in the wrong input:

C#:

win1251.GetString(uniByte);

The call to GetString() expects bytes that are encoded in Windows-1251, but you are passing in bytes encoded as UTF-16 by Encoding.Unicode.GetBytes() on the line before.

So I am getting the first character decoded correctly? I thought that in this case I would get a wrong symbol, not the one I expected.

I have also tried the Encoding.Convert method. I get the whole array, but with the wrong values: O382II76. The correct value is У382НО76.

C#:

string b = "Ó382ÍÎ76";
Encoding win1251 = Encoding.GetEncoding(1251);
byte[] uniByte = Encoding.Unicode.GetBytes(b);
byte[] win1251Bytes = Encoding.Convert(Encoding.Unicode, win1251, uniByte, 0, uniByte.Length);
textBox1.Text = win1251.GetString(win1251Bytes);

jmcilhinney · Jun 28, 2021

I missed it the first time but Skydiver is right. What are you actually expecting the output to be? The idea with encodings is that the same characters are represented by different binary values. You start with a string, so you already have the characters. What are you actually trying to achieve? The UTF-16 encoding is completely irrelevant in this case. The only way it could be relevant is if you were starting with bytes in one encoding and wanted to end with bytes in the other. If you're starting with a string and ending with a string then what possible use could you have for two encodings? This:

I'm trying to decode string from unicode (utf-16) to windows-1251.

is a nonsensical statement. Instead of this contrived example that doesn't demonstrate anything useful, please explain what you're actually trying to achieve. Are you trying to save text to a file in a specific encoding? Are you trying to read a file that is in a specific encoding? Converting a string from one encoding to another is nonsense because ALL .NET strings are Unicode.
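
For contrast, a minimal sketch of the one case where two encodings do make sense: bytes in, bytes out. It starts from genuinely Cyrillic text, written with escapes so the source file encoding does not matter. The RegisterProvider line is an assumption about the runtime (needed on .NET Core / .NET 5+, to be removed on .NET Framework).

```csharp
using System;
using System.Text;

class Program
{
    static void Main()
    {
        // Assumption: .NET Core / .NET 5+. Remove this line on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding win1251 = Encoding.GetEncoding(1251);

        // Real Cyrillic letters: У (U+0423), Н (U+041D), О (U+041E).
        string text = "\u0423382\u041D\u041E76";

        // Bytes in one encoding...
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(text);
        // ...converted to bytes in the other.
        byte[] cp1251Bytes = Encoding.Convert(Encoding.Unicode, win1251, utf16Bytes);

        Console.WriteLine(BitConverter.ToString(utf16Bytes));
        // 23-04-33-00-38-00-32-00-1D-04-1E-04-37-00-36-00
        Console.WriteLine(BitConverter.ToString(cp1251Bytes));
        // D3-33-38-32-CD-CE-37-36
    }
}
```

When the starting point is already a string, win1251.GetBytes(text) produces the same Windows-1251 bytes directly, with no UTF-16 byte array in between.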

NoUserHere · Jun 28, 2021

Mother May... why is this under VS.Net?

And why are you encoding twice?

What do you want to do with b?

What is your aim with line 2? Explaining each line of your code and what you hope to achieve on each line would be helpful.

Would it not be better to use Encoding.GetEncoding("windows-1251")? See: Encoding.GetEncoding Method (System.Text)

Also, if someone has a link for the codepage code numbers for the list of code values to use, can they post it up?

I think you should tell what you are trying to do and why. You might get better answers.

Something tells me their opening post is not in line with what they are trying to do. This doesn't make a lot of sense. Are they trying to get the Cyrillic code, or get Cyrillic values specifically, or something?

Skydiver · Jun 28, 2021

Part of the problem here is that the OP seems to be under the mistaken impression that the character `Ó` (U+00D3) in Unicode is going to map to Windows-1251 'У' (0xD3). Unfortunately that impression is incorrect. Windows-1251 has no `Ó`, so the call to Encoding.Convert() falls back to the closest match and maps U+00D3 to Windows-1251 'O' (0x4F). Similar mappings are happening for the accented 'I's (Í and Î), which simply get mapped over to vanilla 'I' (0x49).
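
If the goal is to get У382НО76 out of that string, a minimal sketch follows. It rests on an assumption the OP has not confirmed: that the string started life as Windows-1251 bytes which were decoded as Latin-1 (ISO-8859-1) somewhere upstream. Under that assumption, encoding back to Latin-1 recovers the original bytes, and decoding those as Windows-1251 gives the Cyrillic text. The source file is assumed to be saved as UTF-8, and the RegisterProvider line is needed on .NET Core / .NET 5+ only (remove it on .NET Framework).

```csharp
using System;
using System.Text;

class Program
{
    static void Main()
    {
        // Assumption: .NET Core / .NET 5+. Remove this line on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding win1251 = Encoding.GetEncoding(1251);

        // ISO-8859-1 (code page 28591) maps bytes 0x00-0xFF straight onto U+0000-U+00FF.
        Encoding latin1 = Encoding.GetEncoding(28591);

        string b = "Ó382ÍÎ76";

        // Undo the wrong decode: Ó (U+00D3) -> 0xD3, Í -> 0xCD, Î -> 0xCE.
        byte[] rawBytes = latin1.GetBytes(b);

        // Decode the same bytes with the code page they were (presumably) written in.
        string repaired = win1251.GetString(rawBytes);

        Console.WriteLine(BitConverter.ToString(rawBytes));        // D3-33-38-32-CD-CE-37-36
        Console.WriteLine(repaired == "\u0423382\u041D\u041E76");  // True (У382НО76)
    }
}
```

The cleaner fix is to read the original bytes with Encoding.GetEncoding(1251) in the first place, so the wrong string never exists.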

NoUserHere · Jun 28, 2021

That was my guess too. Best waiting on their reply.

I am not on VS to debug this and check, but this converter is handy: Online Character Set Fixer. I don't think my brain can take a lot of char encoding tonight, I'm out for now.

jmcilhinney · Jun 28, 2021

Sheepings said:
Mother May... why is this under VS.Net?

Didn't notice initially. Corrected.

Skydiver · Jun 29, 2021

Sheepings said:
this converter is handy : Online Character Set Fixer

Unfortunately, it looks like that converter is assuming the incoming string is 8 bit, not Unicode. Although it does produce the OP's expected results from post #5, try putting in the OP's string and letting it convert to UTF-16. I would expect the output to be the same as the input, but it looks like it treats each of the characters as if it were an 8 bit character, and the end result is some junk string.

JohnH · Jun 29, 2021

Sheepings said:
Also, if someone has a link for the codepage code numbers for the list of code values to use, can they post it up?

Code Page Identifiers - Win32 apps

The following table defines the available code page identifiers.

docs.microsoft.com

ant555 · Jun 29, 2021

Thanks for the thoughts - I understand where my mistake is!
