Reading an OEM file

Gloops (Well-known member, joined Jun 30, 2022, 137 messages, programming experience 10+)
Hello everybody,
I receive a text file written by PowerShell with Export-CSV, without any character set specified, on a French Windows system, so I presume the code page is 850 or 1252.
I want to read it from a WinForms application that loads the contents into a ListBox. But if I try to use either of those code pages, I get an error:

Error message (encoding not recognized):
"System.NotSupportedException : 'No data is available for encoding 850. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.'"

And... I do not intend to redefine basic file formats that existed before Windows was invented and have been used daily ever since, particularly in console applications.

So I began by listing the supported encodings:

Reading the existing encodings:
EncodingInfo[] listenc = Encoding.GetEncodings();
foreach(EncodingInfo inf in listenc)
{
    System.Diagnostics.Debug.Print("{0} : {1}", inf.CodePage, inf.DisplayName);
}

and this is the output I got:

Existing encodings obtained by previous code extract:
1200 : Unicode
1201 : Unicode (Big-Endian)
12000 : Unicode (UTF-32)
12001 : Unicode (UTF-32 Big-Endian)
20127 : US-ASCII
28591 : Western European (ISO)
65001 : Unicode (UTF-8)
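The seven entries above look like the built-in encoding set of .NET Core and .NET 5+, which suggests the WinForms project targets one of those runtimes rather than .NET Framework. On those runtimes the Windows code pages (850, 1252, ...) only become available after registering `CodePagesEncodingProvider`, which is what the exception's pointer to `Encoding.RegisterProvider` is about. A minimal sketch, assuming .NET 5 or later with top-level statements; the temp file name is hypothetical and the file is written by the sketch itself so it runs on its own:

```csharp
using System;
using System.IO;
using System.Text;

// .NET Core / .NET 5+ ships only a few encodings (the list above).
// Registering this provider adds the Windows code pages, such as
// 850 (OEM, Western Europe) and 1252 (ANSI, Western Europe).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding oem850 = Encoding.GetEncoding(850);
Encoding ansi1252 = Encoding.GetEncoding(1252);

// The same character maps to a different byte in each code page,
// so a file must be read with the code page it was written in.
byte e850 = oem850.GetBytes("\u00E9")[0];    // "é"
byte e1252 = ansi1252.GetBytes("\u00E9")[0]; // "é"
Console.WriteLine($"U+00E9 in 850: 0x{e850:X2}, in 1252: 0x{e1252:X2}"); // U+00E9 in 850: 0x82, in 1252: 0xE9

// Hypothetical sample file, written in code page 850 (no BOM is written).
string path = Path.Combine(Path.GetTempPath(), "sample-850.txt");
File.WriteAllText(path, "Caf\u00E9,No\u00EBl", oem850);

// Read it back with the same code page.
string[] lines = File.ReadAllLines(path, oem850);
foreach (string line in lines)
    Console.WriteLine(line);
```

In the form, the lines could then be loaded with something like `listBox1.Items.AddRange(lines)`, where `listBox1` is a hypothetical control name. Depending on the target framework version, the provider may have to come from the System.Text.Encoding.CodePages NuGet package rather than the framework itself.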


Well, isn't it possible to read an OEM text file with accented characters in WinForms?
I would have thought I had already done it.

Is there something more I need to declare first?
 
'Export-Csv' defaults to UTF8 encoding.

And CSV is not an OEM file format. CSV is an RFC-standardized file format (RFC 4180).
 
'Export-Csv' defaults to UTF8 encoding.

Hmm, I presume you must have done something to make that work?
I tried UTF8 and got only a question mark for "é".
But you are right on one point: the file itself contains a question mark, so I have to handle that in PowerShell.

And CSV files are not OEM file formats. CSV files are RFC standard file format.
Oh, you mean my title?
 
Once I added -Encoding UTF8 to the Export-CSV command, I could read the file in WinForms using Encoding.UTF8.
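As far as I know, Windows PowerShell 5.1 writes UTF-8 with a byte order mark (EF BB BF) when given `-Encoding UTF8`. A minimal sketch of the reading side, assuming .NET 5+ top-level statements; it fakes such a file (hypothetical temp file name) and shows that `Encoding.UTF8` decodes the two-byte "é" and that the reader drops the BOM instead of returning it as a U+FEFF character:

```csharp
using System;
using System.IO;
using System.Text;

// Simulate a file as Windows PowerShell 5.1 is assumed to write it with
// -Encoding UTF8: a UTF-8 BOM followed by UTF-8 text.
string path = Path.Combine(Path.GetTempPath(), "export-utf8.csv");
byte[] bom = { 0xEF, 0xBB, 0xBF };
byte[] body = Encoding.UTF8.GetBytes("\"Name\"\r\n\"Caf\u00E9\"\r\n"); // GetBytes never emits a BOM
byte[] all = new byte[bom.Length + body.Length];
bom.CopyTo(all, 0);
body.CopyTo(all, bom.Length);
File.WriteAllBytes(path, all);

// Encoding.UTF8 decodes "é" (C3 A9) correctly and the BOM is consumed,
// so the first line starts with the quote character.
string[] lines = File.ReadAllLines(path, Encoding.UTF8);
foreach (string line in lines)
    Console.WriteLine(line);
```

In a form, `lines` is what would be handed to the ListBox, for example via `Items.AddRange`.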

I do not know exactly what the default was, probably ASCII, since the accented characters were not rendered. But you are right: according to the documentation it should be UTF8, so perhaps something was different on my machine.

Anyway, the content now displays cleanly. Thank you.
 
You should probably file a bug on the PowerShell documentation page saying that the product does not behave as documented with regard to the `-Encoding` parameter:

 
Hello,
The documentation explains how to set your own default encoding, and gives UTF8 as an example. Maybe it is worth taking a little more time before telling them "hey, your doc is buggy" 😉
 
I disagree. It specifically says:
-Encoding
Specifies the encoding for the exported CSV file. The default value is utf8NoBOM.

And if you run Get-Help Export-Csv, you will see this:
[Screenshot_7.png: output of Get-Help Export-Csv]


Notice that it doesn't say:
[-Encoding [{ ASCII | BigEndianUnicode ... }]]

What this means is that if you use the -Encoding parameter, you MUST specify a choice of encoding format. You can't use -Encoding and just have it default to UTF8. That further means that if you don't specify -Encoding, the output encoding should be UTF8.

Try running the following in PowerShell:
PowerShell:
[PSCustomObject] @{ TradeMark = "$([char]0x2122)" } | Export-Csv x.csv -NoTypeInformation
Format-Hex x.csv

The output that I get looks like this on my machine using PowerShell 7.2.5:
Output:
          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 22 54 72 61 64 65 4D 61 72 6B 22 0D 0A 22 E2 84 "TradeMark"��"â�
0000000000000010 A2 22 0D 0A                                     ¢"��

Notice the bytes E2 84 A2. That's the UTF-8 encoding of U+2122 (™).
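To see where those three bytes come from, here is a small sketch (assuming .NET 5+ top-level statements) that encodes U+2122 with `Encoding.UTF8` and then again by hand with the standard three-byte UTF-8 pattern for code points U+0800 to U+FFFF:

```csharp
using System;
using System.Text;

string trademark = "\u2122"; // ™ TRADE MARK SIGN

// Let the framework encode it.
byte[] fromFramework = Encoding.UTF8.GetBytes(trademark);

// The same thing by hand: a code point in U+0800..U+FFFF takes three bytes,
// 1110xxxx 10xxxxxx 10xxxxxx, filled with the code point's 16 bits.
int cp = trademark[0];
byte[] byHand =
{
    (byte)(0xE0 | (cp >> 12)),         // top 4 bits    -> E2
    (byte)(0x80 | ((cp >> 6) & 0x3F)), // middle 6 bits -> 84
    (byte)(0x80 | (cp & 0x3F)),        // low 6 bits    -> A2
};

Console.WriteLine(BitConverter.ToString(fromFramework)); // E2-84-A2
Console.WriteLine(BitConverter.ToString(byHand));        // E2-84-A2
```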

 
Okay, I think I know what's happening... I assume that you are using PowerShell 5.1 or lower. With that older version of PowerShell, I got this result:
Output:
           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   22 54 72 61 64 65 4D 61 72 6B 22 0D 0A 22 3F 22  "TradeMark".."?"
00000010   0D 0A                                            ..

This is correct, because the PowerShell 5.1 documentation says that the default encoding, when none is specified, is ASCII: the ™ cannot be represented, so it was written as 3F, a question mark.
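The same lossy behavior can be reproduced on the .NET side, which also explains the "?" that showed up for "é" earlier in the thread: `Encoding.ASCII` replaces every character outside 0x00..0x7F with a question mark by default. A short sketch, assuming .NET 5+ top-level statements; the strict variant uses `Encoding.GetEncoding` with an exception fallback so the data loss becomes an error instead of a silent "?":

```csharp
using System;
using System.Text;

// Encoding.ASCII silently replaces characters it cannot represent with '?' (0x3F).
byte[] ascii = Encoding.ASCII.GetBytes("\"\u2122\"");   // "™" in quotes
Console.WriteLine(BitConverter.ToString(ascii));         // 22-3F-22, as in the PS 5.1 dump
string cafe = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes("caf\u00E9"));
Console.WriteLine(cafe);                                 // caf?

// A strict ASCII encoding throws instead, making the loss visible.
Encoding strictAscii = Encoding.GetEncoding(
    "us-ascii", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
bool threw = false;
try
{
    strictAscii.GetBytes("caf\u00E9");
}
catch (EncoderFallbackException)
{
    threw = true;
}
Console.WriteLine($"strict ASCII threw: {threw}");      // strict ASCII threw: True
```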
 
That is right. Oh, and maybe this is written in the documentation too?
Well, I updated something for PowerShell recently, but probably not what I thought:
$PSVersionTable:
Name                           Value
----                           -----
PSVersion                      5.1.19041.1682
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.19041.1682
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1


A few hours ago I read how to register a default value for the encoding in PowerShell; unfortunately I cannot find it now. Of course, if you want to use it, you do not pass the -Encoding parameter to the command.
 
Yeah, I stumbled across that thing for changing the default encoding a couple of years ago. It was some kind of registry hack, as I recall. All I know is that I did it on my laptop, because at that time I was trying to get Unicode output in the standard console window while using PowerShell 5. I should have bookmarked it.
 