Answered Unicode In Database Not Displaying Properly on View?

Mitchelln11

Active member
Joined
Apr 10, 2020
Messages
39
Programming Experience
Beginner
I am using the National Park Service REST API to pull park info, which I am saving to a SQL database. It saves with the Unicode value. One of the parks in Hawai'i is Haleakal&#257; National Park, which is fine, only whenever I want to display it, it shows exactly that Unicode code (&#257;) and not the actual symbol. It is supposed to be an a with a bar (a macron) over the top. I believe it's Polynesian. My site is already set to UTF-8, but it doesn't work from the database. If I copy and paste the &#257; anywhere else on a view, it shows up properly.

Any thoughts? Thanks.
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
1,575
Location
Virginia Beach, VA
Programming Experience
10+
I've never heard of a database that uses HTML/XML entities to represent Unicode characters. That is a first for me.

Are you sure that the data that went into the database was not HTML or XML encoded?

Attach a debugger and look at the string that comes back from the database query. If the value were truly a Unicode character (here, a single 16-bit UTF-16 code unit), the Visual Studio inspectors (Locals and Autos) would display the string with the ā. If the data stored in the database has the HTML-encoded value ("&#257;") instead, the inspectors will show the entity's characters.
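If the debugger view is hard to read, a rough alternative is to dump the code units of the string yourself. This is only a sketch: `StringInspector` and `DescribeChars` are made-up names, and the two sample strings stand in for whatever your query actually returns.

```csharp
using System;
using System.Linq;

static class StringInspector
{
    // Lists each UTF-16 code unit, so an entity stored as plain text
    // ('&', '#', '2', '5', '7', ';') is easy to tell apart from a real U+0101.
    public static string DescribeChars(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));
}

static class InspectDemo
{
    static void Main()
    {
        // Hypothetical values; substitute the string read from the database.
        Console.WriteLine(StringInspector.DescribeChars("l&#257;"));
        // prints: U+006C U+0026 U+0023 U+0032 U+0035 U+0037 U+003B
        Console.WriteLine(StringInspector.DescribeChars("l\u0101"));
        // prints: U+006C U+0101
    }
}
```

Six separate code units after the "l" mean the entity text itself was stored; a single U+0101 means the real character was.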
 

Mitchelln11

Looking back at Postman, and even that says
Code:
"name": "Haleakal&#257;"
 

Skydiver

If that's the case, then you need to convert the HTML entity into an actual Unicode character before putting it into your database.
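Something along these lines might do it. It is a minimal sketch, assuming `System.Net.WebUtility` is available in your project; `ParkText` and `DecodeName` are hypothetical names, not part of your code or the NPS API.

```csharp
using System;
using System.Net;

static class ParkText
{
    // Hypothetical helper: turns entities such as "&#257;" into the real
    // character before the value is saved. HtmlDecode returns null for null input.
    public static string DecodeName(string raw) => WebUtility.HtmlDecode(raw);
}

static class DecodeDemo
{
    static void Main()
    {
        string fromApi = "Haleakal&#257; National Park"; // sample value with the entity as text
        string cleaned = ParkText.DecodeName(fromApi);

        Console.WriteLine(cleaned == "Haleakal\u0101 National Park"); // True
    }
}
```

Decoding once, at the point where the API response is mapped to your entity, means the database only ever holds real Unicode text and the views need no special handling.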

 

Skydiver

That's really sad that the API does that though. I've only started skimming the API documentation, but it looks like the API returns JSON. JSON supports Unicode strings. I don't know why they HTML encoded that character since there is no need to.
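To illustrate the point about JSON, here is a small sketch using `System.Text.Json` (an assumption on my part; your project may use a different JSON library). The JSON snippets are made up for the example and are not copied from the NPS API. A JSON parser handles both a raw ā and a `\u0101` escape, but it leaves an HTML entity untouched because entities mean nothing in JSON.

```csharp
using System;
using System.Text.Json;

static class JsonNameDemo
{
    // Reads the "name" property from a small JSON object.
    public static string ReadName(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.GetProperty("name").GetString();
        }
    }

    static void Main()
    {
        string rawChar = "{ \"name\": \"Haleakal\u0101\" }";      // real character in the JSON text
        string jsonEscape = @"{ ""name"": ""Haleakal\u0101"" }";  // JSON \u escape, decoded by the parser
        string htmlEntity = @"{ ""name"": ""Haleakal&#257;"" }";  // HTML entity, not a JSON feature

        Console.WriteLine(ReadName(rawChar) == ReadName(jsonEscape));  // True
        Console.WriteLine(ReadName(htmlEntity) == "Haleakal&#257;");   // True: passed through as-is
    }
}
```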
 

Mitchelln11

Could that possibly be a typo? In other parts of the API, the symbols are directly in there. It's just the name that's like that.


Also, yes, decoding did the job, so thank you for this!
Code:
park.ParkName = HttpUtility.HtmlDecode(individualPark.FullName);
 

JohnH

C# Forum Moderator
Staff member
Joined
Apr 23, 2011
Messages
777
Location
Norway
Programming Experience
10+
Thread split, start new threads for new topics.
 

Skydiver

More likely a data conversion error combined with a proofreading error. The NPS paid someone to do the data conversion from their original data format, and the person reviewing the data/person in charge of quality assurance just missed it.

My company had a similar issue when it contracted out converting all employees' resumes from the old data format that HR kept them in (essentially flat US code page 437) into a more modern Oracle database. Unfortunately, HR again short-sightedly picked a specific code page instead of going Unicode. During the conversion, various people's names, company names, bullet symbols, etc. got mangled. They then blamed us, the web team, for not displaying the data correctly, because they kept asserting that they had followed the modern Unicode standard. I had to prove to them that their data was not Unicode.
 