UTF-8 in C++



#1
scicatur

  • 16 posts
Do you know what would be the best and most portable way to handle UTF-8 characters in C++ code?

WinAPI defines a bunch of types such as WCHAR and TCHAR, but I don't want to use anything MS-specific.

wchar_t is standard, but its width is implementation-defined (16 bits on Windows, usually 32 bits on Unix-like systems), whereas UTF-8 is a variable-length encoding in which one character takes 1 to 4 bytes (the original design allowed up to 6).

It would also be nice to have something like the STL <string> class, but for UTF-8.

If you know a solution or have ideas, I would be most pleased to hear about them.
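
To illustrate the variable-length point: a plain std::string can already hold UTF-8 bytes, it just counts bytes rather than characters. The following is only a minimal sketch, assuming the input is valid UTF-8; the helper names utf8_sequence_length and count_code_points are made up for this example and are not part of the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with `lead`, or 0 if `lead` cannot start a sequence (a continuation byte
// or a value that never appears in well-formed UTF-8).
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;                   // 0xxxxxxx: ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 110xxxxx
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: counts code points by counting every byte that is
// not a continuation byte (10xxxxxx). It does not validate the input.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, assert(count_code_points("na\xC3\xAFve") == 5) holds even though size() on the same string reports 6, because the "ï" occupies two bytes.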

#2
bdlt

  • 875 posts
I haven't used this, but it may be somewhat 'portable':
http://www.utilityco...tr/default.aspx

Unicode is probably more portable (no guarantee in C++, however).

Sales pitch: consider using Java on your next project if portability is a requirement.

#3
scicatur

  • 16 posts
Thanks for the answer. It looks good, except that it is commercial.

I think I have thought up a kind of workaround. Because UTF-8 is fundamentally just bytes, I can read the file into an 'unsigned char' array and then use a custom function to convert it into an STL <string>. I may write that conversion function between ANSI and UTF-8 myself, BUT it can never be perfect, because UTF-8 covers all of Unicode while ANSI does not. Fortunately, the UTF-8 characters that cannot be converted are used only in very exotic languages. Converting Japanese to ANSI shouldn't be a problem, provided the ANSI code page in use is a Japanese one (such as code page 932). Then of course I must write an ANSI to UTF-8 function as well.
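
The two conversion functions described above could look roughly like this. It is only a sketch under a loud assumption: "ANSI" is taken here to mean ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. Windows-1252 and other code pages differ and would need a lookup table. The names latin1_to_utf8 and utf8_to_latin1 are invented for the example.

```cpp
#include <cstddef>
#include <string>

// Assumption: "ANSI" means ISO-8859-1, so byte value == code point.
// Bytes below 0x80 are copied; the rest become a two-byte UTF-8 sequence.
inline std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size());
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Lossy direction: code points above U+00FF and malformed sequences are
// each replaced by `replacement`, since Latin-1 cannot represent them.
inline std::string utf8_to_latin1(const std::string& utf8,
                                  char replacement = '?') {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 < 0x80) {  // ASCII passes through unchanged
            out.push_back(static_cast<char>(b0));
            ++i;
            continue;
        }
        // Lead bytes 0xC2 and 0xC3 cover exactly U+0080..U+00FF.
        if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < utf8.size()) {
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            if ((b1 & 0xC0) == 0x80) {
                out.push_back(
                    static_cast<char>(((b0 & 0x1F) << 6) | (b1 & 0x3F)));
                i += 2;
                continue;
            }
        }
        // Anything else: emit one replacement, then skip the lead byte
        // and any continuation bytes that follow it.
        out.push_back(replacement);
        ++i;
        while (i < utf8.size() &&
               (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80) {
            ++i;
        }
    }
    return out;
}
```

With this particular target encoding, Western European text survives a round trip, while each Japanese character comes out as a single '?'. The file contents can be read straight into a std::string, since it stores arbitrary bytes.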