Tracking Forums, Newsgroups, Mailing Lists




Find Out Encoding Of A File With Php, And Convert It To UTF-8 Encoding


In my script I need to find out the encoding of a file and in case it
is not of the kind UTF-8 I need to convert it to that format.
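
A minimal sketch of one way to approach this, under stated assumptions: PHP cannot reliably identify an arbitrary legacy encoding from the bytes alone, but it can check whether the bytes are valid UTF-8. The helper name `to_utf8` and the ISO-8859-1 fallback are assumptions for illustration; substitute whatever encoding your non-UTF-8 files are actually expected to use.

```php
<?php
// Hypothetical helper: return $bytes as UTF-8.
// Assumption: anything that is not valid UTF-8 is in $fallback.
function to_utf8(string $bytes, string $fallback = 'ISO-8859-1'): string
{
    // Strip a UTF-8 byte order mark if one is present.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = (string) substr($bytes, 3);
    }
    // Already valid UTF-8: nothing to convert.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

// Usage (file name is a placeholder):
// $utf8 = to_utf8(file_get_contents('input.txt'));
```

Checking for valid UTF-8 first avoids double-encoding files that are already in the target format.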





See Related Forum Messages: Follow the Links Below to View Complete Thread
Saved File Encoding
I am querying a database and saving the results as a tab-delimited file.  Is there a way to force the saved file to open in a certain encoding such as ISO Latin 1? Also, what is the best way to deal with carriage returns in the data I am saving?
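
A plain text file carries no encoding label, so the program that opens it has to be told or has to guess. What the script can control is which bytes it writes. The sketch below assumes the database returns UTF-8 strings; the helper name `tsv_line` is hypothetical.

```php
<?php
// Hypothetical helper: build one tab-delimited line encoded as ISO-8859-1.
// Assumption: the incoming field values are UTF-8.
function tsv_line(array $fields): string
{
    $clean = array();
    foreach ($fields as $field) {
        // Tabs and line breaks inside a value would break the row layout,
        // so each run of them is replaced with a single space.
        $field = preg_replace('/[\t\r\n]+/', ' ', (string) $field);
        $clean[] = mb_convert_encoding($field, 'ISO-8859-1', 'UTF-8');
    }
    return implode("\t", $clean) . "\r\n";
}
```

Replacing embedded carriage returns with a space is one choice among several; escaping them (for example as a literal `\n`) would preserve the information at the cost of post-processing on the reading side.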

Encoding Problem - Parsing An Xml File
I have a problem parsing an XML file. The file is a backup from Moodle (moodle.xml). When I open the file in one of my editors, with the encoding set to UTF-8 and a font such as Courier that has Greek characters, the editor shows all the Greek strings in the file perfectly. However, when I open it with PHP, parse it, and display it on a web page, the data from the file comes out as Latin characters such as (A~, A-tilde), (o:, o-umlaut), and others like this. Updating Moodle is not an option, so I have to find a workaround. Has anybody heard of something like this?

Question About File Upload Encoding
If I drop a form onto my page that allows file uploads, I know well
enough how to handle the file upload in the server side via PHP.

However, I have a question: when my browser actually *sends* a binary
file (e.g., an image), does it encode it somehow for the upload? Or
does it simply send a raw bytestream?

The reason I am asking is that I am in the process of transferring some
images from one photo gallery to another (in Drupal). Mapping the two
database schemas has proved to be a little tricky, and I am thinking
it would be simpler to use the web interface to re-upload the various
images.

However, rather than do it one at a time, I'd just as soon write a PHP
script that would traverse through the old photo gallery files, and
upload them via HTTP POST (along with other relevant details, such as
descriptive text) and let the web interface make sure that the
database.

I'm comfortable with regular HTTP POST mechanisms, and doing them
programmatically with PHP, but I've never uploaded files that way.
Anything I should know? For example, do they need to be encoded
somehow before upload? Or can I just read in the raw bytestream, and
voila, there we go?
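
For reference, browsers send file uploads as `multipart/form-data`, where the file bytes travel as-is between boundary lines. The sketch below builds such a body by hand purely to show the layout; the helper name `multipart_body` and all field names are hypothetical, and values are assumed not to contain the boundary string.

```php
<?php
// Hypothetical helper: build a multipart/form-data body by hand to show
// that the file bytes are included raw (no base64 or URL encoding).
function multipart_body(array $fields, string $fileField, string $fileName,
                        string $fileBytes, string $boundary): string
{
    $body = '';
    // Ordinary form fields, one part each.
    foreach ($fields as $name => $value) {
        $body .= "--$boundary\r\n"
               . "Content-Disposition: form-data; name=\"$name\"\r\n\r\n"
               . "$value\r\n";
    }
    // The file part: headers, a blank line, then the raw bytes.
    $body .= "--$boundary\r\n"
           . "Content-Disposition: form-data; name=\"$fileField\"; filename=\"$fileName\"\r\n"
           . "Content-Type: application/octet-stream\r\n\r\n"
           . $fileBytes . "\r\n"
           . "--$boundary--\r\n";
    return $body;
}
```

The request would then carry the header `Content-Type: multipart/form-data; boundary=...` with the same boundary. In practice the curl extension can build this body for you when a `CURLFile` object is passed in `CURLOPT_POSTFIELDS`.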

Error While Encoding A Txt File To Win-1252
I have a problem with encoding a file .txt to win-1252.

$text = recode_string("ISO-8859-1..win-1252", $text);

but I get the following error message:

Fatal error: Call to undefined function: recode_string()

The same happens when I try to recode the whole file with recode_file.
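
The "undefined function" error means the recode extension is not loaded in that PHP build. A possible workaround, sketched here on the assumption that the iconv extension is available instead (the helper name is hypothetical):

```php
<?php
// Hypothetical helper: convert ISO-8859-1 text to Windows-1252 with iconv()
// instead of recode_string(). Assumption: the iconv extension is loaded.
function latin1_to_cp1252($text)
{
    // iconv() returns false if the conversion fails.
    return iconv('ISO-8859-1', 'WINDOWS-1252', $text);
}
```

For a whole file, the same call can be applied to the result of `file_get_contents()` and written back with `file_put_contents()`.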

Saving XML File With Iso-8859-1 Encoding Fails
The question might sound (and probably is ;-) stupid, but why doesn't the
following code work? Is there something I don't see?

$doc = new DOMDocument('1.0', 'iso-8859-1');
$doc->formatOutput = true;

$root = $doc->createElement('root');
$root = $doc->appendChild($root);

$head = $doc->createElement('head');
$head = $root->appendChild($head);

$title = $doc->createElement('title');
$title = $head->appendChild($title);

/* probably the real sign gets killed,
so here is the html: ä (an umlaut) */
$text = $doc->createTextNode('ä');
$text = $title->appendChild($text);

echo $doc->saveXML();

URL Encoding
In one of my PHP applications, I generate a dynamic link to another page. The parameter list contains a few variables with some special characters, so as expected, I use urlencode() to encode the parameters before plopping them into the URL.

For some strange reason, the link doesn't work - I get a "page cannot be displayed" error in IE 6. NS 6 starts to load it, then just stops. Oddly enough, if I go into the location bar (under IE at least), and delete one (1) character, any character from this one particular parameter (search_expr), the page loads fine (albeit with a messed up parameter). Code:

URL Encoding
Can anyone point me in the direction of a tutorial on URL encoding? I am creating a website and I want to use one page which contains the layout, then separate pages with the content. I would do it something similar to this:

http://website.com/index.php?page=links

Then that would just call up a .inc page specified in the page= variable and display it on index.php.
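
A sketch of how such a `page=` parameter might be mapped to an include file. Including a file named directly by the request is risky, so this version only accepts names from a fixed list. The helper name, the `pages/` directory, and the page names are all assumptions for illustration.

```php
<?php
// Hypothetical helper: map the page= parameter to a file, allowing only
// names from a fixed list so a request cannot include arbitrary files.
function resolve_page($requested, array $allowed, $default = 'home')
{
    $page = in_array($requested, $allowed, true) ? $requested : $default;
    return 'pages/' . $page . '.inc';
}

// Usage in index.php (directory and page names are placeholders):
// $page = isset($_GET['page']) ? $_GET['page'] : 'home';
// include resolve_page($page, array('home', 'links'));
```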

Php Encoding
I was wondering, which is a good freeware PHP encoding alternative?
I'm running the Apache server on a Windows platform.

Encoding
I have a problem. How can I transform this xml:
$xml="<name>Niccol&#xF2;</name>"
to this
$xml="<name>Niccolò</name>"

I receive this xml from a site via cURL:

<?xml version="1.0">
<name>Niccol&#xF2;</name>

but, for some reasons, I want:

<?xml version="1.0">
<name>Niccolò</name>
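
One way this could be done, sketched under assumptions: replace only the numeric character references and leave ASCII ones (such as `&#x26;` for `&`) alone so the XML stays well-formed. The helper name is hypothetical, the output is assumed to be UTF-8, and `mb_chr()` requires PHP 7.2 or later.

```php
<?php
// Hypothetical helper: replace numeric character references such as
// &#xF2; or &#242; with the UTF-8 character. References below 128 are
// kept as they are so escaped markup characters stay escaped.
function decode_numeric_refs(string $xml): string
{
    return preg_replace_callback(
        '/&#(?:x([0-9A-Fa-f]{1,6})|([0-9]{1,7}));/',
        function (array $m): string {
            // Group 1 holds a hex value, group 2 a decimal value.
            $code = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];
            if ($code < 128) {
                return $m[0];
            }
            $char = mb_chr($code, 'UTF-8');
            // Invalid code points are left untouched.
            return $char === false ? $m[0] : $char;
        },
        $xml
    );
}
```

If the rest of the application expects ISO-8859-1 rather than UTF-8, the result would need one more conversion step.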


Which Encoding?
Not really a PHP question, but i've got an app that sends me this type of
encoding:

á (&aacute;) translates to %C3%A1
ñ (&ntilde;) translates to %C3%B1

i've tried several decodings with PHP without any good result.
Anyone knows which type of encoding is this and which function to use in
order to decode it with PHP?
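
For what it is worth, `%C3%A1` looks like the percent-encoded (URL-encoded) form of the two UTF-8 bytes of "á". A minimal sketch; the second step is only needed on the assumption that the rest of the application works in ISO-8859-1:

```php
<?php
// rawurldecode() turns the %XX sequences back into bytes (here: UTF-8).
$utf8 = rawurldecode('%C3%A1');

// Optional: convert the UTF-8 bytes to a single-byte ISO-8859-1 character.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
```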

Encoding
my website is parsing 4 external xml files (rss feeds) and writes them in a mysql database. my problem is that 2 of the xml files are encoded in utf-8 and 2 of them are windows 1250. when i need to retrieve the data from the database and display it on my site (which is utf-8) i have a problem with special characters that are encoded in windows 1250. I need all the data in my database to be encoded in one way. how to do that ? or is there another way?
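
A common approach is to convert everything to one encoding before it is written to the database. The sketch below assumes the charset of each feed is known (for example from its XML declaration) and that the iconv extension is available; the helper name is hypothetical.

```php
<?php
// Hypothetical helper: convert feed text to UTF-8 before it is stored.
// Assumption: $charset names the encoding the feed really uses.
function feed_text_to_utf8($text, $charset)
{
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $text; // already in the target encoding
    }
    // iconv() returns false if the input is not valid in $charset.
    return iconv($charset, 'UTF-8', $text);
}
```

With this in place the database only ever sees UTF-8, so the display side needs no per-feed special cases.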

Help On Character Encoding
I have a string in the windows-1256 character set, but I would like to convert it to utf-8 (it is for displaying Arabic text). I am using utf-8 across my whole site.
Any help highly appreciated.

Chinese Encoding
I have a forum mainly in Chinese character. There are some characters which cannot be shown correctly and turn into "& #22175" ... does anyone know how to fix this?

Email Encoding
I'm using the code below to send emails right now. Is it secure?

Low-end PHP Encoding For Windows
FYI, BadBlue added a built-in PHP encoding capability to the enterprise version of their Windows web server. The personal edition of the server is a free download and is a quick easy way to get PHP running: http://badblue.com/helpphps.htm .

Encoding Problem
I have a dom tree representing the content of a html document. In the xml I
use &#x20AC; as the euro sign. I think I need to do this to be able to use
xsl transformations. After the xsl transformation I use a small SAX
parser. But my &#x20AC; gets converted to: â,¬50.
I've tried passing xml_parser_create() 'UTF-8' and '' (hint from php.net)
but it only makes things worse ( ? or squares are displayed ).
How do I have to encode characters to be able to pass them to SAX? Is there
a built-in function or do I use str_replace()?


GB2312 Encoding In PHP
I have language text stored as variables in text files, which are
'included' by my PHP scripts (is there a better way?). However, I seem
to have a problem with the simplified chinese GB2312 encoding format.

I thought that most foreign encoding mechanisms would avoid the use of
the quotation mark - however, I saved something out as GB2312 and now
effectively get parse errors (due to premature quotes which appear to
form parts of the Chinese characters themselves).

A few other websites I tested though don't seem to have a problem
sending me mail in GB2312 though ... so they must have somehow managed.

Locale And Different Encoding
I have a simple question about locale and PHP5.

I'm using polish characters but I encode them in UTF-8.
I was sorting an array with usort($array, 'strcoll'); but I've run into a
problem.
I was sorting arrays this way:

setlocale(LC_ALL, 'pl_PL.utf8');
usort($array, 'strcoll');

but it turned out to be completely wrong...

I've started to test various encodings on various OS and ended with
this sample PHP script
that sorts UTF-8 encoded array on both Linux and Windows with locale
encoding set to ISO-8859-2:

<? // file encoding is UTF-8
$array = array('Aea', 'Eaz', 'Eaz', 'Eaz', 'Aea', 'Aeas');

var_dump(mb_detect_encoding($array[1])); // shows UTF-8 - OK

// set locale encoding ISO-8859-2
// pl_PL for Linux
// polish_Poland.28592 for Windows
if (PHP_OS == 'WINNT') {
setlocale(LC_ALL, 'polish_Poland.28592');
} else {
setlocale(LC_ALL, 'pl_PL');
}

// sort
usort($array, 'strcoll');

// works OK on both Linux and Windows
var_dump($array);
?>

how come it works on both Linux Gentoo PHP 5.1.1 and
Windows NT PHP 5.1.4?

the contents of $array contents are encoded in UTF-8, but encoding is
set to ISO-8859-2 (via setlocale)

Php Script Encoding
this is my first "topic" on Google Groups.

I'm looking for a solution for 4 days, without results.

So... my Apache is serving pages in UTF-8 but my PHP scripts are written
(I'm speaking about strings) in ISO-8859-1.

How can I tell Apache to interpret my scripts in a different
charset?

I don't want to change every "è" with "&egrave;"
I think that it should be a simple configuration in the .conf or in the
.ini but... I found nothing about it.

SoapVar And Encoding
what's going on with the following (I'm working in non-WSDL mode).
Basically, I need the following in the SOAP request body:

<ns6:create>
...
<property>
<name>Property 1</name>
<value>Test Value 1</value>
</property>
<property>
<name>Property 2</name>
<value>Test Value 2</value>
</property>
</ns6:create>

So I've created a class called Create that has the required members,
and stores the property elements (which are instances of a class called
NamedValue) in an array:

class NamedValue {
public $name;
public $value;

public function __construct($name, $value) { ... }
}

class Create {
public $id;
public $parent;
public $type;
public $property;

public function __construct(...) {
...
$property = array(
new NamedValue("Property 1", "Test Value 1"),
new NamedValue("Property 2", "Test Value 2")
);
}
}

Euc-kr Encoding Problem
I have a dom tree representing the content of a html document.
I use character-set as the euc-kr sign.

Can I use a <?xml encoding="euc-kr" ?>
This source doesn't work.

How can I use euc-kr as the character set?

Character Encoding In PHP & GD2?
I have been working on a dynamic image in GD2 which acts as a chatbox in a forum signature. It all seems to be working pretty well, but I have come across a bug (of sorts): if the user were to enter characters such as '¬_¬' the result in the image is shown as 'Ž_Ž'.

Is there any way that, using PHP, I can change this so that is actually understands the characters and would print '¬_¬'.?

Character Encoding
I'm working on a site for my brothers friend and it has a commentary system.
The commentaries get added by calling a Javascript which updates the current page and then call a PHP script to store the comment in a mySQL database.
My problem is that if the user uses any kind of special chars like the Danish chars æ, ø and å, then the PHP script will just die on them. It will store the message up to those chars and then stop.

I've tried running an escape () in the javascript on the txt i'm sending, and i've tried using CONVERT ('string' USING utf8) in the SQL query. The entire site is encoded in UTF-8, and so is the database but the user might type in ISO-8859-1 (?).

Somewhere else on the site i have a fileupload, here the page calls the upload script directly and it has no problems storing special chars in the database, so i'm thinking the problem might be a combination of the Javascript and PHP.

I'm hoping someone has an idea to what might be wrong. Code:

Detect Character Encoding?
My Problem is I want to build my own small web interface. works fine with imap_ functions so far. But now i got problems with Mails coming from MS Outlook - they are UTF-7 or UTF-8 encoded.

I don't know how to find out how they are encoded?

Second Problem: Got a character Sequence in the subject looking like that:

Ihre Anfrage: =?iso-8859-1?Q?minpoll=20timelimit=20wie=20gro=DF=3F?=

Don't know what to do with that.

Perhaps someone could drop some info on encoding.
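
The `=?charset?Q?...?=` form in that subject is a MIME "encoded word" (RFC 2047). A minimal sketch of decoding it, assuming the iconv extension is available and that UTF-8 output is wanted:

```php
<?php
// iconv_mime_decode() decodes encoded words in a header value and
// converts the result to the requested charset (UTF-8 here).
$raw     = 'Ihre Anfrage: =?iso-8859-1?Q?minpoll=20timelimit=20wie=20gro=DF=3F?=';
$subject = iconv_mime_decode($raw, 0, 'UTF-8');
```

Since the script already uses the imap_ functions, `imap_mime_header_decode()` may be an alternative; it returns the decoded parts together with their charsets.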

Encoding/decoding A Variable
Is there a function that will temporarily encode a variable (say, for passing through a URL), and another that would decode the same?
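
A minimal sketch with the standard URL functions; the sample value is made up for illustration:

```php
<?php
// Encode a value for use in a query string and decode it again.
$value   = 'a&b=c d/é';
$encoded = rawurlencode($value);
$decoded = rawurldecode($encoded);

// For several values at once, http_build_query() encodes each one.
$query = http_build_query(array('q' => $value, 'page' => 2));
```

Note that PHP already decodes incoming values in `$_GET`, so the explicit decode step is only needed when handling a raw query string yourself.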

Character Encoding Issues
I have built a CMS for a client. The client is copying and pasting characters from Microsoft Word into the CMS and saving. When the CMS content is later displayed, several of the characters are shown as weird junk characters like: — or ’.

I fixed this at one level by converting the MySQL database to UTF8. This fixed it insofar as my computer is now displaying those characters properly, but apparently some other computers are not. Some other computers are still showing weird characters.

Does anyone know what the problem thus might still be? Are some client machines missing the fontmaps to express those characters? Or, are they not understanding to display that text as UTF8 perhaps?

SOAP And Accept-Encoding
I'm trying to make a SOAP request, and enable compression in
the response (by sending the Accept-Encoding header). To do this, I'm
sending 'compression' => SOAP_COMPRESSION_ACCEPT |
SOAP_COMPRESSION_GZIP in the SoapClient constructor's options.

But when I print the request headers, I don't see the Accept-Encoding
header. Am I doing something wrong?

I also tried SOAP_COMPRESSION_ACCEPT | SOAP_COMPRESSION_GZIP | 9 to
set the compression level, as I've seen in some examples.

I just downloaded and compiled php 5.2.1 with --enable-soap on Mac OS
10.4.9.

The full example is below:
$o_client = new SoapClient($wsdlURL, array(
'trace' => true,
'compression' => SOAP_COMPRESSION_ACCEPT | SOAP_COMPRESSION_GZIP,
'login' => $login,
'password' => $password
));
$o_result = $o_client->acknowledge();
echo $o_client->__getLastRequestHeaders();

Zend Optimizer Encoding
I'm currently running into problems with Zend Optimized PHP code; or
more clear, a partner of mine asked me to adapt a script he bought which
is encoded this way.

Does anyone know if this can be done, or if there are any projects
trying to develop a free decoder/decompiler?

I know this might be a bit off-topic here, but I couldn't find a list
that seemed to fit better to me.

Domxml_new_doc - Can I Add Encoding Attribute?
I've got a question about adding encoding attribute to my DOM XML Document?

I'm from Poland and we use extended latin alphabet - I'd like to use
iso-8859-2 Polish charset

e.g.:
<?xml version="1.0" encoding="iso-8859-2"?>

is there such posibility in DOM XML?

Send Request With Encoding
Does anyone know if it allows to send request with specific encoding?

What i want to do is sending japanese as a parameter. But always gives
me an error "Invalid protocol "

$http_request = new Snoopy;
$str = mb_convert_encoding($this->mAddress, "UTF-8", "Shift_JIS");
$url = "http://www.myusite.jp/api/?v=1.1&q='"+$str+"'";

$http_request->fetch( $url );
if( !$http_request->error ){
}

Encoding Problem, Can't Solve It
Text I get:
# stat¸ pakeitimai
# em s paskirties keitimas

As you can see, some letters aren't exactly as they should be. The data is in
UTF-8 in MySQL (utf8_lithuanian_ci) and I transfer it using utf8
encoding. The problem is that I need those letters; I think they should be
called latin. But where they have gone I can't imagine.

Flex Form And Encoding
I have a RIA made in Flex which sends data to a PHP script before I store the
data in a database. The function to format my data looks like this:

Character Encoding Issues
When I use utf-8 encoding my page shows up fine. However, if I use
iso-8859-1 I get some funky characters that show up at the top of my page
where I am just calling "require_once." I am using a Win32/IIS/ISAPI/PHP5
installation, anyone have an idea of what's going on and how to resolve?

Functions' Compatibility For UTF-8 Encoding
I'm writing a system with PHP which encodes with UTF-8 encoding. Everything is encoded with UTF-8 encoding.

In order to work with UTF-8 encoded strings, I need to use special functions, the mbString functions (mbString stands for Multi Byte String), which are specifically compatible with UTF-8 and other multibyte encodings.

The problem is that there aren't enough mbString functions so that I will be able to work well with UTF-8 encoded strings. Many important mbString functions are missing.

I wrote a list of regular functions and I need to know if they can work well & suitable for UTF-8 encoded strings. Code:

Character Encoding And Rewrite?
I'm having a real problem with character encoding and PHP. I'm aware that PHP has some problems with character encoding, but I don't really understand them.

What's got me even more stumped is that when I was developing the same thing on my PC I never had this problem; I'm now on a Mac.
What's more, the problem only seems to happen when using the rewrite rule in the .htaccess?

basically if i have an file index.html and it is
<?php
      echo "£";
?>

it echos a £ sign fine

if however i use an htaccess file to redirect any request to another file lets say invoke.html and it has the same thing it outputs £

i know for this example i should just echo &pound;
but this isnt exactly what im trying to do
this is just the simplest example of the problem
what im actually doing is fopen fread a file then using strpos to find any £ symbols

the £ symbols in the file are just £ symbols
if i echo the read file contents they output as just £ symbols

however if i put "£" in as the needle for strpos it goes in as £ and therefore doesnt find it in the haystack.
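
One way to take the script file's own encoding out of the picture is to write the needle as explicit bytes. This is a sketch under the assumption that the data file is either UTF-8 (where £ is the bytes C2 A3) or ISO-8859-1 (where £ is the single byte A3); the helper name is hypothetical.

```php
<?php
// Hypothetical helper: find the first pound sign regardless of how the
// PHP source file itself was saved.
function find_pound(string $haystack)
{
    $pos = strpos($haystack, "\xC2\xA3");   // UTF-8 form
    if ($pos === false) {
        $pos = strpos($haystack, "\xA3");   // single-byte Latin-1 form
    }
    return $pos; // byte offset, or false if not found
}
```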

Encoding Necessary To Send The TM Symbol Via Mail()?
Anybody know what encoding is necessary to send text containing the TM symbol via mail()?

The html is " & t r a d e ; " or " & # 1 5 3 ; " and email source has the character encoded as " = 8 1 "

I am on a macintosh so the tm symbol is [option + 2] but when i code this it displays as a superscript "a" in the email. Does anybody have any suggestions?

Fopen GB2312 Encoding Problem
I am trying to read in a url which is encoded in GB2312 (http://
top.baidu.com/winkvane.html). I have tried other pages with this
encoding and each time I get a connection time out error. It seems
that fopen cannot open a webpage in GB2312. Is there a way around
this? Has anyone else had this sort of problem?

Here is an example with some simple code (using PHP5):

$file = fopen("http://top.baidu.com/winkvane.html", "r") or
exit("Unable to open file!");

while(!feof($file))
{
echo fgets($file). "<br />";
}
fclose($file);

Writing Euro Symbol With Utf Encoding
I am writing a string from the database into a text file which contains
the "Euro" symbol. I have tried using the iconv(), utf8_encode() and
mb_convert_encoding() functions to convert the string to Unicode but the
text file created has just a square box in place of the Euro symbol. Can
anybody help me in this matter ?

Submit Form Data In Particular Encoding
I use some forms with search fields to look up some terms in external dictionaries. However, data need to be send in particular charset (for example, page with external dictionary uses windows-1250, so I need to submit data in windows-1250 charset to work properly).

I tried to use the form's accept-charset attribute for this. It works fine in FF, but not in IE, which seems to ignore this attribute. I also tried to set the POST parameters in the header manually in the desired charset. That might work, but then the page gets loaded without CSS etc.; only the bare HTML gets downloaded.

PHP5 DOMDocument: Specifying Encoding Programmatically
How do I tell DOMDocument->load() what encoding I want it to use?

Longer: I search for and process XML files from elsewhere, and need to
transform them with some XSLTs. No problem. Using PHP5 and the DOM
library, everything's a snap. Worked fine, up till now. Today, funky
characters were in the XML file -- "smart" quotes from Word, it looks
like. Anyways, DOMDocument->load complained about them, saying that
they weren't UTF-8, and to specify the encoding.

Lo and behold, the encoding is not specified in these XML files. If I
add in 'encoding="iso-8859-1"' to the header, it works fine. The rub is
I have no control over these XML files.

Reading the file into a string, modifying its header and writing it back
out to another location seems to be my only option, but I'd prefer to do
it without having to use temporary copies of the XML files at all. Is
there any way to simply tell the parser to parse them as if they were
iso-8859-1?
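
A sketch of one possibility that avoids temporary files: read the file into a string, convert it to UTF-8 in memory when it is not already valid UTF-8, and hand the string to `loadXML()`. The helper name is hypothetical, and the ISO-8859-1 default follows the case described above; for Word "smart" quotes, Windows-1252 may turn out to be the better guess.

```php
<?php
// Hypothetical helper: parse XML that lacks an encoding declaration and
// is not valid UTF-8, without writing a temporary copy.
// Assumption: such input is in $assumed (ISO-8859-1 by default).
function load_xml_assuming(string $xml, string $assumed = 'ISO-8859-1'): DOMDocument
{
    if (!mb_check_encoding($xml, 'UTF-8')) {
        $xml = mb_convert_encoding($xml, 'UTF-8', $assumed);
    }
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc;
}

// Usage ($path is a placeholder):
// $doc = load_xml_assuming(file_get_contents($path));
```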

Encoding Error When Importing With Simple XML
When importing XML with accents using SimpleXML (PHP5), the accented characters come out wrong. I guess this is an encoding thing. For example, when importing this phrase in Spanish, "La Tecnología más avanzada", I get:

La Tecnología más avanzada

Whether the original text was in utf-8 or in iso-8859-1, how can I get the characters correctly according to its encoding?

Sending MIME-mail (subject Encoding)
If I send a character like "ö", "ä" or "ü" in the subject, the mail client always gets "?". I think that it isn't encoded correctly. In the body (with base64 encoding) I get the right character. So I would like to know whether I must also encode the subject, and how I can do this.
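
Header values with non-ASCII characters are normally wrapped as RFC 2047 "encoded words". A minimal sketch, assuming the subject is a short UTF-8 string (long subjects should be split into several encoded words of at most 75 characters each); the helper name is hypothetical.

```php
<?php
// Hypothetical helper: wrap a UTF-8 subject as an RFC 2047 encoded word.
function encode_subject(string $subject): string
{
    // Pure printable ASCII needs no encoding.
    if (!preg_match('/[^\x20-\x7E]/', $subject)) {
        return $subject;
    }
    return '=?UTF-8?B?' . base64_encode($subject) . '?=';
}

// Usage (address and texts are placeholders):
// mail('someone@example.com', encode_subject($subject), $body, $headers);
```

The mbstring function `mb_encode_mimeheader()` covers the same ground and also handles line folding.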

No Mbstring Function For Finding Suitable Encoding.
I am making some site, where I use UTF-8 encoding.
From PHP I send mail. But, if possible I want to send
the mail in ISO-8859-1 or KOI8-R (because still some
mailers have problem with UTF-8), but if not, just in
UTF-8.

If I look in the documentation, there is no function that
can check if a UTF-8 string can be encoded in another
encoding without loss of characters.

The function mb_check_encoding and mb_detect_encoding
have a different purpose.

Or, do I miss something?

So, I want a function:

bool mb_encoding_possible(string str, string to_encoding, string
from_encoding)

which returns TRUE if mb_convert_encoding is possible, without loss.
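
Such a check could be approximated with a round trip: convert to the target encoding and back, then compare with the original. This is a sketch only; it relies on mbstring replacing unrepresentable characters with a substitute character, and the function name `encoding_possible` is hypothetical (it is not part of mbstring).

```php
<?php
// Hypothetical helper: true if $str survives a round trip through $to unchanged.
function encoding_possible(string $str, string $to, string $from = 'UTF-8'): bool
{
    $converted = mb_convert_encoding($str, $to, $from);
    $restored  = mb_convert_encoding($converted, $from, $to);
    return $restored === $str;
}

// Usage: pick the first legacy charset that works, else stay with UTF-8.
// $charset = encoding_possible($text, 'ISO-8859-1') ? 'ISO-8859-1'
//          : (encoding_possible($text, 'KOI8-R') ? 'KOI8-R' : 'UTF-8');
```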

Encoding/characterset/font Family Confusion
I could use a bit of guidance on the following matter.

I am starting a new project now and must make some decisions regarding
encoding.
Environment: PHP4.3, Postgres7.4.3

I must be able to receive forminformation and store that in a database and
later produce it on screen on the client (just plain HTML).
Nothing special. I do this for many years, but I never paid a lot of
attention to special characters.

A few day ago I discovered that the euro-sign is not defined in all
fontfamilies.
They cannot produce the right sign no matter if I use &euro; or the
hexadecimal equivalent.
After a little research I found I could put font-tags around the euro-sign
with another font-family (Arial in this case) to get the Euro sign.

I am completely graphical impaired, and only understand programmingcode (and
HTML/JavaScript of course) , so this is a weak point on my side, hence this
question.

I am targeting Europe only at the moment (no need for Chinese
character support)
That said, will the following setup make sense?

Postgresql db encoding scheme: LATIN1
In the headers of all my HTML: content-type: text/html charset: iso-8859-1

A few related questions:
1) Will people be able to copy/paste info from other sources (like
wordprocessing programs and other websites) into my forms?

2) Can I use regular expressions as I am used to (ASCII) in my PHP code?
Will I match e acute, eurosign, etc?

3) Will the roundtrip describe here under have problems with normal expected
european characters?

client copies some text from some source ->
paste in the form ->
receive by PHP ->
insert in Postgresql (or update) ->
retrieve from postgresql ->
display as HTML (with content-type: text/html charset: iso-8859-1)

Is that OK?
Any pitfalls?
Should I maybe use UTF-8?

PHP Version 5.0.1 Ignores Character Encoding Header
I use this: header("Content-Type: text/html; charset=utf-8") to allow a
server to detect character encoding.

When I validate the html of webpages on PHP Version 4.3.4, character
encoding is detected with no problem.

When I validate the same page served on PHP Version 5.0.1, I am told that
the character encoding could not be detected. Is there some change in
PHP Version 5.0.1 I should know about?

Content-Encoding Is Not Set In Response When Using Zlib.output_compression
I enabled automatic gzip compression with the following lines in
.htaccess:

php_value zlib.output_compression On
php_value zlib.output_compression_level 5

The problem is that the Content-Encoding header does not get set at all
in the response. Therefore, a browser that advertises itself as
supporting gzip compression (Accept-Encoding: gzip,deflate) receives
compressed content but does not know it is compressed.

If I manually add the following to my script:

header("Content-Encoding: gzip");

... it then works.

I'm using PHP Version 4.4.2 with the following Configure Command

'./configure' '--with-apache=../apache_1.3.34' '--with-openssl'
'--with-gd' '--with-mysql' '--enable-trans-sid' '--enable-track-vars'
'--with-jpeg-dir=/usr' '--with-png-dir=/usr' '--with-zlib=/usr'
'--enable-mbstring' '--enable-ftp' '--enable-exif'
'--with-freetype-dir=/usr/local' '--with-pspell=/usr/local'

Zlib settings:
ZLib Support enabled
Compiled Version 1.2.2
Linked Version 1.2.2

Character Encoding Problem - £ (pound) Sign
I have a simple PHP page that takes values from a form and puts them in a
database (MySQL).

The code is in a file test.php, which I have typed in at the bottom of
this post (please excuse any typos). This page is not a production page -
I created it just to try and solve the £ sign problem I am having.

When I put the pound sign (£) into the input box and submit the form,
what gets inserted into mysql is an A with a ^ on top, followed by the
pound sign. I notice that the request URI shows the £ sign as %C2%A3. If
I run the query manually in php (i.e. mysql_query("INSERT INTO test
VALUES('£')") ) then it is fine. (The problem occurs with both GET and
POST)

I have read up a bit about it and I believe that the problem is that one
part of the system uses UTF-8 while the other part uses ISO-8859-1. The
database is ISO-8859-1 (latin1).

Using phpmyadmin, I can put a £ sign into the database with no problem,
so I know that there must be a solution to this issue within PHP, but I
have no idea what it is!

test.php:

<?php
mysql_connect('localhost', 'root', '');
mysql_select_db("test");
mysql_query("INSERT INTO test VALUES('".$_REQUEST['testpound']."')");
?>
<HTML>
<BODY>
<form action="test.php">
<input name="testpound">
<input type="submit">
</form>
</body>
</html>
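
If the form really arrives as UTF-8 (as the %C2%A3 in the request URI suggests) and the table is latin1, one option is to convert the value before it goes into the query. A sketch under exactly those assumptions; the helper name is hypothetical.

```php
<?php
// Hypothetical helper: convert a submitted value to ISO-8859-1 for a
// latin1 table. Assumption: the browser sent the form data as UTF-8.
function request_value_to_latin1(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
    }
    return $value; // not valid UTF-8: assume it is already single-byte
}
```

The converted value still has to be escaped before it is placed in the SQL string. Another route is to tell MySQL which character set the connection uses so that the server does the conversion itself.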

PHP + MySQL (+AJAX) Character Encoding Problem
I'm developing a site using PHP + MySQL and JavaScript (AJAX techniques). My problem is that some characters like &#321; aren't displayed correctly. Their codes are shown, not the symbols themselves.

MySQL encoding is UTF-8, the encoding of the webpages is ISO-8859-1.

What am i doing wrong?

Encoding / Decoding Multi Dimensional Session Array
Can you encode and decode multi-dimensional arrays? I think encoding them works just fine using session_encode(), however, session_decode is only capable of decoding the single dimensional arrays stored in the encoded string...
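
For comparison, `serialize()` and `unserialize()` round-trip nested arrays of any depth. A minimal sketch with made-up data:

```php
<?php
// A nested (multi-dimensional) array survives serialize()/unserialize().
$cart = array(
    'items' => array(array('id' => 1, 'qty' => 2), array('id' => 7, 'qty' => 1)),
    'total' => 3,
);
$encoded = serialize($cart);
$decoded = unserialize($encoded);
```

Keep in mind that `session_decode()` does not return the data; it fills `$_SESSION` and returns a boolean, which may be part of the confusion.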


Copyright © 2005-08 www.BigResource.com, All rights reserved