Tracking Forums, Newsgroups, Mailing Lists




Convert MS Word / Rtf / ... To Plain Text


I'm looking for standalone libraries that convert documents to plain text, so that people can edit the text in a textarea after uploading. Note that I cannot use COM because I can't configure the web server.

Does anyone have classes that can do this? I found a PHP class for MS Word documents at http://obninsk.name/obninsk_doc/ but it doesn't work at all for my Word documents.





See Related Forum Messages: Follow the Links Below to View Complete Thread
Convert Word-to-Text On Linux
How can I read a Word document and convert it to text (just in memory
is fine) on a Linux machine where there is no Word installed?

Can't Output To Text/plain
I'd like to show my MySQL query results in a plain text style. So inside my PHP file I wrote:

Plain Text Email
I'm wanting to protect all inputs for sending a plain text email, in a common
routine.

Have just found POSIX [:print:] which I thought looked useful.
I didn't want to use htmlentities(); because it's a plain text email.

Would this protect me from anyone sending spam though this?

$raw = stripslashes($raw);
$raw = preg_replace("/(content-type|bcc:|cc:|onload|onclick)/i", "DELETED",
$raw);
$raw = strip_tags($raw);
$raw = preg_replace("/[^[:print:]]/", " ", $raw);
$raw = substr($raw, 0, 500);
$raw = trim($raw);

Or, should I use:
$raw = htmlentities($raw, ENT_NOQUOTES);

The email address would obviously be different.
This would cover just the name, subject and message.
I don't need newlines etc.
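
For the name and subject, the usual worry is header injection: a CR or LF in the value lets someone add their own "Bcc:" line. A minimal sketch of a common routine, assuming the values end up in mail headers; the name cleanHeaderValue() is made up for illustration and the 500-character cap is taken from the post above:

```php
<?php
// Hypothetical helper (not from the original post): make a value safe
// to place on a single mail header line.
function cleanHeaderValue(string $raw, int $maxLength = 500): string
{
    // Replace each run of control characters (including CR and LF) with one
    // space, so the value can never start a new header line.
    $clean = preg_replace('/[\x00-\x1F\x7F]+/', ' ', $raw);

    // Cap the length, then drop leading and trailing whitespace.
    return trim(substr($clean, 0, $maxLength));
}
```

With the line breaks gone, a string such as "Bcc: ..." inside the subject is just text, so a keyword blacklist should not be needed for this purpose. The message body is not a header and can be handled separately.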

Plain Text Database
I'm really a newbie to PHP but not to OOP.

I'm designing a database to hold simple text messages to display in a
page called "News". The client doesn't want an SQL database so I
suggested a plain text database. I have it working, but when I pull the
data (fopen) it all comes back as one line.

It's set up as a simple form passing 2 variables, $title and $comments.
They both write (fwrite) just fine to the .txt file, but upon
retrieving them (fopen) it's all one line. Since I can't pass formatted
text to a .txt file, is there a different way?

As a newbie I haven't come across a solution yet. The client wants
this soon so I'm asking here due to the timeline. Given a few more
weeks I'm sure I'd stumble across it in some text.
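
One way this is often done is one record per line: end each record with "\n" when writing and read the file back with file(), which returns one array element per line. A rough sketch under that assumption; the function names and the tab-separated layout are invented for illustration, not taken from the post:

```php
<?php
// Hypothetical helpers for a one-entry-per-line text "database".
function saveNewsItem(string $file, string $title, string $comments): void
{
    // Line breaks or tabs inside a field would break the layout, so flatten them.
    $title    = str_replace(["\r", "\n", "\t"], ' ', $title);
    $comments = str_replace(["\r", "\n", "\t"], ' ', $comments);

    // A tab separates the two fields, "\n" ends the record.
    file_put_contents($file, $title . "\t" . $comments . "\n", FILE_APPEND | LOCK_EX);
}

function loadNewsItems(string $file): array
{
    $items = [];
    // file() splits on line endings; the flags drop the "\n" and empty lines.
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        [$title, $comments] = explode("\t", $line, 2);
        $items[] = ['title' => $title, 'comments' => $comments];
    }
    return $items;
}
```

If the line breaks are already in the file and only the browser shows one line, the cause may simply be that HTML ignores newlines; nl2br() on output would cover that case.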

RTF To Plain Text Conversion
does anyone know of a good PHP "module" -- or something else that I can invoke from a PHP script -- that will perform a simple conversion from Rich Text Format (RTF) to plain text with line breaks? I want to store some data in a MySQL database in RTF and allow users to preview the data as unformatted text (except for line breaks/paragraphs) on a webpage before deciding whether to download a file containing the RTF data. I'd rather not try to hack something out myself if I don't have to. The RTF files are likely to be created with Microsoft Word.

Extract Only Plain Text From A Page
Basically, what I am trying to do is write some PHP code that will automatically take text from any web page and eliminate all the HTML, CSS, and JS codes and formatting, leaving only the plain text from the page. I got my code started, but I have hit a snag with javascript and css codes. This is what I have so far:

<?php
$geturl = $_GET["url"];
ob_start();
include($geturl);
$page = ob_get_contents();
ob_end_clean();
$output = ereg_replace('<script.*.</script>', ' ', $page);
$output2 = ereg_replace('<style.*.</style>', ' ', $output);
$plaintext = strip_tags($output2);
echo $plaintext;
?>

The strip_tags function automatically removes all html tags, but it doesn't do anything to javascript and css because html code is not provided between the beginning and end tags, whereas javascript and css codes are both contained within two separate tags, like this for more clarification:

html:
<div name="htmltag">Keep this text here</div>

javascript:
<script>function somejs() {remove all this code}</script>

As you can see, the text between the div tags should stay, but the js between the script tags should be removed because it is code.

I then tried the ereg_replace function to get rid of js and css codes, but there is a problem when there is more than 1 piece of js or css code. The wildcard value (.*.) skips over any ending script or style tags until it reaches the last ending tag, therefore deleting all the text between the two pieces of code. Example:

<SCRIPT>function somejs() {remove all this code}</script> //removes all text and code from beginning here
KEEP ALL THIS TEXT HERE
<script>function somejs() {remove all this code}</SCRIPT> //to end here

Now finally down to the question, is there any way to only remove the js and css code between the beginning tag and the immediate next ending tag? Or is there any other way to get rid of the javascript and css codes?
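
A possible answer, sketched with preg_replace() instead of ereg_replace(): the non-greedy quantifier ".*?" stops at the first closing tag rather than the last one. The function name pageToPlainText() is made up for this example:

```php
<?php
// Hypothetical helper: strip <script> and <style> blocks, then all other tags.
function pageToPlainText(string $html): string
{
    // "i" ignores case, "s" lets "." match newlines,
    // and ".*?" matches as little as possible (up to the next closing tag).
    $html = preg_replace('~<script\b[^>]*>.*?</script>~is', ' ', $html);
    $html = preg_replace('~<style\b[^>]*>.*?</style>~is', ' ', $html);

    return trim(strip_tags($html));
}
```

Separately, include() on a URL taken from $_GET will execute whatever PHP that URL returns; reading the page with file_get_contents() is the safer way to fetch it.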

Mail() Plain Text Vs. Html Format
I have been testing the mail() code below using MS Outlook and Outlook Express and a hotmail account and the details sent are always in "plain text" format, which results in the information being nicely aligned (incidentally the e-mail contains order confirmation with lots of columns).

However, my customer came back to me this morning to tell me that all is not well ! And rightly enough, when I looked at the snapshot he sent me he is receiving it in "html" format. What am I doing wrong ? Keeping in mind that I am a PHP greenhorn ... Can anyone help. Thanks in advance !

$headers = "From: info@somecompany.com\r\n";
$headers.= "X-Sender: <info@somecompany.com>\r\n";
$headers.= "X-Mailer: PHP\r\n";
$headers.= "X-Priority: 1\r\n";
$headers.= "Return-Path: "."<info@somecompany.com>\r\n";
$headers.= "cc: info@anothercompany.com\r\n";
$headers.= "bcc: me@mycompany.com\r\n";
$headers.= "MIME-Version: 1.0\r\n";
$headers.= "Content-type: text/plain; charset=iso-8859-1\r\n";

if(@mail($to,$re,$msg,$headers))
{
// tell them all was sent fine
}
else
{
// give an error message
}

Inserting/parsing Plain Text With 'require'?
I am trying to setup a very simple site that will pull text files into an existing template. I am using a simple require
statement, such as:

<?php
require "/www/companyname/body.txt"
?>

The first problem is that it does not seem to respect the linefeeds, which are saved in Unix format, and just lists it as one
massive block of text. The second problem is that, obviously, it does not convert symbols such as '&' to '&amp;'.

The reason behind this way of including text into HTML files is so that the lecturers can write articles without having to
deal with HTML and the articles are inserted into the HTML templates with the 'require' statement. Also, the sheer
number of text documents that need to be posted would cause a lot of work. I have looked at Project Midgard, but I
tend to shy away from applications with little documentation, even though that would be absolutely ideal.
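
Both problems can probably be handled by reading the file as data instead of using require: htmlspecialchars() escapes '&', '<' and '>', and nl2br() turns linefeeds into <br /> tags. A small sketch; the helper name textFileToHtml() is hypothetical:

```php
<?php
// Hypothetical helper: turn a plain-text article into HTML-safe markup.
function textFileToHtml(string $path): string
{
    $text = file_get_contents($path);
    if ($text === false) {
        return ''; // file missing or unreadable
    }

    // Escape special characters first, then convert line breaks to <br /> tags.
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}
```

In the template this would replace the require line with something like: echo textFileToHtml("/www/companyname/body.txt");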

Sending Both HTML And Plain Text Email.
I am using php to send weekly newsletters to my mysql database, the emails are always in HTML only.

I was wondering if anyone knew how to send both types so that if they can't view HTML emails it will show just text?
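
The usual technique for this is a multipart/alternative message: both versions go into one email, separated by a boundary string, and the mail client picks the one it can display. A bare-bones sketch, assuming UTF-8 content; the function name and the way the boundary is passed in are choices made for this example only:

```php
<?php
// Hypothetical helper: build the headers and body of a text + HTML message.
function buildAlternativeMail(string $text, string $html, string $boundary): array
{
    $headers = "MIME-Version: 1.0\r\n"
             . "Content-Type: multipart/alternative; boundary=\"$boundary\"";

    // Plain text part first, HTML part last: clients that understand
    // multipart/alternative show the last part they can render.
    $body = "--$boundary\r\n"
          . "Content-Type: text/plain; charset=utf-8\r\n\r\n"
          . $text . "\r\n"
          . "--$boundary\r\n"
          . "Content-Type: text/html; charset=utf-8\r\n\r\n"
          . $html . "\r\n"
          . "--$boundary--\r\n";

    return [$headers, $body];
}
```

Usage would look roughly like: [$headers, $body] = buildAlternativeMail($text, $html, md5(uniqid())); mail($to, $subject, $body, $headers); A mailer class such as PHPMailer handles the same thing with less manual work.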

Plain Text Email Spacing Issues?
PHP Code:

// create final message, $text refers to the textarea they typed the original message in
$message2 = "Dear $firstname,

$text

Regards,
The Team......

Application/octet-stream Vs Text/plain When Uploading
When testing Zend's file upload script, I uploaded a file (sql.txt, which was an SQL backup) and $_FILES reported it as text/plain, as it should. As soon as I renamed it to sql.sql, $_FILES reported it as application/octet-stream. All I did was rename the file. To make matters worse, I thought Windows XP Pro was adding some bits to the file that would explain the new type in $_FILES, so I renamed sql.sql to sql.exe; $_FILES now says it is text/plain, even with the .exe extension.

Does anyone know why adding the .sql extension would change the type from text/plain to application/octet-stream?

BTW, the web host is a Linux box (RH).

More tested extensions (renaming sql.txt to the following extensions):
.sql - application/octet-stream
.php - application/octet-stream
.gz - application/octet-stream
.tar - application/octet-stream
.htm - text/plain
.html - text/plain
.txt - text/plain
.exe - text/plain
.gif - text/plain
.jpg - text/plain
.asp - text/asp
.rpm - audio/x-pn-realaudio-plugin
.wav - audio/x-wav
.mp3 - audio/mpeg

(All I'm doing is changing the extension, nothing more. Opening the file in Notepad shows plain ASCII, no funky characters.)

Phpmailer.class Messages Are Being Converted To Plain Text...
I am using a phpmailer class to send some stuff over email...

I am trying to send it as text/html but for some reason the emails are
being converted to plain text and all the headers are shown. Here is the
email...

X-Tour4Less.co.il Mailer:
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="b1_27976937fb6a931b3ed2d40aebd76a26"

--b1_27976937fb6a931b3ed2d40aebd76a26
Content-Type: text/plain; charset = "windows-1255"
Content-Transfer-Encoding: 8bit

ניסיון עברית

--b1_27976937fb6a931b3ed2d40aebd76a26
Content-Type: text/html; charset = "windows-1255"
Content-Transfer-Encoding: 8bit

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"<html>
<head>

<META HTTP-EQUIV="Content-Type" content="text/html;
charset=windows-1255"</head<body<p dir=RTL><span
lang=HE>&#1513;&#1500;&#1493;&#1501;&nbsp;&#1495;&#1489;&#1512;&#1514;
Tour4less contact .</span></p<p dir=RTL><span
lang=HE>&#1502;&#1513;&#1514;&#1502;&#1513; &#1513;&#1500;
&#1492;&#1488;&#1514;&#1512; &#1513;&#1500;&#1504;&#1493; </span><span
dir=LTR>TOUR4LESS.CO.IL</span><span dir=RTL></span><span lang=HE><span
dir=RTL></span&#1492;&#1514;&#1506;&#1504;&#1497;&#1497;&#1503;
&#1489;&#1497;&#1510;&#1497;&#1512;&#1514; &#1511;&#1513;&#1512;
&#1488;&#1497;&#1514;&#1493; &#1506;&quot;&#1497;.........

How To Send A Plain Text Version Of An Email With Html
How can you send a plain text version of an email along with the HTML, so that
the user's mail client can fall back to the plain text version?

Sending Plain Text E-mail, Trying To Track Accesses
I am using PHP to dispatch email from time to time using the mail() function. I have the message split into an HTML form and a plain text form; only one displays, depending on the recipient's mail client. My HTML message includes a 1x1 "image" that is really a PHP script, which allows me to track reads of the HTML version... but I don't know of a way to track reads/views/accesses/etc. of the plain text version. Is this possible?

Html To Word Convert With Php
Is there anyone who can help me convert an HTML/PHP document to a .doc (Word) file with PHP?

Convert Word To HTML
How to convert MS Word 2003 file to HTML using php script?

Read File MS Word, Convert File Txt To MS WORD,
1. How can I read the content of an MS Word file?
2. How can I create, or convert to, an MS Word file with PHP?

Can I Convert MySQL Db Records Into Microsoft Word Documents?
I found a solution for exporting
data from a MySQL db into *.csv, but is there any way to convert the
contents into *.doc, save it on my web server and provide a link for
the end users to download the Word file?

FYI, the database records come from end users submitting the
forms themselves, and I save them in my db...

Mime_content_type() For PNG Image Returns "text/plain"
PHP 4.3.8 with UNIX with option --with-magic_mime


Get The Last Word From A Text File
I have a text file that is updated all the time. I need to get the last word from it. How can I do that?
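
A simple sketch, assuming the file is small enough to read in one go and that "word" means anything separated by whitespace; lastWord() is a made-up name:

```php
<?php
// Hypothetical helper: return the last whitespace-separated word of a file,
// or null if the file is empty or unreadable.
function lastWord(string $path): ?string
{
    $text = file_get_contents($path);
    if ($text === false || trim($text) === '') {
        return null;
    }

    // Split on any run of spaces, tabs or line breaks and take the last piece.
    $words = preg_split('/\s+/', trim($text));
    return end($words);
}
```

For a very large file it would be cheaper to fseek() near the end and read only the final chunk, but the idea stays the same.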

Convert Text
I have a text like this: "Thuy&#7873;n V Bi&#7875;n"
How to convert it to: "Thuyền V* Biển"
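
The &#NNNN; sequences are numeric HTML entities, and html_entity_decode() can turn them back into real characters when told to produce UTF-8. A short sketch; decodeEntities() is just a wrapper name chosen here:

```php
<?php
// Hypothetical wrapper: decode named and numeric HTML entities to UTF-8 text.
function decodeEntities(string $encoded): string
{
    return html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
}
```

The page that displays the result then needs to be served as UTF-8 as well, otherwise the decoded characters will show up garbled.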

Formatting Text In Word Document
I am building a web site that displays the contents from a MySQL database in a Word file and saves it to disk. The problem is that I can create the file but I don't know how to format the text in the Word document.

Finding A Key Word In A Text File
I would like to find a word stored in a text file.

Structure: I have one file named keyWords.txt that stores some key
words I'm interested in finding. In addition I also have a file named
textOrigin.txt in which I store the text to search in.
I would like my program to check whether a certain word appears in the text
and then tell me which line it was found on (if it was...).

My problem is that the script can't find the words I'm looking for. I
took one word from the word list and put it into the text file to be
searched, but for some reason this word is not found by the program. I used
'enter' at the end of each line. The word being used is on line 3 in
the keyWords.txt file. I have some reason to believe that the cause
lies here:
if ($pos)
{
echo " line $i: $storeWord[$n]\n";
}
I also tried it with if (!$pos === FALSE) {...} but nothing there
either...

the keyWords.txt file:
-------------------------------
Recording Site
Recording Type
INTRA
SUA
................
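
Two things commonly cause this. First, strpos() returns 0 for a match at the very start of a line, and 0 counts as false in "if ($pos)"; in "!$pos === FALSE" the "!" is applied before the comparison, so that test fails in the same case. Second, lines read with file() or fgets() keep their trailing line break, so the keyword "INTRA\n" never matches "INTRA" in the middle of a line. A sketch that avoids both, assuming both files were read into arrays; findKeywords() is an invented name:

```php
<?php
// Hypothetical helper: return [lineNumber => [keywords found on that line]].
function findKeywords(array $keywords, array $lines): array
{
    $hits = [];
    foreach ($lines as $i => $line) {
        foreach ($keywords as $word) {
            $word = trim($word); // drop the "\n" or "\r\n" that file() keeps
            if ($word === '') {
                continue;
            }
            // Compare with !== false so that position 0 still counts as a match.
            if (strpos($line, $word) !== false) {
                $hits[$i + 1][] = $word; // report 1-based line numbers
            }
        }
    }
    return $hits;
}
```

Usage would be along the lines of: $hits = findKeywords(file('keyWords.txt'), file('textOrigin.txt'));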

Echo To Text Box Only Returns First Word
I'm echoing values from a db to text boxes with php, but only the first words are returned. The db field is set to varchar(255). Can someone please tell me how to solve this small but annoying problem? By the way, I don't have any regular expressions or that kind of coding.
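
This symptom usually means the value attribute is not quoted: with value=New York the browser reads only "New" as the value. A small sketch that quotes and escapes the value; textInput() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: build a text input with a safely quoted value.
function textInput(string $name, string $value): string
{
    // Quotes around the attribute keep multi-word values together;
    // htmlspecialchars() stops a quote inside the value from ending it early.
    return '<input type="text" name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8')
         . '" value="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '">';
}
```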

Extract Text From Word Documents
Is there a way to extract the text of a word document with php? And perhaps some of the formatting (like break lines, bold, italic,...)?

Insert Text, Ms Word Document
i've hit a wall regarding php and ms word. what i want is to open a
document containing bookmarks, insert text where the bookmarks are, and
save.

it's working, unless the bookmark is in the header part of the page (re
header/ footer). in that case i get an error saying the bookmark
wasn't found/ doesn't exist.

anyone got any tip on how to get into the header part of a word
document using php?

the following, simple code works when bookmark is in the main part of
document:

$empty = new VARIANT();
$word = new COM("word.application") or die ("some explanation");
$word->Documents->Open("C:\\Path\\Document with bookmark.doc");
$word->Selection->GoTo(wdGoToBookmark, $empty, $empty, "bookmark");
$word->Selection->TypeText("text to be inserted");
$word->Documents[1]->SaveAs("C:\\Path\\with inserted text.doc");
$word->Quit();
$word = null;

How To Convert Any Text To Unicode?
How can I convert any text to Unicode? Please help me.

Convert Text To A Percent
I'm making a useless little program that takes two people's names and tests their love compatibility as a percentage. I don't want a random number generator because I want the same pair of names to always give the same result.

Any ideas how to put the two strings together to get a varying percentage? I have tried a few things. In one attempt I converted both strings to md5 and then used similar_text() to compare them... however, the percentages were always low. I want a mixed result.
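
One idea is to hash the two names together and reduce the hash to the range 0 to 100. The result is arbitrary but repeatable. A sketch using crc32(); the function name, the "|" separator and the decision to ignore case and name order are all assumptions of this example:

```php
<?php
// Hypothetical helper: a repeatable "compatibility" score from 0 to 100.
function compatibility(string $a, string $b): int
{
    // Normalise and sort so that "Ann + Bob" equals "bob + ann".
    $names = [strtolower(trim($a)), strtolower(trim($b))];
    sort($names);

    // crc32() always gives the same integer for the same input;
    // abs() guards against negative values on 32-bit builds.
    return abs(crc32(implode('|', $names))) % 101;
}
```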

How To Convert Ascii To Text
I replace some user input with their ascii equivalent so they display
on the webpage properly:

$entry = preg_replace ( "/'+/" , '&#39;' , $entry);
$entry = preg_replace ( "/,+/" , '&#44;' , $entry);

I then need to email the data, however in email the ascii code is
displayed, not the text.

Is there an easier way to convert the ascii back to the text without
another preg_replace?
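
Two options, sketched below. If the data is already stored with entities, html_entity_decode() reverses them in one call (as far as I know it only decodes entities that end with a semicolon). Alternatively, keep the raw input and escape it only when printing to the web page, so the email can use the raw text as-is. The helper name forHtml() is made up:

```php
<?php
// Option 1: decode entities that are already in the stored text.
$stored = 'It&#39;s here&#44; finally';
$plain  = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

// Option 2 (hypothetical helper): store raw text, escape only for HTML output.
function forHtml(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}
```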

Full Text Search In PDF And Word Files ?
I need to perform full text searches on a batch of PDF and Word files.
What is the best way to go?

After some research, I'm thinking of extracting the plain text from the
files with "pdftotext" and "catdoc", harmonizing the various possible
encodings to UTF-8, storing the text in a MySQL database, and then
using the full text search capabilities of MySQL.
Do you think that would work well? I am told that the files are mostly
text and won't be longer than 30 pages.

Regular Expression To Underline A Given Word In A Text...
With the sentence :

"Bordeaux est au bord de l'eau"

How can I underline, for instance, the word "eau" without underlining
the substring in "Bordeaux"?
I don't know how to isolate the word...

My current code:

$text = eregi_replace("(" . stripslashes($word_to_underline) . ")", "<b>\\1</b>", $text);

but this underlines "eau" in "Bordeaux" too, and I don't want that!
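
With the preg functions, the word boundary assertion \b does this: it only matches where a word starts or ends, so "eau" inside "Bordeaux" is skipped. A sketch, assuming the text is UTF-8; underlineWord() is an invented name:

```php
<?php
// Hypothetical helper: wrap whole-word matches of $word in <u> tags.
function underlineWord(string $text, string $word): string
{
    // preg_quote() escapes regex metacharacters in the word;
    // \b on both sides restricts the match to whole words;
    // "i" ignores case, "u" treats pattern and text as UTF-8.
    $pattern = '/\b(' . preg_quote($word, '/') . ')\b/iu';

    return preg_replace($pattern, '<u>$1</u>', $text);
}
```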

Convert Php Data Into Text File?
I have a PHP file that gathers data from a specific website. I am then using that data (numbers) in a different application. The only problem is the HTML formatting. I just need the numbers, but I'm getting all the HTML tags with the data. Is there a way for me to have the PHP file output to a separate plain text file?

Convert Text From Database For HTML
I'm pulling text from a database (MySQL) and I'm using the nl2br () function to convert the line breaks

DESCRIPTION="<?php echo nl2br($row_rsttheJobResults['description']); ?>"

This creates the following example

"A new line will be created soon < 'br' >
and here it is."

However the website that receives this cannot accept the characters < and >
So I want to convert this "< br >" into this "&lt;p&gt;"

Could anybody help me add to the nl2br () function to convert the < br > tags?

* the spaces between the < and br are there because they won't show in the message otherwise

Search For A Specific Word Inside A Text File?
Basically what i'm trying to do is when a user inputs login/password information at a login page, I want PHP to search inside verify.txt and if it finds the login/password combination then allows the user to proceed. Is this possible? And if so, which functions would I use to get the job done?

Also, how can I save the login name so that it can be passed to/included in a url?

How Convert Http:// Text To Real Hyperlink?
I have some texts in MySQL databases. When I show these texts in web pages using PHP, I need to convert every occurrence of

http://blablabla.com

to

<a href='http://blablabla.com'>http://blablabla.com</a>

, so my text will really become 'active'. The people who wrote the texts don't know HTML tags, so I can't ask them to write explicit <a...> </a> tags. The problem is the same with text that contains the @ character, such as

name@mailserver.com

. I have to change this to

<a href='mailto:name@mailserver.com'>name@mailserver.com</a>.

Please help; I don't know how to use regular expressions. I think they could make these replacements very easy.
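
A rough sketch with two preg_replace() calls, one for http/https addresses and one for email addresses. The patterns are deliberately simple and are an assumption of this example, not a complete URL or email grammar; linkify() is a made-up name:

```php
<?php
// Hypothetical helper: escape the text, then turn URLs and email
// addresses into clickable links.
function linkify(string $text): string
{
    // Escape first so that user text cannot inject its own HTML.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // http:// or https:// followed by anything up to whitespace or "<".
    $safe = preg_replace('~\bhttps?://[^\s<]+~i', '<a href="$0">$0</a>', $safe);

    // something@domain.tld
    $safe = preg_replace('~\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b~', '<a href="mailto:$0">$0</a>', $safe);

    return $safe;
}
```

Known limitation of this sketch: punctuation directly after a URL (for example a full stop at the end of a sentence) becomes part of the link.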

Convert Numerical To Text For Check Writing
Does anyone know of/have an open source class that will compose the text
version of a dollar amount? For instance, convert $525.62 to "Five hundred
twenty five and 62/100 dollars".

I'm trying to write a quick accounts payable app and I really do not want
to have to write this part from scratch!!
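
Not an existing class, just a sketch of how this could be written from scratch for amounts below one million dollars. The function name amountToWords() and the exact wording (no hyphens, no "and" inside the number) follow the example in the post and are otherwise assumptions:

```php
<?php
// Hypothetical helper: spell out a dollar amount for a check,
// e.g. 525.62 -> "Five hundred twenty five and 62/100 dollars".
function amountToWords(float $amount): string
{
    if ($amount < 0 || $amount >= 1000000) {
        throw new InvalidArgumentException('Amount must be from 0 to 999999.99');
    }

    $ones = ['', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
             'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
             'sixteen', 'seventeen', 'eighteen', 'nineteen'];
    $tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
             'eighty', 'ninety'];

    // Work in whole cents to avoid floating point surprises.
    $totalCents = (int) round($amount * 100);
    $dollars = intdiv($totalCents, 100);
    $cents = $totalCents % 100;

    // Spell out a number from 1 to 999.
    $under1000 = function (int $n) use ($ones, $tens): string {
        $parts = [];
        if ($n >= 100) {
            $parts[] = $ones[intdiv($n, 100)] . ' hundred';
            $n %= 100;
        }
        if ($n >= 20) {
            $parts[] = $tens[intdiv($n, 10)];
            $n %= 10;
        }
        if ($n > 0) {
            $parts[] = $ones[$n];
        }
        return implode(' ', $parts);
    };

    $parts = [];
    if ($dollars >= 1000) {
        $parts[] = $under1000(intdiv($dollars, 1000)) . ' thousand';
    }
    if ($dollars % 1000 > 0) {
        $parts[] = $under1000($dollars % 1000);
    }
    $words = $parts ? implode(' ', $parts) : 'zero';

    return ucfirst(sprintf('%s and %02d/100 dollars', $words, $cents));
}
```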

What Is The Preg For Capitals In A Word To Be Replaced By That Word Preceded By A Space
what is the preg for capitals in a word to be replaced by that word
preceded by a space?

i need to be able to do this in preg:

thisWord := this Word
AnotherExample := Another Example

Strings with capitals surrounded by other chars need to have a space
inserted before the capital.
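
One possible pattern uses lookarounds: match the empty position that has a lowercase letter before it and a capital after it, and put a space there. A sketch for ASCII letters only; splitCapitals() is an invented name:

```php
<?php
// Hypothetical helper: insert a space before a capital that follows a lowercase letter.
function splitCapitals(string $s): string
{
    // (?<=[a-z]) looks behind for a lowercase letter,
    // (?=[A-Z]) looks ahead for a capital; nothing is consumed.
    return preg_replace('/(?<=[a-z])(?=[A-Z])/', ' ', $s);
}
```

Because the capital must follow a lowercase letter, a leading capital ("AnotherExample") gets no space in front, and runs of capitals such as "HTML" are left together.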

Plain English GD Installation?
I finally have php3, Apache and MySQL installed and running. I would like to install or get GD working.

I am on Win98, php3, Apache 1.3.9 and downloaded GD 1.8.
I looked over the readme, but it doesn't make a lot of sense to me. Can anyone point me in the direction of, say, a "GD install for dummies"?

Plain PHP Implementation Of Hash Function
I have a problem compiling the hash function from PECL into my PHP.

I get the error configure: error: C preprocessor "/lib/cpp" fails
sanity check

I would like to use a plain PHP implementation of these functions.

Is there a library of them around?

Phpmailer Class Converted To Plain On Some Servers...
I am using a phpmailer class to send some forms over email...

The problem is that some people (the buyer is especially problematic for
me...) are getting the email as raw data (source...). Here is the email
itself as they get it (the headers are below...)

Code:

X-Tour4Less.co.il Mailer:
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="b1_27976937fb6a931b3ed2d40aebd76a26"

--b1_27976937fb6a931b3ed2d40aebd76a26
Content-Type: text/plain; charset = "windows-1255"
Content-Transfer-Encoding: 8bit

ניסיון עברית

--b1_27976937fb6a931b3ed2d40aebd76a26
Content-Type: text/html; charset = "windows-1255"
Content-Transfer-Encoding: 8bit

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"<html>
<head>

and the rest of the email...

and here are the headers...

Code:

ESMTP; 01 Sep 2006 12:11:23 -0000
Received: (qmail 32712 invoked from network); 1 Sep 2006 05:11:23 -0700

Received: from localhost (HELO http://www.tour4less.co.il) (127.0.0.1)
by localhost with SMTP; 1 Sep 2006 05:11:23 -0700
Received: from phpmailer ([88.153.9.8])
by http://www.tour4less.co.il with HTTP (PHPMailer);
Fri, 1 Sep 2006 05:11:23 -0700
Date: Fri, 1 Sep 2006 05:11:23 -0700
To: undisclosed-recipients:;
From: "Tour4less.co.il" <########## // here was an email I deleted...

Subject: Contact Email from Tour4Less.co.il
Message-ID: <27976937fb6a931b3ed2d40aebd76a26@www.tour4less.co. il>
X-Priority: 3
X-Mailer: PHPMailer [version 1.71]
X-Virus-Scanned: amavisd-new at sce.ac.il
Return-Path: ############## // here was an email I deleted...
X-OriginalArrivalTime: 01 Sep 2006 12:11:02.0551 (UTC)
FILETIME=[B168A270:01C6CDBF]

Quickly Adding Text To A Mysql Text Field That Is NOT Empty
Is there a way to insert text into a mysql text field that already has
text into it; without having first to extract the existing data and
append the new text to that string variable and then insert the new
string.

Basically i'm looking for a way to do it with a single query not 2 (one
being a select to gather existing data).
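
MySQL's CONCAT() can do this in a single UPDATE, with the column itself as the first argument. A sketch using PDO; the table and column names (notes, body, id) and the function name are placeholders invented for this example:

```php
<?php
// Hypothetical table "notes" with columns "id" and "body".
// COALESCE() covers rows where body is NULL, since CONCAT() with NULL gives NULL.
const APPEND_SQL = "UPDATE notes SET body = CONCAT(COALESCE(body, ''), :extra) WHERE id = :id";

// Append $extra to the existing text of one row, in one query.
function appendText(PDO $pdo, int $id, string $extra): bool
{
    $stmt = $pdo->prepare(APPEND_SQL);
    return $stmt->execute([':extra' => $extra, ':id' => $id]);
}
```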

Gaining Access To How MySql Parses Text For Full Text
I want to gain access to the function or process MySql uses to parse words and phrases for Full Text searching. Here is an example.

If the user inputs...

Milan in history

MySql will search for milan, history, and milan history. Is there a way to extract just the combination of terms MySql uses to search the db without the stop words? Stop words are automatically eliminated from the search request unless the user encloses a phrase in quotes.

What I am trying to do is develop a script to highlight found search terms and phrases. I can explode a phrase into single words but if I do that the stop words would be included in the array. If there is some way of getting into the parsed words or phrases MySql Full Text actually uses to search, I can use each of those combinations as a keyword in my highlighting script.

This request is about searching for the code or any code related to the questions involved.

Text Area Not Accepting Large Amounts Of Text
I have a form where teachers enter homework assignments and they are then stored in a MYSQL database and retrieved elsewhere.

I have been using "get" with the form. The code is simple:

<textarea name="array[assignment]" cols="60" rows="10" id="array[assignment]"></textarea>

It does allow post of 100 words, etc. Stuff that teachers normally submit.

What's happening is that it won't allow very large posts (over 300 words maybe? ) Not sure what the cut off is. When you press submit it won't go, or there is an error where it won't submit.

Is there a limit for text fields? Since the fields scroll, I didn't think that having only 60 rows was any type of real limit; I thought you could put in as much as needed. But then all of it ends up in the address bar, so there must be a limit of some sort.

Imagettftext() Gives Grainy Text When Writing Aliased Text
I'm using a bundled version of GD: 2.0.23 compatible. When using the
function imagettftext() with a negative color to get an aliased text,
the text gets grainy. What could be wrong? I've tried several ttf
fonts, with the same result.

Read And Display Japanese Text From Text File
I posted a question regarding reading japanese
text from a text file.

Well, since I solved the problem, I thought I'd post my solution for
the benefit of other people with the same problem.

The plan was to make a script to read and display japanese text. I
will use it for making a japanese proverb script and for a japanese
language study script.

Method :

I wrote a simple kanji text file (saved with UTF-8 encoding)
I wrote a simple PHP script to display the file contents (saved with UTF-8 encoding)
I specified the content-type header for the HTML page :
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

*** All files have the same encoding. ***

UTF-8 supports japanese characters.

and it works!

this is my PHP (and HTML) script :

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>PHP : Japanese Text File Read : Exercise 1</title>
</head>
<body>

<?php

$filename="japanese.txt";
//open file
$fp = fopen($filename,'r');

//loop through each line in the file
while($line=fgets($fp))
{
//output current text file line
print $line."<br>";
}
//close file handle
fclose($fp);

?>

</body>
</html>

I know it's a very simple script, for testing purposes only. It
displays the contents of the japanese text file line by line.

The key was to save all files in the same encoding (I used UTF-8) and
to specify the encoding / charset in the HTML header (<meta
http-equiv="Content-Type" content="text/html; charset=utf-8">)

Php & Ms Word
I want to display a .doc file to an html page, using PHP. The doc file is located in a MySQL db. It will be used for displaying some announcements through the main web page.

COM And Ms Word
I've read "PHP and COM" by Harish Kamath and I've got a problem just at the beginning.
I'm not able to create an instance of the Word application:

"$word = new COM("word.application") or die("Unable to instantiate application object")".
It doesn't print the die message, it is simply always "sending request to 127.0.0.1 ..."
Is it a problem of "php.ini" settings? I'm using PHP4.0.6 on Windows2000 (Office2000).

COM & Word
I am trying to print a document after merging some data into it using PHP
COM, and it seems to be very temperamental and doesn't work at all over
network printers. By temperamental I mean: if it works on a machine it
works, no question, but if it doesn't, it just goes nowhere and doesn't say
anything at all.

Definitely works on Windows XP Pro, but again temperamental: once it wasn't
working and I did everything possible, all Office updates etc., but gave up
and just re-installed Windows and it worked fine.

Windows 2000, i think i've had it working

Windows 2000 Server, not working.

All using Office 2000, but even tried the Office 2003 on Windows 2000 Server
and nothing at all.. :( It seems very badly documented.

The code I use to print: (a simplified just to do the task)

<?

$empty = new VARIANT();
com_load_typelib('Word.Application');
$word = new COM('word.application') or die('Unable to load Word');
print "Loaded Word, version {$word->Version}\n<br/>";

$word->Documents->Open("c:/templates/Options.doc");

$output="";
$word->ActiveDocument->PrintOut(0,0,0,$output);

?>

Are there any known good resources for doing this? Can anyone shed any light
on this?!


Copyright 2005-08 www.BigResource.com, All rights reserved