Convert MS Word / Rtf / ... To Plain Text
I'm looking for standalone libraries that convert documents to plain text, so that people can edit the text in a textarea after uploading. Note that I cannot use COM because I can't configure the web server.
Does anyone have any useful classes that can do this? I found a PHP class for MS Word documents at http://obninsk.name/obninsk_doc/ but it doesn't work at all for my Word documents.
Related Forum Messages
Convert Word-to-Text On Linux
How can I read a Word document and convert it to text (just in memory is fine) on a Linux machine where there is no Word installed?
Plain Text Email
I want to protect all inputs for sending a plain text email, in a common routine. I have just found POSIX [:print:], which I thought looked useful. I didn't want to use htmlentities() because it's a plain text email. Would this protect me from anyone sending spam through this?
$raw = stripslashes($raw);
$raw = preg_replace("/(content-type|bcc:|cc:|onload|onclick)/i", "DELETED", $raw);
$raw = strip_tags($raw);
$raw = preg_replace("/[^[:print:]]/", " ", $raw);
$raw = substr($raw, 0, 500);
$raw = trim($raw);
Or, should I use:
$raw = htmlentities($raw, ENT_NOQUOTES);
The email address would obviously be handled differently. This would cover just the name, subject and message. I don't need newlines etc.
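Header injection works by smuggling a line break into a header field, so one possible approach is to collapse every control character into a space before the value goes anywhere near mail(). A minimal sketch, assuming single-line values only; clean_header_value() is a hypothetical helper name, not something from the post:

```php
<?php
// Hypothetical helper: reduce a user-supplied value to one safe line
// for use as a sender name or subject.
function clean_header_value(string $raw, int $maxLen = 500): string
{
    // Replace runs of control characters (including CR and LF) with a
    // single space, so the value cannot start a new header line.
    $clean = preg_replace('/[\x00-\x1F\x7F]+/', ' ', $raw);
    // Same 500-character cap and trim as the routine in the post.
    return trim(substr($clean, 0, $maxLen));
}
```

With no line breaks left, a string such as "Bcc: ..." inside the value stays ordinary text in the same header. htmlentities() is not needed here, since nothing in a plain text email is rendered as HTML.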
Plain Text Database
I'm really a newbie to PHP but not to OOP. I'm designing a database to hold simple text messages to display on a page called "News". The client doesn't want a SQL database, so I suggested a plain text database. I have it working, but when I pull the data (fopen) it all comes back as one line. It's set up as a simple form passing 2 variables, $title and $comments. They both write (fwrite) just fine to the .txt file, but upon retrieving them (fopen) it's all one line. Since I can't pass formatted text to a .txt file, is there a different way? As a newbie I haven't come across a solution yet. The client wants this soon, so I'm asking here due to the timeline. Given a few more weeks I'm sure I'd stumble across it in some text.
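One way this is often handled is to store exactly one record per line and read the file back line by line. A rough sketch, assuming UTF-8 input; save_news() and load_news() are hypothetical names, and JSON is just one possible way to keep embedded line breaks from splitting a record:

```php
<?php
// Hypothetical helpers for a one-record-per-line text file.
function save_news(string $path, string $title, string $comments): void
{
    // json_encode escapes embedded line breaks, so each record stays on one line.
    $line = json_encode(['title' => $title, 'comments' => $comments]);
    file_put_contents($path, $line . "\n", FILE_APPEND | LOCK_EX);
}

function load_news(string $path): array
{
    $items = [];
    // file() returns the file as an array of lines.
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $items[] = json_decode($line, true);
    }
    return $items;
}
```

When printing a record into the page, something like nl2br(htmlspecialchars($item['comments'])) turns the stored line breaks into visible ones.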
RTF To Plain Text Conversion
Does anyone know of a good PHP "module" (or something else that I can invoke from a PHP script) that will perform a simple conversion from Rich Text Format (RTF) to plain text with line breaks? I want to store some data in a MySQL database in RTF and allow users to preview the data as unformatted text (except for line breaks/paragraphs) on a webpage before deciding whether to download a file containing the RTF data. I'd rather not try to hack something out myself if I don't have to. The RTF files are likely to be created with Microsoft Word.
Extract Only Plain Text From A Page
Basically, what I am trying to do is write some PHP code that will automatically take text from any web page and eliminate all the HTML, CSS, and JS code and formatting, leaving only the plain text from the page. I got my code started, but I have hit a snag with JavaScript and CSS code. This is what I have so far:
<?php
$geturl = $_GET["url"];
ob_start();
include($geturl);
$page = ob_get_contents();
ob_end_clean();
$output = ereg_replace('<script.*.</script>', ' ', $page);
$output2 = ereg_replace('<style.*.</style>', ' ', $output);
$plaintext = strip_tags($output2);
echo $plaintext;
?>
The strip_tags function automatically removes all HTML tags, but it doesn't do anything to JavaScript and CSS, because that code sits between a beginning and an ending tag rather than inside a tag. For more clarification:
html: <div name="htmltag">Keep this text here</div>
javascript: <script>function somejs() {remove all this code}</script>
As you can see, the text between the div tags should stay, but the JS between the script tags should be removed because it is code. I then tried the ereg_replace function to get rid of JS and CSS code, but there is a problem when there is more than one piece of JS or CSS code. The wildcard (.*.) skips over any ending script or style tags until it reaches the last ending tag, therefore deleting all the text between the two pieces of code. Example:
<SCRIPT>function somejs() {remove all this code}</script> //removes all text and code from beginning here
KEEP ALL THIS TEXT HERE
<script>function somejs() {remove all this code}</SCRIPT> //to end here
Now finally down to the question: is there any way to remove only the JS and CSS code between the beginning tag and the immediate next ending tag? Or is there any other way to get rid of the JavaScript and CSS code?
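The "stop at the immediate next ending tag" behaviour is what a lazy quantifier gives you. A minimal sketch using preg_replace (the ereg functions were removed in PHP 7); html_to_text() is a hypothetical name, and a regex like this is only an approximation of real HTML parsing:

```php
<?php
// Hypothetical helper: drop <script>/<style> blocks, then the remaining tags.
function html_to_text(string $html): string
{
    // ".*?" is lazy, so each match ends at the nearest closing tag.
    // Flags: i = case-insensitive, s = "." also matches line breaks.
    // \1 makes the closing tag name match the opening one.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);
    $text = strip_tags($html);
    // Collapse the leftover whitespace.
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

For badly broken markup, loading the page into a DOM parser and removing the script and style nodes may be more robust than any regex.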
Mail() Plain Text Vs. Html Format
I have been testing the mail() code below using MS Outlook, Outlook Express and a Hotmail account, and the details sent are always in "plain text" format, which results in the information being nicely aligned (incidentally, the e-mail contains an order confirmation with lots of columns). However, my customer came back to me this morning to tell me that all is not well! And sure enough, when I looked at the snapshot he sent me, he is receiving it in "html" format. What am I doing wrong? Keep in mind that I am a PHP greenhorn. Can anyone help? Thanks in advance!
$headers = "From: info@somecompany.com\r\n";
$headers .= "X-Sender: <info@somecompany.com>\r\n";
$headers .= "X-Mailer: PHP\r\n";
$headers .= "X-Priority: 1\r\n";
$headers .= "Return-Path: " . "<info@somecompany.com>\r\n";
$headers .= "cc: info@anothercompany.com\r\n";
$headers .= "bcc: me@mycompany.com\r\n";
$headers .= "MIME-Version: 1.0\r\n";
$headers .= "Content-type: text/plain; charset=iso-8859-1\r\n";
if (@mail($to, $re, $msg, $headers)) {
    // tell them all was sent fine
} else {
    // give an error message
}
Inserting/parsing Plain Text With 'require'?
I am trying to set up a very simple site that will pull text files into an existing template. I am using a simple require statement, such as: <?php require "/www/companyname/body.txt" ?> The first problem is that it does not seem to respect the linefeeds, which are saved in Unix format, and just displays everything as one massive block of text. The second problem is that, obviously, it does not convert symbols such as '&' to '&amp;'. The reason for including text in HTML files this way is so that the lecturers can write articles without having to deal with HTML; the articles are inserted into the HTML templates with the 'require' statement. Also, the sheer number of text documents that need to be posted would otherwise cause a lot of work. I have looked at the Midgard Project, but I tend to shy away from applications with little documentation, even though that would be absolutely ideal.
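Both problems go away if the file is read as data rather than required as code. A small sketch, assuming the articles are UTF-8 text; render_text_file() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: turn a plain text article into an HTML fragment.
function render_text_file(string $path): string
{
    $text = file_get_contents($path);
    // htmlspecialchars converts &, <, > and quotes to entities;
    // nl2br then inserts <br /> before every line break.
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}
```

In the template this could be used as <?php echo render_text_file("/www/companyname/body.txt"); ?> in place of the require statement.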
Sending Both HTML And Plain Text Email.
I am using PHP to send weekly newsletters to the addresses in my MySQL database; the emails are always HTML only. I was wondering if anyone knows how to send both types, so that recipients who can't view HTML emails will see just the text?
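Sending both versions usually means a multipart/alternative message: a plain text part followed by an HTML part, separated by a boundary string, with the mail client picking the richest part it can display. A bare-bones sketch, assuming UTF-8 content and a boundary that does not occur in either part; build_alternative_message() is a hypothetical helper:

```php
<?php
// Hypothetical helper: returns [extra headers, body] for use with mail().
function build_alternative_message(string $text, string $html, string $boundary): array
{
    $headers = "MIME-Version: 1.0\r\n"
             . "Content-Type: multipart/alternative; boundary=\"$boundary\"";

    // The plain text part comes first, the preferred (HTML) part last.
    $body = "--$boundary\r\n"
          . "Content-Type: text/plain; charset=UTF-8\r\n"
          . "Content-Transfer-Encoding: 8bit\r\n\r\n"
          . $text . "\r\n"
          . "--$boundary\r\n"
          . "Content-Type: text/html; charset=UTF-8\r\n"
          . "Content-Transfer-Encoding: 8bit\r\n\r\n"
          . $html . "\r\n"
          . "--$boundary--\r\n"; // closing boundary has a trailing "--"
    return [$headers, $body];
}
```

The result would then be passed along as mail($to, $subject, $body, $headers). A mailer library can take care of the boundaries and encodings for you, which is probably the safer route for a real newsletter.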
Plain Text Email Spacing Issues?
PHP Code: // create final message, $text refers to the textarea they typed the original message in $message2 = "Dear $firstname, $text Regards, The Team......
Application/octet-stream Vs Text/plain When Uploading
When testing Zend's file upload script, I uploaded a file (sql.txt, which was a SQL backup) and $_FILES reported it as text/plain, as it should. As soon as I renamed it to sql.sql, $_FILES reported it as application/octet-stream. All I did was rename the file. To make matters worse, I thought Windows XP Pro was adding some bits to the file to explain the new type in $_FILES, so I renamed sql.sql to sql.exe; $_FILES now says it is text/plain, even with the .exe extension. Does anyone know why adding the .sql extension would change the type from text/plain to application/octet-stream? BTW, the webhost is a Linux box (RH). More tested extensions (renaming sql.txt to each of the following):
.sql - application/octet-stream
.php - application/octet-stream
.gz - application/octet-stream
.tar - application/octet-stream
.htm - text/plain
.html - text/plain
.txt - text/plain
.exe - text/plain
.gif - text/plain
.jpg - text/plain
.asp - text/asp
.rpm - audio/x-pn-realaudio-plugin
.wav - audio/x-wav
.mp3 - audio/mpeg
(All I'm doing is changing the extension, nothing more. Opened in Notepad, the file looks all ASCII, with no funky characters.)
Phpmailer.class Messages Are Being Converted To Plain Text...
I am using a phpmailer class to send some stuff by email. I am trying to send it as text/html, but for some reason the emails are being converted to plain text and all the headers are shown. Here is the email: X-Tour4Less.co.il Mailer: MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="b1_27976937fb6a931b3ed2d40aebd76a26" --b1_27976937fb6a931b3ed2d40aebd76a26 Content-Type: text/plain; charset = "windows-1255" Content-Transfer-Encoding: 8bit *יסיון עברית --b1_27976937fb6a931b3ed2d40aebd76a26 Content-Type: text/html; charset = "windows-1255" Content-Transfer-Encoding: 8bit <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"<html> <head> <META HTTP-EQUIV="Content-Type" content="text/html; charset=windows-1255"</head<body<p dir=RTL><span lang=HE>שלום חברת Tour4less contact .</span></p<p dir=RTL><span lang=HE>משתמש של האתר שלנו </span><span dir=LTR>TOUR4LESS.CO.IL</span><span dir=RTL></span><span lang=HE><span dir=RTL></spanהתעניין ביצירת קשר איתו ע"י.........
Sending Plain Text E-mail, Trying To Track Accesses
I am using PHP to dispatch email from time to time using the mail() function. I have the message split into an HTML form and a plain text form; only one displays, depending on the recipient's mail client. My HTML message includes a 1x1 "image" that is really a PHP script, which allows me to track reads of the HTML version, but I don't know of a way to track reads/views/accesses/etc. of the plain text version. Is this possible?
Html To Word Convert With Php
How can I convert HTML to Word with PHP? Can anyone help me convert an HTML/PHP document to a .doc (Word) file?
Can I Convert MySQL Db Records Into Microsoft Word Documents?
I found a solution for exporting records from a MySQL db into *.csv, but is there any way to convert the contents into *.doc, save the file on my web server, and provide a link for end users to download the Word file? FYI, the database records come from end users submitting the forms themselves, and I save them in my db...
Convert Text
I have a text like this: "Thuyền V Biển" How to convert it to: "Thuyền V* Biển"
Formatting Text In Word Document
I am building a web site that writes the contents of a MySQL database into a Word file and saves it to disk. The problem is that I can create the file, but I don't know how to format the text in the Word document.
Finding A Key Word In A Text File
I would like to find a word stored in a text file. Structure: I have one file named keyWords.txt that stores some keywords I'm interested in finding. In addition, I have a file named textOrigin.txt in which I store the text to search. I would like my program to check whether a certain word appears in the text and then tell me which line it was found on (if it was found). My problem is that the script can't find the words I'm looking for. I took one word from the word list and put it into the text file to be searched, but for some reason the program does not find it. I used 'enter' at the end of each line. The word being used is on line 3 of the keyWords.txt file. I have some reason to believe that the cause lies here:
if ($pos) { echo " line $i: $storeWord[$n] "; }
I also tried it with if (!$pos === FALSE) {...} but nothing there either...
the keyWords.txt file: ------------------------------- Recording Site Recording Type INTRA SUA ................
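Two things commonly bite in a search like this, though neither is confirmed by the snippet above: a match at position 0 is falsy, so the result has to be compared with !== false, and keywords read from a file still carry their trailing line break unless it is trimmed. A sketch under those assumptions; find_keyword_lines() is a hypothetical name:

```php
<?php
// Hypothetical helper: returns the 1-based numbers of the lines containing $keyword.
function find_keyword_lines(array $lines, string $keyword): array
{
    // Keywords read from a file still end with "\n" (or "\r\n").
    $keyword = trim($keyword);
    if ($keyword === '') {
        return [];
    }
    $hits = [];
    foreach ($lines as $i => $line) {
        // stripos returns 0 for a match at the very start of the line,
        // so compare against false with the strict operator.
        if (stripos($line, $keyword) !== false) {
            $hits[] = $i + 1;
        }
    }
    return $hits;
}
```

It could be called as find_keyword_lines(file('textOrigin.txt'), $storeWord[$n]) for each keyword in turn. stripos is case-insensitive; strpos would make the search case-sensitive.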
Echo To Text Box Only Returns First Word
I'm echoing values from a db to text boxes with php, but only the first words are returned. The db field is set to varchar(255). Can someone please tell me how to solve this small but annoying problem? By the way, I don't have any regular expressions or that kind of coding.
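A frequent cause of this symptom is an unquoted value attribute: in value=New York City the browser ends the value at the first space. That is only a guess, since the markup isn't shown. A small sketch that quotes and escapes the value; text_input() is a hypothetical helper:

```php
<?php
// Hypothetical helper: build a text input with a quoted, escaped value.
function text_input(string $name, string $value): string
{
    // The double quotes keep multi-word values together; htmlspecialchars
    // stops quotes inside the value from ending the attribute early.
    return '<input type="text" name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8')
         . '" value="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '">';
}
```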
Extract Text From Word Documents
Is there a way to extract the text of a Word document with PHP? And perhaps some of the formatting (like line breaks, bold, italic, ...)?
Insert Text, Ms Word Document
I've hit a wall regarding PHP and MS Word. What I want is to open a document containing bookmarks, insert text where the bookmarks are, and save. It works, unless the bookmark is in the header or footer part of the page. In that case I get an error saying the bookmark wasn't found or doesn't exist. Has anyone got a tip on how to get into the header part of a Word document using PHP? The following simple code works when the bookmark is in the main part of the document:
$empty = new VARIANT();
$word = new COM("word.application") or die("some explanation");
$word->Documents->Open("C:\Path\Document with bookmark.doc");
$word->Selection->GoTo(wdGoToBookmark, $empty, $empty, "bookmark");
$word->Selection->TypeText("text to be inserted");
$word->Documents[1]->SaveAs("C:\Path\with inserted text.doc");
$word->Quit();
$word = null;
Convert Text To A Percent
I'm making a useless little program that takes two people's names and tests their love compatibility as a percentage. I don't want a random number generator, because I want the same pair of names to give the same result every time. Any ideas how to put the two strings together to get a varying percentage? I have tried a few things. In one, I converted both strings to md5 and then used similar_text() to compare them; however, the percentages were always low. I want a mixed result.
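One way to get a repeatable but well-spread number is to hash the two names together and reduce the hash to the range 0 to 100, instead of comparing two separate hashes. A sketch, assuming plain ASCII names; love_percent() is a hypothetical name:

```php
<?php
// Hypothetical helper: the same pair of names always gives the same percentage.
function love_percent(string $a, string $b): int
{
    $names = [strtolower(trim($a)), strtolower(trim($b))];
    // Sorting makes the result independent of the order of the names.
    sort($names);
    $hash = md5(implode('|', $names));
    // The first 7 hex digits fit in a 32-bit integer; % 101 maps them to 0..100.
    return hexdec(substr($hash, 0, 7)) % 101;
}
```

Because md5 output is close to uniformly distributed, the results should be spread over the whole range rather than clustering at the low end.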
How To Convert Ascii To Text
I replace some user input with the ASCII equivalent so that it displays properly on the webpage:
$entry = preg_replace("/'+/", '&#39;', $entry);
$entry = preg_replace("/,+/", '&#44;', $entry);
I then need to email the data; however, in the email the ASCII code is displayed, not the text. Is there an easier way to convert the ASCII code back to text without another preg_replace?
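If the stored replacements are HTML character references such as &#39; (an assumption about what the "ascii equivalent" is here), PHP's html_entity_decode() can turn them back into characters in a single call. A short sketch; entities_to_text() is a hypothetical wrapper:

```php
<?php
// Hypothetical wrapper: decode HTML character references for use in a plain text email.
function entities_to_text(string $s): string
{
    // ENT_QUOTES makes sure quote references are decoded as well.
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
```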
Full Text Search In PDF And Word Files ?
I need to perform full text searches on a batch of PDF and Word files. What is the best way to go? After some research, I'm thinking of extracting the plain text from the files with "pdftotext" and "catdoc", harmonizing the various possible encodings to UTF-8, storing the text in a MySQL database, and then using the full text search capabilities of MySQL. Do you think that would work well? I am told that the files are mostly text and won't be longer than 30 pages.
Regular Expression To Underline A Given Word In A Text...
Given the sentence "Bordeaux est au bord de l'eau", how can I underline, for instance, the word "eau" without also underlining the substring in "Bordeaux"? I don't know how to isolate the word... My current code: $text=eregi_replace("(".stripslashes($word_to_underline]).")","<b> |