Tracking Forums, Newsgroups, Mailing Lists




SELECT DISTINCT On A Varchar Column (big Table)


I have a table with 10 million rows in it. Now, I need to export a list of unique values from a varchar column. Normally I would do:

SELECT DISTINCT MyColumn INTO OUTFILE '/dumpfile.txt' FROM MyTable

But this is VERY time consuming and slows my PC down.

Can you advise me on what I should try to make it take less time?
For instance, should I create an index on MyColumn AND THEN execute the SELECT DISTINCT?

I may look silly, but I also tried using a Memory table thinking I would cut the indexing computation time but my RAM is not big enough to contain all the rows.
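
One way to try the index idea from the question is sketched below. SQLite (through Python's sqlite3 module) stands in for MySQL only so the example runs by itself; the sample rows and the index name idx_mycolumn are assumptions, not part of the original post. With an index on MyColumn the server may be able to answer the DISTINCT from the index alone instead of de-duplicating full rows, but whether building the index pays off for a one-off export of 10 million rows would have to be measured.

```python
import sqlite3

# Hypothetical stand-in data; the real table has about 10 million rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, MyColumn VARCHAR(50))")
conn.executemany(
    "INSERT INTO MyTable (MyColumn) VALUES (?)",
    [("red",), ("blue",), ("red",), ("green",), ("blue",)],
)

# Build the index once, before running the DISTINCT query.
# idx_mycolumn is an assumed name.
conn.execute("CREATE INDEX idx_mycolumn ON MyTable (MyColumn)")

# Same query shape as in the question, minus the INTO OUTFILE clause.
distinct_values = [
    row[0]
    for row in conn.execute("SELECT DISTINCT MyColumn FROM MyTable ORDER BY MyColumn")
]
print(distinct_values)
```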





See Related Forum Messages: Follow the Links Below to View Complete Thread
Select A Row When A Column Is A Varchar
I have created a table like

create table table_name (
name varchar(20),
password varchar(20),
id int
)

insert into table table_name values("user1", password("user11"), '1')
insert into table table_name values("user2", password("user22"), '2')
insert into table table_name values("user3", password("user33"), '3')

But I have a problem when I try to retrieve the value.

How can I retrieve a row using the password?

Select * from table_name where password = password("user22");
This will give an error.

Distinct Column In Multi-Table Query
I want one article from each 'sub topic'. Each sub topic is given a 'sub_id'.

PHP

$garticle = mysql_query("SELECT DISTINCT(feeds.sub_id),
articles.title, articles.feed_id, articles.abstract,
articles.link, articles.date, site.articles,
$formula AS importance
FROM articles,feeds,site
WHERE articles.date > DATE_SUB( now() , INTERVAL 1 day )
AND articles.feed_id=feeds.feed_id
AND feeds.site_id = site.site_id
ORDER BY articles.article_id DESC LIMIT 16", $connection)
    or die(mysql_error());

Each article has a 'feed_id', which joins it to the feeds table. Each feed has a 'sub_id'. I only want the 16 most recent articles, one from each DISTINCT 'sub_id'.

Slow Select Using Count(distinct) In A Table Bigger Than 100000 Records
Recently I started using MySQL in my enterprise. I made a table which has around 100000 records. The problem is that it is really slow. I'm trying to run a query that gets the number of distinct users per day.

This is my query:

select date(startedDate) as mydate, count(distinct(Users)) as users from Mytable
group by mydate

It is really simple and it returns the correct result, but it takes one minute. One minute is not too much time, but I need to insert around 10 000 000 records, and that's what worries me...

SELECT DISTINCT, (and Display Other Fields Not Distinct.)
I am using SELECT DISTINCT to select one of each duplicated field. So far I have:

SELECT DISTINCT `field1`, `field2`, `field3` FROM table1

This returns what I need. There is also another field (field4) which I also want to select, but not distinctly.

Something like: SELECT DISTINCT `field1`, `field2`, `field3` NOT_DISTINCT `field4` FROM `table1`

The field that is not being selected distinctly contains a '1' or a '0'. My table is ordered by field4 ('0' first). Does this mean SELECT DISTINCT will pick the rows with '0' before those with '1'? (I want '0' to have priority.)

I will only be running this SQL query once to remove duplicates from a database, so I am not concerned about the performance issues someone mentioned to me.

How can I display this not-wanted-distinct field in a distinct query?

Select Distinct And Include Non-distinct Columns
I have a publication table that tracks the products assigned to various publications.

I want to select all of the distinct products, based on product_ID, assigned to a specific publication but I also want to return additional columns that do not need to be distinct.

If I use the following select:

Select distinct publication.product_ID, publication_ID.code, publication.region from publication where publication_ID = '12'

That returns the three columns I want to see (product_ID, publication_ID and region) for publication 12, but I get too many rows, because DISTINCT requires the combination of all three columns to be distinct and I only need product_ID to be distinct.

So how do I find all of the distinct product_ID but also show other columns such as region?

Do I need to do this with some kind of self-join?

Column VARCHAR > 255 ???
I'm building a custom CMS and I need to know what field type I can use to get > 255 characters as char & varchar have a 255 limit.

The column will hold news item details...

Lastly, how do I ensure line breaks are preserved? With htmlentities and html_entity_decode?


Varchar Column
I just wanted to find out if it is possible to prepend a string to a varchar column.
I accidentally set one of our columns to int when it should have been varchar,
and everything is stored as 412345678
when it should be 0412345678.
Is there a function that I can use, e.g.:

Code:


update column set column = prepend('0', column);

Selecting MAX Value Of VARCHAR Column
MySQL Version = 4.0.17

I am having some troubles with a db and php written by another developer - unfortunately I do not have the luxury of altering the way in which this has been implemented - basically I am just trying to patch this up!

Basically it's a table of appointment time slots.

The first column is VARCHAR(10) and stores the ApptID

eg A9826

Each time an appointment slot is created, the PHP script runs the following SQL:

SELECT MAX(ApptID) FROM AppointmentDates
The script then gets the substring of the result so that it only has the numeric content - eg A9826 becomes 9826.

The script then adds 1 to this value in order to create the next ApptID (don't ask me why it's been implemented in this manner!)

Basically, the appointment ID has now reached 10000+, and when the SQL runs it always returns 9999 as the max. The next ApptID therefore becomes 10000, which creates a duplicate key error when attempting to insert...

My question is - is there a limit on the value that MAX can return? Or is there some other explanation for this?
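
A likely explanation, sketched below, is that MAX on a VARCHAR column compares values as strings, character by character, so 'A9999' ranks above 'A10000' because '9' is greater than '1'. The example uses SQLite through Python's sqlite3 module purely so it can run standalone, and the three sample IDs are made up. Stripping the leading letter and adding 0 to force a numeric comparison is one possible workaround; the same expression should behave the same way in MySQL, though that is an assumption worth checking on 4.0.17.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AppointmentDates (ApptID VARCHAR(10) PRIMARY KEY)")
conn.executemany(
    "INSERT INTO AppointmentDates VALUES (?)",
    [("A9998",), ("A9999",), ("A10000",)],  # hypothetical sample IDs
)

# String comparison: 'A9999' sorts after 'A10000' because '9' > '1'.
string_max = conn.execute(
    "SELECT MAX(ApptID) FROM AppointmentDates"
).fetchone()[0]

# Numeric comparison: drop the leading letter, then add 0 to get a number.
numeric_max = conn.execute(
    "SELECT MAX(SUBSTR(ApptID, 2) + 0) FROM AppointmentDates"
).fetchone()[0]

print(string_max, numeric_max)
```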

Sorting By Number On A VARCHAR Column
I have a table containing a field ref_number

Example column contents:

1
2
3a
3b
5a
5b
6
...

It's a number generally but may also have a or b or c etc tacked on the end. So I had to make this field a VARCHAR but now when I order ASC on this, it doesn't display in the order expected i.e.

1
10
11
19
2

So, this is my attempt to get around the issue:


SELECT *, RIGHT(CONCAT('00000',ref_number), 5) as number_sort FROM item ORDER by number_sort ASC;
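
The zero-padding attempt above still compares strings, so a padded '0003a' ends up after '00010'. A different approach, sketched here under assumptions, is to sort on the numeric prefix first and use the full string only as a tie-breaker. Adding 0 to the column relies on the database converting the leading digits to a number, which SQLite does and which MySQL is also expected to do (with a truncation warning). SQLite via Python's sqlite3 is used only to make the sketch runnable, and the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (ref_number VARCHAR(10))")
conn.executemany(
    "INSERT INTO item VALUES (?)",
    [("1",), ("10",), ("11",), ("19",), ("2",), ("3b",), ("3a",)],
)

# ref_number + 0 uses only the leading digits ('3a' becomes 3),
# and the second sort key orders '3a' before '3b'.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT ref_number FROM item ORDER BY ref_number + 0, ref_number"
    )
]
print(ordered)
```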

Math Operation On A Varchar Column
I have a numerical value that, thanks to the peculiarity of where I must place this value, is in a VARCHAR column.
Can I do a basic incremental math operation on its value?

Primary Key On Long Varchar Column + Utf8
I wish to create a table with 4 columns (varchar(10),varchar(512),int(11),varchar(16)) with primary key composed by the three first columns.
With the default charset it works, but if I set the default charset to utf8, the following error occurs:

ERROR 1071 (42000): Specified key was too long; max key length is 999 bytes

I guess it's because utf8 encodes each char in 16 bits instead of the usual 8, pushing the total size of the key columns over 999 bytes, even if utf8 is only used on the second column (max length 512).

Is there any way to set maximum key length to something bigger than 999 bytes?

DISTINCT On Only One Column?
I would like to select two columns from a table and have MySQL use DISTINCT to filter only one of the columns.

So for example:

Wanting Distinct Rows (by Only One Column), Not Sure How To Do It
I have a table `logins` with id, name, ip, mac, date

ip and mac are both tracking information for name and I'm trying to write a query that, when given a name, selects all names that person has logged in with by cross-checking their ip and mac with every record we have for that name

I have this query so far but it gives duplicates in the name field. What I'm trying to do is get it to give me each name only once and the most recent date associated with that name.

SELECT *
FROM `login_history`
WHERE `ip`
IN (

SELECT `ip`
FROM `login_history`
WHERE `name` = 'Name'
)
OR `mac`
IN (

SELECT `mac`
FROM `login_history`
WHERE `name` = 'Name'
)
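
To get each name once with its most recent date, one option is to keep the same WHERE clause and add GROUP BY name with MAX(date). The sketch below assumes the table is login_history with the columns listed in the post, and the rows are invented for illustration. SQLite through Python's sqlite3 is used only so it runs standalone; the SELECT itself is plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE login_history (id INTEGER, name TEXT, ip TEXT, mac TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO login_history VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical rows
        (1, "Name", "1.1.1.1", "aa", "2007-01-01"),
        (2, "Alt", "1.1.1.1", "bb", "2007-02-01"),
        (3, "Alt", "1.1.1.1", "bb", "2007-03-01"),
        (4, "Other", "2.2.2.2", "aa", "2007-01-15"),
        (5, "Stranger", "9.9.9.9", "zz", "2007-04-01"),
        (6, "Name", "1.1.1.1", "aa", "2007-01-20"),
    ],
)

# Same ip/mac cross-check as the original query, collapsed to one row per name.
sql = """
SELECT name, MAX(date) AS last_seen
FROM login_history
WHERE ip IN (SELECT ip FROM login_history WHERE name = ?)
   OR mac IN (SELECT mac FROM login_history WHERE name = ?)
GROUP BY name
ORDER BY last_seen DESC
"""
latest_per_name = conn.execute(sql, ("Name", "Name")).fetchall()
print(latest_per_name)
```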

Varchar Select Max
I am migrating from DB2 to MySQL.

I have a column System_Id that is Varchar.

Assuming I have 10 rows, System_id values 1 to 10.

When I do a Select value( Max (System_Id),0) in DB2 I am getting 11 as the result.

But when I do a Select coalesce(Max(System_id),0) in mysql it is returning 9.

Since the next value of my system id depends on this value, I am pretty much in trouble. Is there any way to work around this rather than having to change the column type to Integer (because that would involve changing quite a bit of my java code as well)

Select A Varchar
I can't seem to find this in the documentation and I'm not sure whether it's even possible....

I've got a varchar field and I want to select all and find the length of the field that has the most characters. I'm not bothered about what field it is, just the number of characters.

Retrieving The Quantity Of Each Distinct Number From A Column
I have this table in which each row is associated with a category number. The category number is NOT unique, so more than one row in the table can have the same category number. Now, I want to know how many times each category number appears in the table. For example, if I have a category number, say, 5, I want to know how many '5's there are in the table. I want to do this all at once and place the results in 2 columns, one for the category number and the other for the quantity associated with each number. How do I do that with MySQL? (I can process it with PHP of course, but I just think it's cool to do it with MySQL.) Is it possible to do it with one statement? Does it involve the COUNT function?
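
A single statement with COUNT and GROUP BY is the usual shape for this. The sketch below is only an illustration: the table name items and column name category are assumptions, the rows are invented, and SQLite through Python's sqlite3 stands in for MySQL so the example can run on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category INTEGER)")
conn.executemany(
    "INSERT INTO items (category) VALUES (?)",
    [(5,), (3,), (5,), (7,), (5,), (3,)],  # hypothetical category numbers
)

# One output row per category number, with how many rows carry it.
counts = conn.execute(
    "SELECT category, COUNT(*) AS quantity "
    "FROM items GROUP BY category ORDER BY category"
).fetchall()
print(counts)
```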

Select Varchar Vs Join
(I'm considering reducing the normalization, but I'm not sure how this will affect performance.)

If I index a varchar column, how much slower would that be than an integer column?

Pivoting A Column W/ Undetermined Number Of Distinct Values
Suppose I have 2 tables and each table has 2 columns (name, category). I want to make a script that takes in one parameter from the user specifying which table they want to use and then give them a table that has as many columns as there are distinct values within the category column and a count of all people belonging to that distinct category. Code:

Invalid Distinct Recordset Returned On An Indexed Column
I have a table with about 1.2 million records. I have an index set on a column.

For close to two years, this query has worked perfectly fine:

SELECT DISTINCT `Mgmt_Area` as thevalue, `Mgmt_Area` as valueid from qcdata ORDER BY thevalue

Note, this is a programmatically generated query based on some user selections. This query actually is used to create a listbox. The index has 62 separate values.

This query would usually return: Code:

Select Year From Varchar Field
A cartographer customer of mine has populated a MySQL varchar field with information about the date in which his maps were created. He wants to pull out the date and sort records by it.

The rules he uses for approximating dates are as follows, where YYYY signifies the date he wants to sort by, zzzz represents garbage:

1) Explicit year:

YYYY

2) Year approximated:

{YYYY} zzzz

or (YYYY) zzzz

e.g. {1984} 1987 and he wants 1984

3) Circa:

ca. YYYY

---- Therefore, what I would like to do is pull out only the first four digits and use them to sort on.

SELECT [grab the first four numbers only from column "Period"] AS year FROM maps ORDER BY year.

Difference Between Select * And Select Distinct *
What is the difference between SELECT * and SELECT DISTINCT *
Which one is faster?

Select With Distinct Only One Row
Here is my sql command and the result:

mysql> select distinct(Name), Location from inv_view limit 5;
+------------------------+------------------------+
| Name                   | Location               |
+------------------------+------------------------+
| Adjustable Race        | Subassembly            |
| Adjustable Race        | Miscellaneous Storage  |
| Adjustable Race        | Tool Crib              |
| All-Purpose Bike Stand | Finished Goods Storage |
| AWC Logo Cap           | Finished Goods Storage |
+------------------------+------------------------+
5 rows in set (0.03 sec)

But this is not what I want. I do not want rows 2 and 3 to be shown.
The point is that I want to show the table with the Name column unique, regardless of the other column.
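
DISTINCT applies to the whole row, so the parentheses around Name do not restrict it to that column. One possible way to get a single row per Name is GROUP BY Name with an aggregate such as MIN to pick one Location. The sketch below makes that assumption explicit: MIN simply picks the alphabetically first location, which may or may not be the one wanted. The rows are copied from the output above; SQLite via Python's sqlite3 is used only so the example runs standalone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inv_view (Name TEXT, Location TEXT)")
conn.executemany(
    "INSERT INTO inv_view VALUES (?, ?)",
    [
        ("Adjustable Race", "Subassembly"),
        ("Adjustable Race", "Miscellaneous Storage"),
        ("Adjustable Race", "Tool Crib"),
        ("All-Purpose Bike Stand", "Finished Goods Storage"),
        ("AWC Logo Cap", "Finished Goods Storage"),
    ],
)

# One row per Name; MIN(Location) chooses a single location for each group.
rows = conn.execute(
    "SELECT Name, MIN(Location) AS Location FROM inv_view GROUP BY Name LIMIT 5"
).fetchall()
one_location_per_name = dict(rows)
print(one_location_per_name)
```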

Select Distinct
I have a DB with song info. The main fields we're dealing with are:

Album
Artist
Title

What I'm trying to do is list the 5 most recently added albums. Considering the fact that albums have several songs each, I want to just display the [Artist] and [Album] one time.

I used the following mySQL statement
SELECT DISTINCT album, artist, buycd, date_added, picture, albumyear from songlist WHERE songtype='S' GROUP BY album, artist ORDER BY date_added DESC LIMIT 5

Select Distinct
I have a query where I am searching through a DB multiple times and don't want duplicates. For example I might search through the DB which has 15000 records for all the records that have a 12 in the number field.

The catch is that there are multiple numbers in the field so the same company could come up 4 or 5 times but ultimately it will all bring up the same data for the 4 or 5 companies.

When I use the distinct method it will only return the name not any of the other fields that I need for the query I am processing. Is there a way to remove the duplicates out of the search and still be able to access the other fields? So I can get a distinct name and the corresponding ID?

Select Distinct
I need to do a query on a db, the db has several columns that I need to use such as name, url, uid

there are duplicates in name but not url or uid and I only want to pull 1 each of the names such as:

name url uid
dave www.yahoo.com 1
dave www.google.com 2
joe www.joesplace.com 3
joe www.joesplace2.com 4

I want to pull just 1 each of dave and joe. I have tried using:

SELECT DISTINCT name FROM database

but that only pulls the name column any ideas on how to accomplish my task?

Select * With Distinct On Some
How can I select all fields in a table, but only want distinctive records based on 2 or 3 fields?

Select Max Distinct
I have a users table with a column (banda1) where users enter their favorite musical band...

Now I want to build a "Top 10" list: select each distinct band name, ordered by how many times it was entered...

Select Distinct Max
I have a table with the following pertinent fields:

completed
quarter

For each quarter there will be 10 records (40 records per year). A date will be inserted when each of the 10 requirements is fulfilled.

What I want to do is get the last date on which the 10 requirements were completed, so I can see when the quarterly is completed.

1. completed will always be a date
2. quarter will look like "1:2005", "2:2005", "3:2005", "4:2005", "1:2006"


So I want to:

1. Select quarterly
2. Select completed
3. Select distinct quarterly where MAX completed

SELECT DISTINCT
I'm using a SELECT DISTINCT to grab values from mysql, my question is,
is it possible to get other field data from the statement, besides the DISTINCT data,

example

sql = "SELECT DISTINCT Value1,Value2 FROM SomeTable WHERE id="&whatever.

can I only get data from Value1,Value2, or is there a way to add in other field names to get data from without them being part of the DISTINCT directive?

Select Distinct
If I do a
select distinct field1 from table1

I'll get the distinct field1 values.

But how do I get distinct field1 values and distinct field2 values?

If I do a
select distinct field1,field2 from table1

the result will contain similar values too...

SELECT DISTINCT
Is there a way to select records where there is more than one record containing the same fruit?

+-----------------+
| tbl_fruit       |
+-----------------+
| Apples          |
| Pears           |
| Apples          |
| Bananas         |
| Apples          |
| Bananas         |
+-----------------+
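
Finding values that occur more than once is typically done with GROUP BY and HAVING rather than DISTINCT. In the sketch below the column name fruit is an assumption (the post shows only the table name tbl_fruit), and SQLite through Python's sqlite3 is used just to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_fruit (fruit TEXT)")  # column name is assumed
conn.executemany(
    "INSERT INTO tbl_fruit VALUES (?)",
    [("Apples",), ("Pears",), ("Apples",), ("Bananas",), ("Apples",), ("Bananas",)],
)

# Keep only the fruits that appear in more than one record.
repeated = conn.execute(
    "SELECT fruit, COUNT(*) AS occurrences "
    "FROM tbl_fruit GROUP BY fruit HAVING COUNT(*) > 1 ORDER BY fruit"
).fetchall()
print(repeated)
```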

Select Distinct Only On Certain Columns?
We have a table with 10 columns, which log user actions.

Let's say four of them are: time, visitor, visitor_info, siteid

sample data looks like this:
time, visitor, visitor_info, siteid
a)10, 1, 2, 5
b)10, 2, 7, 9
c)11, 1, 2, 5
d)12, 3, 8, 9
e)12, 1, 2, 5

as you can see rows A, C, E are identical except the time part. If I do a select distinct on it, all three rows are returned (as they should), but is it possible to do a select distinct ignoring a specific column, so it would only return a result of one row? (ie a distinct only on visitor, visitor_info, and siteid)?

This is for MySQL5 (using PHP5).

Select Distinct On One Field
Let's say I have a table called T1 with the fields id, datetime and userid.

It can look like this:

T1
id | datetime | userid
1 | 2007-03-03 20:20:20 | 1
1 | 2007-03-03 20:20:25 | 1
1 | 2007-03-03 20:20:30 | 1
1 | 2007-03-10 17:15:30 | 2
1 | 2007-03-10 17:15:45 | 2
2 | 2007-03-15 18:34:45 | 1
I'd like to select distinct values of id. By doing my select query I would like to retrieve two rows where the id is 1 (one with userid 1 and one with userid 2) and one row where the id is 2.

Select Distinct From 2 Fields?
I have a table with 2 similar fields containing state names (e.g. primary_state and secondary_state). How can I find distinct states from both fields?

Select Distinct Union?
I have a query script that works with a classifieds script

class_prodimages.pid can have more than one image entry

My client wants only one image to be displayed.

here is what I have now

$sqlm = @mysql_query("
SELECT
class_products.shortDescription,
class_products.id,
class_products.title,
class_products.price,
class_prodimages.image,
class_prodimages.pid
FROM class_products,class_prodimages
WHERE class_products.featured = 'Y' AND class_prodimages.pid = class_products.id
");
I have put DISTINCT all over the place but it doesn't work.

Then I searched the forum, saw UNION, and tried it based on the examples, but it's also not working:



SELECT shortDescription, id, title, price
FROM
class_prodimages
UNION
SELECT image, pid
FROM class_prodimages
WHERE class_products.featured = 'Y' AND class_prodimages.pid = class_products.id

Select Distinct From Two Different Tables
I'd like to run a query that will pull distinct items from two separate tables. For example, I have a "user1" table and a "user2" table.

In the "user1" table is a field called "username". In the "user2" table is a field called "thisuser". Some users are in both tables - some are only in one.

I want to run a query to get me a list of all of the users, and only show them once even if they appear in both tables

Any Way To Speed Up Select Distinct
I have a table with a varchar(255) column, it has 111,000 rows. When I do a
select distinct on that column it takes 16 seconds and returns about 25
distinct values, I'd like it to take much less time. I tried creating an
index but the explain on the query shows that it isn't using an index. Is
there any way to speed this up or should I just maintain another table with
the distinct values myself?

Select Distinct On 2 Tables
using mysql 4.0.21

is this the best way to get distinct names from both tables where shift is
20031007:

SELECT m.user_full_name FROM press_maint m
WHERE shift_date = 20031007
UNION
SELECT p.user_full_name FROM press_prod p WHERE shift_date = 20031007
GROUP BY user_full_name

Select Distinct Records
I am trying to compare the date part of a datetime value field with today's date....

Here's the sql:

mySQL = "Select * from Test WHERE TheDate LIKE'"&date()"' ORDER BY TheDate"
Set rs= Con.Execute( mySQL )

That returns nothing even though Test has records for today.

Distinct Select On 2 Tables
I'm trying to select distinct data from 2 tables, and the solution I have almost works, but it's still pulling multiple records from both tables.

Here's the Select statement I'm trying:

SELECT DISTINCT table_1.table_1_id, table_1.title, table_2.field_1 FROM project, conversation WHERE table_1.table_1_id = table_2.table_2_id

Select Distinct Order
I have a database containing my server log files. A sample of the rows would be similar to:

+--------------+------------+-----------------+--------+
| ip           | time       | resource        | status |
+--------------+------------+-----------------+--------+
| 195.93.21.71 | 1112868867 | /about/         | 200    |
| 195.93.21.4  | 1112868869 | /css/normal.css | 200    |
| 195.93.21.2  | 1112868872 | /css/print.css  | 200    |
+--------------+------------+-----------------+--------+

Some of the IPs and resources will be duplicated in the full table. I can select the number of unique IPs or resources with a query like the following:

SELECT COUNT(DISTINCT ip) FROM stats

But what I now want to do is find which resource has been requested the most. By this I mean extract the distinct resources (one of each) ordered by number of times it occurs. I've been trying variations on the following with no luck:

SELECT resource from stats ORDER BY COUNT(DISTINCT resource)

Can anyone tell me if this is even possible and/or point me in the right direction to go?
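
It is possible; an aggregate like COUNT needs a GROUP BY to produce one row per resource, which can then be ordered by the count. A minimal sketch follows, with invented log rows and SQLite (via Python's sqlite3) standing in for MySQL so that it runs by itself. The alias hits and the secondary sort on resource are assumptions added for a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (ip TEXT, time INTEGER, resource TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?, ?)",
    [  # hypothetical log rows
        ("195.93.21.71", 1112868867, "/about/", 200),
        ("195.93.21.4", 1112868869, "/css/normal.css", 200),
        ("195.93.21.2", 1112868872, "/css/print.css", 200),
        ("195.93.21.4", 1112868880, "/about/", 200),
        ("195.93.21.9", 1112868885, "/about/", 200),
        ("195.93.21.9", 1112868886, "/css/normal.css", 200),
    ],
)

# One row per resource, most requested first.
most_requested = conn.execute(
    "SELECT resource, COUNT(*) AS hits "
    "FROM stats GROUP BY resource ORDER BY hits DESC, resource"
).fetchall()
print(most_requested)
```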

SELECT DISTINCT W/ Joins?
I have three tables: one for documents (document), one for documentviews (documentview) and a third for connecting a specific document to a specific documentview (doc2docview). The doc2docview table has a structure of ID, owner (the document ID) and reference (the documentview ID).

One document can be linked to several documentviews, which creates a problem - with this query, if a document is assigned to three documentviews, I get the same title three times.

SELECT DISTINCT document.title as title,
doc2docview.owner,
document.ID as did,
document.summary as summary,
document.source as source,
document.ctime as time,
documentpart.filename as filename
FROM doc2docview
LEFT JOIN document ON doc2docview.owner = document.ID
LEFT JOIN documentpart ON document.ID = documentpart.document
WHERE reference IN (55,56)
AND
document.status = 'A'
ORDER BY document.ctime DESC
LIMIT 100

What should I do to get rid of the duplicates?

Select Distinct On Two Tables
I have two tables, both containing an 'authors' column. Is there a way
to get a unique list of authors from the two tables?

I tried SELECT DISTINCT `authors` from `table1`, `table2`;

but I got an "Column 'authors' in field list is ambiguous" error.

Is there also a query to return only the count of distinct authors from
the two tables?
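
Listing both tables in one FROM clause produces a join, which is why the column name becomes ambiguous. A UNION of two separate SELECTs removes duplicates across both tables, and wrapping it in a subquery gives the count. The sketch below uses invented author names and the alias combined as an assumption; SQLite through Python's sqlite3 is only there to make it runnable. Note that subqueries in FROM need MySQL 4.1 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (authors TEXT)")
conn.execute("CREATE TABLE table2 (authors TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("Ann",), ("Bob",), ("Ann",)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("Bob",), ("Cid",)])

# UNION (without ALL) drops duplicates within and across the two tables.
unique_authors = [
    row[0]
    for row in conn.execute(
        "SELECT authors FROM table1 UNION SELECT authors FROM table2 ORDER BY authors"
    )
]

# Count the distinct authors by wrapping the UNION in a derived table.
author_count = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT authors FROM table1 UNION SELECT authors FROM table2) AS combined"
).fetchone()[0]
print(unique_authors, author_count)
```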

Increment Distinct In Select
Is it possible to create a field in a select query that is a counter of unique IDs

E.g., I have a table of data that includes sessionIDs. I want to display the data and number the sessionIDs, so for the first unique sessionID encountered the value would be 1, for the second unique sessionID it would be 2, etc...

MySQL Select Distinct
I have a table that I use to log the IP addresses and dates of people looking at my site. I want to count the entries for each date so I can make a graph. Any ideas?

table is simple looks like;

t_id    t_ip     t_date

Select Two Fields With Distinct
Here is my query:
SELECT DISTINCT(LOCATION) AS LOC FROM tblNetDesc
     LEFT JOIN tblNetClass ON tblNetClass.NetClassID=tblNetDesc.NetClassID
     LEFT JOIN tblLocation ON tblLocation.LocationID=tblNetClass.LocationID;

Use Of Index For SELECT DISTINCT
I may be wrong, but I find something strange in index use when selecting distinct values. For example, let's assume we have a table like this:

a: integer
b: varchar(128)

Let's say that b has only a few distinct values but the table has MANY rows (2 000 000). The column b is indexed, with of course a VERY low cardinality.

Why does the optimizer always scan all 2 000 000 rows (using the index on b, though) to execute the query SELECT DISTINCT b FROM table when it knows from the index that there are only a few distinct values?

SELECT DISTINCT() Issue
I'm trying to select rows with a distinct field along with several other fields in the select query. The query looks like this:

PHP Code:
 SELECT 
 DISTINCT(user_log.session_id),
 users.username,
 user_log.section,
 user_log.session_id,
 user_log.access_time 
FROM user_log 
LEFT JOIN users 
ON user_log.userid = users.userid 
WHERE user_log.access_time > (" . time() . " - 1200) 
ORDER BY user_log.access_time DESC 
LIMIT 0,40


Copyright © 2005-08 www.BigResource.com, All rights reserved