The simplest way of selecting random rows from the MySQL database is to use “ORDER BY RAND()” clause in the query.
Solution 1 [SQL QUERY]
SELECT * FROM `table` ORDER BY RAND() LIMIT 0,4;
This query select 4 random value every refresh?
The problem with this method is that it is very slow. The reason for it being so slow is that MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.
There are several workarounds to speed things up.
The basic idea is to get a random number and then select a specific row using this number.
In the case that all the rows have unique ids we will just have to pick a random number between the smallest and the biggest id and then select the row with id that equals that number. To make this method work when ids are not evenly distributed we will have to use “>=” operator instead of “=” in the last query.
To get the minimum and maximum id values in the entire table we will use MAX() and MIN() aggregate functions. These functions will return minimum and maximum value in the specified group. The group in our case is all the values of `id` column in our table.
Solution 2 [PHP]
$range_result = mysql_query( ” SELECT MAX(`id`) AS max_id , MIN(`id`) AS min_id FROM `table` “);
$range_row = mysql_fetch_object( $range_result );
$random = mt_rand( $range_row->min_id , $range_row->max_id );
$result = mysql_query( ” SELECT * FROM `table` WHERE `id` >= $random LIMIT 0,1 “);
As we mentioned this method is limited to tables with unique id for each row. What to do if it’s not the case?
The solution is to use the MySQL LIMIT clause. LIMIT accepts two arguments. The first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1).
To calculate the offset to the first row we will generate a random number between 0 and 1 using MySQL’s RAND() function. Then we will multiply this number by number of records in the table, which we will get using COUNT() function. Since LIMIT arguments must be integers and not float values we will round the resulting number using FLOOR() function. FLOOR() is an arithmetic function that calculates the largest integer value that is smaller than or equal to the expression. The resulting code will look like this:
Solution 3 [PHP]
$offset_result = mysql_query( ” SELECT FLOOR(RAND() * COUNT(*)) AS `offset` FROM `table` “);
$offset_row = mysql_fetch_object( $offset_result );
$offset = $offset_row->offset;
$result = mysql_query( ” SELECT * FROM `table` LIMIT $offset, 1 ” );
In MySQL 4.1 and later we can combine two previous methods using subquery like so:
Solution 4 [SQL]
SELECT * FROM `table` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table` ) ORDER BY id LIMIT 1;
This solution has the same weakness as the solution 2 e.g. it only works for tables with unique ids.
Remember the reason we started looked for alternative ways of selecting random rows? Speed! So how do these methods compare in terms of execution times. I am not going to go into specifics of hardware and software configuration or give precise numbers. The approximate results are:
- The slowest method is solution 1. Let’s say that it took 100% of time to execute.
- Solution 2 took 79%.
- Solution 3 – 13%.
- Solution 4 – 16%.
The winner is solution 3.
This is quite a hot information. I’ll share it on Digg.
FANTASTIC!
Of course we may share our knowledge to everyone.
Thanks and Regards
–Rana
The style of writing is quite familiar . Have you written guest posts for other blogs?
I took solution 4, i have 400 id’s and he always takes out of the oldest 6 , how is this possible?
Rana vai, Its cool
. But I afraid it will fail if you use apache benchmark test. I already suffered with that. Rand() function creates scalability problem whether way you use
Well, I prefer 3rd solution.
2nd and 4th are even wrong! If you have deleted any of your rows, these solutions would return some rows more often than others.
For example, if I have table with ids:
1
101
102
103
…
119
That is, I have deleted rows from 2 to 10.
So, 2nd and 4th solution would return row with id=101 with a possibility of 100/119, and every other row with a possibility of 1/119. Is this random? I don’t whink so..
Moreover, these results (time for execution) depends on number of rows in your table.
If you have to choose X *unique* random records, and row count is not big (for example 10), I’d prefer 1st method.
And if I have just to shuffle some rows, I’d always use 1st method (for example answers to some question), because you’ll need to count random number to each row anyway.
But if you need to choose just 1 random row – I accept, 3rd solution is the best.
Marius