'Database Tip: Only Counting Certain Values in an Aggregate Query'

Aggregates are awesome. But sometimes you want to do 2 counts in a query, and have one of them be more restrictive than the other. Lets say our database focuses on pictures, and rating them on a scale from 1-100. We want to know the average rating, the minimum rating, the max, the number of ratings, and the number of ratings over 75.

First let’s setup the tables. create table rating ( picture int not null, userid int not null, rating int not null, primary key (picture, userid) ) engine=innodb;

To test this, we’ll need sample data. We can used a stored procedure to populate the table. ` delimiter ;;

create procedure populate_rating () begin set @i = 0;

    repeat
            insert ignore into rating
                    set picture = floor(rand() * 1000) + 1,
                    userid = floor(rand() * 1000) + 1,
                    rating = floor(rand() * 100) + 1;
            set @i = @i + 1;
            until @i > 1000000 end repeat;

end; ; delimiter ;

call populate_rating(); `

The count(rating) will work to call all the values, but will not work to count all the values over 75. To get around this restriction, we use the sum() function wrapped around an IF statement to only count values over 75. select picture, avg(rating), min(rating), max(rating), count(rating), sum(IF(rating > 75, 1, 0)) from rating group by picture;

Works like a charm.

Enjoy.

If you found this post helpful, please consider sharing to your network. I'm also available to help you be successful with your distributed systems! Please reach out if you're interested in working with me, and I'll be happy to schedule a free one-hour consultation.