Creating mass data in MySQL

This started out on afternoon as one of those (what you think, naively, when you begin) quick and dirty projets just to see what happens. Then 5 hours later, when you've scoured the internet, tried several pre-existing github projects, you wind up writing your own nasty code because you just can't quite find what you're looking for.

AWS RDS Storage Autoscaling. Right? Easy to test and play with. All I wanted to see as the EventBridge notification that was created when autoscaling kicked in. Create database in AWS, fill database with data, observe autoscaling, profit (with knowledge). So off I set to test.

Creating the RDS was easy, some very basic ClickOps as was most certainly a test and didn't need to be anything to spectacular, open the DB to the public IPs I'd be hitting it from (again, far from best practice but this was meant to be a quick test damnit!). Database creating, phase 1 complete.

Now, 20GB to fill, how hard can that be? Right...

Some quick Googling, showing how to quickly mass import data, that doesn't help as I don't have a file to import, I need to generate fresh data.

Stumble upon https://github.com/Percona-Lab/mysql_random_data_load. Looks promising, run it up from a VM.

ubuntu@always-on2:~/mysql_random$ screen ./mysql_random_data_load test t3 150000000 --host database-1.dsadisfbi3ob.us-west-2.rds.amazonaws.com -u admin -p<password>

Didn't read the documenation too closely but let that run for a few hours. And, to be fair, it appears to do its job very well. Creates good looking data. But not enough of it. I let that run for a good 3 or 4 hours. It's burnt through about a GB of storage, but this is costing me cents per hour. I need this to be faster, I'm not made of money.

So we revisit some of those earlier posts about speed. mysqimport seem to be the answer. How big can a field by in MySQL? Big apparently, for blob storage. This gets me to thinking, let's generate a quick and dirty csv from some random file on the computer, mysqlimport it, and boom! Profit.

After several attempts this is where we land. This is the database and table you'll want to create and also add some permission to the admin user (I'd never do this on a prod DB btw, this is purely for testing purposes)

CREATE DATABASE test;
USE test;
CREATE TABLE stock(ID INT, Name VARCHAR(40), PHOTO LONGBLOB, Quantity INT);
GRANT SESSION_VARIABLES_ADMIN ON *.* TO 'admin'@'%';

This is then how I pumped it with data. Effectively create a dummy, base64 it to minimise the change of some obscure MySQL error, then create an ugly, hacked together csv file looping over the same data and pumping it into the DB. Pretty? No. Eloquent? No. Effective? TBC.

dd if=/dev/urandom of=randomdata bs=10240 count=1000
base64 -w 0 randomdata > basedata.txt
j="5000"
echo $j
while [ $j -lt 100000 ]
    do
    echo $j
    i="0"
    echo '' > stock
    while [ $i -lt 100 ]
        do
        echo $[$i+$j]
        echo -n $[$i+$j], 'lucas', \'>> stock
        cat basedata.txt | tr -d '\n' >> stock
        echo \', 2 >> stock
        i=$[$i+1]
    done
    mysqlimport --local \
        --compress \
        --user=admin \
        -pLucas123! \
        --host=database-1.cz8whn5szavq.us-west-2.rds.amazonaws.com \
        --fields-terminated-by=',' test2 stock 
    j=$[$j+1000]
done
rm randomdata
rm basedata.txt

Watching on RDS CloudWatch monitoring...

RDS Autoscaling Storage Screenshot

Success!

And the output:

RDS Autoscaling Storage Notification

Was it worth it? God no. Was it interesting none the less? Absolutely.