Create thousands of rows of realistic test data

Oracle query performance testing, analyzing how adding an index impacts the execution plans or testing a ReSTful API pagination needs data. Having a quick way to generate tens and thousands of records is always helpful.

Mujahid Khaleel
Feb 11, 2019

Oracle query performance testing, analyzing how adding an index impacts the execution plans or testing a ReSTful API pagination needs data. Having a quick way to generate tens and thousands of records is always helpful.

Recently had a similar need, start crafting a query to create random and sparse data. Also required the data to look realistic in date-of-birth, email, and ZIP fields. Below is a sample script from that effort. Oracle’s DBMS_RANDOM package is utilized extensively.

Person table structure

CREATE TABLE PERSON (
  ID            NUMBER,
  FIRST_NAME    VARCHAR(50),
  LAST_NAME     VARCHAR(50),
  EMAIL         VARCHAR(50),
  GENDER        VARCHAR(50),
  STREET        VARCHAR(50),
  CITY          VARCHAR(50),
  STATE         VARCHAR(50),
  ZIP           VARCHAR(50),
  SSN           VARCHAR(50),
  DATE_OF_BIRTH DATE,
  CONSTRAINT PK_PERSON PRIMARY KEY (ID)
);

PL/SQL block to generate rows

Below PL/SQL block is inserting one row at a time in the loop. There are other alternate ways as well.

DECLARE
  GENERATE_ROWS NUMBER  := 1000;
  START_DOB_YEAR NUMBER := 1940;
  END_DOB_YEAR NUMBER   := 2018;
  START_DOB_DATE DATE;
BEGIN
  START_DOB_DATE := TO_DATE(START_DOB_YEAR || '-01-01', 'YYYY-MM-DD');
  FOR I IN 1 .. GENERATE_ROWS
  LOOP
    INSERT INTO PERSON (ID, FIRST_NAME, LAST_NAME, EMAIL, GENDER, STREET, CITY, STATE, ZIP, SSN, DATE_OF_BIRTH)
    SELECT TO_NUMBER(SYS_GUID(), 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') AS ID,
           DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) AS FIRSTNAME,
           DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) AS LASTNAME,
           DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) || '@' || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 10)) || DECODE(TRUNC(DBMS_RANDOM.VALUE(1, 4)), 1, '.com', 2, '.net', 3, '.org', 4, '.edu') AS EMAIL,
           DECODE(ROUND(DBMS_RANDOM.VALUE(1, 2), 0), 1, 'M', 2, 'F') AS GENDER,
           TRUNC(DBMS_RANDOM.VALUE(1, 500)) || ' ' || DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) || DECODE(TRUNC(DBMS_RANDOM.VALUE(1, 4)), 1, ' Rd', 2, ' Drive', 3, ' St', 4, ' Cir') AS STREET,
           DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 25)) AS CITY,
           DBMS_RANDOM.STRING('U', 2) AS STATE,
           LPAD(TRUNC(DBMS_RANDOM.VALUE(100, 99999)), 5, '0') AS ZIP,
           TRUNC(DBMS_RANDOM.VALUE(100, 999)) || '-' || TRUNC(DBMS_RANDOM.VALUE(10, 99)) || '-' || TRUNC(DBMS_RANDOM.VALUE(1000, 9999)) AS SSN,
           TO_DATE(TRUNC(DBMS_RANDOM.VALUE(TO_CHAR(START_DOB_DATE, 'J'), (TO_CHAR(START_DOB_DATE, 'J') + ((END_DOB_YEAR - START_DOB_YEAR) * 365.25))), 0), 'J') AS DOB
      FROM DUAL;
  END LOOP;
  COMMIT;
END;
/

Verify the result

Once the PL/SQL block completes, check the table contents.

SELECT COUNT(*) FROM PERSON;

Simple analysis of data

Below query will provide a very basic idea about how the data is spread.

SELECT DISTINCT
       P.GENDER,
       COUNT(*) OVER (PARTITION BY P.GENDER) COUNT_BY_GENDER,
       MIN(P.DATE_OF_BIRTH) OVER (PARTITION BY P.GENDER) OLDEST_BY_GENDER,
       MAX(P.DATE_OF_BIRTH) OVER (PARTITION BY P.GENDER) YOUNGEST_BY_GENDER,
       AVG(TRUNC((TO_NUMBER(TO_CHAR(SYSDATE, 'YYYYMMDD')) - TO_NUMBER(TO_CHAR(P.DATE_OF_BIRTH, 'YYYYMMDD'))) / 10000))  OVER (PARTITION BY P.GENDER) AVERAGE_AGE_BY_GENDER
  FROM PERSON P;

See also