Create thousands of rows of realistic test data


Oracle query performance testing, analyzing how adding an index impacts the execution plans or testing a ReSTful API pagination needs data. Having a quick way to generate tens and thousands of records is always helpful.

Recently had a similar need, start crafting a query to create random and sparse data. Also required the data to look realistic in date-of-birth, email, and ZIP fields. Below is a sample script from that effort. Oracle’s DBMS_RANDOM package is utilized extensively.

Person table structure

CREATE TABLE PERSON (
  ID            NUMBER,
  FIRST_NAME    VARCHAR(50),
  LAST_NAME     VARCHAR(50),
  EMAIL         VARCHAR(50),
  GENDER        VARCHAR(50),
  STREET        VARCHAR(50),
  CITY          VARCHAR(50),
  STATE         VARCHAR(50),
  ZIP           VARCHAR(50),
  SSN           VARCHAR(50),
  DATE_OF_BIRTH DATE,
  CONSTRAINT PK_PERSON PRIMARY KEY (ID)
);

PL/SQL block to generate rows

Below PL/SQL block is inserting one row at a time in the loop. There are other alternate ways as well.

DECLARE
  GENERATE_ROWS NUMBER  := 1000;
  START_DOB_YEAR NUMBER := 1940;
  END_DOB_YEAR NUMBER   := 2018;
  START_DOB_DATE DATE;
BEGIN
  START_DOB_DATE := TO_DATE(START_DOB_YEAR || '-01-01', 'YYYY-MM-DD');
  FOR I IN 1 .. GENERATE_ROWS
  LOOP
    INSERT INTO PERSON (ID, FIRST_NAME, LAST_NAME, EMAIL, GENDER, STREET, CITY, STATE, ZIP, SSN, DATE_OF_BIRTH)
    SELECT TO_NUMBER(SYS_GUID(), 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') AS ID,
           DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) AS FIRSTNAME,
           DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) AS LASTNAME,
           DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) || '@' || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 10)) || DECODE(TRUNC(DBMS_RANDOM.VALUE(1, 4)), 1, '.com', 2, '.net', 3, '.org', 4, '.edu') AS EMAIL,
           DECODE(ROUND(DBMS_RANDOM.VALUE(1, 2), 0), 1, 'M', 2, 'F') AS GENDER,
           TRUNC(DBMS_RANDOM.VALUE(1, 500)) || ' ' || DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) || DECODE(TRUNC(DBMS_RANDOM.VALUE(1, 4)), 1, ' Rd', 2, ' Drive', 3, ' St', 4, ' Cir') AS STREET,
           DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 25)) AS CITY,
           DBMS_RANDOM.STRING('U', 2) AS STATE,
           LPAD(TRUNC(DBMS_RANDOM.VALUE(100, 99999)), 5, '0') AS ZIP,
           TRUNC(DBMS_RANDOM.VALUE(100, 999)) || '-' || TRUNC(DBMS_RANDOM.VALUE(10, 99)) || '-' || TRUNC(DBMS_RANDOM.VALUE(1000, 9999)) AS SSN,
           TO_DATE(TRUNC(DBMS_RANDOM.VALUE(TO_CHAR(START_DOB_DATE, 'J'), (TO_CHAR(START_DOB_DATE, 'J') + ((END_DOB_YEAR - START_DOB_YEAR) * 365.25))), 0), 'J') AS DOB
      FROM DUAL;
  END LOOP;
  COMMIT;
END;
/

Verify the result

Once the PL/SQL block completes, check the table contents.

SELECT COUNT(*) FROM PERSON;

Simple analysis of data

Below query will provide a very basic idea about how the data is spread.

SELECT DISTINCT
       P.GENDER,
       COUNT(*) OVER (PARTITION BY P.GENDER) COUNT_BY_GENDER,
       MIN(P.DATE_OF_BIRTH) OVER (PARTITION BY P.GENDER) OLDEST_BY_GENDER,
       MAX(P.DATE_OF_BIRTH) OVER (PARTITION BY P.GENDER) YOUNGEST_BY_GENDER,
       AVG(TRUNC((TO_NUMBER(TO_CHAR(SYSDATE, 'YYYYMMDD')) - TO_NUMBER(TO_CHAR(P.DATE_OF_BIRTH, 'YYYYMMDD'))) / 10000))  OVER (PARTITION BY P.GENDER) AVERAGE_AGE_BY_GENDER
  FROM PERSON P;

See also

Test-dataOracle