Oracle query performance testing, analyzing how adding an index impacts the execution plans or testing a ReSTful API pagination needs data. Having a quick way to generate tens and thousands of records is always helpful.
Recently had a similar need, start crafting a query to create
random and sparse data. Also required the data to look realistic
in date-of-birth, email, and ZIP fields. Below is a sample
script from that effort. Oracle’s DBMS_RANDOM
package is utilized
extensively.
Person table structure
CREATE TABLE PERSON (
ID NUMBER,
FIRST_NAME VARCHAR(50),
LAST_NAME VARCHAR(50),
EMAIL VARCHAR(50),
GENDER VARCHAR(50),
STREET VARCHAR(50),
CITY VARCHAR(50),
STATE VARCHAR(50),
ZIP VARCHAR(50),
SSN VARCHAR(50),
DATE_OF_BIRTH DATE,
CONSTRAINT PK_PERSON PRIMARY KEY (ID)
);
PL/SQL block to generate rows
Below PL/SQL block is inserting one row at a time in the loop. There are other alternate ways as well.
DECLARE
GENERATE_ROWS NUMBER := 1000;
START_DOB_YEAR NUMBER := 1940;
END_DOB_YEAR NUMBER := 2018;
START_DOB_DATE DATE;
BEGIN
START_DOB_DATE := TO_DATE(START_DOB_YEAR || '-01-01', 'YYYY-MM-DD');
FOR I IN 1 .. GENERATE_ROWS
LOOP
INSERT INTO PERSON (ID, FIRST_NAME, LAST_NAME, EMAIL, GENDER, STREET, CITY, STATE, ZIP, SSN, DATE_OF_BIRTH)
SELECT TO_NUMBER(SYS_GUID(), 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') AS ID,
DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) AS FIRSTNAME,
DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) AS LASTNAME,
DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) || '@' || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 10)) || DECODE(TRUNC(DBMS_RANDOM.VALUE(1, 4)), 1, '.com', 2, '.net', 3, '.org', 4, '.edu') AS EMAIL,
DECODE(ROUND(DBMS_RANDOM.VALUE(1, 2), 0), 1, 'M', 2, 'F') AS GENDER,
TRUNC(DBMS_RANDOM.VALUE(1, 500)) || ' ' || DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 14)) || DECODE(TRUNC(DBMS_RANDOM.VALUE(1, 4)), 1, ' Rd', 2, ' Drive', 3, ' St', 4, ' Cir') AS STREET,
DBMS_RANDOM.STRING('U', 1) || DBMS_RANDOM.STRING('L', DBMS_RANDOM.VALUE(6, 25)) AS CITY,
DBMS_RANDOM.STRING('U', 2) AS STATE,
LPAD(TRUNC(DBMS_RANDOM.VALUE(100, 99999)), 5, '0') AS ZIP,
TRUNC(DBMS_RANDOM.VALUE(100, 999)) || '-' || TRUNC(DBMS_RANDOM.VALUE(10, 99)) || '-' || TRUNC(DBMS_RANDOM.VALUE(1000, 9999)) AS SSN,
TO_DATE(TRUNC(DBMS_RANDOM.VALUE(TO_CHAR(START_DOB_DATE, 'J'), (TO_CHAR(START_DOB_DATE, 'J') + ((END_DOB_YEAR - START_DOB_YEAR) * 365.25))), 0), 'J') AS DOB
FROM DUAL;
END LOOP;
COMMIT;
END;
/
Verify the result
Once the PL/SQL block completes, check the table contents.
SELECT COUNT(*) FROM PERSON;
Simple analysis of data
Below query will provide a very basic idea about how the data is spread.
SELECT DISTINCT
P.GENDER,
COUNT(*) OVER (PARTITION BY P.GENDER) COUNT_BY_GENDER,
MIN(P.DATE_OF_BIRTH) OVER (PARTITION BY P.GENDER) OLDEST_BY_GENDER,
MAX(P.DATE_OF_BIRTH) OVER (PARTITION BY P.GENDER) YOUNGEST_BY_GENDER,
AVG(TRUNC((TO_NUMBER(TO_CHAR(SYSDATE, 'YYYYMMDD')) - TO_NUMBER(TO_CHAR(P.DATE_OF_BIRTH, 'YYYYMMDD'))) / 10000)) OVER (PARTITION BY P.GENDER) AVERAGE_AGE_BY_GENDER
FROM PERSON P;