SQL Tutorial for QA Database Testing and Verification
What is SQL and Why It's Essential for QA
SQL (Structured Query Language) is the standard language for interacting with relational databases. Every modern application stores its data in a database — whether it's PostgreSQL, MySQL, Oracle, SQL Server, or others. As a QA engineer, you need to be able to directly query that database to verify data integrity, set up test scenarios, and investigate test failures.
Many QA engineers make the mistake of only testing through the UI. But the real test is in the database: did the application actually save the data? Did it transform it correctly? Did it enforce constraints? SQL skills allow you to answer these questions directly, making you a far more effective and credible QA professional.
Why QA Engineers Must Master SQL
Database verification is one of the most critical yet often overlooked aspects of QA. A UI test might show a success message, but if the data wasn't actually saved to the database, that's a critical bug that a UI-only test would never catch.
SQL Statement Types
| Category | Statements | QA Use |
|---|---|---|
| DQL | SELECT | Query data for verification |
| DML | INSERT UPDATE DELETE | Manage test data |
| DDL | CREATE ALTER DROP | Schema validation |
| TCL | COMMIT ROLLBACK | Transaction testing |
What is the difference between WHERE and HAVING?
WHERE filters rows before grouping. HAVING filters groups after GROUP BY — used with aggregate functions like COUNT, SUM.
What is a PRIMARY KEY?
A column (or set of columns) that uniquely identifies each row in a table. It cannot be NULL and must be unique.
Difference between TRUNCATE and DELETE?
DELETE removes rows one by one and can be rolled back. TRUNCATE removes all rows instantly, cannot be rolled back, and resets auto-increment.
-- Select all columns SELECT * FROM users; -- Select specific columns SELECT id, name, email FROM users; -- With column aliases SELECT name AS full_name, email AS contact FROM users; -- Distinct values (find duplicates) SELECT DISTINCT role FROM users; -- Count rows SELECT COUNT(*) AS total_users FROM users; -- Order results SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;
-- Exact match SELECT * FROM users WHERE role = 'admin'; -- Numeric comparison SELECT * FROM orders WHERE amount > 1000; -- Pattern match (LIKE) SELECT * FROM users WHERE email LIKE '%@gmail.com'; -- IN list SELECT * FROM tests WHERE status IN ('PASS', 'FAIL'); -- NULL check SELECT * FROM tests WHERE executed_at IS NULL; -- Range (BETWEEN) SELECT * FROM orders WHERE amount BETWEEN 100 AND 500; -- Combined conditions SELECT * FROM users WHERE role = 'admin' AND created_at > '2025-01-01';
-- INSERT single row (test data setup) INSERT INTO users (name, email, role, created_at) VALUES ('QA Tester', '[email protected]', 'user', NOW()); -- INSERT multiple rows INSERT INTO users (name, email) VALUES ('Test User 1', '[email protected]'), ('Test User 2', '[email protected]'); -- UPDATE (always use WHERE!) UPDATE users SET role = 'admin', updated_at = NOW() WHERE email = '[email protected]'; -- DELETE (test data cleanup) DELETE FROM users WHERE email LIKE '%@test.com';
| JOIN Type | Returns | QA Use |
|---|---|---|
INNER JOIN | Only rows matching in BOTH tables | Users who placed orders |
LEFT JOIN | All left rows + matching right (NULL if no match) | Users even without orders |
RIGHT JOIN | All right rows + matching left | Orders even with deleted users |
FULL OUTER JOIN | All rows from both tables | Complete reconciliation |
-- INNER JOIN: orders with user details SELECT o.id, u.name, u.email, o.amount, o.status FROM orders o INNER JOIN users u ON o.user_id = u.id WHERE o.status = 'pending' ORDER BY o.created_at DESC; -- LEFT JOIN: all users including those without orders SELECT u.name, COUNT(o.id) AS order_count FROM users u LEFT JOIN orders o ON u.id = o.user_id GROUP BY u.id;
| Function | Purpose | QA Verification Use |
|---|---|---|
COUNT(*) | Count rows | Verify number of records created |
SUM(col) | Sum values | Validate order totals |
AVG(col) | Average value | Check calculated statistics |
MIN(col) | Minimum value | Boundary value testing |
MAX(col) | Maximum value | Boundary value testing |
-- Test results summary SELECT status, COUNT(*) AS count, ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM test_runs), 1) AS pct FROM test_runs GROUP BY status ORDER BY count DESC;
-- IN subquery: users who placed orders SELECT * FROM users WHERE id IN ( SELECT DISTINCT user_id FROM orders ); -- NOT IN: users who have NOT placed orders SELECT * FROM users WHERE id NOT IN ( SELECT DISTINCT user_id FROM orders WHERE user_id IS NOT NULL ); -- Scalar subquery: compare to average SELECT * FROM orders WHERE amount > (SELECT AVG(amount) FROM orders);
-- ✅ After registration test: verify user created SELECT COUNT(*) FROM users WHERE email = '[email protected]'; -- Expected: 1 -- ✅ After order test: verify correct amount saved SELECT amount, status, user_id FROM orders WHERE reference = 'TEST-ORDER-001'; -- ✅ Verify no duplicate emails SELECT email, COUNT(*) AS count FROM users GROUP BY email HAVING COUNT(*) > 1; -- Expected: 0 rows -- 🧹 Test data cleanup (run after tests) DELETE FROM orders WHERE reference LIKE 'TEST-%'; DELETE FROM users WHERE email LIKE '%@test.com';
What is a Stored Procedure?
A stored procedure is a named, precompiled SQL block stored in the database. It accepts parameters, executes multiple SQL statements, and can return result sets. Call it with CALL procedure_name(args);
Parameter Types
| Type | Direction | Use |
|---|---|---|
IN | Input only | Pass values into the procedure |
OUT | Output only | Return values to the caller |
INOUT | Both | Pass in and get a modified value back |
Test Data Setup
Call a procedure before each test to insert a clean, consistent set of test data — no manual INSERT statements in test scripts.
Test Data Cleanup
Call a teardown procedure in @AfterMethod to delete all test records — keeps the database clean between runs.
IF / LOOP / Cursors
Procedures support conditionals, loops, and cursors — enabling complex data generation logic impossible in a single SQL statement.
CALL from Java
Use JDBC CallableStatement to invoke stored procedures directly from Selenium/Java test code using {CALL procedure_name(?,?)}.
Stored Procedure — MySQL
DELIMITER // CREATE PROCEDURE SetupTestData( IN p_email VARCHAR(100), IN p_role VARCHAR(20), OUT p_user_id INT ) BEGIN -- Clean any existing record first DELETE FROM users WHERE email = p_email; -- Insert fresh test user INSERT INTO users (email, role, created_at) VALUES (p_email, p_role, NOW()); -- Return the generated ID SET p_user_id = LAST_INSERT_ID(); END // DELIMITER ; -- Call the procedure CALL SetupTestData('qa@test.com', 'user', @uid); SELECT @uid;
Teardown Procedure
DELIMITER // CREATE PROCEDURE CleanupTestData() BEGIN DELETE FROM orders WHERE reference LIKE 'TEST-%'; DELETE FROM users WHERE email LIKE '%@test.com'; END // DELIMITER ; -- Call in test teardown CALL CleanupTestData();
Call from Java (JDBC)
Connection conn = DriverManager.getConnection(url, user, pass); CallableStatement cs = conn.prepareCall( "{CALL SetupTestData(?, ?, ?)}"); cs.setString(1, "qa@test.com"); cs.setString(2, "user"); cs.registerOutParameter(3, Types.INTEGER); cs.execute(); int userId = cs.getInt(3); System.out.println("Created user ID: " + userId);
What is a stored procedure and why use it in QA?
A stored procedure is a precompiled, named SQL block stored in the database. QA uses them for reusable test data setup/teardown — call one procedure instead of writing 10 INSERT statements in every test.
Difference between IN, OUT, and INOUT parameters?
IN passes a value into the procedure (read-only inside). OUT returns a value to the caller (write-only inside). INOUT does both — the value is passed in, modified, and returned back.
How do you call a stored procedure from Java?
Use JDBC's CallableStatement with the syntax {CALL procedure_name(?,?,?)}. Register OUT parameters with registerOutParameter() before calling execute(). Retrieve OUT values with getInt(), getString(), etc.
Stored procedure vs function in SQL?
A function returns a single value and can be used in SELECT statements. A stored procedure can return multiple result sets, has OUT params, and is called with CALL — it cannot be used inside a SELECT expression.
Common SQL Data Types
| Type | Use | Example |
|---|---|---|
INT / BIGINT | Whole numbers, IDs | user_id INT |
VARCHAR(n) | Variable-length text | name VARCHAR(100) |
CHAR(n) | Fixed-length text | status CHAR(1) |
TEXT | Long text content | description TEXT |
DECIMAL(p,s) | Exact decimal numbers | amount DECIMAL(10,2) |
BOOLEAN | True/false flags | is_active BOOLEAN |
DATE | Date only (YYYY-MM-DD) | dob DATE |
DATETIME | Date + time | created_at DATETIME |
TIMESTAMP | Date + time + timezone | updated_at TIMESTAMP |
Constraints
| Constraint | Purpose |
|---|---|
PRIMARY KEY | Uniquely identifies each row — cannot be NULL, must be unique |
FOREIGN KEY | Links to primary key of another table — enforces referential integrity |
UNIQUE | All values in column must be unique (allows one NULL) |
NOT NULL | Column cannot store NULL values |
CHECK | Validates that values meet a condition |
DEFAULT | Provides a default value when none is supplied |
CREATE TABLE with Constraints
CREATE TABLE users ( id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100) NOT NULL, email VARCHAR(150) NOT NULL UNIQUE, role VARCHAR(20) NOT NULL DEFAULT 'user', age INT CHECK (age >= 18), is_active BOOLEAN DEFAULT TRUE, created_at DATETIME DEFAULT NOW() ); CREATE TABLE orders ( id INT AUTO_INCREMENT PRIMARY KEY, user_id INT NOT NULL, amount DECIMAL(10,2) NOT NULL, status VARCHAR(20) DEFAULT 'pending', created_at DATETIME DEFAULT NOW(), FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE );
ALTER TABLE & DROP
-- Add a column ALTER TABLE users ADD COLUMN phone VARCHAR(20); -- Modify a column type ALTER TABLE users MODIFY COLUMN name VARCHAR(200) NOT NULL; -- Drop a column ALTER TABLE users DROP COLUMN phone; -- Rename a table ALTER TABLE users RENAME TO app_users; -- Drop table (permanent — cannot be rolled back) DROP TABLE IF EXISTS temp_test_data; -- Truncate (remove all rows, reset auto-increment) TRUNCATE TABLE test_logs;
What is the difference between DROP, TRUNCATE, and DELETE?
DELETE removes specific rows (can use WHERE, can be rolled back, triggers fire). TRUNCATE removes ALL rows instantly, resets auto-increment, cannot be rolled back, no triggers. DROP removes the entire table structure and data permanently.
Difference between CHAR and VARCHAR?
CHAR(n) is fixed-length — always stores exactly n bytes, padding with spaces if shorter. VARCHAR(n) is variable-length — stores only as many bytes as the actual value plus a length prefix. Use CHAR for codes with fixed length (status, country code), VARCHAR for names and emails.
What is a FOREIGN KEY and what does ON DELETE CASCADE do?
A FOREIGN KEY enforces referential integrity — it ensures a value in one table references a valid row in another table. ON DELETE CASCADE automatically deletes child rows when the parent row is deleted, preventing orphaned records.
What is the difference between PRIMARY KEY and UNIQUE?
PRIMARY KEY = UNIQUE + NOT NULL + only one per table. UNIQUE allows one NULL value and a table can have many UNIQUE constraints. Both create an index on the column automatically.
What is a CHECK constraint?
CHECK validates that column values meet a condition at insert/update time. Example: CHECK(age >= 18) prevents inserting users under 18. If the condition is violated, the INSERT/UPDATE is rejected with an error.
NULL Functions
| Function | Purpose | Returns |
|---|---|---|
IS NULL | Check if value is NULL | TRUE / FALSE |
IS NOT NULL | Check if value has a value | TRUE / FALSE |
COALESCE(a,b,c) | Return first non-NULL value | First non-NULL |
IFNULL(val, default) | Return default if val is NULL (MySQL) | val OR default |
NULLIF(a, b) | Return NULL if a equals b, else a | a OR NULL |
ISNULL(val) | Returns 1 if val is NULL (MySQL) | 1 or 0 |
NULL = NULL is not TRUE — it returns NULL. You cannot use = NULL or != NULL. Always use IS NULL or IS NOT NULL.NULL Query Examples
-- Find rows where phone is missing SELECT * FROM users WHERE phone IS NULL; -- Find rows where phone is provided SELECT * FROM users WHERE phone IS NOT NULL; -- COALESCE: use first non-null value SELECT name, COALESCE(phone, mobile, 'No contact') AS contact FROM users; -- IFNULL: replace NULL with a default (MySQL) SELECT name, IFNULL(discount, 0) AS discount FROM orders; -- NULLIF: avoid division by zero SELECT total / NULLIF(quantity, 0) AS unit_price FROM order_items; -- SUM ignores NULLs — use COALESCE to treat NULL as 0 SELECT SUM(COALESCE(discount, 0)) FROM orders; -- Count non-null values only SELECT COUNT(phone) FROM users; -- ignores NULLs SELECT COUNT(*) FROM users; -- counts ALL rows including NULLs
What is NULL in SQL?
NULL means unknown or missing — it is not zero, not an empty string, and not false. It represents the absence of a value. Any arithmetic operation with NULL returns NULL, and any comparison with NULL using = returns NULL (not TRUE or FALSE).
Why can't you use = NULL to check for NULL?
Because NULL = NULL evaluates to NULL (unknown), not TRUE. SQL's three-valued logic (TRUE, FALSE, NULL) means comparisons with NULL are always NULL. Use IS NULL or IS NOT NULL — these are the only operators that correctly detect NULL values.
What is COALESCE?
COALESCE(a, b, c, ...) returns the first non-NULL argument in the list. It works on any number of arguments and is ANSI SQL standard. Use it to provide fallback values: COALESCE(phone, mobile, 'N/A') returns phone if available, else mobile, else 'N/A'.
What is NULLIF and when would you use it?
NULLIF(a, b) returns NULL if a equals b, otherwise returns a. The main use case is preventing division by zero: total / NULLIF(quantity, 0) — if quantity is 0, NULLIF returns NULL, so the division returns NULL instead of an error.
CASE Syntax
| Type | Syntax | Use When |
|---|---|---|
| Searched CASE | CASE WHEN condition THEN result END | Different conditions per branch — most flexible |
| Simple CASE | CASE col WHEN val THEN result END | Comparing one column to multiple values |
In SELECT (computed column)
Create a label or category column based on data — classify order amounts as Low/Medium/High without changing the underlying data.
In GROUP BY reports
Pivot data by using CASE inside SUM() — count or sum values per category in a single row instead of multiple rows.
In ORDER BY
Apply custom sort order — for example, sort statuses as FAILED first, then PENDING, then PASSED, rather than alphabetically.
In WHERE (via subquery)
CASE cannot be used directly in WHERE — wrap it in a subquery or use it in a computed column and filter on the alias in an outer query.
CASE in SELECT
-- Classify order amounts SELECT id, amount, CASE WHEN amount < 100 THEN 'Low' WHEN amount BETWEEN 100 AND 999 THEN 'Medium' ELSE 'High' END AS amount_tier FROM orders; -- Simple CASE: translate status code SELECT id, CASE status WHEN 'P' THEN 'Pending' WHEN 'A' THEN 'Approved' WHEN 'R' THEN 'Rejected' ELSE 'Unknown' END AS status_label FROM orders;
CASE for Pivot Report & Custom Sort
-- Pivot: count test results per type in one row SELECT SUM(CASE WHEN status = 'PASS' THEN 1 ELSE 0 END) AS passed, SUM(CASE WHEN status = 'FAIL' THEN 1 ELSE 0 END) AS failed, SUM(CASE WHEN status = 'SKIP' THEN 1 ELSE 0 END) AS skipped FROM test_runs; -- Custom sort: FAIL first, then PENDING, then PASS SELECT * FROM test_runs ORDER BY CASE status WHEN 'FAIL' THEN 1 WHEN 'PENDING' THEN 2 ELSE 3 END;
What is a CASE statement in SQL?
CASE is SQL's conditional expression — equivalent to IF-THEN-ELSE. It evaluates conditions in order and returns the result for the first matching WHEN. If no condition matches, it returns the ELSE value (or NULL if ELSE is omitted).
Difference between Simple CASE and Searched CASE?
Simple CASE compares one expression to multiple values: CASE status WHEN 'A' THEN... — clean and readable for exact matches. Searched CASE evaluates independent boolean conditions per branch: CASE WHEN amount > 100 THEN... — more flexible, supports any condition.
How do you use CASE to create a pivot table in SQL?
Use SUM(CASE WHEN status = 'PASS' THEN 1 ELSE 0 END) inside a SELECT. Each CASE expression counts or sums rows that match its condition, turning rows into columns — a manual pivot without using PIVOT syntax.
Can you use CASE in ORDER BY?
Yes. CASE in ORDER BY allows custom sort logic — for example, sort 'FAILED' rows first regardless of alphabetical order. This is useful in QA reports where you want failures at the top for immediate visibility.
String Functions
| Function | Purpose | Example |
|---|---|---|
UPPER(str) | Convert to uppercase | UPPER('hello') → 'HELLO' |
LOWER(str) | Convert to lowercase | LOWER('HELLO') → 'hello' |
LENGTH(str) | String length in bytes | LENGTH('abc') → 3 |
TRIM(str) | Remove leading/trailing spaces | TRIM(' hi ') → 'hi' |
CONCAT(a,b,...) | Concatenate strings | CONCAT('a','b') → 'ab' |
SUBSTRING(str,pos,len) | Extract substring | SUBSTRING('hello',1,3) → 'hel' |
REPLACE(str,old,new) | Replace substring | REPLACE('a-b','-','_') → 'a_b' |
INSTR(str,sub) | Find position of substring | INSTR('hello','ll') → 3 |
Date & Time Functions
| Function | Purpose | Returns |
|---|---|---|
NOW() | Current date and time | 2026-04-15 10:30:00 |
CURDATE() | Current date only | 2026-04-15 |
DATE(datetime) | Extract date part | DATE('2026-04-15 10:00') → 2026-04-15 |
DATEDIFF(d1,d2) | Days between dates | DATEDIFF('2026-04-15','2026-04-01') → 14 |
DATE_ADD(date, INTERVAL n unit) | Add to a date | DATE_ADD(NOW(), INTERVAL 7 DAY) |
DATE_FORMAT(date, format) | Format a date | DATE_FORMAT(NOW(),'%d-%m-%Y') |
YEAR/MONTH/DAY(date) | Extract part | YEAR(NOW()) → 2026 |
String Function Examples
-- Case-insensitive email search SELECT * FROM users WHERE LOWER(email) = 'qa@test.com'; -- Build full name from parts SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM users; -- Check email format (must contain @) SELECT * FROM users WHERE INSTR(email, '@') = 0; -- Trim whitespace from imported data UPDATE users SET name = TRIM(name) WHERE name != TRIM(name); -- Extract domain from email SELECT email, SUBSTRING(email, INSTR(email, '@') + 1) AS domain FROM users;
Date Function Examples
-- Orders created in the last 7 days SELECT * FROM orders WHERE created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY); -- Format date for reports SELECT DATE_FORMAT(created_at, '%d %M %Y') AS order_date FROM orders; -- Days since registration SELECT name, DATEDIFF(NOW(), created_at) AS days_registered FROM users ORDER BY days_registered DESC; -- Users registered in April 2026 SELECT * FROM users WHERE YEAR(created_at) = 2026 AND MONTH(created_at) = 4;
How do you find records from the last 7 days?
Use DATE_SUB: WHERE created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY). Alternatively: WHERE created_at >= NOW() - INTERVAL 7 DAY. Both return rows created in the past 7 days relative to the current timestamp.
Difference between NOW() and CURDATE()?
NOW() returns the full current datetime including hours, minutes, seconds (e.g. 2026-04-15 10:30:00). CURDATE() returns only the date part (e.g. 2026-04-15). Use CURDATE() when you want to compare dates without the time component.
How do you do a case-insensitive search in SQL?
Use LOWER() or UPPER() on both sides: WHERE LOWER(email) = 'qa@test.com'. In MySQL with utf8_general_ci collation, comparisons are case-insensitive by default. In PostgreSQL, use ILIKE instead of LIKE for case-insensitive pattern matching.
How do you calculate the number of days between two dates?
Use DATEDIFF(end_date, start_date) — it returns the number of days end_date minus start_date. DATEDIFF('2026-04-15', '2026-04-01') returns 14. Note: DATEDIFF only returns days — use TIMESTAMPDIFF for hours, minutes, or months.
ORDER BY
| Syntax | Meaning |
|---|---|
ORDER BY col ASC | Ascending (A→Z, 1→9, oldest→newest). Default when ASC omitted. |
ORDER BY col DESC | Descending (Z→A, 9→1, newest→oldest) |
ORDER BY col1, col2 | Sort by col1 first, then col2 for ties |
ORDER BY 3 | Sort by the 3rd column in SELECT list (positional) |
ORDER BY RAND() | Random order (MySQL) — for test data sampling |
LIMIT & OFFSET — Pagination
LIMIT n
Return at most n rows. Use it to get the top N results or to test with a small sample from a large table.
LIMIT n OFFSET m
Skip m rows then return n rows. Page 2 with 10 per page = LIMIT 10 OFFSET 10. Page 3 = LIMIT 10 OFFSET 20.
Pagination Formula
OFFSET = (page - 1) × topicsize. Always pair with ORDER BY — without it, the page order is non-deterministic.
QA use case
Verify that the first page of results is the most recent, that pagination boundaries are correct, and that total count matches the UI page count indicator.
-- Top 10 most recent orders SELECT id, user_id, amount, status FROM orders ORDER BY created_at DESC LIMIT 10; -- Sort by multiple columns: status ASC, then amount DESC SELECT * FROM orders ORDER BY status ASC, amount DESC; -- Page 1 (10 per page) SELECT * FROM products ORDER BY name ASC LIMIT 10 OFFSET 0; -- Page 2 SELECT * FROM products ORDER BY name ASC LIMIT 10 OFFSET 10; -- Page N formula -- LIMIT topicsize OFFSET (page-1)*topicsize -- Get a random sample of 5 test records SELECT * FROM users ORDER BY RAND() LIMIT 5; -- QA: verify first page = most recent (newest first) SELECT id, created_at FROM orders ORDER BY created_at DESC LIMIT 1; -- should match what UI shows as latest
What is the default sort order for ORDER BY?
ASC (ascending) — A to Z for text, smallest to largest for numbers, oldest to newest for dates. You must explicitly specify DESC for descending. Always be explicit in test queries to avoid assumptions about sort order.
How do you implement pagination in SQL?
Use LIMIT and OFFSET: LIMIT topicsize OFFSET (page-1)*topicsize. Always include an ORDER BY clause — without it, row order is non-deterministic and paginated results will be inconsistent across queries, causing flaky test assertions.
What is the difference between LIMIT 10 OFFSET 10 and LIMIT 10, 10?
They are identical in MySQL. LIMIT 10, 10 is shorthand for LIMIT 10 OFFSET 10 — offset first, then count. However, the LIMIT n OFFSET m syntax is more readable and is the ANSI SQL standard supported across all databases.
What happens to NULL values in ORDER BY?
In MySQL, NULLs sort first in ASC order and last in DESC order. In PostgreSQL, NULLs sort last in ASC and first in DESC by default, but you can control this with NULLS FIRST / NULLS LAST. Always verify NULL sort behavior when testing sort functionality.
Set Operations Compared
| Operator | Returns | Duplicates | QA Use |
|---|---|---|---|
UNION | Rows from A OR B | Removed | Merge two lists, get distinct combined set |
UNION ALL | Rows from A OR B | Kept | Faster merge, preserves all rows including dupes |
INTERSECT | Rows in BOTH A AND B | Removed | Find records that exist in both tables |
EXCEPT / MINUS | Rows in A but NOT B | Removed | Find records missing from target table (ETL testing) |
Rules for Set Operations
Same column count
Both SELECT statements must return the same number of columns. Column names come from the first SELECT.
Compatible data types
Corresponding columns must have compatible types — you cannot UNION a VARCHAR column with an INT column.
One ORDER BY at the end
Put ORDER BY at the end of the entire UNION — not inside individual SELECT statements. It sorts the combined result.
UNION vs UNION ALL speed
UNION ALL is faster — it skips the duplicate-removal step. Use UNION ALL whenever you know duplicates are impossible or acceptable.
-- UNION: all admin and manager emails (no duplicates) SELECT email FROM users WHERE role = 'admin' UNION SELECT email FROM users WHERE role = 'manager'; -- UNION ALL: all orders from two regions (keep dupes) SELECT id, amount, 'EU' AS region FROM eu_orders UNION ALL SELECT id, amount, 'US' AS region FROM us_orders ORDER BY amount DESC; -- INTERSECT: users who exist in BOTH tables -- (PostgreSQL/SQL Server — MySQL use JOIN instead) SELECT email FROM users INTERSECT SELECT email FROM newsletter_subscribers; -- MySQL equivalent of INTERSECT using JOIN SELECT DISTINCT u.email FROM users u JOIN newsletter_subscribers ns ON u.email = ns.email; -- EXCEPT: rows in source but NOT in target (ETL QA) SELECT id, email FROM source_users EXCEPT SELECT id, email FROM target_users; -- MySQL EXCEPT equivalent using LEFT JOIN SELECT s.id, s.email FROM source_users s LEFT JOIN target_users t ON s.id = t.id WHERE t.id IS NULL;
What is the difference between UNION and UNION ALL?
UNION combines results from two SELECT statements and removes duplicate rows. UNION ALL does the same but keeps all rows including duplicates. UNION ALL is faster because it skips the deduplication step — use it when duplicates are impossible or you need them preserved.
What is EXCEPT (or MINUS) used for in QA testing?
EXCEPT returns rows in the first query that do NOT exist in the second query. In QA/ETL testing: SELECT from source EXCEPT SELECT from target shows records that failed to migrate or were lost in a transformation — a powerful data reconciliation query.
What are the rules for combining queries with UNION?
Both SELECT statements must have the same number of columns, and corresponding columns must have compatible data types. Column names in the result come from the first SELECT. ORDER BY can only appear once at the very end of the combined query.
Does MySQL support INTERSECT and EXCEPT?
INTERSECT and EXCEPT were added in MySQL 8.0.31+. For older versions, simulate INTERSECT with an INNER JOIN and EXCEPT with a LEFT JOIN + WHERE target_col IS NULL. PostgreSQL and SQL Server support them natively in all modern versions.
EXISTS vs IN vs JOIN
| Approach | How it works | Best for |
|---|---|---|
EXISTS | Returns TRUE if subquery yields any row — stops at first match | Large subquery, only checking presence |
NOT EXISTS | Returns TRUE if subquery yields zero rows | Finding records with no related rows |
IN (subquery) | Runs full subquery, builds list, checks membership | Small static value lists |
LEFT JOIN + IS NULL | Join and filter where right side is NULL | Same as NOT EXISTS — sometimes faster |
Correlated Subquery
EXISTS subqueries are usually correlated — they reference the outer query's row. The subquery re-evaluates for each outer row, checking a specific condition per row.
Short-Circuit Evaluation
EXISTS stops scanning the inner table as soon as one matching row is found — making it highly efficient when you only need to know if something exists, not how many or what data.
QA: Verify Record Exists
After a test action (e.g. user registration), use EXISTS to assert the record was created without fetching the full row data.
NOT IN vs NOT EXISTS with NULLs
NOT IN fails silently when the subquery contains any NULL — returns no rows. NOT EXISTS handles NULLs correctly. Always prefer NOT EXISTS over NOT IN for subqueries.
-- EXISTS: users who have at least one order SELECT u.id, u.name, u.email FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id -- correlated: references outer row ); -- NOT EXISTS: users who have NEVER placed an order SELECT u.id, u.name, u.email FROM users u WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id ); -- QA: assert a user was created after registration test SELECT CASE WHEN EXISTS ( SELECT 1 FROM users WHERE email = 'new@test.com' ) THEN 'PASS' ELSE 'FAIL' END AS test_result; -- NOT EXISTS: safer than NOT IN when NULLs may exist -- BAD (fails if any user_id in orders is NULL): SELECT * FROM users WHERE id NOT IN ( SELECT user_id FROM orders ); -- GOOD: SELECT * FROM users u WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id );
What is EXISTS in SQL?
EXISTS is a boolean operator that returns TRUE if the subquery returns at least one row, FALSE if it returns no rows. It ignores the actual column values in the subquery — only the presence of rows matters. SELECT 1 in the subquery is conventional since the value is never used.
When is EXISTS faster than IN?
EXISTS short-circuits — it stops at the first matching row. IN evaluates the full subquery and builds a complete list before comparing. When the subquery returns a large result set, EXISTS is significantly faster because it doesn't need to scan all rows.
Why is NOT EXISTS safer than NOT IN?
NOT IN behaves unexpectedly when the subquery contains any NULL value — it returns zero rows because NULL comparisons are always NULL (unknown), not FALSE. NOT EXISTS handles NULLs correctly and always returns logically expected results. Always use NOT EXISTS over NOT IN for subqueries.
What is a correlated subquery?
A correlated subquery references columns from the outer query. It re-executes once per outer row — the inner query uses the outer row's values as filter conditions. EXISTS subqueries are almost always correlated: WHERE o.user_id = u.id references the outer u table.
When to Use SELF JOIN
Hierarchical Data
Employee-manager relationships stored in the same table — join employees to their manager row using manager_id = id.
Find Duplicates
Join a table to itself on matching column values with different primary keys — surfaces rows that share the same email, phone, or name.
Compare Adjacent Rows
Compare each row to the next or previous row in the same table — e.g. compare two consecutive orders from the same user.
Category Relationships
Parent-child category trees stored in one table — join category to its parent using parent_id = id.
-- employees(id, name, manager_id) -- SELF JOIN: show employee alongside their manager SELECT e.name AS employee, m.name AS manager FROM employees e LEFT JOIN employees m ON e.manager_id = m.id ORDER BY m.name; -- Find duplicate emails in the users table SELECT a.id AS id1, b.id AS id2, a.email FROM users a JOIN users b ON a.email = b.email AND a.id < b.id; -- avoid pairing (1,2) AND (2,1) -- Find orders from the same user placed on the same day SELECT a.id AS order1, b.id AS order2, a.user_id, DATE(a.created_at) AS order_date FROM orders a JOIN orders b ON a.user_id = b.user_id AND DATE(a.created_at) = DATE(b.created_at) AND a.id < b.id;
What is a SELF JOIN?
A SELF JOIN is when a table is joined to itself. The same table appears twice in the FROM clause with two different aliases. It is used to query hierarchical data (employee-manager), compare rows within the same table, or find duplicates.
Why are aliases required in a SELF JOIN?
Without aliases, SQL cannot distinguish which instance of the table a column reference belongs to. Two aliases (e.g. a and b) let you write a.column and b.column to reference columns from each "copy" of the table independently.
How do you find duplicate rows using a SELF JOIN?
JOIN the table to itself ON the duplicate column (e.g. email) AND a.id < b.id. The id condition prevents the same pair appearing twice ((1,2) and (2,1)) and prevents a row pairing with itself. Each returned row shows one duplicate pair.
When would you use a recursive CTE instead of a SELF JOIN?
A SELF JOIN retrieves only one level at a time (employee → manager). For traversing multiple levels of a hierarchy (all descendants of a node), a recursive CTE is the right tool — it iterates until no more rows are returned, handling unlimited depth.
SQL Execution Order
FROM / JOIN
Fetch rows from tables and apply joins.
WHERE
Filter individual rows before grouping. Cannot use aggregate functions here.
GROUP BY
Group the remaining rows by the specified columns.
HAVING
Filter groups after grouping. CAN use aggregate functions like COUNT, SUM, AVG.
SELECT
Project the requested columns and computed values.
ORDER BY / LIMIT
Sort and paginate the final result set.
GROUP BY & HAVING Examples
-- Count orders per user SELECT user_id, COUNT(*) AS order_count FROM orders GROUP BY user_id; -- Users with MORE than 5 orders (HAVING filters groups) SELECT user_id, COUNT(*) AS order_count FROM orders GROUP BY user_id HAVING COUNT(*) > 5 ORDER BY order_count DESC; -- WHERE + GROUP BY + HAVING together -- Only 2026 orders, grouped by month, months with > 100 orders SELECT MONTH(created_at) AS month, COUNT(*) AS total, SUM(amount) AS revenue FROM orders WHERE YEAR(created_at) = 2026 -- filter rows first GROUP BY MONTH(created_at) -- then group HAVING COUNT(*) > 100 -- then filter groups ORDER BY month; -- Detect duplicate emails (QA use case) SELECT email, COUNT(*) AS count FROM users GROUP BY email HAVING COUNT(*) > 1;
What is the difference between WHERE and HAVING?
WHERE filters individual rows before grouping — it cannot reference aggregate functions. HAVING filters groups after GROUP BY — it can reference aggregate functions like COUNT(), SUM(), AVG(). They operate at different stages of query execution.
Can you use HAVING without GROUP BY?
Yes — the entire result set is treated as one group. HAVING COUNT(*) > 5 on a table without GROUP BY checks if the total row count exceeds 5. Rarely useful, but syntactically valid.
How do you find duplicate records in a table?
GROUP BY the column(s) that should be unique, then HAVING COUNT(*) > 1: SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1. This is one of the most common QA database validation queries.
What is the SQL execution order?
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. This order explains why you cannot reference a SELECT alias in a WHERE clause (WHERE runs before SELECT), but you CAN reference it in ORDER BY (ORDER BY runs after SELECT).
View Benefits
View Syntax
| Statement | Purpose |
|---|---|
CREATE VIEW name AS SELECT ... | Create a new view |
CREATE OR REPLACE VIEW name AS ... | Update existing view definition |
DROP VIEW IF EXISTS name | Remove a view |
SELECT * FROM view_name | Query a view like a table |
SHOW CREATE VIEW name | See view definition |
Create & Use Views
-- Create a QA verification view (orders with user info) CREATE VIEW vw_order_details AS SELECT o.id AS order_id, u.name AS customer_name, u.email, o.amount, o.status, o.created_at FROM orders o INNER JOIN users u ON o.user_id = u.id; -- Query the view exactly like a table SELECT * FROM vw_order_details WHERE status = 'failed'; -- Update view definition CREATE OR REPLACE VIEW vw_order_details AS SELECT o.id, u.email, o.amount, o.status, o.created_at FROM orders o JOIN users u ON o.user_id = u.id; -- QA view: all test users (exclude production) CREATE VIEW vw_test_users AS SELECT * FROM users WHERE email LIKE '%@test.com'; -- Drop when no longer needed DROP VIEW IF EXISTS vw_test_users;
What is a SQL view?
A view is a stored SELECT query that behaves like a virtual table. It does not store data itself — each time you query the view, the underlying SELECT runs and returns fresh data. Views simplify complex queries and can restrict visible columns for security.
Can you INSERT or UPDATE data through a view?
Only if the view is "updatable" — it must reference a single base table with no GROUP BY, DISTINCT, aggregate functions, or UNION. Most complex views (with JOINs or aggregates) are read-only. Attempting to INSERT through a non-updatable view throws an error.
What is a materialized view?
A materialized view physically stores the query results and refreshes them periodically. Unlike a regular view (which re-runs on every query), a materialized view is pre-computed — much faster for complex aggregations but may show stale data. Supported in PostgreSQL and Oracle, not standard MySQL.
How are views useful in QA database testing?
Create a shared QA view that joins users, orders, and statuses — the entire QA team queries the view instead of rewriting the same complex JOIN every time. If the schema changes, update the view definition in one place rather than fixing 20 test queries.
CTE Syntax
WITH cte_name AS ( -- your SELECT here SELECT ... ), another_cte AS ( -- can reference cte_name above SELECT ... FROM cte_name ) -- final query uses any CTE defined above SELECT * FROM another_cte;
CTE vs Subquery vs View
| Feature | CTE | Subquery | View |
|---|---|---|---|
| Readability | ✅ Best | ⚠️ Gets messy | ✅ Good |
| Reuse in same query | ✅ Yes | ❌ Must repeat | ✅ Yes |
| Persists after query | ❌ No | ❌ No | ✅ Yes |
| Recursive support | ✅ Yes | ❌ No | ❌ No |
CTE Examples
-- Step 1: find high-value orders -- Step 2: get those customers WITH high_value_orders AS ( SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id HAVING SUM(amount) > 5000 ) SELECT u.name, u.email, h.total FROM users u JOIN high_value_orders h ON u.id = h.user_id; -- Multiple CTEs: chained steps WITH active_users AS ( SELECT id, name FROM users WHERE is_active = TRUE ), recent_orders AS ( SELECT user_id, COUNT(*) AS cnt FROM orders WHERE created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY) GROUP BY user_id ) SELECT a.name, r.cnt FROM active_users a JOIN recent_orders r ON a.id = r.user_id;
Recursive CTE
-- employees(id, name, manager_id) WITH RECURSIVE org_tree AS ( -- Anchor: start from the CEO (no manager) SELECT id, name, manager_id, 0 AS level FROM employees WHERE manager_id IS NULL UNION ALL -- Recursive: join each employee to their manager SELECT e.id, e.name, e.manager_id, o.level + 1 FROM employees e JOIN org_tree o ON e.manager_id = o.id ) SELECT * FROM org_tree ORDER BY level;
What is a CTE and how is it different from a subquery?
A CTE (Common Table Expression) is a named temporary result set defined with WITH before the main query. Unlike a subquery, a CTE can be referenced multiple times in the same query, is defined at the top (not inline), and is much more readable for complex multi-step logic.
Can a CTE reference another CTE?
Yes. When you define multiple CTEs separated by commas, each CTE can reference any CTE defined before it. This enables chaining multiple transformation steps cleanly — like a pipeline of named queries.
What is a recursive CTE?
A recursive CTE references itself. It has two parts: an anchor (base case — runs once) and a recursive member (references the CTE, runs repeatedly until no more rows). Used for tree/hierarchy traversal like org charts, categories, or bill of materials.
Does a CTE improve query performance?
Not necessarily. Most databases treat a CTE like an inline view — it may be re-executed each time it is referenced. For performance, consider a temp table if the CTE result is large and referenced many times. Materialized CTEs (persisted) are available in some databases like PostgreSQL.
Window Function Syntax
function_name() OVER ( [PARTITION BY column] -- split into groups [ORDER BY column] -- order within each group [ROWS/RANGE frame_spec] -- define the frame (optional) )
Common Window Functions
| Function | Purpose | Ties? |
|---|---|---|
ROW_NUMBER() | Unique sequential row number | No tie — always unique |
RANK() | Rank with gaps after ties | Gaps: 1,1,3 |
DENSE_RANK() | Rank without gaps | No gaps: 1,1,2 |
NTILE(n) | Divide rows into n buckets | — |
LAG(col, n) | Access previous row's value | — |
LEAD(col, n) | Access next row's value | — |
SUM() OVER | Running total | — |
AVG() OVER | Moving average | — |
Ranking & Deduplication
-- Rank users by total spend SELECT name, total_spent, RANK() OVER (ORDER BY total_spent DESC) AS rnk, DENSE_RANK() OVER (ORDER BY total_spent DESC) AS dense_rnk, ROW_NUMBER() OVER (ORDER BY total_spent DESC) AS row_num FROM user_totals; -- Deduplicate: keep latest record per user WITH deduped AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY created_at DESC ) AS rn FROM orders ) SELECT * FROM deduped WHERE rn = 1; -- LAG: compare each order to the previous one SELECT id, amount, LAG(amount, 1) OVER (ORDER BY created_at) AS prev_amount, amount - LAG(amount, 1) OVER (ORDER BY created_at) AS change FROM orders; -- Running total per user SELECT user_id, amount, created_at, SUM(amount) OVER ( PARTITION BY user_id ORDER BY created_at ) AS running_total FROM orders;
What are window functions in SQL?
Window functions perform calculations across a set of rows related to the current row (the "window") without collapsing rows like GROUP BY does. Each row keeps its identity and gets an additional computed column — e.g. its rank, running total, or comparison to the previous row.
Difference between ROW_NUMBER, RANK, and DENSE_RANK?
All three rank rows. For tied values: ROW_NUMBER always assigns unique numbers (ties broken arbitrarily — 1,2,3). RANK skips numbers after ties (1,1,3). DENSE_RANK never skips (1,1,2). Use DENSE_RANK when you don't want gaps in the ranking sequence.
What is PARTITION BY in a window function?
PARTITION BY divides the result set into groups (partitions) and the window function resets for each partition — like GROUP BY but without collapsing rows. RANK() OVER (PARTITION BY dept ORDER BY salary DESC) ranks employees within each department independently.
How do you find duplicate rows and keep only the latest one?
Use ROW_NUMBER() OVER (PARTITION BY unique_col ORDER BY created_at DESC) AS rn in a CTE, then SELECT * WHERE rn = 1 in the outer query. This is the standard deduplication pattern — the partition resets per key, and rn=1 is always the most recent row.
Types of Indexes
| Type | Description | Use Case |
|---|---|---|
PRIMARY KEY | Auto-created clustered index | Every table — the row lookup anchor |
UNIQUE | Enforces uniqueness + speeds lookup | Email, username, reference codes |
INDEX (B-Tree) | General-purpose sorted index | WHERE, JOIN, ORDER BY columns |
COMPOSITE | Index on multiple columns | Queries filtering on multiple columns |
FULLTEXT | Optimised for text search | MATCH...AGAINST on text columns |
When NOT to Use Indexes
Low-cardinality columns
A boolean column (true/false) has only 2 values — an index on it provides no benefit. Use indexes on high-cardinality columns like email or user_id.
Small tables
Full table scan is faster than index lookup for tables with fewer than a few hundred rows — the optimizer usually ignores the index anyway.
Heavy write tables
Every INSERT/UPDATE/DELETE must update all indexes on the table. Too many indexes on a frequently-written table hurts write performance significantly.
Create & Drop Indexes
-- Single-column index CREATE INDEX idx_users_email ON users(email); -- Unique index (enforces uniqueness) CREATE UNIQUE INDEX idx_users_email_uniq ON users(email); -- Composite index (column order matters!) -- Useful for: WHERE status = ? AND created_at > ? CREATE INDEX idx_orders_status_date ON orders(status, created_at); -- Drop an index DROP INDEX idx_users_email ON users; -- Show all indexes on a table SHOW INDEXES FROM orders;
EXPLAIN — Diagnose Slow Queries
-- EXPLAIN shows the query execution plan EXPLAIN SELECT * FROM orders WHERE user_id = 42; -- Key columns to check in EXPLAIN output: -- type: ALL = full scan (bad), ref/eq_ref = index (good) -- key: which index is being used (NULL = no index) -- rows: estimated rows scanned (lower = faster) -- Extra: "Using where", "Using index" (index-only = very fast) -- EXPLAIN ANALYZE (MySQL 8+): shows actual execution stats EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01';
What is an index in SQL?
An index is a separate data structure (typically a B-Tree) that stores column values alongside pointers to the actual rows. It lets the database find rows without scanning the entire table — like a book index vs reading every page to find a topic.
What is a composite index and why does column order matter?
A composite index covers multiple columns. Column order determines which queries benefit from it — an index on (status, date) can be used for WHERE status = ? AND date > ?, or WHERE status = ?, but NOT for WHERE date > ? alone (the leftmost prefix rule).
What does EXPLAIN show and why is it useful?
EXPLAIN shows the query execution plan: which indexes are used, how many rows are scanned, and the join type. "type: ALL" means a full table scan (slow). "type: ref" means an index is used (fast). Use EXPLAIN to diagnose slow queries before adding indexes.
When would you NOT add an index?
On low-cardinality columns (boolean, status with 2-3 values), on very small tables (full scan is faster), or on columns that are written to frequently. Each index adds overhead to every INSERT/UPDATE/DELETE — too many indexes hurt write performance.
ACID Properties
Atomicity
All statements in a transaction succeed, or none do. If any statement fails, the database rolls back to the state before the transaction started — no partial writes.
Consistency
A transaction brings the database from one valid state to another. All constraints, rules, and cascades are enforced — the database is never left in an invalid state.
Isolation
Concurrent transactions cannot see each other's uncommitted changes. The level of isolation is configurable (READ COMMITTED, REPEATABLE READ, SERIALIZABLE).
Durability
Once a transaction is committed, the changes are permanent — even if the server crashes immediately after. Data is written to disk (WAL / redo log).
Isolation Levels
| Level | Dirty Read | Non-repeatable Read | Phantom Read |
|---|---|---|---|
READ UNCOMMITTED | ✅ Possible | ✅ Possible | ✅ Possible |
READ COMMITTED | ❌ Prevented | ✅ Possible | ✅ Possible |
REPEATABLE READ | ❌ Prevented | ❌ Prevented | ✅ Possible |
SERIALIZABLE | ❌ Prevented | ❌ Prevented | ❌ Prevented |
BEGIN / COMMIT / ROLLBACK
-- Transfer $100 from account A to account B atomically START TRANSACTION; UPDATE accounts SET balance = balance - 100 WHERE id = 1; UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- If both succeed: commit permanently COMMIT; -- If something went wrong: undo everything -- ROLLBACK;
SAVEPOINT — Partial Rollback
START TRANSACTION; INSERT INTO orders (user_id, amount) VALUES (1, 500); SAVEPOINT after_order; INSERT INTO payments (order_id, method) VALUES (LAST_INSERT_ID(), 'card'); -- Payment failed — roll back only to savepoint (keep order) ROLLBACK TO SAVEPOINT after_order; -- Retry with different payment method INSERT INTO payments (order_id, method) VALUES (LAST_INSERT_ID(), 'bank'); COMMIT;
QA: Use Transactions for Safe Test Data
-- Wrap test setup in a transaction -- Roll back after test to leave DB clean START TRANSACTION; INSERT INTO users (email, role) VALUES ('qa@test.com', 'user'); INSERT INTO orders (user_id, amount) VALUES (LAST_INSERT_ID(), 250); -- Run your assertions here... SELECT * FROM orders WHERE amount = 250; -- Undo all test data — DB is clean again ROLLBACK;
What is a database transaction?
A transaction is a sequence of SQL statements executed as a single unit of work. Either all statements succeed and are committed, or any failure triggers a rollback that undoes all changes — leaving the database in the state it was before the transaction started.
What does ACID stand for?
Atomicity (all or nothing), Consistency (valid state to valid state), Isolation (concurrent transactions don't interfere), Durability (committed data survives crashes). These four properties together guarantee reliable database operations.
What is a dirty read?
A dirty read occurs when one transaction reads data written by another transaction that has not yet committed. If the second transaction rolls back, the first has read data that never officially existed — a data integrity problem. READ COMMITTED isolation level prevents dirty reads.
What is a SAVEPOINT?
A SAVEPOINT marks a point within a transaction you can partially roll back to. ROLLBACK TO SAVEPOINT name undoes only the work done after that savepoint — the work before the savepoint is preserved. COMMIT still commits everything including pre-savepoint work.
How can QA engineers use transactions in testing?
Wrap test data setup in a BEGIN TRANSACTION. Run all test assertions. Then ROLLBACK — leaving the database in a perfectly clean state with zero cleanup code needed. This is especially powerful in integration tests against a real database.
Numeric Functions
| Function | Purpose | Example → Result |
|---|---|---|
ROUND(n, d) | Round to d decimal places | ROUND(3.456, 2) → 3.46 |
FLOOR(n) | Round down to nearest integer | FLOOR(3.9) → 3 |
CEIL(n) / CEILING(n) | Round up to nearest integer | CEIL(3.1) → 4 |
TRUNCATE(n, d) | Truncate to d decimals (no rounding) | TRUNCATE(3.99, 1) → 3.9 |
ABS(n) | Absolute value | ABS(-42) → 42 |
MOD(n, d) | Remainder after division | MOD(10, 3) → 1 |
POWER(base, exp) | Raise to power | POWER(2, 8) → 256 |
SQRT(n) | Square root | SQRT(144) → 12 |
SIGN(n) | Returns -1, 0, or 1 | SIGN(-5) → -1 |
RAND() | Random float 0–1 | RAND() → 0.7341... |
-- Verify order total is rounded to 2 decimal places SELECT id, amount, ROUND(amount, 2) AS rounded, TRUNCATE(amount, 2) AS truncated FROM orders WHERE amount != ROUND(amount, 2); -- find rounding issues -- Calculate percentage of passed tests SELECT ROUND( SUM(CASE WHEN status = 'PASS' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1 ) AS pass_rate_pct FROM test_runs; -- Find odd-numbered records (MOD) SELECT * FROM users WHERE MOD(id, 2) = 1; -- Absolute difference between expected and actual amounts SELECT order_id, expected_amount, actual_amount, ABS(expected_amount - actual_amount) AS discrepancy FROM order_audit WHERE ABS(expected_amount - actual_amount) > 0.01; -- Generate random test price between 10 and 100 SELECT ROUND(10 + RAND() * 90, 2) AS random_price;
Difference between ROUND and TRUNCATE?
ROUND applies standard rounding — ROUND(3.456, 2) = 3.46. TRUNCATE simply drops digits beyond the specified position without rounding — TRUNCATE(3.456, 2) = 3.45. For financial QA, test both scenarios because the difference accumulates over many records.
Difference between FLOOR and CEIL?
FLOOR(n) always rounds down toward negative infinity — FLOOR(3.9) = 3, FLOOR(-3.1) = -4. CEIL(n) always rounds up — CEIL(3.1) = 4, CEIL(-3.9) = -3. Use FLOOR for integer division scenarios and CEIL for "at least n" calculations like page count.
What is MOD used for in QA testing?
MOD(n, d) returns the remainder of n divided by d. Common QA uses: find odd/even IDs (MOD(id, 2) = 0), find records at fixed intervals, validate batch processing that runs every N records, or test paging logic.
How do you find records with rounding errors?
Compare the stored value to its rounded version: WHERE amount != ROUND(amount, 2). This surfaces records where the application stored more than 2 decimal places — a common bug in financial calculations that causes total mismatches.
Trigger Timing & Events
| Timing | Event | Common Use |
|---|---|---|
BEFORE INSERT | Before row is inserted | Validate / transform data before save |
AFTER INSERT | After row is inserted | Write audit log, populate related table |
BEFORE UPDATE | Before row is updated | Validate new values, prevent illegal changes |
AFTER UPDATE | After row is updated | Audit trail, sync denormalized data |
BEFORE DELETE | Before row is deleted | Archive row, prevent deletion of critical records |
AFTER DELETE | After row is deleted | Clean up related data, write deletion log |
NEW and OLD Row References
| Reference | Available in | Meaning |
|---|---|---|
NEW.column | INSERT, UPDATE | The new value being written |
OLD.column | UPDATE, DELETE | The value before the change |
Audit Log Trigger
-- Audit log table CREATE TABLE user_audit ( id INT AUTO_INCREMENT PRIMARY KEY, user_id INT, action VARCHAR(10), old_role VARCHAR(20), new_role VARCHAR(20), changed_at DATETIME DEFAULT NOW() ); -- Trigger: log every role change DELIMITER // CREATE TRIGGER trg_user_role_audit AFTER UPDATE ON users FOR EACH ROW BEGIN IF OLD.role != NEW.role THEN INSERT INTO user_audit (user_id, action, old_role, new_role) VALUES (OLD.id, 'UPDATE', OLD.role, NEW.role); END IF; END // DELIMITER ; -- Trigger: set updated_at automatically DELIMITER // CREATE TRIGGER trg_set_updated_at BEFORE UPDATE ON users FOR EACH ROW BEGIN SET NEW.updated_at = NOW(); END // DELIMITER ; -- Trigger: prevent deletion of admin users DELIMITER // CREATE TRIGGER trg_no_admin_delete BEFORE DELETE ON users FOR EACH ROW BEGIN IF OLD.role = 'admin' THEN SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Cannot delete admin users'; END IF; END // DELIMITER ; -- View existing triggers SHOW TRIGGERS FROM your_database; -- Drop a trigger DROP TRIGGER IF EXISTS trg_user_role_audit;
What is a database trigger?
A trigger is a stored SQL block that automatically executes when a specified event (INSERT, UPDATE, DELETE) occurs on a table. Triggers fire without any explicit call — they are driven by DML events and are invisible to the application layer.
What is the difference between BEFORE and AFTER triggers?
BEFORE triggers fire before the row change is applied — you can modify NEW values or cancel the operation with SIGNAL. AFTER triggers fire after the change is committed to the table — used for audit logging and cascading updates that need the final row state.
What are OLD and NEW in a trigger?
OLD refers to the row's values before the operation (available in UPDATE and DELETE). NEW refers to the values being written (available in INSERT and UPDATE). In a BEFORE UPDATE trigger, you can change NEW.column to transform data before it is saved.
How do triggers affect QA testing?
Triggers execute invisibly — the application doesn't know they fired. In QA, always verify trigger side effects directly in the database: check audit tables were populated, timestamps were updated, and constraints were enforced. A test that only checks the UI may miss trigger failures entirely.
MySQL JSON Functions
| Function | Purpose | Example |
|---|---|---|
JSON_EXTRACT(col, path) | Extract value at JSON path | JSON_EXTRACT(data, '$.name') |
col->>'$.path' | Shorthand for JSON_EXTRACT (unquoted) | data->>'$.name' |
JSON_OBJECT(k,v,...) | Build a JSON object | JSON_OBJECT('id', 1, 'name', 'QA') |
JSON_ARRAY(v,...) | Build a JSON array | JSON_ARRAY(1, 2, 3) |
JSON_CONTAINS(doc, val, path) | Check if doc contains val at path | Find rows with specific tag |
JSON_ARRAYAGG(col) | Aggregate column into JSON array | Group tags into array per user |
JSON_VALID(str) | Returns 1 if string is valid JSON | Validate stored JSON data quality |
JSON_LENGTH(col) | Number of elements in array/object | Count items in JSON array column |
JSON Path Syntax
$ = root, $.key = object key, $.arr[0] = first array element, $.a.b = nested key.
JSON Column Type
MySQL validates JSON at insert time when the column is declared as JSON type — invalid JSON is rejected. Older schemas store JSON as TEXT — use JSON_VALID() to verify data quality.
Indexing JSON
You cannot index a JSON column directly. Create a generated/virtual column from the extracted value, then index that column.
QA Use Case
APIs that log request/response payloads as JSON — query the JSON column to verify specific fields were captured correctly without deserializing in application code.
-- Table with JSON column -- orders(id, payload JSON) -- payload: {"user":"alice","items":[{"sku":"A1","qty":2}],"total":49.99} -- Extract a value from JSON SELECT id, JSON_EXTRACT(payload, '$.user') AS user_name, JSON_EXTRACT(payload, '$.total') AS total FROM orders; -- Shorthand operator (returns unquoted string) SELECT payload->>'$.user' AS user_name FROM orders; -- Filter by JSON field value SELECT * FROM orders WHERE payload->>'$.user' = 'alice'; -- Count items in JSON array SELECT id, JSON_LENGTH(payload, '$.items') AS item_count FROM orders WHERE JSON_LENGTH(payload, '$.items') > 3; -- orders with >3 items -- Validate all JSON in a TEXT column is valid SELECT id, payload FROM api_logs WHERE JSON_VALID(payload) = 0; -- find broken JSON -- Build JSON from relational data SELECT JSON_OBJECT( 'id', id, 'name', name, 'email', email ) AS user_json FROM users LIMIT 5; -- Aggregate: all order IDs per user as a JSON array SELECT user_id, JSON_ARRAYAGG(id) AS order_ids FROM orders GROUP BY user_id;
What is a JSON column in SQL and when would you use it?
A JSON column stores semi-structured data validated as JSON at insert time. Use it for data with variable structure — API request/response logs, feature flags, user preferences, or product attributes that differ per record. It avoids adding many optional nullable columns to a table.
How do you extract a specific field from a JSON column?
Use JSON_EXTRACT(column, '$.fieldName') or the shorthand column->>'$.fieldName'. The ->> operator returns the value as an unquoted string (without surrounding quotes), while -> returns a JSON fragment. Use ->> for direct comparisons in WHERE clauses.
How do you validate that stored JSON is valid in a TEXT column?
Use JSON_VALID(column) — it returns 1 for valid JSON, 0 for invalid. SELECT * FROM table WHERE JSON_VALID(payload) = 0 surfaces all rows with corrupted or malformed JSON data. This is a key QA check on legacy systems that store JSON as TEXT.
Can you create an index on a JSON column?
Not directly. To index a JSON field, create a generated (virtual or stored) column that extracts the value, then index the generated column: ALTER TABLE t ADD COLUMN user_name VARCHAR(100) AS (payload->>'$.user'); CREATE INDEX idx ON t(user_name). This lets the optimizer use the index for JSON field queries.
What is JSON_ARRAYAGG and how is it different from GROUP_CONCAT?
JSON_ARRAYAGG aggregates values into a proper JSON array (e.g. [1, 2, 3]). GROUP_CONCAT creates a delimited string (e.g. "1,2,3"). JSON_ARRAYAGG preserves data types (numbers remain numbers, not strings) and produces valid JSON output suitable for API responses or further JSON processing.
Best Practices
Common Pitfalls
SELECT * usage
Cause: Retrieves unnecessary columns, slow queries. Solution: Specify exact columns needed for verification.
Missing WHERE clause
Cause: Updates/deletes entire table accidentally. Solution: Always verify WHERE conditions before execution.
NULL handling
Cause: NULL != NULL in comparisons. Solution: Use IS NULL / IS NOT NULL for proper NULL checking.
Case sensitivity in WHERE
Cause: Query returns no results due to case mismatch. Solution: Use LOWER() or UPPER() for case-insensitive comparison.
FAQs
How do I verify data integrity in tests?
Write SQL queries to check counts, sum values, and validate data relationships. Assert results match expected values.
How do I clean test data after tests?
Use DELETE with WHERE clause or TRUNCATE for tables. Wrap in transaction and ROLLBACK if needed.
What's the difference between GROUP BY and DISTINCT?
DISTINCT removes duplicate rows. GROUP BY groups rows and allows aggregate functions. Use based on your requirement.
How do I find slow queries?
Use EXPLAIN PLAN to analyze query performance. Check indexes, avoid SELECT *, optimize JOINs.
How do I handle date comparisons in SQL?
Use DATE() function for date-only comparison. Use BETWEEN for date ranges. Handle timezone differences.
How do I test stored procedures?
Call procedures using CALL syntax. Verify output parameters and result sets. Use ROLLBACK to undo changes.