Selenium WebDriver Tutorial for QA Test Automation with Java
What is Selenium?
Selenium is one of the most widely used open-source browser automation frameworks. For roughly two decades, it has been a de facto standard for QA automation, used by companies ranging from startups to Fortune 500 enterprises. Unlike Cypress, which is a newer tool limited to JavaScript and TypeScript, Selenium offers bindings for many languages and is battle-tested across a wide range of enterprise environments.
Selenium WebDriver communicates with browsers through native browser APIs and dedicated driver programs (ChromeDriver, GeckoDriver, etc.). It works by sending commands to the browser driver, which translates those commands into browser-level actions. This architecture makes Selenium powerful for testing across multiple browsers and platforms, a critical requirement in many enterprise QA departments.
Why Selenium Still Dominates Enterprise QA
Despite newer tools like Cypress gaining popularity, Selenium remains the dominant choice in enterprise environments for several reasons:
- Multi-language support: Write tests in Java, Python, C#, JavaScript, Ruby, or Kotlin (through the Java bindings). Choose the language your team knows best.
- True cross-browser testing: Test on Chrome, Firefox, Safari, Edge, and even older IE versions simultaneously.
- Ecosystem maturity: Decades of frameworks, best practices, and community knowledge.
- Enterprise integration: Works seamlessly with Jenkins, Azure DevOps, CI/CD pipelines, and legacy test management tools.
- Grid scalability: Run thousands of tests in parallel across distributed machines.
If your organization has strict requirements for multi-browser testing, needs to support multiple programming languages, or is an established enterprise with legacy test infrastructure, Selenium is almost certainly the right choice.
Selenium Architecture & Components
Selenium WebDriver
Direct API between your test code and the browser. Core of modern Selenium v4+. Sends commands via the W3C WebDriver protocol.
Browser Drivers
ChromeDriver, GeckoDriver, EdgeDriver: translate WebDriver commands into browser actions. Selenium Manager (bundled since Selenium 4.6) downloads these automatically.
Selenium Grid
Distribute test execution across multiple browsers and machines. Essential for parallel execution of large test suites.
Selenium IDE
Browser extension for record-and-playback automation. Great for learning basics and quick prototyping, though not recommended for maintenance-heavy test suites.
Language Flexibility: Choose What Your Team Knows
One of Selenium's greatest strengths is its multi-language support. Your QA team can write tests in whatever language makes sense for your organization.
This language flexibility is particularly valuable in large organizations where developers and QA engineers may use different tech stacks. A Java development team can have their QA engineers write Selenium tests in Java, while a Python data team can use Selenium with Python. This eliminates the friction of forcing teams to learn entirely new languages just for testing.
Selenium vs Cypress: Choosing the Right Tool
| Feature | Selenium | Cypress |
|---|---|---|
| Languages | Java, Python, C#, JS, … | JavaScript / TypeScript only |
| Browser support | All major browsers | Chrome, Firefox, Edge, Electron |
| Execution | Outside browser via WebDriver | Inside browser (same JS loop) |
| Setup | Maven/Gradle + WebDriver | npm install (very easy) |
| API Testing | Not built-in | Built-in cy.request() |
| Best for | Enterprise, multi-language teams | Modern JS web apps |
Basic Selenium Test
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class FirstTest {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        driver.get("https://thetechworldlabs.com");
        System.out.println("Title: " + driver.getTitle());
        driver.quit();
    }
}
Top Interview Questions: Selenium
What is Selenium WebDriver?
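WebDriver is an interface that provides methods to control browsers programmatically. It sends W3C WebDriver commands to a browser-specific driver (ChromeDriver, GeckoDriver, etc.), which drives the browser through its native automation support. There is no JavaScript-injecting proxy in between, as there was in Selenium RC.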
findElement vs findElements?
findElement() returns the first matching element or throws NoSuchElementException. findElements() returns a List, which is empty if nothing is found, and never throws NoSuchElementException.
What is Selenium Grid?
Grid enables distributed test execution across multiple machines and browsers simultaneously, dramatically reducing total test suite run time.
Implicit vs Explicit Wait?
Implicit wait applies globally to all findElement calls. Explicit wait targets a specific element and condition. Prefer explicit waits over implicit waits for reliable tests.
Prerequisites
Setup Steps
Create Maven Project
In IntelliJ: File → New → Project → Maven. Set Group ID (e.g. com.techworldlabs) and Artifact ID.
Add Dependencies
Add Selenium Java and TestNG to pom.xml. Maven downloads everything automatically.
Create Test Class
Create a class in src/test/java, instantiate ChromeDriver, and write your first test.
Run Tests
Right-click the class → Run, or run mvn test from a terminal.
pom.xml Dependencies
<!-- Selenium Java -->
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.18.1</version>
</dependency>
<!-- TestNG -->
<dependency>
    <groupId>org.testng</groupId>
    <artifactId>testng</artifactId>
    <version>7.9.0</version>
    <scope>test</scope>
</dependency>
Test Anatomy
@BeforeClass: Setup
Instantiate WebDriver and maximize the window. Runs once before all tests in the class.
@Test: Test Case
Navigate to a URL, interact with elements, and assert expected outcomes.
Assert: Verify
Use TestNG's Assert class to compare expected vs actual values and fail the test if they differ.
@AfterClass: Teardown
Call driver.quit() to close all browser windows and release WebDriver resources.
close() closes the current tab only. quit() closes all windows and ends the session, so always use quit() in teardown.
Complete First Test with TestNG
import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.*;

public class MyFirstTest {
    WebDriver driver;

    @BeforeClass
    public void setUp() {
        driver = new ChromeDriver();
        driver.manage().window().maximize();
    }

    @Test
    public void testPageTitle() {
        driver.get("https://thetechworldlabs.com");
        Assert.assertEquals(driver.getTitle(), "TechWorld Labs");
    }

    @AfterClass
    public void tearDown() {
        driver.quit();
    }
}
Locator Types
| Locator | Method | Example |
|---|---|---|
| ID | By.id() | By.id("login-btn") |
| Name | By.name() | By.name("username") |
| CSS Selector | By.cssSelector() | By.cssSelector(".btn-primary") |
| XPath | By.xpath() | By.xpath("//button[@type='submit']") |
| Link Text | By.linkText() | By.linkText("Click here") |
| Class Name | By.className() | By.className("form-control") |
Which locator is most reliable?
ID is the most reliable when present, since IDs are meant to be unique per page. CSS selectors are fast and readable. Avoid auto-generated or positional XPaths, as they break easily when the DOM changes.
findElement vs findElements?
findElement() returns the first match or throws NoSuchElementException. findElements() returns a List, which is empty if nothing is found, and never throws NoSuchElementException.
XPath Cheat Sheet
| Expression | Selects |
|---|---|
| //tag | All elements of that tag anywhere in DOM |
| //tag[@attr='val'] | Tag with exact attribute value |
| //tag[contains(@attr,'val')] | Partial attribute match |
| //tag[text()='value'] | Exact text match |
| //tag[contains(text(),'val')] | Partial text match |
| //parent/child | Direct child element |
| //parent//descendant | Any descendant |
| (//tag)[n] | nth occurrence (1-indexed) |
| //tag[@a and @b] | Multiple attribute conditions |
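The expressions in the cheat sheet can be tried without a browser, because the JDK ships an XPath 1.0 engine in javax.xml.xpath. The sketch below runs several of them against a small, made-up, well-formed snippet; the class name XPathCheatSheetDemo and the markup in PAGE are assumptions for illustration only, and real pages are HTML evaluated by the browser rather than XML parsed by the JDK.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathCheatSheetDemo {
    // Hypothetical markup, used only to exercise the expressions.
    static final String PAGE =
          "<form id='loginForm'>"
        + "<input type='email' name='user'/>"
        + "<input type='password' name='pass'/>"
        + "<button type='submit' class='btn btn-primary'>Login</button>"
        + "<button type='button' class='btn'>Cancel</button>"
        + "</form>";

    // Parses PAGE and evaluates the expression with the requested result type.
    private static Object evaluate(String expression, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    /** Number of nodes the expression selects. */
    static int count(String expression) {
        return ((NodeList) evaluate(expression, XPathConstants.NODESET)).getLength();
    }

    /** Text of the first node the expression selects. */
    static String firstText(String expression) {
        return (String) evaluate(expression, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        System.out.println(count("//button"));                               // all buttons
        System.out.println(count("//button[contains(@class,'primary')]"));   // partial attribute match
        System.out.println(firstText("(//button)[2]"));                      // nth occurrence, 1-indexed
    }
}
```

Checking a locator this way is only a rough aid: the browser's own DevTools console remains the better place to confirm an expression against the live DOM.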
CSS Selector Cheat Sheet
| Selector | Matches |
|---|---|
| #id | Element with specific ID |
| .class | Elements with that class |
| tag[attr='val'] | Attribute exact match |
| tag[attr*='val'] | Attribute contains value |
| parent > child | Direct child only |
| :nth-child(n) | nth child element |
Java Examples
// By exact text
By.xpath("//button[text()='Login']")
// Partial text match
By.xpath("//button[contains(text(),'Log')]")
// Parent to descendant
By.xpath("//form[@id='loginForm']//input[@type='email']")
// CSS equivalent
By.cssSelector("form#loginForm input[type='email']")
// Contains class
By.cssSelector("button[class*='btn-primary']")
Common WebElement Methods
| Method | What it does |
|---|---|
| .click() | Click a button, link, checkbox, or radio |
| .sendKeys("text") | Type into an input or textarea |
| .clear() | Clear input field content |
| .getText() | Get the visible text of an element |
| .getAttribute("attr") | Get attribute value (href, value, class, …) |
| .isDisplayed() | Check if element is visible |
| .isEnabled() | Check if element is enabled (not greyed out) |
| .isSelected() | Check if checkbox/radio is selected |
Actions Code Examples
// Click and type
driver.findElement(By.id("submit")).click();
driver.findElement(By.name("email")).sendKeys("user@example.com");

// Dropdown using the Select class (org.openqa.selenium.support.ui.Select)
Select dropdown = new Select(driver.findElement(By.id("country")));
dropdown.selectByVisibleText("");   // pass the option's visible label
dropdown.selectByValue("np");
dropdown.selectByIndex(2);

// Advanced: hover, double-click, drag-and-drop (org.openqa.selenium.interactions.Actions)
// element, source, and target are WebElements located beforehand
Actions actions = new Actions(driver);
actions.moveToElement(element).perform();
actions.doubleClick(element).perform();
actions.dragAndDrop(source, target).perform();

// Handle alert popup: an alert can be closed only once, so pick one
Alert alert = driver.switchTo().alert();
alert.accept();      // Click OK
// alert.dismiss();  // or click Cancel instead
Types of Waits
| Wait Type | How it works | Recommendation |
|---|---|---|
| Implicit Wait | Polls DOM for set duration on every findElement globally | Use as baseline only |
| Explicit Wait | Waits for a specific condition on a specific element | Preferred |
| Thread.sleep() | Hard pause regardless of element state | Never use |
| Fluent Wait | Custom polling interval + ignore specific exceptions | Complex async cases |
Wait Code Examples
// Implicit wait: applies to all findElement calls
driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));

// Explicit wait: waits for a specific condition
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(15));
WebElement el = wait.until(
    ExpectedConditions.visibilityOfElementLocated(By.id("result"))
);

// Other useful ExpectedConditions (pass each one to wait.until(...))
ExpectedConditions.elementToBeClickable(By.id("btn"));
ExpectedConditions.textToBePresentInElement(el, "Success");
ExpectedConditions.urlContains("/dashboard");
ExpectedConditions.alertIsPresent();
Implicit vs Explicit Wait β which is better?
Explicit wait is always preferred. It targets a specific element and condition, making tests faster and more reliable. Mixing both can cause unpredictable timeouts.
What is FluentWait?
FluentWait lets you define a custom polling frequency and which exceptions to ignore. Use when an element appears intermittently and you need fine-grained retry control.
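To make the mechanics concrete, here is a plain-Java sketch of the kind of loop such a wait performs: poll a condition at a fixed interval, swallow one chosen exception type, and give up at a deadline. This is not Selenium's implementation. PollingWait and its until method are hypothetical names, and java.util.NoSuchElementException stands in for Selenium's exception of the same name in the usage below.

```java
import java.time.Duration;
import java.util.function.Supplier;

final class PollingWait {
    /**
     * Calls the condition repeatedly until it yields a value that is neither
     * null nor Boolean.FALSE, sleeping for the given interval between attempts.
     * Exceptions of the ignored type are swallowed and the poll continues;
     * any other exception propagates immediately.
     */
    static <T> T until(Supplier<T> condition,
                       Duration timeout,
                       Duration interval,
                       Class<? extends RuntimeException> ignored) {
        long deadline = System.nanoTime() + timeout.toNanos();
        RuntimeException lastIgnored = null;
        while (true) {
            try {
                T value = condition.get();
                if (value != null && !Boolean.FALSE.equals(value)) {
                    return value;
                }
            } catch (RuntimeException e) {
                if (!ignored.isInstance(e)) {
                    throw e;              // not on the ignore list: fail fast
                }
                lastIgnored = e;          // remember it for the timeout message
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException(
                        "Condition not met within " + timeout, lastIgnored);
            }
            try {
                Thread.sleep(interval.toMillis());   // short pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }
}
```

In Selenium itself the same three knobs appear on FluentWait as withTimeout, pollingEvery, and ignoring, followed by a call to until.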
TestNG Annotations
| Annotation | When it runs | Common use |
|---|---|---|
| @BeforeSuite | Once before all tests | Suite-wide config |
| @BeforeClass | Once before class tests | Driver instantiation |
| @BeforeMethod | Before each @Test | Navigate to page, reset state |
| @Test | The test itself | priority, groups, dataProvider |
| @AfterMethod | After each @Test | Screenshot on fail, clear cookies |
| @AfterClass | Once after class tests | driver.quit() |
| @DataProvider | Supplies test data | Data-driven testing (DDT) |
TestNG + DataProvider Example
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.*;

public class LoginTest {
    WebDriver driver;

    @BeforeClass
    public void classSetup() {
        driver = new ChromeDriver();
    }

    @BeforeMethod
    public void methodSetup() {
        driver.get("https://example.com");
    }

    @Test(priority = 1)
    public void testValidLogin() {
        // login steps would go here before the assertion
        Assert.assertTrue(driver.getCurrentUrl().contains("/dashboard"));
    }

    @Test(dataProvider = "loginData")
    public void testLoginDDT(String user, String pass) {
        /* runs for each row */
    }

    @DataProvider(name = "loginData")
    public Object[][] getData() {
        return new Object[][] {
            { "user1", "pass1" },
            { "user2", "pass2" }
        };
    }

    @AfterClass
    public void tearDown() {
        driver.quit();
    }
}
What is @DataProvider?
@DataProvider feeds multiple data sets to a single @Test method, running it once per row. This enables data-driven testing without duplicating test code.
How to run tests in parallel with TestNG?
In testng.xml, set parallel="methods" and a thread-count on the <suite> element. Each test method then runs in its own thread, so every thread needs its own WebDriver instance to avoid conflicts.
@BeforeClass vs @BeforeMethod?
@BeforeClass runs once before all tests in the class; use it for driver setup. @BeforeMethod runs before every single @Test; use it for navigation and state reset.
POM Structure
Page Class
One class per page. Holds locators (By fields) and action methods (login, logout, search). No test assertions here.
Test Class
Imports page objects and calls their methods. Contains assertions and test logic only, with zero locators.
Base Class
Common setup/teardown (driver init, quit). All test classes extend it for consistency.
LoginPage.java + LoginTest.java
public class LoginPage {
    WebDriver driver;

    By emailField  = By.id("email");
    By passField   = By.id("password");
    By loginButton = By.cssSelector("button[type='submit']");
    By errorMsg    = By.className("error-message");

    public LoginPage(WebDriver driver) {
        this.driver = driver;
    }

    public void login(String email, String pass) {
        driver.findElement(emailField).sendKeys(email);
        driver.findElement(passField).sendKeys(pass);
        driver.findElement(loginButton).click();
    }

    public String getErrorMessage() {
        return driver.findElement(errorMsg).getText();
    }
}

public class LoginTest {
    WebDriver driver;
    LoginPage loginPage;

    @BeforeClass
    public void setUp() {
        driver = new ChromeDriver();
        loginPage = new LoginPage(driver);
        driver.get("https://example.com/login");
    }

    @Test
    public void testValidLogin() {
        loginPage.login("user@example.com", "pass123");
        Assert.assertTrue(driver.getCurrentUrl().contains("/dashboard"));
    }

    @AfterClass
    public void tearDown() {
        driver.quit();
    }
}
Reporting Options
Allure Setup
<dependency>
    <groupId>io.qameta.allure</groupId>
    <artifactId>allure-testng</artifactId>
    <version>2.26.0</version>
</dependency>

# Run tests (generates allure-results/)
mvn clean test
# Generate HTML report
allure generate allure-results --clean -o allure-report
# Open in browser
allure open allure-report
Multiple Windows & Tabs
getWindowHandle()
Returns the handle (unique ID) of the current window. Save this to switch back later.
getWindowHandles()
Returns a Set of all open window handles. Iterate to find the new window and switch to it.
switchTo().window(handle)
Switches WebDriver context to the specified window by its handle string.
driver.close()
Closes the current window only. Then switchTo() the parent handle to resume testing there.
iFrames
| Method | How | Use When |
|---|---|---|
| switchTo().frame(int index) | 0-based index | Frame has no id/name |
| switchTo().frame(String name) | id or name attribute | Frame has id or name |
| switchTo().frame(WebElement) | Locate frame element first | Dynamic/nested frames |
| switchTo().defaultContent() | Back to main page | After iFrame work is done |
| switchTo().parentFrame() | One level up | Nested iFrames |
JavaScript Alerts
| Alert Type | Method | Notes |
|---|---|---|
| Simple Alert | alert.accept() | Click OK only |
| Confirm Box | accept() / dismiss() | OK or Cancel |
| Prompt Box | alert.sendKeys("text") + accept() | Type into alert then OK |
| Get text | alert.getText() | Read alert message |
Use WebDriverWait + ExpectedConditions.alertIsPresent() before switching to an alert to avoid NoAlertPresentException.
Multiple Window Switching
// Save parent window handle
String parentHandle = driver.getWindowHandle();

// Click link that opens new window
driver.findElement(By.linkText("Open New Tab")).click();

// Get all handles and switch to new window
Set<String> allHandles = driver.getWindowHandles();
for (String handle : allHandles) {
    if (!handle.equals(parentHandle)) {
        driver.switchTo().window(handle);
        break;
    }
}
System.out.println("New window: " + driver.getTitle());

// Close child window and switch back
driver.close();
driver.switchTo().window(parentHandle);
iFrame Handling
// Switch by index
driver.switchTo().frame(0);

// Switch by name or id
driver.switchTo().frame("iframeName");

// Switch by WebElement (most reliable)
WebElement iframe = driver.findElement(By.cssSelector("iframe.content-frame"));
driver.switchTo().frame(iframe);

// Interact inside iframe
driver.findElement(By.id("insideIframe")).click();

// Switch back to main page
driver.switchTo().defaultContent();
Alert Handling
// Wait for alert then handle it
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(5));
wait.until(ExpectedConditions.alertIsPresent());
Alert alert = driver.switchTo().alert();
String msg = alert.getText();
System.out.println("Alert says: " + msg);

// Accept (OK)
alert.accept();

// Dismiss (Cancel)
// alert.dismiss();

// Prompt: type text then accept
// alert.sendKeys("My Input"); alert.accept();
How do you switch to a new browser window?
Save parent handle with getWindowHandle(). After the action opens a new window, loop through getWindowHandles() to find the new handle (one that doesn't equal the parent), then call driver.switchTo().window(newHandle).
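Comparing against the parent handle alone works when exactly two windows are open. With more windows, one option is to snapshot the handles before the click and diff them afterwards. The sketch below shows that diff on plain sets, since getWindowHandles() returns a Set of Strings; WindowHandles and findNew are hypothetical helper names, not part of Selenium.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

final class WindowHandles {
    /**
     * Returns a handle that is present in 'after' but was not in 'before',
     * or an empty Optional when no new window appeared.
     */
    static Optional<String> findNew(Set<String> before, Set<String> after) {
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);          // keep only handles that are new
        return added.stream().findFirst();
    }
}
```

In a test, before would be driver.getWindowHandles() captured just ahead of the click, after would be the same call once the new window has opened, and the result would be passed to driver.switchTo().window(...).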
Difference between switchTo().frame() and switchTo().defaultContent()?
switchTo().frame() moves WebDriver context into an iFrame so you can interact with elements inside it. defaultContent() resets context back to the main page. Without switching back, findElement calls on the main page will fail.
What exception is thrown when an alert is not present?
NoAlertPresentException. Always use WebDriverWait with ExpectedConditions.alertIsPresent() before calling driver.switchTo().alert() to prevent this.
Difference between alert.accept() and alert.dismiss()?
accept() clicks the OK/Yes button. dismiss() clicks Cancel/No. For a simple alert with no Cancel button, both close it β but always use accept() for OK and dismiss() for Cancel semantically.
How do you handle nested iFrames?
Switch into the outer iframe first, then switch into the inner iframe. Use switchTo().parentFrame() to go one level up, or switchTo().defaultContent() to jump directly back to the main page.
What is JavascriptExecutor?
JavascriptExecutor is a Selenium interface that lets you run arbitrary JavaScript in the browser context. Cast your WebDriver to it: JavascriptExecutor js = (JavascriptExecutor) driver;
When to Use JavascriptExecutor
Click Hidden Elements
When an element is not interactable via normal .click() (hidden, overlapped by another element), use JS click as a workaround.
Scroll Operations
Scroll to a specific element, scroll to the top/bottom of the page, or scroll by pixel amount using window.scrollTo().
Get/Set Values
Read values from read-only fields or set values into fields that resist sendKeys(), such as date pickers, masked inputs, and custom widgets.
Highlight Elements
Change element border/background color temporarily, which is useful for visual debugging of test execution.
JavaScript Executor Examples
JavascriptExecutor js = (JavascriptExecutor) driver;

// Scroll to bottom of page
js.executeScript("window.scrollTo(0, document.body.scrollHeight);");

// Scroll to top
js.executeScript("window.scrollTo(0, 0);");

// Scroll element into view
WebElement el = driver.findElement(By.id("footer"));
js.executeScript("arguments[0].scrollIntoView(true);", el);

// JS click (for hidden/overlapped elements)
js.executeScript("arguments[0].click();", el);

// Set value into a read-only or masked input
js.executeScript("arguments[0].value='2026-04-15';",
    driver.findElement(By.id("datePicker")));

// Get text of a hidden element
String text = (String) js.executeScript("return arguments[0].innerText;", el);

// Highlight element for debugging
js.executeScript("arguments[0].style.border='3px solid red';", el);

// Get page title via JS
String title = (String) js.executeScript("return document.title;");

// Refresh page via JS
js.executeScript("location.reload();");
What is JavascriptExecutor in Selenium?
It is a Selenium interface that allows execution of JavaScript code inside the browser. It is used when standard WebDriver methods are insufficient, such as when clicking hidden elements, scrolling, or manipulating the DOM directly.
How do you scroll to an element using Selenium?
Cast driver to JavascriptExecutor and call: js.executeScript("arguments[0].scrollIntoView(true);", element). In Selenium 4 you can also use: new Actions(driver).scrollToElement(element).perform().
Difference between executeScript and executeAsyncScript?
executeScript runs the script synchronously and returns as soon as the script body finishes. executeAsyncScript handles asynchronous operations: it injects a callback as the last argument, which your script must invoke to signal completion. Use it for AJAX calls or setTimeout-based interactions.
When should you NOT use JavascriptExecutor?
Avoid it as a first resort. Standard Selenium methods validate element state (visibility, clickability) before acting. JavascriptExecutor bypasses those checks, which can mask real UI bugs. Use it only when standard methods genuinely fail.
Project Structure
selenium-framework/
├── src/
│   ├── main/java/com/techworldlabs/
│   │   ├── base/      ← BaseTest.java
│   │   ├── topics/    ← Page Object classes
│   │   └── utils/     ← DriverFactory, ConfigReader, ScreenshotUtil
│   └── test/java/com/techworldlabs/
│       └── tests/     ← Test classes
├── src/test/resources/
│   ├── config.properties   ← Environment configs
│   └── testng.xml          ← Test suite config
└── pom.xml
Key Components
DriverFactory
Centralized WebDriver creation. Uses ThreadLocal for parallel-safe driver management: each thread gets its own driver instance.
ConfigReader
Reads config.properties. Provides base URL, browser, and timeout values, so nothing is hardcoded in test code.
BaseTest
Common @BeforeMethod / @AfterMethod using DriverFactory. All test classes extend BaseTest, so individual tests carry no setup boilerplate.
ScreenshotUtil
Captures screenshots on test failure in @AfterMethod. Saves with timestamp to target/screenshots/ for easy debugging.
DriverFactory.java
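import java.time.Duration;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.edge.EdgeDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class DriverFactory {
    // One WebDriver per thread, so parallel tests never share a session
    private static final ThreadLocal<WebDriver> driver = new ThreadLocal<>();

    public static WebDriver getDriver() {
        return driver.get();
    }

    public static void initDriver(String browser) {
        WebDriver d;
        switch (browser.toLowerCase()) {
            case "firefox": d = new FirefoxDriver(); break;
            case "edge":    d = new EdgeDriver();    break;
            default:        d = new ChromeDriver();
        }
        d.manage().window().maximize();
        d.manage().timeouts().implicitlyWait(Duration.ofSeconds(5));
        driver.set(d);
    }

    public static void quitDriver() {
        if (driver.get() != null) {
            driver.get().quit();
            driver.remove();   // clear the slot so the thread can be reused
        }
    }
}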
ConfigReader.java
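import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ConfigReader {
    private static final Properties props = new Properties();

    static {
        try (InputStream in = ConfigReader.class.getResourceAsStream("/config.properties")) {
            if (in == null) {
                // getResourceAsStream returns null when the file is not on the classpath
                throw new IllegalStateException("config.properties not found on classpath");
            }
            props.load(in);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static String get(String key) {
        return props.getProperty(key);
    }
}

// config.properties example:
// base.url=https://example.com
// browser=chrome
// timeout=10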
BaseTest.java + ScreenshotUtil.java
public class BaseTest {
    @BeforeMethod
    public void setUp() {
        DriverFactory.initDriver(ConfigReader.get("browser"));
        DriverFactory.getDriver().get(ConfigReader.get("base.url"));
    }

    @AfterMethod
    public void tearDown(ITestResult result) {
        if (result.getStatus() == ITestResult.FAILURE) {
            ScreenshotUtil.capture(DriverFactory.getDriver(), result.getName());
        }
        DriverFactory.quitDriver();
    }
}

// FileUtils comes from Apache Commons IO (org.apache.commons.io.FileUtils)
public class ScreenshotUtil {
    public static void capture(WebDriver driver, String testName) {
        TakesScreenshot ts = (TakesScreenshot) driver;
        File src = ts.getScreenshotAs(OutputType.FILE);
        String timestamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date());
        String dest = "target/screenshots/" + testName + "_" + timestamp + ".png";
        try {
            FileUtils.copyFile(src, new File(dest));
            System.out.println("Screenshot saved: " + dest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
What is DriverFactory and why use it?
DriverFactory centralizes WebDriver creation and management. Using ThreadLocal<WebDriver> makes it thread-safe for parallel execution: each thread gets its own driver instance, preventing race conditions between parallel tests.
What is ThreadLocal in the context of Selenium?
ThreadLocal provides a separate variable instance per thread. In parallel Selenium tests, each thread gets its own WebDriver via ThreadLocal so tests don't share and corrupt each other's browser session.
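The isolation is easy to observe without a browser. In the sketch below, FakeSession is a hypothetical stand-in for a WebDriver, and each thread stores its own instance in a shared ThreadLocal. A latch makes every thread wait until all of them have written, so a shared field would have been overwritten by the time anyone reads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

final class ThreadLocalDemo {
    /** Hypothetical stand-in for a WebDriver session. */
    static final class FakeSession {
        final String owner;
        FakeSession(String owner) { this.owner = owner; }
    }

    private static final ThreadLocal<FakeSession> SESSION = new ThreadLocal<>();

    /** Runs one thread per name and reports which session each thread saw. */
    static Map<String, String> runInParallel(String... names) {
        Map<String, String> seen = new ConcurrentHashMap<>();
        CountDownLatch allSet = new CountDownLatch(names.length);
        Thread[] threads = new Thread[names.length];
        for (int i = 0; i < names.length; i++) {
            String name = names[i];
            threads[i] = new Thread(() -> {
                SESSION.set(new FakeSession(name));        // like initDriver()
                allSet.countDown();
                try {
                    allSet.await();                        // read only after every thread has written
                    seen.put(name, SESSION.get().owner);   // like getDriver()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    SESSION.remove();                      // like quitDriver()
                }
            }, name);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    /** True if the calling thread has a session of its own. */
    static boolean callerHasSession() {
        return SESSION.get() != null;
    }
}
```

Each thread reads back the session it stored itself, and the thread that started them sees nothing, which is the behaviour a ThreadLocal-based DriverFactory relies on.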
How do you manage different environments (dev/qa/prod)?
Use separate config.properties per environment and pass the env as a Maven property: -Denv=qa. ConfigReader reads System.getProperty("env") and loads the correct file. Never hardcode URLs in test code.
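One possible shape for that lookup is sketched below. The file naming scheme config-<env>.properties, the fallback to "qa", and the class name EnvConfig are all assumptions made for illustration; the tutorial's own ConfigReader loads a single config.properties.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

final class EnvConfig {
    /** Maps an environment name to a file name; "qa" is an assumed default. */
    static String fileNameFor(String env) {
        String name = (env == null || env.trim().isEmpty())
                ? "qa"
                : env.trim().toLowerCase(Locale.ROOT);
        return "config-" + name + ".properties";
    }

    /** Loads the file selected by -Denv=... from the classpath root. */
    static Properties loadForCurrentEnv() {
        String file = fileNameFor(System.getProperty("env"));
        try (InputStream in = EnvConfig.class.getResourceAsStream("/" + file)) {
            if (in == null) {
                throw new IllegalStateException("Missing config file on classpath: " + file);
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Running mvn test -Denv=prod would then select config-prod.properties, provided the build forwards the property to the test JVM.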
When do you capture screenshots in a framework?
Capture on test failure in @AfterMethod by checking ITestResult.getStatus() == FAILURE. Save with timestamp + test name to avoid overwrites. Attach to Allure/ExtentReports for failure visibility.
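The naming rule can be isolated into a small helper that is easy to check on its own. This is a sketch with hypothetical names (ScreenshotNames, pathFor); it uses java.time instead of SimpleDateFormat and also replaces characters that are awkward in file names, which the original utility does not do.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class ScreenshotNames {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    /** Builds target/screenshots/<testName>_<timestamp>.png for the given moment. */
    static String pathFor(String testName, LocalDateTime when) {
        // Anything outside letters, digits, dot, underscore, and hyphen becomes "_"
        String safeName = testName.replaceAll("[^A-Za-z0-9._-]", "_");
        return "target/screenshots/" + safeName + "_" + STAMP.format(when) + ".png";
    }
}
```

Passing the time in as a parameter keeps the helper deterministic; production code would call it with LocalDateTime.now().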
What is the difference between @BeforeMethod and @BeforeClass for driver setup?
@BeforeClass creates one driver for all tests in the class, so tests share state and can interfere. @BeforeMethod creates a fresh driver before each test, which is isolated, clean, and preferred for reliable test suites.
Apache POI Classes
| Class | Purpose | File Type |
|---|---|---|
| XSSFWorkbook | Open .xlsx files | Excel 2007+ |
| HSSFWorkbook | Open .xls files | Excel 97-2003 |
| XSSFSheet | Access a specific sheet | - |
| XSSFRow | Access a row | - |
| XSSFCell | Access a cell value | - |
| DataFormatter | Get cell value as String regardless of type | - |
Data-Driven Testing Flow
Create Excel File
Add column headers in row 0, test data in subsequent rows. Place the file in src/test/resources/testdata/.
ExcelUtil Class
Utility that opens the workbook, reads rows/cells, and returns data as a 2D Object array for TestNG.
@DataProvider
TestNG @DataProvider method calls ExcelUtil and returns the 2D array to feed data row by row to the test.
@Test receives data
Method parameters match data columns. TestNG automatically runs the test once per row, so no loops are needed.
pom.xml: Apache POI Dependency
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.5</version>
</dependency>
ExcelUtil.java
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.usermodel.*;

public class ExcelUtil {
    public static Object[][] readData(String filePath, String sheetName) throws IOException {
        // try-with-resources closes the workbook and the stream even if reading fails
        try (FileInputStream fis = new FileInputStream(filePath);
             XSSFWorkbook workbook = new XSSFWorkbook(fis)) {
            XSSFSheet sheet = workbook.getSheet(sheetName);
            DataFormatter fmt = new DataFormatter();
            int rows = sheet.getLastRowNum();              // index of last row; row 0 = header, excluded
            int cols = sheet.getRow(0).getLastCellNum();
            Object[][] data = new Object[rows][cols];
            for (int r = 1; r <= rows; r++) {
                XSSFRow row = sheet.getRow(r);
                for (int c = 0; c < cols; c++) {
                    // a missing row yields empty strings instead of a NullPointerException
                    data[r - 1][c] = (row == null) ? "" : fmt.formatCellValue(row.getCell(c));
                }
            }
            return data;
        }
    }
}
Test with Excel DataProvider
@DataProvider(name = "loginExcel")
public Object[][] excelData() throws IOException {
    return ExcelUtil.readData(
        "src/test/resources/testdata/loginData.xlsx", "LoginSheet");
}

@Test(dataProvider = "loginExcel")
public void testLoginWithExcel(String username, String password) {
    loginPage.login(username, password);
    Assert.assertTrue(
        driver.getCurrentUrl().contains("/dashboard"),
        "Login failed for: " + username);
}
What is Apache POI?
Apache POI is an open-source Java library for reading and writing Microsoft Office formats (Excel, Word, PowerPoint). In Selenium, it reads test data from .xlsx files and feeds it to tests via @DataProvider.
Difference between HSSFWorkbook and XSSFWorkbook?
HSSFWorkbook handles .xls format (Excel 97-2003). XSSFWorkbook handles .xlsx format (Excel 2007+). Always prefer XSSFWorkbook for new test data files since .xls is a legacy format.
What is Data-Driven Testing?
DDT is a testing approach where the same test logic runs with multiple data sets. Test code is written once, and data is externalized to Excel, CSV, or a database. This maximises coverage without any code duplication.
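Excel is only one possible source. As a rough illustration of the CSV option, the sketch below turns CSV text into the Object[][] shape a TestNG @DataProvider returns, skipping the header row. CsvData is a hypothetical helper, and the parser assumes simple data with no quoted fields or embedded commas; a real suite would more likely use a CSV library.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvData {
    /**
     * Parses CSV text into one Object[] per data row, skipping the header line.
     * Assumes plain comma-separated values without quoting.
     */
    static Object[][] parse(String csv) {
        List<Object[]> rows = new ArrayList<>();
        String[] lines = csv.split("\\R");            // split on any line break
        for (int i = 1; i < lines.length; i++) {      // i = 0 is the header row
            if (lines[i].trim().isEmpty()) {
                continue;                             // ignore blank lines
            }
            String[] cells = lines[i].split(",", -1); // -1 keeps trailing empty cells
            Object[] row = new Object[cells.length];
            for (int c = 0; c < cells.length; c++) {
                row[c] = cells[c].trim();
            }
            rows.add(row);
        }
        return rows.toArray(new Object[0][]);
    }
}
```

A @DataProvider method could read the file into a String and return CsvData.parse(text), and the @Test method's parameters would line up with the columns exactly as in the Excel version.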
How do you handle numeric cells in Apache POI?
For numeric cells, getNumericCellValue() returns a double. Use DataFormatter to get the cell's displayed string value regardless of cell type. It is the safest approach and handles dates, numbers, and text uniformly.
Grid Architecture
Hub
The central server that receives test requests and routes them to available Nodes. Acts as the traffic controller for the entire Grid.
Node
A machine registered with the Hub. Has browsers installed and executes the tests assigned by the Hub, reporting results back.
RemoteWebDriver
Instead of ChromeDriver, tests use RemoteWebDriver pointing to the Hub URL, and the Hub routes the session to a matching Node automatically.
Docker Grid
Run Hub and Nodes in Docker containers. This is the easiest way to spin up a Grid locally or in CI without manual Node setup.
Grid 4 Modes
| Mode | Flag | Use Case |
|---|---|---|
| Standalone | standalone | Single machine with Hub + Node combined, for local testing |
| Hub | hub | Central routing server in distributed multi-machine setup |
| Node | node | Worker machine that executes tests |
The Grid UI at http://localhost:4444 shows live sessions, node capacity, and queue depth in real time.
Start Grid: Standalone Mode
# Download selenium-server-4.x.jar, then:
java -jar selenium-server-4.18.0.jar standalone
# Hub and Node separately:
java -jar selenium-server-4.18.0.jar hub
java -jar selenium-server-4.18.0.jar node --hub http://localhost:4444
Docker Compose Grid
version: "3"
services:
  selenium-hub:
    image: selenium/hub:4.18.0
    ports: ["4444:4444"]
  chrome:
    image: selenium/node-chrome:4.18.0
    depends_on: [selenium-hub]
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
  firefox:
    image: selenium/node-firefox:4.18.0
    depends_on: [selenium-hub]
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
RemoteWebDriver Code
ChromeOptions options = new ChromeOptions();
WebDriver driver = new RemoteWebDriver(
    new URL("http://localhost:4444"), options);
driver.get("https://example.com");
System.out.println(driver.getTitle());
driver.quit();
testng.xml β Parallel on Grid
<suite name="GridSuite" parallel="methods" thread-count="5">
    <test name="AllTests">
        <classes>
            <class name="tests.LoginTest"/>
            <class name="tests.SearchTest"/>
        </classes>
    </test>
</suite>
What is Selenium Grid used for?
Selenium Grid enables parallel test execution across multiple machines and browsers. It can reduce total suite time dramatically: in the ideal case, a 2-hour suite runs in about 20 minutes on 6 parallel nodes.
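The arithmetic behind that figure is total run time divided by the number of parallel sessions: 120 minutes / 6 = 20 minutes. That is a best case, because the suite can never finish faster than its single longest test. A minimal sketch of this lower bound follows; GridEstimate is a hypothetical helper, and it ignores session startup and queueing overhead.

```java
final class GridEstimate {
    /**
     * Lower bound on wall-clock minutes for a suite: the larger of
     * (total time / parallel sessions) and the longest single test.
     */
    static double lowerBoundMinutes(double[] testMinutes, int parallelSessions) {
        if (parallelSessions < 1) {
            throw new IllegalArgumentException("parallelSessions must be at least 1");
        }
        double total = 0;
        double longest = 0;
        for (double minutes : testMinutes) {
            total += minutes;
            longest = Math.max(longest, minutes);
        }
        return Math.max(total / parallelSessions, longest);
    }
}
```

With 120 one-minute tests the bound is the even split of 20 minutes, while a suite of the same total length that contains one 45-minute test cannot drop below 45 minutes however many nodes are added.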
Difference between Hub and Node?
The Hub is the central coordinator β it receives test requests and assigns them to Nodes based on requested browser capabilities. Nodes are worker machines with browsers installed. One Hub, many Nodes.
How does RemoteWebDriver differ from ChromeDriver?
ChromeDriver launches a local browser on the same machine. RemoteWebDriver sends commands over HTTP to the Grid Hub, which forwards them to a Node that has the requested browser. Test code looks identical β only instantiation differs.
How do you ensure thread safety when running parallel tests on Grid?
Use ThreadLocal<WebDriver> in DriverFactory so each thread has its own driver instance. Never use a shared static WebDriver field in parallel tests β it causes race conditions and session collisions.
Why CI/CD for Tests?
CI Setup Requirements
Headless Mode
CI servers usually have no display. Run Chrome/Firefox in headless mode: add --headless=new to ChromeOptions in your DriverFactory when a CI environment is detected.
Maven Command
mvn clean test is the standard trigger. Pass browser and environment as -D parameters: mvn clean test -Dbrowser=chrome -Denv=qa
Artifact Publishing
Archive allure-results/ or test-output/ as build artifacts so reports are accessible after the build, even when tests fail.
GitHub Actions Workflow
name: Selenium Tests
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK 17
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
      - name: Run Selenium Tests
        run: mvn clean test -Dbrowser=chrome-headless -Denv=qa
      - name: Upload Allure Results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: allure-results
          path: allure-results/
Jenkinsfile (Declarative Pipeline)
pipeline {
    agent any
    stages {
        stage('Checkout') {
            steps { checkout scm }
        }
        stage('Test') {
            steps {
                sh 'mvn clean test -Dbrowser=chrome-headless'
            }
        }
    }
    post {
        always {
            publishHTML(target: [
                reportDir: 'allure-report',
                reportFiles: 'index.html',
                reportName: 'Allure Report'
            ])
        }
        failure {
            mail to: 'qa-team@company.com',
                 subject: "FAILED: ${env.JOB_NAME}",
                 body: "Build ${env.BUILD_URL} failed."
        }
    }
}
Why do you need headless mode in CI?
CI/CD servers have no graphical display. Headless Chrome/Firefox runs without opening a visible window: the full browser engine (JS, CSS, DOM) operates normally, just without rendering to a screen.
How do you integrate Selenium tests with Jenkins?
Create a Jenkinsfile with pipeline stages. The Test stage runs mvn clean test. Use the HTML Publisher or Allure Jenkins plugin to publish reports. Maven returns exit code 1 on test failure, which automatically marks the build FAILED.
How do you pass different environments to tests in CI?
Use Maven properties: -Denv=qa -Dbrowser=chrome. DriverFactory reads System.getProperty("browser") and ConfigReader reads System.getProperty("env") to load the correct properties file at runtime.
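As a rough sketch of the reading side, the class below resolves both properties with defaults. The class name RunSettings, the default values, and the config-<env>.properties naming convention are assumptions for illustration; the tutorial's own DriverFactory and ConfigReader may differ.

```java
import java.util.Locale;
import java.util.Properties;

final class RunSettings {
    final String browser;
    final String env;

    private RunSettings(String browser, String env) {
        this.browser = browser;
        this.env = env;
    }

    /** Reads -Dbrowser and -Denv as passed on the Maven command line. */
    static RunSettings fromSystemProperties() {
        return from(System.getProperties());
    }

    /** Falls back to chrome / qa (assumed defaults) when a property is absent. */
    static RunSettings from(Properties props) {
        String browser = props.getProperty("browser", "chrome").trim().toLowerCase(Locale.ROOT);
        String env = props.getProperty("env", "qa").trim().toLowerCase(Locale.ROOT);
        return new RunSettings(browser, env);
    }

    /** Hypothetical naming convention: config-qa.properties, config-staging.properties, ... */
    String configFileName() {
        return "config-" + env + ".properties";
    }
}
```

With this in place, mvn clean test -Denv=staging -Dbrowser=firefox would select config-staging.properties and the Firefox branch of the factory, while a plain mvn clean test falls back to the defaults.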
What happens when Selenium tests fail in CI?
Maven exits with code 1, marking the CI build FAILED. The pipeline can be configured to: send Slack/email notifications, block PR merges as a GitHub required status check, and archive screenshots and logs for failure analysis.
Browser Options Classes
| Browser | Options Class | Driver Class |
|---|---|---|
| Chrome | ChromeOptions | ChromeDriver |
| Firefox | FirefoxOptions | FirefoxDriver |
| Edge | EdgeOptions | EdgeDriver |
| Safari | SafariOptions | SafariDriver |
Headless Mode Benefits
Cross-browser DriverFactory
public static WebDriver createDriver(String browser) {
    return switch (browser.toLowerCase()) {
        case "firefox" -> new FirefoxDriver(new FirefoxOptions());
        case "edge" -> new EdgeDriver(new EdgeOptions());
        default -> new ChromeDriver(new ChromeOptions());
    };
}
Headless Chrome & Firefox
// Headless Chrome (Chrome 112+)
ChromeOptions chromeOpts = new ChromeOptions();
chromeOpts.addArguments("--headless=new");
chromeOpts.addArguments("--window-size=1920,1080");
chromeOpts.addArguments("--no-sandbox");            // required in Docker
chromeOpts.addArguments("--disable-dev-shm-usage"); // required in Docker
WebDriver driver = new ChromeDriver(chromeOpts);

// Headless Firefox
FirefoxOptions ffOpts = new FirefoxOptions();
ffOpts.addArguments("-headless");
WebDriver ffDriver = new FirefoxDriver(ffOpts);
testng.xml: Cross-browser Parallel
<suite name="CrossBrowser" parallel="tests" thread-count="3">
  <test name="Chrome">
    <parameter name="browser" value="chrome"/>
    <classes><class name="tests.LoginTest"/></classes>
  </test>
  <test name="Firefox">
    <parameter name="browser" value="firefox"/>
    <classes><class name="tests.LoginTest"/></classes>
  </test>
  <test name="Edge">
    <parameter name="browser" value="edge"/>
    <classes><class name="tests.LoginTest"/></classes>
  </test>
</suite>
How do you implement cross-browser testing in Selenium?
Parameterise the browser name via testng.xml or a Maven -D property. DriverFactory reads this at runtime and instantiates the correct driver. All test logic is identical; only the driver creation differs, and it is managed in one place.
What is headless browser testing?
Headless mode runs a real browser without a visible window. The full browser engine (JavaScript, CSS, DOM) operates normally; only the graphical rendering to a screen is skipped. Tests run faster and work on display-free CI servers.
What Chrome flags are required for headless in Docker?
--headless=new (new headless engine), --no-sandbox (Docker runs as root; Chrome sandbox requires non-root), --disable-dev-shm-usage (Docker's /dev/shm is small, causing Chrome crashes without this), and --window-size to define a viewport.
Can you take screenshots in headless mode?
Yes. TakesScreenshot works identically in headless mode: the browser renders the page in memory and captures it. This is exactly how CI pipelines produce failure screenshots without any display attached.
Difference between --headless and --headless=new in Chrome?
The old --headless (pre-Chrome 112) used a separate rendering pipeline that sometimes behaved differently from the headed browser. --headless=new (Chrome 112+) uses the exact same rendering engine as headed Chrome, making behaviour consistent.
File Upload Approaches
sendKeys on <input type="file">
The simplest and most reliable approach. Use sendKeys() with the absolute file path directly on the file input element: no clicking is needed and no dialog pops up.
Robot Class
Java's Robot class can type into OS-level file dialogs. Use when the page opens a native OS file picker instead of an HTML input element.
AutoIT (Windows only)
A Windows scripting tool that controls native OS dialogs. Compile a .au3 script to .exe and execute it from Java when a native dialog appears.
JavascriptExecutor
Make a hidden file input visible using JS, then use sendKeys normally. Works when the input is styled/hidden but still present in the DOM.
File Download Handling
| Approach | How | Works On |
|---|---|---|
| ChromeOptions download path | Set a default download directory to skip the browser dialog | Chrome / Edge |
| FirefoxProfile | Set browser.download.dir preference | Firefox |
| Verify with File.exists() | Poll the download folder until file appears | All browsers |
Tip: Always pass an absolute path to sendKeys(), for example new File("path").getAbsolutePath().
File Upload: sendKeys
// Locate the native file input element
WebElement fileInput = driver.findElement(By.cssSelector("input[type='file']"));

// Send the absolute file path; no OS dialog opens
String filePath = new File("src/test/resources/testdata/sample.pdf").getAbsolutePath();
fileInput.sendKeys(filePath);

// Verify the filename appeared on the page
WebElement fileName = driver.findElement(By.cssSelector(".file-name"));
Assert.assertTrue(fileName.getText().contains("sample.pdf"));

// Submit the form
driver.findElement(By.id("uploadBtn")).click();
Upload Hidden Input via JS
JavascriptExecutor js = (JavascriptExecutor) driver;
WebElement fileInput = driver.findElement(By.cssSelector("input[type='file']"));

// Make the hidden input visible
js.executeScript("arguments[0].style.display='block';", fileInput);

// Now sendKeys works normally
fileInput.sendKeys(new File("src/test/resources/testdata/image.png").getAbsolutePath());
File Download: ChromeOptions
String downloadPath = System.getProperty("user.home") + "/Downloads/test";
HashMap<String, Object> prefs = new HashMap<>();
prefs.put("download.default_directory", downloadPath);
prefs.put("download.prompt_for_download", false);      // no dialog
prefs.put("plugins.always_open_pdf_externally", true); // download PDFs instead of previewing them
ChromeOptions opts = new ChromeOptions();
opts.setExperimentalOption("prefs", prefs);
WebDriver driver = new ChromeDriver(opts);

// ... navigate to the page and click the link that triggers the download ...

// Verify the file downloaded: poll until it appears (max 10s)
File downloadedFile = new File(downloadPath + "/report.pdf");
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
wait.until(d -> downloadedFile.exists());
Assert.assertTrue(downloadedFile.exists(), "File not downloaded!");
How do you handle file upload in Selenium?
If the page has a native HTML input[type='file'], use sendKeys() with the absolute file path. Selenium sets the path directly on the input, so the OS dialog never appears. This is the simplest and most reliable approach.
Why use getAbsolutePath() instead of a hardcoded path?
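A hardcoded, machine-specific absolute path only works on the machine it was written for. getAbsolutePath() resolves a project-relative path against the JVM's current working directory, which is normally the project root for Maven builds, most IDE runs, and CI, and returns the absolute path that sendKeys() requires. If a run starts from a different working directory, the resolved path changes with it, so keep test data under the project root.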
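The resolution step can be wrapped in a small helper that also fails fast when the test data file is missing. This is a minimal sketch; the class and method names are hypothetical and not part of the tutorial's framework.

```java
import java.io.File;

final class TestDataPaths {
    private TestDataPaths() {}

    /** Resolves a project-relative path against the JVM working directory (user.dir). */
    static String resolve(String projectRelativePath) {
        return new File(projectRelativePath).getAbsolutePath();
    }

    /** Same as resolve(), but reports the resolved path when the file is missing. */
    static String requireExisting(String projectRelativePath) {
        File file = new File(projectRelativePath);
        if (!file.isFile()) {
            throw new IllegalArgumentException(
                "Test data file not found: " + file.getAbsolutePath());
        }
        return file.getAbsolutePath();
    }
}
```

A call such as fileInput.sendKeys(TestDataPaths.requireExisting("src/test/resources/testdata/sample.pdf")) then surfaces a wrong working directory as a clear error message instead of a browser-side upload failure.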
What do you do when the file input is hidden?
Use JavascriptExecutor to make the input visible: js.executeScript("arguments[0].style.display='block';", fileInput). Then sendKeys works normally. Never try to click a hidden element; make it interactable first.
How do you handle file download verification in Selenium?
Configure the browser download path via ChromeOptions prefs so no save dialog appears. After triggering the download, use WebDriverWait with a lambda (d -> file.exists()) to poll until the file appears in the folder, then assert its existence.
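The polling step does not need WebDriverWait. As an alternative to the lambda approach, the sketch below uses only java.nio and adds one check the plain exists() test lacks: the file must be non-empty and its size must be unchanged between two consecutive polls. The class name and the 200 ms poll interval are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

final class DownloadWaiter {
    private static final long POLL_MILLIS = 200; // assumed poll interval

    private DownloadWaiter() {}

    /**
     * Returns true once the file exists, is non-empty, and has the same size
     * on two consecutive polls; returns false when the timeout elapses first.
     * At least two polls are needed, so the timeout should exceed POLL_MILLIS.
     */
    static boolean waitForFile(Path file, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        long previousSize = -1;
        while (true) {
            long size = sizeOrMinusOne(file);
            if (size > 0 && size == previousSize) {
                return true;
            }
            previousSize = size;
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(POLL_MILLIS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }

    private static long sizeOrMinusOne(Path file) {
        try {
            return Files.isRegularFile(file) ? Files.size(file) : -1;
        } catch (IOException e) {
            return -1; // file vanished or is not readable yet
        }
    }
}
```

A test would call Assert.assertTrue(DownloadWaiter.waitForFile(Path.of(downloadPath, "report.pdf"), Duration.ofSeconds(10))) after triggering the download.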
When would you use Robot class for file upload?
When the application opens a native OS file picker dialog (not an HTML input), for example a drag-and-drop upload widget that triggers the OS dialog via JavaScript. Robot can type the file path and press Enter to confirm the OS dialog.
Cookie API Methods
| Method | What it does |
|---|---|
| driver.manage().getCookies() | Returns all cookies for the current domain as a Set |
| driver.manage().getCookieNamed("name") | Returns a single cookie by name |
| driver.manage().addCookie(cookie) | Adds a new cookie to the browser |
| driver.manage().deleteCookieNamed("name") | Deletes one cookie by name |
| driver.manage().deleteCookie(cookie) | Deletes a specific Cookie object |
| driver.manage().deleteAllCookies() | Clears all cookies for the current domain |
Key Use Cases
Skip Login UI
Obtain a valid session cookie via API call, inject it into the browser, then navigate directly to authenticated pages. This saves login time across hundreds of tests.
Test Cookie Consent
Add or delete GDPR consent cookies and verify the banner appears/disappears or that analytics fires correctly based on consent state.
State Reset Between Tests
Call deleteAllCookies() in @AfterMethod or @BeforeMethod to ensure each test starts with a clean session with no leftover auth state.
Read Session Tokens
Read a session or JWT cookie after login and use it in API requests within the same test scenario, combining UI and API testing.
Note: Call driver.get("https://example.com") before any addCookie() call.
Read & Print All Cookies
driver.get("https://example.com");
Set<Cookie> cookies = driver.manage().getCookies();
for (Cookie c : cookies) {
    System.out.println(c.getName() + " = " + c.getValue());
}

// Get a specific cookie (getCookieNamed returns null when it does not exist)
Cookie sessionCookie = driver.manage().getCookieNamed("JSESSIONID");
if (sessionCookie != null) {
    System.out.println("Session: " + sessionCookie.getValue());
}
Add & Delete Cookies
// Must navigate to the domain first
driver.get("https://example.com");

// Add a cookie
Cookie consent = new Cookie.Builder("gdpr_consent", "accepted")
    .domain("example.com")
    .path("/")
    .isSecure(false)
    .build();
driver.manage().addCookie(consent);

// Delete one cookie
driver.manage().deleteCookieNamed("gdpr_consent");

// Delete all cookies (reset session)
driver.manage().deleteAllCookies();
Skip Login by Injecting Session Cookie
// Step 1: Log in via UI once and capture the session cookie
driver.get("https://example.com/login");
driver.findElement(By.id("email")).sendKeys("user@test.com");
driver.findElement(By.id("password")).sendKeys("pass123");
driver.findElement(By.id("loginBtn")).click();
Cookie session = driver.manage().getCookieNamed("session_id");

// Step 2: For subsequent tests, inject the cookie and skip the login UI
driver.get("https://example.com");            // must visit the domain first
driver.manage().addCookie(session);
driver.get("https://example.com/dashboard");  // now logged in
Assert.assertTrue(driver.getCurrentUrl().contains("/dashboard"));
How do you add a cookie in Selenium?
First navigate to a page on the target domain (cookies can only be set for the domain that is currently loaded), then use driver.manage().addCookie(new Cookie("name", "value")). Use Cookie.Builder for more control over domain, path, and expiry attributes.
How do you skip the login UI in tests using cookies?
Log in once via UI to get a valid session cookie. Save that cookie object. In subsequent tests, navigate to the domain, inject the cookie via addCookie(), then navigate directly to the authenticated page, completely bypassing the login form.
Why must you navigate to the domain before adding a cookie?
WebDriver only accepts cookies for the domain of the page that is currently loaded. If you call addCookie() before navigating, or pass a cookie whose domain does not match the current page, the call fails (Selenium reports an InvalidCookieDomainException) instead of setting the cookie.
How do you reset session state between tests?
Call driver.manage().deleteAllCookies() in @AfterMethod or @BeforeMethod. This clears all session, authentication, and preference cookies, ensuring each test starts with a completely clean browser state.
What information does a Selenium Cookie object hold?
A Cookie object stores: name, value, domain, path, expiry date, isSecure (HTTPS-only flag), and isHttpOnly flag. Use cookie.getName(), getValue(), getDomain(), getExpiry() to read these attributes in your tests.
What is Shadow DOM?
Shadow DOM is a browser feature that encapsulates a component's internal HTML, CSS, and JS from the main document tree. Elements inside a shadow root are invisible to standard Selenium findElement calls: regular XPath and CSS selectors cannot pierce shadow boundaries.
Shadow Host
The normal DOM element that contains the shadow root. You locate this using standard findElement; it is visible to Selenium.
Shadow Root
The root of the encapsulated DOM tree. In Selenium 4+, use element.getShadowRoot() to get it. In Selenium 3, use JavascriptExecutor.
Shadow DOM findElement
Once you have the shadow root, call shadowRoot.findElement() using CSS selectors only; XPath does not work inside Shadow DOM.
Nested Shadow Roots
Shadow DOMs can be nested: get the outer shadow root, find the inner shadow host inside it, then get the inner shadow root and find your target element.
Dynamic Element Challenges
| Problem | Cause | Solution |
|---|---|---|
| StaleElementReferenceException | DOM was updated after the element was found, so the reference is stale | Re-locate the element; use retry logic or ExpectedConditions |
| Dynamic IDs | Framework generates different IDs on each page load | Use stable attributes: name, class, data-*, partial text, or parent-child XPath |
| Loading Spinner | Element behind spinner is not clickable yet | Wait for spinner to disappear: invisibilityOfElementLocated |
| AJAX Content | Element not in DOM yet when test runs | Explicit wait with visibilityOfElementLocated |
| Lazy-loaded content | Elements load only after scrolling | Scroll element into view with JS, then wait |
Shadow DOM: Selenium 4 API
// Step 1: Find the shadow host element (visible in main DOM)
WebElement shadowHost = driver.findElement(By.cssSelector("my-custom-element"));

// Step 2: Get the shadow root (Selenium 4+ native API)
SearchContext shadowRoot = shadowHost.getShadowRoot();

// Step 3: Find elements inside shadow DOM using CSS only
WebElement inputInsideShadow = shadowRoot.findElement(By.cssSelector("input[name='search']"));
inputInsideShadow.sendKeys("Selenium");

// Nested shadow roots
WebElement innerHost = shadowRoot.findElement(By.cssSelector("inner-component"));
SearchContext innerRoot = innerHost.getShadowRoot();
innerRoot.findElement(By.cssSelector("button.submit")).click();
Shadow DOM: JavascriptExecutor (Selenium 3)
JavascriptExecutor js = (JavascriptExecutor) driver;

// Locate the shadow host in the main DOM
WebElement shadowHost = driver.findElement(By.cssSelector("my-element"));

// Query inside the shadow root via JS and return the inner element directly.
// Returning the shadowRoot itself and casting it to WebElement is unreliable,
// because newer browsers do not hand it back as a WebElement.
WebElement el = (WebElement) js.executeScript(
    "return arguments[0].shadowRoot.querySelector('input')", shadowHost);
el.sendKeys("test");
StaleElementReferenceException: Retry
public void clickWithRetry(By locator, int maxAttempts) {
    for (int i = 0; i < maxAttempts; i++) {
        try {
            driver.findElement(locator).click();
            return; // success: exit
        } catch (StaleElementReferenceException e) {
            if (i == maxAttempts - 1) throw e; // rethrow on last attempt
        }
    }
}
Wait for Spinner to Disappear
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(15));

// Wait for the spinner to disappear
wait.until(ExpectedConditions.invisibilityOfElementLocated(By.cssSelector(".loading-spinner")));

// Now the content is ready, so interact safely
wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("results"))).click();

// Handle dynamic IDs: match on a stable fragment or a data attribute
By byPartialId = By.xpath("//button[contains(@id,'submit')]");
By byDataAttribute = By.cssSelector("button[data-action='submit']");
What is Shadow DOM and why does Selenium struggle with it?
Shadow DOM is a browser feature that encapsulates a component's internal markup inside a shadow root, isolated from the main document. Standard Selenium findElement calls can't pierce the shadow boundary: they only search the main DOM tree, making shadow elements invisible to regular locators.
How do you access Shadow DOM elements in Selenium 4?
Find the shadow host with a normal findElement, then call shadowHost.getShadowRoot() to get a SearchContext. Use that to call findElement with CSS selectors. Note: XPath is NOT supported inside Shadow DOM; use CSS only.
What is StaleElementReferenceException?
It is thrown when a WebElement reference is no longer valid: the DOM has been modified (refreshed, updated via AJAX, re-rendered) after the element was located. The fix is to re-locate the element fresh, or use a retry loop around the interaction.
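The retry loop can be factored into a reusable helper. The sketch below is deliberately generic so it runs without Selenium; the class name Retry and its signature are hypothetical. In a real suite the exception type passed in would be StaleElementReferenceException.class, and the action would re-locate the element on every attempt.

```java
import java.util.function.Supplier;

final class Retry {
    private Retry() {}

    /**
     * Runs the action up to maxAttempts times, retrying only when it throws
     * the given exception type. Any other exception propagates immediately.
     */
    static <T> T onException(Class<? extends RuntimeException> retryOn,
                             int maxAttempts,
                             Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // unrelated failure: do not hide it behind retries
                }
                last = e;
            }
        }
        throw last; // every attempt failed with the retryable exception
    }
}
```

The important detail is that the lookup happens inside the action, for example () -> driver.findElement(locator).getText(), so each attempt works with a fresh element reference rather than the stale one.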
How do you handle elements with dynamic IDs?
Use stable attributes that don't change: name, type, class (partial), custom data-* attributes, visible text (contains(text(),'...')), or a parent-child XPath that navigates from a stable ancestor. Never build locators on auto-generated numeric IDs.
How do you handle a loading spinner that blocks interaction?
Use WebDriverWait with ExpectedConditions.invisibilityOfElementLocated(spinnerLocator) to wait for the spinner to disappear before interacting with the page content. This is far more reliable than Thread.sleep() because it returns as soon as the spinner is gone.
Best Practices for Selenium
Common Issues & Solutions
StaleElementReferenceException
Cause: DOM refreshed after element reference captured. Solution: Re-locate element after DOM changes or use wait strategies to ensure stable element reference.
NoSuchElementException
Cause: Element not found or not yet loaded. Solution: Implement WebDriverWait with ExpectedConditions.presenceOfElementLocated() before interaction.
Tests fail due to timing
Cause: The test interacts with elements before the page has fully loaded. Solution: Use explicit waits for visibility, clickability, or specific text presence before actions.
Cross-browser compatibility issues
Cause: Different browser behaviors. Solution: Test on Chrome, Firefox, Edge using Selenium Grid or parallel execution in CI/CD.
Frequently Asked Questions
What's the difference between implicit and explicit waits?
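Implicit: a global setting that applies to every element search. Explicit: a targeted wait for a specific condition on a specific element, using WebDriverWait. Explicit waits are recommended for better control. Avoid mixing the two, since combining them can produce unpredictable wait times.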
How do I run Selenium tests in parallel?
Use TestNG with parallel="tests" or parallel="methods" in testng.xml. Configure driver instances per thread using ThreadLocal pattern.
Can I run Selenium tests on mobile browsers?
Use Appium (built on Selenium WebDriver) for mobile automation. For desktop browsers on different OS, use Selenium Grid.
How do I handle Shadow DOM in Selenium?
Shadow DOM elements aren't reachable with regular locators. In Selenium 4+, locate the shadow host, call getShadowRoot(), and search inside it with CSS selectors. In Selenium 3, use JavascriptExecutor to reach into the shadow root.
What's the best way to handle dynamic elements?
Use relative XPath, CSS selectors with data attributes, or wait for element presence/visibility before interacting.
How do I capture screenshots on test failure?
Implement a TestListener with onTestFailure() method to capture screenshots using TakesScreenshot interface.