Reasons to avoid RandomStringUtils for test data generation


Published by Elias on October 10, 2021

Which problem do we want to solve?

We will tackle one of the worst practices in tests at any level: fixed, hard-coded data.

I want to avoid, as much as possible, any manual setup before I can run my tests and, because of that, I also try to avoid static data files (CSV, TXT, XLS, JSON).

Here we will look at a common practice among Java developers, RandomStringUtils, and see why it might not be the best choice for automatic data generation.

By the way, I recommend generating test data automatically with the Test Data Factory approach; you can find an example on my blog: Test Data Factory: Why and How to Use.

The examples here are deliberately simple and do not use a Test Data Factory; they will show you why RandomStringUtils might not be the best approach.

Example

We will automatically generate data for a Customer object with the following criteria:

Attribute	Type	Constraints
id	int	Not null
name	String	Not null, length between 2 and 50 characters
profession	String	Not null, length between 2 and 30 characters
accountNumber	String	Not null, exactly 18 characters
address	String	Not null, length between 2 and 50 characters
phoneNumber	String	Not null, length between 11 and 14 characters
birthday	Date	Not null

Customer object

To keep the number of tests small, the focus here is on generating valid data that satisfies the constraints. In a professional environment, we would implement tests for the edge cases as well.

Think of this Customer data as an object used at any test level (unit, integration, service, UI).
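To make the table concrete, the constraints can be written as a plain Java check. This is a minimal sketch under a few assumptions: the Customer record is a hypothetical stand-in for the article's CustomerData class, and id is modelled as Integer so that "Not null" is meaningful (a primitive int can never be null).

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public final class CustomerConstraints {

    // Hypothetical stand-in for CustomerData (the real class uses a builder)
    public record Customer(Integer id, String name, String profession, String accountNumber,
                           String address, String phoneNumber, Date birthday) {}

    private CustomerConstraints() {}

    // Returns the violated constraints; an empty list means the customer is valid
    public static List<String> violations(Customer c) {
        List<String> errors = new ArrayList<>();
        if (c.id() == null) {
            errors.add("id must not be null");
        }
        checkLength(errors, "name", c.name(), 2, 50);
        checkLength(errors, "profession", c.profession(), 2, 30);
        checkLength(errors, "accountNumber", c.accountNumber(), 18, 18);
        checkLength(errors, "address", c.address(), 2, 50);
        checkLength(errors, "phoneNumber", c.phoneNumber(), 11, 14);
        if (c.birthday() == null) {
            errors.add("birthday must not be null");
        }
        return errors;
    }

    // Adds an error when the value is null or its length is outside [min, max]
    private static void checkLength(List<String> errors, String field, String value, int min, int max) {
        if (value == null) {
            errors.add(field + " must not be null");
        } else if (value.length() < min || value.length() > max) {
            errors.add(field + " length must be between " + min + " and " + max);
        }
    }
}
```

Note that a check like this only looks at presence and length, so it cannot tell a real phone number from 14 random characters.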

What does the RandomStringUtils class do?

RandomStringUtils is a class from the Apache Commons Lang library that generates random Strings based on different conditions like:

length
letters
numbers
alphanumeric
ASCII
numeric
print

All of its methods are static, so you can generate any String directly without creating an instance, which makes it super handy!

See the example below, which generates two different kinds of random data.

import org.apache.commons.lang3.RandomStringUtils;

public class RandomStringUtilsExample {

    public static void main(String[] args) {
        // returns a String with 5 digits
        // example: 82114
        String numeric = RandomStringUtils.randomNumeric(5);

        // returns an alphanumeric String of length 30, mixing upper and lower case
        // example: gQ6RB8MiwKOg9O3qnHFo7I3jilHoIy
        String alphanumeric = RandomStringUtils.randomAlphanumeric(30);

        System.out.println(numeric);
        System.out.println(alphanumeric);
    }
}
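The idea behind calls like the ones above is simple: pick count characters at random from an allowed alphabet. The sketch below reproduces that idea with the JDK only. SimpleRandomStrings is a hypothetical name, and this is not the Apache Commons Lang source, which offers more options (for example, a custom character set or letters/numbers flags).

```java
import java.security.SecureRandom;
import java.util.Random;

// Hypothetical, simplified stand-in: NOT the Apache Commons Lang implementation
public final class SimpleRandomStrings {

    private static final String DIGITS = "0123456789";
    private static final String ALPHANUMERIC =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" + DIGITS;
    private static final Random RANDOM = new SecureRandom();

    private SimpleRandomStrings() {}

    // Builds a String of the given length, picking each character uniformly from the alphabet
    static String random(int count, String alphabet) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0");
        }
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(alphabet.charAt(RANDOM.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    public static String randomNumeric(int count) {
        return random(count, DIGITS);
    }

    public static String randomAlphanumeric(int count) {
        return random(count, ALPHANUMERIC);
    }

    public static void main(String[] args) {
        System.out.println(randomNumeric(5));
        System.out.println(randomAlphanumeric(30));
    }
}
```

SecureRandom is an arbitrary choice here; any java.util.Random works for test data, as long as you remember that the output carries no meaning beyond its length and character set.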

What is the result of using the RandomStringUtils class?

Let’s first look at a code example that uses RandomStringUtils:

the id attribute uses RandomStringUtils.randomNumeric() to generate a numeric String and, to make that possible, converts it into an int using Integer.valueOf() (nine digits, because a 10-digit value can exceed Integer.MAX_VALUE and make Integer.valueOf() throw a NumberFormatException)
the name, profession, accountNumber, address, and phoneNumber attributes use RandomStringUtils.randomAlphanumeric() to generate alphanumeric data
the birthday attribute has a fixed date, now (today), because RandomStringUtils generates only Strings

import org.apache.commons.lang3.RandomStringUtils;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

import java.util.Date;

class BasicExampleTest {

    @Test
    @DisplayName("Data validations using RandomStringUtils")
    void randomStringUtils() {
        CustomerData customerData = CustomerData.builder()
                // 9 digits always fit in an int; 10 digits can overflow Integer.MAX_VALUE
                .id(Integer.valueOf(RandomStringUtils.randomNumeric(9)))
                .name(RandomStringUtils.randomAlphanumeric(50))
                .profession(RandomStringUtils.randomAlphanumeric(30))
                .accountNumber(RandomStringUtils.randomAlphanumeric(18))
                .address(RandomStringUtils.randomAlphanumeric(50))
                .phoneNumber(RandomStringUtils.randomAlphanumeric(14))
                // RandomStringUtils only generates Strings, so the date is fixed to "now"
                .birthday(new Date())
                .build();
    }
}

The output of the test execution, if we print or inspect the customerData object, is:

{
  "id": 1335130963,
  "name": "GGXS19kN6kSuzHwW6T0YjJCxUaIyKKmAaUdQH51gdUAtt1TwqY",
  "profession": "0kk8HSiFgCUVfLzbD3PyR6cn8j0LH3",
  "accountNumber": "PqvekXb9ekRAJi3ypy",
  "address": "90lqP2LHnQMWtmMP8vasO3BR5dsICIL85u5sJ0yjGKWXxCkFsj",
  "phoneNumber": "OpoJ3tOE53woy9",
  "birthday": "Sep 26, 2021, 10:01:10 PM"
}

We could successfully generate the necessary data! Yay!
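One detail is worth checking before celebrating: converting a random numeric String to an int. A 10-digit numeric String can be anything up to 9999999999, while Integer.MAX_VALUE is 2147483647, so Integer.valueOf() throws a NumberFormatException whenever the value is above that limit; an id like 1335130963 simply happens to fit. The small helper below (fitsInInt is a hypothetical name) shows the behavior with the JDK alone.

```java
public class NumericIdOverflow {

    // Mirrors the id conversion in the test: Integer.valueOf on a numeric String
    static boolean fitsInInt(String digits) {
        try {
            Integer.valueOf(digits);
            return true;
        } catch (NumberFormatException e) {
            // thrown when the String is not a valid int, e.g. larger than Integer.MAX_VALUE
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsInInt("1335130963")); // true: below Integer.MAX_VALUE
        System.out.println(fitsInInt("9999999999")); // false: too large for an int
        System.out.println(fitsInInt("999999999"));  // true: any 9-digit value fits
    }
}
```

Limiting the numeric String to nine digits, or generating the int directly instead of parsing a String, avoids a test that fails only some of the time.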

What does DataFaker do?

DataFaker is an open-source library for generating fake data, based on (actually an improvement of) Java Faker.

I invite you to take a look at its GitHub repository to see the different providers available for generating data.

What is the result of using DataFaker?

The code that generates the CustomerData object with DataFaker is:

the number() method generates a random number for the id
the name() method generates a full name
the company() method generates a profession
the finance() method generates a valid IBAN for the Netherlands
the address() method generates a full street address
the phoneNumber() method generates a cell phone number
the date() method generates a birthday for an age between 18 and 90

import net.datafaker.Faker;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

class BasicExampleTest {

    @Test
    @DisplayName("Data validations using faker library")
    void faker() {
        Faker faker = new Faker();

        CustomerData customerData = CustomerData.builder()
                // randomNumber() returns a long, so it is cast to int
                .id((int) faker.number().randomNumber())
                .name(faker.name().name())
                .profession(faker.company().profession())
                // a Dutch IBAN has exactly 18 characters, matching the accountNumber constraint
                .accountNumber(faker.finance().iban("NL"))
                .address(faker.address().streetAddress())
                .phoneNumber(faker.phoneNumber().cellPhone())
                // random birthday for a person aged between 18 and 90
                .birthday(faker.date().birthday(18, 90))
                .build();
    }
}

The output of the test execution, if we print or inspect the customerData object, is:

{
  "id": 520543,
  "name": "Tena Pagac",
  "profession": "photographer",
  "accountNumber": "NL07HUUN1518167413",
  "address": "12672 Romaguera Tunnel",
  "phoneNumber": "(561) 638-5813",
  "birthday": "Mar 5, 1982, 10:29:18 AM"
}
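The accountNumber above is more than 18 random characters: an IBAN carries two check digits that can be verified with the ISO 7064 MOD 97-10 rule (move the first four characters to the end, replace each letter with a number from A = 10 to Z = 35, and check that the result modulo 97 equals 1). The following JDK-only sketch, with a hypothetical IbanCheck class, applies that rule; it checks only the general shape and the check digits, not country-specific lengths.

```java
public final class IbanCheck {

    private IbanCheck() {}

    // ISO 7064 MOD 97-10 check used by IBANs (general shape + check digits only)
    public static boolean isValidIban(String iban) {
        // two uppercase letters (country), two check digits, then up to 30 letters/digits
        if (iban == null || !iban.matches("[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}")) {
            return false;
        }
        // move the country code and check digits to the end
        String rearranged = iban.substring(4) + iban.substring(0, 4);
        int remainder = 0;
        for (char c : rearranged.toCharArray()) {
            // Character.digit(c, 36) maps '0'..'9' to 0..9 and 'A'..'Z' to 10..35
            int value = Character.digit(c, 36);
            // a digit appends one decimal digit, a letter appends two (10..35)
            remainder = value < 10
                    ? (remainder * 10 + value) % 97
                    : (remainder * 100 + value) % 97;
        }
        return remainder == 1;
    }

    public static void main(String[] args) {
        System.out.println(isValidIban("NL07HUUN1518167413")); // true
        System.out.println(isValidIban("PqvekXb9ekRAJi3ypy")); // false
    }
}
```

Applied to the two runs, the DataFaker value NL07HUUN1518167413 passes, while the RandomStringUtils value PqvekXb9ekRAJi3ypy is rejected before the checksum is even computed.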

We could successfully generate the necessary data! But now let’s focus on the differences.

Comparing both approaches

There are two aspects I would like to consider when choosing between the two approaches:

legibility of future troubleshooting (log analysis)
easy data creation with different criteria

We can see the main differences by comparing the two outputs above side by side.

Legibility of future troubleshooting (analysis)

Troubleshooting is a routine activity for any engineer who writes code: we constantly read logs and debug the application to understand current and future problems in the code.

Now imagine looking at a CustomerData object filled in with the RandomStringUtils approach: it’s hard to correlate that data with a list of objects you might get back, or even to spot it inside a log file. A name like "Tena Pagac" is easy to search for and recognize; "GGXS19kN6kSuzHwW6T0YjJCxUaIyKKmAaUdQH51gdUAtt1TwqY" is not.

Easy data creation with different criteria

For most of the attributes in the CustomerData class, you can use RandomStringUtils to generate data for the different criteria. For example, you can easily assign 51 characters to the name attribute with RandomStringUtils.randomAlphanumeric(51) and expect the constraint validation to fail.
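The same boundary idea applies to every length constraint in the table: test just below, at, and just above each limit. Here is a small JDK-only sketch for the name rule. isValidName is a hypothetical check (in a real project this could be a Bean Validation annotation such as @Size), and since only the length matters, String.repeat (Java 11+) stands in for RandomStringUtils.randomAlphanumeric(n).

```java
public class NameBoundaryExample {

    // Hypothetical check mirroring the table: name is not null and has 2 to 50 characters
    static boolean isValidName(String name) {
        return name != null && name.length() >= 2 && name.length() <= 50;
    }

    public static void main(String[] args) {
        // just below, at, and just above each limit of the 2..50 rule
        int[] lengths = {1, 2, 50, 51};
        for (int length : lengths) {
            // only the length matters here; "a".repeat(n) stands in for randomAlphanumeric(n)
            String name = "a".repeat(length);
            System.out.println(length + " -> " + (isValidName(name) ? "valid" : "invalid"));
        }
    }
}
```

Running it prints "1 -> invalid", "2 -> valid", "50 -> valid", and "51 -> invalid", one per line.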

For more specialized data, like phone numbers and dates, you need a proper library, and DataFaker can generate both.

Adopting a single library therefore makes the process easier.
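Hand-rolling just the birthday rule gives an idea of what a dedicated library saves you. The sketch below uses java.time only; randomBirthday is a hypothetical helper, not DataFaker's implementation, and it picks a uniformly random day between the date that makes someone exactly 90 years old and the date that makes them exactly 18.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.concurrent.ThreadLocalRandom;

public class BirthdaySketch {

    // Hypothetical helper: a random birthday for someone aged minAge..maxAge (inclusive) on 'today'
    static LocalDate randomBirthday(LocalDate today, int minAge, int maxAge) {
        LocalDate earliest = today.minusYears(maxAge); // oldest allowed person
        LocalDate latest = today.minusYears(minAge);   // youngest allowed person
        // nextLong's upper bound is exclusive, so add 1 to include 'latest'
        long epochDay = ThreadLocalRandom.current()
                .nextLong(earliest.toEpochDay(), latest.toEpochDay() + 1);
        return LocalDate.ofEpochDay(epochDay);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        LocalDate birthday = randomBirthday(today, 18, 90);
        int age = Period.between(birthday, today).getYears();
        System.out.println(birthday + " (age " + age + ")");
    }
}
```

Passing today as a parameter keeps the helper testable with a fixed date; the same kind of code would still be needed for phone numbers, addresses, and IBANs.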

Considerations

Of course, I’d put more emphasis on the DataFaker library because it provides almost everything we need to generate data, but that does not rule out an occasional need for the RandomStringUtils class or any other class in the Apache Commons libraries.

The main consideration here is the ability to generate all the data you need from a single source of truth without reinventing the wheel, as well as the indirect benefits it brings during troubleshooting.

Examples

The avoid-random-string-utils project shows a basic example comparing RandomStringUtils and DataFaker.

The restassured-complete-basic-example project has a data factory class that generates all the necessary data for different conditions. It’s a good real-world example.

Elias

Java Champion, Senior Principal Software Engineer, Oracle ACE for Java, Java Magazine NL Editor, Browserstack Champion.
