Spring Batch Example - Read and Process CSV File to MySQL Database
Last Updated: 28 Apr, 2025
GeekMart is a multinational technology company that focuses on the e-commerce business. It has a large number of suppliers who supply huge volumes of products to it. Suppliers share the product details in CSV file format. Help GeekMart read the product details from the CSV file and store them in the “Product” table of a MySQL database.
- Requirement 1: Read the records from the Products.csv file and store the data in the Products table
- Requirement 2: Do not store the products in the database if the quantity of the product is zero.
- Requirement 3: Before storing the product details in the database, increase the price of the product. Increase the price by 2%.
- Requirement 4: Skip all the products whose format is not as expected. A maximum of 20 records can be skipped by the application. If more than 20 products are not in the proper format, the job should fail.
- Requirement 5: If any records are skipped from writing into the database, display all such product details on the console at the end of the job.
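The per-record rules above (reject malformed lines, drop zero-quantity products, raise the price by 2%) can be sketched in plain Java, independent of Spring Batch. The `ProductRules` class, its `process` method, and the `double[]` return layout are illustrative assumptions for this sketch, not part of GeekMart's actual code:

```java
import java.util.Locale;

// Illustrative sketch of the per-record rules from the requirements.
public class ProductRules {
    static final double PRICE_INCREASE = 1.02; // Requirement 3: +2%

    /**
     * Parses one CSV line ("productId,name,category,quantity,price").
     * Returns {quantity, adjustedPrice}, or null when the product should
     * be filtered out (quantity zero); throws IllegalArgumentException
     * when the record is not in the expected format (Requirement 4).
     */
    public static double[] process(String line) {
        String[] fields = line.split(",");
        if (fields.length != 5) {
            throw new IllegalArgumentException("Bad record: " + line);
        }
        int quantity;
        double price;
        try {
            quantity = Integer.parseInt(fields[3].trim());
            price = Double.parseDouble(fields[4].trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Bad record: " + line);
        }
        if (quantity == 0) {
            return null; // Requirement 2: do not store zero-quantity products
        }
        return new double[] { quantity, price * PRICE_INCREASE };
    }

    public static void main(String[] args) {
        double[] r = process("P1,Samsung Galaxy,Mobile,10,19999");
        System.out.printf(Locale.US, "qty=%.0f price=%.2f%n", r[0], r[1]);
    }
}
```

Running this against the first sample row prints the quantity unchanged and the price increased by 2% (19999 becomes 20398.98).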
Example
Input:
| ProductId | Name | Category | Quantity | Price |
|---|---|---|---|---|
| P1 | Samsung Galaxy | Mobile | 10 | 19999 |
| P2 | Samsung Galaxy Note 8 | Mobile | 8 | 38999 |
| P3 | MacBook Pro | Laptop | 5 | 209999 |
| P4 | MacBook Pro Air | Laptop | 4 | 109999 |
| P5 | Sony Bravia | TV | 25 | 63690 |
| P6 | LG Ultra HD | TV | 20 | 37990 |

(Sample Products.csv file)
Output: Spring Batch should create a job that reads data from the Products.csv file, processes and filters the records as per the requirements, and finally inserts them into the Product table in MySQL.
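The Product table that the writer inserts into can be created with a schema along these lines. The column names mirror the CSV headers and the INSERT statement; the exact types and sizes are illustrative assumptions, not part of the original article:

```sql
CREATE TABLE Product (
    productId VARCHAR(10)  PRIMARY KEY,  -- e.g. "P1"
    name      VARCHAR(100) NOT NULL,
    category  VARCHAR(50),
    quantity  INT,
    price     DOUBLE
);
```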
Approach: It follows a batch processing approach using the Spring Batch framework to read the data from the Products.csv file, apply some transformations to the data, and then store the data in the Products table of a MySQL database.
Here is a high-level overview of the approach used in the code:
- Configure Job and Step: Spring Batch provides a configuration mechanism to define jobs and steps. In this code, the ProductJob is defined as the job that reads the data from the CSV file, processes it, and stores it in the database. The ProductStep is defined as a step within the job that defines the reader, processor, and writer for the batch job.
- Define Reader: Spring Batch provides multiple readers for different file formats. In this code, FlatFileItemReader is used to read the data from the Products.csv file. The reader reads the data line-by-line and maps each line to a Product object.
- Define Processor: The processor is responsible for processing the data before writing it into the database. In this code, the processor applies two transformations to the data: it increases the price of each product by 2%, and it skips the products whose quantity is zero.
- Define Writer: Spring Batch provides multiple writers for different types of databases. In this code, JdbcBatchItemWriter is used to write the data into the MySQL database. The writer takes a list of Product objects and writes them to the database using JDBC.
- Define Skip Policy: In this code, a custom skip policy is defined to skip the products whose format is not as expected. If more than 20 products are not in the proper format, the job fails.
- Define Listener: Spring Batch provides multiple listeners for different events in a batch job. In this code, a custom listener is defined to display the skipped product details on the console.
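The skip-limit behaviour described above can be sketched as a small stand-alone class. This is a simplified stand-in for what Spring Batch's built-in fault-tolerance does for you when you configure `skipLimit(20)`; the class name `LimitedSkipPolicy` and the use of `IllegalArgumentException` as the "bad format" signal are assumptions of this sketch:

```java
// Simplified stand-in for Spring Batch's skip-limit logic: allow up to
// `skipLimit` format failures, then signal that the job should fail.
public class LimitedSkipPolicy {
    private final int skipLimit;
    private int skippedSoFar = 0;

    public LimitedSkipPolicy(int skipLimit) {
        this.skipLimit = skipLimit;
    }

    /** Returns true if this failure may be skipped, false if the job should fail. */
    public boolean shouldSkip(Throwable t) {
        // Only format errors are skippable; any other failure fails the job.
        if (!(t instanceof IllegalArgumentException)) {
            return false;
        }
        skippedSoFar++;
        return skippedSoFar <= skipLimit;
    }
}
```

With a limit of 20, the 21st malformed record makes `shouldSkip` return false, which is exactly the point at which the real job would fail.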
Overall, this approach is a standard and effective way of processing large volumes of data in batches, where data can be read, processed, and written in chunks, making it suitable for handling large amounts of data with limited memory resources.
To fulfill the above requirements, we can use Spring Batch to read the CSV file, process the records and write them into the MySQL database. Here is a sample Spring Batch job configuration for the same:
Java
// Configuration file
import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
@EnableBatchProcessing
public class ProductBatchConfiguration {
private final JobBuilderFactory jobBuilderFactory;
private final StepBuilderFactory stepBuilderFactory;
private final DataSource dataSource;
public ProductBatchConfiguration(
JobBuilderFactory jobBuilderFactory,
StepBuilderFactory stepBuilderFactory,
DataSource dataSource)
{
this.jobBuilderFactory = jobBuilderFactory;
this.stepBuilderFactory = stepBuilderFactory;
this.dataSource = dataSource;
}
@Bean
public Job importProductJob(
JobCompletionNotificationListener listener)
{
return jobBuilderFactory.get("importProductJob")
.incrementer(new RunIdIncrementer())
.listener(listener)
.flow(productStep())
.end()
.build();
}
@Bean
public Step productStep()
{
return stepBuilderFactory.get("productStep")
.<Product, Product>chunk(10)
.reader(productItemReader())
.processor(productItemProcessor())
.writer(productItemWriter())
.faultTolerant()
.skipLimit(20)
.skip(CsvFormatException.class)
.listener(new ProductSkipListener())
.build();
}
@Bean
public FlatFileItemReader<Product> productItemReader()
{
return new FlatFileItemReaderBuilder<Product>()
.name("productItemReader")
.resource(new ClassPathResource("Products.csv"))
.delimited()
.delimiter(",")
.names(new String[] { "productId", "name",
"category", "quantity",
"price" })
.linesToSkip(1)
.fieldSetMapper(
new BeanWrapperFieldSetMapper<Product>() {
{
setTargetType(Product.class);
}
})
.build();
}
@Bean
public ItemProcessor<Product, Product>
productItemProcessor()
{
return new ItemProcessor<Product, Product>() {
@Override
public Product process(Product product)
throws Exception
{
if (product.getQuantity() == 0) {
return null; // do not store if quantity is zero
}
// increase price by 2%
product.setPrice(product.getPrice() * 1.02);
return product;
}
};
}
@Bean
public JdbcBatchItemWriter<Product> productItemWriter()
{
return new JdbcBatchItemWriterBuilder<Product>()
.itemSqlParameterSourceProvider(
new BeanPropertyItemSqlParameterSourceProvider<>())
.sql("INSERT INTO Product (productId, name, "
    + "category, quantity, price) VALUES "
    + "(:productId, :name, :category, "
    + ":quantity, :price)")
.dataSource(dataSource)
.build();
}
}
// this code is contributed by prophet1999
Here, we have defined a Spring Batch job named importProductJob with a single step named productStep. The job reads the records from the Products.csv file using a FlatFileItemReader, processes each record using an ItemProcessor, and writes the valid records into the MySQL database using a JdbcBatchItemWriter.
In the productStep, we have also defined fault-tolerant behavior by setting the skipLimit to 20 and handling CsvFormatException using the skip method. Additionally, we have added a ProductSkipListener to capture all the skipped records.
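The bookkeeping that the ProductSkipListener performs can be sketched without Spring. Here `SkippedRecordLog` is an illustrative stand-in: in the real job, Spring Batch invokes the listener's `onSkipInRead`/`onSkipInWrite` callbacks for each skipped item, and the report would run from an `afterJob` callback:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for ProductSkipListener: collect skipped
// records during the run and print them once the job completes.
public class SkippedRecordLog {
    private final List<String> skipped = new ArrayList<>();

    // In the real job, Spring Batch calls this from its skip callbacks.
    public void onSkip(String record) {
        skipped.add(record);
    }

    // Called at the end of the job to satisfy Requirement 5.
    public void report() {
        System.out.println("Skipped " + skipped.size() + " record(s):");
        for (String r : skipped) {
            System.out.println("  " + r);
        }
    }

    public List<String> getSkipped() {
        return skipped;
    }
}
```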
We also have a Product model class defined with getters and setters for each attribute in the CSV file. Finally, we have a JobCompletionNotificationListener to print the job execution status at the end of the job. You can run this Spring Batch job using a JobLauncher instance as follows:
Java
@Autowired
JobLauncher jobLauncher;

// e.g. jobLauncher.run(importProductJob, new JobParameters());
Complexity Analysis
- Reading from CSV file: The time complexity of reading from a CSV file using the FlatFileItemReader is O(n), where n is the number of lines in the file. Because the reader streams the file line-by-line and hands items off in chunks, its space complexity is O(1) with respect to the file size (O(chunk size) in practice), not O(n).
- Processing: The time complexity of the processing step is O(n), where n is the number of products. The processing step applies two transformations to the data: it increases the price of each product by 2%, which is a constant-time operation, and it skips the products whose quantity is zero, which is also a constant-time operation.
- Writing to the database: The time complexity of writing to the database using the JdbcBatchItemWriter is O(n), where n is the number of products. The writer buffers only one chunk of products (10 in this configuration) before flushing it to the database, so its space complexity is O(chunk size), not O(n).
- Skip policy: The time complexity of the skipping policy is O(1), as it checks if the number of skipped records exceeds 20, which is a constant-time operation.
- Listener: The time complexity of the listener is O(k), where k is the number of skipped products. The space complexity of the listener is also O(k), as it stores the skipped products in memory before displaying them on the console.
- Overall, the time complexity of the job is O(n), where n is the number of products.
- The overall space complexity is O(c + k), where c is the chunk size and k is the number of skipped products: only one chunk is held in memory at a time, while the listener accumulates the skipped records until the end of the job.