Spring Batch and MongoDB — Reading and Writing from Excel file

In this post, we’ll explain how to read and write an Excel file with Spring Batch using a custom reader and writer.

We are going to create a simple Spring boot application that demonstrates how to use Spring batch with mongoDB.

Prerequisites

  • Spring Boot 2.4
  • Maven 3.6.1.
  • JAVA 8.
  • Mongo 4.4.
  • Apache poi 4.1.2

Spring Batch Overview

Spring batch is a Spring Framework library that processes large amounts of data without human intervention. Spring Batch has support for CSV, XML files, but natively doesn’t support XLSX files. It is not designed for use as a scheduling framework. However, it can be used in combination with a scheduling framework such as Quartz, Control-M, etc.

You can achieve storing data to XLSX with support of Apache POI SXSSF API.

Getting Started

As an example, we will create a Spring Batch job that can import employee information from an Excel file. This file contains information like firstname, lastname, number, email address and department. When we read the employee information from an Excel file, we have insert in mongodb when employee not exist.

Employee file excel sample

Create Spring boot project and add the following dependencies in pom.xml

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>

Project Structure

Here is our project structure:

To get started, we need a model class. For this tutorial, we have an Employee model class.

@Data
@AllArgsConstructor
@NoArgsConstructor
@Accessors(chain = true)
@Document(collection = "employee")
public class Employee  {

    @Id
    private String id;

    private String firstName;

    private String lastName;

    @NotNull
    @Indexed(unique = true, direction = IndexDirection.DESCENDING)
    private String number;

    private String email;

    private String department;

    private double salary;
}

The @Document annotation marks a class as being a domain object that we want to persist in the database.

@Repository
public interface EmployeeRepository extends MongoRepository<Employee, String> {
}

Apache POI is a Java library that allows reading and writing Microsoft Documents. It supports Excel, Word, PowerPoint, and even Visio. We have created an abstract class which is responsible for reading and writing to an Excel file

Create a BatchConfiguration.java

Batch configuration class annotated it with @Configuration and @EnableBatchProcessing. The @EnableBatchProcessing enables Spring Batch features and provide a base configuration for setting up batch jobs in an @Configuration class.

/**
 * Configuration for batch
 */
@EnableBatchProcessing
@Configuration
public class BatchConfiguration {

    public final JobBuilderFactory jobBuilderFactory;

    public final StepBuilderFactory stepBuilderFactory;

    public BatchConfiguration(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
    }

    @Bean
    public JobParametersValidator jobParametersValidator() {
        return new EmployeeJobParametersValidator();
    }

    @Bean
    public JobParametersValidator compositeJobParametersValidator() {
        CompositeJobParametersValidator bean = new CompositeJobParametersValidator();
        bean.setValidators(Collections.singletonList(jobParametersValidator()));
        return bean;
    }

    @Bean
    public ItemProcessor<Employee, Employee> itemProcessor() {
        return new EmployeeItemProcessor();
    }

    @Bean
    public ItemReader<Employee> itemReader() {
        return new EmployeeItemReader();
    }

    @Bean
    public MongoItemWriter<Employee> writer(MongoTemplate mongoTemplate) {
        return new MongoItemWriterBuilder<Employee>().template(mongoTemplate).collection("employee")
                .build();
    }


    /**
     * Declaration step
     * @return {@link Step}
     */
    @Bean
    public Step employeeStep(MongoItemWriter<Employee> itemWriter) {
        return stepBuilderFactory.get("employeeStep")
                .<Employee, Employee>chunk(50)
                .reader(itemReader())
                .processor(itemProcessor())
                .writer(itemWriter)
                .build();
    }



    /**
     * Declaration job
     * @param listener {@link JobCompletionListener}
     * @return {@link Job}
     */
    @Bean
    public Job employeeJob(JobCompletionListener listener, Step employeeStep) {
        return jobBuilderFactory.get("employeeJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .flow(employeeStep)
                .end()
                .validator(compositeJobParametersValidator())
                .build();
    }

}

EmployeeItemReader is the custom class that allows us to read the contents of the Excel file. It extends AbstractExcelPoi and implements ItemReader and StepExecutionListener interfaces. This class populates the fields of the created Employee object by using the EmployeeItemRowMapper class.

EmployeeItemProcessor.java is the implementation class that should implement the ItemProcessor<I, O> interface of the Spring Batch framework.
In our case, it is responsible for checking if the employee number already exists in the database before inserting. At the end of the job, it inserts all the data that could not be saved in an Excel result file to inform the user.

Once the data is processed, the writer bean saves the data to mongoDB.

Our employee Job is defined as follows:

Note that we have added a compositeJobParametersValidator() bean as a validator that allows you to check if the input file type of the job is in Excel format.

Run the Application

For this example, we’ve added a scheduler that starts the job every 2 minutes.

Conclusion

That’s it.
Thank you for taking the time to read this.

The source file can be found here

Related Posts