DFLib, a Java DataFrame

Home of open source Java data tools: a DataFrame library, charts and visualization, Jupyter kernel.

To get started, follow the DFLib documentation.

DFLib DataFrame

DFLib is a lightweight, pure Java, in-memory DataFrame. It provides a unique data manipulation API that lets you process data incrementally in a highly composable manner. Unlike "big data" DataFrames, DFLib does not require special infrastructure and can be used in any Java application. Out of the box, DFLib supports loading from and saving to CSV, Excel, Avro, Parquet, and JSON formats, as well as relational databases (RDBMS), and works with a variety of data sources (local filesystem, web services, zip archives, etc.). DFLib is fully open source and is distributed under the Apache License.

What Is a DataFrame?

A DataFrame is a two-dimensional, table-like data structure that gives a programming language SQL-like data processing capabilities: filtering, joins, unions, window functions, column and row manipulation, and so on. DataFrames are very common in other languages and ecosystems (Python pandas, Spark, R, etc.), but are still quite rare in Java.
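To make the idea concrete, here is a minimal sketch in plain Java of a toy, column-oriented table that supports two SQL-like operations: filtering rows (like `WHERE`) and selecting columns (like `SELECT`). The `MiniFrame` class and its methods are hypothetical and exist only for illustration; this is not DFLib's API, which is far more complete (see the DFLib documentation for the real calls).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical toy table ("MiniFrame") illustrating the DataFrame concept.
 * This is NOT the DFLib API; it only shows the general idea.
 */
public final class MiniFrame {

    // Column name -> column values; all columns are assumed to have the same length.
    private final Map<String, List<Object>> columns;

    public MiniFrame(Map<String, List<Object>> columns) {
        this.columns = columns;
    }

    // Number of rows (0 for a table with no columns).
    public int height() {
        return columns.isEmpty() ? 0 : columns.values().iterator().next().size();
    }

    public Object get(String column, int row) {
        return columns.get(column).get(row);
    }

    // Keeps the rows whose value in the given column passes the condition,
    // similar to SQL "WHERE". Returns a new table; this one is left unchanged.
    public MiniFrame filter(String column, Predicate<Object> condition) {
        List<Object> source = columns.get(column);
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (String name : columns.keySet()) {
            result.put(name, new ArrayList<>());
        }
        for (int row = 0; row < source.size(); row++) {
            if (condition.test(source.get(row))) {
                for (Map.Entry<String, List<Object>> e : columns.entrySet()) {
                    result.get(e.getKey()).add(e.getValue().get(row));
                }
            }
        }
        return new MiniFrame(result);
    }

    // Keeps only the named columns, similar to SQL "SELECT a, b".
    public MiniFrame select(String... names) {
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, columns.get(name));
        }
        return new MiniFrame(result);
    }

    public static void main(String[] args) {
        Map<String, List<Object>> data = new LinkedHashMap<>();
        data.put("name", List.of("Anna", "Bob", "Chen"));
        data.put("age", List.of(34, 27, 45));

        // Roughly: SELECT name FROM data WHERE age > 30
        MiniFrame over30 = new MiniFrame(data)
                .filter("age", v -> (Integer) v > 30)
                .select("name");

        for (int i = 0; i < over30.height(); i++) {
            System.out.println(over30.get("name", i)); // prints Anna, then Chen
        }
    }
}
```

Each operation returns a new table and leaves its input unchanged, so calls can be chained into a small pipeline.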

Who Should Use DFLib, and Why?

The short answer is "any Java programmer, for any data processing task". More specifically, there are two broad categories of use cases. The first is data engineering, i.e. building software (often called "data pipelines") to extract and transform data from internal or external sources. ETL, data aggregation, and loading Excel files into your database all fall into this category.

The second is data analysis, which has traditionally been the realm of Python. DFLib opens it up to Java developers. With support for Jupyter notebooks (via our own Jupyter kernel) and a charting solution, we provide the entire toolkit needed for interactive and visual data work.

DFLib JJava, a Java Jupyter Kernel

Jupyter Notebook is the most popular interactive environment for working with data among scientists, data engineers, and educators. It runs in the browser and is also integrated into common IDEs, such as VS Code and IntelliJ IDEA. DFLib provides a kernel (called "JJava") for working in Jupyter using the Java language. It complements the DFLib DataFrame perfectly, but the kernel itself doesn't depend on the DataFrame library and can be used with any Java code.

Charts and Dashboards

An important part of data work is visual analysis and presentation. DFLib integrates with Apache ECharts to provide a Java charting and dashboard solution. A chart can be programmed in Java by applying a simple fluent API to a DataFrame. DFLib generates charts as chunks of JavaScript that can either be displayed directly in Jupyter notebooks or embedded in HTML pages. Additionally, DFLib lets you create full HTML pages with multiple charts (i.e. "dashboards") using built-in or custom page templates.
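As a rough illustration of what turning data into a "chunk of JavaScript" for ECharts can look like, the sketch below assembles, by hand, an HTML snippet that renders an Apache ECharts bar chart from two columns of values. The `EChartsSketch` class and its `barChartHtml` method are hypothetical and are not DFLib's charting API; the snippet also assumes the hosting page already loads the ECharts library, so that the global `echarts` object exists.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (NOT the DFLib API): builds an HTML snippet that
 * renders an Apache ECharts bar chart. Assumes the page loads echarts.js.
 */
public final class EChartsSketch {

    // Quotes a string as a JSON/JavaScript string literal.
    // Minimal escaping for the sketch: only backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Turns a label column and a value column into a <div> plus a <script>
    // that initializes an ECharts instance with a bar chart "option" object.
    static String barChartHtml(String divId, List<String> labels, List<? extends Number> values) {
        String x = labels.stream().map(EChartsSketch::quote).collect(Collectors.joining(","));
        String y = values.stream().map(String::valueOf).collect(Collectors.joining(","));
        String option = "{\"xAxis\":{\"type\":\"category\",\"data\":[" + x + "]},"
                + "\"yAxis\":{\"type\":\"value\"},"
                + "\"series\":[{\"type\":\"bar\",\"data\":[" + y + "]}]}";
        return "<div id=" + quote(divId) + " style=\"width:600px;height:400px;\"></div>\n"
                + "<script>\n"
                + "echarts.init(document.getElementById(" + quote(divId) + ")).setOption(" + option + ");\n"
                + "</script>";
    }

    public static void main(String[] args) {
        System.out.println(barChartHtml("chart1", List.of("Q1", "Q2", "Q3"), List.of(120, 200, 150)));
    }
}
```

A real charting API would hide this string assembly behind a fluent builder and handle JSON escaping completely; the sketch escapes only backslashes and double quotes.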
