Skip to content

A clean and intuitive wrapper for DuckDB in FreePascal, providing easy database operations and DataFrame-like result handling.

License

Notifications You must be signed in to change notification settings

ikelaiah/duckdb-fp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

59 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ¦† DuckDB for FreePascal: An Intuitive Database Wrapper

A simple interface to work with DuckDB in FreePascal applications, featuring a DataFrame-like structure for handling query results similar to R or Python pandas.

πŸ“š Table of Contents

⚠️ Work in Progress

This project is currently under active development. Do expect bugs, missing features and API changes.

Current development focus:

  • Better DataFrame functionality
  • More examples

Last tested with:

  • FreePascal 3.2.2
  • DuckDB 1.1.2
  • Lazarus 3.6
  • Win 11

πŸš€ Getting Started with DuckDB for FreePascal

This guide will help you get started with the DuckDB FreePascal wrapper, covering the most common use cases.

πŸ“‹ Prerequisites

  • FreePascal 3.2.2 or later
  • Lazarus 3.6 (to run examples and tests)
  • DuckDB DLL v1.1.2 or later

πŸ”§Installation

  1. Add these files to your project:

    • src/DuckDB.Wrapper.pas
    • src/DuckDB.DataFrame.pas
    • src/DuckDB.SampleData.pas (optional, for sample datasets)
    • src/libduckdb.pas
  2. Ensure the DuckDB library (DLL/SO) is in your application's path

  3. Add the required units to your project:

uses
  DuckDB.Wrapper, DuckDB.DataFrame;

πŸ†• Getting Started from Scratch

Create a new DataFrame with custom columns and add data:

var
  DuckFrame: TDuckFrame;
begin
  // Create DataFrame with specified columns and types
  DuckFrame := TDuckFrame.CreateBlank(['Name', 'Age', 'City'],
                                    [dctString, dctInteger, dctString]);
  try
    // Add rows
    DuckFrame.AddRow(['John', 25, 'New York']);
    DuckFrame.AddRow(['Alice', 30, 'Boston']);
    
    // Display the DataFrame
    DuckFrame.Print;
    
    // Save to CSV if needed
    DuckFrame.SaveToCSV('output.csv');
  finally
    DuckFrame.Free;
  end;
end;

πŸ¦† Getting Started with DuckDB Tables

Connect to a DuckDB database and query existing tables:

var
  DB: TDuckDBConnection;
  Frame: TDuckFrame;
begin
  DB := TDuckDBConnection.Create;
  try
    // Connect to database (use ':memory:' for in-memory database)
    DB.Open('mydata.db');
    
    // Query existing table
    Frame := DB.Query('SELECT * FROM my_table');
    try
      // Basic operations
      Frame.Print;                    // Display data
      Frame.Describe;                 // Show statistical summary
      Frame.Info;                     // Show structure info
      
      // Basic analysis
      WriteLn('Row count: ', Frame.RowCount);
      
      // Access specific values
      WriteLn(Frame.ValuesByName[0, 'column_name']);
      
    finally
      Frame.Free;
    end;
  finally
    DB.Free;
  end;
end;

πŸ“„ Getting Started with CSV Files

Load data from CSV files and analyze it:

var
  DF: TDuckFrame;

begin
  // Basic usage - default settings (has headers, comma delimiter)
  DF := TDuckFrame.CreateFromCSV('data.csv');
  try
    DF.Print;  // Display the data
    WriteLn;

    DF.Describe; // Show summary statistics
    WriteLn;

    DF.UniqueCounts('country').Print; // Show unique counts of country
  finally
    DF.Free;
  end;
end.

πŸ“„ Working with Parquet Files

Create a DataFrame from Parquet files:

var
  Frame: TDuckFrame;
begin
  // Load single Parquet file
  Frame := TDuckFrame.CreateFromParquet('data.parquet');
  try
    Frame.Print;  // Display the data
  finally
    Frame.Free;
  end;
end;

Load multiple Parquet files as a single DataFrame:

var
  Frame: TDuckFrame;
  Files: array of string;
begin
  SetLength(Files, 3);
  Files[0] := 'data1.parquet';
  Files[1] := 'data2.parquet';
  Files[2] := 'data3.parquet';
  
  // Load multiple files
  Frame := TDuckFrame.CreateFromParquet(Files);
  try
    WriteLn('Total rows: ', Frame.RowCount);
    Frame.Print;
  finally
    Frame.Free;
  end;
end;

πŸ” Common Operations

πŸ“Š Analyzing Data

// Statistical summary
Frame.Describe;

// Structure information
Frame.Info;

// Missing value analysis
Frame.NullCount.Print;

// First/last rows
Frame.Head(5).Print;  // First 5 rows
Frame.Tail(5).Print;  // Last 5 rows

// Correlation analysis
Frame.CorrPearson.Print;  // Pearson correlation
Frame.CorrSpearman.Print; // Spearman correlation

πŸ”— Combining DataFrames

var
  Combined: TDuckFrame;
begin
  // Union with duplicate removal
  Combined := Frame1.Union(Frame2);
  
  // Union keeping all rows
  Combined := Frame1.UnionAll(Frame2);
  
  // Union modes:
  // umStrict - Requires exact column match
  // umCommon - Only common columns
  // umAll    - All columns (NULL for missing)
  Combined := Frame1.Union(Frame2, umCommon);
end;

🚨 Error Handling

Always use try-finally blocks and handle exceptions:

try
  // Your DuckDB operations here
except
  on E: EDuckDBError do
    WriteLn('DuckDB Error: ', E.Message);
  on E: Exception do
    WriteLn('Error: ', E.Message);
end;

🚢 Next Steps

  • Check the examples folder for more detailed examples
  • Read the API documentation for other features
  • Check the TESTING.md file for information on how to run and maintain the test suite

πŸ“‘ API Reference

✨Features

  • Native DuckDB Integration πŸ¦†

    • Seamlessly connect and interact with DuckDB databases for efficient data processing.
  • Comprehensive DataFrame Operations πŸ“Š

    • Perform operations similar to pandas (Python) or data frames (R), enabling intuitive data manipulation.
  • File Handling πŸ“

    • CSV Files:
      • Read CSV files with automatic type inference (TDuckDBConnection.ReadCSV).
      • Save DataFrames to CSV with RFC 4180 compliance (TDuckFrame.SaveToCSV).
    • Parquet Files:
      • Load single or multiple Parquet files into a DataFrame (TDuckFrame.CreateFromParquet).
  • Data Analysis Capabilities πŸ”

    • Basic Statistics:
      • Calculate mean, standard deviation, minimum, maximum, and quartiles (Describe).
    • Correlation Analysis:
      • Compute Pearson (CorrPearson) and Spearman (CorrSpearman) correlation matrices.
    • Frequency Counts:
      • Generate frequency counts of unique values (ValueCounts).
      • Count the number of unique entries in a column (UniqueCounts).
    • Missing Value Handling:
      • Remove rows with any null values (DropNA).
      • Fill null values with a specified value (FillNA).
  • DataFrame Combination Techniques πŸ”—

    • Join Operations:
      • Perform joins with another DataFrame based on different join modes (Join).
    • Union Operations:
      • Combine DataFrames and remove duplicates (Union).
      • Combine DataFrames while keeping all rows, including duplicates (UnionAll).
      • Remove duplicate rows (Distinct).
    • Flexible Union Modes:
      • umStrict: Requires an exact match of column names and types.
      • umCommon: Includes only columns that exist in both DataFrames.
      • umAll: Includes all columns, filling missing values with NULL where necessary.
  • Data Access Methods πŸ”‘

    • Retrieve column information by index or name (GetColumn, GetColumnByName).
    • Access and modify data using row and column indices or names (Values, ValuesByName, SetValue).
    • Find the index of a column by its name (FindColumnIndex).
  • Data Manipulation Methods πŸ› οΈ

    • Row Operations:
      • Clear all data from the DataFrame (Clear).
      • Add new rows with specified values (AddRow).
    • Column Operations:
      • Add new columns with specified names and types (AddColumn).
      • Rename existing columns (RenameColumn).
      • Select and retain specific columns (Select).
    • Filtering and Sorting:
      • Filter rows based on column values and comparison operators (Filter).
      • Sort DataFrames by one or multiple columns in ascending or descending order (Sort).
  • Sampling πŸ“Š

    • Sample the first few rows (Head) or the last few rows (Tail) of the DataFrame.
    • Retrieve random samples of data either by count or percentage (Sample).
  • Descriptive Statistics and Information πŸ“ˆ

    • Display a comprehensive statistical summary of the DataFrame (Describe).
    • Show basic information such as the number of rows, columns, data types, and memory usage (Info).
    • Count null values per column (NullCount).
  • Visualization 🎨

    • Plot histograms for numeric columns to visualize data distribution (PlotHistogram).
  • Helper Methods 🧰

    • Type Conversion:
      • Attempt to convert values from one data type to another (TryConvertValue).
    • Statistical Calculations:
      • Calculate statistical metrics for columns (CalculateColumnStats).
      • Determine specific percentiles within data (CalculatePercentile).
  • Pretty Printing ✨

    • Display DataFrame contents in a formatted table with customizable row limits (Print).
  • Error Handling and Resource Management πŸ›‘οΈ

    • Utilize try-finally blocks to ensure proper memory management.
    • Handle exceptions gracefully to maintain robust applications.
  • Flexible Data Loading πŸ“₯

    • Load data from DuckDB result sets (LoadFromResult).
    • Support for both single and multiple Parquet files, facilitating scalable data processing.
  • Data Export πŸ“€

    • Export processed and analyzed data to CSV files, ensuring compatibility and ease of data sharing.
  • Integration with DuckDB Connection πŸ¦†

    • Directly query and manipulate data from DuckDB databases, enhancing data workflow efficiency.

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

πŸ“œ License

This project is licensed under the MIT License - see the LICENSE.md file for details.

πŸ™ Acknowledgments

About

A clean and intuitive wrapper for DuckDB in FreePascal, providing easy database operations and DataFrame-like result handling.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published