robinhood-jim/GenericFileDB


GenericFileDB

Common Generic DataFile DB V1.0 aims to ingest various unstructured or semi-structured sources (formats including CSV, JSON, XML, Avro, ORC, Parquet, Protobuf, and Apache Arrow) and add SQL and ETL capabilities without flushing data to any database or Hadoop filesystem. Data files can be ingested from local disk, HDFS, Apache VFS, AWS S3, Google Cloud Storage, MinIO, Aliyun, Tencent COS, Baidu BOS, Huawei OBS, and others. Files smaller than 4 GB can be processed without flushing to a temporary path; ORC, Parquet, and Arrow binary files larger than 4 GB must be downloaded first. Currently only SQL filtering over a single file is supported; support for multiple files with MapReduce is planned.

Prerequisites

  • Java 11 or above
  • Maven 3.8.6 or above
  • Add the following to your pom.xml:
<dependency>
    <groupId>com.robin.gfdb</groupId>
    <artifactId>core</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>

Examples

Read CSV from a FileSystemAccessor

    // Describe the columns of the CSV file
    DataCollectionMeta.Builder builder=new DataCollectionMeta.Builder();
    builder.addColumn("id", Const.META_TYPE_BIGINT,null);
    builder.addColumn("name",Const.META_TYPE_STRING,null);
    builder.addColumn("description",Const.META_TYPE_STRING,null);
        ......
    // Assumption: the elided lines above finish configuring the builder
    // and produce the DataCollectionMeta instance named `meta` used below
    try(LocalFileSystem fileSystem=LocalFileSystem.getInstance();
        AbstractFileReader reader=new CsvFileReader(meta,fileSystem)){
        fileSystem.init(meta);
        reader.init();
        // Iterate over the records and log each one
        while(reader.hasNext()){
            var outputMap=reader.next();
            log.info("{}",outputMap);
        }
    }finally {
        CommRecordFilter.close();
    }
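
The description above says that SQL filtering currently runs over a single file without loading it into a database. As a rough conceptual illustration of that idea, the sketch below streams CSV rows, maps each one to column names, and keeps only the rows matching a predicate. It deliberately uses only the JDK: `CsvFilterSketch`, `filter`, and the sample data are hypothetical names invented for this example and are not part of the GenericFileDB API, and the naive comma split assumes no quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only: none of these names come from GenericFileDB.
class CsvFilterSketch {

    // Reads CSV text line by line, maps each row to column -> value,
    // and keeps only the rows accepted by the predicate (the "WHERE" clause).
    static List<Map<String, String>> filter(BufferedReader in, String[] columns,
                                            Predicate<Map<String, String>> where)
            throws IOException {
        List<Map<String, String>> matches = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Naive split: assumes no quoted fields or embedded commas.
            String[] values = line.split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < columns.length; i++) {
                row.put(columns[i], i < values.length ? values[i] : null);
            }
            if (where.test(row)) {
                matches.add(row);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data using the same columns as the example above.
        String csv = "1,alice,first record\n2,bob,second record\n3,carol,third record\n";
        String[] columns = {"id", "name", "description"};
        try (BufferedReader in = new BufferedReader(new StringReader(csv))) {
            // Roughly equivalent to: SELECT * FROM file WHERE id > 1
            List<Map<String, String>> rows =
                    filter(in, columns, row -> Long.parseLong(row.get("id")) > 1);
            rows.forEach(System.out::println);
        }
    }
}
```

Only the matching rows are kept in memory; each line is discarded as soon as it has been tested. That is the property that lets a file be queried without first copying it into a database.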
        
