From 7d4ae79a9c95238157f84899a3e28f1869f8a580 Mon Sep 17 00:00:00 2001
From: Dallas Coble
Date: Thu, 18 Mar 2021 17:18:07 +0800
Subject: [PATCH] typos and grammar update

---
 _layouts/index.html                       | 73 ++++++++++----------
 _posts/2020-08-04-hello-base.md           | 83 +++++++++++------------
 _posts/2020-11-08-rustfest2020.md         |  8 +--
 _posts/2021-01-01-plans.md                | 29 ++++----
 _posts/2021-01-05-rust_china_conf_2020.md | 10 +--
 _posts/2021-01-08-eng_rust_tips_1.md      | 42 ++++++------
 _posts/2021-03-16-announce_base_fe.md     | 62 ++++++---------
 7 files changed, 151 insertions(+), 156 deletions(-)

diff --git a/_layouts/index.html b/_layouts/index.html
index acd1e24..092c1fd 100644
--- a/_layouts/index.html
+++ b/_layouts/index.html
@@ -58,7 +58,7 @@

TensorBase

New Bigdata Warehousing for SMEs

- Join the Alpha Plan + Join the Alpha
@@ -67,57 +67,54 @@

Fastest OLAP Performance

TensorBase is optimized for large-scale data analysis.
- 5x to 10000x faster than that of - ClickHouse for billion+ rows query set. + 5x to 10000x faster than ClickHouse for billion+ row query sets.

-

Read the performance legend of TensorBase. +

Read the performance benchmarks of TensorBase.

Open Source Powered

TensorBase is written in Rust with the open source gene.
- Unbreakable solid engineering from day - 0.
- Do not compromise on the beauty of - engineering when pursuing ultimate performance. + Unbreakable solid engineering from day 0.
+ No compromises when pursuing ultimate performance.

Economically Efficient

-

TensorBase is optimized for modern commodity hardwares.
- Petabyte level data housing in one - box.
+

TensorBase is optimized for modern commodity hardware.
+ Petabyte level data housing in one box.
Bigdata analysis costs are reduced to - 1/100 of that of mainstream peers. + 1/100th of that of mainstream peers.

DBA-Free Automation

-

TensorBase maximally hides database details for users.
+

TensorBase automatically tunes database details for users.
No dedicated administrators needed.
- Reliability and high availability for - regular failures are integrated out of box.
- One-executable-file green installation. + Reliability and high availability for recovery from + failures are integrated out of the box.
+ One-executable-file installation.

ClickHouse Compatible

-

TensorBase works seamlessly with current ClickHouse tooling ecosystem.
+

TensorBase works seamlessly with the current ClickHouse tooling ecosystem.
TensorBase supports mixed deployments with ClickHouse.
Fearless adaption as a drop-in or - accelerator depends on your workloads. + accelerator depending on your workloads.

Cloud Neutral

TensorBase is optimized for SMEs (Small and Medium Enterprises).
- Not Binding to any cloud vendors and - components. Cloud unlocking empowers ordinary individuals bigdata wisdom in affordable - costs. + Not bound to any cloud vendors or + components.
+ A cloud native approach empowers ordinary individuals + with bigdata wisdom at affordable costs.

@@ -138,11 +135,11 @@

TensorBase Frontier Edition Alpha Demo

Alpha Plan

-

TensorBase has provided an alpha release of its Frontier Edition to early partners.

+

TensorBase provides an alpha release of its Frontier Edition to early partners.

The Alpha - release has not yet reached the production quality. This plan is targeted to let early partners - evaluate the applicability of TensorBase in their production and data context, and feedback - their ideas, bugs for further improvements.

+ release is undergoing rapid development and has not yet reached production quality. + This plan is targeted to let early partners evaluate the applicability of TensorBase with their + data and context. It also allows them to provide feedback, ideas, and bug reports to further improve TensorBase.

To join the alpha plan, just drop a message to alpha@tensorbase.io.

@@ -152,25 +149,25 @@

Alpha Plan

Beta Release

TensorBase plans to distribute a commercial-friendly free binary of TensorBase FE when the - production quality reached.

+ production quality is reached.

For SMEs (Small and Medium Enterprises), it is enough to use the free binary distribution. - Beta plan on the top of free binary will be provided to meet the needs of different businesses. + A Beta plan in addition to the free binary will be provided to help meet the needs of different businesses.

-

Users of alpha plan are continuously serviced under the beta plan directly.

+

Users of the alpha plan will continue to be serviced under the beta plan.

Enterprise Collaboration Source Plan

TensorBase FE proposes an enterprise collaboration source plan for any organization which - agrees with TensorBase's visions and willing to help it.

+ agrees with TensorBase's vision and is willing to help it.

-

To join the ESCP, no money needed, companies just -

  • Contributes at least one person into TensorBase.
  • +

To join the ECSP, no money is needed; companies just: +

  • Contribute at least one person to help TensorBase development.
  • - Then you have the ESCP benefits like: -

  • Full source use rights, including any commercial uses.
  • + Then you have the ECSP benefits like: +
  • Full source use rights, including any commercial uses.
  • Propose and give priority to solving the requirements from your companies.
  • Day 0 maintenance from TensorBase's outstanding engineering talents.
  • @@ -183,11 +180,11 @@

    Enterprise Collaboration Source Plan

    Open Source

    -

    TensorBase is written in Rust with open source genes in the blood.

    -

    TensorBase provides an open source Base Edition, which is mixed current open source - implementations with tech ideas pulled from its Frontier edition.

    +

    TensorBase is written in Rust with open source genes since the beginning.

    +

TensorBase provides an open source Base Edition, which mixes current open source + implementations with tech ideas pulled from the Frontier Edition.

- Based on the practices of ESCP, TensorBase actively explores the path to complete open source. TensorBase hopes to merge two editions into one finally. + Based on the practices of the ECSP, TensorBase actively explores the path to becoming completely open source. TensorBase hopes to eventually merge the two editions into one.

    If you are willing to help, join our community or drop a message. @@ -226,4 +223,4 @@

    Open Source

    - \ No newline at end of file + diff --git a/_posts/2020-08-04-hello-base.md b/_posts/2020-08-04-hello-base.md index 3715f80..112a23d 100644 --- a/_posts/2020-08-04-hello-base.md +++ b/_posts/2020-08-04-hello-base.md @@ -1,14 +1,14 @@ --- layout: post -title: "TensorBase: a modern engineering effort for taming data deluges" +title: "TensorBase: A modern engineering effort for taming data deluges" date: 2020-08-04 --- -Today, I am pleased to announce the milestone 0 of [TensorBase](https://github.com/tensorbase/tensorbase) (called Base for short as following). +Today, I am pleased to announce milestone 0 of [TensorBase](https://github.com/tensorbase/tensorbase) (called Base for short). -Base is a modern engineering effort for building a high performance and cost-effective bigdata analytics infrastructure in an open source culture. +Base is a modern engineering effort for building high performance and cost-effective bigdata analytics infrastructure with an open source culture. -Base is written in the Rust language and its friend C language. +Base is written in the Rust language and its friend C. ## Philosophy @@ -16,37 +16,36 @@ Base is written in the Rust language and its friend C language. ### First principle -Base is a project from the ["first principle"](https://en.wikipedia.org/wiki/First_principle). That is, unlike many other NoSQL/NewSQL products which claimed to be based on [Google Spanner](https://en.wikipedia.org/wiki/Spanner_(database)) or similar, Base is based on almost **nothing**. +Base is a project from the ["first principle"](https://en.wikipedia.org/wiki/First_principle). That is, unlike many other NoSQL/NewSQL products which claim to be based on [Google Spanner](https://en.wikipedia.org/wiki/Spanner_(database)) or similar, Base is based on almost **nothing** and built from the ground up. 
-The first principle for Base is just, only starting from current available commodities and software tools(a.k.a., languages and operation systems), asks three questions: +The first principle for Base is starting only from currently available commodities and software tools (a.k.a. languages and operating systems), and asking three questions: -* (Dream #1) What limit can Base reach, facing the planetary data tide? +* (Dream #1) What limit can Base reach, facing the planetary tide of data? -* (Dream #2) What an exciting community can Base build around? +* (Dream #2) What exciting community can be built around Base? -* (Dream #3) How can Base grant the public the ability to control the big data? +* (Dream #3) How can Base grant the public the ability to control and harness big data? ### Shared Something -In the design of distributed data systems, there are two architectures often talked: shared-nothing and shared everything. +In the design of distributed data systems, there are two architectures often talked about: shared-nothing and shared everything. -These are two extremes. Commonly, shared nothing is considered to be more scalable. But, it is suboptimal or even bad for non-trivial problems. For a database like product, nothing to share is awkward. For many global-viewpoint operations, it most likely requires global data movement. The remaindering is what/which/how data to remove. Easy to reason that a way of being ignorant of the environment and context is impossible to achieve the global optimum. +These are two extremes. Commonly, shared nothing is considered to be more scalable, but it is suboptimal or even bad for non-trivial problems. For a database-like product, sharing nothing would be awkward. From a global operations viewpoint, it would most likely require the global movement of data. Then the remaining problem is what/which/how data to remove. Without sharing context, and being ignorant of the environment, it becomes impossible to achieve the global optimum.
-OTOH, share everything, in fact, is idealized. Image that we have single big infinitely expandable computer. All problem solved... So, share everything truly compromises on scalability and has requirements to hardware. +OTOH, shared everything, in fact, is idealized but unrealistic. Imagine that we have a single big infinitely expandable computer. All our problems solved... but this isn't realistic. So, shared everything truly compromises on scalability and places requirements on hardware. -**Shared Something** is a dynamic best effort architecture sitting between shared nothing and everything, which achieves best performance via finitely sharing partial necessary contexts which are environments and computing dependent. This is just a highly abstract proposal in that Base is still in answering it. The direction to approach is, self adaptive computing via some shared infrastructures in the society of decentralization. Here, shared infrastructures do not mean centralized components. It could be just mechanisms to allow efficient information sharing/exchanging. +**Shared Something** is a dynamic best effort architecture sitting between shared nothing and everything, which achieves best performance via finitely sharing the partial necessary contexts which are environment and computing dependent. This is just a highly abstract proposal and Base is still working on answering it. The direction to approach is self adaptive computing via some shared infrastructures in the society of decentralization. Here, shared infrastructures do not mean centralized components. They could be just mechanisms to allow efficient information sharing/exchanging. ### Architectural Performance -Performance is one of core architectural designs. Long-year experience of performance engineering taught me that if performance cannot be considered architecturally first, the final evolution is just the performance collapse that cannot be repaired unless rewritten (but usually you have no chance to rewrite). 
+Performance is one of the core architectural designs. Long years of experience with performance engineering taught me that if performance is not considered architecturally first, the final evolution results in a performance collapse that cannot be repaired unless rewritten (but usually you have no chance to rewrite). -Scalability is another character often talked by projects. The truth is, good scalability does not mean high performance. Poor performance implementations are probably easier to achieve high scalability (in that you use a relative low baseline which may shows no bottleneck in any aspects). High scalability within low performance is cost expensive, economic killer and high-carbon producer... +Scalability is another characteristic often talked about by projects. The truth is, good scalability does not mean high performance. Poor-performance implementations probably find it easier to achieve high scalability (in that you use a relatively low baseline which may show no bottlenecks in any aspect). High scalability with low performance is expensive, an economic killer, and a high-carbon producer... Via architectural performance design, with Dream #1 in mind, Base is unique. - ## System For short, the architectures are summarized in the following figure.

    System sketch of TensorBase

-In this system, some parts are interesting, and some parts are indeed well-known (like in papers) but ridiculously no open-source counterpart. The details of components is out the scope of this post in that innovation big bangs here. - +In this system, some parts are interesting, and some parts are indeed well-known (e.g. in papers), but unfortunately no open-source counterpart exists. + One of core efforts of Base is to provide a **highly hackable system** for the community under the guidance of Dream #2. The system of Base is built with Rust and C. Rust lays a great base for system engineering. C is used to power a critical runtime kernel for read path(query) via a high-performance c jit compiler. -The rationale for C is that Base needs a jit compiler for full performance (it is pity that recently I see some project say its ability to call vcl vectorized functions as "fastest" database...). +The rationale for C is that Base needs a jit compiler for full performance (it is a pity that recently I have seen some projects tout their ability to call vcl vectorized functions as making them the "fastest" database...).

By lowering the contributing bar, Base hopes more people can enjoy to engage in the community. -On the top of comfortable languages, the nature of "first principle" of Base, allows contributors to be more pleased to build elegant performance critical system with the help of the external excellent. For instance, to use modern Linux kernel's xdp in network stack to enable a scalable and cost-efficient RDMA alternative on commodity network (io_uring is that new standard tool in lower performance). +On top of comfortable languages, the "first principle" nature of Base allows contributors to be more pleased to build elegant performance critical systems with the help of excellent external tools. For instance, using the modern Linux kernel's xdp in the network stack to enable a scalable and cost-efficient RDMA alternative on commodity network hardware (io_uring is the new standard tool at a lower performance tier). -Note: storage and C jit compiler in the system are not released or released in binary form in the m0 (they are under heavy changes) but will come later. +Note: the storage and C jit compiler in the system are either not released or released only in binary form in m0. They are under heavy changes but will come later. ## Launch --------- -Let see what's in Base m0. +Let's see what's in Base milestone 0. In m0, Base provides a prototyping level but full workflow from the perspective of simple data analysis tools: @@ -85,7 +84,7 @@ In m0, Base provides a prototyping level but full workflow from the perspective * provides a subcommand _table_, to create a table definition 2. command line tool: _baseshell_ - * provides a query client (now client is a monolithic to include everything) + * provides a query client (for now the client is a monolith that includes everything) * m0 only supports query with single integer column type sum aggregation intentionally.

    @@ -102,7 +101,7 @@ Base explicitly separates write/mutation behaviors into the cli baseops. the pro ```bash cargo run --release --bin baseops import csv -c /jian/nyc-taxi.csv -i nyc_taxi:trip_id,pickup_datetime,passenger_count:0,2,10:51 ``` -Base import tool uniquely supports to import csv partially into storage like above. Use help to get more infos. +Base's import tool uniquely supports importing csv partially into storage like above. Use help to get more info. 3. run _baseshell_ to issue query against Base ```bash @@ -111,18 +110,18 @@ cargo run --release --bin baseshell

### Benchmark -New York Taxi Data is an interesting [dataset](https://clickhouse.tech/docs/en/getting-started/example-datasets/nyc-taxi/). It is a size-fit-for-quick-bench real world dataset. And, it is often used in eye-catching promotional headlines, such as "query 1.xB rows in milliseconds". Now, I compare Base m0 against another OLAP DBMS [ClickHouse](https://en.wikipedia.org/wiki/ClickHouse)(20.5.2.7, got in July, 2020): +New York Taxi Data is an interesting [dataset](https://clickhouse.tech/docs/en/getting-started/example-datasets/nyc-taxi/); its size makes it a good fit for quickly benchmarking against a real world dataset. It is often used in eye-catching promotional headlines, such as "query 1.xB rows in milliseconds". Now, we compare Base m0 against another OLAP DBMS [ClickHouse](https://en.wikipedia.org/wiki/ClickHouse) (20.5.2.7, obtained in July 2020): -1. The Base csv import tool is vectorized. It supports raw csv processing at ~20GB/s in memory. The ~600GB 1.46 billion nyc taxi dataset importing run saturates my two SATA raid0 (1GB/s) and finished in 10 minutes. ClickHouse run at 600MB/s in the same hardware. +1. The Base csv import tool is vectorized. It supports raw csv processing at ~20GB/s in memory. The ~600GB, 1.46 billion row nyc taxi dataset import saturated my two SATA raid0 (1GB/s) and finished in 10 minutes. ClickHouse ran at 600MB/s on the same hardware. - NOTE: Because ClickHouse does not support csv partially importing. So, 600MB/s ClickHouse importing is done in a ClickHouse favored way: to use a column-stripped csv. 600GB for Base is much larger than node's memory, but the size of column-stripped csv for ClickHouse is memory-fit (and the cacheline is fully utilized). The official reports that ClickHouse on 8-disk raid5 takes [76 minutes to finish](https://clickhouse.tech/docs/en/getting-started/example-datasets/nyc-taxi/). + NOTE: ClickHouse does not support csv partial importing. 
So, the 600MB/s ClickHouse importing is done in a ClickHouse favored way, by using a column-stripped csv. 600GB for Base is much larger than a node's memory, but the size of column-stripped csv for ClickHouse is memory-fit (and the cacheline is fully utilized). The official report states that ClickHouse on 8-disk raid5 takes [76 minutes to finish](https://clickhouse.tech/docs/en/getting-started/example-datasets/nyc-taxi/). 2. for the following simple aggregation, ```sql select sum(trip_id-100)*123 from nyc_taxi ``` - **Base m0 are 6x faster than ClickHouse (20.5.2.7)**: Base runs in ~100ms, ClickHouse runs in 600ms+. + **Base m0 is 6x faster than ClickHouse (20.5.2.7)**: Base runs in ~100ms, ClickHouse runs in 600ms+.

    @@ -136,34 +135,34 @@ select sum(trip_id-100)*123 from nyc_taxi

    Aggregation result in ClickHouse client

    -Note #1: The [official ClickHouse configs](https://clickhouse.tech/docs/en/getting-started/example-datasets/nyc-taxi/) are used. The bench tests run several times, and use the best result for ClickHouse. (Then, this is just an in-memory query in same hardware, no actual disk io is involved and the ClickHouse result is on par with [officially listed single node result](https://clickhouse.tech/docs/en/getting-started/example-datasets/nyc-taxi/)). +Note #1: The [official ClickHouse configs](https://clickhouse.tech/docs/en/getting-started/example-datasets/nyc-taxi/) are used. The bench test runs several times, and uses the best result for ClickHouse. (This is just an in-memory query on the same hardware, no actual disk io is involved and the ClickHouse results are on par with [officially listed single node result](https://clickhouse.tech/docs/en/getting-started/example-datasets/nyc-taxi/)). -Note #2: it is interesting to see a bug in Base's jit compiler: it is expected that the second run of kernel should be shorter than it first run. +Note #2: It is interesting to see a bug in Base's jit compiler; it is expected that the second run of kernel should be shorter than the first run. -Note #3: for showing the version of ClickHouse, the figure just picks the first run screenshot after client login. This does not affect the result in that the ClickHouse is in the server-client arch. +Note #3: For showing the version of ClickHouse, the figure just picks the first run screenshot after client login. This does not affect the results, in that ClickHouse is in the server-client arch. -Here, what I want to emphasize that the performance of Base for such trivial case is almost hitting the ceiling of single six-channel Xeon SP. That is because the parallel summation is just a trivial memory bandwidth benchmarking loop. 
There are tiny improvement rooms stay tuned: -* ~15ms of current jit compilation (not trivial, after above bug fixed, extra 15-20ms saving); +Here, what I want to emphasize is that the performance of Base on such a trivial case is almost hitting the ceiling of a single six-channel Xeon SP. That is because the parallel summation is just a trivial memory bandwidth benchmarking loop. There is still room for some tiny improvements, so stay tuned: +* ~15ms of current jit compilation (not trivial, after the bug fix, an extra 15-20ms saving); * 10-20ms of current naive parallel unbalancing (trivial for just fixing, not trivial for architectural elegance);

-Although Base m0 just gives a prototype, it is **trivial** to expand to all other operations for a single table in hours. Welcome to Base community! +Although Base m0 just gives a prototype, it is **trivial** to expand to all other single table operations in hours. Welcome to the Base community! -## Meeting in Community +## Meeting the Community ----------------------- -The true comparison is that, thanks to the appropriate ideologies (not only first principle...year-after-year experiences, practices, thoughts) and tools (Rust and C...), two-month Base can crush four-year ClickHouse in performance. +The true comparison is that, thanks to the appropriate ideologies (not only first principle...year-after-year experiences, practices, thoughts) and tools (Rust and C...), in two months Base can already crush four-year ClickHouse on performance for these simple starting metrics. Base is ambitious: from storage, to sql compilation, to mixed analytics load scheduling, to client, to performance engineering, to reliability engineering, to ops engineering and to security and privacy. Take a glimpse: -1. Base invents a data and control unified linear IR for SQL/RA (I called "sea-of-pipes"). -2. Base favors a kind of high level semantic keeping transforms from IR to C which provides lighting fast, easy maintaining code generations. -3. Base supports to write C in Rust via proc_macro and in future it is not hard to provide an in-Rust-source error diagnoses and debugging experiences. OTOH, this allows to use any Rust logics to generate arbitrary C templates. -4. Base is in prototyping a new tiered compilation which sunshines in short-time queries compared to other open-source LLVM IR based counterparts. +1. Base invents a data and control unified linear IR for SQL/RA (I call it "sea-of-pipes"). +2. Base favors a kind of high level semantic keeping transforms from IR to C which provide lightning fast, easily maintainable code generation. +3. 
Base supports writing C in Rust via proc_macro, and in the future it will not be hard to provide in-Rust-source error diagnostics and debugging experiences. OTOH, this allows using any Rust logic to generate arbitrary C templates. +4. Base is prototyping a new tiered compilation which shines on short-time queries compared to other open-source LLVM IR based counterparts. 5. ...

-If you are from Rust or C communities, please [give Base a star](https://github.com/tensorbase/tensorbase). To help Base being the strong community voice: we can build best bigdata analytics infrastructure in the planet. +If you are from the Rust or C communities, please [give Base a star](https://github.com/tensorbase/tensorbase). This helps Base build a strong community voice: we can build the best bigdata analytics infrastructure on the planet. -If you are having some of dreams which Base have, we are the same data nerds. [Join us!](https://github.com/tensorbase/tensorbase) +If you share some of the same dreams which Base has and want to work together with us data nerds, [Join us!](https://github.com/tensorbase/tensorbase) diff --git a/_posts/2020-11-08-rustfest2020.md index 7974c00..b881ecd 100644 --- a/_posts/2020-11-08-rustfest2020.md +++ b/_posts/2020-11-08-rustfest2020.md @@ -4,16 +4,16 @@ title: "RustFest Global 2020: Architect a High-performance SQL Query Engine in R date: 2020-11-08 --- -I have presented a talk ["Architect a High-performance SQL Query Engine in Rust"](https://rustfest.global/session/18-architect-a-high-performance-sql-query-engine-in-rust/) in the RustFest Global 2020. +I presented the talk ["Architect a High-performance SQL Query Engine in Rust"](https://rustfest.global/session/18-architect-a-high-performance-sql-query-engine-in-rust/) at the RustFest Global 2020 conference. -In this talk, I summarize main apsects of TensorBase from designs, implementations and engineerings. This is the first systematic writing about TensorBase. I try to keep the readings as simple as possible although the topics of high performance and are usually hard. So, this presentation is a nice introducation for the newcomers who are interesting to get more familiar with TensorBase. And so, there are still tons of details and advanced aspects which I have no enough time to touch upon. 
I hope more exciting information could be shared gradually with the growth of TensorBase. +In this talk, I summarize the main aspects of TensorBase across design, implementation and engineering. This is the first systematic writing about TensorBase. I try to keep the reading as simple as possible although the topics of high performance are usually hard. This presentation is a nice introduction for newcomers who are interested in getting more familiar with TensorBase. There are still a ton of details and advanced aspects which I didn't have enough time to touch upon. I hope in the future that more exciting information can be shared gradually with the growth of TensorBase. Here is the presentation.

-And RustFest team will release the video of the talk at some time if you are interested in. +And the RustFest team will release the video of the talk at some point, if you are interested. -As I have mentioned in the talk, the next version of TensorBase is targeted to be released in this month. The biggest change from current first M0 release is that in this new version, TensorBase will pivot to a ClickHouse compatible bigdata system implementation. I am still busy in a lot of work to make this release date being caught up with. It is planned to provide more information when that new version is released. +As I have mentioned in the talk, the next version of TensorBase is targeted to be released this month. The biggest change from the first M0 release is that in this new version, TensorBase will pivot to a ClickHouse compatible bigdata system implementation. I am still busy catching up with a lot of work to make this release date. It is planned to provide more information when the new version is released. Stay tuned! diff --git a/_posts/2021-01-01-plans.md index 7569253..6886335 100644 --- a/_posts/2021-01-01-plans.md +++ b/_posts/2021-01-01-plans.md @@ -1,10 +1,10 @@ --- layout: post -title: "Core Peering Plan and Community Practicer Plan" +title: "Core Peering Plan and Community Practice Plan" date: 2021-01-01 --- -To invite more interested people to join the community of TensorBase and Rust development, I proposed two plans/suggestions as follows: +To invite more interested people to join the community of TensorBase and Rust development, I propose two plans/suggestions as follows: ## Core Peering Plan @@ -15,44 +15,43 @@ Peer coding for [TensorBase Base edition](https://github.com/tensorbase/tensorbase ### Requirements 1. Participants: Data or system enthusiasts, comfortable with Rust and C programming. (Otherwise, you will not be excited about this plan.) -2. You are guaranteed to put your a regular period of time (e.g. 
half an hour every day) into the peers coding. (So we can continue to discuss topics of interest.) - +2. You are guaranteed to put a regular period of time (e.g. half an hour every day) into the peer coding. (So we can continue to discuss topics of interest.) ### Benefits -Works with me and other data nerds (when have) for seeking the sky-high of massive data storage and analysis. +Work with me and other data nerds (when available) for seeking the cutting edge of massive data storage and analysis. I am [this guy](https://jinmingjian.xyz/resume/), who is eager, self-confident and proud to make peer with you. Don't hesitate to contact with me if you are a data nerd. -This plan is scheduled to be started in around a month late in that I am a little busy in next release of TensorBase now. But I just leave this infos in that even any early friends are welcome. +This plan is scheduled to be started in around a month later, in that I am a little busy finalizing the next release of TensorBase now. But I just leave this info so that any early friends are welcome. -## Community Practicer Plan +## Community Practice Plan ---------------------------- -Free style coding in Rust, in that it is not required to have a rigor schedule. +Free style coding in Rust, in that it is not required to have a rigorous schedule. -You do not want to put much time in data engineering but still have performance thinkings similar to that of me. +You do not have that much time for data engineering but still have similar performance ideas to mine. ### Directions -Linux/Backend focused with architectural performance. +Linux/Backend focused on architectural performance. ### Potential Topics -1. BaseLog: lock-free logging, to maximize the total performance of a system with minimized logging overheads -2. BaseNet: network io framework with new paradigm (no reactor, no async, no stupid long call stack but top performance) +1. 
BaseLog: lock-free logging, to maximize the total performance of a system while minimizing logging overhead +2. BaseNet: network io framework with a new paradigm (no reactor, no async, no stupid long call stack but top performance) 3. BaseCsv: bug fixed SIMD version of csv processing library 4. ... or you ideas ### Benefits -Works with me and other high perf nerds (when have) for seeking top performance with engineering in mind probably from a very different view. +Work with me and other high perf nerds (when available), seeking top performance with engineering in mind. ### Participants -No requirement for skills. You are just loving the direction of this plan and having time to start to code. You are encouraged to provide your own ideas to see if we can start together. +No requirement for skills. You just need to love the direction of this plan and have time to start coding. You are encouraged to provide your own ideas to see if we can start together. ### Non-goal -Feature driven. Tons of same or similar feature based projects may exist in the earth. You just pick up some of these for your learning or just-working usage. +Feature driven. Tons of same or similar feature based projects may already exist. You can just pick up some of these for your learning or just-working usage. ``` diff --git a/_posts/2021-01-05-rust_china_conf_2020.md index b97a30a..d036339 100644 --- a/_posts/2021-01-05-rust_china_conf_2020.md +++ b/_posts/2021-01-05-rust_china_conf_2020.md @@ -6,20 +6,20 @@ date: 2021-01-05 I've just joined the [first Rust China Conf](https://2020conf.rustcc.cn/schedule.html) with TensorBase. -In this talk, the "where-from", "where-in" and "where-to" of TensorBase as Rust Based open source project has been presented. For the time limitation, I skip the technical details. 
You can consult [sources](https://github.com/tensorbase/tensorbase) and [another talk](https://tensorbase.io/2020/11/08/rustfest2020.html) to get some more. +In this talk, the "where-from", "where-in" and "where-to" of TensorBase as a Rust-based open source project have been presented. Because of the time limitation, I skipped the technical details. You can consult [sources](https://github.com/tensorbase/tensorbase) and [another talk](https://tensorbase.io/2020/11/08/rustfest2020.html) to get some more info on the technical side of things. -Ancient Chinese says, "to teach fishing is better than to give fish". So, I hope you could see more when you stand at the sky of 100,000 miles high. And I also plan to share more documents from and experiences in TensorBase when more time released in the near future. +An old Chinese saying goes, "to teach fishing is better than to give fish". So, I hope you can see more when you stand at the sky of 100,000 miles high. I also plan to share more documents and experiences from building TensorBase. Here is the presentation.

    -And there is also a video record(in Chinese) which could be [seen here](https://www.bilibili.com/video/BV1Yy4y1e7zR?p=25). +And there is also a video (in Chinese) which can be [seen here](https://www.bilibili.com/video/BV1Yy4y1e7zR?p=25). -The great milestone before the conference is the primary completion of new base server. We do dirty works for our users and help them to achieve engineering excellences. That's the new engineering TensorBase wants to start. Futhermore, I am really looking forward to your [joining the community](https://tensorbase.io/2021/01/01/plans.html). +The great milestone reached before the conference was the primary completion of a new base server. We do the dirty work for our users and help them to achieve engineering excellence. That's the new engineering TensorBase wants to start. Furthermore, I am really looking forward to your [joining the community](https://tensorbase.io/2021/01/01/plans.html). -The most fascinating thing, which I got in the conference, is that we have a strong and heart-warming community! That's Rust! +The most fascinating thing, which I found at the conference, is that we have a strong and heart-warming community! That's Rust! Let's Rust! diff --git a/_posts/2021-01-08-eng_rust_tips_1.md b/_posts/2021-01-08-eng_rust_tips_1.md index d992c23..22273d6 100644 --- a/_posts/2021-01-08-eng_rust_tips_1.md +++ b/_posts/2021-01-08-eng_rust_tips_1.md @@ -7,25 +7,25 @@ date: 2021-01-08 ## Preamble Developing high-performance systems is a mind-twisting task. We face endless clever or stupid cases to conquer. It could be meaningful that we leave thoughts here to benefit all. -This series hope to present that **fast-reading tips**, from a viewpoint of real engineering, to reflect the problem we meet in [this project](https://tensorbase.io/) and how we solve them in an elegant way by the modern Rust. +This series hopes to present **fast-reading tips**, from a viewpoint of real engineering. 
We aim to reflect on the problems that we meet in [this project](https://tensorbase.io/) and how we solved them in an elegant way with modern Rust. -The idea of this series is inspired by our recent Rust Chinese community's conferences and online talkings. I really love our Rust community and thanks for all the help got from the community. +The idea of this series is inspired by our recent Rust Chinese Community conferences and online talks. I really love our Rust community and am thankful for all the help I have received from it. ## Proc Macro -Proc macro is an relatively unique language characteristics in Rust. It is great for library writers but notorious to figure out the problem when bugs happen. Unfortunately, bugs always happen... So, it should be careful to use self-brewed proc macros in large engineering. +Proc macros are a relatively unique language feature in Rust. They are great for library writers, but it is notoriously difficult to figure out the problem when bugs happen. Unfortunately, bugs always happen... So, use self-brewed proc macros with care in large engineering projects. -Sometimes, the desires to have more advanced representations exceed the fear from coding for the proc macro. You start to try! +Sometimes, the desire to have more advanced representations exceeds the fear of coding proc macros. You start to try! -Commonly you've known, ```cargo expand``` and ```eprintln!```[1]. They are generally good with a little boring works to make them inconvenient. Occasionally, they meet problems for temporary broken IDEs or that you just do not want invasive prints. +You probably already know ```cargo expand``` and ```eprintln!```[1]. They are generally good, but involve a bit of tedious work that can make them inconvenient. Occasionally, they run into problems with temporarily broken IDEs, or you just do not want invasive prints. -Here, I suggest two other ways which I used in the Base engineering. 
+Here, I suggest two other ways which I used while engineering Base. ## Proc_macro_diagnostic API -The nightly has introduced "Procedural Macro Diagnostics" APIs [3] under the feature "proc_macro_diagnostic" as friendly diag-info-show tool which is seamlessly integrated into the proc macro output. +The nightly release has introduced "Procedural Macro Diagnostics" APIs [3] under the feature "proc_macro_diagnostic" as a friendly diagnostics tool that is seamlessly integrated into the proc macro output. -Here, I use [the s! macro in TensorBase as an example](https://github.com/tensorbase/tensorbase/blob/812ade62dec267652cc21373bb5efddda9097925/crates/base/tests/proc_macro_tests.rs#L35), which makes your writing C, Java like codes in your Rust sources in a free style. (In fact, this is just a raw token container with in-Rust value interpolation, you can embed almost any language in Rust using it.) +Here, I use [the s! macro in TensorBase as an example](https://github.com/tensorbase/tensorbase/blob/812ade62dec267652cc21373bb5efddda9097925/crates/base/tests/proc_macro_tests.rs#L35), which allows you to write C- or Java-like code in your Rust sources in a free style. (In fact, this is just a raw token container with in-Rust value interpolation; you can embed almost any language in Rust using it.) A normal working scenario like this: @@ -45,9 +45,9 @@ When I miss a delimiter $ for dsadsa, then it just panics,

    -Here, we may not quickly understand the exact problem in that we have no useful indication. +Here, we may not quickly understand the exact problem because we have no useful indication. -Then, we use diag APIs in the potential key parsing points, like: +Now we can use diag APIs at the potential key parsing points, like:

    @@ -60,32 +60,32 @@ And try another case - interpolated variable typo, we got this:

    -We find that is an error prompt immediately in vscode/RA, which is shown that what's the problem identity and the location/span of this ident (Note: here the span is not exact which may be a bug or just a surprise). +We find an error prompt immediately in vscode/RA, which shows what the problem is and the location/span of this ident (Note: here the span is not exact, which may be a bug or just a surprise). -By changing the API call from "error" to "warning", we got a "non-blocking" warning style prompt like this: +By changing the API call from "error" to "warning", we get a "non-blocking" warning style prompt like this:

    -There are four APIs: __error__, __warning__, __help__ and __note__ on Span for your favor. Consult the tracking issue for more[3] . +There are four APIs: __error__, __warning__, __help__ and __note__ on Span, depending on your needs. Consult the tracking issue for more info[3]. -In a real engineering, it is not hard to provide an in-Rust direct external language editing experience to use that language compiler(e.g. GCC, JavaC) on top of this diagnostics API. +In real engineering, it is not hard to provide an in-Rust external language editing experience that uses that language's compiler (e.g. GCC, javac) on top of this diagnostics API. Great user-friendly proc macro diag experience! ## Unit Testability for Proc Macro -Generally, the proc macro is just one compiler plugin kind to run at compilation time. However, it is even hard to figure out a good entrance for this kind debugging in that we are not language developers. +Generally, the proc macro is just one type of compiler plugin that runs at compilation time. However, it is hard to figure out a good entrance for this kind of debugging because we are not language developers. -Another not-well-known way is, our core team gradually makes the proc macro unit testable (WIP). +Another not-well-known way to do this is what our core team is gradually doing: making the proc macro unit testable (WIP). -You just write a unit-testability-friendly tests like your conventional unit tests, for example, [unit tests for above s! macro test](https://github.com/tensorbase/tensorbase/blob/812ade62dec267652cc21373bb5efddda9097925/crates/base/proc_macro/src/lib.rs#L101). +You just write a unit-testability-friendly test like your conventional unit tests, for example, the [unit tests for the above s! macro](https://github.com/tensorbase/tensorbase/blob/812ade62dec267652cc21373bb5efddda9097925/crates/base/proc_macro/src/lib.rs#L101). 
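For readers who want to try this pattern without pulling in the proc-macro crates, the idea of moving the macro's logic into a plain function that ordinary unit tests can call is sketched below in stdlib-only Rust. The `interpolate` function and its `$ident$` syntax are hypothetical stand-ins for the real `s!` machinery, not TensorBase's actual code:

```rust
use std::collections::HashMap;

// Hypothetical testable core: substitute $name$ markers in a raw snippet.
// The point is the shape, not the parser: the #[proc_macro] entry stays a
// thin wrapper, and this plain function is what the unit tests exercise.
fn interpolate(src: &str, vars: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = src;
    while let Some(start) = rest.find('$') {
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        // The missing-delimiter panic described earlier becomes an ordinary,
        // assertable Err here.
        let end = after
            .find('$')
            .ok_or_else(|| format!("unbalanced $ near: {}", &after[..after.len().min(10)]))?;
        let name = &after[..end];
        let val = vars
            .get(name)
            .ok_or_else(|| format!("unknown ident: {}", name))?;
        out.push_str(val);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("n", "42");
    // Happy path: value interpolation into a C-like snippet.
    assert_eq!(interpolate("int x = $n$;", &vars).unwrap(), "int x = 42;");
    // Missing closing delimiter: an Err instead of a hard-to-place panic.
    assert!(interpolate("int x = $n;", &vars).is_err());
    println!("ok");
}
```

Because the core is a plain function, every standard tool (debugger, `cargo test`, coverage) applies to it with no ad-hoc setup.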
-The advantage of unit test is that you use every swiss knifes in your toolbox with a right engineering style(no any adhoc setup). +The advantage of unit tests is that you can use every Swiss Army knife in your toolbox with the right engineering style (no ad-hoc setup needed). -For example, I enable the live debugging to the test to see what I have in kinds of proc macro syntax objects, like TokenStream here just by one click in RA: +For example, I enable live debugging of the test to see what I have in different kinds of proc macro syntax objects. For TokenStream here, I just need one click in RA:

    @@ -93,11 +93,11 @@ For example, I enable the live debugging to the test to see what I have in kinds

    Neat! Do not need endless prints in you code any more! -Finally, it is hopedsh fearless proc macro programming coming soon:) +Finally, it is hoped that fearless proc macro programming is coming soon:) ## [Comments](https://www.reddit.com/r/rust/comments/kszyoa/engineering_rust_tips_1_proc_macro_debugging/) ## References 1. [cargo expand](https://github.com/dtolnay/cargo-expand) 2. [Debugging-tips from Dtolnay](https://github.com/dtolnay/proc-macro-workshop#debugging-tips) -3. [Tracking Issue: Procedural Macro Diagnostics (RFC 1566)](https://github.com/rust-lang/rust/issues/54140) \ No newline at end of file +3. [Tracking Issue: Procedural Macro Diagnostics (RFC 1566)](https://github.com/rust-lang/rust/issues/54140) diff --git a/_posts/2021-03-16-announce_base_fe.md b/_posts/2021-03-16-announce_base_fe.md index 581098e..cb593f0 100644 --- a/_posts/2021-03-16-announce_base_fe.md +++ b/_posts/2021-03-16-announce_base_fe.md @@ -5,7 +5,7 @@ date: 2021-03-16 --- ## Announcement -TensorBase is proud to announce that, today the alpha plan of Frontier Edition is officially available. TensorBase Frontier Edition alpha is a 5x ~ 10000x drop-in replacement or accelerator for ClickHouse. +TensorBase is proud to announce that, today the alpha plan of the Frontier Edition is officially available. TensorBase Frontier Edition alpha is a 5x ~ 10000x drop-in replacement or accelerator for ClickHouse. Watch the [alpha demo here](/#demo). Welcome to join in the [alpha plan here](/#alpha). @@ -13,7 +13,7 @@ Watch the [alpha demo here](/#demo). Welcome to join in the [alpha plan here](/# You may see that some databases claim they can aggregate ["billions of rows per second"](https://www.google.com/search?q=billion+rows+per+second). The truth is that, their "billions of rows" aggregations can be done in ~40 milliseconds on a modern socket as shown in TensorBase. -Let's start the journey of benchmark! +Let's start the benchmark journey! 
#### Round 1 - Architecture Gap @@ -26,15 +26,15 @@ Let's start the journey of benchmark! |SELECT max(123 * number+456 * number+789 * number) FROM system.numbers | 303.363 sec / 26.37 GB/s | 0.028 sec / ~ | 10833x |

    -In ClickHouse, system.numbers is a kind of virtual table to represent the natural number dataset. The measurements for system.numbers/numbers_mt makes no senses for the real world, but still uncovers the unique of TensorBase. +In ClickHouse, system.numbers is a kind of virtual table to represent the natural number dataset. The measurements for system.numbers/numbers_mt make no sense for the real world, but can still be useful for benchmarking and for showing what makes TensorBase unique. -In the above benchmark, TensorBase FE's "count"/"sum" and even complex "sum" are small constants, which ridiculously 1000x to 10000x faster than that of ClickHouse. For this case, the speed ratio of TB to CH is quasi-infinite. This is true. Because TensorBase FE's JIT compiler does the smart constant time compilation for that interval-predefined loop. On the contrary, as the complexity of the expression increases a little, the performance degrades sharply in ClickHouse. +In the above benchmark, TensorBase FE's "count"/"sum" and even complex "sum" are small constants, which are 1,000x to 10,000x faster than ClickHouse's. For this case, the speed ratio of TB to CH is quasi-infinite. This is because TensorBase FE's JIT compiler does a smart constant-time compilation for that interval-predefined loop. On the contrary, as the complexity of the expression increases a little, the performance degrades sharply in ClickHouse. -This is an excellent demo of modern compilation technology to data analysis, which can not done in the volcano model of common open source OLAP databases. +This is an excellent demo of modern compilation technology applied to data analysis, which cannot be done in the volcano model of common open source OLAP databases. -This is also a great example of the performance gap caused by the architecture gap. We have entered the post-Moor era. 
[**Architectural performance**](/2020/08/04/hello-base.html) is one of core methodologies from TensorBase for the post-Moore era. More about TensorBase's methodologies have been cooked [here presentations](/about) and [source codes](https://github.com/tensorbase/tensorbase/tree/m0). +This is also a great example of the performance gap caused by the architecture gap. We have entered the post-Moore era. [**Architectural performance**](/2020/08/04/hello-base.html) is one of the core methodologies from TensorBase for the post-Moore era. More about TensorBase's methodologies has been discussed [in these presentations](/about) and [source codes](https://github.com/tensorbase/tensorbase/tree/m0). -Note: attentive readers may still be interested in that 27 mill-seconds overhead(the TensorBase guy calls it, "uncore" cost, a.k.a., cost on parsing/optimization/scheduling...). This overhead can be reduced to 3 milliseconds or so for special cases and shown in the following real world "count" benchmark. +Note: attentive readers may still be interested in that 27-millisecond overhead (the TensorBase guy calls it the "uncore" cost, a.k.a. the cost of parsing/optimization/scheduling). This overhead can be reduced to 3 milliseconds or so for special cases, as shown in the following real world "count" benchmark. #### Round 2 - Real World Bigdata @@ -44,14 +44,14 @@
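The constant-time compilation described above can be sketched with made-up numbers (this illustrates the closed-form idea only, not TensorBase's actual codegen): a sum of `123*number + 456*number + 789*number` over the first n naturals collapses to `1368 * n*(n-1)/2`, so a JIT can replace the O(n) loop with O(1) arithmetic:

```rust
// Naive evaluation: the O(n) loop a volcano-style engine would actually run.
fn sum_loop(n: u64) -> u64 {
    (0..n).map(|i| 123 * i + 456 * i + 789 * i).sum()
}

// What a JIT can constant-fold the interval-predefined loop into:
// (123 + 456 + 789) * sum(0..n) = 1368 * n*(n-1)/2, an O(1) expression.
fn sum_closed(n: u64) -> u64 {
    if n == 0 {
        return 0;
    }
    1368 * (n * (n - 1) / 2)
}

fn main() {
    let n = 1_000_000;
    // Same answer, but one side scans a million values and the other does
    // three multiplications; that is the 1,000x-10,000x gap in miniature.
    assert_eq!(sum_loop(n), sum_closed(n));
    println!("sum over {} rows = {}", n, sum_closed(n));
}
```

This is why the expression's complexity barely matters once the compiler recognizes the pattern, while a row-at-a-time interpreter pays for every added operator.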

    Comparison On End-to-end Query Time

    -Here, the NYC taxi 1.47 billion dataset is used. The alpha release of TensorBase supports main operations on single table. Four kinds of single-table queries are compared: +Here, the NYC taxi 1.47-billion-row dataset is used. The alpha release of TensorBase supports the main operations on a single table. Five kinds of single-table queries are compared: * Q#1: simple count * Q#2: a little complex sum * Q#3: simple two-column aggregation * Q#4: simple group by * Q#5: a little complex group by -More detailed benchmark infos could be seen in [the project](https://github.com/tensorbase/tensorbase_frontier_edition). +More detailed benchmark info can be seen in [the project](https://github.com/tensorbase/tensorbase_frontier_edition). On the surface, the end-to-end query time of TensorBase is 1/4.5 to 1/31 of that of ClickHouse. @@ -66,55 +66,55 @@ The so called "limit" is memory bandwidth, in that most bigdata analysis is memo

    Comparison On Bandwidth Usage

    -Here, as example, the end-to-end bandwidth of Q#2 and Q#4 is shown, which is calculated by (queried dataset size)/(query time). +For example, here the end-to-end bandwidth of Q#2 and Q#4 is shown, which is calculated by (queried dataset size)/(query time). -Let's talk about Q#2 firstly. For the end-to-end bandwidth of Q#2 in ClickHouse is 6.5 GB/sec, while that in TensorBase is 81.4 GB/sec. If only consider the execution of the query core, that is, if omit the "uncore" cost, the core bandwidth is 95 GB/sec, which is calculated by (queried dataset size)/(query time - "uncore" time). The node is setup with 6-channel DDR4-2400 REG ECC DRAMs, its measured practical max bandwidth (by Intel VTune) is just 100 GB/sec. That is, TensorBase achieves 95% core memory bandwidth utilization in this case. Memory wall hit! +Let's talk about Q#2 first. The end-to-end bandwidth of Q#2 in ClickHouse is 6.5 GB/sec, while in TensorBase it is 81.4 GB/sec. If you only consider the execution of the query core, that is, if you omit the "uncore" cost, the core bandwidth is 95 GB/sec, which is calculated by (queried dataset size)/(query time - "uncore" time). The node is set up with 6-channel DDR4-2400 REG ECC DRAMs; its measured practical max bandwidth (by Intel VTune) is just 100 GB/sec. That is, TensorBase achieves 95% core memory bandwidth utilization in this case. Memory wall hit! #### How High is the Sky? -Note, Q#2 is a complex sum (at least with predicate). For a simple count/sum, it is naive to hit the memory wall (but unfortunately the result of current ClickHouse still far from this). +Note, Q#2 is a complex sum (at least, one with a predicate). For a simple count/sum, it is trivial to hit the memory wall (but unfortunately the result of current ClickHouse is still far from this). How about more complex group-by? See Q#4: the end-to-end bandwidth of Q#4 in TensorBase FE is 66.6 GB/sec, of which the core bandwidth is 80 GB/sec. 
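The bandwidth arithmetic above is easy to reproduce. In this sketch the dataset size, query time, and uncore time are illustrative placeholders, not the benchmark's measured values:

```rust
// End-to-end bandwidth: (queried dataset size) / (query time).
fn end_to_end_gbps(dataset_gb: f64, query_secs: f64) -> f64 {
    dataset_gb / query_secs
}

// Core bandwidth: same size over the time with the fixed "uncore"
// (parsing/optimization/scheduling) overhead subtracted out.
fn core_gbps(dataset_gb: f64, query_secs: f64, uncore_secs: f64) -> f64 {
    dataset_gb / (query_secs - uncore_secs)
}

fn main() {
    // Illustrative numbers: an 80 GB scan answered in 1.0 s with a 27 ms
    // uncore cost (the overhead figure quoted earlier in this post).
    let (gb, t, u) = (80.0, 1.0, 0.027);
    println!("end-to-end: {:.1} GB/s", end_to_end_gbps(gb, t));
    println!("core:       {:.1} GB/s", core_gbps(gb, t, u));
}
```

Comparing the core number against the node's VTune-measured practical maximum is what yields the utilization percentages quoted in the text.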
TensorBase FE achieves 80% core memory bandwidth utilization in this datetime involved group-by! -TensorBase FE is the first product to claim that it can run kinds of group-by queries in almost the same speed as non group-by aggregation queries in the modern CPU. This is non-trivial. +TensorBase FE is the first product to claim that it can run various kinds of group-by queries at almost the same speed as non-group-by aggregation queries on a modern CPU. This is non-trivial. -Furthermore, TensorBase wants to answer a question: how high is the sky? or where is the performance limit of bigdata analysis? It deserves more stories, and TensorBase invites you [into the journey](/community) for that more. +Furthermore, TensorBase wants to answer a question: how high is the sky? Or, where is the performance limit of bigdata analysis? It deserves more stories, and TensorBase invites you [along for the journey](/community). ## Rust Powered -People may have a question: why does TensorBase use Rust to write its system? Why not just stand on the shoulders of C++ implementations? It is true that, a good engineer can use any language to complete any engineering. But, +People may have a question: why does TensorBase use Rust to write its system? Why not just stand on the shoulders of C++ implementations? It is true that a good engineer can use any language to complete any task. But, -* A good engineer will not being pleasured in using any language -* It is not a good engineering only relying on good engineers +* A good engineer will not be satisfied or productive in just any language +* You can't always rely on only having the best engineers available #### Constrained Programming Model is Engineering Paradigm Revolution -I played with ClickHouse monthly, I met three no warning crashes in-between simple but gigabyte ingestions, queries. This is a normal C++ project experience. But it is unnecessarily normal for the modern engineering. 
+I used ClickHouse monthly, and in one month I met three no-warning crashes during simple but large (gigabyte) ingestions and queries. This is a normal C++ project experience. But it need not be normal for modern engineering. -In TensorBase FE, there is no crash with 256 concurrent request multi-threading stress. Also in TensorBase FE, there is no crash after TB level data have been ingested day by day from day 1 of storage layer ready till now. In TensorBase, such a confidence has been established: unbreakable solid after the compilation pass. +In TensorBase FE, there were no crashes even under 256-concurrent multi-threaded request stress. Also in TensorBase FE, there have been no crashes after TB-level data has been ingested day by day, from day 1 of the storage layer being ready till now. In TensorBase, such a confidence has been established: unbreakably solid once compilation passes, thanks to Rust. This era also urgently needs new engineering paradigm. Rust is actually setting off an engineering paradigm revolution. The explicitly constrained programming model of Rust eliminates potential security risks before production runs. TensorBase's methodologies can only be thoroughly embodied by Rust. -TensorBase does not compromise on the beauty of engineering when pursuing ultimate performance. And, as you seen, the experience from TensorBase shows that, with the help of Rust, both can be greatly achieved! +TensorBase does not compromise on the beauty of engineering when pursuing ultimate performance. And, as you have seen, the experience from TensorBase shows that, with the help of Rust, both can be greatly achieved! ## Economics And Sociology of Data Warehouse -This topic deserves to be covered in dedicated articles. 10x performance improvement is not just 10x cost reduction. On the other hand, the cost of bigdata analysis is much high and becoming higher nowadays. This makes that ordinary individuals are hard to benefit from the bigdata. +This topic deserves to be covered in dedicated articles. 10x performance improvement is not just 10x cost reduction. At the same time, the cost of bigdata analysis is becoming higher and higher nowadays. This makes it hard for ordinary individuals to benefit from bigdata. 
Furthermore, a well-thought-out and well-engineered infrastructure can not only benefit its users but also help the operation of the whole society. -## Open Source Dilemma For Small +## Open Source Dilemma For Small Startups -The founder of TensorBase hopes the project can help all common people and businesses in this era to understand and benefit from the bigdata unique to this era. But he is just one person. If a big giant open sources a project, it has enough resources to control the project grown under its designed path. This obviously does not hold for small individuals and startups. Recent [elasticsearch relicensing](https://www.google.com/search?q=elastic+relicensing) is just one of new examples, but won't be the last. +The founder of TensorBase hopes the project can help all common people and businesses in this era to understand and benefit from the bigdata unique to this era. But he is just one person. If a tech giant open sources a project, it has enough resources to control the project and ensure that it grows along its designed path. This obviously does not hold true for small individuals and startups. The recent [elasticsearch relicensing](https://www.google.com/search?q=elastic+relicensing) is just one new example, but won't be the last. TensorBase's thoughts are divided into following three aspects: #### Open source Base Edition -TensorBase splits the performance insensitive part into the APLv2 based open source [Base Edition (a.k.a., TensorBase BE)](https://github.com/tensorbase/tensorbase). 
As you seen, many open source projects have done in OLAP fields. TensorBase BE may explore different paths, for example, fully Rust, modular/pluggable/embeddable designs(thinking an OLAP version SQLite in Rust), and modern high performance components in Rust. +TensorBase splits the performance-insensitive part into the APLv2-based open source [Base Edition (a.k.a., TensorBase BE)](https://github.com/tensorbase/tensorbase). As you have seen, many open source projects have done well in OLAP fields. TensorBase BE may explore different paths, for example, fully Rust, modular/pluggable/embeddable designs (think an OLAP version of SQLite in Rust), and modern high performance components in Rust. -TensorBase BE is still in its early stage, and welcome any idea. +TensorBase BE is still in its early stage, and welcomes any new ideas. #### Free binary distribution from beta @@ -127,17 +127,17 @@ See plan in the [website here](/#ecsp). ## ClickHouse Compatible -TensorBase is the friend of ClickHouse. Although the performance of TensorBase beats ClickHouse by orders of magnitude, ClickHouse still provides more flexibilities which developers like over other OLAP implementations. TensorBase highly values ClickHouse as an commercial-friendly open source data warehouse solution, although disagrees with ClickHouse's technical route. +TensorBase is the friend of ClickHouse. Although the performance of TensorBase beats ClickHouse by orders of magnitude, ClickHouse still provides more flexibility, which developers like over other OLAP implementations. TensorBase highly values ClickHouse as a commercial-friendly open source data warehouse solution, although it disagrees with ClickHouse's technical route. -TensorBase contributes values to whole ClickHouse ecosystem via the communication protocol compatibility. TensorBase supports mixed deployments with ClickHouse. Users in ClickHouse ecosystem can freely choose one or both. If you want fastest responses, ask TensorBase. 
If you want more features which are not supported by TensorBase, query to ClickHouse with in the same client/driver/frontend. +TensorBase contributes value to the whole ClickHouse ecosystem via its communication protocol compatibility. TensorBase supports mixed deployments with ClickHouse. Users in the ClickHouse ecosystem can freely choose one or both. If you want the fastest responses, query TensorBase. If you want more features that are not supported by TensorBase, query ClickHouse within the same client/driver/frontend. -More infos about current compatibilities to ClickHouse could be seen in the [project page](https://github.com/tensorbase/tensorbase_frontier_edition). +More info about the current compatibility with ClickHouse can be seen on the [project page](https://github.com/tensorbase/tensorbase_frontier_edition). ## Near Future -In this alpha announcement, a solid foundation has been demonstrated. The beta version is coming soon in months. Besides joins, more performance and several key features for production will be shipped in beta. +In this alpha announcement, a solid foundation has been demonstrated. The beta version is coming in a couple of months. Besides joins, more performance and several key features for production will be shipped in beta. -Finally, as the founder of TensorBase, I again invites any who agrees with TensorBase's visions to join in. [Enterprise Collaboration Source Plan](/#ecsp) is just one of that designed ideas to assist in this process. TensorBase has an open source bigdata infrastructure dream in its blood. I am optimistic about this. +Finally, as the founder of TensorBase, I again invite anyone who agrees with TensorBase's visions to join in. The [Enterprise Collaboration Source Plan](/#ecsp) is just one of the ideas designed to assist in this process. TensorBase has an open source bigdata infrastructure dream at its core. I am optimistic about this. 
If you are interested or willing to help, don't hesitate to join the community or drop me a message. @@ -145,4 +145,4 @@ If you are interested or willing to help, don't hesitate to