Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v4-write-process #345

Merged
merged 75 commits into from
Jan 22, 2025
Merged

v4-write-process #345

merged 75 commits into from
Jan 22, 2025

Conversation

761417898
Copy link
Contributor

@761417898 761417898 commented Dec 25, 2024

Implemented the V4 tree model and table model write processes. Support string data types.. The following still need to be implemented/fixed:

  1. Implemented the read process for the V4 table model.
  2. Support encryption.
  3. Verified support for aligned time series in the table model.

cpp/src/common/device_id.h Outdated Show resolved Hide resolved
cpp/src/common/device_id.h Outdated Show resolved Hide resolved
cpp/src/common/schema.h Outdated Show resolved Hide resolved
cpp/src/common/schema.h Outdated Show resolved Hide resolved
cpp/src/common/schema.h Outdated Show resolved Hide resolved
cpp/src/common/tsfile_common.h Outdated Show resolved Hide resolved
cpp/src/common/tsfile_common.h Outdated Show resolved Hide resolved
cpp/src/file/tsfile_io_writer.cc Outdated Show resolved Hide resolved
cpp/src/writer/tsfile_writer.cc Outdated Show resolved Hide resolved
cpp/src/writer/tsfile_writer.h Outdated Show resolved Hide resolved

int segment_num() override { return static_cast<int>(segments_.size()); }

std::vector<std::string> get_segments() const override { return segments_; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If read only
you can return const std::vector<std::string>& to avoid copy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

return tableName_;
}

size_t lastSeparatorPos = device_id_.find_last_of('.');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use PATH_SEPARATOR_CHAR instead of '.'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 170 to 172
std::string get_device_name() const override { return device_id_; };

std::string get_table_name() override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can verify that you need to return a copy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if (ret == common::E_OK) {
if (RET_FAIL(common::SerializationUtil::write_str(measurement_name_,
out))) {
for (auto &prop : props_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto& prop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

: table_name_(table_name),
measurement_schemas_(measurement_schemas),
column_categories_(column_categories) {
int idx = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use size_t to avoid potential problems

Comment on lines 43 to 45
std::string to_string() {
return std::string(buf_, len_);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already one such method. You can view the develop branch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


int add_timestamp(uint32_t row_index, int64_t timestamp);

// int set_value(int row_index, int schema_index, double val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

int ret = write_byte(CHUNK_GROUP_HEADER_MARKER);
if (ret != common::E_OK) {
return ret;
}
ret = write_string(device_name);
device_name->serialize(write_stream_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ret = device_name->serialize(write_stream_);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 111 to 115
auto cur_device_name = cur_device_name_;
if (*iter.get()->device_name_ == *cur_device_name) {
use_prev_alloc_cgm_ = true;
cur_chunk_group_meta_ = iter.get();
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use cur_device_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it

Comment on lines 111 to 130
int write_typed_column(storage::ChunkWriter *chunk_writer,
int64_t *timestamps, bool *col_values,
common::BitMap &col_notnull_bitmap,
int32_t row_count);
common::BitMap &col_notnull_bitmap, int start_idx,
int end_idx);
int write_typed_column(storage::ChunkWriter *chunk_writer,
int64_t *timestamps, int32_t *col_values,
common::BitMap &col_notnull_bitmap,
int32_t row_count);
common::BitMap &col_notnull_bitmap, int start_idx,
int end_idx);
int write_typed_column(storage::ChunkWriter *chunk_writer,
int64_t *timestamps, int64_t *col_values,
common::BitMap &col_notnull_bitmap,
int32_t row_count);
common::BitMap &col_notnull_bitmap, int start_idx,
int end_idx);
int write_typed_column(storage::ChunkWriter *chunk_writer,
int64_t *timestamps, float *col_values,
common::BitMap &col_notnull_bitmap,
int32_t row_count);
common::BitMap &col_notnull_bitmap, int start_idx,
int end_idx);
int write_typed_column(storage::ChunkWriter *chunk_writer,
int64_t *timestamps, double *col_values,
common::BitMap &col_notnull_bitmap,
int32_t row_count);
common::BitMap &col_notnull_bitmap, int start_idx,
int end_idx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to use template?

Comment on lines 137 to 138
return E_OUT_OF_RANGE;
}
Copy link
Contributor

@ColinLeeo ColinLeeo Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果row_index 是 uint32_t的话,就把max_row_num_设置成uint32_t把,现在有些参数是int32的有些是 uint32的。

@@ -61,13 +65,26 @@ class TsFileWriter {
int register_aligned_timeseries(
const std::string &device_id,
const std::vector<MeasurementSchema *> &measurement_schemas);
void register_table(const std::shared_ptr<TableSchema> &table_schema);
Copy link
Contributor

@ColinLeeo ColinLeeo Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think pass a shared pointer to writer is a good idea. The TableSchema is internal info and shold not pass a sharedPtr directly

Comment on lines 165 to 168
TableSchema(const std::string &table_name,
const std::vector<std::shared_ptr<MeasurementSchema> >
&column_schemas,
const std::vector<ColumnCategory> &column_categories)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can make sure the owner of measurement schema, dont use shared_ptr.

if (type_ == common::STRING) {
return value_.strval_;
} else {
std::cout << "not String type" << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

Comment on lines 47 to 52
// value_matrix_ = (void **)malloc(sizeof(void *) * schema_count);
// for (size_t c = 0; c < schema_count; c++) {
// const MeasurementSchema &schema = schema_vec_->at(c);
// value_matrix_[c] =
// malloc(get_data_type_size(schema.data_type_) * max_row_num_);
// }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

template <>
void Tablet::process_val(uint32_t row_index, uint32_t schema_index, common::String val) {
value_matrix_[schema_index].string_data[row_index].dup_from(val, page_arena_);
bitmaps_[schema_index].set(row_index); /* mark as non-null */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bitmaps of a Tablet in the Java edition are marked when the value is null. I think we should make them consistent.

@@ -44,6 +44,10 @@ uint32_t TupleDesc::get_single_row_len(int *erro_code) {
totol_len += sizeof(double);
break;
}
case common::STRING: {
totol_len += DEFAULT_RESERVED_SIZE_OF_STRING + STRING_LEN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totol_len -> total_len

Comment on lines 645 to 652
/*
FORCE_INLINE bool is_same_measurement_name(const common::String &s1, const
common::String &s2)
{
return s1.equal_to(s2);
}
*/
private:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

bool read_file_created_;
};

// class TsFileIOReaderSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

Comment on lines 285 to 286
std::cout << "ChunkGroupMeta = {insert_target_name_="
<< cgm->insert_target_name_ << ", chunk_meta_list_={";
<< cgm->device_id_ << ", chunk_meta_list_={";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{insert_target_name_= -> {device_id_=

Comment on lines 164 to 165
static const char PATH_SEPARATOR = '.';
static const int DEFAULT_SEGMENT_NUM_FOR_TABLE_NAME = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include /common/constant/tsfile_constant

[](const std::string& a, const std::string& b) {
return a.empty() ? b : a + PATH_SEPARATOR + b;
});
final_segments.push_back(table_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
final_segments.push_back(table_name);
final_segments.emplace_back(std::move(table_name));

} else if (RET_FAIL(common::SerializationUtil::write_str(
prop.second, out))) {
}
if (ret != common::E_OK) break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (ret != common::E_OK) break;
if (IS_FAIL(ret)) break;

value, in))) {
}
props_.insert(std::make_pair(key, value));
if (ret != common::E_OK) break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (ret != common::E_OK) break;
if (IS_FAIL(ret)) break;

const std::string &get_table_name() { return table_name_; }

std::vector<std::string> get_measurement_names() const {
std::vector<std::string> ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ret.reserve(column_schemas_.size())

@761417898 761417898 merged commit 00002d6 into apache:develop Jan 22, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants