Skip to content

Commit

Permalink
Merge pull request modelscope#1 from alibaba/internal_sync
Browse files Browse the repository at this point in the history
first sync with internal codes
  • Loading branch information
yxdyc authored Aug 2, 2023
2 parents 99d9b7f + a9ae21c commit b93c065
Show file tree
Hide file tree
Showing 310 changed files with 28,482 additions and 3 deletions.
Binary file added .DS_Store
Binary file not shown.
203 changes: 201 additions & 2 deletions LICENSE
Original file line number Diff line number Diff line change
Expand Up @@ -178,15 +178,15 @@
APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Copyright 2023 Alibaba

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Expand All @@ -199,3 +199,202 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-------------------------------------------------------------------------------

Code in data_juicer/ops/common/helper_func.py, data_juicer/ops/deduplicator/document_deduplicator.py,
data_juicer/ops/deduplicator/document_simhash_deduplicator.py, data_juicer/ops/filter/character_repetition_filter.py,
data_juicer/ops/filter/flagged_words_filter.py, data_juicer/ops/filter/perplexity_filter.py,
data_juicer/ops/filter/special_characters_filter.py, data_juicer/ops/filter/stopwords_filter.py,
data_juicer/ops/filter/word_repetition_filter.py, data_juicer/ops/mapper/punctuation_normalization_mapper.py,
data_juicer/ops/mapper/remove_long_words_mapper.py, app.py is adapted from
https://huggingface.co/spaces/huggingface/text-data-filtering or
https://github.com/bigscience-workshop/data-preparation

Copyright [2021] [Bigscience]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-------------------------------------------------------------------------------

Code in data_juicer/ops/deduplicator/document_minhash_deduplicator.py is
adapted from
https://github.com/bigcode-project/bigcode-dataset

Copyright 2022 bigcode authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-------------------------------------------------------------------------------

Code in data_juicer/ops/mapper/clean_copyright_mapper.py, data_juicer/ops/mapper/clean_html_mapper.py,
data_juicer/ops/mapper/expand_macro_mapper.py, data_juicer/ops/mapper/remove_bibliography_mapper.py,
data_juicer/ops/mapper/remove_comments_mapper.py, data_juicer/ops/mapper/remove_header_mapper.py,
is adapted from
https://github.com/togethercomputer/RedPajama-Data

Copyright 2023 RedPajama authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-------------------------------------------------------------------------------

The implementations of gpt_evaluator in tools/evaluator/gpt_eval/gpt_evaluator.py
is adapted from https://github.com/lm-sys/FastChat (Apache License)

Copyright (c) 2023 The FastChat Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-----------------------------------------------------


The implementations of checkpoint converter in tools/converter/
convert_gpt_to_transformers.py and tools/converter/modeling_megatron_llama.py
are adapted from https://github.com/huggingface/transformers (Apache License)

Copyright (c) 2022 EleutherAI and the HuggingFace Inc. team.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-----------------------------------------------------

Code in thirdparty/Megatron-LM
is adapted from https://github.com/NVIDIA/Megatron-LM

# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

-----------------------------------------------------

Code in thirdparty/helm
is adapted from https://github.com/stanford-crfm/helm (Apache License)

Copyright (c) 2023 The helm Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


-----------------------------------------------------

Code in tests/run.py is adapted from https://github
.com/alibaba/FederatedScope/blob/master/tests/run.py (Apache License)

Copyright (c) 2023 The FederatedScope Team

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


-----------------------------------------------------

Code in utils/logger_utils.py is adapted from https://github.com/MegEngine/
YOLOX/blob/main/yolox/utils/logger.py (Apache License)

Copyright 2021 Megvii, Base Detection

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


Loading

0 comments on commit b93c065

Please sign in to comment.