Hindi ITN Support for Cardinal, Decimal, Ordinal, Fraction, Date, Time, Money and Measure (#223)

* Hindi ITN Support for Cardinal, Decimal, Ordinal, Fraction, Date, Time

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Committing all changes made

Signed-off-by: Tarushi V <[email protected]>

* Updated date.py and added more test cases to cardinal for improved accuracy

Signed-off-by: Tarushi V <[email protected]>

* Updated date.py

Signed-off-by: Tarushi V <[email protected]>

* Added hi to Jenkins and cleanup

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changes and cleanup based on feedback

Signed-off-by: Tarushi V <[email protected]>

* Changes and cleanup based on feedback

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolved conflicts

Signed-off-by: Tarushi V <[email protected]>

* Committing code for measure.py

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* changes to run_evaluate.py

Signed-off-by: Tarushi V <[email protected]>

* Hindi ITN for money.py

Signed-off-by: Tarushi V <[email protected]>

* Changes and cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup date verbalizer

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* pushing .gitignore file from main branch

Signed-off-by: Tarushi V <[email protected]>

---------

Signed-off-by: Tarushi V <[email protected]>
Signed-off-by: tarushi2k2 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
tarushi2k2 and pre-commit-ci[bot] authored Oct 30, 2024
1 parent 3b3c3a3 commit 9aa9118
Showing 81 changed files with 3,616 additions and 5 deletions.
25 changes: 24 additions & 1 deletion Jenkinsfile
@@ -27,6 +27,7 @@ pipeline {
HY_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-0'
MR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-1'
JA_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-17-24-1'
HI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-29-24-0'
DEFAULT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
}
stages {
@@ -92,6 +93,23 @@ pipeline {

}
}
stage('L0: Create HI TN/ITN Grammars') {
when {
anyOf {
branch 'main'
changeRequest target: 'main'
}
}
failFast true
parallel {
stage('L0: Hi ITN grammars') {
steps {
sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --language hi --text="बीस" --cache_dir ${HI_TN_CACHE}'
}
}

}
}

stage('L0: Create DE/ES TN/ITN Grammars') {
when {
@@ -323,6 +341,11 @@ pipeline {
sh 'CUDA_VISIBLE_DEVICES="" pytest tests/nemo_text_processing/es/ -m "not pleasefixme" --cpu --tn_cache_dir ${ES_TN_CACHE}'
}
}
stage('L1: Run all HI TN/ITN tests (restore grammars from cache)') {
steps {
sh 'CUDA_VISIBLE_DEVICES="" pytest tests/nemo_text_processing/hi/ -m "not pleasefixme" --cpu --tn_cache_dir ${HI_TN_CACHE}'
}
}
stage('L1: Run all Codeswitched ES/EN TN/ITN tests (restore grammars from cache)') {
steps {
sh 'CUDA_VISIBLE_DEVICES="" pytest tests/nemo_text_processing/es_en/ -m "not pleasefixme" --cpu --tn_cache_dir ${ES_EN_TN_CACHE}'
@@ -476,4 +499,4 @@ pipeline {
cleanWs()
}
}
}
}
17 changes: 17 additions & 0 deletions nemo_text_processing/inverse_text_normalization/hi/__init__.py
@@ -0,0 +1,17 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from nemo_text_processing.inverse_text_normalization.hi.taggers.tokenize_and_classify import ClassifyFst
from nemo_text_processing.inverse_text_normalization.hi.verbalizers.verbalize import VerbalizeFst
from nemo_text_processing.inverse_text_normalization.hi.verbalizers.verbalize_final import VerbalizeFinalFst
@@ -0,0 +1,13 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,13 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,34 @@
एक
दो
तीन
चार
पाँच
छः
:
छह
छे
सात
आठ
नौ
१० दस
११ ग्यारह
१२ बारह
१३ तेरह
१४ चौदह
१५ पन्द्रह
१६ सोलह
१७ सत्रह
१८ अठारह
१९ उन्नीस
२० बीस
२१ इक्कीस
२२ बाईस
२३ तेईस
२४ चौबीस
२५ पच्चीस
२६ छब्बीस
२७ सत्ताईस
२८ अट्ठाईस
२९ उनतीस
३० तीस
३१ इकतीस
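
A data file like the one above pairs a written form (Devanagari numerals) with its spoken form. As a rough illustration of how such pairs can drive inverse text normalization, the sketch below loads a few of the rows into a plain dictionary and maps a spoken token back to its written form. It is not the grammar code from this commit, which is not shown here: the tab separator, the `SAMPLE_TSV` constant, and the helper names `load_word_to_digits` and `inverse_normalize_token` are all assumptions made for the example.

```python
import csv
import io

# Sample rows copied from the hunk above: written form, then spoken form.
# Assumption: the real file is tab-separated; only four rows are used here.
SAMPLE_TSV = "१०\tदस\n२०\tबीस\n२१\tइक्कीस\n३१\tइकतीस\n"


def load_word_to_digits(tsv_text):
    """Build a spoken-form -> written-form lookup from two-column TSV text."""
    lookup = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        written, spoken = row
        lookup[spoken.strip()] = written.strip()
    return lookup


# Hypothetical module-level table used by the helper below.
WORD_TO_DIGITS = load_word_to_digits(SAMPLE_TSV)


def inverse_normalize_token(token):
    """Return the written form for a known spoken token, else the token unchanged."""
    return WORD_TO_DIGITS.get(token, token)


print(inverse_normalize_token("बीस"))
```

A dictionary only handles exact single-token matches; it is used here because it keeps the example self-contained, not because it reflects how the commit's taggers and verbalizers are built.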
@@ -0,0 +1,14 @@
जनवरी
फ़रवरी
फरवरी
मार्च
अप्रैल
अप्रील
मई
जून
जुलाई
अगस्त
सितंबर
अक्टूबर
नवंबर
दिसंबर