This repository has been archived by the owner on Oct 15, 2024. It is now read-only.

Add Transcribe service support #1102

Closed
wants to merge 13 commits

Conversation

danarbaugh

This adds support for all 8 resource types within the AWS Transcribe service: call analytics categories, call analytics jobs, custom language models, medical transcription jobs, medical vocabularies, transcription jobs, custom vocabularies, and vocabulary filters.

Testing

The following test script can be used to create one of each resource type:

#!/bin/bash
set -ex

# Generate a random string to use as a bucket name and classifier name suffix
RANDOM_STRING=$(openssl rand -hex 20)
# Generate a random string for shorter names
SHORT_RANDOM_STRING=$(openssl rand -hex 10)

# Set your preferred bucket names
INPUT_BUCKET="input-bucket-$RANDOM_STRING"
OUTPUT_BUCKET="output-bucket-$RANDOM_STRING"

# Get AWS account ID
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
echo "AWS Account ID: $AWS_ACCOUNT_ID"

# Create input bucket
aws s3api create-bucket --bucket $INPUT_BUCKET --no-cli-pager
echo "Input bucket created: s3://$INPUT_BUCKET"

# Create output bucket
aws s3api create-bucket --bucket $OUTPUT_BUCKET --no-cli-pager
echo "Output bucket created: s3://$OUTPUT_BUCKET"

# Create IAM Role for Transcribe access
ROLE_NAME="transcribe-access-role-$RANDOM_STRING"
aws iam create-role \
  --role-name $ROLE_NAME \
  --no-cli-pager \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Service": "transcribe.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  }'

# Attach AmazonS3FullAccess managed policy to IAM role (for demo purposes)
aws iam attach-role-policy \
  --role-name $ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

echo 'Waiting 30 seconds for IAM role to propagate...'
sleep 30
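## Optional check (not part of the original test script): confirm the role is
## visible before it is passed to Transcribe.
aws iam get-role --role-name $ROLE_NAME --query Role.Arn --output text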

cat <<EOF > test-category.json
{
  "CategoryName": "test-category-$SHORT_RANDOM_STRING",
  "Rules": [
    {
      "InterruptionFilter": {
        "AbsoluteTimeRange": {
          "First": 60000
        },
        "Negate": false,
        "ParticipantRole": "CUSTOMER",
        "Threshold": 10000
      }
    },
    {
      "NonTalkTimeFilter": {
        "Negate": false,
        "RelativeTimeRange": {
          "EndPercentage": 80,
          "StartPercentage": 10
        },
        "Threshold": 20000
      }
    },
    {
      "SentimentFilter": {
        "ParticipantRole": "AGENT",
        "Sentiments": [
          "NEGATIVE"
        ]
      }
    },
    {
      "TranscriptFilter": {
        "Negate": true,
        "AbsoluteTimeRange": {
          "First": 10000
        },
        "Targets": [
          "welcome",
          "hello"
        ],
        "TranscriptFilterType": "EXACT"
      }
    }
  ]
}
EOF

# create call analytics category
aws transcribe create-call-analytics-category \
  --cli-input-json file://test-category.json \
  --no-cli-pager
# remove temp json file
rm test-category.json
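## Optional check (not part of the original test script): confirm the category
## was created.
aws transcribe get-call-analytics-category \
  --category-name "test-category-$SHORT_RANDOM_STRING" \
  --no-cli-pager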

# start call analytics job
## get a sample wav to use
wget https://www2.cs.uic.edu/~i101/SoundFiles/preamble.wav
aws s3 cp preamble.wav s3://$INPUT_BUCKET/
## start the job
aws transcribe start-call-analytics-job \
  --call-analytics-job-name "test-job-$SHORT_RANDOM_STRING" \
  --media "MediaFileUri=s3://$INPUT_BUCKET/preamble.wav" \
  --data-access-role-arn arn:aws:iam::$AWS_ACCOUNT_ID:role/$ROLE_NAME \
  --channel-definitions ChannelId=0,ParticipantRole=AGENT ChannelId=1,ParticipantRole=CUSTOMER \
  --no-cli-pager
# remove temp sound file(s)
rm preamble.wav*
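## Optional check (not part of the original test script): the job runs
## asynchronously, so this only confirms it was accepted and shows its status.
aws transcribe get-call-analytics-job \
  --call-analytics-job-name "test-job-$SHORT_RANDOM_STRING" \
  --no-cli-pager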

# create a language model
## make placeholder training data in the input bucket
cat <<EOF > words.txt
The quick brown fox jumps over the lazy dog
EOF
aws s3 cp words.txt s3://$INPUT_BUCKET/my-clm-training-data/
## create the language model
aws transcribe create-language-model \
  --base-model-name NarrowBand \
  --model-name my-first-language-model-$SHORT_RANDOM_STRING \
  --input-data-config S3Uri=s3://$INPUT_BUCKET/my-clm-training-data/,DataAccessRoleArn=arn:aws:iam::$AWS_ACCOUNT_ID:role/$ROLE_NAME \
  --language-code en-US \
  --no-cli-pager
# remove temp words file
rm words.txt
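## Optional check (not part of the original test script): model training is
## asynchronous; this shows the model and its current status.
aws transcribe describe-language-model \
  --model-name my-first-language-model-$SHORT_RANDOM_STRING \
  --no-cli-pager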

# start medical transcription job
## get a sample sound file to use
wget https://www.hpisum.com/samples/ESL-Cardio-sample.wav
aws s3 cp ESL-Cardio-sample.wav s3://$INPUT_BUCKET/
## start the job
aws transcribe start-medical-transcription-job \
  --medical-transcription-job-name "test-med-job-$SHORT_RANDOM_STRING" \
  --language-code en-US \
  --media "MediaFileUri=s3://$INPUT_BUCKET/ESL-Cardio-sample.wav" \
  --output-bucket-name $OUTPUT_BUCKET \
  --specialty PRIMARYCARE \
  --type DICTATION \
  --no-cli-pager
# remove temp sound file(s)
rm ESL-Cardio-sample.wav*
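## Optional check (not part of the original test script): show the medical
## transcription job's status.
aws transcribe get-medical-transcription-job \
  --medical-transcription-job-name "test-med-job-$SHORT_RANDOM_STRING" \
  --no-cli-pager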

# create medical vocabulary
## create a text file with the vocabulary
cat <<EOF > my-medical-vocab.txt
Phrase,SoundsLike,IPA,DisplayAs
Los-Angeles,,l ɔ s æ n ʤ ə l ə s,Los Angeles
Eva-Maria,ay-va-ma-ree-ah,,
A.B.C.-s,ay-bee-sees,,ABCs
Amazon-dot-com,,,Amazon.com
C.L.I.,,s i ɛ l aɪ,CLI
Andorra-la-Vella,ann-do-rah-la-bay-ah,,Andorra la Vella
Dynamo-D.B.,,,DynamoDB
V.X.-zero-two,,,VX02
V.X.-zero-two-Q.,,,VX02Q
EOF
aws s3 cp my-medical-vocab.txt s3://$INPUT_BUCKET/
## create the vocabulary
aws transcribe create-medical-vocabulary \
  --vocabulary-name "test-med-vocab-$SHORT_RANDOM_STRING" \
  --language-code en-US \
  --vocabulary-file-uri "s3://$INPUT_BUCKET/my-medical-vocab.txt" \
  --no-cli-pager
# remove temp medical vocab file
rm my-medical-vocab.txt
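## Optional check (not part of the original test script): show the medical
## vocabulary's state.
aws transcribe get-medical-vocabulary \
  --vocabulary-name "test-med-vocab-$SHORT_RANDOM_STRING" \
  --no-cli-pager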

# start transcription job
aws transcribe start-transcription-job \
  --transcription-job-name "test-job-$SHORT_RANDOM_STRING" \
  --media "MediaFileUri=s3://$INPUT_BUCKET/preamble.wav" \
  --language-code en-US \
  --no-cli-pager
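## Optional check (not part of the original test script): show the standard
## transcription job's status.
aws transcribe get-transcription-job \
  --transcription-job-name "test-job-$SHORT_RANDOM_STRING" \
  --no-cli-pager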

# create a vocabulary
aws transcribe create-vocabulary \
  --vocabulary-name "test-vocab-$SHORT_RANDOM_STRING" \
  --language-code en-US \
  --phrases "hello" "world" "how are you" \
  --no-cli-pager
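## Optional check (not part of the original test script): show the vocabulary's state.
aws transcribe get-vocabulary \
  --vocabulary-name "test-vocab-$SHORT_RANDOM_STRING" \
  --no-cli-pager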

# create a vocabulary filter
aws transcribe create-vocabulary-filter \
  --vocabulary-filter-name "test-vocab-filter-$SHORT_RANDOM_STRING" \
  --language-code en-US \
  --words "hello" "world" "how" "are" "you" \
  --no-cli-pager
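## Optional check (not part of the original test script): show the vocabulary filter.
aws transcribe get-vocabulary-filter \
  --vocabulary-filter-name "test-vocab-filter-$SHORT_RANDOM_STRING" \
  --no-cli-pager

## To exercise the new handlers with aws-nuke itself, a minimal config can be
## generated and run against the test account. This is a sketch only: the
## Transcribe* resource type names below are assumptions based on the usual
## naming convention and may not match the identifiers added in this PR, and
## the blocklist entry is a placeholder.
cat <<EOF > transcribe-nuke-test.yml
regions:
  - us-east-1
account-blocklist:
  - "999999999999"
accounts:
  "$AWS_ACCOUNT_ID": {}
resource-types:
  targets:
    - TranscribeCallAnalyticsCategory
    - TranscribeCallAnalyticsJob
    - TranscribeLanguageModel
    - TranscribeMedicalTranscriptionJob
    - TranscribeMedicalVocabulary
    - TranscribeTranscriptionJob
    - TranscribeVocabulary
    - TranscribeVocabularyFilter
EOF
## aws-nuke defaults to a dry run; pass --no-dry-run only after reviewing the plan.
aws-nuke -c transcribe-nuke-test.yml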

@danarbaugh danarbaugh requested a review from a team as a code owner September 8, 2023 22:07
@ekristen (Contributor) commented Oct 2, 2024

This is being implemented via ekristen/aws-nuke#359

If you have a chance, please check it out and let us know if you run into any issues by opening an issue on the fork.


Please see the copy of the notice from the README about the deprecation of this project. Sven was kind enough to grant me access to help triage and close issues and pull requests that have already been addressed in the actively maintained fork. Additional information is available in the welcome issue.

Caution

This repository for aws-nuke is no longer being actively maintained. We recommend that users switch to the actively maintained fork of this project at ekristen/aws-nuke.
We appreciate all the support and contributions we've received throughout the life of this project. We believe that the fork will continue to provide the functionality and support that you have come to expect from aws-nuke.
Please note that this deprecation means we will not be addressing issues, accepting pull requests, or making future releases from this repository.
Thank you for your understanding and support.

@ekristen ekristen closed this Oct 2, 2024