This repository has been archived by the owner on Oct 15, 2024. It is now read-only.

Add Transcribe service support #1102

Closed
wants to merge 13 commits

Conversation

danarbaugh

This adds support for all 8 resource types within the AWS Transcribe service: call analytics categories, call analytics jobs, custom language models, medical transcription jobs, medical vocabularies, transcription jobs, custom vocabularies, and vocabulary filters.

Testing

The following test script can be used to create one of each resource type:

#!/bin/bash
set -ex

# Generate a random string to use as a bucket name and classifier name suffix
RANDOM_STRING=$(openssl rand -hex 20)
# Generate a random string for shorter names
SHORT_RANDOM_STRING=$(openssl rand -hex 10)

# Set your preferred bucket names
INPUT_BUCKET="input-bucket-$RANDOM_STRING"
OUTPUT_BUCKET="output-bucket-$RANDOM_STRING"

# Get AWS account ID
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
echo "AWS Account ID: $AWS_ACCOUNT_ID"

# Create input bucket
aws s3api create-bucket --bucket $INPUT_BUCKET --no-cli-pager
echo "Input bucket created: s3://$INPUT_BUCKET"

# Create output bucket
aws s3api create-bucket --bucket $OUTPUT_BUCKET --no-cli-pager
echo "Output bucket created: s3://$OUTPUT_BUCKET"

# Create IAM Role for Transcribe access
ROLE_NAME="transcribe-access-role-$RANDOM_STRING"
aws iam create-role \
  --role-name $ROLE_NAME \
  --no-cli-pager \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Service": "transcribe.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  }'

# Attach AmazonS3FullAccess managed policy to IAM role (for demo purposes)
aws iam attach-role-policy \
  --role-name $ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

echo 'Waiting 30 seconds for IAM role to propagate...'
sleep 30
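## Optional check (not part of the original test script): confirm the role is
## visible before it is passed to Transcribe.
aws iam get-role --role-name $ROLE_NAME --query Role.Arn --output text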

cat <<EOF > test-category.json
{
  "CategoryName": "test-category-$SHORT_RANDOM_STRING",
  "Rules": [
    {
      "InterruptionFilter": {
        "AbsoluteTimeRange": {
          "First": 60000
        },
        "Negate": false,
        "ParticipantRole": "CUSTOMER",
        "Threshold": 10000
      }
    },
    {
      "NonTalkTimeFilter": {
        "Negate": false,
        "RelativeTimeRange": {
          "EndPercentage": 80,
          "StartPercentage": 10
        },
        "Threshold": 20000
      }
    },
    {
      "SentimentFilter": {
        "ParticipantRole": "AGENT",
        "Sentiments": [
          "NEGATIVE"
        ]
      }
    },
    {
      "TranscriptFilter": {
        "Negate": true,
        "AbsoluteTimeRange": {
          "First": 10000
        },
        "Targets": [
          "welcome",
          "hello"
        ],
        "TranscriptFilterType": "EXACT"
      }
    }
  ]
}
EOF

# create call analytics category
aws transcribe create-call-analytics-category \
  --cli-input-json file://test-category.json \
  --no-cli-pager
# remove temp json file
rm test-category.json
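## Optional check (not part of the original test script): confirm the category
## was created.
aws transcribe get-call-analytics-category \
  --category-name "test-category-$SHORT_RANDOM_STRING" \
  --no-cli-pager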

# start call analytics job
## get a sample wav to use
wget https://www2.cs.uic.edu/~i101/SoundFiles/preamble.wav
aws s3 cp preamble.wav s3://$INPUT_BUCKET/
## start the job
aws transcribe start-call-analytics-job \
  --call-analytics-job-name "test-job-$SHORT_RANDOM_STRING" \
  --media "MediaFileUri=s3://$INPUT_BUCKET/preamble.wav" \
  --data-access-role-arn arn:aws:iam::$AWS_ACCOUNT_ID:role/$ROLE_NAME \
  --channel-definitions ChannelId=0,ParticipantRole=AGENT ChannelId=1,ParticipantRole=CUSTOMER \
  --no-cli-pager
# remove temp sound file(s)
rm preamble.wav*
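## Optional check (not part of the original test script): the job runs
## asynchronously, so this only confirms it was accepted and shows its status.
aws transcribe get-call-analytics-job \
  --call-analytics-job-name "test-job-$SHORT_RANDOM_STRING" \
  --no-cli-pager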

# create a language model
## make placeholder training data in the input bucket
cat <<EOF > words.txt
The quick brown fox jumps over the lazy dog
EOF
aws s3 cp words.txt s3://$INPUT_BUCKET/my-clm-training-data/
## create the language model
aws transcribe create-language-model \
  --base-model-name NarrowBand \
  --model-name my-first-language-model-$SHORT_RANDOM_STRING \
  --input-data-config S3Uri=s3://$INPUT_BUCKET/my-clm-training-data/,DataAccessRoleArn=arn:aws:iam::$AWS_ACCOUNT_ID:role/$ROLE_NAME \
  --language-code en-US \
  --no-cli-pager
# remove temp words file
rm words.txt
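## Optional check (not part of the original test script): model training is
## asynchronous; this shows the model and its current status.
aws transcribe describe-language-model \
  --model-name my-first-language-model-$SHORT_RANDOM_STRING \
  --no-cli-pager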

# start medical transcription job
## get a sample sound file to use
wget https://www.hpisum.com/samples/ESL-Cardio-sample.wav
aws s3 cp ESL-Cardio-sample.wav s3://$INPUT_BUCKET/
## start the job
aws transcribe start-medical-transcription-job \
  --medical-transcription-job-name "test-med-job-$SHORT_RANDOM_STRING" \
  --language-code en-US \
  --media "MediaFileUri=s3://$INPUT_BUCKET/ESL-Cardio-sample.wav" \
  --output-bucket-name $OUTPUT_BUCKET \
  --specialty PRIMARYCARE \
  --type DICTATION \
  --no-cli-pager
# remove temp sound file(s)
rm ESL-Cardio-sample.wav*
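## Optional check (not part of the original test script): show the medical
## transcription job's status.
aws transcribe get-medical-transcription-job \
  --medical-transcription-job-name "test-med-job-$SHORT_RANDOM_STRING" \
  --no-cli-pager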

# create medical vocabulary
## create a text file with the vocabulary
cat <<EOF > my-medical-vocab.txt
Phrase,SoundsLike,IPA,DisplayAs
Los-Angeles,,l ɔ s æ n ʤ ə l ə s,Los Angeles
Eva-Maria,ay-va-ma-ree-ah,,
A.B.C.-s,ay-bee-sees,,ABCs
Amazon-dot-com,,,Amazon.com
C.L.I.,,s i ɛ l aɪ,CLI
Andorra-la-Vella,ann-do-rah-la-bay-ah,,Andorra la Vella
Dynamo-D.B.,,,DynamoDB
V.X.-zero-two,,,VX02
V.X.-zero-two-Q.,,,VX02Q
EOF
aws s3 cp my-medical-vocab.txt s3://$INPUT_BUCKET/
## create the vocabulary
aws transcribe create-medical-vocabulary \
  --vocabulary-name "test-med-vocab-$SHORT_RANDOM_STRING" \
  --language-code en-US \
  --vocabulary-file-uri "s3://$INPUT_BUCKET/my-medical-vocab.txt" \
  --no-cli-pager
# remove temp medical vocab file
rm my-medical-vocab.txt
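## Optional check (not part of the original test script): show the medical
## vocabulary's state.
aws transcribe get-medical-vocabulary \
  --vocabulary-name "test-med-vocab-$SHORT_RANDOM_STRING" \
  --no-cli-pager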

# start transcription job
aws transcribe start-transcription-job \
  --transcription-job-name "test-job-$SHORT_RANDOM_STRING" \
  --media "MediaFileUri=s3://$INPUT_BUCKET/preamble.wav" \
  --language-code en-US \
  --no-cli-pager
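## Optional check (not part of the original test script): show the standard
## transcription job's status.
aws transcribe get-transcription-job \
  --transcription-job-name "test-job-$SHORT_RANDOM_STRING" \
  --no-cli-pager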

# create a vocabulary
aws transcribe create-vocabulary \
  --vocabulary-name "test-vocab-$SHORT_RANDOM_STRING" \
  --language-code en-US \
  --phrases "hello" "world" "how are you" \
  --no-cli-pager
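## Optional check (not part of the original test script): show the vocabulary's state.
aws transcribe get-vocabulary \
  --vocabulary-name "test-vocab-$SHORT_RANDOM_STRING" \
  --no-cli-pager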

# create a vocabulary filter
aws transcribe create-vocabulary-filter \
  --vocabulary-filter-name "test-vocab-filter-$SHORT_RANDOM_STRING" \
  --language-code en-US \
  --words "hello" "world" "how" "are" "you" \
  --no-cli-pager
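## Optional check (not part of the original test script): show the vocabulary filter.
aws transcribe get-vocabulary-filter \
  --vocabulary-filter-name "test-vocab-filter-$SHORT_RANDOM_STRING" \
  --no-cli-pager

## To exercise the new handlers with aws-nuke itself, a minimal config can be
## generated and run against the test account. This is a sketch only: the
## Transcribe* resource type names below are assumptions based on the usual
## naming convention and may not match the identifiers added in this PR, and
## the blocklist entry is a placeholder.
cat <<EOF > transcribe-nuke-test.yml
regions:
  - us-east-1
account-blocklist:
  - "999999999999"
accounts:
  "$AWS_ACCOUNT_ID": {}
resource-types:
  targets:
    - TranscribeCallAnalyticsCategory
    - TranscribeCallAnalyticsJob
    - TranscribeLanguageModel
    - TranscribeMedicalTranscriptionJob
    - TranscribeMedicalVocabulary
    - TranscribeTranscriptionJob
    - TranscribeVocabulary
    - TranscribeVocabularyFilter
EOF
## aws-nuke defaults to a dry run; pass --no-dry-run only after reviewing the plan.
aws-nuke -c transcribe-nuke-test.yml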

@danarbaugh danarbaugh requested a review from a team as a code owner September 8, 2023 22:07
@ekristen (Contributor) commented Oct 2, 2024

This is being implemented via ekristen/aws-nuke#359

If you have a chance, please check it out and let us know if you run into any issues by opening an issue on the fork.


Please see the copy of the notice from the README about the deprecation of this project. Sven was kind enough to grant me access to help triage and close issues and pull requests that have already been addressed in the actively maintained fork. Additional information is available in the welcome issue.

Caution

This repository for aws-nuke is no longer being actively maintained. We recommend that users switch to the actively maintained fork of this project at ekristen/aws-nuke.
We appreciate all the support and contributions we've received throughout the life of this project. We believe that the fork will continue to provide the functionality and support that you have come to expect from aws-nuke.
Please note that this deprecation means we will not be addressing issues, accepting pull requests, or making future releases from this repository.
Thank you for your understanding and support.

@ekristen ekristen closed this Oct 2, 2024