S3 data remains when deleting datasource. #982

Open
JooyoungJeong opened this issue Oct 15, 2019 · 3 comments

Comments

@JooyoungJeong

Hi.
I installed metering using release-4.2.
Hive storage is configured as s3Compatible.

apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  disableOCPFeatures: true
  reporting-operator:
    spec:
      config:
        prometheus:
          # update this field
          url: "<IP>"
  hive:
    
  storage:
    type: "hive"
    hive:
      type: "s3Compatible"
      s3Compatible:
        bucket: "metering"
        secretName: "my-aws-secret"
        createBucket: false
        endpoint: "<IP>"
---
apiVersion: metering.openshift.io/v1
kind: ReportDataSource
metadata:
  name: mlp-test-gpu-datasource
  namespace: metering
spec:
  prometheusMetricsImporter:
    query: |
      metering:mlp_gpu_requests_slots:sum

I created a datasource and confirmed that its data was stored in the S3 bucket. I then deleted the datasource. The Hive table was dropped, but the data in S3 was not deleted.

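import boto3
client = boto3.client("s3", endpoint_url="<IP>")  # assumed: same endpoint as in the MeteringConfig
for obj in client.list_objects_v2(Bucket="metering", Prefix="metering.db/").get("Contents", []):
    print(obj["Key"])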

metering.db/datasource_metering_mlp_test_gpu_datasource/dt=2019-10-14/20191014_120145_00422_hpwrj_fc1d84f3-536e-4a86-9097-2c41b4935e49.snappy
metering.db/datasource_metering_mlp_test_gpu_datasource/dt=2019-10-14/20191014_120157_00424_hpwrj_18644871-9f4d-4781-93a8-374aef4a67a7.snappy

Can I delete the remaining data in S3 manually?

Thank you

@chancez
Contributor

chancez commented Oct 15, 2019

We don't use finalizers yet, so if the pods are deleted while the datasource is being deleted, the data may not be cleaned up. That said, deleting a datasource you created drops its table, and dropping the table should generally delete the data as well.

@chancez
Contributor

chancez commented Oct 15, 2019

If the datasource has already been deleted, you can clean up the data manually; that should be fine. You can also drop the table from within Presto or Hive, which will do the same.
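
For the manual cleanup, a minimal sketch along these lines might work, assuming a boto3 S3 client pointed at the same S3-compatible endpoint. The helper name delete_prefix is hypothetical and not part of metering; the bucket and prefix in the usage comment are taken from the listing earlier in this issue. It pages through the keys under a prefix and removes them in batches:

```python
def delete_prefix(client, bucket, prefix):
    """Delete every object under `prefix` in `bucket`; return the number of keys deleted.

    `client` is expected to behave like a boto3 S3 client.
    """
    deleted = 0
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page without "Contents" means no keys matched.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        # list_objects_v2 returns at most 1000 keys per page, which is also
        # the per-request limit of delete_objects.
        response = client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
        errors = response.get("Errors", [])
        if errors:
            raise RuntimeError(f"failed to delete {len(errors)} object(s): {errors}")
        deleted += len(keys)
    return deleted


# Example usage (the endpoint is a placeholder, as in the MeteringConfig above):
#   import boto3
#   client = boto3.client("s3", endpoint_url="<IP>")
#   delete_prefix(client, "metering",
#                 "metering.db/datasource_metering_mlp_test_gpu_datasource/")
```

It is probably safest to run this only after confirming that the table no longer exists in Hive, and to keep the prefix scoped to a single datasource directory so that data belonging to other tables is not touched.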

@JooyoungJeong
Author

@chancez
Thank you for your feedback.
When I deleted the datasource, the Hive table was deleted. However, the data in the S3 bucket remained, so I deleted it manually.
Thank you
