Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve overloaded ACLs in Postgres leading to "row is too big" error #2650

Open
blarghmatey opened this issue Sep 4, 2024 · 4 comments
Open
Assignees
Labels
bug Something isn't working product:infrastructure Issues related to application and operations infrastructure

Comments

@blarghmatey
Copy link
Member

Description/Context

https://dba.stackexchange.com/questions/341796/how-to-work-around-row-is-too-big-for-role-definitions-without-recreating-the

Plan/Design

After lots of digging and poking, this is the set of queries that I would like to test out for repairing our handicapped databases.

-- Explicitly remove any readonly user ACLs from the relacl row for every table
do $$
declare
stmt TEXT;
r record;
begin
for r in select relname from pg_class where relkind in ('r', 'S') and relnamespace = (select oid from pg_namespace where nspname = 'public') loop
stmt := 'update pg_catalog set relacl = array_agg(trimmed_acl) from unnest((select relacl from pg_class where relname = ''' || quote_ident(r.relname) || ''')) as trimmed_acl where trimmed_acl not in (select (''"'' || rolname || ''"=r/mitxonline'')::aclitem from pg_roles where rolname like ''%readonly%'') where relname = ''' || quote_ident(r.relname) || ''';';
EXECUTE stmt;
end loop;
end; $$
;

-- Manage shdepend entries for readonly
delete from pg_shdepend where refobjid in (select oid from pg_roles where rolname like '%readonly%');

Relevant discussion threads that I've found:

This is still untested, but I think the safest route is to snapshot one of the problematic DBs, spin up an instance from the snapshot, and try out those queries to see if the issue persists.

To verify before and after, you can run

-- Drop all readonly users from the database
do $$
declare
stmt TEXT;
u record;
begin
for u in select usename from pg_catalog.pg_user where usename like '%readonly%' loop
stmt := 'REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM "' || u.usename || '";DROP ROLE "' || u.usename || '";';
EXECUTE stmt;
end loop;
end; $$
;

Prior to the fix it will error out with the row is too big problem, and with any luck it should work afterwards.

@blarghmatey blarghmatey added bug Something isn't working product:infrastructure Issues related to application and operations infrastructure labels Sep 4, 2024
@quazi-h quazi-h self-assigned this Sep 4, 2024
@quazi-h
Copy link
Contributor

quazi-h commented Sep 4, 2024

While I was able to get a snapshot created pretty quickly, I've been having trouble trying to export an RDS snapshot to S3.
I selected AWSServiceRoleForRDS for the IAM role for the export but I keep seeing this error:
The principal export.rds.amazonaws.com isn't allowed to assume the IAM role arn:aws:iam::610119931565:role/aws-service-role/rds.amazonaws.com/AWSServiceRoleForRDS or the IAM role arn:aws:iam::610119931565:role/aws-service-role/rds.amazonaws.com/AWSServiceRoleForRDS doesn't exist.
I saw a suggestion online to add export.rds.amazonaws.com to the allowed assumed roles under Trust relationships but I don't think I have the permissions to do that.

@blarghmatey
Copy link
Member Author

There's no need to export the snapshot to S3. You will want to launch a new RDS instance from the snapshot so that we can replicate the permisisons bug in a separate environment that won't impact any of the running applications.

@quazi-h
Copy link
Contributor

quazi-h commented Sep 5, 2024

I was able to spin up and connect an RDS instance from the snapshot: https://us-east-1.console.aws.amazon.com/rds/home?region=us-east-1#database:id=testing-overloaded-acls;is-cluster=false

I was unable to delete from the dependencies tables (pg_depends / pg_shdepends) and unsuccessful in finding a workaround using existing roles. The pg_catalog tables are owned by rdsadmin which is a protected role. Reassigning and dropping ownership was not permitted.

ERROR:  permission denied to reassign objects
ERROR:  permission denied to drop objects

The drop role command also fails:

ERROR:  role "v-aws-conc-readonly-020Dt0NfQ05nF5bmLSem-1688042833" cannot be dropped because some objects depend on it
DETAIL:  255 objects in database mitxonline

@blarghmatey
Copy link
Member Author

The database surgery approach isn't going to work due to restrictions in RDS and concerns around stability. Instead we will need to go the "export and load" approach to rebuild the databases without all of the current ACL assignments. To that end, we need to identify all of the database instances that are suffering from this problem and determine which tables are affected. We may be able to only dump/load specific tables rather than the entire DB.

@blarghmatey blarghmatey assigned quazi-h and unassigned quazi-h Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working product:infrastructure Issues related to application and operations infrastructure
Projects
None yet
Development

No branches or pull requests

2 participants