Bug: Implement Logic to Handle Result Size and Item Limits in Scripts #153

jonnybottles · 2024-11-27T16:50:20Z

What happened?

The Search-UnifiedAuditLog cmdlet caps the number of results a query can return (e.g., 50,000 items). When a query matches more records than the cap allows, retrieval stops at the cap and the remaining records are never returned. Scripts that retrieve large datasets, such as Search-HawkTenantEXOAuditLog.ps1 and Get-HawkUserMailboxAuditing.ps1, may hit this limit and fail to capture all relevant audit log entries, which undermines the accuracy and completeness of the analysis Hawk provides.

Steps to Reproduce

Run the Search-HawkTenantEXOAuditLog.ps1 or Get-HawkUserMailboxAuditing.ps1 script in an environment with extensive audit logs.
Observe that the script retrieves only a subset of the expected data.
Note that the maximum result size of Search-UnifiedAuditLog has been reached.
Data beyond the maximum limit is not retrieved, leading to incomplete results.

Hawk Version

3.1.0

Technical Analysis

The Search-UnifiedAuditLog cmdlet has a maximum result size limit (e.g., 50,000 items) per query.
Scripts that request large date ranges or have high activity levels may exceed this limit.
Without logic to handle this limitation, the scripts retrieve only the first set of results up to the maximum limit.
Users are not warned about incomplete data retrieval, leading to potential inaccuracies in analysis.
Manually adjusting query parameters is not user-friendly and may not be practical.

Implementation Plan

Identify Affected Scripts:
- Focus on scripts that may retrieve large datasets, specifically:
  - Search-HawkTenantEXOAuditLog.ps1
  - Get-HawkUserMailboxAuditing.ps1
  - Any other scripts that utilize Search-UnifiedAuditLog and can return large result sets.
Implement Time Interval Breakdown:
- Modify the scripts to break down large queries into smaller time intervals that ensure the result size stays below the maximum limit.
- Determine the optimal time interval (e.g., days, hours) based on the expected volume of data.
- Implement a loop that adjusts the StartDate and EndDate parameters incrementally to cover the entire desired date range.
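
The incremental StartDate/EndDate loop described above can be sketched as follows. This is a language-neutral illustration in Python, not Hawk's PowerShell code; `split_range` is a hypothetical helper name, and the one-day step is only an example value. It assumes windows are treated as half-open (start inclusive, end exclusive); whether Search-UnifiedAuditLog treats EndDate that way would need to be confirmed.

```python
from datetime import datetime, timedelta


def split_range(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering [start, end).

    Each window's end is the next window's start, so the windows leave no gaps.
    The final window is shortened if the range is not a multiple of `step`.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


# Example: a three-day range queried one day at a time.
windows = list(split_range(datetime(2024, 11, 1), datetime(2024, 11, 4), timedelta(days=1)))
```

In a PowerShell implementation, each pair would presumably become the StartDate and EndDate of one Search-UnifiedAuditLog call.
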
Use Pagination:
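- Utilize the cmdlet's session-based paging parameters, SessionId and SessionCommand (ReturnLargeSet or ReturnNextPreviewPage), repeating the call with the same SessionId until no further results are returned.
- Each call returns at most the ResultSize value (up to 5,000 records), and a ReturnLargeSet session is itself capped at 50,000 records, so paging must be combined with the time interval breakdown rather than used on its own.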
Implement Result Size Checks:
- After each query, check if the number of results is close to the maximum limit.
- If so, reduce the time interval and re-query to ensure no data is missed.
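
One way to read the check-and-requery step is as a recursive split: if a window comes back full, halve it and query each half. The sketch below is a Python illustration under stated assumptions, not the Hawk implementation. `fetch_all`, `fake_query`, and `EVENTS` are hypothetical names, the cap of 30 stands in for the real limit so the example stays small, and a count that reaches the cap is treated as the signal that data may have been cut off.

```python
from datetime import datetime, timedelta

# Simulated audit log: one event per minute for 100 minutes (illustrative data only).
EVENTS = [datetime(2024, 11, 1) + timedelta(minutes=i) for i in range(100)]
CAP = 30  # stand-in for the real per-query limit


def fake_query(start, end):
    """Stand-in for the audit log query: returns at most CAP events in [start, end)."""
    hits = [e for e in EVENTS if start <= e < end]
    return hits[:CAP]


def fetch_all(query, start, end, limit, min_span=timedelta(minutes=1)):
    """Collect all records in [start, end), halving any window that reaches the limit."""
    records = query(start, end)
    if len(records) < limit:
        return records  # below the cap, so this window is complete
    if end - start <= min_span:
        # Splitting further is pointless; surface the problem instead of losing data.
        raise RuntimeError(f"window {start} to {end} still reaches the limit")
    midpoint = start + (end - start) / 2
    return (fetch_all(query, start, midpoint, limit, min_span)
            + fetch_all(query, midpoint, end, limit, min_span))


collected = fetch_all(fake_query, datetime(2024, 11, 1), datetime(2024, 11, 2), CAP)
```

Halving only the windows that fill up keeps the number of queries low for quiet periods while still narrowing in on busy ones.
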
Optimize Performance:
- Avoid excessive API calls by calculating the appropriate time intervals based on previous query results.
- Implement asynchronous processing if possible to improve execution time.
Maintain Data Integrity:
- Ensure that data from multiple queries is combined without duplication.
- Handle overlapping time intervals carefully to avoid missing or duplicating records.
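
Combining batches without duplication could look like the sketch below, again in Python as an illustration rather than Hawk's code. It assumes every record carries a unique identifier; the field name `Identity` and the helper name `merge_unique` are assumptions made for this example and should be checked against the actual output objects.

```python
def merge_unique(batches, key="Identity"):
    """Merge batches of records, keeping the first occurrence of each identifier."""
    seen = set()
    merged = []
    for batch in batches:
        for record in batch:
            record_id = record[key]
            if record_id not in seen:
                seen.add(record_id)
                merged.append(record)
    return merged


# Two windows that overlap on record "b" (illustrative data only).
first_window = [{"Identity": "a"}, {"Identity": "b"}]
second_window = [{"Identity": "b"}, {"Identity": "c"}]
merged = merge_unique([first_window, second_window])
```

Deduplicating on an identifier makes the result independent of whether adjacent windows share a boundary record.
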
Update User Feedback:
- Provide progress updates to the user during execution.
- Inform users if large datasets are detected and how the script is handling them.
Update Unit Tests:
- Create tests that simulate large datasets to verify that the scripts handle result size limits correctly.
- Ensure that the scripts retrieve complete datasets without errors.
Documentation:
- Update script comments and documentation to explain how the scripts handle large datasets.
- Provide guidance to users on expected execution times for large data volumes.

Acceptance Criteria

Scripts retrieve complete datasets without hitting result size limits.
Data integrity is maintained across multiple query intervals.
Scripts perform efficiently without significant delays.
Users are not required to manually adjust query parameters.
Unit tests are updated to reflect changes and pass successfully.

Additional Notes:

Testing:
- Test the scripts in environments with varying sizes of audit logs.
- Simulate scenarios where the result size limit would be exceeded.
- Measure execution time and optimize where possible.
Dependencies:
- Confirm that Search-UnifiedAuditLog supports any pagination or session parameters used.
- Ensure compatibility with existing modules and versions.


jonnybottles added the type/bug (Non-urgent code defect) and status/backlog (In backlog / validated) labels · Nov 27, 2024

This was referenced Nov 27, 2024:
- Need to deal with the 50k Item limit #22 (Closed)
- Search-HawkTenantEXOAuditLog Search-AdminAuditLog limit #93 (Closed)

T0pCyber added this to the Capability Enhancement Phase (5-8 weeks) milestone Nov 27, 2024
