Bug: Implement Logic to Handle Result Size and Item Limits in Scripts #153

Open
jonnybottles opened this issue Nov 27, 2024 · 0 comments
Labels
status/backlog In backlog / validated type/bug Non-urgent code defect

Comments

@jonnybottles
Collaborator

What happened?

The Search-UnifiedAuditLog cmdlet caps how many records a query can return: at most 5,000 per call through ResultSize, and at most 50,000 per session when paging with SessionCommand ReturnLargeSet. When a query matches more records than the cap, the remaining records are not returned and no error is raised. Scripts that retrieve large datasets, such as Search-HawkTenantEXOAuditLog.ps1 and Get-HawkUserMailboxAuditing.ps1, can hit this limit and miss relevant audit log entries, which undermines the accuracy and completeness of Hawk's analysis.

Steps to Reproduce

  1. Run the Search-HawkTenantEXOAuditLog.ps1 or Get-HawkUserMailboxAuditing.ps1 script in an environment with extensive audit logs.
  2. Observe that the script retrieves only a subset of the expected data.
  3. Note that the maximum result size of Search-UnifiedAuditLog has been reached.
  4. Data beyond the maximum limit is not retrieved, leading to incomplete results.

Hawk Version

3.1.0

Technical Analysis

  • The Search-UnifiedAuditLog cmdlet limits the number of results per query (up to 5,000 per call, and up to 50,000 per paged session with ReturnLargeSet).
  • Scripts that request large date ranges or have high activity levels may exceed this limit.
  • Without logic to handle this limitation, the scripts retrieve only the first set of results up to the maximum limit.
  • Users are not warned about incomplete data retrieval, leading to potential inaccuracies in analysis.
  • Manually adjusting query parameters is not user-friendly and may not be practical.
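
The missing warning described above can be illustrated with a minimal sketch. It is written in Python purely to show the logic, not as Hawk code: `check_truncation` and `MAX_RESULTS` are hypothetical names, and the default of 5,000 is an assumed per-call cap that should be replaced with whichever limit applies to the query being run.

```python
import warnings

# Assumed cap for illustration only; substitute the limit that applies.
MAX_RESULTS = 5000


def check_truncation(records, limit=MAX_RESULTS):
    """Return True, and warn, when a result set is as large as the cap.

    A result set that fills the cap is treated as possibly truncated,
    because the caller cannot tell whether more records matched.
    """
    truncated = len(records) >= limit
    if truncated:
        warnings.warn(
            f"Query returned {len(records)} records (limit {limit}); "
            "results may be incomplete."
        )
    return truncated
```

A full result set does not prove that data was lost, but it is the only signal available from the count alone, so the sketch errs on the side of warning.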

Implementation Plan

  1. Identify Affected Scripts:

    • Focus on scripts that may retrieve large datasets, specifically:
      • Search-HawkTenantEXOAuditLog.ps1
      • Get-HawkUserMailboxAuditing.ps1
      • Any other scripts that utilize Search-UnifiedAuditLog and can return large result sets.
  2. Implement Time Interval Breakdown:

    • Modify the scripts to break down large queries into smaller time intervals that ensure the result size stays below the maximum limit.
    • Determine the optimal time interval (e.g., days, hours) based on the expected volume of data.
    • Implement a loop that adjusts the StartDate and EndDate parameters incrementally to cover the entire desired date range.
  3. Use Pagination:

    • Use the cmdlet's paging parameters, SessionId and SessionCommand (for example, ReturnLargeSet), to page through results within a session.
    • Implement logic that keeps requesting pages until no further results are returned.
  4. Implement Result Size Checks:

    • After each query, check whether the number of results has reached the maximum limit; a full result set means the interval may have been truncated.
    • If so, split the time interval into smaller intervals and re-query so that no data is missed.
  5. Optimize Performance:

    • Avoid excessive API calls by calculating the appropriate time intervals based on previous query results.
    • Implement asynchronous processing if possible to improve execution time.
  6. Maintain Data Integrity:

    • Ensure that data from multiple queries is combined without duplication.
    • Handle overlapping time intervals carefully to avoid missing or duplicating records.
  7. Update User Feedback:

    • Provide progress updates to the user during execution.
    • Inform users if large datasets are detected and how the script is handling them.
  8. Update Unit Tests:

    • Create tests that simulate large datasets to verify that the scripts handle result size limits correctly.
    • Ensure that the scripts retrieve complete datasets without errors.
  9. Documentation:

    • Update script comments and documentation to explain how the scripts handle large datasets.
    • Provide guidance to users on expected execution times for large data volumes.
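
Steps 2, 4, and 6 of the plan (interval breakdown, result size checks, and de-duplication) can be sketched together as below. This is a Python illustration of the control flow under stated assumptions, not the Hawk implementation: `collect_all` is a hypothetical helper, `query(start, end)` stands in for a call to Search-UnifiedAuditLog, and each record is assumed to be a dict with a unique `"id"` and a `"time"` field.

```python
from datetime import datetime, timedelta


def collect_all(query, start, end, limit, min_window=timedelta(seconds=1)):
    """Collect every record between start and end, splitting full windows.

    query(lo, hi) is a stand-in for the real search call. Any window whose
    result count reaches `limit` is halved and re-queried. Records are
    de-duplicated by their (assumed) unique "id" key, so overlapping or
    inclusive window boundaries do not produce duplicates.

    Returns (records sorted by time, windows that could not be split).
    """
    seen = {}
    incomplete = []
    pending = [(start, end)]
    while pending:
        lo, hi = pending.pop()
        batch = query(lo, hi)
        if len(batch) >= limit:
            if hi - lo > min_window:
                # Window may be truncated: halve it and query both halves.
                mid = lo + (hi - lo) / 2
                pending.append((mid, hi))
                pending.append((lo, mid))
                continue
            # Too small to split further; keep the data but report it.
            incomplete.append((lo, hi))
        for record in batch:
            seen[record["id"]] = record
    records = sorted(seen.values(), key=lambda r: r["time"])
    return records, incomplete
```

Halving only the windows that fill the cap keeps the number of calls proportional to the data volume rather than to the length of the date range. The second return value lets the caller tell the user which windows, if any, may still be incomplete.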

Acceptance Criteria

  • Scripts retrieve complete datasets without hitting result size limits.
  • Data integrity is maintained across multiple query intervals.
  • Scripts perform efficiently without significant delays.
  • Users are not required to manually adjust query parameters.
  • Unit tests are updated to reflect changes and pass successfully.

Additional Notes

  • Testing:

    • Test the scripts in environments with varying sizes of audit logs.
    • Simulate scenarios where the result size limit would be exceeded.
    • Measure execution time and optimize where possible.
  • Dependencies:

    • Confirm the behavior and limits of the Search-UnifiedAuditLog paging parameters (SessionId, SessionCommand) in the module versions Hawk targets.
    • Ensure compatibility with existing modules and versions.
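
One way to simulate the result size limit without a large tenant is a capped stand-in for the search call. The sketch below is in Python for illustration only, and every name in it (`make_events`, `make_capped_search`, the `"id"` and `"time"` fields) is hypothetical and not part of Hawk or of the cmdlet.

```python
from datetime import datetime, timedelta


def make_events(count, start, step=timedelta(minutes=1)):
    """Generate `count` fake audit records spaced `step` apart."""
    return [{"id": i, "time": start + i * step} for i in range(count)]


def make_capped_search(events, cap):
    """Build a fake search that returns at most `cap` records per call.

    The returned function silently drops anything past the cap, which
    reproduces the truncation a test needs to exercise.
    """
    def search(start, end):
        matches = [e for e in events if start <= e["time"] < end]
        return matches[:cap]
    return search
```

A test can then assert that a single wide query against the fake returns only `cap` records, confirming that the scenario reproduces the bug, before asserting that the fixed script logic recovers the full set.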