Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[x-pack][metricbeat][iis] improve error handling on pdh counters in application_pool data stream #42274

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

stefans-elastic
Copy link
Contributor

@stefans-elastic stefans-elastic commented Jan 9, 2025

Proposed commit message

The error occurs initAppPools > calls GetCounterPaths > calls ExpandWildCardPath > calls PdhExpandWildCardPath which returns no data and no error > PdhExpandWildCardPathreturns PdhErrno(syscall.ERROR_NOT_FOUND) error here > the error returns all the way back to initAppPools and since the error type isn't part of this condition an error log appears (logged here).

On why PDH's function returns no data and no error - I didn't find a coverage of this behavior in the docs. My guess is that object/counter exists but has no data in it (for w3wp process).
The fix I suggest is simply adding a check for PdhErrno(syscall.ERROR_NOT_FOUND) error in this condition which prevents those errors causing unnecessary error level log entries.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Screenshots

Before the change (Element not found.failed to expand counter path errors show up in the logs)
Screenshot 2025-01-09 at 2 14 55 PM

After the change (NO Element not found.failed to expand counter path errors show up in the logs)
Screenshot 2025-01-09 at 2 03 47 PM

@stefans-elastic stefans-elastic self-assigned this Jan 9, 2025
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jan 9, 2025
@stefans-elastic stefans-elastic added Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team bugfix and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jan 9, 2025
Copy link
Contributor

mergify bot commented Jan 9, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @stefans-elastic? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Jan 9, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Jan 9, 2025
@stefans-elastic stefans-elastic marked this pull request as ready for review January 9, 2025 13:49
@stefans-elastic stefans-elastic requested a review from a team as a code owner January 9, 2025 13:49
@shmsr
Copy link
Member

shmsr commented Jan 9, 2025

diff --git a/x-pack/metricbeat/module/iis/application_pool/reader.go b/x-pack/metricbeat/module/iis/application_pool/reader.go
index 61522ba128..e11e1381f0 100644
--- a/x-pack/metricbeat/module/iis/application_pool/reader.go
+++ b/x-pack/metricbeat/module/iis/application_pool/reader.go
@@ -10,6 +10,7 @@ import (
 	"errors"
 	"fmt"
 	"strings"
+	"syscall"
 
 	"github.com/elastic/beats/v7/metricbeat/helper/windows/pdh"
 	"github.com/elastic/elastic-agent-libs/mapstr"
@@ -111,16 +112,26 @@ func (r *Reader) initAppPools() error {
 		r.log.Info("no running application pools found")
 		return nil
 	}
+
+	isPDHError := func(err error) bool {
+		return errors.Is(err, pdh.PdhErrno(syscall.ERROR_NOT_FOUND)) ||
+			errors.Is(err, pdh.PDH_CSTATUS_NO_COUNTER) ||
+			errors.Is(err, pdh.PDH_CSTATUS_NO_COUNTERNAME) ||
+			errors.Is(err, pdh.PDH_CSTATUS_NO_INSTANCE) ||
+			errors.Is(err, pdh.PDH_CSTATUS_NO_OBJECT)
+	}
+
 	var newQueries []string
 	r.workerProcesses = make(map[string]string)
 	for key, value := range appPoolCounters {
 		childQueries, err := r.query.GetCounterPaths(value)
 		if err != nil {
-			if errors.Is(err, pdh.PDH_CSTATUS_NO_COUNTER) || errors.Is(err, pdh.PDH_CSTATUS_NO_COUNTERNAME) || errors.Is(err, pdh.PDH_CSTATUS_NO_INSTANCE) || errors.Is(err, pdh.PDH_CSTATUS_NO_OBJECT) {
+			if isPDHError(err) {
 				r.log.Infow("Ignoring non existent counter", "error", err,
-					logp.Namespace("application pool"), "query", value)
+					logp.Namespace("application pool"), "query", value,
+				)
 			} else {
-				r.log.Error(err, `failed to expand counter path (query= "%v")`, value)
+				r.log.Errorf(`failed to expand counter path (query= "%v"): %w`, value, err)
 			}
 			continue
 		}

Your changes look but I do have some suggestion to make the change more readable. Also, the log "failed to expand counter path" has a bug i.e., it does have format parameter that accepts %v; now it will. I have fixed it, so could you please make that change as well?

@shmsr
Copy link
Member

shmsr commented Jan 9, 2025

^ Just one comment; rest looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify bugfix Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[IIS] [application_pool] Element Not Found for Application Pools with 'No Managed Code'
2 participants