feat(142): separate out metrics and restructure go routines and functions #137
Conversation
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #137       +/-   ##
===========================================
+ Coverage   27.08%   39.34%   +12.25%
===========================================
  Files           5        6        +1
  Lines         443      488       +45
===========================================
+ Hits          120      192       +72
+ Misses        316      285       -31
- Partials        7       11        +4
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Ryan Hoofard <42755382+rhoofard@users.noreply.github.com>
Also cleaned up some of the redundant interface conversions for SearchNode, as well as what looks like a bug when collecting all the repo data: `data` was being set on each iteration through the pages of repo data, but `searchData` was being used.
```go
	ghs *githubScraper,
	ctx context.Context,
	client graphql.Client,
	repos []SearchNodeRepository,
	now pcommon.Timestamp,
	pullRequestCh chan []PullRequestNode,
```
Does this pull request channel actually need to be passed in? Or can it be declared locally within the function, thereby making the call within the function the async call?
I'm thinking these can be changed iteratively
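For illustration, a minimal sketch of the pattern being suggested here (the signature names and types come from the diff above; the body is hypothetical): the channel is created inside the function, the page-fetching work becomes the async call, and callers never see the channel.

```go
// Hypothetical sketch: the channel is declared locally, so the function
// itself owns the goroutine instead of the caller wiring up a channel.
func (ghs *githubScraper) getPullRequests(
	ctx context.Context,
	client graphql.Client,
	repos []SearchNodeRepository,
	now pcommon.Timestamp,
) <-chan []PullRequestNode {
	pullRequestCh := make(chan []PullRequestNode)
	go func() {
		defer close(pullRequestCh) // closing lets consumers range over the channel
		for _, repo := range repos {
			// ... fetch each page for repo and send the batch on pullRequestCh ...
			_ = repo
		}
	}()
	return pullRequestCh
}
```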
```go
	ghs *githubScraper,
	ctx context.Context,
	client graphql.Client,
	repos []SearchNodeRepository,
	now pcommon.Timestamp,
	pullRequestCh chan []PullRequestNode,
	waitGroup *sync.WaitGroup,
```
Same question with `waitGroup`.
```go
	pullRequests = append(pullRequests, pr.Repository.PullRequests.Nodes...)
	// ...
	prCursor = &pr.Repository.PullRequests.PageInfo.EndCursor
	ghs.logger.Sugar().Errorf("error getting pr data", zap.Error(err))
}
pullRequestCh <- pullRequests
```
`getPullRequests` by name should return pull requests as a value, not a channel.
Wouldn't this defeat the point of the goroutines? Or do you mean keep the function basic, but then add the result to the channel in the goroutine outside the function?
I think we talked about this in the tech sync today
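As a sketch of the "return a value" shape being discussed (only the names come from the diff; the body and call site are hypothetical): the function stays synchronous, and the goroutine plus the channel send move out to the caller.

```go
// Hypothetical sketch: getPullRequests returns its results directly.
func (ghs *githubScraper) getPullRequests(
	ctx context.Context,
	client graphql.Client,
	repo SearchNodeRepository,
	now pcommon.Timestamp,
) ([]PullRequestNode, error) {
	var pullRequests []PullRequestNode
	// ... paginate through the repo's pull requests, appending to pullRequests ...
	return pullRequests, nil
}

// Call site: the goroutine wraps the basic function and feeds the channel,
// matching the "add to the channel outside the function" idea in the reply.
go func() {
	prs, err := ghs.getPullRequests(ctx, client, repo, now)
	if err != nil {
		ghs.logger.Sugar().Errorf("error getting pr data: %v", err)
		return
	}
	pullRequestCh <- prs
}()
```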
```diff
@@ -194,21 +224,16 @@ func (ghs *githubScraper) processCommits(
 func (ghs *githubScraper) getBranches(
 	ctx context.Context,
 	client graphql.Client,
-	repos []SearchNode,
+	repos []SearchNodeRepository,
 	now pcommon.Timestamp,
 	branchCh chan []BranchNode,
```
Same comment here about the channels and waitGroup being passed in instead of running an async select within the function and returning the actual branch data.
Not sure what you mean by an async select here, but I get moving the channel and wait group out.
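One plausible reading of "async select" (a sketch only, not the reviewer's confirmed intent): the goroutine consuming the channel drains it inside a `select`, so it can also react to context cancellation.

```go
// Hypothetical sketch of a select-based consumer for the branch data.
go func() {
	for {
		select {
		case branches, ok := <-branchCh:
			if !ok {
				return // channel closed: all pages have been consumed
			}
			_ = branches // ... record branch metrics here ...
		case <-ctx.Done():
			return // scrape cancelled or timed out
		}
	}
}()
```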
```diff
 	if ghs.cfg.MetricsBuilderConfig.Metrics.GitRepositoryContributorCount.Enabled {
 		wg1.Add(1)
 		go ghs.getContributorCount(ctx, genClient, work[i], now, &wg1)
 		}
 	}

 	for i := 0; i < opBuf; i++ {
-		go ghs.processPullRequests(ctx, genClient, now, pullRequestCh)
+		go processPullRequests(ghs, ctx, genClient, now, pullRequestCh)
```
These calls feel off to me and inconsistent with each other. I don't think this is actually buffering the channel. https://go.dev/doc/effective_go#channels
Just need to pass the scraper in and they'd look the same. Not sure how the channel wouldn't be buffered, though, since we are setting it to a size > 0.
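For reference, what distinguishes a buffered channel in Go (this matches the Effective Go section linked above): buffering comes solely from the capacity argument to `make`, so whether the code buffers depends on how `pullRequestCh` was created, not on how many goroutines use it.

```go
unbuffered := make(chan []PullRequestNode)      // capacity 0: every send blocks until a receiver is ready
buffered := make(chan []PullRequestNode, opBuf) // capacity opBuf: sends block only once opBuf items are queued
```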
Nice code coverage improvements here 👍🏼
```go
	data, err := getRepoDataBySearch(ctx, client, searchQuery, repoCursor)
	if err != nil {
		return nil, err
	}
	return data, nil
}

type mockClient struct {
```
Why are we providing `mockClient` to the `MakeRequest` function? This is for testing only, correct? It should go in `helpers_test.go`.
This is only for testing, yeah. I thought `helpers_test.go` was just the same thing but on the GitLab side, no?
```diff
@@ -194,21 +224,16 @@ func (ghs *githubScraper) processCommits(
 func (ghs *githubScraper) getBranches(
```
Is there a chance that this function never completes if there are no branches or page numbers?
```go
	pullRequests = append(pullRequests, pr.Repository.PullRequests.Nodes...)
	// ...
	prCursor = &pr.Repository.PullRequests.PageInfo.EndCursor
	ghs.logger.Sugar().Errorf("error getting pr data", zap.Error(err))
}
pullRequestCh <- pullRequests
```
I think there's a chance that this function blocks infinitely if there are no PRs.
Just made a test for that, and while it doesn't block, it adds an empty object to the channel. That shouldn't affect anything, since where items are processed from the channel there's a loop based on the size of the object, which would be 0. We should still probably avoid placing empty objects in there, though.
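A minimal sketch of the guard described in that reply, assuming the surrounding code from the diff above:

```go
// Only send non-empty batches; consumers then never see zero-length slices.
if len(pullRequests) > 0 {
	pullRequestCh <- pullRequests
}
```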
```diff
@@ -368,10 +389,14 @@ func (ghs *githubScraper) scrape(ctx context.Context) (pmetric.Metrics, error) {

 	for i := 0; i < pages; i++ {
 		results := searchData.GetSearch()
 		searchRepos = append(searchRepos, results.Nodes...)
 		for _, repo := range results.Nodes {
```
What's the reasoning behind this iterator over `results.Nodes`?
It's a quick way to append everything in a slice without needing a for-loop.
```diff
 		repoCursor = &searchData.Search.PageInfo.EndCursor
-		data, err = getRepoData(ctx, genClient, sq, ownertype, repoCursor)
+		searchData, err = getRepoData(ctx, genClient, sq, ownertype, repoCursor)
```
Are we overwriting the existing `searchData` value that gets used on line 391? Is this intentional?
Yeah, this section is weird; I just hadn't bothered seeing how to change it, since that's how it was when I started working on it. The first page of data is grabbed outside the for loop and used to compute the number of pages we need to iterate through, but it is also still the first page of data, so it gets processed first; then, at the end of each loop, we grab the next page to be processed on the next iteration. In the other functions like this, we have a GraphQL query that gets the count first to avoid this, so that might be an option here as well; I just haven't checked it out yet.
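A hedged sketch of the count-first option mentioned at the end of that reply (`getRepoCount` and `perPage` are hypothetical; `getRepoData` and the cursor handling come from the diff): fetching the total up front lets the loop own every page fetch, so no data is primed outside it.

```go
// Hypothetical: query the repository count up front, then page inside the loop.
count, err := getRepoCount(ctx, genClient, sq, ownertype) // hypothetical helper
if err != nil {
	return pmetric.NewMetrics(), err
}
pages := (count + perPage - 1) / perPage // perPage assumed from the search query
var repoCursor *string
for i := 0; i < pages; i++ {
	searchData, err := getRepoData(ctx, genClient, sq, ownertype, repoCursor)
	if err != nil {
		ghs.logger.Sugar().Errorf("error getting repo data: %v", err)
		break
	}
	results := searchData.GetSearch()
	searchRepos = append(searchRepos, results.Nodes...)
	repoCursor = &searchData.Search.PageInfo.EndCursor
}
```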
```diff
@@ -381,13 +406,16 @@ func (ghs *githubScraper) scrape(ctx context.Context) (pmetric.Metrics, error) {

 	}

-	if _, ok := data.(*getRepoDataBySearchResponse); ok {
+	if searchRepos != nil {
```
Is `len()` a more accurate check here?
A slice of size zero is nil, so they're equivalent.
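One caveat worth noting: the equivalence holds for `searchRepos` only because it starts nil and grows solely via `append`; it is not true of Go slices in general, which is why `len()` is usually the safer emptiness check.

```go
var a []SearchNodeRepository            // nil slice: a == nil, len(a) == 0
b := []SearchNodeRepository{}           // empty but non-nil: b != nil, len(b) == 0
c := make([]SearchNodeRepository, 0, 8) // also non-nil with length 0
```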
```diff
@@ -381,13 +406,16 @@ func (ghs *githubScraper) scrape(ctx context.Context) (pmetric.Metrics, error) {

 	}

-	if _, ok := data.(*getRepoDataBySearchResponse); ok {
+	if searchRepos != nil {

 		var wg1 sync.WaitGroup
 		var opBuf int = 3
```
What is this `opBuf` variable intended to do?
It's the size of the buffered channel, which here is also meant to be the limit on the number of goroutines pushing or pulling data from it. It's not perfect, but if you have equal numbers of producers and consumers, then in theory each should usually have its own slot in the channel to push to and pull from, keeping throughput as high as possible.
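A sketch of the producer/consumer balance being described (the call shapes come from the diffs in this PR; pairing them one-to-one like this is the illustrative part): `opBuf` sizes the channel and bounds both sides.

```go
opBuf := 3
pullRequestCh := make(chan []PullRequestNode, opBuf)
for i := 0; i < opBuf; i++ {
	go getPullRequests(ghs, ctx, genClient, work[i], now, pullRequestCh, &wg1) // producer
	go processPullRequests(ghs, ctx, genClient, now, pullRequestCh)            // consumer
}
```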
```diff
@@ -381,13 +406,16 @@ func (ghs *githubScraper) scrape(ctx context.Context) (pmetric.Metrics, error) {

 	}

-	if _, ok := data.(*getRepoDataBySearchResponse); ok {
+	if searchRepos != nil {

 		var wg1 sync.WaitGroup
 		var opBuf int = 3

 		chunkSize := (len(searchRepos) + opBuf - 1) / opBuf
```
I'm a little confused about the algorithm here. Can we add a comment explaining the intended behavior?
This part just checks whether the previous section, which gets all of the repos we want to process data from, actually returned anything; if so, we continue, but otherwise we don't bother with the goroutine stuff.
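On the chunking line itself: it is a ceiling division, and a comment along these lines (as the reviewer requested) would capture the intent.

```go
// Split searchRepos into opBuf roughly equal chunks, rounding up so the
// last goroutine picks up the remainder. E.g. 10 repos with opBuf = 3
// gives chunkSize = (10+3-1)/3 = 4, i.e. chunks of 4, 4, and 2.
chunkSize := (len(searchRepos) + opBuf - 1) / opBuf
```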
```diff
@@ -402,15 +430,15 @@ func (ghs *githubScraper) scrape(ctx context.Context) (pmetric.Metrics, error) {

 		wg1.Add(2)
```
This is probably a stupid question, but why are we using `Add(2)` instead of `Add(1)`?
Each increment of the `Add` counter corresponds to a `Done` coming from one of the goroutines. There are two that are guaranteed to happen on each iteration there, which is why we do `Add(2)`. If you were to add any less, it panics on a negative WaitGroup counter once `Done` is called one too many times.
I think this is decent enough to merge and then iterate on the improvements. Let's get issues created from the comments that need to be addressed through future iterations and then merge this. Sound good?