Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
101 changes: 76 additions & 25 deletions commit-reach.c
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,9 @@
#define PARENT2 (1u<<17)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> diff --git a/commit-reach.c b/commit-reach.c
> index d3a9b3ed6f..c16d4b061c 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -17,8 +17,9 @@
>  #define PARENT2		(1u<<17)
>  #define STALE		(1u<<18)
>  #define RESULT		(1u<<19)
> +#define ENQUEUED	(1u<<20)
>  
> -static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
> +static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT | ENQUEUED);
> ...
> diff --git a/object.h b/object.h
> index d814647ebe..05cbf728e9 100644
> --- a/object.h
> +++ b/object.h
> @@ -74,7 +74,7 @@ void object_array_init(struct object_array *array);
>   * bundle.c:                                        16
>   * http-push.c:                          11-----14
>   * commit-graph.c:                                15
> - * commit-reach.c:                                  16-----19
> + * commit-reach.c:                                  16-------20
>   * builtin/last-modified.c:                         1617
>   * sha1-name.c:                                              20
>   * list-objects-filter.c:                                      21

Not directly the fault of this series, but we'd need to audit and
update this table of bit assignment to match more recent reality.

For example, there no longer exists sha1-name.c but the table claims
that bit 20 is in use for its own purpose, and it being stale makes
it harder to audit and ensure that this new use would not crash with
these existing uses (note. there are other uses of bit 20 in other
subsystems).

FWIW, object-name.c, which was formerly known as sha1-name.c, uses
the bit 20 as ONELINE_SEEN bit, which is used to turn textual object
names like :/string (i.e., commit with that string in its message)
into raw object name, and bit 20 is cleared from all the objects
involved in the search before the helper function returns.
Presumably, once commit-reach.c starts queueing commits and reuses
this bit for its own purpose, we will never try to parse a textual
commit object name to clobber what we thought is ENQUEUED bit,
breaking the code introduced here, so we are probably safe against
its use.

I didn't check all other uses of bit 20, though.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 5/24/26 7:40 PM, Junio C Hamano wrote:
> "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> >> diff --git a/commit-reach.c b/commit-reach.c
>> index d3a9b3ed6f..c16d4b061c 100644
>> --- a/commit-reach.c
>> +++ b/commit-reach.c
>> @@ -17,8 +17,9 @@
>>   #define PARENT2		(1u<<17)
>>   #define STALE		(1u<<18)
>>   #define RESULT		(1u<<19)
>> +#define ENQUEUED	(1u<<20)
>>   >> -static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
>> +static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT | ENQUEUED);
>> ...
>> diff --git a/object.h b/object.h
>> index d814647ebe..05cbf728e9 100644
>> --- a/object.h
>> +++ b/object.h
>> @@ -74,7 +74,7 @@ void object_array_init(struct object_array *array);
>>    * bundle.c:                                        16
>>    * http-push.c:                          11-----14
>>    * commit-graph.c:                                15
>> - * commit-reach.c:                                  16-----19
>> + * commit-reach.c:                                  16-------20
>>    * builtin/last-modified.c:                         1617
>>    * sha1-name.c:                                              20
>>    * list-objects-filter.c:                                      21
> > Not directly the fault of this series, but we'd need to audit and
> update this table of bit assignment to match more recent reality.
> > For example, there no longer exists sha1-name.c but the table claims
> that bit 20 is in use for its own purpose, and it being stale makes
> it harder to audit and ensure that this new use would not crash with
> these existing uses (note. there are other uses of bit 20 in other
> subsystems).

It would be worth adding an update patch before this patch, that
only makes these adjustments

> FWIW, object-name.c, which was formerly known as sha1-name.c, uses
> the bit 20 as ONELINE_SEEN bit, which is used to turn textual object
> names like :/string (i.e., commit with that string in its message)
> into raw object name, and bit 20 is cleared from all the objects
> involved in the search before the helper function returns.

This appears to me like the only interaction that _could_ have
overlap with paint_down_to_common().

> Presumably, once commit-reach.c starts queueing commits and reuses
> this bit for its own purpose, we will never try to parse a textual
> commit object name to clobber what we thought is ENQUEUED bit,
> breaking the code introduced here, so we are probably safe against
> its use.
> > I didn't check all other uses of bit 20, though.

FLAG_LINK in builtin/index-pack.c and FLAG_OPEN in
builtin/unpack-objects.c both seem to be completely independent from
this use in commit-reach.c.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kristofer Karlsson wrote on the Git mailing list (how to reply to this email):

I ran an audit of the flag allocation table and found three stale entries:

1. sha1-name.c was renamed to object-name.c.
2. builtin/show-branch.c uses bits 0 and 2-28, not 0-26.
3. negotiator/skipping.c is missing — it uses bits 2-5 like
negotiator/default.c, with ADVERTISED on bit 3 instead of
COMMON_REF.

I have a fixup commit ready - mini-preview below. I can submit
it on top of this patchset or create a separate patch if you prefer that?

+ * negotiator/skipping.c: 2--5
- * sha1-name.c:                                         20
+ * object-name.c:                                       20
- * builtin/show-branch.c: 0-------------------------------------------26
+ * builtin/show-branch.c: 0-----------------------------------------------28

While doing the audit I noticed that reasoning about flag safety is
currently entirely manual. Would there be interest in something more
systematic (e.g. runtime registration/assertion, dynamic allocation or static
analysis of flag usage)? I have some local work on that already, but I was
not sure if this was something worth spending time on or not.

- Kristofer


On Mon, 25 May 2026 at 03:43, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/24/26 7:40 PM, Junio C Hamano wrote:
> > "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
> > writes:
> >
> >> diff --git a/commit-reach.c b/commit-reach.c
> >> index d3a9b3ed6f..c16d4b061c 100644
> >> --- a/commit-reach.c
> >> +++ b/commit-reach.c
> >> @@ -17,8 +17,9 @@
> >>   #define PARENT2            (1u<<17)
> >>   #define STALE              (1u<<18)
> >>   #define RESULT             (1u<<19)
> >> +#define ENQUEUED    (1u<<20)
> >>
> >> -static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
> >> +static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT | ENQUEUED);
> >> ...
> >> diff --git a/object.h b/object.h
> >> index d814647ebe..05cbf728e9 100644
> >> --- a/object.h
> >> +++ b/object.h
> >> @@ -74,7 +74,7 @@ void object_array_init(struct object_array *array);
> >>    * bundle.c:                                        16
> >>    * http-push.c:                          11-----14
> >>    * commit-graph.c:                                15
> >> - * commit-reach.c:                                  16-----19
> >> + * commit-reach.c:                                  16-------20
> >>    * builtin/last-modified.c:                         1617
> >>    * sha1-name.c:                                              20
> >>    * list-objects-filter.c:                                      21
> >
> > Not directly the fault of this series, but we'd need to audit and
> > update this table of bit assignment to match more recent reality.
> >
> > For example, there no longer exists sha1-name.c but the table claims
> > that bit 20 is in use for its own purpose, and it being stale makes
> > it harder to audit and ensure that this new use would not crash with
> > these existing uses (note. there are other uses of bit 20 in other
> > subsystems).
>
> It would be worth adding an update patch before this patch, that
> only makes these adjustments
>
> > FWIW, object-name.c, which was formerly known as sha1-name.c, uses
> > the bit 20 as ONELINE_SEEN bit, which is used to turn textual object
> > names like :/string (i.e., commit with that string in its message)
> > into raw object name, and bit 20 is cleared from all the objects
> > involved in the search before the helper function returns.
>
> This appears to me like the only interaction that _could_ have
> overlap with paint_down_to_common().
>
> > Presumably, once commit-reach.c starts queueing commits and reuses
> > this bit for its own purpose, we will never try to parse a textual
> > commit object name to clobber what we thought is ENQUEUED bit,
> > breaking the code introduced here, so we are probably safe against
> > its use.
> >
> > I didn't check all other uses of bit 20, though.
>
> FLAG_LINK in builtin/index-pack.c and FLAG_OPEN in
> builtin/unpack-objects.c both seem to be completely independent from
> this use in commit-reach.c.
>
> Thanks,
> -Stolee
>
>
>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

Kristofer Karlsson <krka@spotify.com> writes:

> While doing the audit I noticed that reasoning about flag safety is
> currently entirely manual. Would there be interest in something more
> systematic (e.g. runtime registration/assertion, dynamic allocation or static
> analysis of flag usage)? I have some local work on that already, but I was
> not sure if this was something worth spending time on or not.

If there weren't existing code that are so tied to their current
uses of fixed flag bits and assumption that nobody else uses these
bits outside their intended use, I'd love to have any of these.
Uncolliding and unbounded number of usable bits per object that are
*fast* to access would be good (and commit-slab was an attempt to
introduce a framework that can be used as the basis for such a
system).  Independent of that, if we can statically analyze the uses
of these bits to prove that the same flag bits are never used at the
same time for colliding purposes, that would really be valuable.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kristofer Karlsson wrote on the Git mailing list (how to reply to this email):

Good catch Jeff! I think it's possible that I missed the flag cleanup case
here, but it's also possible that I got lucky and it worked anyway.
That said, I think the observation in the other email thread/commit is key
here. I will reply back in that one, but it seems like this can all be
simplified using Jeff's idea with an amortized O(1) solution by caching a
known non-stale entry in the queue, and thus becomes obsolete. I will post
a new patchset when the discussion slows down.

As for general flag management, I will spend some more time thinking about it.
I don't fully trust static code analysis to work, but some cheap assertion
based model might give a nice trade-off.

Thanks for all the feedback!
- Kristofer

On Mon, 25 May 2026 at 09:17, Junio C Hamano <gitster@pobox.com> wrote:
>
> Kristofer Karlsson <krka@spotify.com> writes:
>
> > While doing the audit I noticed that reasoning about flag safety is
> > currently entirely manual. Would there be interest in something more
> > systematic (e.g. runtime registration/assertion, dynamic allocation or static
> > analysis of flag usage)? I have some local work on that already, but I was
> > not sure if this was something worth spending time on or not.
>
> If there weren't existing code that are so tied to their current
> uses of fixed flag bits and assumption that nobody else uses these
> bits outside their intended use, I'd love to have any of these.
> Uncolliding and unbounded number of usable bits per object that are
> *fast* to access would be good (and commit-slab was an attempt to
> introduce a framework that can be used as the basis for such a
> system).  Independent of that, if we can statically analyze the uses
> of these bits to prove that the same flag bits are never used at the
> same time for colliding purposes, that would really be valuable.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jeff King wrote on the Git mailing list (how to reply to this email):

On Mon, May 25, 2026 at 09:53:09AM +0200, Kristofer Karlsson wrote:

> Good catch Jeff! I think it's possible that I missed the flag cleanup case
> here, but it's also possible that I got lucky and it worked anyway.

Well, you did add it to ALL_FLAGS, so it might have been your
subconscious making you lucky. :)

> That said, I think the observation in the other email thread/commit is key
> here. I will reply back in that one, but it seems like this can all be
> simplified using Jeff's idea with an amortized O(1) solution by caching a
> known non-stale entry in the queue, and thus becomes obsolete. I will post
> a new patchset when the discussion slows down.

Nifty, thanks.

> As for general flag management, I will spend some more time thinking about it.
> I don't fully trust static code analysis to work, but some cheap assertion
> based model might give a nice trade-off.

I think it would be really nice if we had per-operation flags kept
outside of the structs completely. If you're a masochist, I fiddled
around a bit with using a hash instead in this thread:

  https://lore.kernel.org/git/20250826055210.GA1031277@coredump.intra.peff.net/

It's sadly (but not surprisingly) quite slow. I do wonder how a slab
would work there, but it would take a bit more surgery. We only allocate
slab ids for commits, and we'd have to do so for all objects if we want
to hold flags.

Probably a dead-end, but it would be neat if all of these flag
allocation worries just went away.

-Peff

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jeff King wrote on the Git mailing list (how to reply to this email):

On Sun, May 24, 2026 at 05:42:18PM +0000, Kristofer Karlsson via GitGitGadget wrote:

> +static void maybe_enqueue(struct prio_queue *queue, struct commit *c)
> +{
> +	if (c->object.flags & ENQUEUED)
> +		return;
> +	c->object.flags |= ENQUEUED;
> +	prio_queue_put(queue, c);
> +}

OK, so we mark each commit with ENQUEUED when we queue it, and then...

> @@ -83,6 +92,8 @@ static int paint_down_to_common(struct repository *r,
>  		int flags;
>  		timestamp_t generation = commit_graph_generation(commit);
>  
> +		commit->object.flags &= ~ENQUEUED;
> +

...clear that when we pop it. But the loop may terminate early before
popping everything, and we get to this cleanup code at the end:

	clear_prio_queue(&queue);

When we drop all of those queue elements, they'll all be left with the
ENQUEUED flag set. Should we clear those?

The ahead_behind() variant doesn't have the same problem, because it
uses PARENT2 to check for queueing, and then does:

	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
	repo_clear_commit_marks(r, PARENT2 | STALE);

So it's cleaning up both flags, whereas paint_down_to_common() is
already leaving the STALE flag set. I'm not sure how much that matters
(or if it is even an intentional thing communicated to the caller). But
now we'd be adding ENQUEUED.

-Peff

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jeff King wrote on the Git mailing list (how to reply to this email):

On Mon, May 25, 2026 at 03:01:14AM -0400, Jeff King wrote:

> When we drop all of those queue elements, they'll all be left with the
> ENQUEUED flag set. Should we clear those?
> 
> The ahead_behind() variant doesn't have the same problem, because it
> uses PARENT2 to check for queueing, and then does:
> 
> 	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
> 	repo_clear_commit_marks(r, PARENT2 | STALE);
> 
> So it's cleaning up both flags, whereas paint_down_to_common() is
> already leaving the STALE flag set. I'm not sure how much that matters
> (or if it is even an intentional thing communicated to the caller). But
> now we'd be adding ENQUEUED.

Ah, hmm. We do clear flags in the callers using clear_commit_marks(),
which walks down parent pointers until nobody has a flag we care about
(from all_flags). I'm not 100% sure that ENQUEUED flags will always be
caught that way, but I think the reasoning is roughly: every thing we
queue will either have PARENT1 or PARENT2 set, so we'll keep walking and
clearing flags until we stop seeing those.

-Peff

#define STALE (1u<<18)
#define RESULT (1u<<19)
#define ENQUEUED (1u<<20)

static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT | ENQUEUED);

static int compare_commits_by_gen(const void *_a, const void *_b)
{
Expand All @@ -39,14 +40,60 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
return 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 5/24/26 1:42 PM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> > paint_down_to_common() terminates when every commit remaining in its
> priority queue is STALE. This was checked by queue_has_nonstale(),
> which performed an O(n) linear scan of the entire queue on every
> iteration, resulting in O(n*m) total overhead where n is the queue
> size and m is the number of commits processed.
> > Replace this with an O(1) nonstale_count that tracks the number of
> non-stale commits currently in the queue. The counter is incremented
> by maybe_enqueue() and decremented on dequeue and by mark_stale()
> when a commit transitions to STALE while still in the queue. Since
> each commit appears at most once (guaranteed by the ENQUEUED flag
> from the previous commit), the counter is exact.

This idea has a lot of merit, but I'm a bit concerned about the
organization of data. My ideas of how to improve things may also
impact patch 1's use of ENQUEUED.

> -static void maybe_enqueue(struct prio_queue *queue, struct commit *c)
> +static void maybe_enqueue(struct prio_queue *queue, struct commit *c,
> +			  int *nonstale_count)
>   {
>   	if (c->object.flags & ENQUEUED)
>   		return;
>   	c->object.flags |= ENQUEUED;
>   	prio_queue_put(queue, c);
> +	if (!(c->object.flags & STALE))
> +		(*nonstale_count)++;
> +}
> +
> +static void mark_stale(struct commit *c, unsigned queued_flag,
> +		       int *nonstale_count)
> +{
> +	if (!(c->object.flags & STALE)) {
> +		if (c->object.flags & queued_flag)
> +			(*nonstale_count)--;
> +		c->object.flags |= STALE;
> +	}
>   }

These two methods have some concerns on my end:

1. We need to store the nonstale count somewhere other than the
   priority queue, even though it's necessarily representing a
   subset of the commits within the queue.

2. mark_stale() needs a queued_flag. (I need to check to see if
   this is indeed changing in multiple callers or should always
   be ENQUEUED).

>   static int queue_has_nonstale(struct prio_queue *queue)
> @@ -68,6 +81,7 @@ static int paint_down_to_common(struct repository *r,
>   {
>   	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
>   	int i;
> +	int nonstale_count = 0;

My preference would be to create a new struct that contains a
prio_queue as a member _and_ a nonstale_count. It could initialize
with compare_commits_by_gen_then_commit_date by default.

The important thing is that consumers of such a "stale-tracking"
queue would not be setting the STALE or ENQUEUED bits themselves,
but instead the queue would be responsible for that.

This could allow us to simplify callers by always assuming we can
"add" an element to the queue and the queue will use its ENQUEUED
bit to prevent duplicates from reaching its internal prio_queue.

Such a data structure could be private to commit-reach.c for now,
since all the methods that would use it seem to be colocated there.

This is a big ask, but I'm interested to see if such an approach
would simplify things here.

Here's a potential breakdown of how to build such a thing in
"small" patches:

1. Create the data structure and update paint_down_to_common and
   ahead_behind to use that structure, but still use the existing
   prio_queue methods on its internal member.

2. Add the ENQUEUED bit and methods on the new struct that add
   that bit as it adds commits to the inner prio_queue. It would
   also ignore commits that already have that bit. (Should it
   also remove the bit as commits are removed from the queue?)

3. Now add the nonstale_count (or stale count?) to the struct and
   have it control the STALE bit modifications, with increasing
   the stale count when ENQUEUED is live, and decreasing the stale
   count as such a STALE object is dequeued.

I like the idea of this being encapsulated within the struct and
its helper methods. But the proof will be in the implementation.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kristofer Karlsson wrote on the Git mailing list (how to reply to this email):

I have been thinking a bit about encapsulation too - the problem is twofold:
1. ENQUEUED is a tag on the commit object but it represents membership
   inside the queue and so we already have an implicit assumption that it only
   matches one queue (at a time).
2. The counter is touched on enqueue/dequeue BUT also when mutating objects -
   that last part is tricky to encapsulate as a part of the queue.

That said, I think if we go in the direction of Jeff's idea with an amortized
O(1) staleness check, this becomes simpler - and we can perhaps do something
to structure _that_ code instead. Something like this perhaps:

struct stale_prio_queue {
  prio_queue pq;
  commit *nonstale_cache;
}

and add the corresponding wrapper functions.

I think the encapsulation idea becomes even stronger with that approach than
with the counter based approach.


- Kristofer

On Mon, 25 May 2026 at 03:59, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/24/26 1:42 PM, Kristofer Karlsson via GitGitGadget wrote:
> > From: Kristofer Karlsson <krka@spotify.com>
> >
> > paint_down_to_common() terminates when every commit remaining in its
> > priority queue is STALE. This was checked by queue_has_nonstale(),
> > which performed an O(n) linear scan of the entire queue on every
> > iteration, resulting in O(n*m) total overhead where n is the queue
> > size and m is the number of commits processed.
> >
> > Replace this with an O(1) nonstale_count that tracks the number of
> > non-stale commits currently in the queue. The counter is incremented
> > by maybe_enqueue() and decremented on dequeue and by mark_stale()
> > when a commit transitions to STALE while still in the queue. Since
> > each commit appears at most once (guaranteed by the ENQUEUED flag
> > from the previous commit), the counter is exact.
>
> This idea has a lot of merit, but I'm a bit concerned about the
> organization of data. My ideas of how to improve things may also
> impact patch 1's use of ENQUEUED.
>
> > -static void maybe_enqueue(struct prio_queue *queue, struct commit *c)
> > +static void maybe_enqueue(struct prio_queue *queue, struct commit *c,
> > +                       int *nonstale_count)
> >   {
> >       if (c->object.flags & ENQUEUED)
> >               return;
> >       c->object.flags |= ENQUEUED;
> >       prio_queue_put(queue, c);
> > +     if (!(c->object.flags & STALE))
> > +             (*nonstale_count)++;
> > +}
> > +
> > +static void mark_stale(struct commit *c, unsigned queued_flag,
> > +                    int *nonstale_count)
> > +{
> > +     if (!(c->object.flags & STALE)) {
> > +             if (c->object.flags & queued_flag)
> > +                     (*nonstale_count)--;
> > +             c->object.flags |= STALE;
> > +     }
> >   }
>
> These two methods have some concerns on my end:
>
> 1. We need to store the nonstale count somewhere other than the
>     priority queue, even though it's necessarily representing a
>     subset of the commits within the queue.
>
> 2. mark_stale() needs a queued_flag. (I need to check to see if
>     this is indeed changing in multiple callers or should always
>     be ENQUEUED).
>
> >   static int queue_has_nonstale(struct prio_queue *queue)
> > @@ -68,6 +81,7 @@ static int paint_down_to_common(struct repository *r,
> >   {
> >       struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> >       int i;
> > +     int nonstale_count = 0;
>
> My preference would be to create a new struct that contains a
> prio_queue as a member _and_ a nonstale_count. It could initialize
> with compare_commits_by_gen_then_commit_date by default.
>
> The important thing is that consumers of such a "stale-tracking"
> queue would not be setting the STALE or ENQUEUED bits themselves,
> but instead the queue would be responsible for that.
>
> This could allow us to simplify callers by always assuming we can
> "add" an element to the queue and the queue will use its ENQUEUED
> bit to prevent duplicates from reaching its internal prio_queue.
>
> Such a data structure could be private to commit-reach.c for now,
> since all the methods that would use it seem to be colocated there.
>
> This is a big ask, but I'm interested to see if such an approach
> would simplify things here.
>
> Here's a potential breakdown of how to build such a thing in
> "small" patches:
>
> 1. Create the data structure and update paint_down_to_common and
>     ahead_behind to use that structure, but still use the existing
>     prio_queue methods on its internal member.
>
> 2. Add the ENQUEUED bit and methods on the new struct that add
>     that bit as it adds commits to the inner prio_queue. It would
>     also ignore commits that already have that bit. (Should it
>     also remove the bit as commits are removed from the queue?)
>
> 3. Now add the nonstale_count (or stale count?) to the struct and
>     have it control the STALE bit modifications, with increasing
>     the stale count when ENQUEUED is live, and decreasing the stale
>     count as such a STALE object is dequeued.
>
> I like the idea of this being encapsulated within the struct and
> its helper methods. But the proof will be in the implementation.
>
> Thanks,
> -Stolee
>

}

static int queue_has_nonstale(struct prio_queue *queue)
/*
* A prio_queue with O(1) termination check. 'max_nonstale' tracks
* the lowest-priority non-stale commit enqueued so far; once it is
* popped, every remaining entry is known to be STALE.
*/
struct nonstale_queue {
struct prio_queue pq;
struct commit *max_nonstale;
};

static void nonstale_queue_put(struct nonstale_queue *queue,
struct commit *c)
{
for (size_t i = 0; i < queue->nr; i++) {
struct commit *commit = queue->array[i].data;
if (!(commit->object.flags & STALE))
return 1;
}
return 0;
struct commit *old = queue->max_nonstale;

prio_queue_put(&queue->pq, c);
if (c->object.flags & STALE)
return;
if (!old || queue->pq.compare(old, c, queue->pq.cb_data) <= 0)
queue->max_nonstale = c;
}

static struct commit *nonstale_queue_get(struct nonstale_queue *queue)
{
struct commit *commit = prio_queue_get(&queue->pq);

if (commit == queue->max_nonstale)
queue->max_nonstale = NULL;

return commit;
}

static void clear_nonstale_queue(struct nonstale_queue *queue)
{
clear_prio_queue(&queue->pq);
queue->max_nonstale = NULL;
}

static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
struct commit *c)
{
if (c->object.flags & ENQUEUED)
return;
c->object.flags |= ENQUEUED;
nonstale_queue_put(queue, c);
}

static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
{
struct commit *commit = nonstale_queue_get(queue);

if (commit)
commit->object.flags &= ~ENQUEUED;
return commit;
}

/* all input commits in one and twos[] must have been parsed! */
Expand All @@ -57,28 +104,30 @@ static int paint_down_to_common(struct repository *r,
enum merge_base_flags mb_flags,
struct commit_list **result)
{
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
struct nonstale_queue queue = {
{ compare_commits_by_gen_then_commit_date }
};
int i;
timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
struct commit_list **tail = result;

if (!min_generation && !corrected_commit_dates_enabled(r))
queue.compare = compare_commits_by_commit_date;
queue.pq.compare = compare_commits_by_commit_date;

one->object.flags |= PARENT1;
if (!n) {
commit_list_append(one, result);
return 0;
}
prio_queue_put(&queue, one);
nonstale_queue_put_dedup(&queue, one);

for (i = 0; i < n; i++) {
twos[i]->object.flags |= PARENT2;
prio_queue_put(&queue, twos[i]);
nonstale_queue_put_dedup(&queue, twos[i]);
}

while (queue_has_nonstale(&queue)) {
struct commit *commit = prio_queue_get(&queue);
while (queue.max_nonstale) {
struct commit *commit = nonstale_queue_get_dedup(&queue);
struct commit_list *parents;
int flags;
timestamp_t generation = commit_graph_generation(commit);
Expand Down Expand Up @@ -116,7 +165,7 @@ static int paint_down_to_common(struct repository *r,
if ((p->object.flags & flags) == flags)
continue;
if (repo_parse_commit(r, p)) {
clear_prio_queue(&queue);
clear_nonstale_queue(&queue);
commit_list_free(*result);
*result = NULL;
/*
Expand All @@ -132,11 +181,11 @@ static int paint_down_to_common(struct repository *r,
oid_to_hex(&p->object.oid));
}
p->object.flags |= flags;
prio_queue_put(&queue, p);
nonstale_queue_put_dedup(&queue, p);
}
}

clear_prio_queue(&queue);
clear_nonstale_queue(&queue);
commit_list_sort_by_date(result);
return 0;
}
Expand Down Expand Up @@ -1040,11 +1089,11 @@ struct commit_list *get_reachable_subset(struct commit **from, size_t nr_from,
define_commit_slab(bit_arrays, struct bitmap *);
static struct bit_arrays bit_arrays;

static void insert_no_dup(struct prio_queue *queue, struct commit *c)
static void insert_no_dup(struct nonstale_queue *queue, struct commit *c)
{
if (c->object.flags & PARENT2)
return;
prio_queue_put(queue, c);
nonstale_queue_put(queue, c);
c->object.flags |= PARENT2;
}

Expand All @@ -1069,7 +1118,9 @@ void ahead_behind(struct repository *r,
struct commit **commits, size_t commits_nr,
struct ahead_behind_count *counts, size_t counts_nr)
{
struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
struct nonstale_queue queue = {
{ .compare = compare_commits_by_gen_then_commit_date }
};
size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD);

if (!commits_nr || !counts_nr)
Expand All @@ -1092,8 +1143,8 @@ void ahead_behind(struct repository *r,
insert_no_dup(&queue, c);
}

while (queue_has_nonstale(&queue)) {
struct commit *c = prio_queue_get(&queue);
while (queue.max_nonstale) {
struct commit *c = nonstale_queue_get(&queue);
struct commit_list *p;
struct bitmap *bitmap_c = get_bit_array(c, width);

Expand Down Expand Up @@ -1135,10 +1186,10 @@ void ahead_behind(struct repository *r,

/* STALE is used here, PARENT2 is used by insert_no_dup(). */
repo_clear_commit_marks(r, PARENT2 | STALE);
for (size_t i = 0; i < queue.nr; i++)
free_bit_array(queue.array[i].data);
for (size_t i = 0; i < queue.pq.nr; i++)
free_bit_array(queue.pq.array[i].data);
clear_bit_arrays(&bit_arrays);
clear_prio_queue(&queue);
clear_nonstale_queue(&queue);
}

struct commit_and_index {
Expand Down
7 changes: 4 additions & 3 deletions object.h
Original file line number Diff line number Diff line change
Expand Up @@ -67,22 +67,23 @@ void object_array_init(struct object_array *array);
* revision.h: 0---------10 15 23--------28
* fetch-pack.c: 01 67
* negotiator/default.c: 2--5
* negotiator/skipping.c: 2--5
* walker.c: 0-2
* upload-pack.c: 4 11-----14 16-----19
* builtin/blame.c: 12-13
* bisect.c: 16
* bundle.c: 16
* http-push.c: 11-----14
* commit-graph.c: 15
* commit-reach.c: 16-----19
* commit-reach.c: 16-------20
* builtin/last-modified.c: 1617
* sha1-name.c: 20
* object-name.c: 20
* list-objects-filter.c: 21
* bloom.c: 2122
* builtin/fsck.c: 0--3
* builtin/index-pack.c: 2021
* reflog.c: 10--12
* builtin/show-branch.c: 0-------------------------------------------26
* builtin/show-branch.c: 0-----------------------------------------------28
* builtin/unpack-objects.c: 2021
* pack-bitmap.h: 2122
*/
Expand Down
Loading