Efficiently sidestep the performance hit using yield return with nested loops?

Question

Some background - and why I avoid yield in certain library code I'm writing:

All About Iterators – Yet Another Language Geek[^]

The problem is this:

Say I need to do simple set combinations using loops. In LINQ-ish fashion I'll simply be treating a forward iterable sequence as a collection or a col, below.

so for a union of col a and col b I do a|b

With yield, that would be Union(a,b) implemented as in Figure 1, whose performance degrades significantly because of how the compiler-generated iterators work internally.

It's bad enough that brighter minds than mine have suggested adding a feature to C#: yield foreach, as in Figure 2, to both increase readability and sidestep the performance hit (because the compiler could compose the iterator classes into one).

Right now I hand roll a lot of enumeration stuff, because LINQ performance is unacceptable for extensive use in things like GLR parsers, and part of it is this limitation of yield.

I've nearly wanted to move off of .NET altogether because F# is an odd duck and C# is just causing me a lot of work, to the point where I've considered moving back to my good old C++.

This isn't about bit twiddling. I'm not looking to shave clock cycles. This is a design issue that impacts where I can even use LINQ and yield within C# specifically - it limits where I am able to practically use it, because of the nature of the projects I work on which deal in a lot of parsing, decision trees and pattern recognition, and use these with wild abandon everywhere, and nesting is the typical case.

I work around this right now by hand coding quite a bit.

Is anyone else familiar with this problem, and if so do they have any good suggestions where I can continue to use C# with or without LINQ and the modern C# paradigms without hitting performance barriers once I want to employ set based functional programming?

What I have tried:

// Figure 1: union a|b PERFORMANCE SUCKS
// (needs: using System.Collections;)
IEnumerable Union(IEnumerable a,IEnumerable b) {
  foreach(var x in a) yield return x; // every element of a
  foreach(var y in b) yield return y; // then every element of b
}

// Figure 2: union a|b basically as good as
// hand rolled, but requires something C# doesn't have
IEnumerable Union(IEnumerable a,IEnumerable b) {
  yield foreach a;
  yield foreach b; 
}

Posted 17-Jan-18 14:59pm

honey the codewitch

Updated 17-Jan-18 15:07pm

v3

Comments

PIEBALDconsult 17-Jan-18 21:27pm

Don't blame the framework or the language. I avoid Linq, foreach, and many other wasteful things that are designed to keep inexperienced practitioners from becoming proficient.
You can't beat a for loop -- particularly for the kind of stuff you mention.

honey the codewitch 17-Jan-18 22:25pm

this isn't about blaming anything.

it's about not writing enumerator classes by hand anymore if there's a better way to avoid it.

consider me a nihilist when it comes to frameworks and blame.

i don't care who is at fault. i'm just looking for a way to keep using C# even as i basically have to hand-roll a significant subset of what linq and yield provides from scratch an awful lot more than I'd like to.

Alex Schunk 19-Jan-18 8:03am

if you look at the IL code you can see that foreach has an overhead. If you want performance, use for(;;) because it is basically just jumps.
Everything that makes your code nice, clean and easy to read usually comes with overhead.
You have to always decide between maintainability, reusability and performance. You can't have one without impacting the other.
yield shows its power on big collections where you are not going through all of its elements.
You may find what you want in the "unsafe" world of C#.

honey the codewitch 19-Jan-18 11:00am

I don't care about bit twiddling overhead.

Look at the article in the link. This isn't about a little bit of overhead.

It's about time complexity of O(m+n) where m is the number of items in the first sequence and n is the number of items in the second sequence. And it gets worse as you nest, the outermost call is O(m+1). The next call has O((m-1)+1), then O((m-2)+1), ... O(1+1). There are m of these calls so the running time should be O(m^2). Essentially, composing concats together like this causes O(m^2) yield returns to be executed.

(from the article, with performance graphs)
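
A minimal sketch of where the quadratic behaviour comes from, assuming the case the article graphs: m single-element sequences concatenated one at a time, so the iterators nest m deep. The class YieldCountDemo and its counter are hypothetical names made up for this illustration; Concat is the Figure 1 pattern with a counter added.

```csharp
using System.Collections.Generic;

// Hypothetical demo class: counts how many "yield return" statements run
// when m single-element sequences are concatenated one at a time.
public static class YieldCountDemo
{
    private static long yields;

    // Same shape as Figure 1, plus a counter bump before each yield.
    public static IEnumerable<T> Concat<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        foreach (var x in a) { yields++; yield return x; }
        foreach (var y in b) { yields++; yield return y; }
    }

    public static long CountYields(int m)
    {
        yields = 0;
        IEnumerable<int> seq = new int[0];
        // Each pass wraps the previous result in one more iterator layer.
        for (int i = 0; i < m; i++)
            seq = Concat(seq, new[] { i });
        // Drain the sequence so every layer actually executes.
        foreach (var item in seq) { }
        return yields;
    }
}
```

The element added at step i has to be re-yielded by every layer wrapped around it, so the total is m + (m-1) + ... + 1 = m(m+1)/2. CountYields(4) returns 10 and CountYields(1000) returns 500500, for only 4 and 1000 elements respectively.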

Alex Schunk 19-Jan-18 11:43am

Well... The implementation of the yield looks like this.. This is the overhead I am talking about...

bool IEnumerator.MoveNext()
{
	bool result;
	try
	{
		switch (this.<>1__state)
		{
		case 0:
			this.<>1__state = -1;
			this.<>s__1 = this.sequence1.GetEnumerator();
			this.<>1__state = -3;
			break;
		case 1:
			this.<>1__state = -3;
			this.<item>5__2 = default(T);
			break;
		case 2:
			this.<>1__state = -4;
			this.<item>5__4 = default(T);
			goto IL_101;
		default:
			return false;
		}
		if (this.<>s__1.MoveNext())
		{
			this.<item>5__2 = this.<>s__1.Current;
			this.<>2__current = this.<item>5__2;
			this.<>1__state = 1;
			return true;
		}
		this.<>m__Finally1();
		this.<>s__1 = null;
		this.<>s__3 = this.sequence2.GetEnumerator();
		this.<>1__state = -4;
		IL_101:
		if (!this.<>s__3.MoveNext())
		{
			this.<>m__Finally2();
			this.<>s__3 = null;
			result = false;
		}
		else
		{
			this.<item>5__4 = this.<>s__3.Current;
			this.<>2__current = this.<item>5__4;
			this.<>1__state = 2;
			result = true;
		}
	}
	catch
	{
		this.System.IDisposable.Dispose();
		throw;
	}
	return result;
}

But from looking at this code I can't really explain where the O(m²) comes from.

honey the codewitch 19-Jan-18 11:55am

you're more than welcome to argue with the guy on the C# dev team who wrote the article I linked to.

but i don't see the need to reargue it.

Alex Schunk 19-Jan-18 12:32pm

Why should I do that... You asked the question... You are basically saying you want an alternative, but then you don't, because you want yield... I don't know what you really want.
Every sugar you get, you get some downsides... Because yield is nothing else than syntactic sugar.
If you want it effective; create your own.
It's like the ObservableCollection in WPF... It is very slow and adding 1000000 items to it will kill your application. I created my own which can handle that much. No problem.

honey the codewitch 19-Jan-18 12:51pm

you were arguing with Wes Dyer's conclusions, not mine. That's why I said argue it with him.

honey the codewitch 19-Jan-18 11:58am

also the reason for the overhead is that those two inner iterators - also yields, could have been rolled into this class. instead you have 3 classes each running their own state machines. it leads to spikes in the perf graph as Wes Dyer already shows at the link.

unless you want to call his perf graphs a lie

Richard Deeming 19-Jan-18 12:16pm

If you look at the graph, it's fairly flat until you start concatenating ~1500 single-element sequences.

How often does your code do that?

honey the codewitch 19-Jan-18 12:28pm

as is typical with lots of LINQ queries, my code nests these operations deeply.

so yes, it's an issue.

oh, and depending on the collection, we're dealing with things like LR rules so 1500 isn't even uncommon

Alex Schunk 19-Jan-18 12:38pm

It is common knowledge that creating lots of enumerators hits performance hard.
That is why you design your code wisely. In places where you need performance you mostly do it yourself.
Recursive calls with LINQ, for instance, are not a good idea most of the time.
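
One common way to "do it yourself" for recursive cases is to replace the recursive iterator with a single iterator that keeps an explicit stack. The sketch below assumes a simple tree; Node and TreeWalk are hypothetical types invented for the example, not anything from the thread or the framework.

```csharp
using System.Collections.Generic;

// Hypothetical tree node, used only for this illustration.
public sealed class Node
{
    public int Value;
    public List<Node> Children = new List<Node>();
}

public static class TreeWalk
{
    // Pre-order traversal driven by an explicit stack. Only one iterator
    // object exists, so each value goes through exactly one yield return
    // no matter how deep the tree is.
    public static IEnumerable<int> PreOrder(Node root)
    {
        if (root == null) yield break;
        var pending = new Stack<Node>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            var node = pending.Pop();
            yield return node.Value;
            // Push children right-to-left so the leftmost child is popped first.
            for (int i = node.Children.Count - 1; i >= 0; i--)
                pending.Push(node.Children[i]);
        }
    }
}
```

A recursive version that does foreach over PreOrder(child) and re-yields each value would pass a node at depth d through d+1 iterators; here the cost per node stays constant, at the price of managing the stack by hand.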

honey the codewitch 19-Jan-18 12:41pm

i don't use LINQ precisely because of this.

> Right now I hand roll a lot of enumeration stuff, because LINQ performance is unacceptable for extensive use in things like GLR parsers, and part of it is this limitation of yield.

But it's just a lot of work *not* to use it.

Which is why it's looking more and more appealing for me to move away from C# and back to unmanaged code.

meh

Alex Schunk 19-Jan-18 12:48pm

You could actually create a generic extension that deals with this.

honey the codewitch 19-Jan-18 12:50pm

not without rewriting pretty much all of linq AND doing code gen on all iterators instead of relying on yield.

i've considered it.

Alex Schunk 19-Jan-18 12:57pm

So your options would be... Wait for a new C# release... Or move to another, more performant language.

honey the codewitch 19-Jan-18 12:58pm

that's what I assumed.

the question was in service to checking that assumption.

i prefer asking questions before deciding on a conclusion.

Richard Deeming 19-Jan-18 12:15pm

Never mind the performance - your method doesn't do what it claims to do. A union of two sequences is supposed to return the unique elements from both sequences. Your method returns all elements, and should be called Concat.

honey the codewitch 19-Jan-18 12:25pm

fine, it's called concat.

the point still stands. because Union isn't the point. perf is the point.

rename it concat if it helps you answer the question i was asking.

and if you don't have an answer that's fine.

Richard Deeming 19-Jan-18 12:30pm

As I said in my other comment, it's quite a niche case. Most code doesn't concatenate 1500+ small sequences, so it's not generally a problem.

If your code is hitting performance problems because of this, there's not a lot you can do about it. You could see if the C# team[^] are interested in picking up the yield foreach idea; but even if they are, it could be many months before that made it into the main compiler.

Otherwise, you're stuck writing specific iterators to alleviate the performance bottlenecks in your code.
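
As a rough idea of what such a specific iterator can look like for the concatenation case, here is a sketch that joins any number of sequences inside one iterator instead of nesting pairwise calls. FlatSequences.ConcatAll is a hypothetical helper name, not a framework API, and it only helps when all the sequences to be joined are known at one call site.

```csharp
using System.Collections.Generic;

public static class FlatSequences
{
    // Hypothetical helper: walks every source from a single iterator, so each
    // element passes through one yield return regardless of how many sources
    // are joined (O(total elements) rather than O(m^2) for m nested concats).
    public static IEnumerable<T> ConcatAll<T>(params IEnumerable<T>[] sources)
    {
        foreach (var source in sources)
            foreach (var item in source)
                yield return item;
    }
}
```

For example, FlatSequences.ConcatAll<int>(new[] { 1, 2 }, new[] { 3 }, new[] { 4, 5 }) enumerates 1, 2, 3, 4, 5. It does not help when the nesting comes from recursion or from composing separately written operators, which is the harder case discussed in this thread.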

honey the codewitch 19-Jan-18 12:36pm

which is what I do. currently.

and it's a side-effect of programming things that rely heavily on set based operations, like most parser generators, just for example.

and yeah, whether it's LALR(1), LL(x) or even FAs there's going to be a lot of iteration and set functions.

nature of the beast.

typically, i'd be using STL in C++ to do things like this.

but for reasons that are outside the scope of this, I've been moving a lot of research code to C#. That includes compiler/parser stuff.

Like I said, if you don't have an answer, that's okay.

I honestly didn't expect anyone to have one.

But it's a good idea to check expectations, especially given i don't usually muck about with the newer C# language features.

hence why i asked the question.

Alex Schunk 19-Jan-18 12:41pm

Well... I use LINQ a lot and I compensate for the performance loss in some places with "Parallel.ForEach".

honey the codewitch 19-Jan-18 12:48pm

if i was writing business logic or something they'd probably be fine, even on the server end.

but what i'm doing falls closer into compiler/parsing tools and learning systems (together) - which unfortunately, tend to nest a lot more, and tend to require a bit more "real time" responsiveness compared to displaying a web search or something.

(different types of performance basically, because of different problem domains)

and while my code isn't doing something like DSP (digital signal processing), where truly fast streaming would be necessary, it's basically somewhere between that and the larger, chunkier performance issues in dealing with most applications, especially biz-data based applications like many web apps, which is what a lot of .NET's features are geared for.

Alex Schunk 19-Jan-18 12:53pm

Wouldn't it be better to use Rust or another language that does not use a VM instead?

honey the codewitch 19-Jan-18 12:59pm

i don't think the VM is the problem.

it's more just the way iterator/yield and linq work.

stuff performs fine when i hand roll it.

honey the codewitch 19-Jan-18 13:22pm

part of the reason i'm asking the question is to determine how useful it would be to release the generation tools i built and use to work around this. they're terribly unpolished and work for my narrow scenarios. but they help me get around this issue.

the other reason i'm asking the question is I don't like to use the generated code for that in certain areas of my source, so I've developed some patterns to work around using foreach/yield everywhere, but the patterns look "stupid" unless you know why I do it. they look like anti-patterns. I don't like code that looks like anti-patterns. I get suspicious - even of my own code. =) I also tend to assume I am kinda stupid compared to the people that designed the language. it keeps me humble enough to make assumptions in the right order, to wit, first assume error or ignorance on my part since it's most likely.

Richard Deeming 19-Jan-18 12:36pm

Looks like this has already been requested:
Feature Request: Recursive Iterators (non-quadratic)[^]

Another iterator performance related issue:
ValueEnumerables (fast to code and run)[^]

honey the codewitch 19-Jan-18 12:42pm

helpful! thank you

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)