For example, suppose we want to calculate the mean of a very long list of numbers, and the numbers, when sorted, are nearly linear (or we can fit a linear regression model to the data). Mathematically, we can approximate the mean by:

((arr[0] + arr[length(arr) - 1]) / 2) + intercept
Or, in the case where the linear model is nearly constant (the slope coefficient is nearly zero), we can calculate approximately:

mean(sample of n/const elements) ≈ mean(arr)
The same concept applies in both cases, and it is quite basic. Is there a pattern, a function (hopefully in Python), or any existing research that could help? Any suggestion would be gratefully welcome. Of course, such a pattern, if it exists, should be general and not limited to the mean (ideally any function, or at least aggregate functions such as sum, mean, and so on). (As I don't have a strong mathematical background and I'm new to machine learning, please tolerate my ignorance.) Please let me know if anything is unclear.
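As a rough illustration of the two ideas above, here is a hypothetical sketch (the function names `approx_mean_endpoints` and `approx_mean_subsample` are made up for this example). On data that is exactly linear in its sorted index, both estimates match the exact mean:

```python
def approx_mean_endpoints(arr):
    """Estimate the mean of a sorted, nearly linear array
    as the midpoint of its first and last elements."""
    return (arr[0] + arr[-1]) / 2

def approx_mean_subsample(arr, step):
    """Estimate the mean from every `step`-th element only."""
    sample = arr[::step]
    return sum(sample) / len(sample)

arr = [2 * i + 5 for i in range(10001)]  # exactly linear data

print(sum(arr) / len(arr))             # exact mean: 10005.0
print(approx_mean_endpoints(arr))      # 10005.0
print(approx_mean_subsample(arr, 10))  # 10005.0
```

On real, noisy, or non-linear data the two estimates would only be approximations, which is the crux of the question.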

What I have tried:

I have been studying extrapolation, but I can't settle on a working solution, or even find partially similar research.
Any hint would be appreciated.
Posted
Updated 25-Aug-17 5:24am
Comments
Richard MacCutchan 24-Aug-17 10:50am    
This is mathematics, not programming.
Dave Kreskowiak 24-Aug-17 11:14am    
I'm not even sure what you're asking. Are you asking if there is a method to avoid looking at every value in an array to generate an approximate mean?
Rick York 24-Aug-17 11:32am    
In my opinion, the complexity of such an algorithm and the sorting would make it less efficient than just calculating the mean itself. That is pretty simple and can be optimized pretty well.

1 solution

Quote:
Is there a form of lazy evaluation where a function (like mean) returns an approximate value when operating on arrays

No, it is a matter of common sense.
Mean calculation
The algorithm adds all the values and divides the total by the number of values. It gives the exact answer, always, no matter what values are used.
In big-O notation, for n values, it takes O(n) additions plus one division.
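For reference, a minimal Python version of the exact calculation just described:

```python
def mean(values):
    """Exact mean: one pass of O(n) additions plus a single division."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

print(mean([1, 2, 3, 4]))  # 2.5
```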
Quote:
when sorted are nearly linear (or we can find a linear Regression Model for data).

Sorting the dataset takes at least O(n log n), so no.
Checking that the data is 'nearly linear' takes more than O(n), so no.
Fitting a regression model takes much more than O(n), so no.
Instead of sorting, just finding the minimum and maximum still takes O(n), so no. And you don't even know whether the data is 'nearly linear'.
There are plenty of ways to calculate a mean, but since all of these algorithms take more work than simply adding all the values, they are not used.
They take much more work, and the result is wrong anyway.
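To illustrate the point about wrong results, here is a small hypothetical check: on data that is not linear, the endpoint shortcut from the question is heavily biased, while the plain O(n) sum is exact.

```python
# The endpoint shortcut is only valid for linear data.
arr = sorted(i * i for i in range(1001))  # quadratic growth, not linear

exact = sum(arr) / len(arr)               # plain O(n) mean: always correct
approx = (arr[0] + arr[-1]) / 2           # endpoint shortcut from the question

print(exact)   # 333500.0
print(approx)  # 500000.0 -- off by roughly 50%
```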
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


