
You shouldn't kick s  they have long memories and will kick back. And they are bigger than you!
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!





I'm reckless like that.
Real programmers use butterflies





Color is just a pigment of your imagination.





Cervantes' Don Quixote renamed the worn-out horse he rode in his knight-errant adventures 'Rocinante' [1] (reimagining it as a warhorse fit for his new identity) ... today, that name came to mind as i wrestled with the DataGridView control, whose endless quirks and gargantuan [2] smorgasbord of settings, properties, and events induce vertigo, or worse ...
Each of the old WinForm controls (back in the neolithic period, wrappers around a COM core) is a little universe of idiosyncrasies only partially connected to its housemates by semantic consistencies.
Perhaps the flip side of their virtual virtue is that their limitations catalyzed a thriving 3rd-party controls market? Cue: "Just enough for the City" [^]. And, truth be told, the expensive suites from Telerik, et al., also have very steep learning curves.
But, try as i might, i never succeeded in imagining myself transformed from code-drudge to knight
Usual disclaimer: cup is half-full.
~
[1] A complex pun [^] ... also the name of the Martian ship in 'The Expanse.' My enthusiasm for the Quixote has been magnified by Edith Grossman's modern translation (see what A.S. Byatt says about that: [^]). fyi: there is an audiobook version of Grossman's translation read by the talented George Guidall: [^].
[2] If you read the story of Gargantua's birth from the ear of Madam Gargamelle [^] ... after you throw up, you may appreciate the depth of queasiness i imply here.
«One day it will have to be officially admitted that what we have christened reality is an even greater illusion than the world of dreams.» Salvador Dali





BillWoodruff wrote: also the name of the Martian ship in 'The Expanse.'
Not Martian: "Legitimate Salvage"[^]





"In the novel series The Expanse and its TV series adaptation, the Rocinante is the new name given to a Martian gunship that becomes the primary setting for much of the series." Wikipedia
my favorite send-up of Wikipedia (scene from 'Jennifer's Body'): [^]





Now just hold up a second. Of all the folks on CP, Bill is our resident literati (literatus?). Anyway, he's good with words.
If he omits an adverb (BillWoodruff should have wrote: also the name of the originally Martian ship in 'The Expanse'), we should give him the benefit of the doubt.
Software Zen: delete this;





If you ever wondered why your nose is located in the middle of your face, it's because it's your scenter.
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason?
Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful?
Zachris Topelius






Is the processor just that slow, or do you think that profiling your code might uncover things that could be sped up to reach an acceptable level?





It's difficult to profile because it's cross compiled to run on an IoT device, where I don't have access to profiling.
I *can* do it but it involves setting up a lot of test code in order to use GFX from my PC.
Besides, I know where it's taking the time, and there's not much I can do about it.
The source machine is running at between 160MHz and 240MHz and the operation cannot be readily parallelized without using RAM I don't have.
Algorithmically, I've optimized it about as much as I can. The mixing plan for finding two dither colors is something like O(log N), which isn't that bad. The problem is I have to do it for 200x200 pixels. And for my color display I have to do it twice per frame, because it's a 3-color display organized in two monochrome planes (one white and one red), and I don't have the RAM to store the frame between rendering the two planes.
On a superscalar PC running at GHz speeds this is no problem, and I *thought* it shouldn't be a problem even for this machine, but I guess I dramatically underestimated the time it takes this algorithm to run.
I actually have two algos. The first one is similar to the one used by Photoshop (but different enough that it avoids patent infringement).
The second one is a much faster, simpler algorithm.
Color dithers are apparently not as easy as I thought.





I knew nothing of dithering, so I just read about it. Interesting. Eventually you'll be a numerical analysis weenie.
It also means that what I was going to ask, namely whether you could cache frequently used results, seems to make no sense.





Yeah. My current plan is to stick with nearest matching color only for color displays. I'm going to try dithering once again on black and white since those algorithms run much faster.
Edit: Black and white dithering is fast fast fast, so I've added support for it to b&w e-paper displays. I don't do it for monochrome displays yet, and I'm not sure I will since they can update in real time and dithering interferes with that, but since you can turn it off maybe I'll add support for it. I only do nearest color matching for color e-ink displays. The color dithering, as I said, was just too expensive.
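For readers wondering why black and white dithering can be so cheap: a minimal sketch, assuming an ordered (Bayer) threshold matrix, which is one common fast approach. The poster's actual implementation is not shown in this post, so `kBayer4` and `dither_bw` are hypothetical names. Each pixel costs one table lookup and one comparison, with no per-frame state to store.

```cpp
#include <array>
#include <cstdint>

// Classic 4x4 Bayer threshold matrix (values 0..15).
constexpr std::array<std::array<std::uint8_t, 4>, 4> kBayer4 = {{
    {{ 0,  8,  2, 10}},
    {{12,  4, 14,  6}},
    {{ 3, 11,  1,  9}},
    {{15,  7, 13,  5}},
}};

// Hypothetical helper: returns true (white) or false (black) for an
// 8-bit luminance value at pixel position (x, y).
constexpr bool dither_bw(std::uint8_t luma, unsigned x, unsigned y) {
    // Scale the matrix entry to the 0..255 range, centered in its bucket.
    const unsigned threshold = kBayer4[y & 3u][x & 3u] * 16u + 8u;
    return luma >= threshold;
}
```

Because the result depends only on the pixel value and its coordinates, this kind of dither needs no extra RAM beyond the 16-byte table.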
modified 18-Jun-21 1:35am.





Best to begin without morals.





Well.. look at it this way, the code you took 1 day to write is almost as good as this PhD code the guy spent a few years perfecting!





I guess you do a bit of rounding then? If you do, it may be possible to improve that a lot with bit-fiddling …
On x64 I've seen significant performance improvements, between 20% and 300%, for the following:
isnan, isinf, signbit, frexp, min, max, trunc, round, clamp and lerp
I have no idea how well this will work out for an ARM CPU, but here is the core of my implementation (sorry about the formatting; paste and encode as HTML doesn't work well for C++ code anymore):
template <typename T>
struct FractionWidth;
template <>
struct FractionWidth<float>
{
static constexpr UInt32 value = 23;
};
template <>
struct FractionWidth<double>
{
static constexpr UInt32 value = 52;
};
template <typename T>
struct ExponentWidth;
template <>
struct ExponentWidth<float>
{
static constexpr UInt32 value = 8;
};
template <>
struct ExponentWidth<double>
{
static constexpr UInt32 value = 11;
};
template <typename T>
struct ExponenBias;
template <>
struct ExponenBias<float>
{
static constexpr UInt32 value = _FBIAS;
};
template <>
struct ExponenBias<double>
{
static constexpr UInt32 value = _DBIAS;
};
template <typename T>
struct InfinityUnsignedValue;
template <>
struct InfinityUnsignedValue<float>
{
static constexpr UInt32 value = 0X7F800000UL;
};
template <>
struct InfinityUnsignedValue<double>
{
static constexpr UInt64 value = 0x7FF0000000000000ULL;
};
template <typename T>
struct NegativeInfinityUnsignedValue;
template <>
struct NegativeInfinityUnsignedValue<float>
{
static constexpr UInt32 value = 0xFF800000UL;
};
template <>
struct NegativeInfinityUnsignedValue<double>
{
static constexpr UInt64 value = 0xFFF0000000000000ULL;
};
template <typename T>
struct QuietNaNUnsignedValue;
template <>
struct QuietNaNUnsignedValue<float>
{
static constexpr UInt32 value = 0XFFC00001UL;
};
template <>
struct QuietNaNUnsignedValue<double>
{
static constexpr UInt64 value = 0x7FF0000000000001ULL;
};
#pragma pack(push,1)
template<typename T>
struct FloatingPoint
{
using ValueType = std::remove_cvref_t<T>;
using UIntType = MakeUnsigned<ValueType>;
static constexpr Int32 FractionWidth = static_cast<Int32>( Internal::FractionWidth<ValueType>::value );
static constexpr Int32 ExponentWidth = static_cast<Int32>( Internal::ExponentWidth<ValueType>::value );
static constexpr Int32 ExponentBias = ( 1 << ( ExponentWidth - 1 ) ) - 1;
static constexpr Int32 MaxExponentValue = ( 1 << ExponentWidth ) - 1;
static constexpr UIntType MaxExponent = static_cast<UIntType>( MaxExponentValue ) << FractionWidth;
static constexpr UIntType MinSubnormal = UIntType( 1 );
static constexpr UIntType MaxSubnormal = ( UIntType( 1 ) << FractionWidth ) - 1;
static constexpr UIntType MinNormal = ( UIntType( 1 ) << FractionWidth );
static constexpr UIntType MaxNormal = ( ( UIntType( MaxExponentValue ) - 1 ) << FractionWidth ) | MaxSubnormal;
static constexpr UIntType FractionMask = FractionMask<ValueType, UIntType>;
static constexpr UIntType ExponentMask = ExponentMask<ValueType, UIntType>;
static constexpr UIntType SignMask = ~( FractionMask | ExponentMask );
static constexpr UIntType InfinityValue = InfinityUnsignedValue<ValueType>::value;
static constexpr UIntType NegativeInfinityValue = NegativeInfinityUnsignedValue<ValueType>::value;
static constexpr UIntType QuietNaNValue = QuietNaNUnsignedValue<ValueType>::value;
static constexpr UIntType ZeroValue = static_cast<UIntType>( 0 );
static constexpr UIntType NegativeZeroValue = SignMask;
UIntType value_;
constexpr FloatingPoint( ) noexcept
: value_( std::bit_cast<UIntType>( static_cast<ValueType>( 0.0 ) ) )
{
}
constexpr explicit FloatingPoint( ValueType value ) noexcept
: value_( std::bit_cast<UIntType>( value ) )
{
}
constexpr explicit FloatingPoint( UIntType value, bool ) noexcept
: value_( value )
{
}
constexpr explicit FloatingPoint( UIntType fraction, Int32 exponent, bool sign) noexcept
: value_( (fraction & FractionMask ) |
(( static_cast<UIntType>( exponent ) << FractionWidth ) & ExponentMask) |
( sign? SignMask : 0 ) )
{
}
constexpr FloatingPoint& operator = ( ValueType value ) noexcept
{
value_ = std::bit_cast<UIntType>( value );
return *this;
}
constexpr bool Sign( ) const noexcept
{
return ( value_ & SignMask ) != 0;
}
constexpr void SetSign( bool value = true ) noexcept
{
if ( value )
{
value_ |= SignMask;
}
else
{
value_ &= ~SignMask;
}
}
constexpr Int32 Exponent( ) const noexcept
{
return static_cast<Int32>( ( value_ & ExponentMask ) >> FractionWidth ) - ExponentBias;
}
private:
constexpr void SetExponent( UIntType value ) noexcept
{
value_ = ( value << FractionWidth ) & ExponentMask;
}
public:
constexpr UIntType Fraction( ) const noexcept
{
return value_ & FractionMask;
}
private:
constexpr void SetFraction( UIntType value ) noexcept
{
value_ = value & FractionMask;
}
public:
constexpr bool IsZero( ) const noexcept
{
return (value_ & ( ExponentMask | FractionMask )) == 0;
}
constexpr bool IsInf( ) const noexcept
{
return ( value_ & FractionMask ) == 0 && ( ( value_ & ExponentMask ) == MaxExponent );
}
constexpr bool IsNaN( ) const noexcept
{
return ( ( value_ & ExponentMask ) == MaxExponent ) && ( ( value_ & FractionMask ) != 0 );
}
constexpr bool IsInfOrNaN( ) const noexcept
{
return ( value_ & ExponentMask ) == MaxExponent;
}
static constexpr ValueType MakeNaN( UIntType value ) noexcept
{
UIntType result;
result = MaxExponent | (value & FractionMask);
return std::bit_cast<ValueType>( result );
}
constexpr ValueType AsFloatingPoint( ) const noexcept
{
return std::bit_cast<ValueType>( value_ );
}
constexpr UIntType AsUnsigned( ) const noexcept
{
return value_;
}
static constexpr FloatingPoint Zero( ) noexcept
{
return FloatingPoint( );
}
static constexpr FloatingPoint NegZero( ) noexcept
{
FloatingPoint result;
result.value_ = SignMask;
return result;
}
static constexpr FloatingPoint Inf( ) noexcept
{
FloatingPoint result;
result.value_ = MaxExponent;
return result;
}
static constexpr FloatingPoint NegInf( ) noexcept
{
FloatingPoint result;
result.value_ = MaxExponent | SignMask;
return result;
}
constexpr ValueType Trunc( ) const noexcept
{
if ( IsInfOrNaN( ) )
{
return std::bit_cast<ValueType>(value_);
}
Int32 exponent = Exponent( );
if ( exponent >= static_cast<Int32>( FractionWidth ) )
{
return std::bit_cast<ValueType>( value_ );
}
if ( exponent <= -1 )
{
return Sign() ? static_cast<ValueType>( -0.0 ) : static_cast<ValueType>( 0.0 );
}
Int32 trimSize = FractionWidth - exponent;
UIntType result = (value_ & (SignMask | ExponentMask)) | (( (value_ & FractionMask) >> trimSize ) << trimSize);
return std::bit_cast<ValueType>( result );
}
constexpr ValueType Ceil( ) const noexcept
{
if ( IsInfOrNaN( ) || IsZero( ) )
{
return std::bit_cast<ValueType>( value_ );
}
Int32 exponent = Exponent( );
if ( exponent >= static_cast<Int32>( FractionWidth ) )
{
return std::bit_cast<ValueType>( value_ );
}
if ( exponent <= -1 )
{
return Sign() ? ValueType( -0.0 ) : ValueType( 1.0 );
}
Int32 trimSize = FractionWidth - exponent;
UIntType result = ( value_ & ( SignMask | ExponentMask ) ) | ( ( ( value_ & FractionMask ) >> trimSize ) << trimSize );
if ( result == value_ )
{
return std::bit_cast<ValueType>( value_ );
}
return Sign( ) ? std::bit_cast<ValueType>( result ) : std::bit_cast<ValueType>( result ) + static_cast<ValueType>( 1.0 );
}
constexpr ValueType Floor( ) const noexcept
{
if ( Sign() )
{
FloatingPoint tmp( value_ & ( ExponentMask | FractionMask ), true );
return -tmp.Ceil( );
}
else
{
return Trunc( );
}
}
constexpr ValueType Round( ) const noexcept
{
if ( IsInfOrNaN( ) || IsZero( ) )
{
return std::bit_cast<ValueType>(value_);
}
int exponent = Exponent( );
if ( exponent >= static_cast<int>( FractionWidth ) )
{
return std::bit_cast<ValueType>( value_ );
}
if ( exponent == -1 )
{
bool isNegative = Sign( );
if ( isNegative )
{
return static_cast<ValueType>( -1.0 );
}
else
{
return static_cast<ValueType>( 1.0 );
}
}
if ( exponent <= -2 )
{
bool isNegative = Sign( );
if ( isNegative )
{
return static_cast<ValueType>( -0.0 );
}
else
{
return static_cast<ValueType>( 0.0 );
}
}
UInt32 trimSize = FractionWidth - exponent;
bool middleBitSet = (value_ & FractionMask) & ( UIntType( 1 ) << ( trimSize - 1 ) );
UIntType result = ( value_ & ( SignMask | ExponentMask ) ) | ( ( ( value_ & FractionMask ) >> trimSize ) << trimSize );
if ( result == value_ )
{
return std::bit_cast<ValueType>( value_ );
}
if ( !middleBitSet )
{
return std::bit_cast<ValueType>( result );
}
else
{
bool isNegative = Sign( );
return isNegative ?
std::bit_cast<ValueType>( result ) - static_cast<ValueType>( 1.0 ) :
std::bit_cast<ValueType>( result ) + static_cast<ValueType>( 1.0 );
}
}
};
#pragma pack(pop)
Espen Harlinn
Senior Architect - Ulriken Consulting AS
The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague. - Edsger W. Dijkstra
modified 21-Jun-21 19:41pm.





If it was closer to performing like I need I'd twiddle with optimizations like this, but this sort of improvement isn't going to change the code from taking minutes to taking seconds, and that's what I need.
I've basically abandoned color dithering for this project.





You made me curious. It looks like this worked. There were a couple of quirky bits of text in your paste I had to eliminate, but the main thing was using Notepad++ to convert it to ANSI before pasting here. Weird; it took a bit of work, like 5 mins puttering...
For my own code, I do a two-step process. First paste as HTML and encode, then copy everything, delete it, and re-paste as C++.





I am surprised it is that bad. I thought something like the ESP32 at 240MHz would do Floyd-Steinberg reasonably well, at least keeping up with the 680x0s and 80x86s I used to run Floyd-Steinberg on 25-odd years ago. I know it is a different architecture, but it also has a 200MHz speed advantage. But maybe it was just a lot slower than I recall; back then we were amazed it displayed something at all. I am quite sure it never took a minute though, but whether it was ½ or 20 seconds, who knows.
I wonder if it is memory access or something slowing it down.
Takes ages to develop these kinds of things though. As soon as it displays something, you lose the next couple of hours looking at it before moving on.





I can't do Floyd-Steinberg because of the memory requirements.
I do a similar style as Thomas Knoll's Adobe Photoshop grid dithering method for my "slow" dithering, and an optimized Yliluoma algorithm for my "fast" dithering. Both are far too slow.





Adding: were you doing color dithering? My black and white Bayer dithering is quite fast.
Also, if you were doing color dithering, there are much faster algos you can use when simply simulating a higher bit depth, but I actually have to do color matching to a palette.
Here's how I have to choose two colors to blend:
template<typename PaletteType>
gfx_result dither_mixing_plan_fast(const PaletteType* palette, typename PaletteType::mapped_pixel_type color, dither_mixing_plan_data_fast* plan) {
gfx_result rr ;
if(nullptr==plan || nullptr==palette) {
return gfx_result::invalid_argument;
}
rgb_pixel<24> rgb888;
rr = convert(color,&rgb888);
if(gfx_result::success!=rr) {
return rr;
}
const unsigned r= rgb888.template channel<channel_name::R>(),
g=rgb888.template channel<channel_name::G>(),
b=rgb888.template channel<channel_name::B>();
*plan = { {0,0}, 0.5 };
double least_penalty = 1e99;
for(unsigned index1 = 0; index1 < 16; ++index1)
for(unsigned index2 = index1; index2 < 16; ++index2)
{
typename PaletteType::mapped_pixel_type mpx1;
rr=palette->map(typename PaletteType::pixel_type(index1),&mpx1);
if(gfx_result::success!=rr) {
return rr;
}
typename PaletteType::mapped_pixel_type mpx2;
// map the second palette entry into mpx2 (the paste wrote into mpx1 again)
rr=palette->map(typename PaletteType::pixel_type(index2),&mpx2);
if(gfx_result::success!=rr) {
return rr;
}
rr = convert(mpx1,&rgb888);
if(gfx_result::success!=rr) {
return rr;
}
unsigned r1= rgb888.template channel<channel_name::R>(),
g1=rgb888.template channel<channel_name::G>(),
b1=rgb888.template channel<channel_name::B>();
rr = convert(mpx2,&rgb888);
if(gfx_result::success!=rr) {
return rr;
}
unsigned r2= rgb888.template channel<channel_name::R>(),
g2=rgb888.template channel<channel_name::G>(),
b2=rgb888.template channel<channel_name::B>();
int ratio = 32;
if(mpx1.native_value != mpx2.native_value)
{
ratio = ((r2 != r1 ? 299*64 * int(r - r1) / int(r2-r1) : 0)
+ (g2 != g1 ? 587*64 * int(g - g1) / int(g2-g1) : 0)
+ (b1 != b2 ? 114*64 * int(b - b1) / int(b2-b1) : 0))
/ ((r2 != r1 ? 299 : 0)
+ (g2 != g1 ? 587 : 0)
+ (b2 != b1 ? 114 : 0));
if(ratio < 0) ratio = 0; else if(ratio > 63) ratio = 63;
}
unsigned r0 = r1 + ratio * int(r2-r1) / 64;
unsigned g0 = g1 + ratio * int(g2-g1) / 64;
unsigned b0 = b1 + ratio * int(b2-b1) / 64;
double penalty = dither_mixing_error(
r,g,b, r0,g0,b0, r1,g1,b1, r2,g2,b2,
ratio / double(64));
if(penalty < least_penalty)
{
least_penalty = penalty;
plan->colors[0] = index1;
plan->colors[1] = index2;
plan->ratio = ratio / double(64);
}
}
return gfx_result::success;
}
It's not easy. I know it could be faster, but I don't think I can make many algorithmic improvements, and that's the sort of improvement I need to achieve an orders-of-magnitude reduction in time requirements. That's what I need right now.





It was color. As far as I recall, we had a 16-color fixed palette, and anything available above that would be "allocated" as images were displayed (I think the Amiga supported 32, 64, 128, 256, while our Windows support would go directly to 256). We also supported 16/24 bit, but I believe we just did nearest color on 16 bit.
Palette entries were allocated based on one or another algorithm based on distance to nearest "existing" color, the number of pixels using the color, and the number of free palette entries.
But most likely our images were just small enough that we did not encounter memory issues. I recall we did support Amiga 500, but we might have had restrictions on features there. Most systems would have had at least 1MB, and then it is no problem keeping an extra scanline or two in memory for dithering.





lmoelleb wrote: Palette entries were allocated based on one or another algorithm based on distance to nearest "existing" color, the number of pixels using the color, and the number of free palette entries.
I'm not sure what you mean by nearest existing color, as my algo has to find *two* colors in order to determine what to blend with what. I have a KD tree implementation waiting in the wings for larger palettes since it sorts in such a way as to speed up distance-based matching, but it's not helpful for, say, 16 colors. I may "pre-expand" the palette, mixing colors beforehand, so a 16 color palette becomes (16*15)/2 colors, and then try throwing that into a kd_tree and see what happens.
But that's my biggest issue: finding the two colors to blend. The rest is fast.
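A rough sketch of what that pre-expansion could look like, under loud assumptions: every pair is blended at a fixed 50/50 ratio (the mixing plan in the earlier post computes a per-pixel ratio instead), pure colors are kept as well as pairs, and `rgb`, `mix_entry` and `expand_palette` are hypothetical names, not GFX types. The resulting table is what would be handed to a nearest-neighbor search.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct rgb { std::uint8_t r, g, b; };

// One pre-mixed entry: the two source palette indices and their blend.
struct mix_entry {
    std::uint8_t index1;
    std::uint8_t index2;
    rgb blended;
};

// N pure colors plus N*(N-1)/2 unordered pairs.
template <std::size_t N>
constexpr std::size_t expanded_size = N + N * (N - 1) / 2;

template <std::size_t N>
constexpr std::array<mix_entry, expanded_size<N>>
expand_palette(const std::array<rgb, N>& palette) {
    std::array<mix_entry, expanded_size<N>> out{};
    std::size_t k = 0;
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = i; j < N; ++j) {  // j == i keeps the pure color
            const rgb a = palette[i];
            const rgb b = palette[j];
            out[k++] = mix_entry{
                static_cast<std::uint8_t>(i),
                static_cast<std::uint8_t>(j),
                rgb{ static_cast<std::uint8_t>((a.r + b.r) / 2),
                     static_cast<std::uint8_t>((a.g + b.g) / 2),
                     static_cast<std::uint8_t>((a.b + b.b) / 2) } };
        }
    }
    return out;
}
```

With this layout a 16 color palette yields 136 entries (120 pairs plus the 16 pure colors); the table is built once, so the per-pixel work reduces to a single nearest-entry lookup.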





We were mainly (maybe only) loading images with a palette. So we were dithering one palette to another, meaning there was a max of 256 colors in the source image (and often less than that). This allowed us to quickly calculate the total number of a given color, and for each color calculate how close it was to existing colors in our palette.
I guess it is pretty useless these days as no one works with palettes (I think we had a jpg decoder for fun, but we were probably sticking to gif and various other formats for images we really needed).
Once the palette was locked in for an image, we could just run Floyd-Steinberg. It only requires finding the nearest color per pixel, as "blending" is done by pushing the error ahead of the calculations (at the cost of one scanline of extra memory, though I guess you could stamp it into the bitmap data directly). No problem on a 500KB system with relatively small images.
We had the advantage that low CPU spec systems were typically also running lower colors (so we only had to search for the nearest color in 16 or 32 target colors), while systems running 256 colors typically also had more CPU power.
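To make the "pushing the error ahead" part concrete, here is a minimal sketch of Floyd-Steinberg for the simplest case, 8-bit grayscale to pure black and white. It is not the code described above: `floyd_steinberg_bw` is a hypothetical name, the target "palette" is just 0 and 255, and for simplicity it keeps two row-sized error buffers (current and next scanline) rather than one.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dithers a row-major 8-bit grayscale image in place to 0 or 255.
inline void floyd_steinberg_bw(std::vector<std::uint8_t>& pixels,
                               std::size_t width, std::size_t height) {
    // Error rows are padded by one cell on each side so the x-1 and x+1
    // writes never need bounds checks.
    std::vector<int> this_row(width + 2, 0);
    std::vector<int> next_row(width + 2, 0);

    for (std::size_t y = 0; y < height; ++y) {
        std::swap(this_row, next_row);
        std::fill(next_row.begin(), next_row.end(), 0);
        int carry = 0;  // error pushed to the pixel on the right

        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t i = y * width + x;
            const int value = pixels[i] + this_row[x + 1] + carry;
            const int out = value >= 128 ? 255 : 0;  // nearest of two colors
            const int err = value - out;
            pixels[i] = static_cast<std::uint8_t>(out);

            carry            = err * 7 / 16;  // right
            next_row[x]     += err * 3 / 16;  // below-left
            next_row[x + 1] += err * 5 / 16;  // below
            next_row[x + 2] += err * 1 / 16;  // below-right
        }
    }
}
```

The only per-pixel search is the nearest-color step (a single comparison here); for a real palette that line becomes a nearest-palette-entry lookup, and the error is tracked per channel.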





Actually palettes are very useful these days for e-paper displays, which are either monochrome, or have a *fixed palette* of a handful of colors.
That's primarily why my GFX library supports it.
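As an illustration of matching against such a fixed palette, here is a minimal nearest-color sketch. The three-entry black/white/red palette and the names `rgb8`, `kPalette` and `nearest_index` are assumptions for the example, not the GFX library's API, and plain squared RGB distance is only one possible metric.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct rgb8 { std::uint8_t r, g, b; };

// Hypothetical fixed palette for a 3-color e-paper panel.
constexpr std::array<rgb8, 3> kPalette = {{
    {0, 0, 0},        // black
    {255, 255, 255},  // white
    {255, 0, 0},      // red
}};

// Returns the index of the palette entry with the smallest squared
// RGB distance to c.
constexpr std::size_t nearest_index(rgb8 c) {
    std::size_t best = 0;
    long best_dist = -1;
    for (std::size_t i = 0; i < kPalette.size(); ++i) {
        const long dr = static_cast<long>(c.r) - kPalette[i].r;
        const long dg = static_cast<long>(c.g) - kPalette[i].g;
        const long db = static_cast<long>(c.b) - kPalette[i].b;
        const long d = dr * dr + dg * dg + db * db;
        if (best_dist < 0 || d < best_dist) {
            best_dist = d;
            best = i;
        }
    }
    return best;
}
```

A linear scan like this is fine for a handful of colors; a spatial index only starts to pay off for much larger palettes.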




