1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308
//! Cosine function implemented using a parabolic approximation.
//!
//! Note that while this method is inspired by a [Taylor series] approximation,
//! it provides both better performance and accuracy.
//!
//! Most of the other trigonometric functions in this crate are implemented in
//! terms of this approximation.
//!
//! Method described: <https://stackoverflow.com/posts/28050328/revisions>
//!
//! Originally based on this approximation by Nick Capen:
//!
//! <https://web.archive.org/web/20180105200343if_/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648>
//!
//! Post quoted below for posterity as the original site is down:
//!
//! > In some situations you need a good approximation of sine and cosine that
//! > runs at very high performance. One example is to implement dynamic
//! > subdivision of round surfaces, comparable to those in Quake 3. Or to
//! > implement wave motion, in case there are no vertex shaders 2.0 available.
//! >
//! > The standard C `sinf()` and `cosf()` functions are terribly slow and offer
//! > much more precision than what we actually need. What we really want is an
//! > approximation that offers the best compromise between precision and
//! > performance. The most well known approximation method is to use a
//! > [Taylor series] about 0 (also known as a Maclaurin series), which for
//! > sine becomes:
//! >
//! > ```text
//! > x - 1/6 x^3 + 1/120 x^5 - 1/5040 x^7 + ...
//! > ```
//! >
//! > When we plot this we get: taylor.gif. The green line is the real sine,
//! > the red line is the first four terms of the Taylor series. This seems
//! > like an acceptable approximation, but lets have a closer look:
//! > taylor_zoom.gif. It behaves very nicely up till `pi/2`, but after that it
//! > quickly deviates. At pi, it evaluates to -0.075 instead of 0. Using this
//! > for wave simulation will result in jerky motion which is unacceptable.
//! >
//! > We could add another term, which in fact reduces the error significantly,
//! > but this makes the formula pretty lengthy. We already need 7
//! > multiplications and 3 additions for the 4-term version. The Taylor series
//! > just can't give us the precision and performance we're looking for.
//! >
//! > We did however note that we need `sin(pi) = 0`. And there's another thing
//! > we can see from taylor_zoom.gif: this looks very much like a parabola!
//! > So let's try to find the formula of a parabola that matches it as closely
//! > as possible. The generic formula for a parabola is `A + B x + C x²`. So
//! > this gives us three degrees of freedom. Obvious choices are that we want
//! > `sin(0) = 0`, `sin(pi/2) = 1` and `sin(pi) = 0`. This gives us the following
//! > three equations:
//! >
//! > ```text
//! > A + B 0 + C 0² = 0
//! > A + B pi/2 + C (pi/2)² = 1
//! > A + B pi + C pi² = 0
//! > ```
//! >
//! > Which has as solution `A = 0, B = 4/pi, C = -4/pi²`. So our parabola
//! > approximation becomes `4/pi x - 4/pi² x²`. Plotting this we get:
//! > parabola.gif. This looks worse than the 4-term Taylor series, right?
//! > Wrong! The maximum absolute error is 0.056. Furthermore, this
//! > approximation will give us smooth wave motion, and can be calculated
//! > in only 3 multiplications and 1 addition!
//! >
//! > Unfortunately it's not very practical yet. This is what we get in the
//! > `[-pi, pi]` range: negative.gif. It's quite obvious we want at least a
//! > full period. But it's also clear that it's just another parabola,
//! > mirrored around the origin. The formula for it is `4/pi x + 4/pi² x²`.
//! > So the straightforward (pseudo-C) solution is:
//! >
//! > ```text
//! > if(x > 0)
//! > {
//! > y = 4/pi x - 4/pi² x²;
//! > }
//! > else
//! > {
//! > y = 4/pi x + 4/pi² x²;
//! > }
//! > ```
//! >
//! > Adding a branch is not a good idea though. It makes the code
//! > significantly slower. But look at how similar the two parts really are.
//! > A subtraction becomes an addition depending on the sign of x. In a naive
//! > first attempt to eliminate the branch we can 'extract' the sign of x
//! > using `x / abs(x)`. The division is very expensive but look at the resulting
//! > formula: `4/pi x - x / abs(x) 4/pi² x²`. By inverting the division we can
//! > simplify this to a very nice and clean `4/pi x - 4/pi² x abs(x)`. So for
//! > just one extra operation we get both halves of our sine approximation!
//! > Here's the graph of this formula confirming the result: abs.gif.
//! >
//! > Now let's look at cosine. Basic trigonometry tells us that
//! > `cos(x) = sin(pi/2 + x)`. Is that all, add pi/2 to x? No, we actually get
//! > the unwanted part of the parabola again: shift_sine.gif. What we need to
//! > do is 'wrap around' when x > pi/2. This can be done by subtracting 2 pi.
//! > So the code becomes:
//! >
//! > ```text
//! > x += pi/2;
//! >
//! > if(x > pi) // Original x > pi/2
//! > {
//! > x -= 2 * pi; // Wrap: cos(x) = cos(x - 2 pi)
//! > }
//! >
//! > y = sin(x);
//! > ```
//! >
//! > Yet another branch. To eliminate it, we can use binary logic, like this:
//! >
//! > ```text
//! > x -= (x > pi) & (2 * pi);
//! > ```
//! >
//! > Note that this isn't valid C code at all. But it should clarify how this
//! > can work. When x > pi is false the & operation zeroes the right hand part
//! > so the subtraction does nothing, which is perfectly equivalent. I'll
//! > leave it as an excercise to the reader to create working C code for this
//! > (or just read on). Clearly the cosine requires a few more operations than
//! > the sine but there doesn't seem to be any other way and it's still
//! > extremely fast.
//! >
//! > Now, a maximum error of 0.056 is nice but clearly the 4-term Taylor series
//! > still has a smaller error on average. Recall what our sine looked like:
//! > abs.gif. So isn't there anything we can do to further increase precision
//! > at minimal cost? Of course the current version is already applicable for
//! > many situations where something that looks like a sine is just as good as
//! > the real sine. But for other situations that's just not good enough.
//! >
//! > Looking at the graphs you'll notice that our approximation always
//! > overestimates the real sine, except at 0, pi/2 and pi. So what we need is
//! > to 'scale it down' without touching these important points. The solution
//! > is to use the squared parabola, which looks like this: squared.gif. Note
//! > how it preserves those important points but it's always lower than the
//! > real sine. So we can use a weighted average of the two to get a better
//! > approximation:
//! >
//! > ```text
//! > Q (4/pi x - 4/pi² x²) + P (4/pi x - 4/pi² x²)²
//! > ```
//! >
//! > With `Q + P = 1`. You can use exact minimization of either the absolute
//! > or relative error but I'll save you the math. The optimal weights are
//! > `Q = 0.775`, `P = 0.225` for the absolute error and `Q = 0.782`,
//! > `P = 0.218` for the relative error. I will use the former. The resulting
//! > graph is: average.gif. Where did the red line go? It's almost entirely
//! > covered by the green line, which instantly shows how good this
//! > approximation really is. The maximum error is about 0.001, a 50x
//! > improvement! The formula looks lengthy but the part between parenthesis
//! > is the same value from the parabola, which needs to be computed only
//! > once. In fact only 2 extra multiplications and 2 additions are required
//! > to achieve this precision boost.
//! >
//! > It shouldn't come as a big surprise that to make it work for negative x
//! > as well we need a second abs() operation. The final C code for the sine
//! > becomes:
//! >
//! > ```text
//! > float sin(float x)
//! > {
//! > const float B = 4/pi;
//! > const float C = -4/(pi*pi);
//! >
//! > float y = B * x + C * x * abs(x);
//! >
//! > #ifdef EXTRA_PRECISION
//! > // const float Q = 0.775;
//! > const float P = 0.225;
//! >
//! > y = P * (y * abs(y) - y) + y; // Q * y + P * y * abs(y)
//! > #endif
//! > }
//! > ```
//! >
//! > So we need merely 5 multiplications and 3 additions; still faster than
//! > the 4-term Taylor if we neglect the abs(), and much more precise! The
//! > cosine version just needs the extra shift and wrap operations on x.
//! >
//! > Last but not least, I wouldn't be Nick if I didn't include the SIMD
//! > optimized assembly version. It allows to perform the wrap operation very
//! > efficiently so I'll give you the cosine:
//! >
//! > ```asm
//! > // cos(x) = sin(x + pi/2)
//! > addps xmm0, PI_2
//! > movaps xmm1, xmm0
//! > cmpnltps xmm1, PI
//! > andps xmm1, PIx2
//! > subps xmm0, xmm1
//! >
//! > // Parabola
//! > movaps xmm1, xmm0
//! > andps xmm1, abs
//! > mulps xmm1, xmm0
//! > mulps xmm0, B
//! > mulps xmm1, C
//! > addps xmm0, xmm1
//! >
//! > // Extra precision
//! > movaps xmm1, xmm0
//! > andps xmm1, abs
//! > mulps xmm1, xmm0
//! > subps xmm1, xmm0
//! > mulps xmm1, P
//! > addps xmm0, xmm1
//! > ```
//! >
//! > This code computes four cosines in parallel, resulting in a peak
//! > performance of about 9 clock cycles per cosine for most CPU architectures.
//! > Sines take only 6 clock cycles ideally. The lower precision version can
//! > even run at 3 clock cycles per sine... And lets not forget that all input
//! > between -pi and pi is valid and the formula is exact at -pi, -pi/2, 0,
//! > pi/2 and pi.
//! >
//! > So, the conclusion is don't ever again use a Taylor series to approximate
//! > a sine or cosine! To add a useful discussion to this article, I'd love to
//! > hear if anyone knows good approximations for other transcendental functions
//! > like the exponential, logarithm and power functions.
use super::F32;
use core::f32::consts::FRAC_1_PI;
impl F32 {
/// Approximates `cos(x)` in radians with a maximum error of `0.002`.
pub fn cos(self) -> Self {
let mut x = self;
x *= FRAC_1_PI / 2.0;
x -= 0.25 + (x + 0.25).floor().0;
x *= 16.0 * (x.abs() - 0.5);
x += 0.225 * x * (x.abs() - 1.0);
x
}
}
#[cfg(test)]
pub(crate) mod tests {
use super::F32;
/// Maximum error in radians
pub(crate) const MAX_ERROR: f32 = 0.002;
/// Cosine test vectors - `(input, output)`
const TEST_VECTORS: &[(f32, f32)] = &[
(0.000, 1.000),
(0.140, 0.990),
(0.279, 0.961),
(0.419, 0.914),
(0.559, 0.848),
(0.698, 0.766),
(0.838, 0.669),
(0.977, 0.559),
(1.117, 0.438),
(1.257, 0.309),
(1.396, 0.174),
(1.536, 0.035),
(1.676, -0.105),
(1.815, -0.242),
(1.955, -0.375),
(2.094, -0.500),
(2.234, -0.616),
(2.374, -0.719),
(2.513, -0.809),
(2.653, -0.883),
(2.793, -0.940),
(2.932, -0.978),
(3.072, -0.998),
(3.211, -0.998),
(3.351, -0.978),
(3.491, -0.940),
(3.630, -0.883),
(3.770, -0.809),
(3.910, -0.719),
(4.049, -0.616),
(4.189, -0.500),
(4.328, -0.375),
(4.468, -0.242),
(4.608, -0.105),
(4.747, 0.035),
(4.887, 0.174),
(5.027, 0.309),
(5.166, 0.438),
(5.306, 0.559),
(5.445, 0.669),
(5.585, 0.766),
(5.725, 0.848),
(5.864, 0.914),
(6.004, 0.961),
(6.144, 0.990),
(6.283, 1.000),
];
#[test]
fn sanity_check() {
for &(x, expected) in TEST_VECTORS {
let cos_x = F32(x).cos();
let delta = (cos_x - expected).abs();
assert!(
delta <= MAX_ERROR,
"delta {} too large: {} vs {}",
delta,
cos_x,
expected
);
}
}
}