Skip to content
Merged
Show file tree
Hide file tree
Changes from 44 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
ba61414
Implement floating point conversion with ryu
0xlwu Feb 22, 2021
7d80321
Use manual strictness
0xlwu Feb 22, 2021
356ed6c
Use checked shifts
0xlwu Feb 23, 2021
c0a0583
Use builtin float-to-word conversion functions
0xlwu Feb 25, 2021
ff2210d
Use builtin conversion to Bool
0xlwu Feb 25, 2021
0bb5a01
Remove dependency on array package
0xlwu Feb 25, 2021
de1d174
Handle non-exhaustive patterns
0xlwu Feb 25, 2021
8dd7f16
Try using prim conversions directly
0xlwu Feb 25, 2021
23d5cfe
Revert "Try using prim conversions directly"
0xlwu Feb 28, 2021
755f58f
Dispatch to slow cast when builtin unavailable
0xlwu Feb 28, 2021
4635e2b
Try bumping min version to 8.4.x
0xlwu Feb 28, 2021
76b5e2e
Fix log10pow5 approximation and add unit test
0xlwu Aug 8, 2021
b5f7086
Re-export floatDec and doubleDec to maintain public API
0xlwu Aug 8, 2021
648bfae
Improve documentation and fixes for initial code review
0xlwu Aug 8, 2021
d172abf
Improve table generation documentation and clean-up
0xlwu Aug 8, 2021
5f3dce5
Improve documentation of f2s and d2s and cleanup
0xlwu Aug 8, 2021
f1c6275
Use stricter integral types and annotate fromIntegral usages
0xlwu Sep 18, 2021
c2c2c87
Add module descriptions and fix typos
0xlwu Sep 18, 2021
f6497c2
Use internal FloatFormat instead of GHC.Float.FFFormat
0xlwu Sep 18, 2021
60db980
Use monomorphic helpers for remaining integral conversions used by Re…
0xlwu Sep 23, 2021
6ec7e2d
Remove usage of TemplateHaskell in RealFloat
0xlwu Sep 23, 2021
a73c84e
Fix LUT usage on big-endian systems
0xlwu Sep 24, 2021
3ccfd47
Add header for endianness detection
0xlwu Sep 27, 2021
6b3eaaa
Fix big-endian word16 packing in fast digit formatting
0xlwu Sep 29, 2021
abf6e04
Fix big-endian word128 read from raw addr
0xlwu Sep 29, 2021
f771cd5
Clean up unused functions
0xlwu Oct 5, 2021
b394896
Fix incorrect reciprocal function usage
0xlwu Oct 5, 2021
9815597
Add more test coverage and fix doc example
0xlwu Oct 5, 2021
c0648bb
Use quickcheck equality property in tests
0xlwu Oct 6, 2021
d87d3ae
Format haddock headers more similarly to existing ones
0xlwu Oct 6, 2021
5500d59
Use simpler reciprocal math for 32-bit words
0xlwu Oct 7, 2021
7d7d7fa
Use boxed arithmetic in logic flow
0xlwu Oct 12, 2021
906d6db
Support ghc 9.2 prim word changes
0xlwu Oct 12, 2021
046a42b
Fix 32-bit support
0xlwu Oct 12, 2021
8fafed4
Skip conversion to Double before fixed Float formatting
0xlwu Nov 7, 2021
dde95e2
Tweak doc wording and add examples
0xlwu Nov 7, 2021
415ac6f
Rename FExponent to FScientific
0xlwu Nov 15, 2021
0474332
Use an opaque FloatFormat type for compatibility
0xlwu Nov 15, 2021
9be8170
Rename float fixed-format to standard-format and other naming tweaks
0xlwu Nov 17, 2021
f67df50
Encourage inlining by removing partial application
0xlwu Nov 17, 2021
a01cb00
Fix some haddock links and accidental monospacing
0xlwu Nov 22, 2021
0cc5417
Add explanation about difference between implementation and reference…
0xlwu Nov 22, 2021
12436a2
Clarify default precision
0xlwu Nov 22, 2021
d8dac2a
Point to ryu paper for more details
0xlwu Nov 22, 2021
b70918b
Fix non-exhaustive warning for ghc 9.2
0xlwu Nov 28, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions Data/ByteString/Builder.hs
Original file line number Diff line number Diff line change
Expand Up @@ -252,6 +252,7 @@ module Data.ByteString.Builder
, stringUtf8

, module Data.ByteString.Builder.ASCII
, module Data.ByteString.Builder.RealFloat

) where

Expand All @@ -261,6 +262,7 @@ import Data.ByteString.Builder.Internal
import qualified Data.ByteString.Builder.Prim as P
import qualified Data.ByteString.Lazy.Internal as L
import Data.ByteString.Builder.ASCII
import Data.ByteString.Builder.RealFloat

import Data.String (IsString(..))
import System.IO (Handle, IOMode(..), withBinaryFile)
Expand Down
27 changes: 1 addition & 26 deletions Data/ByteString/Builder/ASCII.hs
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ import Data.ByteString.Lazy as L
import Data.ByteString.Builder.Internal (Builder)
import qualified Data.ByteString.Builder.Prim as P
import qualified Data.ByteString.Builder.Prim.Internal as P
import Data.ByteString.Builder.RealFloat (floatDec, doubleDec)

import Foreign
import Foreign.C.Types
Expand All @@ -89,16 +90,6 @@ import Foreign.C.Types
-- Decimal Encoding
------------------------------------------------------------------------------


-- | Encode a 'String' using 'P.char7'.
{-# INLINE string7 #-}
string7 :: String -> Builder
string7 = P.primMapListFixed P.char7

------------------------------------------------------------------------------
-- Decimal Encoding
------------------------------------------------------------------------------

-- Signed integers
------------------

Expand Down Expand Up @@ -163,22 +154,6 @@ wordDec :: Word -> Builder
wordDec = P.primBounded P.wordDec


-- Floating point numbers
-------------------------

-- TODO: Use Bryan O'Sullivan's double-conversion package to speed it up.

-- | /Currently slow./ Decimal encoding of an IEEE 'Float'.
{-# INLINE floatDec #-}
floatDec :: Float -> Builder
floatDec = string7 . show

-- | /Currently slow./ Decimal encoding of an IEEE 'Double'.
{-# INLINE doubleDec #-}
doubleDec :: Double -> Builder
doubleDec = string7 . show


------------------------------------------------------------------------------
-- Hexadecimal Encoding
------------------------------------------------------------------------------
Expand Down
269 changes: 269 additions & 0 deletions Data/ByteString/Builder/RealFloat.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,269 @@
-- |
-- Module : Data.ByteString.Builder.RealFloat
-- Copyright : (c) Lawrence Wu 2021
-- License : BSD-style
-- Maintainer : [email protected]
--
-- Floating point formatting for @Bytestring.Builder@
--
-- This module primarily exposes `floatDec` and `doubleDec` which do the
-- equivalent of converting through @'Data.ByteString.Builder.string7' . 'show'@.
--
-- It also exposes `formatFloat` and `formatDouble` with a similar API as
-- `GHC.Float.formatRealFloat`.
--
-- NB: The float-to-string conversions exposed by this module match `show`'s
-- output (specifically with respect to default rounding and length). In
-- particular, there are boundary cases where the closest and \'shortest\'
-- string representations are not used. Mentions of \'shortest\' in the docs
-- below are with this caveat.
--
-- For example, for fidelity, we match `show` on the output below.
--
-- >>> show (1.0e23 :: Float)
-- "1.0e23"
-- >>> show (1.0e23 :: Double)
-- "9.999999999999999e22"
-- >>> floatDec 1.0e23
-- "1.0e23"
-- >>> doubleDec 1.0e23
-- "9.999999999999999e22"
--
-- Simplifying, we can build a shorter, lossless representation by just using
-- @"1.0e23"@ since the floating point values that are 1 ULP away are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "ULP"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know! Could you link to the article?

Note BTW that haddock supports Markdown-style hyperlinks, e.g. [ULP](https://en.wikipedia.org/wiki/Unit_in_the_last_place).

--
-- >>> showHex (castDoubleToWord64 1.0e23) []
-- "44b52d02c7e14af6"
-- >>> castWord64ToDouble 0x44b52d02c7e14af5
-- 9.999999999999997e22
-- >>> castWord64ToDouble 0x44b52d02c7e14af6
-- 9.999999999999999e22
-- >>> castWord64ToDouble 0x44b52d02c7e14af7
-- 1.0000000000000001e23
--
-- In particular, we could use the exact boundary if it is the shortest
-- representation and the original floating number is even. To experiment with
-- the shorter rounding, refer to
-- `Data.ByteString.Builder.RealFloat.Internal.acceptBounds`. This will give us
--
-- >>> floatDec 1.0e23
-- "1.0e23"
-- >>> doubleDec 1.0e23
-- "1.0e23"
--
-- For more details, please refer to the
-- <https://dl.acm.org/doi/10.1145/3192366.3192369 Ryu paper>.
Comment on lines +15 to +55
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section is a bit too technical for most users. If we want to keep it, we should probably move it out of the module header and to the bottom of the page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there such a thing as a footer? This was added per one of our discussions above about clarifying 'shortest'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can insert chunks of documentation at selected places in the export list. See e.g. Data.ByteString.Lazy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.



module Data.ByteString.Builder.RealFloat
( floatDec
, doubleDec

-- * Custom formatting
, formatFloat
, formatDouble
, FloatFormat
, standard
, standardDefaultPrecision
, scientific
, generic
) where

import Data.ByteString.Builder.Internal (Builder)
import qualified Data.ByteString.Builder.RealFloat.Internal as R
import qualified Data.ByteString.Builder.RealFloat.F2S as RF
import qualified Data.ByteString.Builder.RealFloat.D2S as RD
import qualified Data.ByteString.Builder.Prim as BP
import GHC.Float (roundTo)
import GHC.Word (Word64)
import GHC.Show (intToDigit)

-- | Returns a rendered Float. Matches `show` in displaying in standard or
-- scientific notation
--
-- @
-- floatDec = 'formatFloat' 'generic'
-- @
{-# INLINABLE floatDec #-}
floatDec :: Float -> Builder
floatDec = formatFloat generic

-- | Returns a rendered Double. Matches `show` in displaying in standard or
-- scientific notation
--
-- @
-- doubleDec = 'formatDouble' 'generic'
-- @
{-# INLINABLE doubleDec #-}
doubleDec :: Double -> Builder
doubleDec = formatDouble generic

-- | Format type for use with `formatFloat` and `formatDouble`.
data FloatFormat = MkFloatFormat FormatMode (Maybe Int)

-- | Standard notation with `n` decimal places
standard :: Int -> FloatFormat
standard n = MkFloatFormat FStandard (Just n)

-- | Standard notation with the \'default precision\' (decimal places matching `show`)
standardDefaultPrecision :: FloatFormat
standardDefaultPrecision = MkFloatFormat FStandard Nothing

-- | Scientific notation with \'default precision\' (decimal places matching `show`)
scientific :: FloatFormat
scientific = MkFloatFormat FScientific Nothing

-- | Standard or scientific notation depending on the exponent. Matches `show`
generic :: FloatFormat
generic = MkFloatFormat FGeneric Nothing

-- | ByteString float-to-string format
data FormatMode
= FScientific -- ^ scientific notation
| FStandard -- ^ standard notation with `Maybe Int` digits after the decimal
| FGeneric -- ^ dispatches to scientific or standard notation based on the exponent
deriving Show

-- TODO: support precision argument for FGeneric and FScientific
-- | Returns a rendered Float. Returns the \'shortest\' representation in
-- scientific notation and takes an optional precision argument in standard
-- notation. Also see `floatDec`.
--
-- With standard notation, the precision argument is used to truncate (or
-- extend with 0s) the \'shortest\' rendered Float. The \'default precision\' does
-- no such modifications and will return as many decimal places as the
-- representation demands.
--
-- e.g
--
-- >>> formatFloat (standard 1) 1.2345e-2
-- "0.0"
-- >>> formatFloat (standard 10) 1.2345e-2
-- "0.0123450000"
-- >>> formatFloat standardDefaultPrecision 1.2345e-2
-- "0.01234"
-- >>> formatFloat scientific 12.345
-- "1.2345e1"
-- >>> formatFloat generic 12.345
-- "12.345"
{-# INLINABLE formatFloat #-}
formatFloat :: FloatFormat -> Float -> Builder
formatFloat (MkFloatFormat fmt prec) = \f ->
let (RF.FloatingDecimal m e) = RF.f2Intermediate f
e' = R.int32ToInt e + R.decimalLength9 m in
case fmt of
FGeneric ->
case specialStr f of
Just b -> b
Nothing ->
if e' >= 0 && e' <= 7
then sign f `mappend` showStandard (R.word32ToWord64 m) e' prec
else BP.primBounded (R.toCharsScientific (f < 0) m e) ()
FScientific -> RF.f2s f
FStandard ->
case specialStr f of
Just b -> b
Nothing -> sign f `mappend` showStandard (R.word32ToWord64 m) e' prec

-- TODO: support precision argument for FGeneric and FScientific
-- | Returns a rendered Double. Returns the \'shortest\' representation in
-- scientific notation and takes an optional precision argument in standard
-- notation. Also see `doubleDec`.
--
-- With standard notation, the precision argument is used to truncate (or
-- extend with 0s) the \'shortest\' rendered Float. The \'default precision\'
-- does no such modifications and will return as many decimal places as the
-- representation demands.
--
-- e.g
--
-- >>> formatDouble (standard 1) 1.2345e-2
-- "0.0"
-- >>> formatDouble (standard 10) 1.2345e-2
-- "0.0123450000"
-- >>> formatDouble standardDefaultPrecision 1.2345e-2
-- "0.01234"
-- >>> formatDouble scientific 12.345
-- "1.2345e1"
-- >>> formatDouble generic 12.345
-- "12.345"
{-# INLINABLE formatDouble #-}
formatDouble :: FloatFormat -> Double -> Builder
formatDouble (MkFloatFormat fmt prec) = \f ->
let (RD.FloatingDecimal m e) = RD.d2Intermediate f
e' = R.int32ToInt e + R.decimalLength17 m in
case fmt of
FGeneric ->
case specialStr f of
Just b -> b
Nothing ->
if e' >= 0 && e' <= 7
then sign f `mappend` showStandard m e' prec
else BP.primBounded (R.toCharsScientific (f < 0) m e) ()
FScientific -> RD.d2s f
FStandard ->
case specialStr f of
Just b -> b
Nothing -> sign f `mappend` showStandard m e' prec

-- | Char7 encode a 'Char'.
{-# INLINE char7 #-}
char7 :: Char -> Builder
char7 = BP.primFixed BP.char7

-- | Char7 encode a 'String'.
{-# INLINE string7 #-}
string7 :: String -> Builder
string7 = BP.primMapListFixed BP.char7

-- | Encodes a `-` if input is negative
sign :: RealFloat a => a -> Builder
sign f = if f < 0 then char7 '-' else mempty

-- | Special rendering for Nan, Infinity, and 0. See
-- RealFloat.Internal.NonNumbersAndZero
specialStr :: RealFloat a => a -> Maybe Builder
specialStr f
| isNaN f = Just $ string7 "NaN"
| isInfinite f = Just $ sign f `mappend` string7 "Infinity"
| isNegativeZero f = Just $ string7 "-0.0"
| f == 0 = Just $ string7 "0.0"
| otherwise = Nothing

-- | Returns a list of decimal digits in a Word64
digits :: Word64 -> [Int]
digits w = go [] w
where go ds 0 = ds
go ds c = let (q, r) = R.dquotRem10 c
in go ((R.word64ToInt r) : ds) q

-- | Show a floating point value in standard notation. Based on GHC.Float.showFloat
showStandard :: Word64 -> Int -> Maybe Int -> Builder
showStandard m e prec =
case prec of
Nothing
| e <= 0 -> char7 '0'
`mappend` char7 '.'
`mappend` string7 (replicate (-e) '0')
`mappend` mconcat (digitsToBuilder ds)
| otherwise ->
let f 0 s rs = mk0 (reverse s) `mappend` char7 '.' `mappend` mk0 rs
f n s [] = f (n-1) (char7 '0':s) []
f n s (r:rs) = f (n-1) (r:s) rs
in f e [] (digitsToBuilder ds)
Just p
| e >= 0 ->
let (ei, is') = roundTo 10 (p' + e) ds
(ls, rs) = splitAt (e + ei) (digitsToBuilder is')
in mk0 ls `mappend` mkDot rs
| otherwise ->
let (ei, is') = roundTo 10 p' (replicate (-e) 0 ++ ds)
(b:bs) = digitsToBuilder (if ei > 0 then is' else 0:is')
Copy link
Contributor

@Bodigrim Bodigrim Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This raises Pattern match(es) are non-exhaustive with GHC 9.2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm is there a way to convince GHC that the list is non-empty? Looking at the implementation of roundTo, the non-throwing return values are (0, _) and (1, 1:_). Before the call to digitsToZero, we prepend 0 to the former case's snd value so AFIACT this is 'correct'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also just mimic the case above or add some redundant matches e.g

diff --git a/Data/ByteString/Builder/RealFloat.hs b/Data/ByteString/Builder/RealFloat.hs
index cd0d39d..0dadf45 100644
--- a/Data/ByteString/Builder/RealFloat.hs
+++ b/Data/ByteString/Builder/RealFloat.hs
@@ -258,8 +258,8 @@ showStandard m e prec =
            in mk0 ls `mappend` mkDot rs
       | otherwise ->
           let (ei, is') = roundTo 10 p' (replicate (-e) 0 ++ ds)
-              (b:bs) = digitsToBuilder (if ei > 0 then is' else 0:is')
-           in b `mappend` mkDot bs
+              (ls, rs) = splitAt 1 $ digitsToBuilder (if ei > 0 then is' else 0:is')
+           in mk0 ls `mappend` mkDot rs
           where p' = max p 0
   where
     mk0 ls = case ls of [] -> char7 '0'; _ -> mconcat ls

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying that your code is wrong, I'm saying that we need to get rid of a warning. Boot packages cannot have warnings. Throwing an error for an empty list would be fine, for instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this warning needs to be silenced. Maybe this would be a good fit for Data.List.NonEmpty?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the snippet above. A lot of the code in GHC.Float makes the same assumption though so do they just turn the warning off?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That module uses the pragma

{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}

in b `mappend` mkDot bs
where p' = max p 0
where
mk0 ls = case ls of [] -> char7 '0'; _ -> mconcat ls
mkDot rs = if null rs then mempty else char7 '.' `mappend` mconcat rs
ds = digits m
digitsToBuilder = fmap (char7 . intToDigit)

Loading