Conversation

Member

@ironcev ironcev commented Jan 8, 2026

Description

This PR adds a len method to std::string::String that exposes the length of the underlying Bytes.

Currently, the least expensive way to obtain a String's length is to call some_string.as_str().len(), which adds considerable overhead to a very common operation. A len method is also a regular part of string APIs and, as such, expected.

This assumes that we want the same semantics as Rust's std::string::String, where len returns the number of bytes and not the number of characters or graphemes: https://doc.rust-lang.org/std/string/struct.String.html#method.len
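For reference, the Rust behavior being mirrored can be checked with a few lines of Rust. This is a sketch of Rust's std semantics only, not of the Sway implementation; the helper names byte_len and code_point_len are made up for illustration.

```rust
/// Number of bytes in the UTF-8 encoding; this is what `len` returns in Rust.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode code points (Rust `char`s); requires an O(n) scan.
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let ascii = String::from("fuel");
    let accented = String::from("héllo");

    // For pure ASCII, bytes and code points coincide.
    assert_eq!(byte_len(&ascii), 4);
    assert_eq!(code_point_len(&ascii), 4);

    // 'é' (U+00E9) takes two bytes in UTF-8, so the byte length is larger.
    assert_eq!(byte_len(&accented), 6);
    assert_eq!(code_point_len(&accented), 5);
}
```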

Checklist

  • I have linked to any relevant issues.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation where relevant (API docs, the reference, and the Sway book).
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added (or requested a maintainer to add) the necessary Breaking* or New Feature labels where relevant.
  • I have done my best to ensure that my PR adheres to the Fuel Labs Code Review Standards.
  • I have requested a review from the relevant team or maintainers.

@ironcev ironcev self-assigned this Jan 8, 2026
@ironcev ironcev added the lib: std Standard library label Jan 8, 2026
@ironcev ironcev requested review from a team and bitzoic January 8, 2026 23:00
Member Author

ironcev commented Jan 8, 2026

👍

Member Author

ironcev commented Jan 9, 2026

Keeping this PR in draft until we decide whether we want the same len semantics as in Rust.

String does not have len() as String does not guarantee correct UTF-8 encoding and len() would expect the number of characters, not number of bytes to be returned.

I would argue here against returning the number of characters and in favor of returning the number of bytes, as Rust does. Here are my reasons for that approach:

  • Considering all the complexity around Unicode, we would need to pick what we mean by "characters": code points or graphemes. There is no clear reason to favor one over the other, and in both cases getting len would be expensive. Rust's way of keeping both behind separate, clearly distinguishable, opt-in APIs looks to me like the cleanest approach to encapsulating Unicode complexity.
  • Consistency and equivalence with str, which returns the number of bytes as its len. There should be no difference between some_string.len() and some_string.as_str().len(). str is by definition a view into a UTF-8 string. (We will need to implement safe slicing etc., but that's another topic.)
  • Same with string arrays, str[N]. Their lengths represent byte lengths and not character lengths. It would be very confusing for str[N]::len to return something different than N. And it would again be inconsistent with str.
  • Consistency with other parts of the String interface, e.g. PartialEq/Eq, which are byte-based.
  • Staying as close to Rust as possible. This is IMO in general the preferable approach for Sway std unless there is a strong reason to diverge. Having an API with the same name as in Rust but different behavior just brings confusion to devs coming from Rust.
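The consistency points in the list can be illustrated with Rust's own String, which is the behavior being argued for. This is a sketch of the Rust semantics, not Sway code; lengths_agree is a hypothetical helper introduced only for this example.

```rust
/// True if the three byte-length views of a Rust `String` agree.
fn lengths_agree(s: &String) -> bool {
    s.len() == s.as_str().len() && s.len() == s.as_bytes().len()
}

fn main() {
    // Two ways to write "é": precomposed (U+00E9) and 'e' followed by
    // a combining acute accent (U+0301). A reader sees the same text.
    let precomposed = String::from("\u{e9}");
    let decomposed = String::from("e\u{301}");

    // `len` is the same whichever byte-based view it is taken from.
    assert!(lengths_agree(&precomposed));
    assert!(lengths_agree(&decomposed));

    // Equality is byte-based, so these are different strings...
    assert_ne!(precomposed, decomposed);

    // ...and a byte-based `len` is consistent with that.
    assert_eq!(precomposed.len(), 2);
    assert_eq!(decomposed.len(), 3);

    // Even code point counts differ here (1 vs 2), which shows why
    // "number of characters" needs a precise definition first.
    assert_eq!(precomposed.chars().count(), 1);
    assert_eq!(decomposed.chars().count(), 2);
}
```

Counting graphemes is not part of Rust's std at all, which matches the point about keeping Unicode-aware counts behind separate, opt-in APIs.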

@bitzoic @xunilrj Having your views here would be highly appreciated.

Member

bitzoic commented Jan 12, 2026

Using len() would diverge from Rust and lead to confusion as the value returned would differ from Rust or any other language that implements a string length getter. number_of_bytes() or size() would fit better.

Member Author

ironcev commented Jan 12, 2026

> Using len() would diverge from Rust and lead to confusion as the value returned would differ from Rust

@bitzoic I am a bit confused here. As discussed in the comment above, as well as in the PR description, len() returning the number of bytes and not characters (code points or graphemes) is exactly what Rust does: https://doc.rust-lang.org/std/string/struct.String.html#method.len

Returning code points or graphemes would diverge from Rust and IMO cause other issues listed in the comment above.

> or any other language that implements a string length getter

The semantics of string length vary among languages. As far as I remember, C# and JavaScript return the number of code units. Python returns the number of code points. Etc. I am not aware of any language returning the number of graphemes.
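The three units mentioned here can be compared side by side in Rust. This is only a sketch: the function names are hypothetical, and the UTF-16 count stands in for what a UTF-16 based language would report as the length.

```rust
/// UTF-8 bytes: what Rust's `len` reports.
fn utf8_bytes(s: &str) -> usize {
    s.len()
}

/// UTF-16 code units: the unit used by UTF-16 based string types.
fn utf16_code_units(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Unicode code points (Rust `char`s).
fn code_points(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // 'a' followed by U+1D11E (musical symbol G clef), which lies
    // outside the Basic Multilingual Plane.
    let s = "a\u{1D11E}";

    assert_eq!(utf8_bytes(s), 5); // 1 byte + 4 bytes
    assert_eq!(utf16_code_units(s), 3); // 1 unit + a surrogate pair
    assert_eq!(code_points(s), 2);
}
```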

My overall reasoning is that we primarily borrow from Rust, both in the language and std design.

I am fine with having a num_of_bytes() method instead of len(), but I still think introducing len() one day to return anything but the number of bytes will be confusing and inconsistent for the reasons listed in the above comment.

Contributor

xunilrj commented Jan 13, 2026

I am for len() returning the length in bytes:
1 - it is the same as in Rust, which makes sense for Sway;
2 - it is good enough for ASCII and a huge chunk of UTF-8;
3 - it is the correct thing to do when allocating a buffer and moving the string there.

Returning the number of chars/graphemes is O(n), and forcing users to write .chars().count() makes that cost explicit.
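The buffer allocation point can be sketched in Rust as follows. The function copy_into_buffer is a hypothetical helper written for this example, not something from the PR.

```rust
/// Copies a string into a freshly allocated byte buffer.
/// The byte length is exactly the capacity the buffer needs.
fn copy_into_buffer(s: &str) -> Vec<u8> {
    let mut buffer = Vec::with_capacity(s.len());
    buffer.extend_from_slice(s.as_bytes());
    buffer
}

fn main() {
    let s = "héllo";
    let buffer = copy_into_buffer(s);

    // All 6 bytes fit in the capacity requested up front.
    assert_eq!(buffer.len(), s.len());
    assert!(buffer.capacity() >= s.len());

    // A code point count would have under-allocated: 5 < 6.
    assert!(s.chars().count() < s.len());
}
```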

Member

bitzoic commented Jan 14, 2026

> @bitzoic I am a bit confused here. As discussed in the comment above, as well as in the PR description, len() returning the number of bytes and not characters (code points or graphemes) is exactly what Rust does: https://doc.rust-lang.org/std/string/struct.String.html#method.len

Interesting, I thought len() returned the string length in Rust. In that case, yes, len() is fine.

@ironcev ironcev deployed to fuel-sway-bot January 14, 2026 15:20 — with GitHub Actions Active
@ironcev ironcev marked this pull request as ready for review January 14, 2026 15:20
@ironcev ironcev requested review from a team as code owners January 14, 2026 15:20

cursor bot commented Jan 14, 2026

PR Summary

Introduces a byte-length accessor for String.

  • Adds String::len() delegating to underlying Bytes::len() in std/string.sw and documents behavior (bytes, not chars/graphemes)
  • Updates/extends string_inline_tests:
    • New string_len test and added len() assertions across existing tests (as_bytes, clear, from_ascii(_str), is_empty, new, with_capacity, from_bytes, clone) to ensure len() matches expectations and as_bytes().len()
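The delegation pattern described in this summary can be sketched as a Rust analogy. This is not the Sway code from std/string.sw: the Bytes and Text types below are hypothetical stand-ins, and the assertions only loosely mirror the kind of checks the summary lists.

```rust
/// Hypothetical stand-in for a growable byte buffer type.
struct Bytes {
    buf: Vec<u8>,
}

impl Bytes {
    fn new() -> Self {
        Bytes { buf: Vec::new() }
    }

    fn len(&self) -> usize {
        self.buf.len()
    }

    fn clear(&mut self) {
        self.buf.clear();
    }
}

/// Hypothetical string type that wraps `Bytes`.
struct Text {
    bytes: Bytes,
}

impl Text {
    fn new() -> Self {
        Text { bytes: Bytes::new() }
    }

    /// Copies the bytes of `s`; this sketch does not validate that they are ASCII.
    fn from_ascii_str(s: &str) -> Self {
        Text { bytes: Bytes { buf: s.as_bytes().to_vec() } }
    }

    /// Length in bytes, delegated to the underlying buffer.
    fn len(&self) -> usize {
        self.bytes.len()
    }

    fn is_empty(&self) -> bool {
        self.len() == 0
    }

    fn as_bytes(&self) -> &[u8] {
        &self.bytes.buf
    }

    fn clear(&mut self) {
        self.bytes.clear();
    }
}

fn main() {
    let empty = Text::new();
    assert_eq!(empty.len(), 0);
    assert!(empty.is_empty());

    let mut text = Text::from_ascii_str("sway");
    assert_eq!(text.len(), 4);
    // `len` must agree with the length of the byte view.
    assert_eq!(text.len(), text.as_bytes().len());

    text.clear();
    assert_eq!(text.len(), 0);
}
```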

Written by Cursor Bugbot for commit 6a55bb8. This will update automatically on new commits.

@ironcev ironcev enabled auto-merge (squash) January 14, 2026 15:20
@ironcev ironcev requested review from bitzoic and xunilrj January 14, 2026 15:21
Labels

lib: std Standard library

3 participants