Some clients might not be using the API to render HTML. In that case they must strip the HTML from the content and replace it with their own formatting. Even clients that do render HTML may prefer a different style; maybe they decide that an <h1> from the API should really be an <h2>.
HTML is also not structured enough to represent this information in a schema. See how Twitter sends structured text formatting:
https://dev.twitter.com/overview/api/entities-in-twitter-objects#dms
Marking this as planned, since there are plans to move core away from HTML overall.
For HTML: don't think of it as a string; think of it as one or more DOM nodes.
The node "heres my <subnode>" is italic
The subnode "text" is a link to "http:.."
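A minimal Python sketch of that node view, assuming a hypothetical `Node` type and `flatten` helper (neither is part of ANS). Walking the tree once yields the plain text plus one [location, length] range per formatted node, so nesting in the tree becomes nothing more than one range lying inside another. The URL is a placeholder for the elided link.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Node:
    """Hypothetical DOM-like node: optional formatting plus children that are
    either plain strings or nested Nodes."""
    style: Optional[str] = None
    link: Optional[str] = None
    children: List[Union[str, "Node"]] = field(default_factory=list)


def flatten(root: Node) -> Tuple[str, List[Dict]]:
    """Walk the tree once and return (plain_text, markup), where each markup
    item records a node's formatting as a [location, length] range."""
    parts: List[str] = []
    markup: List[Dict] = []
    offset = 0

    def walk(node: Node) -> None:
        nonlocal offset
        start = offset
        for child in node.children:
            if isinstance(child, str):
                parts.append(child)
                offset += len(child)
            else:
                walk(child)
        length = offset - start
        # A node's formatting covers all the text beneath it.
        if node.style:
            markup.append({"type": "style", "value": node.style, "range": [start, length]})
        if node.link:
            markup.append({"type": "link", "value": node.link, "range": [start, length]})

    walk(root)
    return "".join(parts), markup


# The italic node from the example, with its linked "text" subnode.
# "http://example.com" is a placeholder for the elided URL.
tree = Node(style="italic", children=["heres my ", Node(link="http://example.com", children=["text"])])
text, markup = flatten(tree)
```

Here `text` is "heres my text" and `markup` holds the link at [9, 4] and the italic style at [0, 13]; the string itself carries no formatting.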
For native apps we can map ranges and text formatting directly onto strings, e.g. "apply bold to characters in range 4-5". That mapping is a separate data structure, so it doesn't affect the length of the underlying string.
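A minimal sketch of such a side structure, written in Python only for illustration (a native app would hand the same data to its platform's attributed-string APIs). `FormattedString`, `apply`, and `styles_at` are hypothetical names, and reading "range 4-5" as location 4, length 2 is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class FormattedString:
    """Plain text plus formatting ranges kept in a separate list, so applying
    a style never changes the text or its length."""
    text: str
    ranges: List[Tuple[str, int, int]] = field(default_factory=list)  # (style, location, length)

    def apply(self, style: str, location: int, length: int) -> None:
        if location < 0 or length < 0 or location + length > len(self.text):
            raise ValueError("range falls outside the string")
        self.ranges.append((style, location, length))

    def styles_at(self, index: int) -> Set[str]:
        """Return every style covering the character at `index`."""
        return {style for style, loc, ln in self.ranges if loc <= index < loc + ln}


s = FormattedString("Hello world")
s.apply("bold", 4, 2)  # "apply bold to characters 4-5"
```

After the call, `s.text` is still the 11-character "Hello world"; only the `ranges` list changed, and `styles_at(4)` and `styles_at(5)` report bold.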
If you send us Markdown, we have to parse it four times on a mobile CPU: JSON data -> Markdown -> HTML -> sub-documents -> string format ranges -> final native richly formatted string.
If you send us the above format, we parse once: JSON data -> string format ranges -> final native richly formatted string.
The last step (string format ranges -> final native richly formatted string) is not a parsing step; it just pushes the data into a display API via text views.
Shouldn't the link's range be [10,14] in the example's code?
This solution is straightforward when there are no nested tags, but once tags are nested it gets interesting. In all the examples on Twitter's page, I did not see anything nested, which means it is a simple sort-and-substring solution.
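A sketch of that sort-and-substring approach for flat markup, assuming [location, length] ranges (Twitter's entities use start/end indices instead) and a hypothetical `render_flat` helper with an assumed tag table. Processing items from the end of the string backwards means each insertion only shifts text that has already been handled, so no indices need updating. It assumes ranges neither overlap nor nest, and it does not HTML-escape the text itself, since escaping would shift the offsets.

```python
import html
from typing import Dict, List

# Assumed style-to-tag table; each consumer would pick its own.
TAGS = {"bold": ("<b>", "</b>"), "italic": ("<i>", "</i>")}


def render_flat(text: str, items: List[Dict]) -> str:
    """Splice tags into the text, last range first, so the offsets of
    earlier (not yet processed) ranges stay valid."""
    out = text
    for item in sorted(items, key=lambda it: it["range"][0], reverse=True):
        loc, length = item["range"]
        if item["type"] == "link":
            open_tag, close_tag = f'<a href="{html.escape(item["value"])}">', "</a>"
        else:
            open_tag, close_tag = TAGS[item["value"]]
        out = out[:loc] + open_tag + out[loc:loc + length] + close_tag + out[loc + length:]
    return out
```

For example, `render_flat("heres my text", [{"type": "style", "value": "bold", "range": [0, 5]}, {"type": "link", "value": "http://example.com", "range": [9, 4]}])` returns `<b>heres</b> my <a href="http://example.com">text</a>`.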
Once you introduce nested tags, though, it takes more processing work. This solution could work, but the amount of looping needed to update the indices after each insertion during the replacements is pretty crazy.
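One way to avoid re-indexing with nested markup, offered here as an alternative technique rather than anything the thread settled on, is an event sweep: turn each range into an "open" event and a "close" event, sort all events once, and emit text and tags in a single left-to-right pass. The helper names and tag mapping are hypothetical. The sketch assumes ranges of different types nest properly; a style and a link that only partially overlap would produce mismatched tags and would have to be split first.

```python
import html
from typing import Dict, List

# Assumed style-to-tag mapping; each consumer would choose its own.
STYLE_TAGS = {"italic": "i", "bold": "b"}


def open_tag(item: Dict) -> str:
    if item["type"] == "link":
        return f'<a href="{html.escape(item["value"], quote=True)}">'
    return f'<{STYLE_TAGS[item["value"]]}>'


def close_tag(item: Dict) -> str:
    if item["type"] == "link":
        return "</a>"
    return f'</{STYLE_TAGS[item["value"]]}>'


def render_nested(text: str, items: List[Dict]) -> str:
    """Render properly nested [location, length] markup in one sweep.
    Events are sorted once; no offsets are ever rewritten."""
    events = []
    for i, item in enumerate(items):
        loc, length = item["range"]
        if length <= 0:
            continue  # empty ranges produce no tags
        end = loc + length
        # At equal positions: closes (0) sort before opens (1); outer ranges
        # open first (later end) and inner ranges close first (later start).
        events.append((loc, 1, -end, i, i))
        events.append((end, 0, -loc, -i, i))
    events.sort()

    out: List[str] = []
    cursor = 0
    for pos, kind, _, _, i in events:
        out.append(html.escape(text[cursor:pos]))  # text between events
        cursor = pos
        out.append(open_tag(items[i]) if kind == 1 else close_tag(items[i]))
    out.append(html.escape(text[cursor:]))
    return "".join(out)
```

With the italic range [0, 13] and the link range [9, 4] over "heres my text", this produces `<i>heres my <a href="http://example.com">text</a></i>`. The cost is one O(n log n) sort plus a linear pass, and because the text is sliced rather than spliced, it can be escaped safely.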
Is there a reason we don't do something like Markdown (as Stack Overflow does), or any of the other markup formats that message boards and forums have used for decades? It seems like we are going from a simple regular-expression solution with Markdown to a heavy looping solution. Maybe my lack of coffee is keeping me from seeing the "simpler" answer.
FYI: I have no clue how you need to do styling on native apps.
Not encoding it as HTML also makes ANS purer. Once the style is encoded in a format that is decoupled from the view, it never needs to change, no matter how a consuming API or view chooses to render it.
So the Facebook Instant Articles API, the Apple News API, or the Story Builder API each becomes a simple mapping of this data to the proper format, with no need to parse, strip, and map.
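As an illustration of such a mapping, here is a sketch that converts range markup into an Apple News Format body component. The field names (`inlineTextStyles`, `rangeStart`, `rangeLength`, `textStyle`, and `link` entries in `additions` with a `URL`) are given as I understand Apple's format and should be checked against its reference. The `ans-<style>` identifiers are hypothetical and would have to be defined in the article's text styles.

```python
from typing import Dict, List


def ans_to_anf_text(text: str, markup: List[Dict]) -> Dict:
    """Map [location, length] markup onto an Apple News Format body
    component. No parsing or stripping: each range is copied across."""
    inline_styles = []
    additions = []
    for item in markup:
        loc, length = item["range"]
        if item["type"] == "style":
            inline_styles.append({
                "rangeStart": loc,
                "rangeLength": length,
                # Hypothetical style identifier; must exist in textStyles.
                "textStyle": f"ans-{item['value']}",
            })
        elif item["type"] == "link":
            additions.append({
                "type": "link",
                "URL": item["value"],
                "rangeStart": loc,
                "rangeLength": length,
            })
    component = {"role": "body", "text": text}
    if inline_styles:
        component["inlineTextStyles"] = inline_styles
    if additions:
        component["additions"] = additions
    return component
```

Because both sides describe formatting as a start plus a length, the conversion is a field rename per item; the text is passed through untouched.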
Only two types are needed for now: style and link. A link's value is always a URL; a style's value is an enum (italic, bold, bold-italic, and possibly other weights). The range is a two-item array of location and length.
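A sketch of checking one markup item against that shape. The dictionary keys (`type`, `value`, `range`) are hypothetical, since the field names aren't fixed here; rejecting empty ranges and accepting only absolute http(s) URLs as links are also assumptions.

```python
from enum import Enum
from typing import Dict
from urllib.parse import urlparse


class Style(Enum):
    """Style values named in the proposal; other weights could be added later."""
    ITALIC = "italic"
    BOLD = "bold"
    BOLD_ITALIC = "bold-italic"


def validate_markup(text: str, item: Dict) -> None:
    """Raise ValueError if a markup item does not fit the proposed shape:
    a type of "style" or "link", a value, and a [location, length] range."""
    rng = item.get("range")
    if not (isinstance(rng, list) and len(rng) == 2
            and all(isinstance(n, int) for n in rng)):
        raise ValueError("range must be a two-item [location, length] list")
    loc, length = rng
    if loc < 0 or length <= 0 or loc + length > len(text):
        raise ValueError("range must be non-empty and lie inside the text")

    value = item.get("value")
    if not isinstance(value, str):
        raise ValueError("value must be a string")
    kind = item.get("type")
    if kind == "style":
        Style(value)  # Enum lookup raises ValueError for unknown styles
    elif kind == "link":
        parsed = urlparse(value)
        # Assumption: only absolute http(s) URLs count as valid links.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("link value must be an absolute http(s) URL")
    else:
        raise ValueError(f"unknown markup type: {kind!r}")
```

Keeping style as a closed enum means an unknown value such as "underline" is rejected up front instead of being silently ignored by one consumer and rendered by another.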
Other elements, such as h1 and pullquote, are separate types, since their range always spans the entire item.
Two markup items of the same type (style or link) cannot overlap; if they do, the behavior is undefined (the last item would likely win).
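Since overlap is undefined rather than resolved, a producer could reject it before publishing. A sketch of that check, using a hypothetical `find_same_type_overlaps` helper: group items by type, sort each group by start, and compare each range only with the ranges that start before it ends. Ranges are treated as half-open, so touching ranges such as [0, 5] and [5, 3] are allowed, and non-empty ranges are assumed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_same_type_overlaps(items: List[Dict]) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs of same-type items whose [location, length]
    ranges overlap. Items of different types may overlap freely."""
    by_type: Dict[str, List[Tuple[int, int, int]]] = defaultdict(list)
    for idx, item in enumerate(items):
        loc, length = item["range"]
        by_type[item["type"]].append((loc, loc + length, idx))

    conflicts: List[Tuple[int, int]] = []
    for spans in by_type.values():
        spans.sort()  # by start position
        for a in range(len(spans)):
            for b in range(a + 1, len(spans)):
                # Sorted by start: once b starts at or after a's end,
                # no later span can overlap a either.
                if spans[b][0] >= spans[a][1]:
                    break
                conflicts.append((spans[a][2], spans[b][2]))
    return conflicts
```

For a bold range [0, 5], an italic range [3, 4], and a link [2, 4], only the two style items conflict, so the result is `[(0, 1)]`; the link's overlap with them is permitted because it is a different type.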
Example: