Some websites allow users to post content with links to other sites and have previews for these links displayed alongside the posts.
Those are typically social media sites, and we’ll refer to them as “social media” in this diary entry.
Examples of such sites include:
- forums based on Discourse, particularly OpenStreetMap Community Forum
- microblogging platforms based on Mastodon, particularly en.osm.town
There are many more of course, with some more widely known examples, such as Facebook and Twitter.
We’re primarily interested in the ones listed above because they are directly related to OpenStreetMap.
A preview usually includes a page title followed by a longer description and an image.
Typically when a submitter makes a post with a link, the social media site runs a bot to download the linked web page and scan it for metadata.
The metadata usually includes textual attributes and links to preview images.
The bot then downloads the images and assembles the final preview to be shown to other users.
Here we’re interested in previews of pages on the OpenStreetMap website.
Usually the links someone posts are to pages of map locations, editable osm elements (nodes, ways and relations), changesets and diary entries.
All of those pages include metadata in a format suitable for generating previews: the Open Graph protocol.
The Open Graph metadata consists of several <meta> html elements inside the web page <head>.
The degree to which metadata on OpenStreetMap website pages allows generation of useful previews, however, varies.
Metadata on diary pages is usually sufficient to build reasonable previews.
As an example, we can look at the html source of this diary entry:
<meta property="og:site_name" content="OpenStreetMap">
<meta property="og:title" content="The Côte de Blubberhouses and the Pacific Ocean">
<meta property="og:type" content="website">
<meta property="og:url" content="https://www.openstreetmap.org/user/SomeoneElse/diary/407065">
<meta property="og:description" content="Kex Gill (humorously named the Côte de Blubberhouses for a stage of the 2014 Tour de France) is a road in Yorkshire between Harrogate and Skipton. Part of it is gradually sliding down the valley that it is built half-way up the side of and is being rebuilt; it was the access tags on bridleways there that caught my eye in the first place.">
<meta property="og:image" content="https://map.atownsend.org.uk/tmp/Screenshot_20250712_114206.png">
<meta property="og:image:alt" content="Kex Gill, west of Harrogate">
<meta property="article:published_time" content="2025-07-12T11:18:21Z">
We can see a post title (<meta property="og:title" ...>), a description taken from the beginning of the diary entry (<meta property="og:description" ...>) and a link to its first image (<meta property="og:image" ...>).
If this diary entry didn’t have any images, the image link would point to the OpenStreetMap logo instead.

A preview built from that metadata by Discourse on the OpenStreetMap Community Forum.
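The metadata-scanning step described earlier can be sketched in a few lines of Python. This is a toy parser, far simpler than what real preview bots run (and the prototype described later is in PHP; Python is used here purely for illustration), fed with a shortened version of the metadata above:

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect Open Graph properties from <meta> elements in a page."""
    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop, content = attrs.get("property"), attrs.get("content")
        # keep the first occurrence of each og:* property
        if prop and prop.startswith("og:") and prop not in self.properties:
            self.properties[prop] = content

html = '''
<head>
<meta property="og:title" content="The Côte de Blubberhouses and the Pacific Ocean">
<meta property="og:image" content="https://map.atownsend.org.uk/tmp/Screenshot_20250712_114206.png">
</head>
'''
parser = OpenGraphParser()
parser.feed(html)
print(parser.properties["og:title"])
print(parser.properties["og:image"])
```

After collecting the textual attributes and the image links like this, the bot downloads the images and assembles the preview.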
The logo is something we often find in previews of other kinds of osm pages, even when we expect something more representative of their contents.
When linking to a map location or an osm data element, we’d like to see a map fragment instead.
Seeing the logo is not very useful because it doesn’t tell you anything about the location or the element.
In fact many users find this annoying enough to argue for disabling such previews.
For example, this GitHub issue describes how this logo is redundant and even obnoxious.
Titles and descriptions could also be often improved, but here we’ll focus on images.

Not a very useful preview of an osm way, as displayed in this forum post. The logo is displayed twice, once as an og:image, once as a favicon.

A preview of the same place on Wikimapia, as displayed in the next post. At least they have a photo. We could also try adding an image from an image=* tag, but most osm elements don’t have it.
Now we know that some parts of the metadata for previews are present but mostly useless, and we might wonder why the OpenStreetMap website doesn’t omit useless metadata instead.
The reason for that is to prevent social media sites from filling the missing parts on their own, using their custom logic.
Sometimes this process results in completely wrong information included in previews.
So it makes sense to spam the osm logo where no better image is readily available.
But we want a better image to be available.
Some osm pages have technical reasons for not providing a better image.
Links to map locations often have the coordinates stored in their fragment or hash, the # character and everything after it. For example, https://www.openstreetmap.org/#map=12/50.0769/14.4036 has #map=12/50.0769/14.4036 as a fragment.
Fragments are not sent to the server when web pages are requested; they are handled by the client, that is, by the web browser.
In this case the javascript code executed in the browser points the slippy map interface to the location stored in the fragment.
Since the server doesn’t get to see the fragment, it can’t respond with a map image of that location.
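We can see for ourselves that the fragment never makes it into the request by splitting such a url (a small Python illustration; on the real site this parsing is done by javascript running on the page):

```python
from urllib.parse import urlsplit

url = "https://www.openstreetmap.org/#map=12/50.0769/14.4036"
parts = urlsplit(url)

# The request line the browser sends contains only the path and query;
# for this url that's just "/", so the server never sees the coordinates.
request_target = parts.path or "/"
print(request_target)

# The fragment stays in the browser, where javascript parses it
# and points the slippy map at the stored location.
zoom, lat, lon = parts.fragment.removeprefix("map=").split("/")
print(zoom, lat, lon)
```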
For other kinds of pages there are legal obstacles to providing an image.
Changesets are considered to be metadata on osm geographic data.
A preview image would be metadata on metadata on geodata.
However, there are steps taken to minimize the amount of metadata on geodata disseminated to an unrestricted number of users, which would also impact preview images as secondary derivative metadata.
In other words, we don’t want the wide public to know about changesets, therefore we won’t show them any pictures of changesets.
But for elements, preview images would be based on the geographic data which we distribute to everyone.
There should be no serious obstacles preventing the generation of preview images, right?
Nonetheless the images are not generated in this case either, and the logo is still served instead.
I believe it’s possible to change that.
Initial plan
Here’s what we want:
When we open an osm element page such as www.openstreetmap.org/node/2564671338
or www.openstreetmap.org/way/8063926
or www.openstreetmap.org/relation/1067556,
we want it to include this piece of html with a link to a preview image:
<meta property="og:image" content="(link to a preview image)">
instead of what we get now:
<meta property="og:image" content="https://www.openstreetmap.org/assets/osm_logo...">
And to do this we need some service that generates preview images for osm elements.
Thus our plan for getting more useful previews of element pages is this:
- Make a service that generates osm element preview images. This service can be anywhere on the internet, it just has to be reachable from our Discourse or Mastodon or any other site that generates previews. It could be a part of www.openstreetmap.org or it could be an external service, possibly not even operated by the OSMF. This is the major part of the plan.
- Modify the openstreetmap-website, the application behind www.openstreetmap.org, to include links to the preview image service instead of default logo links. Besides making this change, this also requires convincing the maintainers of openstreetmap-website to accept the change, which should be easy if a reasonably performing service is operating on an OSMF-controlled infrastructure.
Preview images
What kind of a preview image do we want?
First of all, it should be a raster image because not all social media sites support vector images like svgs.
So it should be something like a raster map tile.
At this point we consider using the same raster tiles that are shown on a slippy map as element preview images. We just need to pick a correct tile and link to it from openstreetmap-website.
The service providing tiles already exists: it’s the rendering and serving pipeline behind the standard OpenStreetMap Carto style.
There are some obvious problems with using the standard tiles.
First of all, we want an image with preview of an osm element.
However we’ll be able to clearly see the element only if it is the only thing that’s rendered on a tile we pick.
We may not know what else, if anything, is rendered right next to our element.
If the element is a building in a city, it’s probably surrounded by a lot of things, and we won’t know by looking at the image which thing it is supposed to represent.
Additionally there’s an opposite problem: we don’t know if our element is rendered at all.
Even if the element is something that’s supposed to be rendered at a known zoom level, the rendering may be omitted.
For example this happens for shops in a place with many shops.
Some shops just won’t fit on a rendered tile.
So we want to draw an element outline or a highlighted icon on top of a tile.
This already prevents us from linking to the ready-made tile directly.
Doing this is still sort of an option, but only if all else fails.
It would still be better than showing a logo probably, so we’re not completely ruling this option out.
While we’re looking at tiles, we may notice another problem that prevents us from picking a tile, drawing an element on top of it and serving the result.
The location of our element may be too close to the edge of the tile on the desired zoom level, or the element may not fit on one tile at all.
We can try picking a lower zoom level but we may get unlucky and the element will still be too close to the edge.
We may zoom out another time but then the element may become too small.
The solution to the problem above is to assemble our own tile out of several existing tiles by tile stitching.
We’ll have to get several tiles on a desired zoom level, stitch them together and crop the new tile with our element in the center.
At this point we may decide that our new tile doesn’t have to be of the same shape as the original tiles (256 x 256 pixels), so we may want to do stitching and cropping anyway, but we won’t go that far yet.
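The stitch-and-crop arithmetic sketched above boils down to projecting the element’s location into the standard Web Mercator pixel grid and listing the tiles that a centered crop overlaps. A Python sketch (the image dimensions are made up for this example; negative-coordinate and antimeridian edge cases are ignored):

```python
import math

TILE = 256  # standard tile size in pixels

def to_global_pixels(lat, lon, zoom):
    """Project lat/lon to pixel coordinates on the whole world map
    at a given zoom level (standard Web Mercator tile scheme)."""
    n = TILE * 2 ** zoom
    x = (lon + 180) / 360 * n
    y = (1 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2 * n
    return x, y

def tiles_for_centered_crop(lat, lon, zoom, width, height):
    """List the tiles needed for a width x height image centered on lat/lon,
    plus the crop offset of that image inside the stitched tile grid."""
    cx, cy = to_global_pixels(lat, lon, zoom)
    left, top = cx - width / 2, cy - height / 2
    x0, y0 = int(left // TILE), int(top // TILE)
    x1 = int((left + width - 1) // TILE)
    y1 = int((top + height - 1) // TILE)
    tiles = [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    return tiles, (int(left - x0 * TILE), int(top - y0 * TILE))

# the example location from earlier, at zoom 12, for a 512 x 288 image
tiles, offset = tiles_for_centered_crop(50.0769, 14.4036, 12, 512, 288)
print(len(tiles), offset)
```

The service then fetches each listed tile, pastes them into a grid and crops the final image at the computed offset.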
Now we know what kind of image we want to generate.
But before we’re able to get the tiles and draw an element shape, we have to know where the element is located, what its shape is, and possibly other things such as tags that may affect what we’re about to draw.
What our service is going to know is that it is accessed by a bot from Discourse or elsewhere using a link from <meta property="og:image" ...>.
We can decide what kind of link it is when we insert it on the openstreetmap-website side.
We have different options here, and we’ll look at them later, but whichever we pick, we’ll have to get the necessary information about the osm element out of this link.
Let’s look at our updated plan:
- openstreetmap-website should:
- Provide links to our preview image service from element pages
- Our preview image service should:
- Get information about the element from the url used for accessing the service.
- Download the necessary tiles, likely from the standard tile servers.
- Combine the tiles into the new base map fragment for our image.
- Draw an element shape on top of the image.
- Serve the resulting image to the bot that requested it.
- We should:
- Implement the preview image service and host it somewhere.
- Implement changes to openstreetmap-website, convince the maintainers to accept these changes and deploy the new version of the site.
We know that our service is going to respond to HTTP GET requests following some link in <meta property="og:image" content="(link)">. Since the URLs of osm element pages look like https://www.openstreetmap.org/(osm element type)/(osm element id) (for example, https://www.openstreetmap.org/node/2564671338), the most obvious method to construct a link to our image is to use a similar form: (service location)/(osm element type)/(osm element id)/image.png. Our service, wherever it’s running, will listen to requests with such links, and will determine element types and ids by extracting them from requests.
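Extracting types and ids from such request paths amounts to a simple route match; a minimal sketch of that (in Python for illustration only; this is not how any particular framework spells it):

```python
import re

# matches /(osm element type)/(osm element id)/image.png
ROUTE = re.compile(r"^/(node|way|relation)/(\d+)/image\.png$")

def parse_request_path(path):
    """Return (element type, element id) for a preview image request,
    or None for anything else (which could get the default logo)."""
    m = ROUTE.match(path)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

print(parse_request_path("/node/2564671338/image.png"))
print(parse_request_path("/favicon.ico"))
```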
The service still wouldn’t know what tiles to fetch and what kind of element shape to draw over them just from type+id.
The next step is to get the element geometry with coordinates of nodes.
Getting the tags of the element is also helpful, and in some cases it is necessary to determine if the element is an area object or not.
Tags also allow us to figure out the starting zoom level of tiles when the element is still visible.
Some elements are not rendered at higher zoom levels or, in some cases, it makes no sense to look at them at the highest possible zoom level.
For example, a place element represented by a node, like a village or a town, would “fit” on an image at the highest possible zoom level, because it’s a node with no area.
However it makes no sense to look at it at that zoom level because it won’t be rendered.
Even if it was rendered, the image wouldn’t be a good representation for a preview because we probably want a surrounding area outside of our place to also be included.
This is achieved by picking a lower zoom level.
We can’t know for sure which level is optimal, but we can make a better guess if we know the tags.
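Such a guess could look like a small lookup table keyed on tags. The zoom numbers below are invented for illustration, not what any real service or the prototype uses:

```python
# Hypothetical starting zooms for place nodes; the values are made up.
PLACE_ZOOM = {"city": 10, "town": 12, "village": 14, "hamlet": 15}

def guess_zoom_for_node(tags, default=17):
    """Guess a sensible starting zoom level for a node element from its tags.
    A place node gets zoomed out so its surroundings are visible;
    anything else gets a close-up default."""
    place = tags.get("place")
    if place in PLACE_ZOOM:
        return PLACE_ZOOM[place]
    return default

print(guess_zoom_for_node({"place": "village", "name": "Example"}))
print(guess_zoom_for_node({"shop": "bakery"}))
```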
The obvious thing to do for our service is to use the OSM API to get the information about the element. Here you can see the biggest implementation problem starting to emerge: fetches over the network. You might also question the existence of the service as something separate from openstreetmap-website. If openstreetmap-website was serving the preview images by itself, it wouldn’t need to do API requests because it already knows about the elements, right? Well, openstreetmap-website would still need to get the information from the database. Our service can do the same thing, if it runs somewhere where it can access the osm database and if it’s granted read rights to a few tables. Reading from the database is something that can be used instead of the API.
Another option for getting the element information is encoding it in the preview image link. Instead of (service location)/(osm element type)/(osm element id)/image.png, the link might look like (service location)/(sequence of characters encoding the information)/image.png. That sequence may get fairly long for elements with complicated geometries, and it might not be the most efficient thing to do. We’d have to make openstreetmap-website construct the sequence for every view of an osm element page, even though most of these views are not performed by preview bots, in which case this information is useless and extracting it from the database is a waste of time. Although we’ll revisit the “every view” part later, because maybe we can save on that.
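To make the trade-off concrete, here is one possible (entirely hypothetical) encoding scheme that packs just a bounding box into a url-safe token. Even this minimal variant costs 22 characters; a full geometry of a large way or relation would be far longer, which is exactly the concern:

```python
import base64
import struct

def encode_bbox(min_lon, min_lat, max_lon, max_lat):
    """Pack a bounding box into a short url-safe token (illustrative scheme)."""
    raw = struct.pack("<4f", min_lon, min_lat, max_lon, max_lat)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def decode_bbox(token):
    """Recover the four float32 coordinates from the token."""
    raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    return struct.unpack("<4f", raw)

token = encode_bbox(14.39, 50.07, 14.42, 50.09)
print(f"(service location)/{token}/image.png")
print([round(v, 2) for v in decode_bbox(token)])
```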
Let’s suppose that our service gets the information about the element by one of the methods described above. Now we have to figure out which tiles we need to download and actually download them. We have to make more requests over the network to another service. Now we can look again at the comment about tile stitching and read the objections that immediately followed: the performance and scalability issues, because who knows how many thousands or millions of views something might get once it is shared on social media networks.
What exactly are the performance and scalability issues here? One obvious issue is latency. First we have to fetch the element information, then multiple tiles, and all of that takes time. But the good news is that we can afford a larger latency here. Most of the users who are going to see the preview image are going to get it from the social media website where someone has posted a link: Discourse, Mastodon and the like. The image will be cached there. Under normal circumstances it won’t get downloaded directly from our service by anyone other than the bots of these sites. That means that millions of views are probably not going to happen on the side of our service.
The objection about millions of views contained a part about providing a caching infrastructure to handle those millions of views. Now we know that caching is already happening on social media sites, and however many views happen, most of them are going to hit the caches there. Then maybe we don’t need any caching infrastructure on our side. This could be true with the same caveat: under normal circumstances.
What do I mean by normal circumstances? I mean that users are sharing the OpenStreetMap website pages by posting links to the pages, and not by trying to hotlink to preview images directly. But why would they hotlink? The direct link to the preview image service is not going to be displayed to users, they’d have to look for it in the web page html source code. Isn’t it easier to post a link to the page and let your social media server of choice do the job of getting the image?
A possible reply could be that this is only true if a user is posting links on some social media site that is able to generate previews. What if they are not doing that? What if they want osm element images on some other website that doesn’t run a bot to fetch and cache the images? Maybe people will use it to embed thousands of location maps on corporate web sites and the like.
Still, it’s not obvious why anyone would hotlink when there’s a more straightforward method of embedding a map using the Share Panel.
With hotlinking they can’t control the image size, it’s going to be whatever size we pick.
The image is also going to be static, with no possibility to zoom in or out.
But I guess some people won’t be deterred and will try to hotlink. Is it going to be a significant number of people that can cause performance issues? We don’t know right now. It is claimed that various such things are, or have been, running on the dev server, and that even as non-heavily-promoted dev things they become popular and problematic. What I can say is that the image preview service hasn’t become popular despite running for months (yes, there is a prototype running). On the other hand, I’m not promoting it at all.
Another kind of abnormal circumstances are aggressive scraper bots. We have hopefully benign bots from social media sites that hopefully won’t try requesting the same images again and again. But scraper bots, who knows what they are going to do. On the other hand, it’s stupid for them to request the same image over and over too. To know whether this or another kind of heavy usage will emerge and become a challenge, we have to run an experiment.
There are other objections in the GitHub issue I’m referring to that are related to linking to a general map location. As noted in the first section, we’re not looking at this kind of link, at least not as our first goal. For now we’re interested only in links to osm element pages, despite the first post in the issue focusing on map locations and changesets.
Previously in our plans you might have noticed how convincing the maintainers is mentioned. You might have thought it’s not a big deal; after all, I am one of the maintainers. Yes, I am, but I still have to convince the other maintainers for any changes to openstreetmap-website to happen. And I’m not in the OWG, which decides what runs on the servers. But now, after reading this section, which consists mostly of responses to the objections of a maintainer, you can see how convincing the maintainers is a significant part of a solution and one of the reasons for me to write this post.
Prototype
Did I mention that there is a working prototype deployed? I’m not yet going to tell where it is because I don’t want to disrupt the “background level” data collection. Bots visit it a few times a week. The source code is available at GitHub. The name implies that it’s going to produce metadata for osm elements or maybe other osm things. Some of those are backup plans and/or future plans. Currently we’re mainly interested in it producing preview images. If you look at the repository, you’ll notice a couple of things:
- It is written in PHP.
- I haven’t updated it in months.
Why PHP? Aren’t we writing in Ruby here? As we can see, another tile-stitching service mentioned in the GitHub issue of interest is also written in PHP. That’s not surprising. Code written in PHP is very easy to install: you just drop it into a subdirectory of public_html on your server and it works. It doesn’t require any external dependencies. For example, the libraries to manipulate images are built in. Technically they are in optional modules, but I’ve never seen a server without them. If necessary, we can rewrite everything in Ruby or something else, after we prove that the approach described here works.
What is already implemented? Most of what was described here. When we open a link (service location)/osm-meta-emitter/(type)/(id)/image.png, we’ll get an image like this:

A preview image for an osm way.
This image is supposed to be linked from the content attribute of <meta property="og:image" content="..."> on the element page. Currently it’s not linked, of course. However, we’re not supposed to use this link directly or even be aware of it. We’re supposed to be able to post a link to the element page on our social media and get their bot to discover the image link. Right now this is not going to happen because there’s no link to the image we want on the element page. That’s why the service is also able to generate fake element pages for testing, with urls in the form of (service location)/osm-meta-emitter/(type)/(id). Those pages include metadata appropriate for social media bots. Posting links to those pages should make social media sites generate previews as if they were for osm elements.
What is already done again?
- getting the element data with two different implementations:
- via the OSM API
- by reading from the postgres database
- downloading of OSM Carto tiles
- stitching+cropping the tiles into a preview image of a specified size
- drawing outlines of elements or circle markers for nodes (this part may require more tuning, especially for relations)
- serving the resulting images with client-side caching (not server-side for now, more on this later)
What about caching? As I’ve said, client-side caching is implemented, but what is it? When anyone tries to access the preview image link, they get appropriate caching headers along with it. Basically they also get a magic string known as an ETag, and later, when they want the same image again, they can ask: “I got this image with this ETag. Did the ETag change? If it didn’t, don’t send me the image.” If there’s a cache hit, they won’t need to redownload the image. ETags are computed based on everything that can affect preview images. First, that’s the element data. A new version of an element will produce a different ETag. That means our service still has to redownload the element data for client-side cached requests.
Unfortunately that’s not all. A preview image also depends on everything else around the element that gets rendered on the map tiles. So now our service has to act as a client of tile servers and ask if their ETags have changed since the tiles were downloaded. Here we unfortunately still get network latency, because we need to wait for tile servers to respond even if it’s not necessary to download anything. That means client-side caching is not as useful without server-side caching. And it also turns out that social media bots don’t make use of ETags. This kind of caching will maybe only help with hotlinking, because those clients are going to be web browsers, and web browsers do use ETags.
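The conditional-request mechanics can be sketched like this (a Python illustration of the general HTTP ETag / If-None-Match flow; the hash recipe is a made-up example, not what the prototype computes):

```python
import hashlib

def compute_etag(element_version, tile_etags):
    """Derive an ETag from everything that can change the image:
    the element version plus the ETags of the underlying tiles."""
    h = hashlib.sha256()
    h.update(str(element_version).encode())
    for etag in sorted(tile_etags):
        h.update(etag.encode())
    return '"' + h.hexdigest()[:16] + '"'

def respond(request_headers, element_version, tile_etags):
    """Return (status, etag, body); 304 with no body on a cache hit."""
    etag = compute_etag(element_version, tile_etags)
    if request_headers.get("If-None-Match") == etag:
        return 304, etag, None            # cache hit: client keeps its copy
    return 200, etag, b"...png bytes..."  # cache miss: render and send

etag1 = compute_etag(3, ['"t1"', '"t2"'])
status, _, body = respond({"If-None-Match": etag1}, 3, ['"t1"', '"t2"'])
print(status, body)
```

Note that even answering 304 here required knowing the element version and the tile ETags, which is exactly the round-trip cost described above.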
Server-side caching, if implemented, would involve storing either Carto tiles on our server, to avoid downloading them again, or complete preview images, to avoid downloading and rendering anything again. This would come at the cost of resulting images not being entirely up-to-date, which doesn’t matter much because social media sites will cache the images too and also serve not entirely up-to-date images to their end users. The more important cost is disk space, or RAM if the cache is kept there.
Now the question is whether it’s worth spending that extra space to achieve fewer requests to tile servers and lower latencies. We already know that latency by itself is not that important, because end users are not going to experience it. It’s still a problem to some extent during preview image generation, because a thread of our service is going to hang around for longer. As for extra requests for tiles, the main cost would be rendering the tiles again, but that’s likely not going to happen because tile servers have their own cache.
So we are running our service between cached tiles on tile servers and cached previews on social media sites. Now we can see that inserting another layer of caching would only make sense if our service serves multiple social media sites. That would be a nice thing to have, but our priority is mainly to serve osm contributors. Our initial implementation doesn’t have to serve previews to Facebook etc. In this case we can try running it without implementing server-side caching. Of course this is going to be an experiment that may produce an unexpected result, and it may turn out that we need caching. But we need to run the experiment first to find out.
Restricting the service
I propose limiting the preview service to Discourse at community.openstreetmap.org first. We’re running that server, and it serves the osm community, as its name suggests. Then, if this succeeds, we can think about opening previews to other clients. So the answer to “do we try and restrict it so it can only be used for this [internal service] (but how - it’s hard to see how to do that) or do we just accept that it will be more generally used” is: we’ll start by restricting it. Now let’s get to how.
Can we do this based on user agent? Probably not because anyone can fake it and pretend to be our Discourse bot, but let’s look at the actual requests it makes. I’ve redacted the addresses of the service to avoid polluting its logs.
A. This is an example of a request to scan the link you post for its metadata:
GET [redacted]/relation/19376753 HTTP/1.1
accept-encoding: gzip
host: [redacted]
accept-language: *
accept: text/html,*/*
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15
B. Right after doing A, the bot is going to find out about a preview image and follow up with this request:
GET [redacted]/relation/19376753/image.png HTTP/1.1
host: [redacted]
connection: close
user-agent: Ruby
accept: */*
accept-encoding: identity
Those were for a submitted post. As you can see, the user agents are totally bogus.
C. For previews displayed to you as you write a post, another kind of request is made:
GET [redacted]/way/1202392270 HTTP/1.1
host: [redacted]
accept: */*
accept-encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3
accept-language: en;q=0.9, *;q=0.5
user-agent: Discourse Forum Onebox v3.4.6
It’s not followed by an image request; your browser is directed to fetch the image instead. The user agent is more reasonable here. But in general we can’t trust the user agent, even for our own clients.
We don’t need to check the user agent for our Discourse server, however. We know its ip address. All of those requests came from 87.252.214.112, fume.openstreetmap.org. We can limit the preview image generation to this ip address, or to a range of OSMF-owned ip addresses. Everyone else will get a logo, like they get now.
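The check itself is trivial. A minimal sketch, using the address seen in the logs above (the code is illustrative Python, not the PHP the prototype is written in):

```python
from ipaddress import ip_address, ip_network

# fume.openstreetmap.org, as seen in the request logs above;
# this could just as well be a wider OSMF-owned range
ALLOWED = [ip_network("87.252.214.112/32")]

def may_generate_preview(remote_addr):
    """Generate a real preview only for allowed addresses;
    everyone else gets the default logo."""
    addr = ip_address(remote_addr)
    return any(addr in net for net in ALLOWED)

print(may_generate_preview("87.252.214.112"))
print(may_generate_preview("203.0.113.9"))
```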
Final plan
Now I can say what exactly I plan to do, if I get an opportunity.
The plan involves small updates to openstreetmap-website, the software behind the OpenStreetMap website, and osm-meta-emitter, my preview image and test metadata page generator.
- Add an option for osm-meta-emitter to limit image generation to requests coming from a specified ip address (or a range of addresses). If a request comes from some other ip address, serve the default logo image without connecting to the OSM API, the OSM database or tile servers.
- Put the new version of osm-meta-emitter somewhere on the dev server. Have it configured to
- generate previews only for requests coming from fume.openstreetmap.org, our Discourse server
- get element data from master.apis.dev.osm.org, the sandbox server for editing
- Add an option for openstreetmap-website to output
<meta property="og:image" content="...">
tags with a link to a specified osm-meta-emitter instance (or a specified url pattern, any string which can have element types and ids substituted) when an element page is opened from a specified ip address (or a range of addresses). This is going to be a pull request.
- Convince the maintainers to merge the pull request.
- Ask the admins to configure openstreetmap-website at master.apis.dev.osm.org to output image preview meta tags for requests from fume.openstreetmap.org (or make a pull request to the chef repo doing this).
After that it will be possible to see preview images for osm elements on the sandbox server when links are posted to the forum. The one caveat is that the images won’t work in previews displayed while posts are being written, which cause metadata requests of type C from the previous section. The image is going to be requested by the web browser, and the request won’t come from fume.openstreetmap.org. Users will see osm logos instead of true previews while they write their posts. But after the post is complete, the image will be fetched by a request of type B, which will come from fume.openstreetmap.org, and everyone will see the actual preview image.
- Create a forum topic about this feature and invite people to post links to elements and provide feedback.
- Watch for a while (a week or two) to see if everything is working smoothly. Fix whatever needs fixing (see below for what can possibly go wrong).
If everything goes well, we can proceed to enabling previews for the main osm server.
- Put another instance of the new version of osm-meta-emitter somewhere on the dev server. Configure it similarly except for:
- get element data from api.openstreetmap.org, the actual OSM API
- Ask the admins to configure openstreetmap-website at www.openstreetmap.org similarly.
After this is done, we have a minimal successful implementation of preview images. We can do more of course, but let’s first check the potential problems.
Risks
What can possibly go wrong with our plan?
- Other maintainers might refuse to merge the changes to openstreetmap-website. Possible reasons are:
- they might insist on implementing server-side caching first;
- they might object to adding an option for a preview image generator as an external service and insist on it being internal to the openstreetmap-website codebase.
- The admins might refuse to use the dev server instance and insist on osm-meta-emitter hosted elsewhere.
- osm-meta-emitter will make too many requests to the OSM API.
- osm-meta-emitter will make too many requests to tile servers.
- The dev server won’t be able to keep up with requests for any other reason, with possible consequences:
- the requests will get dropped;
- the requests will take too long and Discourse bot will give up on waiting for the result;
- the previews will work but will cause too high load on the dev server, risking other things on the dev server.
- There are unexpected bugs in osm-meta-emitter.
Let’s look at the possible problems and their fixes in more detail.
osm-meta-emitter may get rate-limited on API requests. Then none of the requests will succeed for some time; osm-meta-emitter won’t know anything about the requested osm elements and won’t be able to render any images. This is unlikely to happen during step 7 unless we specifically do stress-testing, so maybe we should add that to our plan. If we manage to hit the API rate limit under the expected load, we can configure osm-meta-emitter to read from the database directly.
Reading from the database is already implemented in osm-meta-emitter. It will require granting read access to a few tables. The question then is whether it will be possible to read from the production database if the production instance of osm-meta-emitter is still on the dev server, as step 8 says. If not, the production instance will have to be installed elsewhere. And all of this would preclude running osm-meta-emitter externally to the OSMF infrastructure, unless a mirror of osm data is also maintained.
Again, I don’t expect excessive tile requests to happen after step 7. The question then is whether these excessive requests go for the same tiles or for different tiles. If they are for the same tiles, implementing server-side caching is going to help. Although again I’m not expecting it to help much, because what are the main costs of tile requests? Probably rendering. But tile servers have their own cache and won’t re-render tiles just because they are requested multiple times. And why would repeated requests for the same tiles happen? Likely because Discourse will request previews for the same element. But doesn’t Discourse also have its own cache? If that happens, it would mean I’m wrong about its cache, or maybe that cache needs extra tuning, if that’s possible.
If server-side caching needs to be done, and if it’s because of repeated requests for the same preview images, maybe no coding would be required because existing Apache modules might do the job. But if the tile requests go for different tiles, no caching is likely to help. Whether that’s going to happen depends on access patterns in production, so we’re not going to know until the plan is complete.
If caching won’t do the trick, maybe getting some kind of privileged access to tiles is possible. osm-meta-emitter, unlike Discourse, doesn’t send bogus user agents. It also runs at a known location. Or maybe osm-meta-emitter could be run alongside one of the tile servers and only make use of already rendered tiles, which can be read directly. But that probably can’t be easily combined with reading from the database during steps 2-7, because the database on tile servers has production data. And the data is likely in a different schema, which means more programming is required.
However I don’t expect any serious problems here as long as we’re only supporting previews for our Discourse server. So what’s written here is likely going to be of use only if we try to scale beyond Discourse. And before doing anything from this section, it could be worth trying the next section.
At this point we might want to find out how many requests we can serve. This is directly related to controlling the number of requests. If we want to limit the number of requests to osm-meta-emitter that result in image generation with the necessary OSM API and tile server reads, we can do something as simple as checking the ids of requested elements. If the ids fulfill a condition like `L <= (id mod B) && (id mod B) < H`, where `B`, `L` and `H` are some constants fixed in our settings, we try to generate a preview, otherwise we serve a logo. If our condition is `42 <= (id mod 256) && (id mod 256) < 43`, which is the same as `(id mod 256) == 42`, we’ll be generating images only for roughly 1 out of 256 requests. That way we’ll be able to find some level of service that we can handle, and then we’ll be able to optimize the request handling.
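As an illustration only (the actual osm-meta-emitter code is PHP; the constant values here are just the example values from the text), the check could look like this:

```python
def should_render(element_id, B=256, L=42, H=43):
    # Render a preview only when L <= (id mod B) < H;
    # otherwise the caller serves a logo. B, L and H play the
    # roles of the constants fixed in the settings.
    return L <= element_id % B < H

# With the example condition, equivalent to (id mod 256) == 42,
# roughly 1 out of 256 requests results in image generation:
rendered = sum(should_render(i) for i in range(2560))
```

Here `rendered` counts exactly one qualifying id per block of 256, i.e. 10 out of 2560.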
A lot of time is spent waiting for the API/tile responses. The latency of any single request to osm-meta-emitter is not that important unless it’s in the tens of seconds. The bigger problem could be all of the threads running concurrently. Some kind of async/await event loop might be necessary to cut down their numbers. Each thread also takes memory: the most straightforward implementation requires about a megabyte of RAM for images alone. Hopefully that’s not a lot, but maybe other things will take up more space.
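To sketch the event-loop idea (in Python rather than the service’s PHP; `fetch_tile` is a hypothetical stand-in for a real HTTP request), concurrent awaits let one thread wait on all four tile responses at once:

```python
import asyncio
import time

async def fetch_tile(z, x, y):
    # Hypothetical stand-in for an HTTP tile request;
    # pretend each one takes about 100 ms of network latency.
    await asyncio.sleep(0.1)
    return (z, x, y)

async def fetch_all(coords):
    # All requests wait concurrently on one event loop instead of
    # each occupying a thread (and that thread's memory).
    return await asyncio.gather(*(fetch_tile(z, x, y) for z, x, y in coords))

start = time.monotonic()
tiles = asyncio.run(fetch_all([(16, 0, 0), (16, 0, 1), (16, 1, 0), (16, 1, 1)]))
elapsed = time.monotonic() - start
# elapsed is close to a single request's latency, not four times it
```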
Here we may find that optimizing the existing PHP code is cumbersome and a rewrite using some other technology is necessary. I don’t expect this to happen for our Discourse-limited service. Anything I can write in this section is speculation not based on any real-world usage pattern. But maybe you’re sure that the existing PHP code is going to fail even at this level of load. In that case I’d like to see it fail and find out why. And we can turn osm-meta-emitter off at any moment, because nothing critical depends on it.
The maintainers/admins refuse to go along with the plan
Hopefully they’ll put forward some requirements which can be fulfilled. For the likely requirements we can look again at the GitHub issue.
In case it’s caching […] because […] millions of views, I’ll first try to argue against implementing caching because according to the plan with a Discourse-limited deployment, millions of views are not going to happen. We don’t have millions of active mappers discussing osm elements on the forum. Millions of views can still happen in case of a DOS attack, or in case of some popular site trying to hotlink to preview images. However in this case we’re not serving the actual preview images because the requests to osm-meta-emitter don’t come from our Discourse server. We’re not even trying to render them and not making any API/tile requests from osm-meta-emitter. There’s nothing to cache in this case. If that argument fails, some kind of caching will need to be implemented.
Another possible kind of requirement is either making the service entirely internal to openstreetmap-website (why would we link to something outside? It’s more difficult to maintain the code if we start scattering it around) or entirely external to OSMF infrastructure (why would we want to run this experimental code on our servers before it has proven its value?). In the first case preview image generation will have to be rewritten in Ruby as a part of openstreetmap-website. That’s some more work, but it’s not a difficult task: I already wrote it once in one language, I can do it again in another. There aren’t many possible unexpected problems, unlike those discussed in the previous section. In the second case I guess I’ll have to find some hosting for osm-meta-emitter. I don’t know whether it will cost me anything.
Nowadays raster tiles are the default on the OpenStreetMap website. However the integration of vector tiles has already started. At some point in the future everyone might switch to vector tiles and vector tiles might become the default. After that, the question could be raised whether to keep raster tiles around, and the entire raster tile infrastructure might eventually be retired. osm-meta-emitter relies on raster tiles being available; if all available raster tile servers are gone, something else will have to fill that role. This is a very remote scenario, and many other changes are likely to happen before it plays out. We can’t plan for those today.
Backup plans
Discourse patch
The worst case scenario is that for whatever reason the deployment doesn’t happen and we don’t get to step 7 of our plan. We’re not going to know any real usage patterns, and we won’t even know about people using it with sandbox data. The most likely place for the plan to get halted is step 4, when the openstreetmap-website maintainers either refuse to merge the pull request or put it off for too long. If we still have the goal of having previews on Discourse, we may try leaving openstreetmap-website unpatched and patching Discourse instead. Discourse already has a set of specialized preview engines for sites like GitHub or Wikipedia, in addition to generic Open Graph web pages. We can add one for OpenStreetMap.
If the reason for step 4 not succeeding was increased maintenance, patching Discourse is unlikely to help. Now the task, aside from programming, is to convince admins to run the patched version of Discourse, and this is going to be even more difficult if it’s the same people that have to be convinced. The chances of pushing the patch upstream to the Discourse code are slim while preview images on osm are experimental. The patched version will have to be maintained and tested before any Discourse update deployment.
Map tiles as preview images
If we’re still stuck at step 4 and the Discourse patch is a no-go, we have a better-than-nothing option of showing unmodified map tiles. Remember that in the Preview images section we looked at this option and concluded that in general a tile that acceptably represents a given element may not exist. But we didn’t completely give up on this approach, and now we can use it. Here’s what we can still get by using it:
- We get a rough representation of the element location. How good it is depends on what gets shown on the tile we pick. Unfortunately this is something we’re not going to know. Preferably we want some landmarks, the ones relative to which you’d be looking for the element. We probably have to zoom out further to get them than we would if we tried to render the element. We can even run a study (“Which of the following tiles best represents this osm element?”) to find out a zoom level heuristic.
- We get to use caching on tile server side. That should solve most of the possible traffic issues of sharing links.
- We don’t need to run our PHP code at all. The calculation of which tile works best as the preview image can be done entirely in the Rails codebase and the tile can be directly linked from `<meta property="og:image" ...>`. This lets us avoid having arguments about where the service should be located. And it also lets us avoid objections about maintenance, because the code to calculate the tile is going to be significantly smaller than a tile stitcher with an element shape renderer.
- However, if we want to, we can run our PHP code and generate tile redirects in it. Maybe we can do it as a fallback mechanism when there are too many simultaneous requests.
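Picking a tile for a point location follows from the standard slippy-map tile numbering; here is a Python sketch of the arithmetic (the Rails code would do the same, and the tile server URL is just one possible example):

```python
import math

def center_tile(lat, lon, zoom):
    # Tile (x, y) containing the coordinate at the given zoom,
    # per the standard slippy-map tiling scheme.
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

# The chosen tile could then be linked as og:image, e.g.
# https://tile.openstreetmap.org/{zoom}/{x}/{y}.png
x, y = center_tile(51.5, -0.1, 10)
```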
Link manually
If nothing helped us get through step 4, we may have to settle on linking to those fake element pages mentioned in the Prototype section. Instead of posting a link to an element page on the osm website, we’ll have to post a link to a page generated by osm-meta-emitter. That page would have to be improved a little. We’d like it to redirect to the actual osm page, but we want that to happen only when the page is opened in a browser. Social media bots still need to stay on the osm-meta-emitter page because it has the necessary metadata. A JavaScript redirect will probably do the trick, because social media bots are not going to run JavaScript. At least Discourse is unlikely to run it.
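A minimal sketch of such a page (the hostname and element id are made up for illustration): bots that don’t execute JavaScript stay on the page and read its Open Graph metadata, while browsers run the script and get redirected to the real element page.

```html
<!DOCTYPE html>
<html>
<head>
  <meta property="og:title" content="Node 123456 on OpenStreetMap">
  <meta property="og:image" content="https://osm-meta-emitter.example.com/image/node/123456.png">
  <!-- Browsers follow this redirect; non-JavaScript bots never see it. -->
  <script>location.replace("https://www.openstreetmap.org/node/123456");</script>
</head>
<body>
  <a href="https://www.openstreetmap.org/node/123456">Node 123456</a>
</body>
</html>
```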
This is all inconvenient of course, and it would lead to much lower usage of element preview images. I wouldn’t consider it a successful outcome. On the other hand, you’ll still be able to easily run osm-meta-emitter yourself and limit its use to whatever sites you want, not just to our Discourse. Maybe more options to limit the requests would have to be added, like by user agent. We can still reject generic and web browser user agents. Anyone who’d like to abuse the service will have to go through the trouble of discovering which user agents it accepts. Hopefully nobody will bother to do that.
Stopping the service after deployment
What if we got to the deployment stage, ran the service for a while and then decided to stop using it? Maybe we can’t afford it, maybe we don’t need it, maybe there’s a completely different and better solution. In this case we can think of all of what we’ve done as a research project. I’ll probably be able to write a report on why the approach attempted here failed. We’ll still be able to salvage some results, as described in the previous section.
Further plans
What if the plan succeeds? There are several possible next goals we might want to pursue:
- expanding the supported social media sites;
- expanding the types of supported openstreetmap-website pages;
- improving the rendering;
- optimizing the service.
For expanding the supported social media sites, the next obvious goal is our Mastodon instance, en.osm.town. Why don’t I propose using it in our initial plan? We don’t host it ourselves, it’s on masto.host. Our plan includes limiting requests by ip address, and that’s going to be less predictable. And then there’s the ActivityPub protocol. Guess what the first item in the Criticism section of that Wikipedia article is? Too many requests from numerous Mastodon instances when links are shared.
In this scenario server-side caching makes sense, because it can provide cached preview images to these instances. The expected access pattern is a burst of requests for the same image, followed maybe by a few bursts later. Caching will actually help because those requests won’t go for random images. Falling back to redirecting to map tiles is still an option if caching is unable to help us for some reason.
Since we may want to tune osm-meta-emitter differently to support Mastodon, we may run a different instance of osm-meta-emitter for that. In fact, we can think about expanding further right away to all social media sites using yet another instance. That instance is going to have map tiles for preview images, as described in Map tiles as preview images above. Then we’ll have three different levels of service:
1. A service for our Discourse. This service won’t need caching, won’t require much disk space as a result, and is unlikely to get shut down.
2. A service for our Mastodon, but more likely for other Mastodon servers too or even for a wider fediverse. This one will need caching and more space.
3. A service for every other OpenGraph consumer. Or more likely no service, because the most likely implementation of it is just a link to some tile in our Rails code.
Expanding the types of supported openstreetmap-website pages
The next type of osm page to expand this approach to is the single note page. Notes are almost like nodes from a geometric point of view, and that’s the easiest kind of geometry to support. The expected preview image of a note is a map fragment with a note marker on top of it. The problem here is convincing everyone that note coordinates and statuses, which are required for these images, are not something that needs to be hidden from the public because of GDPR. We don’t need any actual personal information, like who opened a note, or potential personal information, like a note description that can say “This is (Person’s Name)’s house”, to generate a preview image.
If we look at the same GitHub issue again or at forum posts, for example this one, we’ll see other openstreetmap-website page types that could benefit from preview images. They are changesets and map locations. We looked at those types at the beginning of this diary post and we already know about the associated difficulties. In case of changesets it’s GDPR again. If we want preview images of changesets, we have to decide not to hide their bounding boxes from the public.
Making preview images for map locations is complicated by the most common format of their urls, `https://www.openstreetmap.org/#map=z/y/x`, where the location is stored in the fragment or hash part (`#map=z/y/x`) which web servers don’t get to see. Getting openstreetmap-website to output the necessary metadata for these pages is likely impossible. However there are other kinds of links available to those who open the Share panel. The most obvious one is Include marker links, `https://www.openstreetmap.org/?mlat=y&mlon=x#map=z/y/x`. The server is going to see `mlat=y&mlon=x` and will have an opportunity to output the location-specific metadata. Maybe something could be done about short links without markers too.
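To make the server-side visibility concrete (a Python sketch; `marker_location` is a hypothetical helper, not existing openstreetmap-website code), only the query string survives the trip to the server while the fragment is lost:

```python
from urllib.parse import parse_qs, urlsplit

def marker_location(url):
    # Extract the marker coordinate from an "Include marker" link.
    # The #map=z/y/x fragment never reaches the server, but the
    # mlat/mlon query parameters do.
    query = parse_qs(urlsplit(url).query)
    try:
        return float(query["mlat"][0]), float(query["mlon"][0])
    except (KeyError, ValueError):
        return None

loc = marker_location("https://www.openstreetmap.org/?mlat=51.5&mlon=-0.1#map=16/51.5/-0.1")
# loc is (51.5, -0.1); a fragment-only link yields None
```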
Improving the rendering
There are many possible improvements to the element shapes drawn on top of map tiles. Some general categories for them are:
- geometry (what’s the geometry of a turn restriction?)
- generalization (if there’s a ton of nodes, do we need them all?)
- areas (can we have area elements filled inside? but we have to figure out if it is an area element and where’s its “inside”, not a trivial task for multipolygons)
- symbols (can we represent nodes by something better than circle markers?)
Some of those improvements are going to be easier to make if the OSM API is modified accordingly. But we can’t justify extending the API to support previews. The API is mainly for editing, thus we’ll have to prove that whatever changes we make are also going to help if we write an editor.
Optimizing the service
There are many ways to optimize the preview image service besides caching. Here’s one method of cutting down the number of tile requests.
Waiting for tile server responses takes significant time because in most cases there are four of them. However in a really lucky case it takes just one tile to get the base map for our preview image. Here’s what the already existing implementation does with default settings. It provides a window into a map view consisting of all map tiles put together at a given zoom level. The shape of the window matches the shape of tiles: it’s a 256 × 256 pixel square. The window is centered on our osm element of interest. Now you can easily see why there are usually four tile requests; that’s how many different tiles end up being partly inside the window. The really lucky case is when the window contains exactly one tile. “Really lucky” here is 1 out of 256 squared, which is not a large probability. This assumes uniformly distributed point elements; 256 is the tile size in pixels.
What if we don’t insist on centering the window exactly? If we allow it to be off by one pixel, the probability of getting the lucky case increases to 1 out of 128 squared, four times higher. We can go further and divide our window into nine squares like this:
  |  |
--+--+--
  |  |
--+--+--
  |  |
Then we can say that we’re fine if our point element ends up anywhere inside the middle square. That’s a probability of 1/9, significantly higher than what we had before. It’s going to be more complicated for non-point elements, but there’s some room for optimization in these cases too.
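The 1/9 figure is easy to check with a quick simulation, again assuming uniformly distributed point elements (illustrative Python only, not part of the service):

```python
import random

TILE = 256  # tile size in pixels
random.seed(1)

trials = 100_000
hits = 0
for _ in range(trials):
    # Pixel position of a uniformly random point element within its tile.
    x, y = random.randrange(TILE), random.randrange(TILE)
    # The element lands in the middle square of the 3x3 subdivision
    # when both coordinates fall into the middle third.
    if TILE / 3 <= x < 2 * TILE / 3 and TILE / 3 <= y < 2 * TILE / 3:
        hits += 1

rate = hits / trials  # close to 1/9, about 0.11
```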