No Possibility of Effectively Limiting Data Query Results When Using Time Dimensions #6952

MathewNWSH · 2023-10-19T14:36:48Z

Expected behavior and actual behavior.

This ticket is based on my experience described in the mailing archive: https://marc.info/?t=169770835500001&r=1&w=2

To summarize, I've discovered that it's not possible to limit the DATA query result when the TIME dimension is enabled. The layer data source is constrained when opening the data source, and by using the LIMIT clause as described here: https://mapserver.org/input/vector/postgis.html#data-access-connection-method, it limits the source data at the beginning. Subsequent filtration based on dimensions only operates on this already limited dataset.

I can think of two potential solutions:

Avoid using the TIME dimension and, instead, use two other dimensions: start and stop. Then, include them in the DATA query like this:

DATA "geom from (select * from plua_overview WHERE timestamp >= '%start%' AND timestamp <=  '%stop%') subquerry using unique unique_id"

However, this approach might limit the types of requests you can use, as mentioned here:

Another solution is to create a proxy, but I believe this problem can be more easily resolved for all useres on the MapServer side. You could utilize the existing warper, as mentioned in the point above.

Attach simple test case (drag/drop it here)

Mapfile example:

 DEBUG 5
 STATUS OFF
 NAME "time_idx"
 TYPE POLYGON

 CONNECTIONTYPE postgis
 CONNECTION "***"

 DATA 'geometry from (select * from mrc order by maxcc desc limit 10) as subquery using unique unique_id'

 PROJECTION
   "init=epsg:3857"
 END

 VALIDATION
    'maxCC' '^[0-9]{1,3}$'
    'default_maxCC' '100'
 END

 METADATA
   "wms_title" "tile-index-cloud"
   "wms_timeextent" "2022-02-01/2023-10-10/P1D"
   "wms_timeitem" "timestamp"
   "wms_timedefault" "2023-10-10"
   "wms_enable_request" "!*"
 END
END

example request:

&TIME=2023-08-01/2023-08-31&SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&BBOX=1559074.156322684372,6519475.858540875837,1573681.021033628611,6535087.428231937811&CRS=EPSG:3857&WIDTH=756&HEIGHT=808&LAYERS=Cloudless%20Mozaik&STYLES=&FORMAT=image/png&DPI=157&MAP_RESOLUTION=157&FORMAT_OPTIONS=dpi:157&TRANSPARENT=TRUE

And mapserver LOG which I get after receiving NoData layer:

[Thu Oct 19 10:53:43 2023].184724 CGI Request 1 on process 4107971
[Thu Oct 19 10:53:43 2023].186013 msDrawMap(): rendering using outputformat named png (AGG/PNG).
[Thu Oct 19 10:53:43 2023].186021 msDrawMap(): WMS/WFS set-up and query, 0.000s
[Thu Oct 19 10:53:43 2023].186030 msDrawRasterLayerLow(S2 Masking_02): entering.
[Thu Oct 19 10:53:43 2023].187613 msPostGISLayerOpen called: geometry from (select * from mrc order by maxcc desc limit 10) as subquerry using unique unique_id
[Thu Oct 19 10:53:43 2023].187619 msPostGISLayerOpen: No connection in pool, creating a fresh one.
[Thu Oct 19 10:53:43 2023].205466 msConnPoolRegister(***)
[Thu Oct 19 10:53:43 2023].211101 msPostGISLayerOpen: Got PostGIS version 30300.
[Thu Oct 19 10:53:43 2023].211106 msPostGISLayerOpen: Forcing 2D geometries: no.
[Thu Oct 19 10:53:43 2023].211110 msPostGISLayerFreeItemInfo called.
[Thu Oct 19 10:53:43 2023].211132 msPostGISLayerInitItemInfo called.
[Thu Oct 19 10:53:43 2023].217863 msPostGISParseData called.
[Thu Oct 19 10:53:43 2023].217874 msPostGISParseData: unique_column=unique_id, srid=, geom_column_name=geometry, table_name=(select * from mrc order by maxcc desc limit 10) as subquerry
[Thu Oct 19 10:53:43 2023].217878 msPostGISLayerTranslateFilter. String: (`[timestamp]` >= `2023-08-01` AND `[timestamp]` <= `2023-08-31`).
[Thu Oct 19 10:53:43 2023].217880 msPostGISLayerTranslateFilter. There are tokens to process... 
[Thu Oct 19 10:53:43 2023].218287 msPostGISLayerWhichShapes called.
[Thu Oct 19 10:53:43 2023].218290 msPostGISParseData called.
[Thu Oct 19 10:53:43 2023].218294 msPostGISParseData: unique_column=unique_id, srid=, geom_column_name=geometry, table_name=(select * from mrc order by maxcc desc limit 10) as subquerry
[Thu Oct 19 10:53:43 2023].218297 msPostGISBuildSQL called.
[Thu Oct 19 10:53:43 2023].218299 msPostGISBuildSQLItems called.
[Thu Oct 19 10:53:43 2023].218302 msPostGISBuildSQLItems: 3 items requested.
[Thu Oct 19 10:53:43 2023].218305 msPostGISBuildSQLFrom called.
[Thu Oct 19 10:53:43 2023].218308 msPostGISBuildSQLWhere called.
[Thu Oct 19 10:53:43 2023].218310 msPostGISBuildSQLSRID called.
[Thu Oct 19 10:53:43 2023].218312 msPostGISBuildSQLSRID: Building find_srid line.
[Thu Oct 19 10:53:43 2023].218315 msPostGISBuildSQLBox called.
[Thu Oct 19 10:53:43 2023].218327 msPostGISLayerWhichShapes query: select "timestamp"::text,"cloudless_60"::text,"epsg"::text,ST_AsBinary(("geometry"),'NDR') as geom,"unique_id"::text from (select * from mrc order by maxcc desc limit 10) as subquerry where "geometry" && ST_GeomFromText('POLYGON((2791286.85068837 5622573.79066471,2791286.85068837 5638166.03910615,2805874.3941497 5638166.03910615,2805874.3941497 5622573.79066471,2791286.85068837 5622573.79066471))',find_srid('','mrc','geometry')) and (("timestamp" >= date_trunc('day',date '2023-08-01') and "timestamp" <= (date_trunc('day',date '2023-08-31') + interval '1 day' - interval '1 second')))
[Thu Oct 19 10:53:43 2023].262344 msPostGISLayerWhichShapes query status: PGRES_TUPLES_OK (2)
[Thu Oct 19 10:53:43 2023].262350 msPostGISLayerWhichShapes got 0 records in result.
[Thu Oct 19 10:53:43 2023].262353 msPostGISLayerNextShape called.
[Thu Oct 19 10:53:43 2023].262355 msPostGISLayerFreeItemInfo called.
[Thu Oct 19 10:53:43 2023].262358 msPostGISLayerClose called: geometry from (select * from mrc order by maxcc desc limit 10) as subquerry using unique unique_id
[Thu Oct 19 10:53:43 2023].262361 msConnPoolRelease(***)
[Thu Oct 19 10:53:43 2023].262364 msConnPoolClose(***)
[Thu Oct 19 10:53:43 2023].262408 msDrawMap(): Layer 3 (S2 Masking_02), 0.076s
[Thu Oct 19 10:53:43 2023].262414 msDrawMap(): Drawing Label Cache, 0.000s
[Thu Oct 19 10:53:43 2023].262417 msDrawMap() total time: 0.077s
[Thu Oct 19 10:53:43 2023].273102 msSaveImage(stdout) total time: 0.011s
[Thu Oct 19 10:53:43 2023].274001 mapserv request processing time (msLoadMap not incl.): 0.089s
[Thu Oct 19 10:53:43 2023].274007 msFreeMap(): freeing map at 0x560808b493d0.
[Thu Oct 19 10:53:43 2023].274029 freeLayer(): freeing layer at 0x560808e75570.
[Thu Oct 19 10:53:43 2023].274032 msPostGISLayerIsOpen called.
[Thu Oct 19 10:53:43 2023].274041 freeLayer(): freeing layer at 0x560808e8b2c0.
[Thu Oct 19 10:53:43 2023].274044 msPostGISLayerIsOpen called.

Operating system

Ubuntu 20.04 LTS

MapServer version and installation method

Mapserver 7.6.4 and Mapserver 8

The text was updated successfully, but these errors were encountered:

MathewNWSH · 2023-10-19T14:44:51Z

I think that creating new class called LIMIT within Layer would be a good idea. Then users will be able to define maximal number of rendered features within layer based on data after filtrations.

geographika · 2023-10-19T14:47:29Z

Does setting MAXFEATURES to 10 give you the result you want?

MathewNWSH · 2023-10-20T07:03:11Z

No, I believe that MAXFEATURES defines the number of images to be drawn, so if I will have my data i descending cloud cover order then on top of my service I will have the Tenth worst image. If I will have my data i ascending cloud cover order then the one on top would be the Tenth best image.

t. In my case, as you can see:

DATA 'geometry from (select * from mrc order by maxcc desc) as subquery using unique unique_id'

I'm selecting all of the images, but I only need to render the top ten, meaning the last ten images in the order, with the one on top being the best.

geographika · 2023-10-20T07:17:56Z

It seems similar to an issue I had in #5008. To resolve it an ORDER BY was added to the outer query to guarantee order of the filtered data. This in combination with MAXFEATURES might work. The fix was only for the SQL Server driver, AFAIK this does not exist in the Postgres one: e653b9b

MathewNWSH · 2023-10-20T07:59:45Z

As I wrote, the "order by" in its current form returns the column in the following order:

maxcc
90%
90%
90%
90%
90%
90%
90%
85%
75%
65%
50%
10%
0%

Let's assume I need the last 5, which are:

75%
65%
50%
10%
0%

And they must be in the order so that the image with 0% cloud cover is at the top.

In the example you mentioned, the column is sorted by the column name. Let's assume that my column will be ordered in the following way:

Descending, just like above. In this case, MAXFEATURES 5 will return:

90%
90%
90%
90%
90%

Ascending. In this case, MAXFEATURES 5 will return:

0%
10%
50%
65%
75%

So in both situations, the best image will not be returned at the very top.
Am I right?

jratike80 · 2023-10-20T08:06:59Z

Is your use case somewhat similar than the one in #5899?

Would that serve also your needs? Start filling pixels of the output image starting from the best source image within the time window until all the pixels are filled. Then stop.

geographika · 2023-10-20T08:16:02Z

You currently have your ORDER BY within the subquery. The SQL Server driver allows this to be moved to the outer query, so records are ordered after the time filter has been applied and not before. In this case you'd have filtered out the 90% values and then records would be ordered as follows:

75%
65%
50%
10%
0%

MathewNWSH · 2023-10-20T08:48:27Z

@geographika
Okay but how to add ORDER BY to outer query? From what I understand config keyword SORTBY mentioned here #5041 (comment) is not implemented?

I guess for now this is how it should be done? #5008 (comment)

MathewNWSH · 2023-10-20T09:03:03Z

@jratike80
Yup saw this one before.
But from what I understand it is based on timestamp parameter passed to DATA query, but this parameter is not TIME dimension, so in fact I will not be able to use WMS-T full potential?
So how to let Mapserver know which image is the best one, and where is NoData to be filled by second best? -- Really don't get this part, please explain if possible :)

jratike80 · 2023-10-20T09:40:35Z

It is a feature request, not implemented. If it was implemented it would work with any sorted tileindex features. User decides what is the best just like you want to do by sorting the images with ORDER BY maxcc DESC. The difference would be that there would be no need to use LIMIT for shortening the list. Mapserver would take all pixels which are not nodata from 0% and then automatically try to fill the holes from 10%, 50% and so on.

Because this is not implemented we are not using the native TIME support of Mapserver. We have additional time attributes in the tile index. The WMS GetMaps from standard WMS-T clients contain the &TIME= parameter but we modify the request on-the-fly with a frontal service. &TIME= is dropped and new query parameters &START= and &END= are created. In the mapfile the DATA is something like this

DATA "SELECT path||location as location, crs, resolution, time, GEOMETRY FROM ortokuva where (end_year is null or end_year>=%END%) and (start_year>=%START% and start_year<=%END%) order by time,resolution desc"

Order by is still needed because there may be several images from the same time and then we want to render the one with smallest pixel size last. Isn't that the same use case that you have? We just prefer small pixels but you would like to prefer low cloud coverage.

MathewNWSH · 2023-10-20T09:52:51Z

This is practically the same situation.

Okay, now I fully understand the process of selecting images for rendering. You work around the lack of implementation of this functionality by using a time parameter that is not the TIME dimension. But from my experience with MapServer, the last rendered image is the one on top. I understand how to make MapServer recognize the best image, but how do you make it recognize NoData and use the second best image to fill in the gaps and also do not overlap actual data from the first/best image? Or Is it also an unimplemented feature, and just an idea?

geographika · 2023-10-20T09:55:25Z

@geographika Okay but how to add ORDER BY to outer query? From what I understand config keyword SORTBY mentioned here #5041 (comment) is not implemented?

I guess for now this is how it should be done? #5008 (comment)

Correct - the ORDER BY isn't as yet implemented in the PostGIS driver, It would need similar changes to e653b9b added.

jratike80 · 2023-10-20T10:33:43Z

But from my experience with MapServer, the last rendered image is the one on top.

My experience since this https://lists.osgeo.org/pipermail/mapserver-users/2008-April/055317.html is the opposite. Mapserver takes the first image that it gets from the tileindex (the one on top) renders it, takes the next image, renders that above the previous etc. What remain on the top of the final rendered image is the one that was on the bottom of the image list from the tileindex. order by time shows the newest image, not the oldest.

You can see Mapserver in action here https://kartta.paikkatietoikkuna.fi/?zoomLevel=8&coord=433861.1118164062_6947879.803305684&mapLayers=801+100+default,3400+100+ortokuva:indeksi&timeseries=1950&uuid=90246d84-3958-fd8c-cb2c-2510cccca1d3&noSavedState=true&showIntro=false

I can see from your debug info above that when time dimension is used with WMS-T Mapserver generates a query like this
...and (("timestamp" >= date_trunc('day',date '2023-08-01') and "timestamp" <= (date_trunc('day',date '2023-08-31') + interval '1 day' - interval '1 second')))

However, Mapserver does not sort the selected tileindex rows. I think it should do ORDER BY timestamp because now what gets rendered into the final image when the time range is large depends on the native order of entries in the data source.

MathewNWSH · 2023-10-20T12:14:04Z

Yes I agree thank you for the clarification. The query I should aim for should contain information on two parameters: start and stop. This would select the rasters acquired in the period I'm interested in (so I don't know if ORDER BY timestamp is necessary for this), and then I would need an order BY cloud cover desc/asc so that the raster with the least cloud cover is at the top.

Right?

But there is one thing that still puzzles me. Is it possible for Mapserver to know when it can stop rendering tiles? Let's assume that I have such a raster:

How to make the mapserver generate data only where there is NoData in first raster, and when all the pixels are full (based on the rasters in the right order using order by) it will stop rendering, even if there were other rasters (because all of the pixels for requested extent are already associated with values)?

jratike80 · 2023-10-20T13:29:58Z

There are two separate things: how Mapserver behave now and how it could behave for being more efficient. The current way is to use the painters algorithm.

so that the raster with the least cloud cover is at the top.

The pixels from this raster should go to the top of the resulting raster which means that by the painters method it must be rendered last, and that means that the image must appear last in the list that is returned by the "select" from the tileindex.

stop rendering, even if there were other rasters

It is not possible now. Painters algorithm keeps on rendering till the end of the image list. What Mapserver does is

Render the first image from the tileindex stack; nodata remains nodata in the result
Render the second image from the stack; pixels which are not nodata in image 2 are overwriting all the existing pixels in the rendered interim raster. Now the result contains pixels from image 2 where image 2 has valid data. In places which are nodata in image 2 the pixels originate from image 1. If the pixel was nodata in both image 1 and image 2 then the result has also nodata there.
Render the 3rd image on top of the rendered image etc.
Render the last image in the stack.

The painters algorithm is making a good end result: every pixel comes from the most valuable source image but it is heavy and slow if there are many overlapping images in the source image list that is selected from the tileindex.

jratike80 · 2023-10-20T17:14:44Z

For comparison, Geoserver has an option named ExcessGranuleRemoval for avoiding unnecessary rendering. I found documentation from https://docs.geoserver.geo-solutions.it/edu/en/raster_data/mosaic_pyramid.html. The system filters out images (granules in Geoserver language) based on their footprints. Documentation about how Geoserver handles the footprints in https://docs.geoserver.geo-solutions.it/edu/en/raster_data/imagemosaic_footprint.html#geoserver-imagemosaic-footprint.

MathewNWSH · 2023-10-31T12:36:13Z

@geographika Is there any possibility of implementing the LIMIT class in the near future?

MathewNWSH · 2023-10-31T12:37:42Z

For comparison, Geoserver has an option named ExcessGranuleRemoval for avoiding unnecessary rendering. I found documentation from https://docs.geoserver.geo-solutions.it/edu/en/raster_data/mosaic_pyramid.html. The system filters out images (granules in Geoserver language) based on their footprints. Documentation about how Geoserver handles the footprints in https://docs.geoserver.geo-solutions.it/edu/en/raster_data/imagemosaic_footprint.html#geoserver-imagemosaic-footprint.

The possibility of stopping rendering data when every pixel is covered seems like an amazing performance improvement opportunity. I'm rooting for this to happen someday!

geographika · 2023-11-01T09:15:16Z

@geographika Is there any possibility of implementing the LIMIT class in the near future?

@MathewNWSH - I'm not currently a user of the PostGIS drive so no plans on implementing this in the near future. You could try making the modifications and submitting a pull request, or sponsoring adding this feature to MapServer.

sdlime · 2023-11-01T14:18:18Z

Is the algorithm associated with the solution well known?

j-musial · 2023-11-02T19:54:05Z

Would it be possible to add MAXFEATURES_FIRST MAXFEATURES_LAST along with the MAXFEATURES keyword in order to select first/last N features? This would resolve the problem mentioned by @MathewNWSH

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No Possibility of Effectively Limiting Data Query Results When Using Time Dimensions #6952

No Possibility of Effectively Limiting Data Query Results When Using Time Dimensions #6952

No Possibility of Effectively Limiting Data Query Results When Using Time Dimensions #6952

No Possibility of Effectively Limiting Data Query Results When Using Time Dimensions #6952

Comments

Expected behavior and actual behavior.

Attach simple test case (drag/drop it here)

Operating system

MapServer version and installation method