Always measure - fun with benchmarks

Mikołaj Kąkol
April 19, 2023 | Software development

In one of the previous posts, we explored the possibilities that shaders give us, and at the end, we tried to animate them. While doing that, I started to wonder whether this solution was viable for production. In other words, is it fast enough? There is only one way to find out: measure. Let’s have some fun with benchmarks.

Finding weaknesses

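Here, I stripped the original code down to a minimal working example. (timeAnimation(), prefText, SHADER_ANIM_COLOR and DURATION are defined elsewhere in the project and are not shown here.)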

@Composable
fun ShaderPerformance1() = Column {
    val shader = remember {
        RuntimeShader(SHADER_ANIM_COLOR)
            .apply { setFloatUniform("iDuration", DURATION) }
    }
    val brush = remember { ShaderBrush(shader) }
    val time by timeAnimation()
    shader.setFloatUniform("iTime", time)

    Text(
        text = prefText,
        style = TextStyle(brush = brush),
        modifier = Modifier
            .onSizeChanged {
                shader.setFloatUniform(
                    "iResolution",
                    it.width.toFloat(),
                    it.height.toFloat()
                )
            }
            // The alpha depends on time, so every new frame changes it and forces a redraw
            .alpha(1 - (time + 1) / 1000 / DURATION),
    )
}

What stands out here is the alpha manipulation, which is only there to force a redraw. I tried to understand what actually invalidates text drawing. This is a highly complex topic and I didn’t want to jump to the wrong conclusions, so I started looking for answers on the Kotlin Slack. I got help from Halil Ozercan, who works at Google on exactly that. He was super helpful and provided his own solution to the problem. I tweaked it so it looks similar to my original code.

@Composable
fun ShaderPerformance2() {
    class AnimShaderBrush(val time: Float = -1f) : ShaderBrush() {
        private var internalShader: RuntimeShader? = null
        private var previousSize: Size? = null

        override fun createShader(size: Size): Shader {
            val shader = if (internalShader == null || previousSize != size) {
                RuntimeShader(SHADER_ANIM_COLOR).apply {
                    setFloatUniform("iResolution", size.width, size.height)
                    setFloatUniform("iDuration", DURATION)
                }
            } else {
                internalShader!!
            }
            shader.setFloatUniform("iTime", time)
            internalShader = shader
            previousSize = size
            return shader
        }

        fun setTime(newTime: Float): AnimShaderBrush {
            return AnimShaderBrush(newTime).apply {
                this@apply.internalShader = this@AnimShaderBrush.internalShader
                this@apply.previousSize = this@AnimShaderBrush.previousSize
            }
        }

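        override fun equals(other: Any?): Boolean {
            if (other !is AnimShaderBrush) return false
            if (other.internalShader != this.internalShader) return false
            if (other.previousSize != this.previousSize) return false
            if (other.time != this.time) return false
            return true
        }

        // equals() is overridden, so hashCode() has to agree with it
        override fun hashCode(): Int {
            var result = internalShader?.hashCode() ?: 0
            result = 31 * result + (previousSize?.hashCode() ?: 0)
            result = 31 * result + time.hashCode()
            return result
        }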
    }

    var brush by remember { mutableStateOf(AnimShaderBrush()) }
    val time by timeAnimation()

    LaunchedEffect(time) {
        brush = brush.setTime(time)
    }

    Text(
        text = prefText,
        style = TextStyle(brush = brush),
    )
}

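The key to this solution is invalidating the Text composable through its brush. Every new time value produces a new AnimShaderBrush that is not equal to the previous one, so Text has to redraw, while the already created RuntimeShader is handed over to the new instance and only its iTime uniform is updated. This should be very efficient thanks to this reuse and the internal caching mechanisms.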
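To see why a new brush per frame does not mean a new shader per frame, here is a simplified, Compose-free model of the same pattern. It is a sketch under assumptions, not how Compose works internally: CompiledShader and ShaderCompiler are hypothetical stand-ins for RuntimeShader and its expensive construction, TimeBrush mirrors AnimShaderBrush, and the "redraw" is modelled as simply comparing the old and the new brush.

```kotlin
// Hypothetical stand-in for RuntimeShader; creating one is the expensive step.
class CompiledShader(val id: Int) {
    var time = 0f
}

// Hypothetical factory that counts how many shaders were "compiled".
class ShaderCompiler {
    var compilations = 0
        private set

    fun compile(): CompiledShader = CompiledShader(++compilations)
}

// Mirrors AnimShaderBrush: immutable time, shader handed over between instances.
class TimeBrush(val time: Float, private val compiler: ShaderCompiler) {
    private var cached: CompiledShader? = null

    // Counterpart of createShader(): reuse the cached shader, only update the uniform.
    fun createShader(): CompiledShader {
        val shader = cached ?: compiler.compile().also { cached = it }
        shader.time = time
        return shader
    }

    // Counterpart of setTime(): a new instance that shares the same shader.
    fun withTime(newTime: Float): TimeBrush =
        TimeBrush(newTime, compiler).also { it.cached = cached }

    override fun equals(other: Any?): Boolean =
        other is TimeBrush && other.time == time && other.cached === cached

    override fun hashCode(): Int = 31 * time.hashCode() + (cached?.hashCode() ?: 0)
}

// Simulates a sequence of animation frames; returns (redraws, compilations).
fun simulateFrames(times: List<Float>): Pair<Int, Int> {
    val compiler = ShaderCompiler()
    var brush = TimeBrush(-1f, compiler)
    var redraws = 0
    for (t in times) {
        val next = brush.withTime(t)
        if (next != brush) {    // an unequal brush means the text style changed
            redraws++
            next.createShader() // drawing asks the new brush for its shader
        }
        brush = next
    }
    return redraws to compiler.compilations
}

fun main() {
    println(simulateFrames(listOf(0f, 0.1f, 0.2f, 0.2f))) // prints (3, 1)
}
```

A repeated time value yields an equal brush, so nothing is redrawn; every distinct value triggers a redraw, but the shader is compiled only once.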

That said, it still looks a bit hacky, and Text will most likely change internally at some point so that animations like this work out of the box. In the meantime, I wanted to find a more efficient way to do it.

@Composable
fun ShaderPerformance3() = Column {
    data class Info(val layout: TextLayoutResult, val width: Float, val height: Float)

    val shader = remember {
        RuntimeShader(SHADER_ANIM_COLOR)
            .apply { setFloatUniform("iDuration", DURATION) }
    }
    val brush = remember { ShaderBrush(shader) }
    val time by timeAnimation()

    val textMeasurer = rememberTextMeasurer()
    val info = remember(prefText) {
        textMeasurer.measure(
            text = AnnotatedString(prefText),
            style = TextStyle(brush = brush)
        ).let { textLayout ->
            val lines = (0 until textLayout.lineCount)
            val start = lines.minOf { textLayout.getLineLeft(it) }
            val end = lines.maxOf { textLayout.getLineRight(it) }
            val top = textLayout.getLineTop(lines.first)
            val bottom = textLayout.getLineBottom(lines.last)
            val width = abs(end - start)
            val height = bottom - top
            shader.setFloatUniform("iResolution", width, height)
            Info(textLayout, width, height)
        }
    }
    val wdp = with(LocalDensity.current) { info.width.toDp() }
    val hdp = with(LocalDensity.current) { info.height.toDp() }

    Canvas(
        Modifier
            .size(wdp, hdp)
    ) {
        shader.setFloatUniform("iTime", time)
        drawText(info.layout, brush)
    }
}

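Here, we use TextMeasurer and draw the text directly on a Canvas. We measure the text once, compute its width and height from the line bounds, pass that size to the Canvas modifier, and then draw the measured layout with the shader brush. Because time is read only inside the Canvas draw lambda, a new animation value should only invalidate the draw phase, skipping recomposition and layout. This is certainly not the most optimal solution, but the idea should be powerful enough.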
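The size computation in ShaderPerformance3 boils down to a little arithmetic over the line bounds. Below is a plain-Kotlin sketch of it; LineBounds and textBlockSize are hypothetical names, with LineBounds standing in for the values returned by getLineLeft, getLineRight, getLineTop and getLineBottom. The resulting width and height are what the snippet passes to the iResolution uniform and, converted to dp, to the Canvas size.

```kotlin
import kotlin.math.abs

// Hypothetical per-line bounds, as read from a TextLayoutResult.
data class LineBounds(val left: Float, val right: Float, val top: Float, val bottom: Float)

// Tight size of the whole text block: the widest horizontal extent over all lines,
// and the distance from the top of the first line to the bottom of the last one.
fun textBlockSize(lines: List<LineBounds>): Pair<Float, Float> {
    require(lines.isNotEmpty()) { "text must have at least one line" }
    val start = lines.minOf { it.left }
    val end = lines.maxOf { it.right }
    val width = abs(end - start)
    val height = lines.last().bottom - lines.first().top
    return width to height
}

fun main() {
    // Two centred lines of different lengths
    val lines = listOf(
        LineBounds(left = 10f, right = 110f, top = 0f, bottom = 20f),
        LineBounds(left = 30f, right = 90f, top = 20f, bottom = 40f),
    )
    println(textBlockSize(lines)) // prints (100.0, 40.0)
}
```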

Benchmark setup

To test animation efficiency, we will use the Macrobenchmark library. After a few test runs, I discovered that even though the library does its best to minimise fluctuations, a single run cannot give a precise result. So, I ended up creating tests like this:

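import androidx.benchmark.macro.CompilationMode
import androidx.benchmark.macro.FrameTimingMetric
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.uiautomator.By
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith
import org.junit.runners.Parameterized

// Parameter names are assumptions: loop only repeats the run, id picks the screen ("1", "2" or "3")
@RunWith(Parameterized::class)
class ShaderPerformanceTest(private val loop: Int, private val id: String) {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    companion object {
        @Parameterized.Parameters(name = "anim{1} loop{0}")
        @JvmStatic
        fun initParameters() = buildList {
            // Variants are interleaved: loop 0 runs 1, 2, 3, then loop 1 runs 1, 2, 3, ...
            repeat(100) {
                add(arrayOf<Any>(it, "1"))
                add(arrayOf<Any>(it, "2"))
                add(arrayOf<Any>(it, "3"))
            }
        }
    }

    @Test
    fun animation() = benchmarkRule.measureRepeated(
        packageName = "com.mikolajkakol.myapplication",
        metrics = listOf(FrameTimingMetric()),
        compilationMode = CompilationMode.Full(),
        iterations = 10,
        startupMode = StartupMode.HOT,
        setupBlock = {
            startActivityAndWait()
            // Open the screen of the variant under test from the list
            device.findObject(By.text("Shader performance $id"))?.click()
        }
    ) {
        // Let the animation run; FrameTimingMetric records the frames drawn meanwhile
        Thread.sleep(100)
    }
}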

Each test makes 10 iterations, and I repeated the whole set 100 times. Also, rendering a single text is super fast, so I decided to stack the text 30 times to slow rendering down.

NavHost(navController = navController, startDestination = "list") {
    composable("list") { Composables(navController) }
    composable("shaderPerf1") {
        repeat(30) {
            ShaderPerformance1()
        }
    }
    composable("shaderPerf2") {
        repeat(30) {
            ShaderPerformance2()
        }
    }
    composable("shaderPerf3") {
        repeat(30) {
            ShaderPerformance3()
        }
    }
}
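The setup translates into some simple arithmetic: 100 loops × 3 variants gives 300 parameterised runs, and with 10 iterations each that is 3,000 measured iterations, each spending at least 100 ms in Thread.sleep, so about 300 s of sleeping alone, before app startup and setup time. The plain-Kotlin sketch below reproduces the parameter list and the names JUnit builds from the "anim{1} loop{0}" pattern; benchmarkParameters and runName are hypothetical helpers, not part of the test.

```kotlin
// Hypothetical helper mirroring initParameters(): for every loop, variants 1, 2 and 3 run back to back.
fun benchmarkParameters(loops: Int = 100, variants: List<String> = listOf("1", "2", "3")): List<Array<Any>> =
    buildList {
        repeat(loops) { loop ->
            variants.forEach { variant -> add(arrayOf<Any>(loop, variant)) }
        }
    }

// Hypothetical helper reproducing the "anim{1} loop{0}" name pattern ({0} = loop, {1} = variant).
fun runName(params: Array<Any>): String = "anim${params[1]} loop${params[0]}"

fun main() {
    val params = benchmarkParameters()
    val iterationsPerRun = 10          // iterations = 10 in measureRepeated
    val sleepMsPerIteration = 100L     // Thread.sleep(100) in the measure block
    val measuredIterations = params.size * iterationsPerRun

    println(params.take(4).map(::runName)) // prints [anim1 loop0, anim2 loop0, anim3 loop0, anim1 loop1]
    println(
        "runs=${params.size}, iterations=$measuredIterations, " +
            "sleep alone=${measuredIterations * sleepMsPerIteration / 1000} s"
    ) // prints runs=300, iterations=3000, sleep alone=300 s
}
```

Interleaving the variants within each loop, rather than running all of variant 1 first, presumably helps spread slow drifts, such as the device warming up, evenly across all three variants.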

Is this the best setup? Hard to tell. Anyway, let’s see the results! (I ran it on a Pixel 6 phone.)

ShaderPerformanceTest_animation (all values in ms)

Run             frameDurationCpuMs         frameOverrunMs
                P50   P90   P95   P99       P50   P90   P95   P99
anim1 loop0     7.6   9.4   9.8  10.4      -7.7  -6.0  -5.6  -4.6
anim2 loop0     8.6  10.1  10.6  11.8      -5.4   1.6   1.8   2.2
anim3 loop0     6.1   8.4   9.6  10.8      -9.2  -6.8  -6.2  -4.9
anim1 loop1     9.0  10.4  11.3  12.1      -4.7   1.7   1.8   2.3
anim2 loop1     7.6   9.2   9.9  11.6      -7.7  -6.1  -5.4  -3.6
anim3 loop1     6.5   8.5   9.1  10.3      -8.9  -7.0  -6.3  -5.5
anim1 loop2     7.6   9.5   9.8  10.6      -7.8  -5.9  -5.6  -4.5

Well, that tells us absolutely nothing. The runs overlap heavily: in loop0, anim1 beats anim2, while in loop1 it is the other way around.
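Before any of this can be plotted, the console text has to become numbers. The raw Macrobenchmark output here uses decimal commas (a locale setting), so "7,6," means 7.6 followed by a separator. A minimal parsing sketch, assuming that exact format: a test-name line ending in [animN loopM], followed by lines such as "frameDurationCpuMs   P50   7,6,   P90   9,4, ..." and a "Traces:" line. Sample and parseBenchmarkOutput are hypothetical names.

```kotlin
// Hypothetical record: one percentile value of one metric from one benchmark run.
data class Sample(
    val variant: Int,
    val loop: Int,
    val metric: String,
    val percentile: Int,
    val valueMs: Double,
)

// "[anim2 loop0]" at the end of the test name identifies the run.
private val runHeader = Regex("""\[anim(\d+) loop(\d+)\]""")

// "P50  -7,7" -> percentile 50, value -7.7 (decimal comma in the raw output).
private val percentileValue = Regex("""P(\d+)\s+(-?\d+),(\d+)""")

fun parseBenchmarkOutput(text: String): List<Sample> {
    val samples = mutableListOf<Sample>()
    var variant = -1
    var loop = -1
    for (rawLine in text.lineSequence()) {
        val line = rawLine.trim()
        val header = runHeader.find(line)
        if (header != null) {
            variant = header.groupValues[1].toInt()
            loop = header.groupValues[2].toInt()
            continue
        }
        // Skip anything before the first header and the "Traces:" link lines.
        if (variant < 0 || line.startsWith("Traces")) continue
        val metric = line.substringBefore(' ')
        for (match in percentileValue.findAll(line)) {
            val (percentile, whole, fraction) = match.destructured
            samples += Sample(variant, loop, metric, percentile.toInt(), "$whole.$fraction".toDouble())
        }
    }
    return samples
}
```

Each test run then becomes eight rows (two metrics × four percentiles), a long format that is easy to group and plot.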

Let’s make some graphs!

To visualize the results, we will use Lets-Plot for Kotlin. First, we made a line graph with the iteration on the X-axis and the mean time for each percentile on the Y-axis.
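Lets-Plot works with column-oriented data: a map from column name to a list of values, which is passed to letsPlot(data) and combined with a geometry such as geomLine(). A sketch of preparing that data, assuming the results are already available as one hypothetical Sample record per run, metric and percentile, and assuming the "iteration" on the X-axis corresponds to the loop index. toColumns and meanByVariantAndPercentile are hypothetical helpers; the second one computes the per-variant averages that the line graph hints at.

```kotlin
// Hypothetical record: one percentile value of one metric from one benchmark run.
data class Sample(
    val variant: Int,
    val loop: Int,
    val metric: String,
    val percentile: Int,
    val valueMs: Double,
)

// Column-oriented data, one list per column, all lists of equal length.
fun toColumns(samples: List<Sample>): Map<String, List<Any>> = mapOf(
    "variant" to samples.map { "anim${it.variant}" },
    "loop" to samples.map { it.loop },
    "metric" to samples.map { it.metric },
    "percentile" to samples.map { "P${it.percentile}" },
    "valueMs" to samples.map { it.valueMs },
)

// Mean of one metric per (variant, percentile), averaged over all loops.
fun meanByVariantAndPercentile(samples: List<Sample>, metric: String): Map<Pair<Int, Int>, Double> =
    samples
        .filter { it.metric == metric }
        .groupBy { it.variant to it.percentile }
        .mapValues { (_, group) -> group.map { it.valueMs }.average() }
```

Mapping x to "loop", y to "valueMs" and color to "variant" should then give one line per variant; filtering to a single metric and percentile first keeps the plot readable.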

We are now getting somewhere. It seems that the first solution is the slowest and the last one the fastest, but is it? Thankfully, our colleague Jadwiga found an even better way to represent this kind of data.

Let me present the violin plot.

Now we have a clear indication of which approach is the fastest. Intriguingly, the third method shows the most fluctuation, possibly due to garbage collection, but why? Hard to tell. Anyway!
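A violin plot shows the whole distribution of values for each variant, which is why the differences in spread stand out. To put a number on "the most fluctuation", one could compare simple spread statistics per variant, for example the sample standard deviation and the interquartile range of the P50 frame durations across all loops. This is a sketch of that idea, not what was used for the plot; Spread, quantile and spreadOf are hypothetical helpers, and the quantiles use linear interpolation between neighbouring ranks.

```kotlin
import kotlin.math.sqrt

data class Spread(val mean: Double, val stdDev: Double, val q1: Double, val median: Double, val q3: Double) {
    // Interquartile range: width of the middle 50 % of the values.
    val iqr: Double get() = q3 - q1
}

// Quantile of already sorted values, interpolating linearly between neighbouring ranks.
fun quantile(sorted: List<Double>, q: Double): Double {
    require(sorted.isNotEmpty() && q in 0.0..1.0)
    val position = q * (sorted.size - 1)
    val lower = position.toInt()
    val upper = minOf(lower + 1, sorted.size - 1)
    return sorted[lower] + (sorted[upper] - sorted[lower]) * (position - lower)
}

// Summary of how much one variant's measurements fluctuate across loops.
fun spreadOf(values: List<Double>): Spread {
    require(values.size >= 2) { "need at least two values for a sample standard deviation" }
    val sorted = values.sorted()
    val mean = values.average()
    val variance = values.sumOf { (it - mean) * (it - mean) } / (values.size - 1)
    return Spread(mean, sqrt(variance), quantile(sorted, 0.25), quantile(sorted, 0.5), quantile(sorted, 0.75))
}

fun main() {
    // Illustrative input only, not benchmark data
    val spread = spreadOf(listOf(1.0, 2.0, 3.0, 4.0, 5.0))
    println(spread.iqr) // prints 2.0
}
```

The standard deviation reacts strongly to a few outliers (such as occasional GC pauses), while the interquartile range describes the typical run, so looking at both helps tell the two apart.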

It was great fun, and I learned a lot about Macrobenchmark. If I were asked for a recommendation, I would say go with Halil’s method: it is super easy and the most reliable. We also learned that Text in Compose is very fast. Considering that we had to stack the text 30 times to see meaningful numbers, we might conclude that we should measure only when we actually have a performance problem, and measure real-life examples.
