08-23-2023, 06:01 PM
(08-23-2023, 02:54 PM)megamarc Wrote: Hi!
Basic scaling (Neo-Geo style) is quite cheap; in fact, zoomed-in layers have higher pixel throughput than regular layers, because as each tile covers more screen space, fewer calls are required to fetch the next tile. Affine transformations (SNES-style) are the opposite: since the source and destination scanning are not parallel, every single pixel must fetch its parent tile and the position it occupies inside that tile. This kind of layer therefore offers the worst performance.
90º-rotated tiles and sprites may incur more cache misses than regular ones, but overall performance is similar, and it's a rarely used feature. Horizontal and vertical flips are free. When a tile is fetched, some checks and assignments must be done to determine the scanning pointers (accounting for flips and 90º rotations), but once they're set up, all the pixels of the tile are output straight. By this rule, 16x16 tiles offer better performance than 8x8 ones.
However, if you check the benchmarks, even a Raspberry Pi 3 has enough power to run a standard game at 60 fps. Say you render a 16:9 240p game: 400x240 at 60 fps is about 5.7 MPixels/s. Pixel throughput on a Pi 3 for regular layers and sprites is between 50 and 60 MPixels/s, i.e. 10x what's needed. That leaves plenty of room to overdraw multiple scroll planes and sprites, blending, scaling, etc. Affine layers on a Pi 3 run at about 8 MPixels/s, enough to be used smoothly as long as you don't overdraw a lot (the SNES had just one affine layer, in mode 7). And we're talking about a Pi 3, which is the humblest thing I have on hand. The Pi 4 doubles that performance, and any low-end PC has even more power.
Oh alright, so it's pretty fast!
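Just to check I understand the affine cost you describe: every destination pixel has to go through the matrix before the tile fetch, something like this sketch (made-up names, not your actual internals):

```nim
# Sketch of why affine (SNES mode-7 style) layers are expensive:
# for every destination pixel, the source position must be computed
# through a 2x2 affine matrix plus translation, and only then can the
# owning tile be fetched.
type
  Affine = object
    a, b, c, d: float   # 2x2 transform matrix
    dx, dy: float       # translation

proc sampleAffine(m: Affine; x, y: int): (int, int) =
  # Destination (x, y) -> source texel. Consecutive destination pixels
  # are not guaranteed to fall in the same source tile, so per-tile
  # setup cannot be amortized the way it can for plain scrolling.
  let sx = m.a * x.float + m.b * y.float + m.dx
  let sy = m.c * x.float + m.d * y.float + m.dy
  (sx.int, sy.int)
```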
I tried writing a scanline rendering algorithm and I'm at around 4 ms per frame (roughly 250 FPS).
This is the algorithm I wrote:
Code:
# Apply the layer's blend mode to the destination pixel in place.
proc blendColor(layer: Layer, colorSrc: var ColorRGBX, colorResult: color.Color) =
  case layer.blend:
  of NONE:
    colorSrc = cast[ColorRGBX](colorResult)
  of ADD:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) + colorResult)
  of SUB:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) - colorResult)
  of MOD:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) mod colorResult)
  of MIX25:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) | colorResult)
  of MIX50:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) || colorResult)
  of MIX75:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) ||| colorResult)
  of OR:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) or colorResult)
  of XOR:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) xor colorResult)
  of AND:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) and colorResult)
proc paintScanlineBitmap(layer: Layer, linePtr: ptr ColorRGBX, width: int, line: int) =
  if layer.bitmap == nil: return
  let bm = layer.bitmap
  var linePtr = linePtr
  for i in 0..<width:
    let
      idX = cast[uint](i + layer.position.x) mod cast[uint](bm.width)
      idY = cast[uint](line + layer.position.y) mod cast[uint](bm.height)
    let colIndex = bm[cast[int](idX), cast[int](idY)]
    if colIndex != 0:
      let color = bm.palette[cast[int](colIndex)]
      layer.blendColor(linePtr[], color)
      # amigafy(linePtr[])
      linePtr[].a = 255
    linePtr = cast[ptr ColorRGBX](cast[uint64](linePtr) + cast[uint64](sizeof(ColorRGBX)))
proc paintScanlineTilemap(layer: Layer, linePtr: ptr ColorRGBX, width: int, line: int) =
  if layer.tilemap == nil: return
  var linePtr = linePtr
  let tmap = layer.tilemap
  for x in 0..<width:
    let idX = cast[uint](x + layer.position.x) mod cast[uint](tmap.widthPixels)
    let idY = cast[uint](line + layer.position.y) mod cast[uint](tmap.heightPixels)
    let tileX = idX div tmap.tileWidth.uint
    let tileY = idY div tmap.tileHeight.uint
    let tile = tmap[tileX.int, tileY.int]
    if tile.index == 0 or tile.masked:
      # Note: the previous version mutated the `for` variable through
      # its address to skip the rest of the tile; that is not legal in
      # Nim (loop variables are immutable), and linePtr only advanced
      # by one pixel anyway, desynchronizing x and the pointer. Just
      # skip this one pixel; skipping whole runs needs a `while` loop
      # with matching pointer advances.
      linePtr = cast[ptr ColorRGBX](cast[uint64](linePtr) + cast[uint64](sizeof(ColorRGBX)))
      continue
    let tileset = tmap.tilesets[tile.tileset]
    var
      pixX = idX.int mod tileset.tileWidth
      pixY = idY.int mod tileset.tileHeight
    # TODO: 90º rotation (swapping pixX/pixY) is not implemented yet
    if tile.flipV: pixY = tileset.tileHeight - 1 - pixY
    if tile.flipH: pixX = tileset.tileWidth - 1 - pixX
    let
      tOffset = tile.index.int * tileset.tileWidth * tileset.tileHeight
      pOffset = pixY * tileset.tileWidth + pixX
      colIndex = tileset[tOffset + pOffset]
    if colIndex != 0:
      let color = tileset.palette[colIndex.int]
      layer.blendColor(linePtr[], color)
      linePtr[].a = 255
    linePtr = cast[ptr ColorRGBX](cast[uint64](linePtr) + cast[uint64](sizeof(ColorRGBX)))
proc renderScanline(window: Window, line: int) {.inline.} =
  let index = line * context.width
  var myPtr = window.screen.pix.data[index].addr
  # Fill the scanline with the background color
  for i in 0..<context.width:
    myPtr[] = cast[ColorRGBX](context.backgroundColor)
    myPtr[].a = 255
    myPtr = cast[ptr ColorRGBX](cast[uint64](myPtr) + cast[uint64](sizeof(ColorRGBX)))
  myPtr = window.screen.pix.data[index].addr
  # Draw the layers back to front
  for l in context.layers:
    case l.layerType:
    of LAYER_BITMAP:
      l.paintScanlineBitmap(myPtr, context.width, line)
    of LAYER_TILEMAP:
      l.paintScanlineTilemap(myPtr, context.width, line)

proc renderScreen(window: Window) {.inline.} =
  for j in 0..<context.height:
    if context.lineCallback != nil: context.lineCallback(j)
    GC_disableMarkAndSweep()
    window.renderScanline(j)
    GC_enableMarkAndSweep()
I think it's fairly simple, but I wonder if I can optimize it further, knowing most of my objects are reference types. Here's the raster line callback I'm testing with:
Code:
e.lineCallback = (
  proc(line: int, myPtr = nil.pointer): void =
    e.layer(0).y = int(sin((TAU * line.float + offset) / 32) * 4)
    e.layer(1).x = int(sin((TAU * line.float + offset) / 32) * 3)
    if (line and 1) == 0:
      e.layer(0).x = int(sin((TAU * line.float + offset) / 32) * 2 + x)
    else:
      e.layer(0).x = int(-sin((TAU * line.float + offset) / 32) * 2 + x)
    e.backgroundColor = Color(r: line.byte, g: 255, b: 255 - line.byte)
)
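One idea I might try, following your remark that pixels are output straight once a tile is set up: in the tilemap pass, fetch each tile once per run of pixels it covers on the scanline instead of once per pixel. A rough sketch of just the run computation, with made-up names rather than my exact types (assumes a non-negative scroll offset):

```nim
# Sketch: decompose a scanline into per-tile runs so the tile lookup
# and flip/rotation setup happen once per run, not once per pixel.
proc runLength(x, width, tileWidth, scrollX: int): int =
  # Pixels remaining until the next tile boundary (or end of line).
  let inTile = (x + scrollX) mod tileWidth
  min(tileWidth - inTile, width - x)

proc renderRuns(width, tileWidth, scrollX: int): seq[int] =
  # Returns the run lengths a scanline decomposes into; a real
  # renderer would fetch the tile once per run, then blit that many
  # pixels while advancing the line pointer by the same amount.
  var x = 0
  while x < width:
    let n = runLength(x, width, tileWidth, scrollX)
    result.add n
    x += n
```

A `while` loop (rather than `for`) makes it legal to jump `x` forward by a whole run, which also fixes the skip-over-masked-tiles case, as long as the line pointer advances by the same count.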