Vertical callback or per pixel callback : I understand/How does Tilengine work?
#7
(08-23-2023, 02:54 PM)megamarc Wrote: Hi!

Basic scaling (Neo-Geo style) is quite cheap. In fact, zoomed-in layers have more pixel throughput than regular layers: since each tile covers more screen space, fewer calls are required to fetch the next tile. Affine transformations (SNES-style) are the opposite: as the source and destination scans are not parallel, every single pixel must fetch its parent tile and work out its position inside that tile. Thus these kinds of layers offer the worst performance.

90º-rotated tiles and sprites may have more cache misses than regular ones, but the overall performance is somewhat similar and it's a rarely used feature. Horizontal and vertical flips are free. When a tile is fetched, some checks and assignments must be done to determine the scanning pointers (accounting for flips and 90º rotations), but once they're set up, all the pixels of the tile are output straight. By this rule, 16x16 tiles offer better performance than 8x8.

However, if you check the benchmarks, even a Raspberry Pi 3 has enough power to run a standard game at 60 fps. Let's say you render a 16:9 240p game: 400x240 at 60 fps, that's about 5.76 MPixels/s. Pixel throughput on a Pi 3 for regular layers and sprites is between 50 and 60 MPixels/s, roughly 10x what is needed. That leaves plenty of room to overdraw multiple scroll planes and sprites, blending, scaling, etc. Affine layers on a Pi 3 run at about 8 MPixels/s, enough to be used smoothly as long as you don't overdraw a lot (the SNES had just one affine layer, in Mode 7). And we're talking about a Pi 3, which is the humblest thing I have on hand. The Pi 4 doubles that performance, and any low-end PC has even more power.
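To make the budget above concrete, here is a small back-of-the-envelope sketch in Nim. It only uses the figures quoted in the post (400x240 at 60 fps, 50 to 60 MPixels/s for regular layers, 8 MPixels/s for affine layers); the `headroom` helper is a made-up name for illustration, not part of Tilengine.

```nim
# Pixel budget for one full-screen layer, using the figures quoted above.
const
  screenW = 400
  screenH = 240
  fps = 60
  needed = screenW * screenH * fps   # pixels per second for one full redraw

proc headroom(throughputMPix: float): float =
  ## Hypothetical helper: how many full-screen layers per frame
  ## a given throughput (in MPixels/s) can cover at the target frame rate.
  throughputMPix * 1_000_000.0 / needed.float

echo needed           # 5760000
echo headroom(50.0)   # regular layers, low end of the quoted range
echo headroom(60.0)   # regular layers, high end of the quoted range
echo headroom(8.0)    # affine layers
```

Regular layers come out at roughly 9 to 10 full screens per frame, while affine layers only cover a bit more than one, which matches the advice not to overdraw affine layers.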

Oh alright, so it's pretty fast!
I tried writing a scanline rendering algorithm and I'm at around 4 ms per frame, at 1000 FPS.

This is the algorithm I wrote:
Code:
proc blendColor(layer: Layer, colorSrc: var ColorRGBX, colorResult: color.Color) =
    case layer.blend:
    of NONE:
        colorSrc = cast[ColorRGBX](colorResult)
    of ADD:
        colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) + colorResult)
    of SUB:
        colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) - colorResult)
    of MOD:
        colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) mod colorResult)
    of MIX25:
        colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) | colorResult)
    of MIX50:
        colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) || colorResult)
    of MIX75:
        colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) ||| colorResult)
    of OR:
        colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) or colorResult)
    of XOR:
        colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) xor colorResult)
    of AND:
        colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) and colorResult)

proc paintScanlineBitmap(layer: Layer, linePtr: ptr ColorRGBX, width: int, line: int) =
    if(layer.bitmap == nil): return
    let bm = layer.bitmap
    var linePtr = linePtr
    for i in 0..<width:
        let
            idX = cast[uint](i + layer.position.x) mod cast[uint](bm.width)
            idY = cast[uint](line + layer.position.y) mod cast[uint](bm.height)

        let colIndex = bm[cast[int](idX), cast[int](idY)]
        if colIndex != 0:
            let color = bm.palette[cast[int](colIndex)]
            layer.blendColor(linePtr[], color)
            # amigafy(linePtr[])
            linePtr[].a = 255
        linePtr = cast[ptr ColorRGBX](cast[uint64](linePtr) + cast[uint64](sizeof(ColorRGBX)))

proc paintScanlineTilemap(layer: Layer, linePtr: ptr ColorRGBX, width: int, line: int) =
    if(layer.tilemap == nil): return
    var linePtr = linePtr
    let
        tmap = layer.tilemap
    for x in 0..<width:
        let idX = cast[uint](x + layer.position.x) mod cast[uint](tmap.widthPixels)
        let idY = cast[uint](line + layer.position.y) mod cast[uint](tmap.heightPixels)
        let tileX = idX div tmap.tileWidth.uint
        let tileY = idY div tmap.tileHeight.uint

        let tile = tmap[tileX.int, tileY.int]
        let t = tile.uint32
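        if(tile.index == 0 or tile.masked):
            # Empty or masked tile: draw nothing and advance one pixel, so that
            # x and linePtr stay in step (skipping x ahead without moving
            # linePtr by the same amount would shift the rest of the line).
            linePtr = cast[ptr ColorRGBX](cast[uint64](linePtr) + cast[uint64](sizeof(ColorRGBX)))
            continue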
        let
            tileset = tmap.tilesets[tile.tileset]
        var
            pixX = idX.int mod tileset.tileWidth
            pixY = idY.int mod tileset.tileHeight
       
        # if(tile.rotate):
        #    let temp = pixY
        #    pixY = pixX
        #    pixX = temp
            # pixY = min(pixX, tileset.tileHeight - 1)
            # pixX = min(temp, tileset.tileWidth - 1)
            # pixY = tileset.tileHeight - 1 - pixY
            # pixX = tileset.tileWidth - 1 - pixX
           
        if(tile.flipV): pixY = tileset.tileHeight - 1 - pixY
        if(tile.flipH): pixX = tileset.tileWidth - 1 - pixX
        let
            tOffset = tile.index.int * tileset.tileWidth * tileset.tileHeight
            pOffset = pixY * tileset.tileWidth + pixX

            colIndex = tileset[tOffset + pOffset]

        # echo tile.tileset
        if(colIndex.int >= tileset.palette.size):
            echo tileset

        if colIndex != 0:

            let color = tileset.palette[colIndex.int]

            layer.blendColor(linePtr[], color)
            linePtr[].a = 255

        # window.screen.pix.data[index + x] = cast[ColorRGBX](color)
        # window.screen.pix.data[index + x].a = 255
        linePtr = cast[ptr ColorRGBX](cast[uint64](linePtr) + cast[uint64](sizeof(ColorRGBX)))



proc renderScanline(window: Window, line: int) {.inline.} =
    # echo line
    let index = line * context.width
    var myPtr = (window.screen.pix.data[index].addr)

    # Draw background color
    for i in 0..<context.width:
        myPtr[] = cast[ColorRGBX](context.backgroundColor)
        myPtr[].a = 255
        myPtr = cast[ptr ColorRGBX](cast[uint64](myPtr) + cast[uint64](sizeof(ColorRGBX)))
       
    myPtr = (window.screen.pix.data[index].addr)
    # Draw layers
    for l in context.layers:
        case l.layerType:
        of LAYER_BITMAP:
            l.paintScanlineBitmap(myPtr, context.width, line)
        of LAYER_TILEMAP:
            l.paintScanlineTilemap(myPtr, context.width, line)

    return
       

proc renderScreen(window: Window) {.inline.} =
    for j in 0..<context.height:
        if(context.lineCallback != nil): context.lineCallback(j)
        GC_disableMarkAndSweep()
        window.renderScanline(j)
        GC_enableMarkAndSweep()

I think it's fairly simple, but I wonder if I can optimize it further, knowing that most of my objects are reference types.
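One possible direction, following the quoted point that once a tile is fetched "all the pixels of the tile are output straight": look the tile up once per span instead of once per pixel. This is only a rough sketch under simplified assumptions, not Tilengine's actual implementation. The `Tileset` and `Tilemap` types, `wrap` and `paintLine` are hypothetical stand-ins for the real types (single tileset, no flips, no blending).

```nim
# Hypothetical, simplified types: one tileset, 8-bit indexed pixels,
# tile index 0 = empty, colour index 0 = transparent. No flips or blending.
type
  Tileset = object
    tileW, tileH: int
    pixels: seq[uint8]      # tiles stored one after another, row-major
    palette: seq[uint32]
  Tilemap = object
    cols, rows: int
    tiles: seq[int]         # one tile index per cell, row-major

proc wrap(a, n: int): int =
  ## Floored modulo, so negative scroll values wrap correctly.
  result = a mod n
  if result < 0: result += n

proc paintLine(dst: var openArray[uint32], tmap: Tilemap, ts: Tileset,
               scrollX, scrollY, line: int) =
  let
    mapW = tmap.cols * ts.tileW
    srcY = wrap(line + scrollY, tmap.rows * ts.tileH)
    tileY = srcY div ts.tileH
    pixY = srcY mod ts.tileH
  var x = 0
  while x < dst.len:
    let
      srcX = wrap(x + scrollX, mapW)
      pixX = srcX mod ts.tileW
      span = min(ts.tileW - pixX, dst.len - x)  # pixels left in this tile
      index = tmap.tiles[tileY * tmap.cols + srcX div ts.tileW]
    if index != 0:
      # One tile fetch, then a straight run over that tile's row.
      let rowStart = (index * ts.tileH + pixY) * ts.tileW + pixX
      for i in 0 ..< span:
        let c = ts.pixels[rowStart + i]
        if c != 0: dst[x + i] = ts.palette[c.int]
    x += span  # empty tiles are skipped in one step

# Usage: a 2x1 map of 4x4 tiles; tile 1 is solid colour 1, tile 0 is empty.
var ts = Tileset(tileW: 4, tileH: 4, pixels: newSeq[uint8](32),
                 palette: @[0'u32, 0xFF0000FF'u32])
for i in 16 ..< 32: ts.pixels[i] = 1
let tmap = Tilemap(cols: 2, rows: 1, tiles: @[1, 0])
var scan = newSeq[uint32](8)
paintLine(scan, tmap, ts, scrollX = -1, scrollY = 0, line = 0)
echo scan
```

The vertical lookups (`srcY`, `tileY`, `pixY`) are hoisted out of the loop because they are constant for a whole scanline, and the `while` loop makes the empty-tile skip explicit, so the destination index and the source position cannot drift apart.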
Code:
e.lineCallback = (
    proc(line: int, myPtr = nil.pointer): void =
      e.layer(0).y = int(sin((TAU * line.float + offset)/32) * 4)
      e.layer(1).x = int(sin((TAU * line.float + offset)/32) * 3)
      if((line and 1) == 0):
        e.layer(0).x = int(sin((TAU * line.float + offset)/32) * 2 + x)
      else:
        e.layer(0).x = int(-sin((TAU * line.float + offset)/32) * 2 + x)
      e.backgroundColor = Color(r: line.byte, g: 255, b: 255 - line.byte)

      )
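One thing worth checking with a callback like this: `sin` goes negative, so the layer offsets will be negative on some lines. Wrapping through `cast[uint](...) mod ...`, as the scanline procs do, only gives the expected result when the size is a power of two. A minimal sketch comparing the two approaches (`wrapCast` and `wrapFloor` are hypothetical names for illustration, and the numbers assume a 64-bit `int`):

```nim
proc wrapCast(a, n: int): int =
  ## Same approach as the casts in the scanline code: reinterpret as unsigned.
  int(cast[uint](a) mod cast[uint](n))

proc wrapFloor(a, n: int): int =
  ## Floored modulo: the result is always in 0 ..< n for n > 0.
  result = a mod n
  if result < 0: result += n

echo wrapCast(-1, 256), " ", wrapFloor(-1, 256)   # 255 255
echo wrapCast(-1, 320), " ", wrapFloor(-1, 320)   # 255 319
```

With a 320-pixel-wide bitmap, an offset of -1 should land on column 319, but the cast version lands on 255, which would show up as a jump in the picture whenever a layer scrolls below zero.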
I also implemented a raster callback. However, I'm not sure if the userdata pointer is a useful feature. I took inspiration from SDL2's audio callback, which allows you to pass data with a pointer.
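For what it's worth, here is a small sketch of the trade-off as I understand it. A Nim closure carries its captured variables with it, so a userdata pointer adds little there; it mostly matters when the callback has to be a plain `{.cdecl.}` proc (for example to be callable from C), because such a proc cannot capture anything. The type and proc names below are made up for illustration.

```nim
type
  LineCallback = proc(line: int) {.closure.}
  RawLineCallback = proc(line: int, userdata: pointer) {.cdecl.}
  WaveState = object
    lastLine: int

# 1) Closure: the captured state travels with the proc, no userdata needed.
proc makeClosure(state: ref WaveState): LineCallback =
  result = proc(line: int) =
    state.lastLine = line

# 2) C-style callback: it cannot capture, so state arrives through the pointer.
proc rawCallback(line: int, userdata: pointer) {.cdecl.} =
  let state = cast[ptr WaveState](userdata)
  state.lastLine = line

var a: ref WaveState
new(a)
let cb = makeClosure(a)
cb(10)

var b = WaveState()
let raw: RawLineCallback = rawCallback
raw(20, addr b)

echo a.lastLine, " ", b.lastLine   # 10 20
```

So if the engine is only ever used from Nim, the closure form is probably enough; the pointer form is the one to keep if you want the same callback signature to be usable from C.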
Messages In This Thread
RE: Vertical callback or per pixel callback : I understand/How does Tilengine work? - by System64 - 08-23-2023, 06:01 PM
