08-23-2023, 06:01 PM
(08-23-2023, 02:54 PM)megamarc Wrote: Hi!
Basic scaling (Neo-Geo style) is quite cheap; in fact, zoomed-in layers have higher pixel throughput than regular layers, because as each tile covers more screen space, fewer calls are required to fetch the next tile. Affine transformations (SNES-style) are the opposite: since the source and destination scanning are not parallel, every single pixel must fetch its parent tile and the position it occupies inside that tile. This kind of layer therefore offers the worst performance.
90º-rotated tiles and sprites may incur more cache misses than regular ones, but overall performance is similar, and it's a rarely used feature. Horizontal and vertical flips are free. When a tile is fetched, some checks and assignments must be done to determine the scanning pointers (accounting for flips and 90º rotations), but once they're set up, all the pixels of the tile are output straight. By this rule, 16x16 tiles offer better performance than 8x8 ones.
However, if you check the benchmarks, even a Raspberry Pi 3 has enough power to run a standard game at 60 fps. Say you render a 16:9 240p game: 400x240 at 60 fps is about 5.7 MPixels/s. Pixel throughput on a Pi 3 for regular layers and sprites is between 50 and 60 MPixels/s, i.e. 10x what's needed. That leaves plenty of room to overdraw multiple scroll planes and sprites, blending, scaling, etc. Affine layers on a Pi 3 run at about 8 MPixels/s, enough to be used smoothly as long as you don't overdraw a lot (the SNES had just one affine layer, in mode 7). And we're talking about a Pi 3, which is the humblest thing I have on hand. The Pi 4 doubles that performance, and any low-end PC has even more power.
Oh alright, so it's pretty fast!
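Just to check I understand the affine cost you describe: every destination pixel has to go through the matrix before the tile fetch, something like this sketch (made-up names, not your actual internals):

```nim
# Sketch of why affine (SNES mode-7 style) layers are expensive:
# for every destination pixel, the source position must be computed
# through a 2x2 affine matrix plus translation, and only then can the
# owning tile be fetched.
type
  Affine = object
    a, b, c, d: float   # 2x2 transform matrix
    dx, dy: float       # translation

proc sampleAffine(m: Affine; x, y: int): (int, int) =
  # Destination (x, y) -> source texel. Consecutive destination pixels
  # are not guaranteed to fall in the same source tile, so per-tile
  # setup cannot be amortized the way it can for plain scrolling.
  let sx = m.a * x.float + m.b * y.float + m.dx
  let sy = m.c * x.float + m.d * y.float + m.dy
  (sx.int, sy.int)
```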
I tried writing a scanline rendering algorithm and I'm at around 4 ms per frame (roughly 250 FPS).
This is the algorithm I wrote:
Code:
# Apply the layer's blend mode to the destination pixel in place.
proc blendColor(layer: Layer, colorSrc: var ColorRGBX, colorResult: color.Color) =
  case layer.blend:
  of NONE:
    colorSrc = cast[ColorRGBX](colorResult)
  of ADD:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) + colorResult)
  of SUB:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) - colorResult)
  of MOD:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) mod colorResult)
  of MIX25:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) | colorResult)
  of MIX50:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) || colorResult)
  of MIX75:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) ||| colorResult)
  of OR:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) or colorResult)
  of XOR:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) xor colorResult)
  of AND:
    colorSrc = cast[ColorRGBX](cast[color.Color](colorSrc) and colorResult)
proc paintScanlineBitmap(layer: Layer, linePtr: ptr ColorRGBX, width: int, line: int) =
  if layer.bitmap == nil: return
  let bm = layer.bitmap
  var linePtr = linePtr
  for i in 0..<width:
    let
      idX = cast[uint](i + layer.position.x) mod cast[uint](bm.width)
      idY = cast[uint](line + layer.position.y) mod cast[uint](bm.height)
    let colIndex = bm[cast[int](idX), cast[int](idY)]
    if colIndex != 0:
      let color = bm.palette[cast[int](colIndex)]
      layer.blendColor(linePtr[], color)
      # amigafy(linePtr[])
      linePtr[].a = 255
    linePtr = cast[ptr ColorRGBX](cast[uint64](linePtr) + cast[uint64](sizeof(ColorRGBX)))
proc paintScanlineTilemap(layer: Layer, linePtr: ptr ColorRGBX, width: int, line: int) =
  if layer.tilemap == nil: return
  var linePtr = linePtr
  let tmap = layer.tilemap
  for x in 0..<width:
    let idX = cast[uint](x + layer.position.x) mod cast[uint](tmap.widthPixels)
    let idY = cast[uint](line + layer.position.y) mod cast[uint](tmap.heightPixels)
    let tileX = idX div tmap.tileWidth.uint
    let tileY = idY div tmap.tileHeight.uint
    let tile = tmap[tileX.int, tileY.int]
    if tile.index == 0 or tile.masked:
      # Note: the previous version mutated the `for` variable through
      # its address to skip the rest of the tile; that is not legal in
      # Nim (loop variables are immutable), and linePtr only advanced
      # by one pixel anyway, desynchronizing x and the pointer. Just
      # skip this one pixel; skipping whole runs needs a `while` loop
      # with matching pointer advances.
      linePtr = cast[ptr ColorRGBX](cast[uint64](linePtr) + cast[uint64](sizeof(ColorRGBX)))
      continue
    let tileset = tmap.tilesets[tile.tileset]
    var
      pixX = idX.int mod tileset.tileWidth
      pixY = idY.int mod tileset.tileHeight
    # TODO: 90º rotation (swapping pixX/pixY) is not implemented yet
    if tile.flipV: pixY = tileset.tileHeight - 1 - pixY
    if tile.flipH: pixX = tileset.tileWidth - 1 - pixX
    let
      tOffset = tile.index.int * tileset.tileWidth * tileset.tileHeight
      pOffset = pixY * tileset.tileWidth + pixX
      colIndex = tileset[tOffset + pOffset]
    if colIndex != 0:
      let color = tileset.palette[colIndex.int]
      layer.blendColor(linePtr[], color)
      linePtr[].a = 255
    linePtr = cast[ptr ColorRGBX](cast[uint64](linePtr) + cast[uint64](sizeof(ColorRGBX)))
proc renderScanline(window: Window, line: int) {.inline.} =
  let index = line * context.width
  var myPtr = window.screen.pix.data[index].addr
  # Fill the scanline with the background color
  for i in 0..<context.width:
    myPtr[] = cast[ColorRGBX](context.backgroundColor)
    myPtr[].a = 255
    myPtr = cast[ptr ColorRGBX](cast[uint64](myPtr) + cast[uint64](sizeof(ColorRGBX)))
  myPtr = window.screen.pix.data[index].addr
  # Draw the layers back to front
  for l in context.layers:
    case l.layerType:
    of LAYER_BITMAP:
      l.paintScanlineBitmap(myPtr, context.width, line)
    of LAYER_TILEMAP:
      l.paintScanlineTilemap(myPtr, context.width, line)

proc renderScreen(window: Window) {.inline.} =
  for j in 0..<context.height:
    if context.lineCallback != nil: context.lineCallback(j)
    GC_disableMarkAndSweep()
    window.renderScanline(j)
    GC_enableMarkAndSweep()
I think it's fairly simple, but I wonder if I can optimize it further, knowing most of my objects are reference types. Here's the raster line callback I'm testing with:
Code:
e.lineCallback = (
  proc(line: int, myPtr = nil.pointer): void =
    e.layer(0).y = int(sin((TAU * line.float + offset) / 32) * 4)
    e.layer(1).x = int(sin((TAU * line.float + offset) / 32) * 3)
    if (line and 1) == 0:
      e.layer(0).x = int(sin((TAU * line.float + offset) / 32) * 2 + x)
    else:
      e.layer(0).x = int(-sin((TAU * line.float + offset) / 32) * 2 + x)
    e.backgroundColor = Color(r: line.byte, g: 255, b: 255 - line.byte)
)
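One idea I might try, following your remark that pixels are output straight once a tile is set up: in the tilemap pass, fetch each tile once per run of pixels it covers on the scanline instead of once per pixel. A rough sketch of just the run computation, with made-up names rather than my exact types (assumes a non-negative scroll offset):

```nim
# Sketch: decompose a scanline into per-tile runs so the tile lookup
# and flip/rotation setup happen once per run, not once per pixel.
proc runLength(x, width, tileWidth, scrollX: int): int =
  # Pixels remaining until the next tile boundary (or end of line).
  let inTile = (x + scrollX) mod tileWidth
  min(tileWidth - inTile, width - x)

proc renderRuns(width, tileWidth, scrollX: int): seq[int] =
  # Returns the run lengths a scanline decomposes into; a real
  # renderer would fetch the tile once per run, then blit that many
  # pixels while advancing the line pointer by the same amount.
  var x = 0
  while x < width:
    let n = runLength(x, width, tileWidth, scrollX)
    result.add n
    x += n
```

A `while` loop (rather than `for`) makes it legal to jump `x` forward by a whole run, which also fixes the skip-over-masked-tiles case, as long as the line pointer advances by the same count.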