
1296X976 versus 316X208: A Zoom-In Function?


I see that the actual resolution of the imaging chip is 1296X976 but when the Pixy2 is doing color tracking it uses only 316X208 resolution.

My first question is: how does it downgrade the image from that high resolution to that very low resolution? Is it choosing every 4th pixel in every 4th row? Is it averaging pixels in some 4X4 set of pixels? Or what?

My other question is this: Could there be a function included, when driving it from an Arduino or similar processor, that allows the Pixy2 to be told WHICH 316X208 section of the overall 1296X976 array to be used for its image processing? I can envision a situation where the controlling program KNOWS approximately where in the field of view the desired object resides, and such a function call could effectively tell the Pixy2 to ZOOM IN on just that section of the image for processing. In fact, in my particular application that’s exactly what I need.


Hello Roger,
The downscaling of the image is done by averaging the pixels, not cropping the image. This way you still get the same field of view as well as reduced noise.
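For illustration, the kind of block averaging described here looks roughly like this (a sketch, not the actual Pixy2 firmware, and ignoring the Bayer pattern of the raw sensor data):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of block-averaged downscaling: each output pixel is the mean of
// a 4x4 block of input pixels, so the full field of view is preserved and
// per-pixel noise is reduced (unlike cropping or simple decimation).
std::vector<uint8_t> downscale4x4(const std::vector<uint8_t>& in, int w, int h) {
    int ow = w / 4, oh = h / 4;
    std::vector<uint8_t> out(ow * oh);
    for (int oy = 0; oy < oh; ++oy)
        for (int ox = 0; ox < ow; ++ox) {
            int sum = 0;
            for (int dy = 0; dy < 4; ++dy)
                for (int dx = 0; dx < 4; ++dx)
                    sum += in[(oy * 4 + dy) * w + ox * 4 + dx];
            out[oy * ow + ox] = static_cast<uint8_t>(sum / 16);  // mean of 16 pixels
        }
    return out;
}
```

Averaging also means sensor noise in any single pixel is diluted by the other fifteen, which is the "reduced noise" benefit mentioned above.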

Your suggestion about changing resolution and field-of-view is interesting. It may be possible by manipulating various registers on the camera chip, but it would require a modification of the firmware. The short answer is that it’s not supported, but it might be possible. I’ll add it to our list of potential future features. :slight_smile:




Thank you for replying to my post.

I would think that averaging the pixels could easily result in RGB values for the downscaled image that will not match the color signatures it’s looking for, even if a few of the original pixels were very good matches. Certainly just arbitrarily selecting one pixel (e.g. the central one) from each set of to-be-downscaled pixels isn’t a very good approach either. Either way, I’d think you’d get a lot of missed signature matches.

As far as my suggestion about allowing for the selection of a specified area of the image to be used (UN-downscaled), I’ve been looking through the firmware source code to try to get a good handle on how it’s all being done, in hopes that I might actually be able to make a contribution to the project and add that capability. I could really, really use it for the project that I want to do with the Pixy2. I’ve got over 40 years’ experience as a software engineer, so it’s just possible that I might be able to do that. BUT, the source code doesn’t give a really good idea of what’s actually going on inside the firmware. Is there any documentation that gives a good overview of how it all works? For example, I could NOT find any place in the firmware where it actually does that downscaling or acquires the pixel data from the image chip.

And in general, is there any documentation that describes how it’s doing signature matching? My initial testing of the device seems to indicate that it’s pretty sensitive to variations in the brightness of the object: maybe it’s looking for specific RGB values, and if the target pixel is brighter or dimmer, but with essentially the same color, it misses it as a match.

Some time ago, before the Pixy products were available, I tried doing basically the same thing that Pixy is doing, but using just an Arduino and a camera. The “signatures” that I used essentially captured the RGB values as RELATIVE to one another, so that even if a target pixel was lighter or darker it could still recognize a match. So I’m curious as to how it’s actually being done in the Pixy products. Is there any documentation on that?
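The relative-RGB idea described above can be sketched like this (my illustration of the concept, not code from either project): each channel is stored as a fraction of total brightness, so scaling the whole pixel lighter or darker leaves the signature unchanged.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// A brightness-invariant "relative" color signature: each channel is
// stored as a fraction of the total intensity (R + G + B).
struct RelSig { float r, g, b; };

RelSig relativeSignature(uint8_t r, uint8_t g, uint8_t b) {
    float y = float(r) + float(g) + float(b);
    if (y == 0.0f) return {0.0f, 0.0f, 0.0f};  // pure black: undefined ratios
    return {r / y, g / y, b / y};
}

// Two pixels match if their channel ratios agree within a tolerance,
// regardless of overall brightness.
bool sigMatch(const RelSig& a, const RelSig& b, float tol) {
    return std::fabs(a.r - b.r) < tol &&
           std::fabs(a.g - b.g) < tol &&
           std::fabs(a.b - b.b) < tol;
}
```

For example, (200, 100, 50) and its half-brightness version (100, 50, 25) produce identical ratios and therefore match.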

  • Roger Garrett


Hello Roger,
I’m sorry that the source code isn’t well documented. It’s been on our list of things to do, but it keeps getting pushed down.

Regarding the basic method, Pixy converts the RGB space into a 2-space – (R-G)/Y and (B-G)/Y, where Y is R+G+B. This 2-space is the detection space from which the signatures are constructed.
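Written out directly, that mapping looks like this (a sketch of the formula as stated, not the firmware’s own implementation, which may use fixed-point arithmetic):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Pixy's detection space as described above:
//   Y = R + G + B,  u = (R - G) / Y,  v = (B - G) / Y.
// Scaling a color's overall brightness scales R, G, B and Y together,
// so (u, v) is unchanged -- which is why signatures built in this space
// tolerate lighting variation.
struct UV { float u, v; };

UV toDetectionSpace(uint8_t r, uint8_t g, uint8_t b) {
    float y = float(r) + float(g) + float(b);
    if (y == 0.0f) return {0.0f, 0.0f};  // avoid dividing by zero for black
    return {(float(r) - float(g)) / y, (float(b) - float(g)) / y};
}
```

Note that (200, 100, 50) and (100, 50, 25) land on the same (u, v) point even though one is twice as bright as the other.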

The code you’re referencing is mostly written in assembly.

frame_m0.c handles clock sync and grabbing raw frames, including the downscaling.
rls_m0.c handles the initial component differencing and the lookup to see if a given pixel could potentially be part of a signature.
Blocks.cpp handles the rest – dividing by Y, final thresholding, connected components. (I’m just talking about the CCC algorithm here.)

I’ve probed around the code quite a bit. It’s not well documented, agreed.



Thank you for all the info.

As a software engineer I’ve encountered lots and lots of code that is, shall we say, lacking in documentation. It often doesn’t seem to be a priority or a requirement. The engineer rightfully sees his job as getting it working as quickly as possible, but the supervisor should also be concerned about maintenance. In so many cases, though, the supervisor is also an engineer and abides more by the “get it done” philosophy than the “get it done right” philosophy. I learned early on to include documentation right in the code, so that if someone else ever needs to look at it, understand it, or fix it, all the info is right there in the source. Even now, as a retired engineer, whenever I write code that only I will ever see, I continue to do in-line documentation.


I’ve been looking at the documentation for the Aptina MT9M114, the image sensor and processing chip that’s used on the Pixy, and it sure does look fairly straightforward to specify how much, and which part, of the image to return. So it looks like it ought to be fairly straightforward to add a few methods to the Arduino (and other) APIs to handle the functionality of, essentially, “zooming in” on an area of interest. I’m thinking that would be a welcome addition to the Pixy’s functionality.

I’ll continue to look over the code (thanks for pointing out which files are the relevant ones) and see what I can come up with.

  • Roger


Way back in December, when I first purchased the Pixycam, I quickly realized that the 1296X976 high resolution of the imaging chip is not actually used by the image analysis software; rather, it averages over groups of pixels to get a smaller set of pixels that the analysis software can handle. Looking through the documentation for the various chips, it became clear that by setting certain parameters on the chips it’s actually possible to have it select a 316X208 SUBSET of the pixels on the chip, effectively zeroing in on a section of the pixels, and to do the image analysis on just that subsection. Looking through the software that interfaces with the chips and provides the functions callable by user code, I found that your software people actually recognized this and included some definitions and commented-out sections in their code that acknowledge that possible capability, but it was NOT IMPLEMENTED in the set of available functions.

I had contacted you back then and you very kindly responded, noting that it could indeed be possible, that it wasn’t currently implemented, but that you would put it on the list for potential future releases.

So, my question now, 8 months later, is this: Has that capability been added to the functionality provided by the library of functions available to the user?

I would really like to get back to work on my project, which absolutely needs that kind of capability.

  • Roger Garrett


Hello Roger,
I’m told this is a difficult functionality to implement and that it hasn’t been added. Let me ask you this – can you describe what you are trying to do and how this functionality will help? The context will help us :slight_smile:



Well, first of all, I’m a career software engineer. When I first came across this issue with the Pixycam I took a good hard look at the software that’s supplied with it as well as the specs for the various chips. From the specs I could see that it does indeed have the ability to set certain settings so that a sub-section of the overall, high-resolution image field can be selected. And in the software written by the Pixycam engineers there are constants defined specifically for that purpose, as well as code that sure looks like it’s intended precisely for zeroing in on subsections of the chip’s pixels. All of those constants and that code are commented out, but it sure looks like at least one of the engineers recognized the potential and at least attempted to set things up so that it could be possible.

As for my project: I want to provide a low-cost means of providing high-accuracy location data to a mobile robot. In particular it would be used by small Christmas tree farms to control a mowing robot to keep the grass cut between the trees. That effort, keeping the grass cut, is a time-consuming and expensive part of the business. And while general robot mowers do indeed exist, they aren’t suitable for this particular application because the location data they can access, like GPS, is way too inaccurate; it only tells them where they are to within about ten feet. For this mowing-between-the-trees application the mowing robots need to know their location to within inches, not only to avoid bumping into existing, mostly-grown trees but also to avoid mowing OVER newly planted seedlings.

My idea is to position several digital cameras, with image processing capabilities like those in the Pixycam, around the corners of the field. The mowing robot will have a vertical shaft, one that’s tall enough to reach above the height of the full-grown trees and thereby be visible to all the cameras. The top of the shaft would have a set of uniquely colored bands to make it stand out against the generally-green background of all the trees. Each camera would track the location of that shaft, and implicitly the robot, by analyzing its field of view and identifying where within that field of view the colored bands appear. Each camera would know its own position relative to the others, know the extent of its field of view, and be able to calculate an angle value for the detected shaft. The cameras would transmit that data to the robot in real time, and the robot’s onboard processor would use that data to very accurately calculate its position within the field, allowing it to easily traverse the field, cutting the grass, while avoiding the trees.
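The position calculation described here reduces to intersecting two bearing rays. A minimal sketch of that geometry (with made-up camera positions; a real system would combine all cameras, e.g. with a least-squares fit):

```cpp
#include <cassert>
#include <cmath>

// Each camera reports a bearing angle to the mast, measured from the +x
// axis at its known position; the robot sits where the two rays cross.
struct Pt { double x, y; };

Pt triangulate(Pt c1, double a1, Pt c2, double a2) {
    // Ray i: (x, y) = c_i + t_i * (cos a_i, sin a_i). Solve for t1 via
    // Cramer's rule on  t1*d1 - t2*d2 = c2 - c1.
    double d1x = std::cos(a1), d1y = std::sin(a1);
    double d2x = std::cos(a2), d2y = std::sin(a2);
    double denom = d1x * d2y - d1y * d2x;   // zero if the rays are parallel
    double t1 = ((c2.x - c1.x) * d2y - (c2.y - c1.y) * d2x) / denom;
    return {c1.x + t1 * d1x, c1.y + t1 * d1y};
}
```

For example, cameras at (0, 0) and (10, 0) seeing the mast at 45° and 135° respectively place the robot at (5, 5). Accuracy improves as the angle between the two rays approaches 90°, which is one reason to put cameras at different corners of the field.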

A similar approach would be initially used to map out the locations of all the trees and that data stored in the robot’s memory.

The overall idea is that the farmer, recognizing that the field needs cutting, can just turn on the robot, perhaps tell it WHICH field needs mowing, and send it on its way to do the work, while the farmer is free to attend to the hundreds of other tasks he has to do.

This project could easily lead to a very viable product, not just for Christmas tree farmers but for numerous other agricultural ventures.

While the Pixycam, as configured for low-resolution use of the image chip, is fine for small work areas, like a table top or a small room, it appears to me that by allowing the camera to selectively zero in on specific high-resolution subsections it could be useful for much larger operating areas.

My hope is that Pixycam will make that zero-in capability accessible to us programmers so that we can experiment with its capabilities. And while the Pixycam itself might not actually be suitable for an area as large as a farmer’s field, or be able to operate in the harsh outdoor environment, it would certainly provide a TEST BASE for exploring the possibilities.

And if it all works out as I plan, I would certainly be looking for some company that has the image processing technologies (Pixycam?) and that could turn it into a full-fledged product, or at least license out the technology.

I’m convinced that I can’t be the only person who could make use of the zero-in capability. At the very least it would provide even more functionality that could spur even more inventive exploration by its users. You can only go so far with tracking a bouncing ball or following a line. Increase its capabilities and you increase your market.


Hi Roger,
Cool stuff! :slight_smile: So you want the ability to digitally zoom in on certain parts of the image, is that correct?



Yes, exactly.

The existing software that analyzes the image operates only on a 316 X 208 set of pixels. (I understand that, and it’s pretty clear that it’s for processing speed and other considerations in designing the chips that accomplish it.) It currently accomplishes this by first doing an averaging over 4 X 4 sets of pixels. As far as I can see, that averaging is actually a function accomplished by the image chip itself, with certain parameters set on the chip to tell it to do that averaging. In fact, it appears that there is also a parameter for the chip that tells which pixel (of the 1296 X 976 array of pixels) is to be the upper left-hand corner where the averaging starts. By default it’s using the 0,0 pixel of the 1296 X 976 array and telling it to average 4 X 4 sets of pixels, because it can only handle 316 X 208 color values. BUT it appears that it is possible to set ANY location within the 1296 X 976 array and to do the averaging with 4 X 4 pixels, 3 X 3 pixels, 2 X 2 pixels, or 1 X 1 pixels, that last one basically telling it to not do any averaging at all.

For the way the PixyCam is currently set up it always specifies the 0,0 pixel as the starting point and 4 X 4 as the averaging size. And it does that so it can essentially use the entire field of view of the image chip in spite of the fact that the PixyCam image processing chip can only handle a 316 X 208 array of pixels.

BUT, there are certainly applications where the user would want to not only have access to the entire 1296 X 976 array but also make effective use of that RESOLUTION, to use all the pixels, not just averaged 4 X 4 pixels. And he would be able to do that if he could tell it to do the analysis on a 316 X 208 SUBSECTION of the full image chip array, and then a different subsection, and yet another subsection. In fact, in about two dozen sub-section operations (roughly a 5-by-5 grid of slightly overlapping 316 X 208 windows) it could get the analysis of the ENTIRE 1296 X 976 array at its full resolution. Of course, he could potentially zoom in on ANY sub-area if his program wanted to, not just fixed tiles of the high-res image.

Basically, why have a high res pixel chip and NOT provide a means to make use of that really nice high resolution? The parameters are there in the image chip to do that. Why not make that selection available via a couple of additional function calls, SetOrigin(x, y) and SetResolution(res) where res would be one of 4 constant values: Res4X4, Res3X3, Res2X2, and Res1X1?
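To be clear, SetOrigin and SetResolution don’t exist in the Pixy2 API; they’re the proposal above. A sketch of the bookkeeping such calls would need (hypothetical names and checks; the actual register programming on the MT9M114 would be more involved):

```cpp
#include <cassert>

// Hypothetical window selection for the proposed SetOrigin/SetResolution
// calls. The 316x208 analysis area, scaled by the averaging ("binning")
// factor, must stay on the 1296x976 sensor, so the origin gets clamped.
constexpr int SENSOR_W = 1296, SENSOR_H = 976;   // full imaging array
constexpr int PROC_W = 316, PROC_H = 208;        // what the analysis handles

struct Window { int x, y, bin; };  // sensor-space origin + averaging factor

Window makeWindow(int x, int y, int bin /* 1..4, 1 = no averaging */) {
    int maxX = SENSOR_W - PROC_W * bin;  // rightmost legal origin
    int maxY = SENSOR_H - PROC_H * bin;  // bottommost legal origin
    if (x < 0) x = 0;
    if (x > maxX) x = maxX;
    if (y < 0) y = 0;
    if (y > maxY) y = maxY;
    return {x, y, bin};
}
```

Note that with bin = 4 the origin can only move within a small margin (1296 - 4*316 = 32 pixels horizontally), which is consistent with the current full-field default; the interesting freedom appears at bin = 1.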

In my particular application the robot can easily be quite far away, so the specially colored mast bands might only show up as a few pixels in the high-res image chip. But when that gets averaged by the 4X4 averaging process, the color will basically be lost. If, instead, I can “zoom in” (which might actually require a couple of such zooms in order to find it), there won’t be any data loss, the color will not be averaged, and I should be able to find the thing that I’m looking for.
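That two-step zoom could work something like this (hypothetical, building on the windowing idea discussed above): find the mast in a coarse 4X4-averaged pass over the whole field of view, then center an un-averaged 316X208 window on that detection for the full-resolution pass.

```cpp
#include <cassert>

// Map a coarse detection at binned coordinates (cx, cy), taken with 4x4
// averaging over the full 1296x976 sensor, to the sensor-space origin of
// an unbinned 316x208 window centered on it (clamped to the sensor edges).
struct Origin { int x, y; };

Origin zoomWindowOrigin(int cx, int cy) {
    int sx = cx * 4 - 316 / 2;   // binned coord -> sensor coord, then center
    int sy = cy * 4 - 208 / 2;
    if (sx < 0) sx = 0;
    if (sx > 1296 - 316) sx = 1296 - 316;
    if (sy < 0) sy = 0;
    if (sy > 976 - 208) sy = 976 - 208;
    return {sx, sy};
}
```

A detection near the middle of the coarse image maps to a full-resolution window around the same spot, where a band that averaged away into the background becomes several clean pixels again.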


Hello Roger,
I’m told that it is a technically difficult feature to implement, but you’ve made your case.

Thank you :slight_smile:


Could you perhaps have your software guy contact me directly ([email protected]) so I could discuss this with him? Frankly, having looked carefully at the software and the chip specs, it sure looks to me like it’s fairly straightforward, and that whoever originally wrote the main software had a zoom-in capability in mind all along.