Aforge only working for certain videos, when extracting frames as bitmaps

Guitarmonster

I am designing a simple C# application that uses the AForge.NET library, and I already have all of the required FFDSHOW files in my build folder. The application scans a video and extracts each frame as a Bitmap, then analyzes individual pixels: it calls GetPixel at several positions to read the pixel color and works out the brightness from that. The code detects drops in brightness where the video fades to black, which is usually an indicator of an upcoming commercial break.
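To make the fade-to-black idea concrete, here is a minimal sketch of the detection step only. It assumes the per-frame brightness values have already been averaged from the sampled pixels; the function name, the threshold of 16, and the minimum run length are illustrative choices, not values from my actual app.

```csharp
using System;
using System.Collections.Generic;

// Returns (first frame, last frame) for every run of at least minFrames
// consecutive frames whose average brightness is below darkThreshold.
static List<(int Start, int End)> FindDarkRuns(
    IReadOnlyList<double> frameBrightness, double darkThreshold, int minFrames)
{
    var runs = new List<(int Start, int End)>();
    int runStart = -1; // -1 means "not currently inside a dark run"

    // Iterate one step past the end so a run that reaches the last frame is closed too.
    for (int i = 0; i <= frameBrightness.Count; i++)
    {
        bool dark = i < frameBrightness.Count && frameBrightness[i] < darkThreshold;
        if (dark && runStart < 0)
        {
            runStart = i;
        }
        else if (!dark && runStart >= 0)
        {
            if (i - runStart >= minFrames) runs.Add((runStart, i - 1));
            runStart = -1;
        }
    }
    return runs;
}

// Usage with made-up brightness values (0..255) for eleven frames.
double[] brightness = { 120, 118, 60, 9, 4, 3, 5, 70, 115, 8, 119 };
foreach (var (start, end) in FindDarkRuns(brightness, darkThreshold: 16, minFrames: 3))
    Console.WriteLine($"fade to black: frames {start}..{end}"); // prints "fade to black: frames 3..6"
```

Requiring a minimum run length keeps a single dark frame (frame 9 above) from being reported as a break.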

In short, my app needs to be able to load a video and extract one or several frames as a Bitmap object. So far it works well, except it won't work for any H.264 videos. I started noticing that MKV files didn't get me anything other than basic stream data, no frames. I tried MP4 and it gave me the same result, which tells me it's an H.264 issue and not a container issue.

Since this seems to depend on FFDSHOW configuration rather than on my code, I don't think this is a code issue; my code is nearly identical to the sample code released by Aforge.net. Unfortunately, Aforge.net is no longer supported and they have locked their forums, so I have to ask here.

I am looking for a way to make Aforge work for me, or an alternative to what I am trying to do.

This is the code in its most basic form:
C#:
using System;
using System.Drawing;
using AForge.Video.FFMPEG;

// create instance of video reader
VideoFileReader reader = new VideoFileReader( );
// open video file
reader.Open( "test.avi" );
// check some of its attributes
Console.WriteLine( "width:  " + reader.Width );
Console.WriteLine( "height: " + reader.Height );
Console.WriteLine( "fps:    " + reader.FrameRate );
Console.WriteLine( "codec:  " + reader.CodecName );
// read up to 100 video frames out of it
for ( int i = 0; i < 100; i++ )
{
    Bitmap videoFrame = reader.ReadVideoFrame( );
    // stop when no frame comes back, instead of crashing on Dispose( )
    if ( videoFrame == null )
        break;
    // process the frame somehow
    // ...

    // dispose the frame when it is no longer required
    videoFrame.Dispose( );
}
reader.Close( );

When it runs, it DOES extract the stream information including width, height, and codec name. After that there are zero frames returned. I have been searching the net like crazy and cannot seem to find anyone else with this issue, and I'm seeing a lot of coders using Aforge with FFDSHOW. So I am fairly sure this is a codec configuration issue of some sort. Any direction as to what settings to check, or for a better alternative would be greatly appreciated.
 
Dude, I can't believe I didn't think to just look into the source code! Part of me is thinking that perhaps this is an unfinished feature or just a design bug.
 
Well, let's take a look at the source code.
The entry point is here:

which then calls readVideoFrame():

which then calls DecodeVideoFrame():
DecodeVideoFrame():
// Decodes video frame into managed Bitmap
Bitmap^ VideoFileReader::DecodeVideoFrame(BitmapData^ bitmapData)
{
    Bitmap^ bitmap = nullptr;

    if (bitmapData == nullptr)
    {
        // create a new Bitmap with format 24-bpp RGB
        bitmap = gcnew Bitmap(data->VideoCodecContext->width, data->VideoCodecContext->height, PixelFormat::Format24bppRgb);

        // lock the bitmap
        bitmapData = bitmap->LockBits(
            System::Drawing::Rectangle(0, 0, data->VideoCodecContext->width, data->VideoCodecContext->height),
            ImageLockMode::WriteOnly, PixelFormat::Format24bppRgb);
    }

    uint8_t* srcData[4] = { static_cast<uint8_t*>(static_cast<void*>(bitmapData->Scan0)),
                           nullptr, nullptr, nullptr };
    int srcLinesize[4] = { bitmapData->Stride, 0, 0, 0 };

    // convert video frame to the RGB bitmap
    sws_scale(data->sws_ctx, data->VideoFrame->data, data->VideoFrame->linesize, 0,
              data->VideoCodecContext->height, srcData, srcLinesize);

    if (bitmap != nullptr)
        bitmap->UnlockBits(bitmapData); // unlock only if we have created the bitmap ourselves
    return bitmap;
}

Looking at that code, if a bitmapData is passed in, it assumes that the Scan0 member is already set and pointing to the beginning of the bitmap data. So that entry point is pretty much pointless in my view if you want to get a fresh frame. Looks like a design bug.

So it looks like you'll need to call the entry point which only takes the frame number and you'll need to call LockBits()/UnlockBits() yourself.
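As a rough sketch of what the LockBits() route buys you: once the bitmap is locked you can copy its bytes out in one go (for example with Marshal.Copy from BitmapData.Scan0, using BitmapData.Stride times the height as the length) and sample that array directly instead of calling GetPixel over and over. The helper below is hypothetical and works on such a copied Format24bppRgb buffer; the name and the `step` parameter are mine, not AForge's.

```csharp
using System;

// Average luma (0..255) of a Format24bppRgb pixel buffer, visiting every
// step-th pixel in both directions. `stride` is the byte length of one row,
// which can be larger than width * 3 because rows are padded.
static double AverageLuma24bpp(byte[] pixels, int width, int height, int stride, int step)
{
    if (step < 1) throw new ArgumentOutOfRangeException(nameof(step));

    long sum = 0;
    int count = 0;
    for (int y = 0; y < height; y += step)
    {
        int row = y * stride;
        for (int x = 0; x < width; x += step)
        {
            int i = row + x * 3;
            // 24bpp RGB bitmaps store each pixel as blue, green, red.
            byte b = pixels[i], g = pixels[i + 1], r = pixels[i + 2];
            sum += (299 * r + 587 * g + 114 * b) / 1000; // integer Rec. 601 luma
            count++;
        }
    }
    return count == 0 ? 0 : (double)sum / count;
}

// Usage with a synthetic 3x2 all-white frame: 9 bytes of pixels per row, padded to 12.
int w = 3, h = 2, rowBytes = 12;
var buffer = new byte[rowBytes * h];
for (int y = 0; y < h; y++)
    for (int x = 0; x < w * 3; x++)
        buffer[y * rowBytes + x] = 255;
Console.WriteLine(AverageLuma24bpp(buffer, w, h, rowBytes, step: 1)); // prints 255
```

Indexing by stride rather than width * 3 is the part that is easy to get wrong; the padding bytes at the end of each row must not be counted as pixels.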

So far I'm not making sense of this when looking at the source. I'm interested in your approach but you kind of lost me, although I'm a seasoned programmer, graphics is not my strongest area. Would you be willing to share a small example with me?

If I can't find a better way to make this work, I guess I can do what I can to make my code use fewer resources. I could do an initial search that skips frames, then have more code go back to potential drop points and analyze them further. I could also add logic that skips a predetermined amount of time right after a commercial break; it's not likely that another commercial break would come only 30 seconds or a minute after the last one, so there is no need to scan those frames.
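Here is a rough sketch of that two-pass idea, assuming some `isDark(frameIndex)` check already exists (it is passed in as a delegate here). The function name and parameters are hypothetical. Note that a fade shorter than `coarseStep` frames can slip between two coarse samples, so the step has to be chosen with that in mind.

```csharp
using System;
using System.Collections.Generic;

// Coarse pass: test every coarseStep-th frame. On a dark hit, walk backwards
// to the first dark frame of that fade, record it, then jump ahead by
// skipAfterHit frames because another break is unlikely right away.
static List<int> FindFadeStarts(int frameCount, int coarseStep, int skipAfterHit, Func<int, bool> isDark)
{
    if (coarseStep < 1) throw new ArgumentOutOfRangeException(nameof(coarseStep));

    var starts = new List<int>();
    int i = 0;
    int floor = 0; // earliest frame the backward walk is allowed to revisit
    while (i < frameCount)
    {
        if (!isDark(i))
        {
            floor = i + 1;
            i += coarseStep;
            continue;
        }

        int first = i;
        while (first > floor && isDark(first - 1)) first--;
        starts.Add(first);

        i = Math.Max(i + 1, first + skipAfterHit); // always move forward
        floor = i;
    }
    return starts;
}

// Usage with a fake 1000-frame video that is dark in frames 110..140 and 500..520.
Func<int, bool> fakeIsDark = f => (f >= 110 && f <= 140) || (f >= 500 && f <= 520);
var hits = FindFadeStarts(1000, coarseStep: 25, skipAfterHit: 200, isDark: fakeIsDark);
Console.WriteLine(string.Join(", ", hits)); // prints "110, 500"
```

If `skipAfterHit` is shorter than a fade, the same fade can be reported more than once, so it should be set comfortably longer than the longest expected fade.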

There is also the idea that generally in a tv show, a season may be laid out under the same format, meaning each commercial break may be around the same time as the others. So I am thinking of analyzing the first couple of episodes, having the code compare, then using that info to skip ahead through remaining files making seeking easier.

I also plan on having this run as a background service that only executes when the attached television is turned off, and I'll set it up to automatically prioritize episodes that are going to be scheduled to play in near future.

One way or the other, I intend to implement those features because I want performance and reliability to be the cornerstone of this project. The good news is a file only needs to be scanned once, then the data is in the database forever.
 
Basically it's the same as post #8, but instead of passing in the frame number and the bitmap data structure, just pass in the frame number and get back a bitmap like you are currently doing.

You are on the right track about only needing to sample a subset of the pixels in a given frame.

I kind of like your idea of "learning" where the commercial breaks usually get put in and just sampling around that area.

I think Tivo does the full scan instead of trying to guess at where the commercial breaks are at because I've seen it handle NFL games and NBA games where there are quick cuts to commercials.
 
So far, scanning only one frame per second is really speeding things up. However, I noticed that scanning MKV files is painfully slow. AVI files are quick; a 20 minute file is scanned in less than 30 seconds.

I am happy to report that I was correct so far on timing within the same show and season. I sampled the first episode of a show, then tested the rest and found each commercial break at exactly the same place right down to the timestamp, so that's a step in the right direction.

I'm considering taking a different approach by instead analyzing audio and looking for deep audio dropouts, then maybe confirming by pulling the frames. I'm sure doing this over audio would probably be a lot faster than video. I remember my first PC DVR I built had an app called BeyondTV. That program used to scan files so fast and very accurately snipped out commercials, perhaps that's the same way they did it starting with the audio.
 
You are basically constrained by the speed of your library to give you the frames to sample. Have you tried other libraries? (Sorry, I don't have any suggestions. This is far out of the stuff I dabble in.)
 
I am pretty sure you're right. It seems that Accord.net is abandoned as well. My background is more in business applications and databases, and I'm pretty new to the whole multimedia scene. I've seen a few mentions of newer frameworks that could possibly be used but are OS dependent (they require at least Windows 10), so they obviously depend on newer technologies. I think the best case scenario would be to use something that takes advantage of GPU processing like pixel shaders, but that is an area I know very little about.
 
I've found the perfect solution by posting in the Graphics area of the forum. I'm simply using ffmpeg.exe blackdetect by calling the file through code, then I'm extracting the needed data from the output. It's really fast and the accuracy is incredible. I'm also able to use ffprobe.exe to get a lot of important information about the media file itself.

Blackdetect is actually designed for this exact purpose: it analyzes the video looking for drops in luminance that last longer than a certain duration. With this I'll be able to test one single episode of a show and preview the results, tweak any settings if needed, then apply those settings and let it scan the rest of the show on its own.
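For anyone landing here later, a minimal sketch of that approach might look like the following. It assumes ffmpeg writes its blackdetect lines to standard error in the documented `black_start:… black_end:… black_duration:…` form; the helper names, the 0.5 second minimum, and the sample log text in the usage part are illustrative (the sample is hand-written, not real ffmpeg output).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.Text.RegularExpressions;

// Pulls (start, end) times in seconds out of ffmpeg's blackdetect log lines.
static List<(double Start, double End)> ParseBlackDetect(string ffmpegLog)
{
    var intervals = new List<(double Start, double End)>();
    var pattern = new Regex(@"black_start:(?<s>[\d.]+)\s+black_end:(?<e>[\d.]+)");
    foreach (Match m in pattern.Matches(ffmpegLog))
    {
        intervals.Add((
            double.Parse(m.Groups["s"].Value, CultureInfo.InvariantCulture),
            double.Parse(m.Groups["e"].Value, CultureInfo.InvariantCulture)));
    }
    return intervals;
}

// Runs ffmpeg with the blackdetect filter and returns everything it wrote to stderr.
// d = minimum black duration in seconds, pix_th = per-pixel blackness threshold.
static string RunBlackDetect(string ffmpegPath, string videoPath, double minSeconds)
{
    var psi = new ProcessStartInfo
    {
        FileName = ffmpegPath,
        Arguments = string.Format(CultureInfo.InvariantCulture,
            "-hide_banner -i \"{0}\" -vf blackdetect=d={1}:pix_th=0.10 -an -f null -",
            videoPath, minSeconds),
        RedirectStandardError = true,
        UseShellExecute = false,
        CreateNoWindow = true,
    };
    using var process = Process.Start(psi)
        ?? throw new InvalidOperationException("could not start ffmpeg");
    string log = process.StandardError.ReadToEnd();
    process.WaitForExit();
    return log;
}

// Usage on a hand-written sample; a real run would instead use something like
// ParseBlackDetect(RunBlackDetect("ffmpeg.exe", "episode01.mkv", 0.5)).
string sampleLog =
    "[blackdetect @ 0x0] black_start:301.5 black_end:303.25 black_duration:1.75\n" +
    "[blackdetect @ 0x0] black_start:912 black_end:914.5 black_duration:2.5\n";
foreach (var (begin, finish) in ParseBlackDetect(sampleLog))
    Console.WriteLine($"possible break between {begin}s and {finish}s");
```

Parsing with the invariant culture matters on machines whose regional settings use a comma as the decimal separator.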

Continued in this thread:
Looking for the latest and fastest way to extract single frames from video.
 