Autotune seems to be "forgetting" after a short delay within the same session. #355

Open
MindMusic opened this issue Oct 30, 2018 · 1 comment

Comments

@MindMusic

I'm trying to process 128x128 pixel images in real time on a low-powered PC, and I've been profiling my code by programmatically recording millisecond timings. We've created a CAE.pb file and it seems to run properly. When session.Run() is called in rapid succession (using the same session each time), the time each call takes decreases, up to a certain point. Specifically, the first image takes 1400ms, the second 1150ms, the third 900ms... and at the 7th or 8th image it levels off at 350ms per image. The problem is that this speedup only holds if I push images to TensorFlow in rapid succession. If I pause for more than approx. 350ms, the time it takes to process the next image (with the same session being reused the whole time) jumps back up to 1400ms. I'm assuming that the decreasing time is due to autotune running. However, autotune seems to "forget" its session values if I don't keep it running and constantly fed with new data. Am I doing something wrong, or is this a bug? Is there a RunOption or SessionOption that I'm unaware of that controls how long until autotune (or what I'm presuming is autotune) "forgets"? :P

Note: Unlike in Issues #330 and #345, I'm being very careful to reuse my session. And strangely, when my co-worker runs the same CAE.pb in Python using TensorFlow directly, the file seems to process as expected (i.e. first time slow, subsequent times super fast). What's going on?

@MindMusic

EXAMPLE PROGRAM:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using TensorFlow;

namespace TensorFlow_Test
{
    class Program
    {
        public static Stopwatch oTimer = new Stopwatch();                                 // Stopwatch for elapsed program execution time in ms.
        private static long lLastElapsedTicks = 0;                                        // Number of ticks elapsed at the last timing fence.
        public static long lTicksPerMicrosecond = Stopwatch.Frequency / (1000L * 1000L);  // Number of ticks per microsecond.

        static void Main(string[] args)
        {
            TFSession m_oCaeSession;
            TFGraph m_oCaeGraph;

            oTimer.Start();

            m_oCaeGraph = new TFGraph();
            byte[] acCaeModel = File.ReadAllBytes("CAE.pb");
            m_oCaeGraph.Import(acCaeModel, "");
            TFSessionOptions oTFOptions = new TFSessionOptions();
            m_oCaeSession = new TFSession(m_oCaeGraph, oTFOptions);

            float[,,,] afSpectrogram = new float[1, 128, 128, 1];
            var oCaeInputTensor = new TFTensor(afSpectrogram);
            LogSpeed("Setup");

            for (int iLoop = 1; iLoop < 1000; iLoop++)
            {
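                TFBuffer oWhatever = new TFBuffer();                                      // Buffer that receives the run metadata.
                var oCaeOutputTensor = m_oCaeSession.Run(
                    new TFOutput[] { new TFOutput(m_oCaeGraph["input/X"]) },              // Inputs.
                    new TFTensor[] { oCaeInputTensor },                                   // Input values.
                    new TFOutput[] { new TFOutput(m_oCaeGraph["MaxPool2D_4/MaxPool"]) },  // Outputs.
                    null,                                                                 // Target operations.
                    oWhatever,                                                            // Run metadata.
                    new TFBuffer(new byte[] { 0x08, 0x03 }),                              // Serialized run options.
                    null);                                                                // Status.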
                LogSpeed("PerformCAE #" + iLoop.ToString());

                if (iLoop%10==0)
                {
                    System.Threading.Thread.Sleep(1000);
                    LogSpeed("Delay");
                }
            }
        }

        public static void LogSpeed(String p_sFenceName)
        {
            long lElapsedTicks = oTimer.ElapsedTicks;
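            // Multiply before dividing so the conversion stays exact (and cannot divide by zero) when Stopwatch.Frequency is not a multiple of 1,000,000.
            Console.WriteLine(((lElapsedTicks - lLastElapsedTicks) * 1000000L / Stopwatch.Frequency) + " microseconds to " + p_sFenceName);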
            lLastElapsedTicks = lElapsedTicks;
        }
    }
}
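
Since the question is about run options, it may help to see what the two bytes { 0x08, 0x03 } handed to Run() in the program above actually encode. The sketch below is a hand-rolled varint decoder (the class and method names are made up for illustration), assuming the buffer is a serialized protobuf message. If TensorFlow's RunOptions message still has trace_level as field 1 with FULL_TRACE = 3 (worth checking against config.proto for the version in use), these bytes would request full tracing on every call.

```csharp
using System;

static class RunOptionsBytes
{
    // Reads one protobuf varint starting at 'offset' and advances 'offset' past it.
    static ulong ReadVarint(byte[] data, ref int offset)
    {
        ulong value = 0;
        int shift = 0;
        while (true)
        {
            byte b = data[offset++];
            value |= (ulong)(b & 0x7F) << shift;   // Low 7 bits carry the payload.
            if ((b & 0x80) == 0) return value;     // High bit clear: last byte of the varint.
            shift += 7;
        }
    }

    static void Main()
    {
        byte[] runOptions = { 0x08, 0x03 };        // Same bytes as in the example program.
        int offset = 0;

        ulong key = ReadVarint(runOptions, ref offset);
        ulong fieldNumber = key >> 3;              // Upper bits of the key: field number.
        ulong wireType = key & 0x7;                // Lower 3 bits of the key: wire type (0 = varint).
        ulong value = ReadVarint(runOptions, ref offset);

        // Prints: field=1 wireType=0 value=3
        Console.WriteLine("field=" + fieldNumber + " wireType=" + wireType + " value=" + value);
    }
}
```

If that reading is right, one thing to try would be passing null for the run options and the run metadata buffer and comparing the timings, since tracing itself may add overhead.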

EXAMPLE OUTPUT:

2018-10-30 14:57:02.646680: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
668414 microseconds to Setup
1470936 microseconds to PerformCAE #1
874811 microseconds to PerformCAE #2
761587 microseconds to PerformCAE #3
658079 microseconds to PerformCAE #4
551307 microseconds to PerformCAE #5
378162 microseconds to PerformCAE #6
370983 microseconds to PerformCAE #7
390558 microseconds to PerformCAE #8
376005 microseconds to PerformCAE #9
383651 microseconds to PerformCAE #10
1267764 microseconds to Delay
1302333 microseconds to PerformCAE #11
1137831 microseconds to PerformCAE #12
864681 microseconds to PerformCAE #13
713534 microseconds to PerformCAE #14
635954 microseconds to PerformCAE #15
...
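
One way to check whether the ramp in the output above is specific to TensorFlow would be to run the same burst-then-pause schedule against a plain CPU workload. The sketch below does that with a hypothetical stand-in workload (BusyWork and WorkSize are made up for illustration, and no TensorFlow calls are involved). It is only a diagnostic under the assumption that something outside TensorFlow, such as CPU frequency scaling on a low-powered PC, could produce a similar pattern.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class IdleRampCheck
{
    // Hypothetical stand-in workload: a fixed amount of floating-point work per call.
    static double BusyWork(int iterations)
    {
        double acc = 0.0;
        for (int i = 1; i <= iterations; i++)
        {
            acc += Math.Sqrt(i) * Math.Sin(i);
        }
        return acc;
    }

    static void Main()
    {
        const int WorkSize = 20000000;   // Tune so that one call takes a few hundred ms on the target PC.
        var timer = new Stopwatch();
        double sink = 0.0;               // Accumulated so the work cannot be optimized away.

        for (int loop = 1; loop <= 30; loop++)
        {
            timer.Restart();
            sink += BusyWork(WorkSize);
            timer.Stop();
            Console.WriteLine(timer.Elapsed.TotalMilliseconds.ToString("F1") + " ms for call #" + loop);

            if (loop % 10 == 0)
            {
                Thread.Sleep(1000);      // Same 1-second pause every 10 calls as in the example program.
            }
        }

        Console.WriteLine("checksum " + sink);
    }
}
```

If the per-call times here also climb after each pause, the cause is probably outside TensorFlow; if they stay flat, that points back at the session or the binding.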
