Genius DM

IIS > IIS Express slow performance for initial requests. 본문

.NET

IIS > IIS Express slow performance for initial requests.

Damon Jung 2018. 8. 4. 05:00

IIS Express 에서 최초 요청이 느리게 처리되는 경우


Async await 패턴 및 Task 에 관해서 깊이 파고 들어가고 있는 요즘 다양한 시도를 하는 중에 흥미로운 테스트 결과가 나와서 Stackoverflow 에 질문까지 남기게 되었다. 이 블로그 포스트 내용과는 다르게 Async API 와 일반 API 에 대해 질문하는 내용인데, 원인이 Async 함수와 일반적인 함수에 있지 않다는 사실을 깨닫기 전이기 때문에, 그 관점으로 질문을 했던 것이다.


최초에 단순한 문자열을 반환하는 async API / 일반 API 를 아래 처럼 만들었었다.


public class TestController : ApiController
{
    [HttpGet]
    [Route("Welcome")]
    public async Task<string> Welcome()
    {
        string r = await SlowMethod();
        return r;
    }

    public async Task<string> SlowMethod()
    {
        Thread.Sleep(3000);
        return await Task.FromResult("data");
    }

    [HttpGet]
    [Route("WelcomeSync")]
    public string WelcomeSync()
    {
        string r = SlowSyncMethod();
        return r;
    }

    public string SlowSyncMethod()
    {
        Thread.Sleep(3000);
        return "data";
    }
}


이런 ASP.NET WebAPI 코드를 작성하고, jMeter 와 이 플랜을 이용하여 200개의 Thread 로 500 번씩 총 100,000 번 HTTP Request 를 발생시켰을 때의 차이점을 확인해보았다. TPS 측면에서 큰 차이점은 없었는데, "표준편차" ( Std. Dev ) 수치에서 300배에 육박하는 차이가 발생한 것을 발견했다.



Label   Samples Average Min     Max     Std.Dev             Error%    Throughput    Received KB/sec     Sent KB/sec         Avg.Bytes

Async   100000  3193    3000    [32046] 1343.6219321793749  0.0 62.534550339062335  21.557320575868168  7.69468099875181    353.0    
Sync    100000  3003    3000    [3050]  3.523236520886259   0.0 66.54055014396047   23.458143165986066  8.447530779994983   361.0


TPS 가 60 수준으로 매우 비슷한데 표준 편차 수치가 이렇게 차이가 난다는 것은, 어떤 요청은 의도대로 3초 내로 처리되었고, 어떤 요청은 3초~32초 사이로 처리 시간이 오래 걸렸다는 것을 의미하는 것이였다. 가뜩이나 async 와 hang 에 대한 키워드가 pair 로 머리 속에 있는 상태에서 자연스럽게 이런 생각을 하게 되었다. 


" 분명히 async 처리하는 도중 어떤 이유로 인해 hang 이 발생해서 그런거야 ! " 

" Sync API 는 hang 이 없으니 Min/Max 가 3초에 근접해 있잖아 !! "


그래서 이를 증명하기 위해 아래와 같이 로깅을 추가하였다.


public class TestController : ApiController
{
    static Logger logger = new LoggerConfiguration().WriteTo.File(AppDomain.CurrentDomain.BaseDirectory + "log-server.txt").CreateLogger();

    [HttpGet]
    [Route("Welcome")]
    public async Task<string> Welcome()
    {
        Guid guid = Guid.NewGuid();
            
        Stopwatch sw = new Stopwatch();
        sw.Start();
        string r = await SlowMethod();
        sw.Stop();

        logger.Information($"[{guid.ToString()}] - It took {sw.Elapsed.Seconds} seconds.");
        return r;
    }

    public async Task<string> SlowMethod()
    {
        Thread.Sleep(3000);
        return await Task.FromResult("data");
    }

    [HttpGet]
    [Route("WelcomeSync")]
    public string WelcomeSync()
    {
        Guid guid = Guid.NewGuid();

        Stopwatch sw = new Stopwatch();
        sw.Start();
        string r = SlowSyncMethod();
        sw.Stop();

        logger.Information($"[{guid.ToString()}] - It took {sw.Elapsed.Seconds} seconds.");
        return r;
    }

    public string SlowSyncMethod()
    {
        Thread.Sleep(3000);
        return "data";
    }
}


그런데 예상과는 달리, 로그에서는 꾸준히 " It took 3 seconds. " 가 출력되었다. 그리고 코드에서는 제시되지 않았지만, 아래 ThreadPool 함수를 이용해서 현재 사용 가능한 Thread 개수와 완료된 Thread 개수도 로그로 남겨봤는데


ThreadPool.GetAvailableThreads(out int workerThreads, out int completedThreads);


Worker 쓰레드는 차고 넘치게 남아있는 상태 ( 8000개 가량 ) 였고, 완료된 쓰레드도 처리 시간에 맞는 수치가 일정하게 출력되고 있었다. 결국 어떤 조건에 의해 특정 Thread 에서 Hang 이 발생했다 라는 가정은 틀린 것으로 판명될 확률이 높아졌다. 그런데 아직 확신할 수 없었다. ASP.NET WebAPI 의 HTTP Response 자체가 느린 이유를 증명해야 했고, 여전히 의문점이 남아있었다. " 왜 Sync 코드에서는 표준 편차가 없지? " 


jMeter 에서는 각 Thread 의 요청에 대한 응답 시간을 개별적으로 알 수 없으므로, jMeter 테스트 플랜과 동일한 사양으로 HTTP Request 를 발생시킬 클라이언트 코드를 작성했다.


class Program { static Logger logger = new LoggerConfiguration().WriteTo.File(Environment.CurrentDirectory + @"\log-client.txt").CreateLogger(); static void Main(string[] args) { for(int i = 0; i < 200; i++) { ThreadPool.QueueUserWorkItem(new WaitCallback(GetData)); } Console.Read(); } static async void GetData(object obj) { var client = new HttpClient(); Stopwatch sw = new Stopwatch(); for (int i = 0; i < 500; i++) { sw.Restart(); string get = await client.GetStringAsync("http://localhost:32965/Welcome"); sw.Stop(); Console.WriteLine($"CurrentThread {Thread.CurrentThread.ManagedThreadId}, Context { Thread.CurrentContext.ContextProperties.FirstOrDefault().Name }, Time passed { sw.Elapsed.Seconds }. The result was { get }"); logger.Information($"CurrentThread {Thread.CurrentThread.ManagedThreadId}, Context { Thread.CurrentContext.ContextProperties.FirstOrDefault().Name }, Time passed { sw.Elapsed.Seconds }. The result was { get }"); } } }


클라이언트 코드를 작성하고나니 문제 해결 실마리를 잡을만한 현상을 볼 수 있었다. 위 코드 실행 로그는 아래와 같이 남았다.


2018-08-04 04:48:50.030 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 3. The result was "data"
2018-08-04 04:48:50.030 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 3. The result was "data"
2018-08-04 04:48:50.030 +09:00 [INF] CurrentThread 9, Context LeaseLifeTimeServiceProperty, Time passed 3. The result was "data"
2018-08-04 04:48:50.030 +09:00 [INF] CurrentThread 10, Context LeaseLifeTimeServiceProperty, Time passed 3. The result was "data"
2018-08-04 04:48:50.632 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 4. The result was "data"
2018-08-04 04:48:51.637 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 5. The result was "data"
2018-08-04 04:48:52.631 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 6. The result was "data"
2018-08-04 04:48:52.968 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 6. The result was "data"
2018-08-04 04:48:52.969 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 6. The result was "data"
2018-08-04 04:48:52.969 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 6. The result was "data"
2018-08-04 04:48:52.973 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 6. The result was "data"
2018-08-04 04:48:53.631 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 7. The result was "data"
2018-08-04 04:48:53.637 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 7. The result was "data"
2018-08-04 04:48:54.631 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 8. The result was "data"
2018-08-04 04:48:54.640 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 8. The result was "data"
2018-08-04 04:48:55.633 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:55.635 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:55.975 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:55.976 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:55.976 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:55.979 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:56.633 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 10. The result was "data"
.
.
.
.
2018-08-04 04:49:17.654 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 30. The result was "data"
2018-08-04 04:49:17.660 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 30. The result was "data"
2018-08-04 04:49:17.662 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 30. The result was "data"
2018-08-04 04:49:17.668 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:17.675 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:17.676 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.634 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.644 +09:00 [INF] CurrentThread 10, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.647 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.648 +09:00 [INF] CurrentThread 15, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.651 +09:00 [INF] CurrentThread 15, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.654 +09:00 [INF] CurrentThread 15, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.661 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.668 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:18.671 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:18.675 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:19.640 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:19.646 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:19.646 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:19.656 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:19.659 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"


응답속도가 점진적으로 느려지면서 32초 수준까지 도달했고, 32초를 정점으로 응답 속도가 점진적으로 빨라지면서, 결국 원래 성능인 3초 로깅이 유지되는 것을 목격했다. 확실히 SlowMethod() 는 예외없이 3초에 완료가 되었지만, HTTP Response 자체가 늦게 오는 것이였다 그것도 초반에만..


이쯤에서 다시 상기되는 논리의 허점이 있었는데, " 왜 Sync 코드에서는 표준 편차가 없지? " 이 의문점이였다. 그래서 다시 한 번 클라이언트를 통해 WelcomeSync 를 호출해보았는데, 결과는 황당했다. Sync 를 먼저 실행하니, 똑같이 표준 편차 값 차이가 나는 것을 목격했다. 초반 HTTP 요청에 대한 응답을 늦게 받다가, 특정 정점을 찍은 이후로는 점점 제 성능을 찾아가면서 결국 꾸준히 3초마다 응답을 받게되는 현상이 Async 코드 실행시 현상과 100% 동일했다. 


비로소 ASP.NET WebAPI 자체의 문제이거나, 이를 구동하는 환경의 문제임이 분명해졌다. 원인이 좁혀지니, 이유도 금방 찾게 되었는데, 바로 IIS Express 의 소스 컴파일과 WorkerThread 의 재사용 문제가 그 원인이였다. 이런 문제라면 최초 테스트 결과가 왜 그렇게 나왔는지 설명이 된다. 


첫 번째로 Async 테스트가 수행되었고, IIS Express 에서 컴파일 및 해당 요청을 처리하는 Thread 생성이 완료되기까지 병목 현상이 있어 처리가 지연되어 Max 값이 올라가면서 표준 편차가 벌어졌고, 모든 준비가 끝난 이후 Sync 테스트가 수행된 것 뿐이다. 당연히 순서를 바꿔서 테스트 해보았는데, 예상에 딱 맞는 결과가 나왔다. 


이를 좀 더 구체적이고 객관적으로 증명하기 위해서 Perfview 를 돌려서 프로파일링 하거나 IIS Express 로그를 샅샅히 뒤져보았는데, 공부가 더 필요할 듯 하다. 일단 관련 URL 만 남겨두고, 확실히 증명되면 해당 내용에 대해서 다시 다뤄봐야 겠다.


ref : https://support.microsoft.com/en-us/help/961884/fix-you-experience-very-slow-performance-on-an-initial-request-to-an-a

ref : https://github.com/aspnet/KestrelHttpServer/issues/1276






정확한 원인

이 글을 쓴 것이 8월 4일이였는데, 8월 19일인 오늘 보름만에 드디어 원인을 밝혀냈다. 최초에 ThreadPool 을 의심했다는 점에서 "감" 자체가 아주 틀리진 않았는데, 조금 아쉽게 빗나갔다. ( 사실 조금 아쉽게 빗나간 것 처럼 보이지만, 이런 한 끗 차이를 메우기 위해 수 많은 시행착오와 공부, 그리고 경험이 필요한 것이다... ) 

ThreadPool.GetAvailableThreads(out int workerThreads, out int completedThreads);


내가 했어야 하는 것은 GetAvailableThreads 이 아니라, 바로 GetMinthreads 와 SetMinThreads 였다.


// 최소 쓰레드 갯수를 확인 ( CPU 에 따라 4 또는 8 ~ )
ThreadPool.GetMinThreads(out int threadCount, out int portThread);
// 최소 쓰레드 갯수를 지정
// 기본 Comment 는 아래와 같음
// 스레드 생성 및 소멸을 관리하기 위한 알고리즘으로 전환하기 전에 새 요청에 따라 스레드 풀이 만드는 스레드의 최소 개수를 설정합니다
ThreadPool.SetMinThreads(200, 8);


즉 초당 200개 씩 HTTP Request 가 날라오는 상황에서, 최초에 4개의 Thread 만이 이를 대응하고 있었고, ThreadPool 이 초당 200개의 HTTP Request 를 감당할 만큼의 Thread 를 생성할 때 까지 처리가 지연되었던 것이다.


처리가 지연되었던 이유는 Thread.Sleep(3000) 에 의해서 상태가 Waiting 으로 변경되었고, 새로운 HTTP Request 가 진입했을 때 ThreadPool 에서 현재 대여할 수 있는 Thread 가 없는 상태에서 새로운 Thread 를 생성하기 보다는, 우선적으로 대여 가능한 상태의 Thread 가 생길 때 까지 기다렸기 때문이다.


그래서 초반에만 몇 십초까지 느려지고, 특정 시점 이후로는 점점 재 성능을 되찾아가면서 종국엔 균일한 3초 Response 성능을 보여주게 된 것이다. ThreadPool 의 기본 속성은 잘 모르지만, 새로운 Thread 를 생성하는데 소극적인 것은 분명하다.


참고로 JetBrain 의 DotTrace 를 이용해서 확인해보아도, IIS Express 에서 사용한 주요 CLR Worker Process 에서 Thread.Waiting 이 42초 이상 발생했음을 알 수 있었다.




핵심

결국 핵심은 ThreadPool 에서 필요할 때 만드는 작업 Thread 최소 개수 자체를 Application 에서 요구하는 만큼 지정을 해두어야 한다는 것이였다. 이 테스트 케이스의 경우 200 개의 Thread 를 생성하여 매 초 요청을 발생시키기 때문에 최소 200개는 생성해두어야 지연현상 없이 원활하게 처리할 수 있기 때문에 200개를 설정하였다.

ThreadPool.SetMinThreads(200, 4);


이후 테스트 성능은 아무리 최초에 시작되었다고 해도, 균일하게 3초로 로깅이 됨을 확인할 수 있다. 이번 트러블 슈팅을 통해서 Perfview  분석 방법도 많이 배우고, DotTrace 사용 법도 배우고, ThreadPool 속성도 알게되고 여러모로 유익했다.



주의사항

MSDN ThreadPool 관련 글에서 SetMinThreads 에 대한 주의사항이 있어 참고로 명시하려한다. 

Caution

SetMinThreads 함수를 이용해서 쓰레드의 최소 개수를 증가시킬 수 있습니다. 그러나 불필요하게 최소 쓰레드 수를 늘리면 성능 문제가 발생할 수 있습니다. 만약 지나치게 많은 수의 작업이 동시에 시작되면 굉장히 느리게 처리되는 것 처럼 보일 수 있습니다. 대부분의 경우는 쓰레드를 할당하는 쓰레드 풀 자체의 알고리즘에 의해서 더 나은 성능을 보여줍니다.















IIS Express slow performance for initial requests.


While trials and erorrs, digging into Async await pattern and Task these days, I put up a question on Stackoverflow as an interesting test result had came out. Unlike this post content, the question is focusing on async and none async APIs. That's because I didn't realize that the difference in the test result is not coming from async stuff.


I tweaked an async API and a simple none-async API that return a simple string at first.


public class TestController : ApiController
{
    [HttpGet]
    [Route("Welcome")]
    public async Task<string> Welcome()
    {
        string r = await SlowMethod();
        return r;
    }

    public async Task<string> SlowMethod()
    {
        Thread.Sleep(3000);
        return await Task.FromResult("data");
    }

    [HttpGet]
    [Route("WelcomeSync")]
    public string WelcomeSync()
    {
        string r = SlowSyncMethod();
        return r;
    }

    public string SlowSyncMethod()
    {
        Thread.Sleep(3000);
        return "data";
    }
}


And I built up a jMeter test plan afterwards to see the differences between those two APIs by calling total 100k requests with 200 threads. There was not so much differences in TPS for those two methods, but the Std. Dev showed a significant difference, The async Std. Dev was 300x greater than the non-async one.



Label   Samples Average Min     Max     Std.Dev             Error%    Throughput    Received KB/sec     Sent KB/sec         Avg.Bytes

Async   100000  3193    3000    [32046] 1343.6219321793749  0.0 62.534550339062335  21.557320575868168  7.69468099875181    353.0    
Sync    100000  3003    3000    [3050]  3.523236520886259   0.0 66.54055014396047   23.458143165986066  8.447530779994983   361.0


The difference in Std. Dev while the throughput result is fairly same indicates that some requests were processed within 3 seconds as expected, others were taking 32 seconds at best to be done. This thoughts below were instantly coming unbidden into my mind because I was so into this async and hang behaviors. 


" It's gotta be the hang while async was doing its job for some reasons. 

" The sync API didn't hang because it's synchronous job. Look at the Min/Max values, the both of them are almost 3 seconds. "


To prove this, I added logging points.


public class TestController : ApiController
{
    static Logger logger = new LoggerConfiguration().WriteTo.File(AppDomain.CurrentDomain.BaseDirectory + "log-server.txt").CreateLogger();

    [HttpGet]
    [Route("Welcome")]
    public async Task<string> Welcome()
    {
        Guid guid = Guid.NewGuid();
            
        Stopwatch sw = new Stopwatch();
        sw.Start();
        string r = await SlowMethod();
        sw.Stop();

        logger.Information($"[{guid.ToString()}] - It took {sw.Elapsed.Seconds} seconds.");
        return r;
    }

    public async Task<string> SlowMethod()
    {
        Thread.Sleep(3000);
        return await Task.FromResult("data");
    }

    [HttpGet]
    [Route("WelcomeSync")]
    public string WelcomeSync()
    {
        Guid guid = Guid.NewGuid();

        Stopwatch sw = new Stopwatch();
        sw.Start();
        string r = SlowSyncMethod();
        sw.Stop();

        logger.Information($"[{guid.ToString()}] - It took {sw.Elapsed.Seconds} seconds.");
        return r;
    }

    public string SlowSyncMethod()
    {
        Thread.Sleep(3000);
        return "data";
    }
}


However the log keeps printing " It took 3 seconds " . This isn't what I thought it would be. Below ThreadPool API is used to get the number of the available threads and and completed threads, although the code wasn't provided here.


ThreadPool.GetAvailableThreads(out int workerThreads, out int completedThreads);


The result wasn't what I expected either. There're too many available threads ( approx 8000 ) for me to concern, and the completed threads were coming back from the individual jobs right at the time. Eventually, the assumption that certain conditions caused a hang in a particular thread is likely to prove incorrect. But it wasn't the time to make a decision. I had to creak into the reason the HTTP response was so slow in ASP.NET WebAPI and this question was still left to be solved. " Why sync code shows no difference in Std. Dev? "


Since I don't get to know the response time for each requests in jMeter, I had to make a client acting exactly like the jMeter and the test plan.


class Program { static Logger logger = new LoggerConfiguration().WriteTo.File(Environment.CurrentDirectory + @"\log-client.txt").CreateLogger(); static void Main(string[] args) { for(int i = 0; i < 200; i++) { ThreadPool.QueueUserWorkItem(new WaitCallback(GetData)); } Console.Read(); } static async void GetData(object obj) { var client = new HttpClient(); Stopwatch sw = new Stopwatch(); for (int i = 0; i < 500; i++) { sw.Restart(); string get = await client.GetStringAsync("http://localhost:32965/Welcome"); sw.Stop(); Console.WriteLine($"CurrentThread {Thread.CurrentThread.ManagedThreadId}, Context { Thread.CurrentContext.ContextProperties.FirstOrDefault().Name }, Time passed { sw.Elapsed.Seconds }. The result was { get }"); logger.Information($"CurrentThread {Thread.CurrentThread.ManagedThreadId}, Context { Thread.CurrentContext.ContextProperties.FirstOrDefault().Name }, Time passed { sw.Elapsed.Seconds }. The result was { get }"); } } }


After writing this simple HTTP client, I was able to get some logs that might help me get the clue for this performance mystery. Above code leaves below logs.


2018-08-04 04:48:50.030 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 3. The result was "data"
2018-08-04 04:48:50.030 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 3. The result was "data"
2018-08-04 04:48:50.030 +09:00 [INF] CurrentThread 9, Context LeaseLifeTimeServiceProperty, Time passed 3. The result was "data"
2018-08-04 04:48:50.030 +09:00 [INF] CurrentThread 10, Context LeaseLifeTimeServiceProperty, Time passed 3. The result was "data"
2018-08-04 04:48:50.632 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 4. The result was "data"
2018-08-04 04:48:51.637 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 5. The result was "data"
2018-08-04 04:48:52.631 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 6. The result was "data"
2018-08-04 04:48:52.968 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 6. The result was "data"
2018-08-04 04:48:52.969 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 6. The result was "data"
2018-08-04 04:48:52.969 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 6. The result was "data"
2018-08-04 04:48:52.973 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 6. The result was "data"
2018-08-04 04:48:53.631 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 7. The result was "data"
2018-08-04 04:48:53.637 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 7. The result was "data"
2018-08-04 04:48:54.631 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 8. The result was "data"
2018-08-04 04:48:54.640 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 8. The result was "data"
2018-08-04 04:48:55.633 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:55.635 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:55.975 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:55.976 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:55.976 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:55.979 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 9. The result was "data"
2018-08-04 04:48:56.633 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 10. The result was "data"
.
.
.
.
2018-08-04 04:49:17.654 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 30. The result was "data"
2018-08-04 04:49:17.660 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 30. The result was "data"
2018-08-04 04:49:17.662 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 30. The result was "data"
2018-08-04 04:49:17.668 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:17.675 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:17.676 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.634 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.644 +09:00 [INF] CurrentThread 10, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.647 +09:00 [INF] CurrentThread 14, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.648 +09:00 [INF] CurrentThread 15, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.651 +09:00 [INF] CurrentThread 15, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.654 +09:00 [INF] CurrentThread 15, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.661 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 31. The result was "data"
2018-08-04 04:49:18.668 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:18.671 +09:00 [INF] CurrentThread 12, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:18.675 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:19.640 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:19.646 +09:00 [INF] CurrentThread 11, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:19.646 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:19.656 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"
2018-08-04 04:49:19.659 +09:00 [INF] CurrentThread 16, Context LeaseLifeTimeServiceProperty, Time passed 32. The result was "data"


The response time was getting slower and slower until it hit 32-second response time and slowly falling down to its genuine performance, 3 seconds, from its zenith. Obviously the SlowMethod took just 3 seconds, yet something was dragging down the HTTP response in the initial stage of the processes.


I still wonder why the sync method doesn't have any Std. Dev difference. That's the biggest whole in this theory. So I gave it another shot with my simple HTTP client for WelcomeSync API, and the result made me puzzled. It behaved exactly the same as the async API; Slow at first phase, but gradually recovering its pace and eventually processes the incoming requests within 3 seconds.


This is the exact point where I'm confidently sure that the environment or ASP.NET WebAPI is in the behind scene for this performance issue. As I got the culprit, I was able to find what's dragging the performance down. IIS Express compiles the source code at first and reuses worker threads, and this is the reason. It all adds up why the test result came out like that if this is the actual reason.


The initial test was async and IIS Express was fully ready while processing the async requests after all the compiling and generating threads. The bottleneck had been made during the warming up phase and that made all the difference in Std. Dev. Sync was able to get passed faster in constant pace because IIS Express didn't need any warming up at all. Switching the test order puts out the same result, as I expected.


To be more specific and analytic on this problem, I was trying to profile the internal jobs with Perfview and scoure IIS Express logs, but I need to get aquianted with the tool or learn more about IIS Express. I'm going to put up another post when I'm ready for that.


ref : https://support.microsoft.com/en-us/help/961884/fix-you-experience-very-slow-performance-on-an-initial-request-to-an-a

ref : https://github.com/aspnet/KestrelHttpServer/issues/1276





The exact, true culprit

I wrote this article on 4th, August and I've finally figured this out today, 19 August. I pat myself a little bit in that I suspected the ThreadPool, it was a right approach but I was fading away from the right path. ( In fact, this slight gap is quite huge. You need to fill up the void with so much trial errors and learning, and experiences. )

ThreadPool.GetAvailableThreads(out int workerThreads, out int completedThreads);


What I should have tried was not GetAvailableThreads, but GetMinthreads and SetMinthreads.


// Retrieves the minimum number of threads the thread pool creates on demand
ThreadPool.GetMinThreads(out int threadCount, out int portThread);
// Set the minimum number of threads.
// This is the basic comment in MSDN
// Sets the minimum number of threads the thread pool creates on demand, as new requests are made,
// before switching to an algorithm for managing thread creation and destruction.
ThreadPool.SetMinThreads(200, 8);


With the default, untouched settings, the web api deals with the default number of threads in ThreadPool when the 200 http requests per second is in flight. Since it just cannot deal with the default 4 threads, the delay has been made while ThreadPool was creating enough threads to deal with the flyting, simultaneous requests.


To be specific, the ThreadState was changed to Waiting by Thread.Sleep(3000), and when a new HTTP request came into the HTTP windows, the ThreadPool would rather choose to wait until another thread will become available from the waiting state than create another new thread.


This explains this weird performance dragging at the initial phase, and the mystery of the incremental recovery from the slow performance. No one would know about the exact ThreadPool algorithm except those who made it, but it's obvious that ThreadPool doesn't like to create a thread unless it comes across an inevitable circumstance.


DotTrace from JetBrain tells me that one of the IIS Express processes, a CLR Worker, was waiting for 42 seconds for something. What it was waiting for was probably one of the threads in the ThreadPool becoming available again.




Conclusion

So you have to get ThreadPool ready for the amount of jobs you'd expect before running you application by setting the number of minimum threads in it. In this case, the bunch of 200 HTTP requests occur every seconds, so you need at least about 200 threads in your ThreadPool to prevent the performance dragging.

ThreadPool.SetMinThreads(200, 4);


After this setup, no matter how initial phase it is, the webapi can response within exactly 3 seconds. I've learnt a lot about Perfview and how to analyze an app with it, DotTrace, and the ThreadPool's nature, it's been really good time for me.



Caution

There's an article about ThreadPool, and it states a caution for using ThreadPool.SetMinThreads. I want to specify the warnings here.

Caution

You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases the thread pool will perform better with its own algorithm for allocating threads.















'.NET' 카테고리의 다른 글

IdentityServer > Allow specific users in a client.  (0) 2018.08.11
IIS > IIS Express & IIS applicationhost.config path  (0) 2018.08.06
C# > Span<T>  (0) 2018.07.31
C# > Tuples  (0) 2018.07.31
Generics > Covariance and Contravariance  (0) 2018.07.14
Comments