TIMEOUT requests && then Duplicate Requests && then Threads block

classic Classic list List threaded Threaded
6 messages Options
| Threaded
Open this post in threaded view
|

TIMEOUT requests && then Duplicate Requests && then Threads block

Joe Maimon
At least I think thats the sequence of events I am troubleshooting now.


Here is a snippet in request_process.c


         case REQUEST_FAIL_SERVER_TIMEOUT:
                 radlog(L_ERR, "TIMEOUT for request %d in module %s,
component %s",
                        request->number,
                        request->module ? request->module : "<server core>",
                        request->component ? request->component :
"<server core>");

                 /*
                  *      Tell the server to stop processing the request
                  */
                 request->options |= RAD_REQUEST_OPTION_STOP_NOW;
/* This is the line I have added?? */
                 request->finished = TRUE;
                 break;






-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/devel.html
| Threaded
Open this post in threaded view
|

Re: TIMEOUT requests && then Duplicate Requests && then Threads block

Alan DeKok
Joe Maimon <[hidden email]> wrote:
>                  request->options |= RAD_REQUEST_OPTION_STOP_NOW;
> /* This is the line I have added?? */
>                  request->finished = TRUE;

  And the server will likely core dump.

  The "finished" flag is used also as a "it's OK to free() the
request" flag.  If the thread is still accessing it, it's a SEGV.

  The solution is to take those "dead" requests, and move them to a
*separate* list, which is cleaned up periodically.  That way old
requests don't prevent new ones from coming in.

  Alan DeKok.
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/devel.html
| Threaded
Open this post in threaded view
|

Re: TIMEOUT requests && then Duplicate Requests && then Threads block

Joe Maimon


Alan DeKok wrote:

> Joe Maimon <[hidden email]> wrote:
>
>>                 request->options |= RAD_REQUEST_OPTION_STOP_NOW;
>>/* This is the line I have added?? */
>>                 request->finished = TRUE;
>
>
>   And the server will likely core dump.
>
>   The "finished" flag is used also as a "it's OK to free() the
> request" flag.  If the thread is still accessing it, it's a SEGV.
>
>   The solution is to take those "dead" requests, and move them to a
> *separate* list, which is cleaned up periodically.  That way old
> requests don't prevent new ones from coming in.
>
>   Alan DeKok.
> -
> List info/subscribe/unsubscribe? See http://www.freeradius.org/list/devel.html
>
>
No segvs, instead the requests are being cleaned up.

Examining the request structure suggests that the TIMEOUT requests never
hit rad_decode() or rad_respond()

(username is null .....)

I am working on the theory now that they were never dequeued.

This appears to be a thread spawning issue since it doesnt appear to
happen under -s


How can I tell how many entries are in the queue?
(theory: at the time of the semaphore there are more requests in the
queue than waiting threads)

lrad_hash_table_num_elements(thread_pool.queue)

does not appear to return anything other than an ever incrementing number.

Joe
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/devel.html
| Threaded
Open this post in threaded view
|

Re: TIMEOUT requests && then Duplicate Requests && then Threads block

Alan DeKok
Joe Maimon <[hidden email]> wrote:
> Examining the request structure suggests that the TIMEOUT requests never
> hit rad_decode() or rad_respond()

  Ah... that would be OK, then.

> I am working on the theory now that they were never dequeued.

  Yup.

> How can I tell how many entries are in the queue?
> (theory: at the time of the semaphore there are more requests in the
> queue than waiting threads)
>
> lrad_hash_table_num_elements(thread_pool.queue)
>
> does not appear to return anything other than an ever incrementing number.

  Which would appear to indicate that there's a bug....

  Hmm... I think that the lrad_hash_table_delete() function is being
called in the wrong place in threads.c.  The order SHOULD be:

  entry = finddata(queue, head)
  if (!entry) {
     head++;
     // DON'T delete, entry==NULL means there's nothing to delete
  }
  ...

  delete(queue, head++);


  I think the current order of "head++, delete(queue, head)" is a
little dumb.

  If that works for you, I'll commit a fix.

  Alan DeKok.
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/devel.html
| Threaded
Open this post in threaded view
|

Re: TIMEOUT requests && then Duplicate Requests && then Threads block

Joe Maimon


Alan DeKok wrote:

>
>   I think the current order of "head++, delete(queue, head)" is a
> little dumb.
>

Meaning the queue has spurious entries in it and eventualy the server
blocks.

>   If that works for you, I'll commit a fix.

Appears to work great.

>
>   Alan DeKok.
> -
> List info/subscribe/unsubscribe? See http://www.freeradius.org/list/devel.html
>
>
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/devel.html
| Threaded
Open this post in threaded view
|

Re: TIMEOUT requests && then Duplicate Requests && then Threads block

Alan DeKok
Joe Maimon <[hidden email]> wrote:
> >   If that works for you, I'll commit a fix.
>
> Appears to work great.

  OK.  I've committed a fix.

  I also think I'll move the queue (fifo) to to src/lib, and set up
multiple fifo's in threads.c.  This could help server stability in
overload situations.

  e.g. Handle responses from proxies FIRST, then EAP (ongoing
conversations), and finally brand new requests.

  In an overload situation, it's better to drop new requests on the
floor than to drop partially finished conversations.

  My only concern is starvation, but I think that can be addressed.

  Alan DeKok.

-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/devel.html