cannot get core dump of crashing freeradius

cannot get core dump of crashing freeradius

Jakob Hirsch
Hi,

We have FreeRADIUS 2.1.8 running on a couple of servers and are very
happy with it. But every few days FR crashes on one of the servers (a
random one, not always the same). The load is significant (average 150
requests/s per server, 400/s peak) but surely not too high. So
everything seems to run fine apart from the annoying crashes, which alarm
people and make the weekly availability reports look bad (even though FR
is restarted automatically, of course). The previous 1.1.8 installation,
which we upgraded from 6 months ago, did not have this problem.

Anyway, I really want to find out what's going wrong, so I wanted to
get core dumps of these crashes. But I just don't get them.
- radiusd.conf has allow_core_dumps = yes (and FR says "Info: Core dumps
are enabled." at startup)
- /proc/sys/kernel/core_pattern is set to '/tmp/core.%t.%e.%p', so core
dumps can be written to disk (tested with a little program that forces
a segfault)
- I put "ulimit -c unlimited" in the startup script.
cat /proc/$(pidof freeradius)/limits shows "unlimited" for soft and hard
limit of "Max core file size"
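
The last check in the list can be scripted. What follows is only a minimal sketch in Python, assuming a Linux /proc filesystem; the helper name `core_limits` and the sample row are illustrative and not part of FreeRADIUS.

```python
def core_limits(limits_text):
    """Return (soft, hard) for "Max core file size" from /proc/<pid>/limits text.

    Returns None if the row is not present.
    """
    label = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # What follows the label is: soft limit, hard limit, units.
            fields = line[len(label):].split()
            return fields[0], fields[1]
    return None

# Sample rows in the layout the kernel prints (hypothetical values).
sample = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max core file size        unlimited            unlimited            bytes\n"
)
print(core_limits(sample))
```

For a running daemon the text would come from something like `open(f"/proc/{pid}/limits").read()`, with `pid` being the process ID of the server.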

So what's missing? The only indication of the crash is this line in syslog:

> Apr 10 17:57:19 xxxxxxxx kernel: [12268615.000288] freeradius[14846]: segfault at 73818 ip 00007f0cb40e875e sp 00007fff9c6304c0 error 4 in libfreeradius-radius-2.1.8.so[7f0cb40d1000+1f000]
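
Even without a core dump, the kernel line carries some information: subtracting the mapping base (the number before the `+`) from the instruction pointer gives the offset of the faulting instruction inside the library. A minimal sketch in Python follows; the regular expression and the helper name `library_offset` are illustrative assumptions, and the result is only meaningful as a file offset if the mapping shown starts at the beginning of the shared object.

```python
import re

# Matches the kernel's "segfault at ... ip ... sp ... error ... in obj[base+size]" line.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def library_offset(log_line):
    """Return (object name, offset of the faulting instruction inside it)."""
    m = SEGFAULT_RE.search(log_line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return m.group("obj"), ip - base

line = ("freeradius[14846]: segfault at 73818 ip 00007f0cb40e875e "
        "sp 00007fff9c6304c0 error 4 in "
        "libfreeradius-radius-2.1.8.so[7f0cb40d1000+1f000]")
obj, offset = library_offset(line)
print(obj, hex(offset))  # libfreeradius-radius-2.1.8.so 0x1775e
```

With debugging symbols installed, an offset like this can be looked up with a tool such as addr2line (`addr2line -e <library> <offset>`).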

(This is debian lenny x86_64, btw.)

Any hints?
I even thought about running FR in the foreground or even under
gdb, but I wanted to check here first.


Regards and thanks in advance,
Jakob

-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html

Re: cannot get core dump of crashing freeradius

Alan DeKok-2
Jakob Hirsch wrote:
> We have FreeRADIUS 2.1.8 running on a couple of servers and are very
> happy with it. But every few days FR crashes on one of the servers (a
> random one, not always the same). The load is significant (average 150
> requests/s per server, 400/s peak) but surely not too high. So
> everything seems to run fine apart from the annoying crashes, which alarm
> people and make the weekly availability reports look bad (even though FR
> is restarted automatically, of course). The previous 1.1.8 installation,
> which we upgraded from 6 months ago, did not have this problem.

  Hmm... I've run it at 20K pps for *days*....

> Anyway, I really want to find out what's going wrong, so I wanted to
> get core dumps of these crashes. But I just don't get them.
> - radiusd.conf has allow_core_dumps = yes (and FR says "Info: Core dumps
> are enabled." at startup)
> - /proc/sys/kernel/core_pattern is set to '/tmp/core.%t.%e.%p', so core
> dumps can be written to disk (tested with a little program that forces
> a segfault)
> - I put "ulimit -c unlimited" in the startup script.
> cat /proc/$(pidof freeradius)/limits shows "unlimited" for soft and hard
> limit of "Max core file size"

  Often 'root' can't core dump, and programs that change uid can't core
dump.  It's hard to know what's going on with the OS.

> So what's missing? The only indication of the crash is this line in syslog:
>
>> Apr 10 17:57:19 xxxxxxxx kernel: [12268615.000288] freeradius[14846]: segfault at 73818 ip 00007f0cb40e875e sp 00007fff9c6304c0 error 4 in libfreeradius-radius-2.1.8.so[7f0cb40d1000+1f000]
>
> (This is debian lenny x86_64, btw.)
>
> Any hints?

  doc/bugs.  You'll need symbols to find out what's going on.

  Alan DeKok.


Re: FreeRadius 2.1.8 problem

Stylianos Stylianou
Hi all,

I have a similar problem on a machine with CentOS 5 Update 4. The
FreeRADIUS packages I use are taken from the jdennis repository
(http://people.redhat.com/jdennis/freeradius-rhel-centos/).
Package versions are:
freeradius2-utils-2.1.8-2.el5
freeradius2-2.1.8-2.el5
freeradius2-mysql-2.1.8-2.el5
freeradius2-perl-2.1.8-2.el5


I configured two different FreeRADIUS servers on the same machine.
Compared to the default configuration, the first one adds sqlippool,
and the second one calls an external Perl script, which in turn
connects via ssh to another server and then runs some other scripts
on it.

The first radius, with sqlippool, has been running for more than 2 months
without any problem at all.

The second one, which calls an external Perl script, hangs after a few hours.
When I issue the status command, the result is
> # /etc/init.d/radiusd status
> radiusd dead but pid file exists

I configured FreeRADIUS to call the script with the perl module as well
as with the exec module, with identical results. After the radius stops,
I see that the Perl script's log file always ends at the point where it
tries to ssh to the other server. The other server's statistics show
nothing unusual (e.g. high CPU), and there is no error in its ssh log file.
Of course, the issue is not that there is some problem with the Perl
script, the ssh command or the remote server, but that the radius hangs
when the external script it calls hangs.

Note that this behavior can be reproduced by calling an external script
like the following
> #!/usr/bin/perl
>
> use strict;
>
> my $rc=system("ssh 1.1.1.1");
> exit($rc);
1.1.1.1 is just an IP address that would cause the ssh to time out.
Note that the FreeRADIUS server does not hang when started in debug mode.

We have used exactly the same Perl script for the last few years without
any problem on another machine, which runs FreeRADIUS version 1.0.1.

Regards,
Stylianos


Re: FreeRadius 2.1.8 problem

Alan DeKok-2
Stylianos Stylianou wrote:
> The second one, which calls an external Perl script, hangs after a few hours.
> When I issue the status command, the result is
>> # /etc/init.d/radiusd status
>> radiusd dead but pid file exists

  Whoops.

> Note that this behavior can be reproduced by calling an external script
> like the following

  It's always good to have a test case.

> We have used exactly the same Perl script for the last few years without
> any problem on another machine, which runs FreeRADIUS version 1.0.1.

  Hm... the problem code seems to be the same in all versions of the
server.  However, in older versions, the server core would notice, and
kill the thread.  This worked around the problem without fixing it.

  In any case... a fix will be in 2.1.9.  Until it's released, you could
try grabbing the v2.1.x branch from git.freeradius.org.

  The problem is that it was blocked trying to read STDOUT of the child.
 The solution is to not block, and give up reading if it takes too long.
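
The idea of not blocking forever on the child's STDOUT can be illustrated outside the server. The following is not the FreeRADIUS fix (that code is C and is not shown in this thread), only a minimal Python sketch of the same pattern; the helper name `run_with_deadline` and the 0.5 second deadline are assumptions made for the example.

```python
import subprocess
import sys

def run_with_deadline(argv, timeout_s):
    """Run a child and capture its stdout, but give up after timeout_s seconds.

    Returns (returncode, stdout_bytes), or None if the child was killed
    because it took too long.
    """
    child = subprocess.Popen(argv, stdout=subprocess.PIPE,
                             stdin=subprocess.DEVNULL)
    try:
        out, _ = child.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        child.kill()         # stop waiting for a stuck child
        child.communicate()  # reap it and close the pipe
        return None
    return child.returncode, out

# Stand-in for the hanging "ssh 1.1.1.1": a child that sleeps far too long.
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_deadline(slow, 0.5))  # None
```

The caller gets `None` back after the deadline instead of waiting for output that never arrives, which is the behaviour described above.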

  Alan DeKok.

Re: cannot get core dump of crashing freeradius

Josip Rodin-7
In reply to this post by Alan DeKok-2
On Fri, Apr 16, 2010 at 12:05:38PM +0200, Alan DeKok wrote:

> Jakob Hirsch wrote:
> > Anyways, I really want to find out what's going wrong, so I wanted to
> > get core dumps of these crashes. Only that I just don't get them.
>
> > So what's missing? The only indication of the crash is this line in syslog:
> >
> >> Apr 10 17:57:19 xxxxxxxx kernel: [12268615.000288] freeradius[14846]: segfault at 73818 ip 00007f0cb40e875e sp 00007fff9c6304c0 error 4 in libfreeradius-radius-2.1.8.so[7f0cb40d1000+1f000]
> >
> > (This is debian lenny x86_64, btw.)
> >
> > Any hints?
>
>   doc/bugs.  You'll need symbols to find out what's going on.

For Debian users, you can recommend installing the symbols from the
freeradius-dbg package.

See also http://packages.debian.org/freeradius-dbg

--
     2. That which causes joy or happiness.

Re: cannot get core dump of crashing freeradius

Jakob Hirsch
In reply to this post by Alan DeKok-2
Alan DeKok, 2010-04-16 12:05:

>   Often 'root' can't core dump, and programs that change uid can't core
> dump.  It's hard to know what's going on with the OS.

OK, I dug deeper into this and ran some tests:

- no core dump with kill -11
- /proc/sys/fs/suid_dumpable is 0, set it to 1 and restart FR
- kill -11 -> core dump, yeah!

So it's probably a problem with the uid change disabling the process's
dumpability (I found nothing in /proc/[pid]/* where I can see this).

So we have now all machines running with /proc/sys/fs/suid_dumpable set
to 1.

Strange thing is, this should not be necessary with the
prctl(PR_SET_DUMPABLE, 1) in mainconfig.c:698.

Anyway, I'm now looking forward to FR crashing :)

>> Any hints?
>   doc/bugs.  You'll need symbols to find out what's going on.

I know, and I have them (in the -dbg package), but they are useless
without a core dump :)

Maybe the info about /proc/sys/fs/suid_dumpable should be added to
doc/bugs...

Thanks for your input!


Regards,
J

Re: cannot get core dump of crashing freeradius

A.L.M.Buxey
Hi,

> Maybe the info about /proc/sys/fs/suid_dumpable should be added to
> doc/bugs...

to quote the man page:

       /proc/sys/fs/suid_dumpable (since Linux 2.6.13)
              The value in this file determines whether core dump files are
              produced for set-user-ID or otherwise protected/tainted
              binaries.  Three different integer values can be specified:

              0 (default) This provides the traditional (pre-Linux 2.6.13)
              behavior.  A core dump will not be produced for a process which
              has changed credentials (by calling seteuid(2), setgid(2), or
              similar, or by executing a set-user-ID or set-group-ID program)
              or whose binary does not have read permission enabled.

              1 ("debug") All processes dump core when possible.  The core
              dump is owned by the file system user ID of the dumping process
              and no security is applied.  This is intended for system
              debugging situations only.  Ptrace is unchecked.

              2 ("suidsafe") Any binary which normally would not be dumped
              (see "0" above) is dumped readable by root only.  This allows
              the user to remove the core dump file but not to read it.  For
              security reasons core dumps in this mode will not overwrite one
              another or other files.  This mode is appropriate when
              administrators are attempting to debug problems in a normal
              environment.


I don't think this got enough coverage in most information outlets... in fact
2.6.13 has been around for a while, but today was the first time I learnt of
that behaviour.

Maybe the FreeRADIUS code could be updated to detect this value... and if it's
set to 0, then it could mention it in the debug output? ;-)
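
Such a check is small. A minimal sketch in Python of what the detection could look like, assuming a Linux /proc filesystem; the function names and the wording of the hint are made up for illustration and do not come from the FreeRADIUS source.

```python
def read_suid_dumpable(path="/proc/sys/fs/suid_dumpable"):
    """Return the current setting as a string, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None  # not Linux, or /proc is not available

def startup_hint(value):
    """Return a hint for the debug output, or None if nothing needs saying."""
    if value == "0":
        return ("fs.suid_dumpable is 0: a process that has changed "
                "credentials may not dump core")
    return None

hint = startup_hint(read_suid_dumpable())
if hint:
    print("Warning:", hint)
```

Values 1 and 2 produce no hint, since in both modes the man page text above says a dump is written.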

alan

Re: cannot get core dump of crashing freeradius

Jakob Hirsch
Alan Buxey, 2010-04-19 16:43:

>> Maybe the info about /proc/sys/fs/suid_dumpable should be added to
>> doc/bugs...
> to quote the man page:
>        /proc/sys/fs/suid_dumpable (since Linux 2.6.13)
...
> I don't think this got enough coverage in most information outlets... in fact
> 2.6.13 has been around for a while, but today was the first time I learnt of
> that behaviour.

I agree, even though it's mentioned in the core(5) man page.

> Maybe the FreeRADIUS code could be updated to detect this value... and if it's
> set to 0, then it could mention it in the debug output? ;-)

Maybe, but with the call to prctl(PR_SET_DUMPABLE, 1) this should not be
necessary any more.
I tried this with a small test program and it worked as specified, but
I still don't get a core dump of the FR process unless I set
suid_dumpable to 1.

So after some debugging I got to the root cause of this:
The process's dumpable flag is reset every time the UID is changed. FR
does this several times with fr_suid_up() and fr_suid_down() after
switch_users() is run, e.g. in listen_bind().
So I guess we have to change the fr_suid_* functions to always set the
dumpable flag after setting the uid.
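
The proposed change can be sketched as follows. This is a Python illustration via ctypes, not the FreeRADIUS C code: `seteuid_keep_dumpable` is a hypothetical stand-in for what fr_suid_up() and fr_suid_down() would do, and the numeric constants are the values of PR_GET_DUMPABLE and PR_SET_DUMPABLE from <linux/prctl.h>. It assumes Linux.

```python
import ctypes
import os

PR_GET_DUMPABLE = 3  # values from <linux/prctl.h>
PR_SET_DUMPABLE = 4

_libc = ctypes.CDLL(None, use_errno=True)
_libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
_libc.prctl.restype = ctypes.c_int

def get_dumpable():
    """Return the calling process's dumpable flag."""
    return _libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)

def set_dumpable():
    """Mark the calling process as dumpable again."""
    if _libc.prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_DUMPABLE) failed")

def seteuid_keep_dumpable(euid):
    """Hypothetical stand-in for fr_suid_up() / fr_suid_down()."""
    os.seteuid(euid)  # the kernel may reset the dumpable flag here
    set_dumpable()    # so re-assert it after every credential change

# Switching to the UID we already have needs no privileges.
seteuid_keep_dumpable(os.geteuid())
print(get_dumpable())
```

PR_GET_DUMPABLE is also a way to see the flag from inside the process, since it is not exposed under /proc/[pid].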


btw, I wonder why prctl() is not called when debug_flag is set. I
would have thought that one would want to get a core dump especially
when running in debug mode.


Re: cannot get core dump of crashing freeradius

Alan DeKok-2
Jakob Hirsch wrote:
> So after some debugging I got to the root cause of this:
> The process's dumpable flag is reset every time the UID is changed. FR
> does this several times with fr_suid_up() and fr_suid_down() after
> switch_users() is run, e.g. in listen_bind().
> So I guess we have to change the fr_suid_* functions to always set the
> dumpable flag after setting the uid.

  Ah... OK.  That can be fixed for 2.1.9.

> btw, I wonder why prctl() is not called when debug_flag is set. I
> would have thought that one would want to get a core dump especially
> when running in debug mode.

  It doesn't switch UIDs when in debug mode.  So it inherits whatever
core dump policy you set in the shell.

  Alan DeKok.

Re: cannot get core dump of crashing freeradius

Jakob Hirsch
Alan DeKok, 2010-04-20 10:54:

>> So after some debugging I got to the root cause of this:
>> The process's dumpable flag is reset every time the UID is changed. FR
>> does this several times with fr_suid_up() and fr_suid_down() after
>> switch_users() is run, e.g. in listen_bind().
>> So I guess we have to change the fr_suid_* functions to always set the
>> dumpable flag after setting the uid.
>   Ah... OK.  That can be fixed for 2.1.9.

Excellent! :)

Any idea when it will be released?

>> btw, I wonder why prctl() is not called when debug_flag is set. I
>> would have thought that one would want to get a core dump especially
>> when running in debug mode.
>   It doesn't switch UIDs when in debug mode.  So it inherits whatever

AFAICS it does when starting it as root (check in mainconfig.c:532). I'd
say a quite common case for debugging is to run freeradius -X as root...

Re: cannot get core dump of crashing freeradius

Alan DeKok-2
Jakob Hirsch wrote:
> Any idea when it will be released?

  In the next month or so.

>>> btw, I wonder why prctl() is not called when debug_flag is set. I
>>> would have thought that one would want to get a core dump especially
>>> when running in debug mode.
>>   It doesn't switch UIDs when in debug mode.  So it inherits whatever
>
> AFAICS it does when starting it as root (check in mainconfig.c:532). I'd
> say a quite common case for debugging is to run freeradius -X as root...

  OK.

  Alan DeKok.


Re: cannot get core dump of crashing freeradius

Jakob Hirsch
Alan DeKok, 04/20/2010 06:21 PM:

>>>> btw, I wonder why prctl() is not called when debug_flag is set. I
>>>> would have thought that one would want to get a core dump especially
>>>> when running in debug mode.
>>>   It doesn't switch UIDs when in debug mode.  So it inherits whatever
>> AFAICS it does when starting it as root (check in mainconfig.c:532). I'd
>> say a quite common case for debugging is to run freeradius -X as root...
>   OK.

This will become a non-issue when the prctl() calls are moved into the
fr_suid_* functions. :)
Would you like me to prepare a patch for that or would you rather do
that yourself?

Anyway, here's the aftermath: I got my core dump, finally, and it turns
out that we are probably hit by the notorious bug #35 (as I half feared,
half hoped :).
I will try the fix for list_delete() you proposed if I can get to it...


Re: cannot get core dump of crashing freeradius

Alan DeKok-2
Jakob Hirsch wrote:
> This will become a non-issue when the prctl() calls are moved into the
> fr_suid_* functions. :)
> Would you like me to prepare a patch for that or would you rather do
> that yourself?

  Patch, please.  It's just easier.

> Anyway, here's the aftermath: I got my core dump, finally, and it turns
> out that we are probably hit by the notorious bug #35 (as I half feared,
> half hoped :).
> I will try the fix for list_delete() you proposed if I can get to it...

  I'm not sure that will help.  <sigh>

  It's happened enough that I know it's real.  But I have *no* idea why
it's happening:

- there is ONE location in the code where entries get added to the cache
- there is ONE location where they're looked up
- there is ONE location where they're deleted
- all this is done from ONE thread

  So if the request is in the cache, the packet pointer *cannot* be
NULL.  So it's likely not a race condition between threads.  It's not a
mismanagement issue.  It's not a "use after free" memory issue.  <sigh>

  I'll put a fix into 2.1.9 which works around the issue.  It's better
than having the server crash.

  If you don't mind trying things, I can send you some patches which
might help tracking it down.

  Alan DeKok.