Concurrent instances of msys have corrupt environments

Michael Vincent
Summary:
Running multiple instances of msys tools concurrently (especially
non-interactively via a service or scheduled task) causes paths in
some environment variables to lose their leading slash and thus become
"corrupt".

Background:
We use Jenkins to perform continuous integration builds on Windows 7
nodes. Our source is stored in Git repositories, so we use msysgit
1.8.3 to check out the code to build. We run Jenkins as a Windows
service (with a specific user account) on these build nodes. Jenkins
allows multiple builds to execute concurrently on a single node, which
is a feature we make use of. This all worked fairly well until the
security updates listed at the end of this post were installed.

Following the updates, when multiple jobs were started concurrently,
ssh would occasionally hang after displaying the following error
message: "Could not create directory 'c/Users/buildbot/.ssh'". Note
that $HOME is set to %USERPROFILE% as a user environment variable
(HKCU\Environment\HOME). %USERPROFILE% evaluates to C:\Users\buildbot
in this case. I've tried setting $HOME to an absolute path instead of
%USERPROFILE% and also setting it as a system environment variable.
After digging a bit deeper, I found that $HOME, $TEMP, $TMP, $TMPDIR,
and parts of $PATH were missing the leading slash or missing portions.

Working environment:

  HOME=/c/Users/buildbot
  TEMP=/tmp
  TMP=/tmp
  TMPDIR=/tmp
  PATH=/usr/libexec/git-core:/usr/bin:/usr/bin:/usr/mingw/bin:/c/Perl/bin:/c/Python27:/c/Windows/system32:/c/Windows:/usr/cmd

Broken environment:

  HOME=c/Users/buildbot
  TEMP=c/Users/BUILDB~1/AppData/Local/Temp
  TMP=c/Users/BUILDB~1/AppData/Local/Temp
  TMPDIR=c/Users/BUILDB~1/AppData/Local/Temp
  PATH=/libexec/git-core:/bin:/bin:/mingw/bin:c/Perl/bin:c/Python27:c/Windows/system32:c/Windows:/cmd

However, if I run Jenkins interactively as a logged in user, instead
of as a service, this problem does not occur at all.
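
A quick check for the symptom shown in the two listings above might look like the following. This is only a sketch: the function name and the sample dictionaries (copied from the listings) are illustrative and not part of msys or of the attached script. It flags only values that have lost their leading slash. It does not catch PATH entries such as /libexec/git-core, which still start with a slash but have lost their /usr prefix.

```python
# Sketch only: flag msys-style path values that have lost their leading slash.
# The variable list mirrors the variables named in this post; the function
# name is made up for illustration.
PATH_VARS = ("HOME", "TEMP", "TMP", "TMPDIR")


def find_corrupt_entries(env):
    """Return (variable, value) pairs whose value does not start with '/'."""
    bad = []
    for name in PATH_VARS:
        value = env.get(name)
        if value and not value.startswith("/"):
            bad.append((name, value))
    # Inside msys bash, PATH entries are separated by ':' and drives look like /c/...
    for entry in env.get("PATH", "").split(":"):
        if entry and not entry.startswith("/"):
            bad.append(("PATH", entry))
    return bad


# Sample data taken from the "Working" and "Broken" listings in this post.
working = {
    "HOME": "/c/Users/buildbot",
    "TEMP": "/tmp",
    "TMP": "/tmp",
    "TMPDIR": "/tmp",
    "PATH": "/usr/libexec/git-core:/usr/bin:/usr/bin:/usr/mingw/bin:"
            "/c/Perl/bin:/c/Python27:/c/Windows/system32:/c/Windows:/usr/cmd",
}
broken = {
    "HOME": "c/Users/buildbot",
    "TEMP": "c/Users/BUILDB~1/AppData/Local/Temp",
    "TMP": "c/Users/BUILDB~1/AppData/Local/Temp",
    "TMPDIR": "c/Users/BUILDB~1/AppData/Local/Temp",
    "PATH": "/libexec/git-core:/bin:/bin:/mingw/bin:"
            "c/Perl/bin:c/Python27:c/Windows/system32:c/Windows:/cmd",
}

print(find_corrupt_entries(working))
print(find_corrupt_entries(broken))
```

The check has to be fed the environment as bash sees it (for example, parsed from the output of `env` run inside bash), since a native Windows Python process sees Windows-style paths instead.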

I wrote a Python script (attached) to roughly simulate what Jenkins
does and demonstrate the problem. It spawns two bash processes per
second for 10 seconds that simply check whether $HOME is readable. Any
failures get logged to stdout (redirected to msys_test.stdout.log).
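
The attachment is not reproduced in this archive, so here is a rough sketch of what such a script could look like. It is not the attached msys_test.py: the function names, the exact shell check, and the assumption that bash is reachable as "bash" are all guesses made for illustration (on a build node you would pass the full path to the msysgit or msys bash.exe).

```python
import subprocess
import sys
import time

# Hypothetical check: print a message only when $HOME is not readable.
CHECK = '[ -r "$HOME" ] || echo "HOME not readable: $HOME"'


def run_batch(shell="bash", count=2, env=None):
    """Start `count` shells at once and return the failure messages they print."""
    procs = [
        subprocess.Popen(
            [shell, "-c", CHECK],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            env=env,  # None means "inherit the parent's environment"
            universal_newlines=True,
        )
        for _ in range(count)
    ]
    failures = []
    for proc in procs:
        output, _ = proc.communicate()
        if output.strip():
            failures.append(output.strip())
    return failures


def main(shell="bash", per_second=2, seconds=10):
    """Run one batch per second and log any failures to stdout."""
    for tick in range(seconds):
        for message in run_batch(shell, per_second):
            print("batch %d: %s" % (tick, message))
        time.sleep(1)
    sys.stdout.flush()
```

Calling `main()` runs the full ten-second loop; when the script is started from a service or scheduled task, stdout would need to be redirected to a log file such as msys_test.stdout.log.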

This script always completes perfectly fine when run interactively
from a command prompt. However, when run non-interactively from a
service (using the Jenkins service launcher or nssm), 30-40% of the
processes have a corrupt environment. The Task Scheduler is a much
easier way to run processes non-interactively and also causes the same
issue:

1. Create a task from the command line (username and paths will need
to be changed):

  schtasks /create /tn "MsysTest" /sc once /st 00:00 /ru buildbot /rp /tr "C:\Python27\python.exe C:\msys_test.py"

2. Run the task now. When it's complete, look for msys_test.stdout.log
in the same directory as msys_test.py. The log files will start out
empty and flush when the script finishes.

  schtasks /run /tn "MsysTest"

3. Delete the task when you're done so it doesn't run automatically by mistake:

  schtasks /delete /f /tn "MsysTest"

Interestingly, if I bump it up to 8 threads at a time, typically only
one of them fails (at most 2) and many batches have no failures at
all.

Now, before you say that msysgit uses a really old version of msys and
isn't really supported, I performed the same tests with the current
version of msys (msys-base 2013072300), and the situation appears to
be worse. With the latest msys, concurrent instances become corrupt
even when run interactively.

My only thought is that there must be some kind of race condition or
reentrancy issue in the msys dll (?). I can't think of anything else
that could cause this kind of behavior. Any ideas for tracking this
down further?


Here's the list of security updates, one of which appears to have
broken the ability for multiple concurrent instances of msys to run
non-interactively.

MS13-046: Description of the security update for Windows Kernel-Mode
drivers: May 14, 2013
https://support.microsoft.com/kb/2830290

MS13-049: Vulnerability in kernel-mode driver could allow denial of
service: June 11, 2013
https://support.microsoft.com/kb/2845690

MS13-040: Description of the security update for the .NET Framework 4
on Windows XP, Windows Server 2003, Windows Vista, Windows Server
2008, Windows 7, and Windows Server 2008 R2: May 14, 2013
https://support.microsoft.com/kb/2804576

MS13-040: Description of the security update for the .NET Framework
3.5.1 on Windows 7 Service Pack 1 and Windows Server 2008 R2 Service
Pack 1: May 14, 2013
https://support.microsoft.com/kb/2804579

MS13-050: Vulnerability in Windows print spooler components could
allow elevation of privilege: June 11, 2013
https://support.microsoft.com/kb/2839894

MS13-042: Description of the security update for Publisher 2007
Service Pack 3: May 14, 2013
https://support.microsoft.com/kb/2597971

_______________________________________________
MinGW-users mailing list
[hidden email]

This list observes the Etiquette found at
http://www.mingw.org/Mailing_Lists.
We ask that you be polite and do the same.  Disregard for the list etiquette may cause your account to be moderated.

_______________________________________________
You may change your MinGW Account Options or unsubscribe at:
https://lists.sourceforge.net/lists/listinfo/mingw-users
Also: mailto:[hidden email]?subject=unsubscribe

msys_test.py (1K) Download Attachment

Re: Concurrent instances of msys have corrupt environments

Earnie Boyd
On Fri, Sep 6, 2013 at 12:40 PM, Michael Vincent wrote:

> Summary:
> Running multiple instances of msys tools concurrently (especially
> non-interactively via a service or scheduled task) causes paths in
> some environment variables to lose their leading slash and thus become
> "corrupt".
>
> Background:
> We use Jenkins to perform continuous integration builds on Windows 7
> nodes. Our source is stored in Git repositories, so we use msysgit
> 1.8.3 to check out the code to build. We run Jenkins as a Windows
> service (with a specific user account) on these build nodes. Jenkins
> allows multiple builds to execute concurrently on a single node, which
> is a feature we make use of. This all worked fairly well until the
> security updates listed at the end of this post were installed.
>

Firstly, we do not maintain msysgit and the msys-1.0.dll that exists
in it is forked from the main repository.  I use msysgit but I remove
the msys-1.0.dll from that release to use our version.

> Following the updates, when multiple jobs were started concurrently,
> ssh would occasionally hang after displaying the following error
> message: "Could not create directory 'c/Users/buildbot/.ssh'". Note
> that $HOME is set to %USERPROFILE% as a user environment variable
> (HKCU\Environment\HOME). %USERPROFILE% evaluates to C:\Users\buildbot
> in this case. I've tried setting $HOME to an absolute path instead of
> %USERPROFILE% and also setting it as a system environment variable.
> After digging a bit deeper, I found that $HOME, $TEMP, $TMP, $TMPDIR,
> and parts of $PATH were missing the leading slash or missing portions.
>
> Working environment:
>
>   HOME=/c/Users/buildbot
>   TEMP=/tmp
>   TMP=/tmp
>   TMPDIR=/tmp
>   PATH=/usr/libexec/git-core:/usr/bin:/usr/bin:/usr/mingw/bin:/c/Perl/bin:/c/Python27:/c/Windows/system32:/c/Windows:/usr/cmd
>
> Broken environment:
>
>   HOME=c/Users/buildbot
>   TEMP=c/Users/BUILDB~1/AppData/Local/Temp
>   TMP=c/Users/BUILDB~1/AppData/Local/Temp
>   TMPDIR=c/Users/BUILDB~1/AppData/Local/Temp
>   PATH=/libexec/git-core:/bin:/bin:/mingw/bin:c/Perl/bin:c/Python27:c/Windows/system32:c/Windows:/cmd
>
> However, if I run Jenkins interactively as a logged in user, instead
> of as a service, this problem does not occur at all.
>
> I wrote a Python script (attached) to roughly simulate what Jenkins
> does and demonstrate the problem. It spawns two bash processes per
> second for 10 seconds that simply check whether $HOME is readable. Any
> failures get logged to stdout (redirected to msys_test.stdout.log).
>
> This script always completes perfectly fine when run interactively
> from a command prompt. However, when run non-interactively from a
> service (using the Jenkins service launcher or nssm), 30~40% of the
> processes have a corrupt environment. The Task Scheduler is a much
> easier way to run processes non-interactively and also causes the same
> issue:
>
> 1. Create a task from the command line (username and paths will need
> to be changed):
>
>   schtasks /create /tn "MsysTest" /sc once /st 00:00 /ru buildbot /rp /tr "C:\Python27\python.exe C:\msys_test.py"
>
> 2. Run the task now. When it's complete, look for msys_test.stdout.log
> in the same directory as msys_test.py. The log files will start out
> empty and flush when the script finishes.
>
>   schtasks /run /tn "MsysTest"
>
> 3. Delete the task when you're done so it doesn't run automatically by mistake:
>
>   schtasks /delete /f /tn "MsysTest"
>
> Interestingly, if I bump it up to 8 threads at a time, typically only
> one of them fails (at most 2) and many batches have no failures at
> all.
>
> Now, before you say that msysgit uses a really old version of msys and
> isn't really supported, I performed the same tests with the current
> version of msys (msys-base 2013072300), and the situation appears to
> be worse. With the latest msys, concurrent instances become corrupt
> even when run interactively.
>

I was about to, well did already sort of.  Since you've tested with
the core msys dll I plead that you open a ticket at
http://sf.net/p/mingw/bugs with the detail for our DLL.

> My only thought is that there must be some kind of race condition or
> reentrancy issue in the msys dll (?). I can't think of anything else
> that could cause this kind of behavior. Any ideas for tracking this
> down further?

Certainly a race condition could step on the child processes.  The
fork emulation is very sensitive in that it needs to copy the parent
memory to the child memory and then wake up the child.

>
>
> Here's the list of security updates, one of which appears to have
> broken the ability for multiple concurrent instances of msys to run
> non-interactively.
>

So are you saying that the MS updates broke it?  Or maybe just made it worse.

--
Earnie
-- https://sites.google.com/site/earnieboyd

Re: Concurrent instances of msys have corrupt environments

Michael Vincent
On Fri, Sep 6, 2013 at 5:13 PM, Earnie Boyd
<[hidden email]> wrote:
> I was about to, well did already sort of.  Since you've tested with
> the core msys dll I plead that you open a ticket at
> http://sf.net/p/mingw/bugs with the detail for our DLL.

I've opened a ticket: https://sourceforge.net/p/mingw/bugs/2040/


> So are you saying that the MS updates broke it?  Or maybe just made it worse.

Prior to those security updates, msysgit 1.8.3 did not have this
environment problem when running as a non-interactive service.
Following the security updates, the problem started occurring. I then
discovered that the problem does not occur when running interactively.
Testing with the latest version of msys itself, I found that it
appears to have the same problem, only slightly worse: the problem
occurs both interactively and non-interactively. I have not been able
to try the latest version of msys without the security updates
installed.

So, it appears that the security updates either broke something
directly or caused some change in timing that msys is sensitive to.
That is just a hypothesis at this point though since I haven't been
able to rigorously test both before and after installing each security
update.

Re: Concurrent instances of msys have corrupt environments

lrn-2
On 12.09.2013 01:12, Michael Vincent wrote:

> On Fri, Sep 6, 2013 at 5:13 PM, Earnie Boyd
> <[hidden email]> wrote:
>> I was about to, well did already sort of.  Since you've tested with
>> the core msys dll I plead that you open a ticket at
>> http://sf.net/p/mingw/bugs with the detail for our DLL.
>
> I've opened a ticket: https://sourceforge.net/p/mingw/bugs/2040/
>
>
>> So are you saying that the MS updates broke it?  Or maybe just made it worse.
>
> Prior to those security updates, msysgit 1.8.3 did not have this
> environment problem when running as a non-interactive service.
> Following the security updates, the problem started occurring. I then
> discovered that the problem does not occur when running interactively.
> Testing with the latest version of msys itself, I found that it
> appears to have the same problem, only slightly worse: the problem
> occurs both interactively and non-interactively. I have not been able
> to try the latest version of msys without the security updates
> installed.

You could try MSYS2 or Cygwin (it doesn't matter which one). Cygwin also
has x86 and x86_64 versions, another variation you could test.

--
O< ascii ribbon - stop html email! - www.asciiribbon.org