It’s so hard to believe that we’re already in the final quarter of 2018. The year is flying by (although it might be all of the time I’ve spent flying these past few weeks) and I’m very proud of all the work the FreeBSD Foundation team has accomplished this past month alone. Take a moment to check out the conference recaps, the Development Projects update on building packages at scale, and how you can make an impact on the Project over the next few months. Thank you for your continued support of FreeBSD.
We can’t do this without you!
Happy reading!
Deb
October 2018 Development Projects Update
Building Packages at Scale
Package building with poudriere is a great way to stress test several parts of the kernel and expose performance bottlenecks. Using the results of an instrumented poudriere run, this article provides a short overview of the package building process and the demands it imposes on the FreeBSD kernel. While the official FreeBSD package builders typically have fewer than 32 hardware threads, the results presented below come from a larger machine that is more prone to systemic bottlenecks.
Setup
The system used for this test contains four Intel Xeon E5-4660v4 CPUs – for a total of 128 hardware threads – and 512GB RAM. It runs FreeBSD-CURRENT and builds packages for 12.0-BETA1. The build process is controlled using a custom version of poudriere enhanced with very rudimentary NUMA awareness. When building, poudriere manages 128 worker jails, each backed by a private tmpfs mount. There are 32 jails assigned to each CPU socket and each jail is restricted to that socket – processes running within a jail only ever run on the CPUs belonging to that socket.
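The article does not describe the NUMA patches themselves, but the effect they have (32 jails per socket, each restricted to that socket's CPUs) can be sketched with cpuset(1). The jail-numbering scheme and helper name below are illustrative assumptions, not poudriere's actual implementation:

```shell
#!/bin/sh
# Sketch: assume 4 sockets with 32 contiguous hardware threads each and
# worker jails numbered 0-127 (both are assumptions for illustration).

cpus_for_jail() {
    jid=$1
    socket=$((jid / 32))      # 32 worker jails per CPU socket
    first=$((socket * 32))    # first hardware thread of that socket
    last=$((first + 31))      # last hardware thread of that socket
    echo "${first}-${last}"
}

# On FreeBSD, each worker jail could then be restricted with cpuset(1), e.g.:
#   cpuset -l "$(cpus_for_jail "$jid")" -j "$jid"
```

Restricting a jail this way keeps each build's memory accesses local to one socket, which is the point of the rudimentary NUMA awareness described above.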
Package Building Statistics
During the test run there were 33956 ports queued, 30681 built, 2634 skipped, 319 failed and 323 ignored. The total run time was 09:24:16 (9 hours 24 minutes and 16 seconds), but would have been much shorter if not for a handful of long-running port builds. cad/qcad in particular finished last with a total build time of 03:20:50. The second-last port finished at 07:49:42, roughly 90 minutes before cad/qcad. This situation could be improved with smarter port build scheduling in poudriere, but that is outside the scope of this article.
Workload Description
The process of building a package starts with a pristine jail. First, all dependencies are installed and the build starts. Builds often spawn many short-lived programs like awk or sed in quick succession, inducing significant load on the process creation and destruction code paths in the kernel. Apart from that, builds are dominated by CPU-bound compilation tasks. Once a package has finished building there should be no processes left running in the jail, but buggy ports may still have some; for this reason poudriere always sends SIGKILL to any lingering processes in the jail.
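That cleanup step can be expressed from userland with pkill(1)'s jail filter on FreeBSD; a minimal sketch, where the jail ID is a hypothetical placeholder:

```shell
# Forcibly kill any process still running in the worker jail.
# On FreeBSD, pkill -j matches processes by jail ID; it exits non-zero
# when nothing matched, which is the common (clean) case after a build.
JID=101                       # hypothetical worker jail ID
pkill -9 -j "$JID" || true
```

As discussed later in the article, servicing this request currently forces the kernel to walk the entire system process list, which is one of the bottlenecks identified below.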
The package build rate is highest during the first several minutes of a poudriere run, as most ports build rather quickly. This portion of a run also exhibits the most kernel lock contention, as shown on the flame graph below. The hotspots later change as the process creation and destruction rates decrease and longer-lived processes (e.g., C++ compilers) dominate the system’s CPU usage. It should be noted this workload does not stress concurrent execution of programs in the same jail to a large extent.
General System Statistics
vmstat 300 was running for the duration of the poudriere run, dumping system statistics every 5 minutes. The results are graphed below. The sliver of idle time at the beginning of the graph stems from the preparation phase, during which poudriere computes dependencies, creates the build queue, and so on. The idle time at the end of the package building run was spent waiting for the aforementioned trailing cad/qcad port to finish building. Both page fault and syscall rates are scaled down by 1000 to fit more easily on the graph.
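For reference, the collection and the by-1000 scaling can be sketched in shell. The column number used below is an assumption and should be checked against vmstat's header on the system in question:

```shell
# Collect system statistics every 300 seconds for the whole run:
#   vmstat 300 > vmstat.log &

# Scale a chosen column down by 1000 so it fits on the same graph as the
# other counters (as done for the page fault and syscall rates here):
scale_column() {
    awk -v col="$1" '{ $col = $col / 1000; print }'
}
```

For example, `scale_column 11 < vmstat.log` would rescale column 11, if that is where the syscall rate lands in your vmstat output.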
Kernel Flamegraph
Flame graphs are a visualization tool that excels at presenting software profiling data. Lots of examples and information about them can be found here: http://www.brendangregg.com/flamegraphs.html
The flat area on the left of our flame graph demonstrates time spent executing userspace code – typically, compilers. On the right we can get a sense of how time in the kernel was spent.
A zoomable version of the flame graph can be found here.
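A kernel flame graph like this one can be produced with DTrace and the FlameGraph scripts from the page linked above; a sketch, with the sampling rate, duration, and file names being arbitrary choices rather than the article's exact setup:

```shell
# Sample kernel stacks at 997 Hz for 60 seconds; the /arg0/ predicate
# keeps only samples taken while executing kernel code.
dtrace -x stackframes=100 \
  -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' \
  -o out.kern_stacks

# Fold the stacks and render the SVG with Brendan Gregg's FlameGraph
# tools (stackcollapse.pl and flamegraph.pl from that repository):
stackcollapse.pl out.kern_stacks > out.kern_folded
flamegraph.pl out.kern_folded > kernel.svg
```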
sys_fork creates new processes, sys_execve allows them to execute binaries, and finally exit1 kills them. Afterwards they are reaped with a sys_wait4 call. Code seen starting with trap is mostly page faults – it’s the kernel allocating memory pages for processes as they access them. The most significant bottleneck at this stage of the build is pipe creation and destruction – click ‘Search’ and look for “lock_delay”. The presence of lock_delay in a stack signifies lock contention, generally stemming from the manipulation of shared data structures.
The average amount of CPU time spent in the kernel is about 16% of the total and drops as the poudriere run progresses. We are doing fine but we can do better. Note that not all lock contention is shown on this flamegraph. In some cases (like the one outlined below), the wait time is long enough that threads waiting for a lock decide to deschedule themselves so that other tasks may use the CPU.
The Biggest Bottleneck
Running the poudriere workload with lock profiling reports the following total wait times (in milliseconds) after a 2-hour run:
Standing out very significantly is the proctree lock. It is heavily used in process creation and destruction code paths and is the target of some upcoming scalability work. There are two major components contributing to the problem.
1. As mentioned earlier, poudriere sends SIGKILL to any possible leftover jail processes after each package is built. To implement this functionality the kernel currently must scan the entire system process list in order to locate any lingering processes in the target jail. This is clearly inefficient, but right now there is no kernel mechanism to do it smarter.
2. The wait system call gets called very frequently, and it always acquires this lock. However, in most cases it has nothing to do because no relevant processes have exited at the time of the call. The kernel can be modified to handle this case by only taking locks relevant to the waiting process, thus avoiding a bottleneck.
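Wait times like those above can be gathered on a kernel built with the LOCK_PROFILING option; a sketch of the relevant sysctl knobs (see the LOCK_PROFILING(9) manual page for details):

```shell
# Requires a kernel built with: options LOCK_PROFILING
sysctl debug.lock.prof.reset=1    # discard any previously collected data
sysctl debug.lock.prof.enable=1   # start collecting
# ... run the poudriere workload ...
sysctl debug.lock.prof.enable=0   # stop collecting
sysctl debug.lock.prof.stats      # per-lock statistics, including wait times
```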
Summary
We are doing fine on the package building front, but even there, there are bottlenecks to fix. Our current state is the result of a number of general scalability fixes made over the past several years, some of which specifically targeted this workload. It can’t be stressed enough that building packages alone is not a thorough scalability test of the entire system. Calling execve on a binary in 128 different jails at the same time is a very different workload from performing 128 execve calls on one binary in one jail (or on a system in general). There are known bottlenecks that manifest themselves only in the latter case and which will be the subject of future work.
Thanks to Limelight Networks for providing access to the system used for this test.
— contributed by Mateusz Guzik
Fundraising Update: Supporting the Project
First, I’d like to start out by sending a heartfelt thank you to handshake.org for their generous donation of $100,000! They join a growing list of companies like NetApp, Microsoft, Xiplink, Tarsnap, VMware, and NeoSmart Technologies that are stepping up and showing their commitment to FreeBSD!
What you don’t see in the above list are the hundreds of companies that benefit from the work we do to improve FreeBSD to keep it the innovative, secure, stable, and reliable operating system they depend on.
In the next two months, we need to raise another $800,000 to reach our 2018 fundraising goal! I know that’s a lot of money to raise in only two months, and I’m sure you’re wondering why we need so much money to support the Project. It’s because we have increased our support in critical areas of FreeBSD to provide needed resources where volunteers aren’t available. We’ve recently increased the number of software developers we fund to step in and fix issues, improve and maintain areas of the software, and improve developer tools such as continuous integration and test coverage. We’ve also expanded FreeBSD education and awareness around the world.
We’ve made a lot of progress over the last year. However, to keep the momentum moving forward and take advantage of key developers who are available right now to step in and fill critical needs, we’ve decided to dip into our reserves. Obviously, this is not sustainable, and we will have to reduce the funding to certain areas of the Project if we don’t meet our goal.
This is where you come in. We need your help to champion the work we’re doing for the Project. Help us show these companies why it’s so important to give back to FreeBSD when their companies benefit significantly from the work we do.
Here’s a more in-depth look at the critical areas we are supporting which directly benefit the companies mentioned above:
Operating System Improvements: Providing staff to immediately respond to urgent problems and implement new features and functionality allowing for the innovation and stability you’ve come to rely on.
Security: Providing engineering resources to bolster the capacity and responsiveness of the Security team, providing your users with peace of mind when security issues arise.
Release Engineering: Continuing to provide a full-time release engineer, resulting in timely and reliable releases you can plan around.
Quality Assurance: Improving and increasing test coverage, continuous integration, and automated testing with a full-time software engineer to ensure you receive the highest quality, secure, and reliable operating system.
New User Experience: Improving the process and documentation for getting new people involved with FreeBSD, and supporting those people as they become integrated into the FreeBSD Community, providing the resources you may need to get new folks up to speed.
Training: Supporting more FreeBSD training for undergraduates, graduates, and postgraduates. Growing the community means reaching people and catching their interest in systems software as early as possible and providing you with a bigger pool of candidates with the FreeBSD skills you’re looking for.
Face-to-Face Opportunities: Facilitating collaboration among members of the community, and building connections throughout the industry to support a healthy and growing ecosystem and make it easier for you to find resources when questions emerge.
Your support has already accomplished great things for the Project; please consider helping us continue the work above! If your company uses FreeBSD, please talk to them about the benefits they are currently receiving and ask them to give a financial contribution, not only to help continue the growth and health of FreeBSD, but to show their support for open source. Finally, if you haven’t yet, please consider donating today.
— contributed by Deb Goodkin
October 2018 Release Engineering Update
The FreeBSD Release Engineering team continued working on the upcoming 12.0-RELEASE. Work continued on the OpenSSL version 1.1.1 update, and we adjusted the 12.0 schedule slightly as a result.
During the month, two additional ALPHA builds were released, ALPHA9 and ALPHA10; the latter provided an additional build for testing after the OpenSSL update was merged from its project branch back to head.
The stable/12 branch was created on October 19; head was updated to 13-CURRENT, and stable/12 was switched from ALPHA to BETA, with two BETA builds released as of this writing. At least one additional BETA build is expected before the releng/12.0 branch is created, which will mark the point where 12.0 moves from the BETA phase to the RC (release candidate) phase for the remainder of the release cycle.
Should the 12.0-RELEASE schedule need to be adjusted at any time during the release cycle, the schedule on the FreeBSD project website will be updated accordingly. The current schedule is available at: https://www.freebsd.org/releases/12.0R/schedule.html
— contributed by Glen Barber
EuroBSDcon 2018 Conference Recap
Last month I attended EuroBSDcon and the preceding FreeBSD Developer Summit. Both events were held at the University Politehnica of Bucharest, Romania. The two-day developer summit was a productive, face-to-face opportunity for contributors from various parts of the Project to discuss ideas and work together on plans and solutions for improving FreeBSD, its processes, and developer support.
Sean Chittenden, FreeBSD Core team member, led the summit on the second day. Topics for discussion ranged from generating Google Summer of Code project ideas and how to help get more contributors to step in as mentors, to the technologies and functionality that really need to be added to FreeBSD. Sometimes, the discussions would involve “we need funding” to get this work done, and everyone would look at me. Over half our budget goes to software development work, and being able to listen to the compelling arguments about the proposed work and why it needs to be done really helps us determine what work we should fund. Read more…
— contributed by Deb Goodkin
Grace Hopper Celebration 2018 Recap
In our continuing efforts to recruit more women to FreeBSD, Dru Lavigne and I attended the Grace Hopper Celebration last month. They claim it’s the largest gathering of women technologists in the world, and I believe it! This was the fourth time we’ve had a table in the job fair to recruit people to the Project. Where else can you find 20,000 people, in one place, who have (or can quickly pick up) the skills to contribute to our Project?
I flew directly to Houston from Bucharest, where I attended EuroBSDcon 2018 (check out my article on that conference in this newsletter!). So, I started off the conference on Eastern European time and was suffering from jet lag. Nothing a double espresso couldn’t counter! We started off Wednesday morning quickly setting up our booth for the rush of women we knew were going to stop by. Read more…
The September/October 2018 issue of the FreeBSD Journal is now available. Don’t miss the Networking focused issue.
Sample Issue! If you’ve ever wanted to read through an entire issue of the FreeBSD Journal, now’s your chance. Download the sample issue and be sure to share with your friends and colleagues.
ScaleEngine is a global video streaming and content delivery network. We specialize in delivering live and on-demand video streams to users from servers near them for lower latency to ensure the best viewing experience. Our small team of developers and systems administrators manage more than 100 servers in 38 data centers spread across 11 countries. Achieving this level of scale with such a small team is only possible because of the extensive level of documentation, observability, monitoring, and automation tooling available for FreeBSD. The ready supply of expertise also made FreeBSD especially attractive.
ScaleEngine uses FreeBSD throughout its entire infrastructure. With more than 1000 TB of storage to manage, access to frequently updated OpenZFS is a critical component to our products. We also make extensive use of bhyve, jails, and the general extensibility of FreeBSD.
ScaleEngine sponsors the FreeBSD Foundation to ensure that we have access to regular timely releases and that the FreeBSD project gets the support it needs to flourish.