NOTE: The Career Fair starts at 10:00a and runs until at least 3:00p
8:00a - Registration, Continental Breakfast & Career Fair
Keynote: 9:00a
Jim Jagielski (ConsenSys / Apache HTTP & Tomcat): Why Open Source is Vital in IT
Abstract. At its core, any IT development done in today's environment needs to leverage, consume, and involve open source. This session will describe what open source is, why open source works, and how to fully engage with the open source community. Whether you are already knowledgeable about open source or need an introduction to the topic, this session is for you. It is suitable for developers as well as managers and CXO-level execs.
Dawn of the Code War: Book Signing
John P. Carlin, former Assistant Attorney General for the US Department of Justice’s (DOJ) National Security
Division (NSD), chairs Morrison & Foerster’s Global Risk + Crisis Management practice and co-chairs the
National Security practice, where he advises industry-leading organizations in sensitive cyber- and other
national security matters. He is the author of Dawn of the Code War: America’s Battle Against Russia,
China, and the Rising Global Cyber Threat, which provides an inside look into how we combat daily attacks
on United States companies, citizens and government. Prior to serving as the DOJ’s highest-ranking
national security lawyer, Mr. Carlin served as Chief of Staff and Senior Counsel to FBI Director Robert
S. Mueller, III. Under his leadership, the NSD launched nationwide outreach across industries to raise
awareness of national security, cyber- and espionage threats against US companies and encourage greater
C-suite involvement in corporate cybersecurity matters. Mr. Carlin also chairs the Aspen Institute’s
Cybersecurity and Technology policy program, which provides a cross-disciplinary forum for industry,
government, and media to address the rapidly developing landscape of digital threats and craft
appropriate policy solutions.
Sessions:
9:15a
Room A: The Apache Way: The Heart and Soul of the ASF [Apache Track], Jim Jagielski (ConsenSys / Apache HTTP & Tomcat)
Abstract. The Apache Way is the heart and soul of how the ASF runs and how all Apache projects operate. Learn not only the core tenets of the Apache Way, but also the history and rationale behind them.
Room B: Hacks for getting a great job that's right for you [Career Track], Michael Rizzo (BMC Software)
Abstract. There are a lot of bad, OK, or even 'decent' jobs out there, but how do you find a great job for you? We will explore how to create your own description of a 'great job', how to find companies and jobs that will be a good fit, and then how to hack past the automated systems with job-description bingo and how to bypass gatekeepers to get to the right person who can hire you. This presentation and discussion is aimed at both new candidates and people who are in 'OK' jobs right now. A key focus is how relevant, modern open source knowledge and experience can make you stand out from the crowd.
9:45a
Room A: From no OSS contributions to Apache PMC in 16 months [Apache Track], Rob Tompkins (Capital One / Apache Commons)
Abstract. I started making open source contributions in March of 2016, and by July of 2017 I was on the Project Management Committee for Apache Commons. I will explain why I decided to get involved in open source development and how I figured out how to make my first contributions. Further, I will talk about how my open source work has fostered my personal career development.
Room B: You want to be at a company that gives its code away [Career Track], Gil Yehuda (Verizon Media)
Abstract. An impassioned pitch as to why open source is critical for business success and why you, the job seeker, want to work at a company that gets the meaning of open source. I'll draw upon more than 8 years of running an open source program at a large tech company to explain why giving away software is the key to growing the right technology infrastructure.
10:15a - Mid-morning Coffee Break @ Sponsor Booths
10:45a
Room A: Hadoop {Submarine} Project: Running deep learning workloads on YARN [Apache Track], Tim Spann (Cloudera)
Abstract. Deep learning is useful for enterprise tasks in fields such as speech recognition, image classification, AI chatbots, and machine translation, to name a few. To train deep learning/machine learning models, applications such as TensorFlow, MXNet, Caffe, and XGBoost can be leveraged, and sometimes these applications are used together to solve different problems.
To make distributed deep learning/machine learning applications easy to launch, manage, and monitor, the Hadoop community has introduced the Submarine project along with other improvements such as first-class GPU support, container-DNS support, and scheduling improvements. These improvements make running distributed deep learning/machine learning applications on YARN as simple as running them locally, which lets machine-learning engineers focus on algorithms instead of worrying about underlying infrastructure. With these improvements, YARN can also better manage a shared cluster that runs deep learning/machine learning alongside other services and ETL jobs.
In this session, we will take a closer look at the Submarine project as well as these other improvements, and show how to run deep learning workloads on YARN with demos. The audience can start trying these workloads on YARN after this talk.
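Illustrative sketch (not part of the session abstract): the kind of TensorFlow worker script one might package and submit to a YARN cluster via Submarine. The dataset and model are toy placeholders, and the Submarine launch command itself is not shown; it is assumed the launcher sets TF_CONFIG for each worker.

# Hypothetical worker script; assumes TensorFlow 2.x.
import tensorflow as tf

strategy = tf.distribute.MultiWorkerMirroredStrategy()  # reads TF_CONFIG set by the launcher

def make_dataset():
    # Toy in-memory data standing in for a real distributed input pipeline (e.g. HDFS).
    x = tf.random.normal((1024, 32))
    y = tf.cast(tf.reduce_sum(x, axis=1) > 0, tf.float32)
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(64).repeat()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(make_dataset(), steps_per_epoch=100, epochs=2)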
Room B: Your career is what you make it, so what will you make? [Career Track], Nakia Powell (Intel)
Abstract. Ready to learn how to become the driver of your career? Now is the time to stop dreaming about the career you want and put your best foot forward toward having not only the career you envision but the life you deserve. If you are ready to take control of your life as an active participant and end the cycle of letting others dictate and drive decisions about your career and, most importantly, your life, then this talk is for you! Many people dream of the careers they want with no real motivation or confidence to see them manifest fully. Join this talk to learn firsthand how to end the cycle of an unfulfilled career and live the life you want to live!
11:15a
Room A: Need for speed: Boosting Apache Cassandra's performance using Netty [Apache Track], Dinesh Joshi (IEEE / Apache Cassandra)
Abstract. Apache Cassandra 4.0 has several enhancements. One of the biggest is switching from blocking network IO using JDK sockets to non-blocking IO with Netty. As a result, Cassandra has seen gains in performance and efficiency. These gains translate into real-world costs and allow Cassandra to scale better. This presentation will take you on a tour of the improvements to Cassandra's network layer (old & new) and help quantify the gains in real-world terms.
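Illustrative sketch (not part of the session abstract): the non-blocking theme also shows up on the client side. A minimal example using the DataStax Python driver's asynchronous execution; the host is a hypothetical local node.

# Requires: pip install cassandra-driver; assumes a reachable Cassandra node.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# execute_async returns immediately; the driver multiplexes requests over its
# connections instead of blocking a thread per query.
future = session.execute_async("SELECT release_version FROM system.local")
row = future.result().one()
print("Cassandra version:", row.release_version)

cluster.shutdown()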
Room B: Digital Artifact Persistence, Extraction, Analysis, and Manipulation [CARE Track], Jim Jones (George Mason University)
Speaker bio. James 'Jim' Jones has been a cyber security and digital forensics practitioner, researcher, and educator for over 20 years in industry, government, and academia. That experience drives his teaching, which blends theory and practical applications, and his research, which focuses on the extraction, analysis, and manipulation of partial digital artifacts. These digital fragments are the remnants of past actions and processes. Jim and his students spend their days (and nights) collecting and analyzing these fragments, much like a traditional archaeologist works with fragments of pottery or stone tools. This analysis enables us to look backwards in time to understand cyber attacks, find malware infections, detect system and device misuse, and recover lost data.
Jim's research funding comes from industry and the US Government. Past and current funded research sponsors include the Defense Advanced Research Projects Agency (DARPA), the U.S. Department of Homeland Security (DHS), the National Science Foundation (NSF), and the United States Department of Defense (DoD). His research interests are focused on digital artifact extraction, analysis, and manipulation, and on offensive cyber deception in adversarial environments. He has degrees in Systems Engineering (BS), Mathematical Sciences (MS), and Computational Sciences and Informatics (PhD). This formal education is complemented by work experience and extensive self-learning, driven by an insatiable curiosity and a need to know how things work, how they break, and what we can learn from both.
Abstract. Digital data dies an uncertain death. Delete a file today, and the content might be entirely destroyed immediately, or some of it may survive for a few seconds, hours, days, or longer. For a forensic investigator, this is good news: residual fragments of a deleted file might be recoverable days, months, even years after the file was deleted. But why do some fragments persist while others do not, what can we infer from the fragments that we can recover, and can such fragments be artificially created or modified? In this talk, I will discuss our efforts to understand the patterns and mechanisms of deleted digital data decay, the analysis and interpretation of recovered fragments, and techniques for the manipulation of digital fragments under various circumstances.
11:45a
Room A: Building a Better Knowledge Graph Pipeline [Apache Track], Nathan Maynes (Thomson Reuters Special Services)
Abstract. Storing relationships in a graph provides the flexibility to represent dynamic relationships in the real world. That flexibility tempts stakeholders to put everything into the graph and hope for something magic. This presentation will provide an overview of lessons learned building a Solr-powered knowledge graph for global supply chains. It will focus on how Apache NiFi can be used to maintain strong data management and keep your data lake from turning into a swamp.
Room B: Using OSS to enhance Cyber datasets [Cyber Track], Eddie Satterly (DataNexus)
Abstract. The talk will focus on using OSS to automate the flow of data from varying and new sources, and to process it in stream, to create a richer store of data for cybersecurity events. If you want to learn how to collect new sources, including IoT, relational, and event streams, and how to combine them in stream to create new capabilities, this talk is for you.
12:15p - 1:15p: Lunch Talks
Room A: About Docker Container(s), Srinivasa "Naga" Kadiyala (Deloitte)
Abstract. A brief and practical introduction to Docker: what Docker is, why it is used, and what it does. The objective is to help new open source enthusiasts dive into the Docker world, point them in the right direction to become Docker professionals, and guide them with next steps for making their Docker journey successful.
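Illustrative sketch (not part of the session abstract): a first taste of Docker driven from Python with the Docker SDK. It assumes a local Docker daemon and that the alpine image can be pulled.

# Requires: pip install docker
import docker

client = docker.from_env()  # talks to the local Docker daemon

# Run a throwaway container and capture its output.
output = client.containers.run("alpine:3.19", ["echo", "hello from a container"], remove=True)
print(output.decode().strip())

# List images known to the daemon.
for image in client.images.list():
    print(image.tags)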
Room B: Fast and Flexible Data Integrations with Apache Airflow, Michael Ghen (Benefits Data Trust)
Abstract. Apache Airflow is an open source workflow management system written in Python and originally built at Airbnb. Its programming model allows for the rapid development of integrations with just about any information system using Python. This talk is about Apache Airflow's programming model and how it has been used to successfully integrate with information systems at federal and state agencies. In this talk, we will provide an overview of Apache Airflow, review its programming model and use cases, and walk through two real-world examples that show how Airflow is being used for data integrations between information systems. Apache Airflow is an easy-to-use task scheduler that anyone working on internal or external data integrations should consider. With over 500 contributors and 200 users (including banks, hospitals, and unicorn startups), Apache Airflow is an open source software project worth exploring.
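Illustrative sketch (not taken from the talk): a minimal Airflow DAG showing the programming model described above. The task bodies are placeholders, and note that the PythonOperator import path moved to airflow.operators.python in Airflow 2+.

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # airflow.operators.python in Airflow 2+

def extract():
    print("pulling records from the source system")  # placeholder for real extraction logic

def load():
    print("writing records to the target system")  # placeholder for real load logic

with DAG(
    dag_id="example_integration",
    start_date=datetime(2019, 3, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task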
(Location to be determined): Keysigning Party, Tim Allison (MITRE / Apache Lucene/Solr, Apache Tika, Apache POI and Apache PDFBox)
Abstract. We'll do a keysigning party during lunch at the Roadshow! To speed things up, email your key to tallison <at sign> apache.org by 5PM (EDT) on March 24th. Please bring a pen and an ID to the signing. See https://wiki.apache.org/apachecon/PgpKeySigning for the process and background.
(WebEx: https://wso2.zoom.us/my/nuwanbando): Ballerina: Cloud Native Programming Language, Nuwan Bandara (WSO2)
Abstract. Crazy customer demand has caused companies like Google and Amazon to build massively disaggregated architectures in order to scale. Massively disaggregated approaches like microservices, serverless, and APIs are becoming the norm for us all. These disaggregated components are network accessible as programmable endpoints. The apps we write increasingly depend upon these endpoints. Integration is the discipline of resilient communication between endpoints. It isn't easy. The challenges include compensation, transactions, events, circuit breakers, discovery, protocol handling, and mediation.
Ballerina (https://ballerina.io/) makes it easy to write resilient services that orchestrate and integrate across distributed endpoints. It's a language that understands protocols, endpoints, transactions, workers, JSON, XML, asynchrony, and tainted data as first-class constructs. Ballerina is a compiled language with its own VM technology based upon a custom JVM. Services execute as servers, microservices, and serverless functions packaged for deployment into any infrastructure, including Kubernetes. It's the first language that understands the architecture around it: the compiler is environment aware, and includes or integrates circuit breakers, identity servers, API gateways, and message brokers.
This session will cover Ballerina's language and runtime model while building a variety of integrations. We'll also cover how the Ballerina open source community operates and how you can get involved.
Afternoon sessions:
1:15p
Room A: CARE: Local Governments, Cybersecurity Governance and Open Source Software [CARE Track], J.P. Auffret (George Mason University)
Abstract. A description of CARE (Center for Assurance Research and Engineering) projects. Targeted audience: the general public, students, and professionals who are interested in CARE's work.
Room B: Schemas, Records, and Registries with Apache NiFi [Apache Track], Bryan Bende (Cloudera / Apache NiFi)
Abstract. This talk will introduce Apache NiFi's "record" abstraction, which provides a powerful way of treating common data formats such as JSON, CSV, and Avro as a sequence of records. This approach typically unlocks significant performance benefits and often greatly simplifies many dataflows. We will start by discussing the role of schemas and introduce NiFi's record reader & writer concept, along with the various options for accessing a schema. We will then walk through simple examples demonstrating how to use the most popular record processors for performing operations such as conversion, updating, merging, and partitioning. Finally, we will discuss the options for integrating NiFi with enterprise schema registries, and how this can be used across an end-to-end streaming platform.
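Illustrative sketch (outside NiFi itself): the schema-driven record idea in miniature. Once a schema is defined, the same records can be written and read back in a compact format such as Avro; the library, schema, and field names here are illustrative, not NiFi APIs.

# Requires: pip install fastavro
from io import BytesIO

from fastavro import parse_schema, reader, writer

schema = parse_schema({
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "source", "type": "string"},
    ],
})

records = [{"id": 1, "source": "sensor-a"}, {"id": 2, "source": "sensor-b"}]

buf = BytesIO()
writer(buf, schema, records)           # serialize the records as Avro
buf.seek(0)
print([rec for rec in reader(buf)])    # read them back as plain dicts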
1:45p
Room A: Cloud Native Threat Modelling at Data-Center Scale [Cyber Track], Jay Vyas (Platform 9)
Abstract. Do you know how many critical CVEs are exposed as running images in your data center right now? If so, do you know how often that number fluctuates (and how often you should recalculate it)? As Kubernetes increasingly becomes the platform on which ASF components such as Spark, Mesos, and Airflow run on a continuous basis, the ability to model incoming threats embedded in containers, at some image layer, becomes an increasingly critical step in the continuous delivery workflow. We introduce the open-source perceptor project for real-time annotation of vulnerabilities, tooling for autoscaling vulnerability detection using the Kubernetes API, and a standalone threat modelling project for simulating vulnerability churn in data centers with hundreds of users over time scales of hundreds of days. We demonstrate the power of this approach by analyzing a collection of ASF artifacts (Hadoop, ZooKeeper, Solr, Tomcat) on large clusters, alongside time-series analysis of the ensuing vulnerability status. In addition, we demonstrate how to integrate free and open source scanning into open source projects, using ASF Bigtop as an example.
Room B: Apache cTAKES - NLP in Healthcare [Gov't Track], Alexandru Zbarcea (Fannie Mae)
Abstract. Improve your skills with one of the hot innovations in the industry: Natural Language Processing. Apache cTAKES bridges the gap between humans and machines by transforming clinical thinking from provider notes into semi-structured data that can be processed by machines. This talk will also include a demo analyzing and visualizing healthcare records using Apache cTAKES, running in Docker. From A-Z in 5 minutes.
2:15p
Room A: The Anatomy of a Secure Java Web App Using Apache Fortress [Cyber Track], Shawn McKinney (Symas / Apache Directory)
Abstract. The Jakarta EE architecture provides the necessary enablement, but most developers do not have the time or the training to take full advantage of what it has to offer. This technical session describes various techniques and principles surrounding the topic of application security best practices on the Java platform. It includes practical, hands-on guidance for implementing security controls with Java, Spring, and Apache Fortress. In addition to finding out where the controls must be placed and why, attendees will be provided code to kick-start their own highly secure Java web apps.
Room B: Apache's role in Healthcare and Disaster Relief [Gov't Track], Hadrian Zbarcea (apifocal / Apache ActiveMQ)
Abstract. Apache is the open source powerhouse where talented engineers collaborate to build best-of-breed technologies. While it is recognized that Hadoop and Spark dominate the big data space, it is less known that the integration stack (Kafka, ActiveMQ, CXF, Karaf) provides the infrastructure backbone for many mission- or safety-critical projects (e.g. FAA SWIM). This talk provides an overview of what makes Apache projects ideal for large integration projects, with a focus on Healthcare, the ONC National Healthcare Information Network, and Disaster Relief Infrastructure. The presentation also includes a demo and concrete, practical advice for students to get involved in OSS and use their talents to build the infrastructure needed by their generation.
2:45p - Mid-Afternoon Snack @ Sponsor Booths; Happy 20th Anniversary, ASF.
3:15p
Room A: Drilling Security Data: Rapid Analysis of Security Data with Apache Drill [Cyber Track], Charles S. Givre (Deutsche Bank)
Abstract. Security data is often challenging to analyze because it comes in a variety of formats that are difficult and time-consuming to parse, requiring a myriad of tools to analyze. Additionally, bringing this data together to correlate multiple data sets can be difficult and extremely time-consuming. But what if all your data could be queried with a single tool, using a common language? This talk will demonstrate how to use Apache Drill's enormous analytic power on security data sets and how to visualize this data using Apache Superset.
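Illustrative sketch (not part of the session abstract): querying a security log file through Drill's REST API from Python. The URL, file path, and field names are hypothetical; adjust them for your Drill deployment and storage plugin configuration.

# Requires: pip install requests; assumes Drill's web server on the default port 8047.
import requests

DRILL_URL = "http://localhost:8047/query.json"

sql = """
SELECT src_ip, COUNT(*) AS hits
FROM dfs.`/data/security/proxy_logs.json`
GROUP BY src_ip
ORDER BY hits DESC
LIMIT 10
"""

resp = requests.post(DRILL_URL, json={"queryType": "SQL", "query": sql})
resp.raise_for_status()
for row in resp.json().get("rows", []):
    print(row)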
Room B: Search Relevance Tuning [Gov't Track], Tim Allison (MITRE / Apache Lucene/Solr, Apache Tika, Apache POI and Apache PDFBox)
Abstract. Enterprise search is notoriously challenging and, out of the box, quite disappointing. In this talk, I'll offer lessons learned and some basic steps taken to help several federal agencies improve their search systems. While the focus will be on Apache Solr as the backend, the lessons and steps apply to any search implementation.
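Illustrative sketch (not part of the session abstract): relevance tuning in Solr often starts with the eDisMax parser and field boosts. The core name, fields, and boost values below are hypothetical starting points meant to be refined against real relevance judgments.

# Requires: pip install requests; assumes a Solr core named "documents".
import requests

SOLR_SELECT = "http://localhost:8983/solr/documents/select"

params = {
    "q": "records retention policy",
    "defType": "edismax",   # eDisMax query parser
    "qf": "title^4 body",   # weight title matches above body matches
    "pf": "title^2",        # extra boost when the phrase occurs in the title
    "mm": "2<75%",          # minimum-should-match for longer queries
    "fl": "id,title,score",
    "rows": 10,
}

response = requests.get(SOLR_SELECT, params=params).json()
for doc in response["response"]["docs"]:
    print(doc["score"], doc["title"])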
3:45p
Room A: Locking Down Apache Tomcat: Practical Security for Real-world Applications [Cyber Track], Christopher Schultz (Total Child Health, Inc. / Apache Tomcat & Apache Velocity)
Abstract. Out of the box, Apache Tomcat is quite secure. Then you need to configure it to suit your environment, connect your data sources, and deploy your applications. Those processes can potentially reduce the security of the entire system. A thorough review of your host, network, application, and configuration is necessary to identify those areas where your security needs improvement. We'll discuss each of these areas in some detail and how some simple tweaks and tools can make you and your users safer.
Room B: Securely Consuming Government Data with Apache Daffodil and DFDL [Gov't Track], Steve Lawrence (Tresys / Apache Daffodil)
Abstract. The government handles vast amounts of complex and legacy data across security boundaries every day. But before such data can be consumed, it must be inspected for correctness and sanitized of malicious content. Traditional methods rely on software libraries or custom one-off data parsers and sanitizers, which are often buggy, incomplete, proprietary, and poorly maintained.
The Apache Daffodil project aims to solve these problems by creating an open-source implementation of the Data Format Description Language (DFDL) specification. DFDL provides a way to fully describe a wide array of complex and legacy file formats down to the bit level. Using a DFDL data format description, Daffodil can parse data into XML or JSON, allowing for validation, sanitization, and transformation using existing and well understood technologies. Daffodil can also serialize or "unparse" XML/JSON back to the original file format, effectively mitigating a large variety of common vulnerabilities.
This talk will discuss the DFDL specification, the open-source Apache Daffodil project, and how it can be used to securely consume government data.
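Illustrative sketch (not part of the session abstract): one way to drive the Daffodil command-line tool from Python. The schema and data file names are hypothetical, and the exact CLI options should be checked against your Daffodil release.

# Assumes the Apache Daffodil CLI is on the PATH; schema and input file are hypothetical.
import subprocess

result = subprocess.run(
    ["daffodil", "parse", "--schema", "pcap.dfdl.xsd", "capture.bin"],
    capture_output=True, text=True, check=True,
)

xml_infoset = result.stdout  # XML infoset, ready for validation and sanitization with standard XML tooling
print(xml_infoset[:200])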
4:15p
Room A: Government IT Modernization in Practice: How Apache Ignite Adds Speed, Scale and Agility to Applications, APIs and Real-time Analytics [Gov't Track], Glenn Wiebe (GridGain)
Abstract. The growth in data volume and velocity, the aging of government IT infrastructure, and the continual drive to do more with less are putting added stress on existing data sources, application development teams, and the applications, tools, and clients that consume this data. IT modernization increasingly targets legacy databases and the roadblocks they have become to delivering faster analytics while holding more volume and more varieties of data. Modernization projects are being asked to deliver effective architectures and development patterns for new applications, such as modern web apps, and new tools for analytics, machine learning, and deep learning. In-memory computing can deliver the real-time performance that modern apps and tools expect, at the massive scale that existing and expanding data sources present, and at the speed needed to drive digital transformation founded on the modern analytics and tools your clients expect.
Topics to be covered are:
* Add speed and scale to existing applications using relational or NoSQL databases, with no rip-and-replace
* Perform real-time and streaming analytics at scale, with details of how companies use Apache Ignite and Apache Spark
* Address performance issues with Hadoop and "SQL" for real-time reporting
* Adopt machine and deep learning, both model training and execution, including TensorFlow
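Illustrative sketch (not part of the session abstract): a quick look at Apache Ignite's Python thin client. The host, port, cache name, and values are hypothetical.

# Requires: pip install pyignite; assumes an Ignite node with the thin-client port open.
from pyignite import Client

client = Client()
client.connect("127.0.0.1", 10800)  # default thin-client port

cache = client.get_or_create_cache("citizen_requests")
cache.put(42, "open")
print(cache.get(42))  # -> "open"

client.close()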
Room B: Apache Rya – A Scalable RDF Triple Store [Gov't Track], Adina Crainiceanu (US Naval Academy / Apache Rya)
Abstract. The Resource Description Framework (RDF) is a standard model for storing graph data. While the standard was initially created for storing metadata about the World Wide Web, its flexible format made it a popular choice for storing many different types of information. With the explosive increase in the size of available data, scalable solutions are needed to efficiently store and query very large RDF graphs within big data architectures. Apache Rya (incubating) is a scalable database management system designed for storing and searching very large RDF data. Rya is built on top of Apache Accumulo. Originally developed by the Laboratory for Telecommunication Sciences and the US Naval Academy, Rya is currently being used by a number of government agencies and commercial companies. In this talk, we present storage methods, primary and secondary indexing schemes, statistics-based query optimization, and query evaluation techniques that allow Rya to scale to billions of triples across multiple nodes, while providing fast and easy access to data through conventional query mechanisms such as SPARQL.
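Illustrative sketch (not part of the session abstract): querying an RDF store over the SPARQL protocol from Python. The endpoint URL and predicate are hypothetical; point it at whatever SPARQL endpoint your Rya deployment exposes.

# Requires: pip install SPARQLWrapper
from SPARQLWrapper import JSON, SPARQLWrapper

sparql = SPARQLWrapper("http://localhost:8080/rya/sparql")  # hypothetical endpoint URL
sparql.setQuery("""
    SELECT ?person ?org
    WHERE { ?person <http://example.org/worksFor> ?org . }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["person"]["value"], binding["org"]["value"])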
4:45p - Endnote by Kevin A. McGrail: Open Source Challenges
5:15p - Post-Event Happy Hour at Oh George's (cash bar, food & non-alcoholic drinks) - https://ohgeorge.com/ - $2 off drafts and wine by the glass
* Schedule subject to change.