NOTE: The Career Fair starts at 10:00a and runs until at least 3:00p
8:00a - Registration, Continental Breakfast & Career Fair
Keynote: 9:00a
Jim Jagielski (ConsenSys / Apache HTTP & Tomcat): Why Open Source is Vital in IT
Abstract. At its core, any IT development done in today's environment needs to leverage, consume, and involve open source. This session will describe what open source is, why open source works, and how to fully engage with the open source community. Whether you are already knowledgeable about open source or need an introduction to the topic, this session is for you. It is suitable for developers as well as managers and CXO-level execs.
Dawn of the Code War: Book Signing
John P. Carlin, former Assistant Attorney General for the US Department of Justice’s (DOJ) National Security
Division (NSD), chairs Morrison & Foerster’s Global Risk + Crisis Management practice and co-chairs the
National Security practice, where he advises industry-leading organizations in sensitive cyber- and other
national security matters. He is the author of Dawn of the Code War: America’s Battle Against Russia,
China, and the Rising Global Cyber Threat, which provides an inside look into how we combat daily attacks
on United States companies, citizens and government. Prior to serving as the DOJ’s highest-ranking
national security lawyer, Mr. Carlin served as Chief of Staff and Senior Counsel to FBI Director Robert
S. Mueller, III. Under his leadership, the NSD launched nationwide outreach across industries to raise
awareness of national security, cyber- and espionage threats against US companies and encourage greater
C-suite involvement in corporate cybersecurity matters. Mr. Carlin also chairs the Aspen Institute’s
Cybersecurity and Technology policy program, which provides a cross-disciplinary forum for industry,
government, and media to address the rapidly developing landscape of digital threats and craft
appropriate policy solutions.
Sessions:
9:15a
Room A: The Apache Way: The Heart and Soul of the ASF [Apache Track], Jim Jagielski (ConsenSys / Apache HTTP & Tomcat)
Abstract. The Apache Way is the heart and soul of how the ASF runs and how all Apache projects operate. Learn not only the core tenets of the Apache Way, but also the history and rationale behind them.
Room B: Hacks for getting a great job that's right for you [Career Track], Michael Rizzo (BMC Software)
Abstract. There are a lot of bad, OK, or even 'decent' jobs out there, but how do you find a great job for you? We will explore how to create your own description of a 'great job', how to find companies and jobs that will be a good fit, and then how to hack past the automated systems with job-description bingo and how to bypass gatekeepers to get to the right person who can hire you. This presentation and discussion is aimed at both new candidates and people who are in 'OK' jobs right now. A key focus is how relevant, modern open source knowledge and experience can make you stand out from the crowd.
9:45a
Room A: From no OSS contributions to Apache PMC in 16 months [Apache Track], Rob Tompkins (Capital One / Apache Commons)
Abstract. I started making open source contributions in March of 2016, and by July of 2017 I was on the Project Management Committee for Apache Commons. I will explain why I decided to get involved in open source development and how I figured out how to make my first contributions. Further, I will talk about how my open source work has fostered my personal career development.
Room B: You want to be at a company that gives its code away [Career Track], Gil Yehuda (Verizon Media)
Abstract. An impassioned pitch as to why open source is critical for business success and why you, the job seeker, want to work at a company that gets the meaning of open source. I'll draw upon more than 8 years of running an open source program at a large tech company to explain why giving away software is the key to growing the right technology infrastructure.
10:15a - Mid-morning Coffee Break @ Sponsor Booths
10:45a
Room A: Hadoop {Submarine} Project: Running deep learning workloads on YARN [Apache Track], Tim Spann (Cloudera)
Abstract. Deep learning is useful for enterprise tasks in fields such as speech recognition, image classification, AI chatbots, and machine translation, to name a few. To train deep learning/machine learning models, applications such as TensorFlow, MXNet, Caffe, and XGBoost can be leveraged, and sometimes these applications are used together to solve different problems.
To make distributed deep learning/machine learning applications easy to launch, manage, and monitor, the Hadoop community has introduced the Submarine project along with other improvements such as first-class GPU support, container-DNS support, and scheduling improvements. These improvements make running distributed deep learning/machine learning applications on YARN as simple as running them locally, which lets machine-learning engineers focus on algorithms instead of worrying about underlying infrastructure. With these improvements, YARN can also better manage a shared cluster that runs deep learning/machine learning alongside other services and ETL jobs.
In this session, we will take a closer look at the Submarine project as well as these other improvements, and show how to run deep learning workloads on YARN with demos. The audience can start trying these workloads on YARN after this talk.
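Illustrative sketch (not part of the session abstract): the kind of TensorFlow worker script one might package and submit to a YARN cluster via Submarine. The dataset and model are toy placeholders, and the Submarine launch command itself is not shown; it is assumed the launcher sets TF_CONFIG for each worker.

# Hypothetical worker script; assumes TensorFlow 2.x.
import tensorflow as tf

strategy = tf.distribute.MultiWorkerMirroredStrategy()  # reads TF_CONFIG set by the launcher

def make_dataset():
    # Toy in-memory data standing in for a real distributed input pipeline (e.g. HDFS).
    x = tf.random.normal((1024, 32))
    y = tf.cast(tf.reduce_sum(x, axis=1) > 0, tf.float32)
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(64).repeat()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(make_dataset(), steps_per_epoch=100, epochs=2)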
Room B: Your career is what you make it, so what will you make? [Career Track], Nakia Powell (Intel)
Abstract. Ready to learn how to become the driver of your career? Now is the time to stop dreaming about the career you want and put your best foot forward toward having not only the career you envision but the life you deserve. If you are ready to take control of your life as an active participant and end the cycle of letting others dictate and drive decisions about your career and, most importantly, your life, then this talk is for you! Many people dream of the careers they want with no real motivation or confidence to see them manifest fully. Join this talk to learn firsthand how to end the cycle of an unfulfilled career and live the life you want to live!
11:15a
Room A: Need for speed: Boosting Apache Cassandra's performance using Netty [Apache Track], Dinesh Joshi (IEEE / Apache Cassandra)
Abstract. Apache Cassandra 4.0 has several enhancements. One of the biggest is switching from blocking network IO using JDK sockets to non-blocking IO with Netty. As a result, Cassandra has seen gains in performance and efficiency. These gains translate into real-world costs and allow Cassandra to scale better. This presentation will take you on a tour of the improvements to Cassandra's network layer (old & new) and help quantify the gains in real-world terms.
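Illustrative sketch (not part of the session abstract): the non-blocking theme also shows up on the client side. A minimal example using the DataStax Python driver's asynchronous execution; the host is a hypothetical local node.

# Requires: pip install cassandra-driver; assumes a reachable Cassandra node.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# execute_async returns immediately; the driver multiplexes requests over its
# connections instead of blocking a thread per query.
future = session.execute_async("SELECT release_version FROM system.local")
row = future.result().one()
print("Cassandra version:", row.release_version)

cluster.shutdown()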
Room B: Digital Artifact Persistence, Extraction, Analysis, and Manipulation [CARE Track], Jim Jones (George Mason University)
Speaker bio. James 'Jim' Jones has been a cyber security and digital forensics practitioner, researcher, and educator for over 20 years in industry, government, and academia. That experience drives his teaching, which blends theory and practical applications, and his research, which focuses on the extraction, analysis, and manipulation of partial digital artifacts. These digital fragments are the remnants of past actions and processes. Jim and his students spend their days (and nights) collecting and analyzing these fragments, much like a traditional archaeologist works with fragments of pottery or stone tools. This analysis enables us to look backwards in time to understand cyber attacks, find malware infections, detect system and device misuse, and recover lost data.
Jim's research funding comes from industry and the US Government. Past and current funded research sponsors include the Defense Advanced Research Projects Agency (DARPA), the U.S. Department of Homeland Security (DHS), the National Science Foundation (NSF), and the United States Department of Defense (DoD). His research interests are focused on digital artifact extraction, analysis, and manipulation, and on offensive cyber deception in adversarial environments. He has degrees in Systems Engineering (BS), Mathematical Sciences (MS), and Computational Sciences and Informatics (PhD). This formal education is complemented by work experience and extensive self-learning, driven by an insatiable curiosity and a need to know how things work, how they break, and what we can learn from both.
Abstract. Digital data dies an uncertain death. Delete a file today, and the content might be entirely destroyed immediately, or some of it may survive for a few seconds, hours, days, or longer. For a forensic investigator, this is good news: residual fragments of a deleted file might be recoverable days, months, even years after the file was deleted. But why do some fragments persist while others do not, what can we infer from the fragments that we can recover, and can such fragments be artificially created or modified? In this talk, I will discuss our efforts to understand the patterns and mechanisms of deleted digital data decay, the analysis and interpretation of recovered fragments, and techniques for the manipulation of digital fragments under various circumstances.
11:45a
Room A: Building a Better Knowledge Graph Pipeline [Apache Track], Nathan Maynes (Thomson Reuters Special Services)
Abstract. Storing relationships in a graph provides the flexibility to represent dynamic relationships in the real world. That flexibility tempts stakeholders to put everything into the graph and hope for something magic. This presentation will provide an overview of lessons learned building a Solr-powered knowledge graph for global supply chains. It will focus on how Apache NiFi can be used to maintain strong data management and keep your data lake from turning into a swamp.
Room B: Using OSS to enhance Cyber datasets [Cyber Track], Eddie Satterly (DataNexus)
Abstract. The talk will focus on using OSS to automate the flow of data from varying and new sources, and to process it in stream, to create a richer store of data for cybersecurity events. If you want to learn how to collect new sources, including IoT, relational, and event streams, and how to combine them in stream to create new capabilities, this talk is for you.
12:15p - 1:15p: Lunch Talks
Room A: About Docker Container(s), Srinivasa "Naga" Kadiyala (Deloitte)
Abstract. A brief and practical introduction to Docker: what Docker is, why it is used, and what it does. The objective is to help new open source enthusiasts dive into the Docker world, point them in the right direction to become Docker professionals, and guide them with next steps for making their Docker journey successful.
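Illustrative sketch (not part of the session abstract): a first taste of Docker driven from Python with the Docker SDK. It assumes a local Docker daemon and that the alpine image can be pulled.

# Requires: pip install docker
import docker

client = docker.from_env()  # talks to the local Docker daemon

# Run a throwaway container and capture its output.
output = client.containers.run("alpine:3.19", ["echo", "hello from a container"], remove=True)
print(output.decode().strip())

# List images known to the daemon.
for image in client.images.list():
    print(image.tags)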
Room B: Fast and Flexible Data Integrations with Apache Airflow, Michael Ghen (Benefits Data Trust)
Abstract. Apache Airflow is an open source workflow management system written in Python and originally built at Airbnb. Its programming model allows for the rapid development of integrations with just about any information system using Python. This talk is about Apache Airflow's programming model and how it has been used to successfully integrate with information systems at federal and state agencies. In this talk, we will provide an overview of Apache Airflow, review its programming model and use cases, and walk through two real-world examples that show how Airflow is being used for data integrations between information systems. Apache Airflow is an easy-to-use task scheduler that anyone working on internal or external data integrations should consider. With over 500 contributors and 200 users (including banks, hospitals, and unicorn startups), Apache Airflow is an open source software project worth exploring.
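Illustrative sketch (not taken from the talk): a minimal Airflow DAG showing the programming model described above. The task bodies are placeholders, and note that the PythonOperator import path moved to airflow.operators.python in Airflow 2+.

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # airflow.operators.python in Airflow 2+

def extract():
    print("pulling records from the source system")  # placeholder for real extraction logic

def load():
    print("writing records to the target system")  # placeholder for real load logic

with DAG(
    dag_id="example_integration",
    start_date=datetime(2019, 3, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task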
(Location to be determined): Keysigning Party, Tim Allison (MITRE / Apache Lucene/Solr, Apache Tika, Apache POI and Apache PDFBox)
Abstract. We'll do a keysigning party during lunch at the Roadshow! To speed things up, email your key to tallison <at sign> apache.org by 5PM (EDT) on March 24th. Please bring a pen and an ID to the signing. See https://wiki.apache.org/apachecon/PgpKeySigning for the process and background.
(WebEx: https://wso2.zoom.us/my/nuwanbando): Ballerina: Cloud Native Programming Language, Nuwan Bandara (WSO2)
Abstract. Crazy customer demand has caused companies like Google and Amazon to build massively disaggregated architectures in order to scale. Massively disaggregated approaches like microservices, serverless, and APIs are becoming the norm for us all. These disaggregated components are network accessible as programmable endpoints. The apps we write increasingly depend upon these endpoints. Integration is the discipline of resilient communication between endpoints. It isn't easy. The challenges include compensation, transactions, events, circuit breakers, discovery, protocol handling, and mediation.
Ballerina (https://ballerina.io/) makes it easy to write resilient services that orchestrate and integrate across distributed endpoints. It's a language that understands protocols, endpoints, transactions, workers, JSON, XML, asynchrony, and tainted data as first-class constructs. Ballerina is a compiled language with its own VM technology based upon a custom JVM. Services execute as servers, microservices, and serverless functions packaged for deployment into any infrastructure, including Kubernetes. It's the first language that understands the architecture around it: the compiler is environment aware, and includes or integrates circuit breakers, identity servers, API gateways, and message brokers.
This session will cover Ballerina's language and runtime model while building a variety of integrations. We'll also cover how the Ballerina open source community operates and how you can get involved.
Afternoon sessions:
1:15p
Room A: CARE: Local Governments, Cybersecurity Governance and Open Source Software [CARE Track], J.P. Auffret (George Mason University)
Abstract. A description of CARE (Center for Assurance Research and Engineering) projects. Targeted audience: the general public, students, and professionals who are interested in CARE's work.
Room B: Schemas, Records, and Registries with Apache NiFi [Apache Track], Bryan Bende (Cloudera / Apache NiFi)
Abstract. This talk will introduce Apache NiFi's "record" abstraction, which provides a powerful way of treating common data formats such as JSON, CSV, and Avro as a sequence of records. This approach typically unlocks significant performance benefits and often greatly simplifies many dataflows. We will start by discussing the role of schemas and introduce NiFi's record reader & writer concept, along with the various options for accessing a schema. We will then walk through simple examples demonstrating how to use the most popular record processors for performing operations such as conversion, updating, merging, and partitioning. Finally, we will discuss the options for integrating NiFi with enterprise schema registries, and how this can be used across an end-to-end streaming platform.
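Illustrative sketch (outside NiFi itself): the schema-driven record idea in miniature. Once a schema is defined, the same records can be written and read back in a compact format such as Avro; the library, schema, and field names here are illustrative, not NiFi APIs.

# Requires: pip install fastavro
from io import BytesIO

from fastavro import parse_schema, reader, writer

schema = parse_schema({
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "source", "type": "string"},
    ],
})

records = [{"id": 1, "source": "sensor-a"}, {"id": 2, "source": "sensor-b"}]

buf = BytesIO()
writer(buf, schema, records)           # serialize the records as Avro
buf.seek(0)
print([rec for rec in reader(buf)])    # read them back as plain dicts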
1:45p
Room A: Cloud Native Threat Modelling at Data-Center Scale [Cyber Track], Jay Vyas (Platform 9)
Abstract. Do you know how many critical CVEs are exposed as running images in your data center right now? If so, do you know how often that number fluctuates (and how often you should recalculate it)? As Kubernetes increasingly becomes the platform on which ASF components such as Spark, Mesos, and Airflow run on a continuous basis, the ability to model incoming threats embedded in containers, at some image layer, becomes an increasingly critical step in the continuous delivery workflow. We introduce the open-source perceptor project for real-time annotation of vulnerabilities, tooling for autoscaling vulnerability detection using the Kubernetes API, and a standalone threat modelling project for simulating vulnerability churn in data centers with hundreds of users over time scales of hundreds of days. We demonstrate the power of this approach by analyzing a collection of ASF artifacts (Hadoop, ZooKeeper, Solr, Tomcat) on large clusters, alongside time-series analysis of the ensuing vulnerability status. In addition, we demonstrate how to integrate free and open source scanning into open source projects, using ASF Bigtop as an example.
Room B: Apache cTAKES - NLP in Healthcare [Gov't Track], Alexandru Zbarcea (Fannie Mae)
Abstract. Improve your skills with one of the hot innovations in the industry: Natural Language Processing. Apache cTAKES bridges the gap between humans and machines by transforming clinical thinking from provider notes into semi-structured data that can be processed by machines. This talk will also include a demo analyzing and visualizing healthcare records using Apache cTAKES, running in Docker. From A-Z in 5 minutes.
2:15p
Room A: The Anatomy of a Secure Java Web App Using Apache Fortress [Cyber Track], Shawn McKinney (Symas / Apache Directory)
Abstract. The Jakarta EE architecture provides the necessary enablement, but most developers do not have the time or the training to take full advantage of what it has to offer. This technical session describes various techniques and principles surrounding the topic of application security best practices on the Java platform. It includes practical, hands-on guidance for implementing security controls with Java, Spring, and Apache Fortress. In addition to finding out where the controls must be placed and why, attendees will be provided code to kick-start their own highly secure Java web apps.
Room B: Apache's role in Healthcare and Disaster Relief [Gov't Track], Hadrian Zbarcea (apifocal / Apache ActiveMQ)
Abstract. Apache is the open source powerhouse where talented engineers collaborate to build best-of-breed technologies. While it is recognized that Hadoop and Spark dominate the big data space, it is less known that the integration stack (Kafka, ActiveMQ, CXF, Karaf) provides the infrastructure backbone for many mission- or safety-critical projects (e.g. FAA SWIM). This talk provides an overview of what makes Apache projects ideal for large integration projects, with a focus on Healthcare, the ONC National Healthcare Information Network, and Disaster Relief Infrastructure. The presentation also includes a demo and concrete, practical advice for students to get involved in OSS and use their talents to build the infrastructure needed by their generation.
2:45p - Mid-Afternoon Snack @ Sponsor Booths; Happy 20th Anniversary, ASF.
3:15p
Room A: Drilling Security Data: Rapid Analysis of Security Data with Apache Drill [Cyber Track], Charles S. Givre (Deutsche Bank)
Abstract. Security data is often challenging to analyze because it comes in a variety of formats that are difficult and time-consuming to parse, requiring a myriad of tools to analyze. Additionally, bringing this data together to correlate multiple data sets can be difficult and extremely time-consuming. But what if all your data could be queried with a single tool, using a common language? This talk will demonstrate how to use Apache Drill's enormous analytic power on security data sets and how to visualize this data using Apache Superset.
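Illustrative sketch (not part of the session abstract): querying a security log file through Drill's REST API from Python. The URL, file path, and field names are hypothetical; adjust them for your Drill deployment and storage plugin configuration.

# Requires: pip install requests; assumes Drill's web server on the default port 8047.
import requests

DRILL_URL = "http://localhost:8047/query.json"

sql = """
SELECT src_ip, COUNT(*) AS hits
FROM dfs.`/data/security/proxy_logs.json`
GROUP BY src_ip
ORDER BY hits DESC
LIMIT 10
"""

resp = requests.post(DRILL_URL, json={"queryType": "SQL", "query": sql})
resp.raise_for_status()
for row in resp.json().get("rows", []):
    print(row)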
Room B: Search Relevance Tuning [Gov't Track], Tim Allison (MITRE / Apache Lucene/Solr, Apache Tika, Apache POI and Apache PDFBox)
Abstract. Enterprise search is notoriously challenging and, out of the box, quite disappointing. In this talk, I'll offer lessons learned and some basic steps taken to help several federal agencies improve their search systems. While the focus will be on Apache Solr as the backend, the lessons and steps apply to any search implementation.
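Illustrative sketch (not part of the session abstract): relevance tuning in Solr often starts with the eDisMax parser and field boosts. The core name, fields, and boost values below are hypothetical starting points meant to be refined against real relevance judgments.

# Requires: pip install requests; assumes a Solr core named "documents".
import requests

SOLR_SELECT = "http://localhost:8983/solr/documents/select"

params = {
    "q": "records retention policy",
    "defType": "edismax",   # eDisMax query parser
    "qf": "title^4 body",   # weight title matches above body matches
    "pf": "title^2",        # extra boost when the phrase occurs in the title
    "mm": "2<75%",          # minimum-should-match for longer queries
    "fl": "id,title,score",
    "rows": 10,
}

response = requests.get(SOLR_SELECT, params=params).json()
for doc in response["response"]["docs"]:
    print(doc["score"], doc["title"])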
3:45p
Room A: Locking Down Apache Tomcat: Practical Security for Real-world Applications [Cyber Track], Christopher Schultz (Total Child Health, Inc. / Apache Tomcat & Apache Velocity)
Abstract. Out of the box, Apache Tomcat is quite secure. Then you need to configure it to suit your environment, connect your data sources, and deploy your applications. Those processes can potentially reduce the security of the entire system. A thorough review of your host, network, application, and configuration is necessary to identify those areas where your security needs improvement. We'll discuss each of these areas in some detail and how some simple tweaks and tools can make you and your users safer.
Room B: Securely Consuming Government Data with Apache Daffodil and DFDL [Gov't Track], Steve Lawrence (Tresys / Apache Daffodil)
Abstract. The government handles vast amounts of complex and legacy data across security boundaries every day. But before such data can be consumed, it must be inspected for correctness and sanitized of malicious content. Traditional methods rely on software libraries or custom one-off data parsers and sanitizers, which are often buggy, incomplete, proprietary, and poorly maintained.
The Apache Daffodil project aims to solve these problems by creating an open-source implementation of the Data Format Description Language (DFDL) specification. DFDL provides a way to fully describe a wide array of complex and legacy file formats down to the bit level. Using a DFDL data format description, Daffodil can parse data into XML or JSON, allowing for validation, sanitization, and transformation using existing and well understood technologies. Daffodil can also serialize or "unparse" XML/JSON back to the original file format, effectively mitigating a large variety of common vulnerabilities.
This talk will discuss the DFDL specification, the open-source Apache Daffodil project, and how it can be used to securely consume government data.
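Illustrative sketch (not part of the session abstract): one way to drive the Daffodil command-line tool from Python. The schema and data file names are hypothetical, and the exact CLI options should be checked against your Daffodil release.

# Assumes the Apache Daffodil CLI is on the PATH; schema and input file are hypothetical.
import subprocess

result = subprocess.run(
    ["daffodil", "parse", "--schema", "pcap.dfdl.xsd", "capture.bin"],
    capture_output=True, text=True, check=True,
)

xml_infoset = result.stdout  # XML infoset, ready for validation and sanitization with standard XML tooling
print(xml_infoset[:200])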
4:15p
Room A: Government IT Modernization in Practice: How Apache Ignite Adds Speed, Scale and Agility to Applications, APIs and Real-time Analytics [Gov't Track], Glenn Wiebe (GridGain)
Abstract. The growth in data volume and velocity, the aging of government IT infrastructure, and the continual drive to do more with less are putting added stress on existing data sources, application development teams, and the applications, tools, and clients that consume this data. IT modernization increasingly targets legacy databases and the roadblocks they have become to delivering faster analytics while holding more volume and more varieties of data. Modernization projects are being asked to deliver effective architectures and development patterns for new applications, such as modern web apps, and new tools for analytics, machine learning, and deep learning. In-memory computing can deliver the real-time performance that modern apps and tools expect, at the massive scale that existing and expanding data sources present, and at the speed needed to drive digital transformation founded on the modern analytics and tools your clients expect.
Topics to be covered are:
* Add speed and scale to existing applications using relational or NoSQL databases, with no rip-and-replace
* Perform real-time and streaming analytics at scale, with details of how companies use Apache Ignite and Apache Spark
* Address performance issues with Hadoop and "SQL" for real-time reporting
* Adopt machine and deep learning, both model training and execution, including TensorFlow
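Illustrative sketch (not part of the session abstract): a quick look at Apache Ignite's Python thin client. The host, port, cache name, and values are hypothetical.

# Requires: pip install pyignite; assumes an Ignite node with the thin-client port open.
from pyignite import Client

client = Client()
client.connect("127.0.0.1", 10800)  # default thin-client port

cache = client.get_or_create_cache("citizen_requests")
cache.put(42, "open")
print(cache.get(42))  # -> "open"

client.close()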
Room B: Apache Rya – A Scalable RDF Triple Store [Gov't Track], Adina Crainiceanu (US Naval Academy / Apache Rya)
Abstract. The Resource Description Framework (RDF) is a standard model for storing graph data. While the standard was initially created for storing metadata about the World Wide Web, its flexible format made it a popular choice for storing many different types of information. With the explosive increase in the size of available data, scalable solutions are needed to efficiently store and query very large RDF graphs within big data architectures. Apache Rya (incubating) is a scalable database management system designed for storing and searching very large RDF data. Rya is built on top of Apache Accumulo. Originally developed by the Laboratory for Telecommunication Sciences and the US Naval Academy, Rya is currently being used by a number of government agencies and commercial companies. In this talk, we present storage methods, primary and secondary indexing schemes, statistics-based query optimization, and query evaluation techniques that allow Rya to scale to billions of triples across multiple nodes, while providing fast and easy access to data through conventional query mechanisms such as SPARQL.
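Illustrative sketch (not part of the session abstract): querying an RDF store over the SPARQL protocol from Python. The endpoint URL and predicate are hypothetical; point it at whatever SPARQL endpoint your Rya deployment exposes.

# Requires: pip install SPARQLWrapper
from SPARQLWrapper import JSON, SPARQLWrapper

sparql = SPARQLWrapper("http://localhost:8080/rya/sparql")  # hypothetical endpoint URL
sparql.setQuery("""
    SELECT ?person ?org
    WHERE { ?person <http://example.org/worksFor> ?org . }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["person"]["value"], binding["org"]["value"])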
4:45p - Endnote by Kevin A. McGrail: Open Source Challenges
5:15p - Post-Event Happy Hour at Oh George's (cash bar, food & non-alcoholic drinks) - https://ohgeorge.com/ - $2 off drafts and wine by the glass
* Schedule subject to change.