David Bader

Distinguished Professor, Data Science | New Jersey Institute of Technology

Newark, NJ, UNITED STATES

Interests lie at the intersection of data science & high-performance computing, with applications in cybersecurity

Media

Videos:

  • Predictive Analysis from Massive Knowledge Graphs on Neo4j – David Bader
  • Interview: David Bader on Real World Challenges for Big Data Analytics
  • 5-Minute Interview: Dave Bader, Professor at Georgia Tech College of Computing
  • David A. Bader interviewed by NJ PBS State of Affairs with Steve Adubato

Audio/Podcasts:

Large-Scale Data Analytics For Cybersecurity And Solving Real-World Grand Challenges | Redefining CyberSecurity With Professor David Bader

Biography

David A. Bader is a Distinguished Professor and founder of the Department of Data Science and inaugural Director of the Institute for Data Science at New Jersey Institute of Technology.

Dr. Bader is a Fellow of the IEEE, ACM, AAAS, and SIAM; a recipient of the IEEE Sidney Fernbach Award; and the 2022 Innovation Hall of Fame inductee of the University of Maryland’s A. James Clark School of Engineering. He advises the White House, most recently on the National Strategic Computing Initiative (NSCI) and Future Advanced Computing Ecosystem (FACE).

Bader is a leading expert in solving global grand challenges in science, engineering, computing, and data science. His interests are at the intersection of high-performance computing and real-world applications, including cybersecurity, massive-scale analytics, and computational genomics. He has co-authored over 300 scholarly papers and has received best paper awards from ISC, IEEE HPEC, and IEEE/ACM SC. Dr. Bader has served as a lead scientist in several DARPA programs, including High Productivity Computing Systems (HPCS) with IBM, Ubiquitous High Performance Computing (UHPC) with NVIDIA, Anomaly Detection at Multiple Scales (ADAMS), Power Efficiency Revolution For Embedded Computing Technologies (PERFECT), Hierarchical Identify Verify Exploit (HIVE), and Software-Defined Hardware (SDH).

Dr. Bader is Editor-in-Chief of the ACM Transactions on Parallel Computing and previously served as Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems. He serves on the leadership team of the Northeast Big Data Innovation Hub as the inaugural chair of the Seed Fund Steering Committee. ROI-NJ recognized Bader as a technology influencer on its inaugural 2021 list and again on its 2022 list.

In 2012, Bader was the inaugural recipient of the University of Maryland's Electrical and Computer Engineering Distinguished Alumni Award. In 2014, Bader received the Outstanding Senior Faculty Research Award from Georgia Tech. Bader has also served as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor and Director of an NVIDIA GPU Center of Excellence.

In 1998, Bader built the first Linux supercomputer that led to a high-performance computing (HPC) revolution, and Hyperion Research estimates that the total economic value of Linux supercomputing pioneered by Bader has been over $100 trillion over the past 25 years.

Areas of Expertise (6)

Graph Analytics

Massive-Scale Analytics

High-Performance Computing

Data Science

Applications in Cybersecurity

Computational Genomics

Accomplishments (8)

Inductee into the University of Maryland's A. James Clark School of Engineering Innovation Hall of Fame

2022

NVIDIA AI Lab (NVAIL) Award

2019

Invited attendee to the White House’s National Strategic Computing Initiative (NSCI) Anniversary Workshop

2019

Facebook AI System Hardware/Software Co-Design Research Award

2019

Named a member of "People to Watch" by HPCwire

2014

Inaugural recipient of the University of Maryland Department of Electrical and Computer Engineering's Distinguished Alumni Award

2012

Named a member of "People to Watch" by HPCwire

2012

Selected by Sony, Toshiba, and IBM to direct the first Center of Competence for the Cell Processor

2006

Education (3)

University of Maryland: Ph.D., Electrical and Computer Engineering 1996

Lehigh University: M.S., Electrical Engineering 1991

Lehigh University: B.S., Computer Engineering 1990

Affiliations (4)

  • AAAS Fellow
  • IEEE Fellow
  • SIAM Fellow
  • ACM Fellow

Media Appearances (8)

This New AI Brain Decoder Could Be A Privacy Nightmare, Experts Say

Lifewire  online

2023-05-08

The technique offers promise for stroke patients but could be invasive.

Common password mistakes you're making that could get you hacked

CBS News  online

2023-03-03

It's hard to memorize passwords as you juggle dozens of apps — whether you're logging in to stream your favorite show, view your medical records, check your savings account balance or more, you'll want to avoid unwanted prying eyes.

The Democratization of Data Science Tools with Dr. David Bader

To the Point Cybersecurity podcast  online

2023-09-19

He deep-dives into the opportunity to democratize data science tools and the free tool he and Mike Merrill spent the last several years building, which can be found on the Bears-R-Us GitHub page, open to the public.

Academic Data Science Alliance Picks Up Steam

Datanami  online

2022-11-22

Universities looking for resources to build their data science curriculums and degree programs have a new resource at their disposal in the form of the Academic Data Science Alliance. Founded just prior to the pandemic, the ADSA survived COVID and now it’s working to foster a community of data science leaders at universities across North America and Europe...

‘Weaponised app’: Is Egypt spying on COP27 delegates’ phones?

Al Jazeera  online

2022-11-12

Cybersecurity concerns have been raised at the United Nations’ COP27 climate talks over an official smartphone app that reportedly has carte blanche to monitor locations, private conversations and photographs. About 35,000 people are expected to attend the two-week climate conference in Egypt, and the app has been downloaded more than 10,000 times on Google Play, including by officials from France, Germany and Canada...

Your Hard Drive May One Day Use Diamonds for Storage

Lifewire  online

2022-05-03

Diamonds could one day be used to store vast amounts of information. Researchers are trying to use the strange effects of quantum mechanics to hold information. However, experts say don’t expect a quantum hard drive in your PC anytime soon.

Big Data Career Notes: July 2019 Edition

Datanami  online

2019-07-16

The New Jersey Institute of Technology has announced that it will establish a new Institute for Data Science, directed by Distinguished Professor David Bader. Bader recently joined NJIT’s Ying Wu College of Computing from Georgia Tech, where he was chair of the School of Computational Science and Engineering within the College of Computing. Bader was recognized as one of HPCwire’s People to Watch in 2014.

David Bader to Lead New Institute for Data Science at NJIT

Inside HPC  online

2019-07-10

Professor David Bader will lead the new Institute for Data Science at the New Jersey Institute of Technology. Focused on cutting-edge interdisciplinary research and development in all areas pertinent to digital data, the institute will bring existing research centers in big data, medical informatics and cybersecurity together to conduct both basic and applied research.

Event Appearances (3)

Massive-scale Analytics

13th International Conference on Parallel Processing and Applied Mathematics (PPAM)  Białystok, Poland

2019-09-09

Predictive Analytics from Massive Streaming Data

44th Annual GOMACTech Conference: Artificial Intelligence & Cyber Security: Challenges and Opportunities for the Government  Albuquerque, NM

2019-03-26

Massive-Scale Analytics Applied to Real-World Problems

2018 Platform for Advanced Scientific Computing (PASC) Conference  Basel, Switzerland

2018-07-04

Research Focus (2)

NVIDIA AI Lab (NVAIL) for Scalable Graph Algorithms

2019-08-05

Graph algorithms represent some of the most challenging known problems in computer science for modern processors. These algorithms contain far more memory access per unit of computation than traditional scientific computing. Access patterns are not known until execution time and are heavily dependent on the input data set. Graph algorithms vary widely in the volume of spatial and temporal locality that is usable by modern architectures. In today’s rapidly evolving world, graph algorithms are used to make sense of large volumes of data from news reports, distributed sensors, and lab test equipment, among other sources connected to worldwide networks. As data is created and collected, dynamic graph algorithms make it possible to compute highly specialized and complex relationship metrics over the entire web of data in near-real time, reducing the latency between data collection and the capability to take action. Through this partnership with NVIDIA, we collaborate on the design and implementation of scalable graph algorithms and graph primitives that will bring new capabilities to the broader community of data scientists. Leveraging existing open frameworks, this effort will improve the experience of graph data analysis using GPUs by improving tools for analyzing graph data, speeding up graph traversal using optimized data structures, and accelerating computations with better runtime support for dynamic work stealing and load balancing.

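To make the dynamic-graph idea above concrete, here is a minimal Python sketch that keeps a vertex degree and a global triangle count current as edges stream in, updating the metrics incrementally instead of recomputing them over the whole graph. The class, the toy edge stream, and the choice of metrics are illustrative assumptions for this profile, not code from the NVAIL project.

```python
from collections import defaultdict

class StreamingGraph:
    """Toy undirected dynamic graph: metrics are updated per edge insertion
    rather than recomputed from scratch after every change."""

    def __init__(self):
        self.adj = defaultdict(set)   # vertex -> set of neighbors
        self.triangles = 0            # running global triangle count

    def add_edge(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Every common neighbor of u and v closes exactly one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def degree(self, v):
        return len(self.adj[v])

# Metrics stay current after each streamed insertion.
g = StreamingGraph()
for edge in [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]:
    g.add_edge(*edge)
    print(edge, "degree(3) =", g.degree(3), "triangles =", g.triangles)
```
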
Facebook AI System Hardware/Software Co-Design Research Award on Scalable Graph Learning Algorithms

2019-05-10

Deep learning has boosted the machine learning field at large and created significant increases in the performance of tasks including speech recognition, image classification, object detection, and recommendation. It has opened the door to complex tasks, such as self-driving and super-human image recognition. However, the important techniques used in deep learning, e.g., convolutional neural networks, are designed for Euclidean data types and do not directly apply to graphs. This problem is solved by embedding graphs into a lower-dimensional Euclidean space, generating a regular structure. There is also prior work on applying convolutions directly on graphs and using sampling to choose neighbor elements. Systems that use this technique are called graph convolutional networks (GCNs). GCNs have proven to be successful at graph learning tasks like link prediction and graph classification. Recent work has pushed the scale of GCNs to billions of edges, but significant work remains to extend learned graph systems beyond recommendation systems with specific structure and to support big data models such as streaming graphs. This project will focus on developing scalable graph learning algorithms and implementations that open the door for learned graph models on massive graphs. We plan to approach this problem in two ways. First, we will develop a scalable, high-performance graph learning system based on existing GCN algorithms, like GraphSAGE, by improving the workflow on shared-memory NUMA machines, balancing computation between threads, optimizing data movement, and improving memory locality. Second, we will investigate graph learning algorithm-specific decompositions and develop new strategies for graph learning that can inherently scale well while maintaining high accuracy. This includes traditional partitioning; more generally, we consider breaking the problem into smaller pieces which, when solved, yield a solution to the larger problem. We will explore decomposition results from graph theory, for example, forbidden graphs and the Embedding Lemma, and determine how to apply such results to the field of graph learning. We will also investigate whether these decompositions could assist in a dynamic graph setting.

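To make the graph-convolution idea concrete, the sketch below implements one GraphSAGE-style mean-aggregation layer with fixed-size neighbor sampling in NumPy. It is a conceptual illustration under simplifying assumptions (separate self and neighbor weight matrices, uniform random sampling, ReLU, random data); it is not the project's system nor the GraphSAGE reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_layer(features, adj, W_self, W_neigh, num_samples=5):
    """One GraphSAGE-style mean-aggregation layer (conceptual sketch).

    features : (n, d) node feature matrix
    adj      : dict mapping node id -> list of neighbor ids
    W_self, W_neigh : (d, d_out) weight matrices
    """
    n, d = features.shape
    out = np.zeros((n, W_self.shape[1]))
    for v in range(n):
        neigh = adj.get(v, [])
        if neigh:
            # Sample a fixed-size neighborhood to bound per-node work.
            sampled = rng.choice(neigh, size=min(num_samples, len(neigh)),
                                 replace=False)
            agg = features[sampled].mean(axis=0)
        else:
            agg = np.zeros(d)
        # Combine self and aggregated neighbor information, then apply ReLU.
        out[v] = np.maximum(0.0, features[v] @ W_self + agg @ W_neigh)
    # Row-normalize the embeddings, as GraphSAGE does after each layer.
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)

# Tiny example: 4 nodes on a path 0-1-2-3 with 8-dimensional features.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
X = rng.normal(size=(4, 8))
W_self = rng.normal(size=(8, 16))
W_neigh = rng.normal(size=(8, 16))
print(sage_layer(X, adj, W_self, W_neigh).shape)  # (4, 16)
```
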
Research Grants (6)

Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes

DARPA/NVIDIA $25,000,000

2010-06-01

Goal: Develop highly parallel, security enabled, power efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.

Center for Adaptive Supercomputing Software for Multithreaded Architectures (CASS-MT): Analyzing Massive Social Networks

Department of Defense $24,000,000

2008-08-01

Exascale Streaming Data Analytics for social networks: understanding communities, intentions, population dynamics, pandemic spread, transportation and evacuation.

Proactive Detection of Insider Threats with Graph Analysis at Multiple Scales (PRODIGAL), under Anomaly Detection at Multiple Scales (ADAMS)

DARPA $9,000,000

2011-05-01

This paper reports on insider threat detection research, during which a prototype system (PRODIGAL) was developed and operated as a testbed for exploring a range of detection and analysis methods. The data and test environment, system components, and the core method of unsupervised detection of insider threat leads are presented to document this work and benefit others working in the insider threat domain...

Challenge Applications and Scalable Metrics (CHASM) for Ubiquitous High Performance Computing

DARPA $7,500,000

2010-06-01

Develop highly parallel, security enabled, power efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.

SHARP: Software Toolkit for Accelerating Graph Algorithms on Hive Processors

DARPA $6,760,425

2017-04-23

The aim of SHARP is to enable platform independent implementation of fast, scalable and approximate, static and streaming graph algorithms. SHARP will develop a software tool-kit for seamless acceleration of graph analytics (GA) applications, for a first of its kind collection of graph processors...

GRATEFUL: GRaph Analysis Tackling power EFficiency, Uncertainty, and Locality

DARPA $2,929,819

2012-10-19

Think of the perfect embedded computer. Think of a computer so energy-efficient that it can last 75 times longer than today’s systems. Researchers at Georgia Tech are helping the Defense Advanced Research Projects Agency (DARPA) develop such a computer as part of an initiative called Power Efficiency Revolution for Embedded Computing Technologies, or PERFECT. “The program is looking at how do we come to a new paradigm of computing where running time isn’t necessarily the constraint, but how much power and battery that we have available is really the new constraint,” says David Bader, executive director of high-performance computing at the School of Computational Science and Engineering. If the project is successful, it could result in computers far smaller and orders of magnitude more efficient than today’s machines. It could also mean that the computer mounted tomorrow on an unmanned aircraft or ground vehicle, or even worn by a soldier, would use less energy than a larger device, while still being as powerful. Georgia Tech’s part in the DARPA-led PERFECT effort is called GRATEFUL, which stands for Graph Analysis Tackling power-Efficiency, Uncertainty and Locality. Headed by Bader and co-investigator Jason Riedy, GRATEFUL focuses on algorithms that would process vast stores of data and turn it into a graphical representation in the most energy-efficient way possible.

Answers (3)

What other emerging technologies excite you in their potential to transform computing?

Quantum computing. This technology, with its potential to solve complex problems exponentially faster than classical computers, could revolutionize fields ranging from cryptography to drug discovery, climate modeling and beyond. Quantum computing's promise to tackle challenges currently beyond our reach, due to its fundamentally different approach to processing information, represents a leap forward in our computational capabilities. Its convergence with AI could lead to unprecedented advancements, making this era an incredibly thrilling time to be at the forefront of computing and data science.

There’s a sci-fi plot where computers get so smart that people lose control. The new class of user-friendly AI is making people excited but also nervous. Should we be afraid?

While it’s natural to harbor concerns about the rapid progression of AI, allowing fear to dominate the discourse would be a disservice to the potential benefits these technologies can offer. Instead, this moment calls for proactive engagement with AI and an investment in understanding its inner workings, limitations and the ethical dilemmas it presents. By advocating for responsible AI development, emphasizing education and promoting transparency, we can foster an environment where AI serves as a tool for societal advancement. This approach ensures that we remain at the helm of AI's trajectory, steering it toward outcomes that uplift humanity rather than scenarios that fuel dystopian fears.

What should non-programmers learn about AI?

It’s important to be aware of how AI decisions are made, the potential biases in AI systems and the ethical considerations of AI use. Additionally, developing data literacy is crucial, as it enables individuals to evaluate AI outputs and understand the importance of data quality and biases. A basic grasp of AI and machine learning concepts — even without programming skills — can demystify AI technologies and reveal their potential applications. Staying informed about AI advancements across various sectors can also inspire innovative ideas and foster interdisciplinary collaborations.

Articles (8)

Cybersecurity Challenges in the Age of Generative AI

CTOTech Magazine

David Bader

2023-11-20

Cybersecurity professionals will not only have to discover malicious events at the time of occurrence, but also proactively implement preventative measures before an attack. For these professionals, the significant challenge will be protecting against new behaviors and methods that they are not yet familiar with.

What CISOs need to know to mitigate quantum computing risks

Security

David Bader

2023-06-03

Quantum technologies harness the laws of quantum mechanics to solve complex problems beyond the capabilities of classical computers. Although quantum computing can one day lead to positive and transformative solutions for complex global issues, the development of these technologies also poses a significant and emerging threat to cybersecurity infrastructure for organizations.

Tailoring parallel alternating criteria search for domain specific MIPs: Application to maritime inventory routing

Computers & Operations Research

Lluís-Miquel Munguía, Shabbir Ahmed, David A Bader, George L Nemhauser, Yufen Shao, Dimitri J Papageorgiou

2019

Parallel Alternating Criteria Search (PACS) relies on the combination of computer parallelism and Large Neighborhood Searches to attempt to deliver high quality solutions to any generic Mixed-Integer Program (MIP) quickly. While general-purpose primal heuristics are widely used due to their universal application, they are usually outperformed by domain-specific heuristics when optimizing a particular problem class.

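The core large-neighborhood-search loop that this line of work parallelizes can be sketched in a few lines: keep an incumbent solution, repeatedly free a small subset of variables while fixing the rest, and re-optimize only that subproblem. The toy 0/1 knapsack model, the neighborhood size, and the exhaustive subproblem solve below are illustrative assumptions; PACS itself alternates search criteria and hands the restricted MIPs to a solver across parallel workers, which this sketch does not reproduce.

```python
import itertools
import random

random.seed(1)

# Toy 0/1 knapsack standing in for a generic MIP: maximize value under a weight cap.
values = [random.randint(1, 20) for _ in range(30)]
weights = [random.randint(1, 10) for _ in range(30)]
capacity = 60

def objective(x):
    if sum(w for w, xi in zip(weights, x) if xi) > capacity:
        return float("-inf")          # infeasible solutions are rejected
    return sum(v for v, xi in zip(values, x) if xi)

def reoptimize_subset(x, free):
    """Re-optimize the small set of 'free' variables while all others stay
    fixed at their incumbent values (the LNS subproblem).  A real
    implementation would hand this restricted problem to a MIP solver."""
    best, best_obj = list(x), objective(x)
    for bits in itertools.product([0, 1], repeat=len(free)):
        cand = list(x)
        for i, b in zip(free, bits):
            cand[i] = b
        cand_obj = objective(cand)
        if cand_obj > best_obj:
            best, best_obj = cand, cand_obj
    return best, best_obj

# Start from the trivial feasible solution and improve via LNS iterations.
incumbent = [0] * len(values)
incumbent_obj = objective(incumbent)
for _ in range(200):
    free = random.sample(range(len(values)), 8)   # "destroy" a small neighborhood
    incumbent, incumbent_obj = reoptimize_subset(incumbent, free)

print("best value found:", incumbent_obj)
```
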
High-Performance Phylogenetic Inference

Bioinformatics and Phylogenetics

David A Bader, Kamesh Madduri

2019

Software tools based on the maximum likelihood method and Bayesian methods are widely used for phylogenetic tree inference. This article surveys recent research on parallelization and performance optimization of state-of-the-art tree inference tools. We outline advances in shared-memory multicore parallelization, optimizations for efficient Graphics Processing Unit (GPU) execution, as well as large-scale distributed-memory parallelization.

Numerically approximating centrality for graph ranking guarantees

Journal of Computational Science

Eisha Nathan, Geoffrey Sanders, David A Bader

2018

Many real-world datasets can be represented as graphs. Using iterative solvers to approximate graph centrality measures allows us to obtain a ranking vector on the nodes of the graph, consisting of a number for each vertex in the graph identifying its relative importance. In this work the centrality measures we use are Katz Centrality and PageRank. Given an approximate solution, we use the residual to accurately estimate how much of the ranking matches the ranking given by the exact solution.

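The role of the residual is easy to see in code. The NumPy sketch below (not the authors' implementation) approximates Katz Centrality with a simple fixed-point iteration for (I - alpha*A^T) x = 1 and returns the residual norm alongside the scores, which is the quantity the paper uses to certify how much of the approximate ranking agrees with the exact one. The example graph, alpha, and tolerance are illustrative assumptions.

```python
import numpy as np

def katz_centrality(A, alpha=0.1, tol=1e-8, max_iter=1000):
    """Approximate Katz Centrality by the fixed-point iteration
        x <- alpha * A.T @ x + 1,
    which converges to the solution of (I - alpha*A.T) x = 1 whenever
    alpha < 1 / lambda_max(A).  The residual norm is returned alongside the
    scores so the caller can bound how far the approximate ranking can be
    from the exact one."""
    n = A.shape[0]
    b = np.ones(n)
    x = np.zeros(n)
    residual = np.inf
    for _ in range(max_iter):
        x = alpha * (A.T @ x) + b
        # Residual of the linear system: b - (I - alpha*A.T) x
        residual = np.linalg.norm(b - (x - alpha * (A.T @ x)))
        if residual < tol:
            break
    return x, residual

# Small directed example graph in adjacency-matrix form.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores, res = katz_centrality(A)
print("ranking (most central first):", np.argsort(-scores), "residual:", res)
```
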
Ranking in dynamic graphs using exponential centrality

International Conference on Complex Networks and their Applications

Eisha Nathan, James Fairbanks, David Bader

2017

Many large datasets from several fields of research such as biology or society can be represented as graphs. Additionally in many real applications, data is constantly being produced, leading to the notion of dynamic graphs. A heavily studied problem is identification of the most important vertices in a graph. This can be done using centrality measures, where a centrality metric computes a numerical value for each vertex in the graph.

Scalable and High Performance Betweenness Centrality on the GPU [Best Student Paper Finalist]

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

A. McLaughlin, D. A. Bader

2014-11-01

Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is betweenness centrality, which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost that prevents the examination of large graphs of interest. Prior GPU implementations suffer from large local data structures and inefficient graph traversals that limit scalability and performance. Here we present several hybrid GPU implementations, providing good performance on graphs of arbitrary structure rather than just scale-free graphs as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near linear speedup and performance exceeding tens of GTEPS when running betweenness centrality on 192 GPUs.

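For reference, the metric being accelerated can be computed sequentially with Brandes' algorithm; the plain-Python sketch below shows that baseline for unweighted, undirected graphs. It is only an illustration of what betweenness centrality computes, not the paper's hybrid GPU implementation, and the small example graph is an assumption.

```python
from collections import deque, defaultdict

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs: a plain CPU
    reference for the metric the paper accelerates on GPUs.
    adj: dict mapping each vertex to an iterable of neighbors."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording parents.
        stack, parents = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0.0)
        sigma[s] = 1.0
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    parents[w].append(v)
        # Dependency accumulation in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in parents[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected shortest path is counted once per endpoint, so halve.
    return {v: c / 2.0 for v, c in bc.items()}

# Example: vertex 2 bridges two triangles, so it gets the highest score.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
print(betweenness_centrality(adj))
```
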
STINGER: High performance data structure for streaming graphs [Best Paper Award]

IEEE Conference on High Performance Extreme Computing

D. Ediger, R. McColl, J. Riedy, D. A. Bader

2012-09-01

The current research focus on “big data” problems highlights the scale and complexity of analytics required and the high rate at which data may be changing. In this paper, we present our high performance, scalable and portable software, Spatio-Temporal Interaction Networks and Graphs Extensible Representation (STINGER), that includes a graph data structure that enables these applications. Key attributes of STINGER are fast insertions, deletions, and updates on semantic graphs with skewed degree distributions. We demonstrate a process of algorithmic and architectural optimizations that enable high performance on the Cray XMT family and Intel multicore servers. Our implementation of STINGER on the Cray XMT processes over 3 million updates per second on a scale-free graph with 537 million edges.

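As a rough illustration of the operations STINGER is designed to make fast (timestamped edge insertions, deletions, and weight updates arriving as a stream), here is a small Python sketch using an ordinary dict-of-dicts adjacency structure. It does not reproduce STINGER's cache-friendly blocked edge lists or its parallelism; the class and the toy update batch are illustrative assumptions.

```python
import time
from collections import defaultdict

class DynamicGraph:
    """Minimal streaming-graph sketch: expected constant-time edge insert,
    delete, and weight update, with a last-modified timestamp per edge.
    STINGER itself stores linked blocks of edges per vertex for cache
    efficiency; this version only illustrates the required operations."""

    def __init__(self):
        self.out = defaultdict(dict)   # u -> {v: (weight, last_modified)}

    def insert_edge(self, u, v, weight=1.0):
        self.out[u][v] = (weight, time.time())

    def delete_edge(self, u, v):
        self.out[u].pop(v, None)

    def increment_weight(self, u, v, delta=1.0):
        w, _ = self.out[u].get(v, (0.0, None))
        self.out[u][v] = (w + delta, time.time())

    def out_degree(self, u):
        return len(self.out[u])

# Apply a small batch of updates, as a streaming workload would.
g = DynamicGraph()
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    g.insert_edge(u, v)
g.increment_weight(0, 1)
g.delete_edge(0, 2)
print("out-degree of 0:", g.out_degree(0))   # 1
```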