Month: October 2020

5 Most Essential Skills You Need to Know to Start Doing Machine Learning

Machine learning is an important skill to have in today's world, but acquiring the skill set can take time, especially when the path to it is unclear. The points below cover a wide range of topics and should give anyone a solid start when learning from scratch. Learners should not limit themselves to this set of skills, however: machine learning is an ever-expanding field, and keeping abreast of the latest developments is always beneficial for scaling new heights in it.


  1. Programming knowledge
  2. Applied Mathematics
  3. Data Modeling and Evaluation
  4. Machine Learning Algorithms
  5. Neural Network Architecture



The very essence of machine learning is coding (unless you are building something with drag-and-drop tools that requires little customization): cleaning the data, building the model, and validating it. A good grasp of programming, along with its best practices, always helps. You might be working in Java or some other object-oriented language, but irrespective of what learners are using, debugging, writing efficient user-defined functions and loops, and exploiting the inherent properties of data structures pay off in the long run. A good understanding of the following will help:


  1. Computer Science Fundamentals and Programming
  2. Software Engineering and System Design
  3. Machine Learning Algorithms and Libraries
  4. Distributed Computing
  5. Unix




Knowledge of mathematics and related skills is always beneficial for understanding the theoretical concepts behind machine learning algorithms. Statistics, calculus, coordinate geometry, probability, and permutations and combinations all come in very handy, although learners do not have to carry out the mathematics by hand: libraries and programming languages take care of much of it. To understand the underlying principles, however, these topics are very useful. Some of the most useful mathematical concepts are listed below.


2.1. Linear Algebra

The following skills in linear algebra could be very useful:

  1. Principal Component Analysis (PCA)
  2. Singular Value Decomposition (SVD)
  3. Symmetric Matrices
  4. Matrix Operations
  5. Eigenvalues & Eigenvectors
  6. Vector Spaces and Norms
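Several of the items above come together in PCA. As a minimal sketch (the data here is random, purely for illustration), PCA can be computed by eigendecomposing the covariance matrix, which is symmetric, and projecting onto the top eigenvectors:

```python
import numpy as np

# Minimal PCA sketch via eigendecomposition of the covariance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 samples, 3 features

Xc = X - X.mean(axis=0)                # center the data
cov = np.cov(Xc, rowvar=False)         # 3x3 covariance matrix (symmetric)
eigvals, eigvecs = np.linalg.eigh(cov) # eigh is designed for symmetric matrices

order = np.argsort(eigvals)[::-1]      # sort components by explained variance
components = eigvecs[:, order[:2]]     # keep the top-2 principal directions
X_reduced = Xc @ components            # project onto the principal subspace
print(X_reduced.shape)                 # → (100, 2)
```

The same reduction can be obtained from the SVD of the centered data matrix, which is the route most libraries take for numerical stability.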


2.2. Probability Theory and Statistics

Many machine learning algorithms are probabilistic, and knowledge of probability becomes very useful in such cases. The following topics in probability are good to have:


  1. Probability Rules
  2. Bayes’ Theorem
  3. Variance and Expectation
  4. Conditional and Joint Distributions
  5. Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian)
  6. Moment Generating Functions, Maximum Likelihood Estimation (MLE)
  7. Prior and Posterior
  8. Maximum a Posteriori Estimation (MAP)
  9. Sampling Methods



In the world of machine learning, there is no single algorithm that can be identified well in advance and used to build the model. Whether the problem is classification, regression, or unsupervised learning, a host of techniques need to be tried before deciding which works best for a given set of data points. Of course, with time and experience, modelers do develop an intuition for which techniques might work better than the rest, but that is always subject to the situation.

Finalizing a model always leads to interpreting its output, and many technical terms are involved in this part that can decide the direction of interpretation. Developers therefore need to put as much emphasis on model interpretation as on model selection, which puts them in a better position to evaluate models and suggest changes. Model validation is comparatively easy and well defined for supervised learning, but in the unsupervised case learners need to tread carefully before choosing the hows and whens of model evaluation.

The following concepts related to model validation are very useful to know in order to be a better judge of models:


  1. Confusion matrix and its components
  2. Logarithmic Loss
  3. Area under Curve (AUC)
  4. F1 Score
  5. Mean Absolute Error
  6. Mean Squared Error
  7. Rand Index
  8. Silhouette Score
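The first few metrics above can be computed by hand, which is a good way to internalize them. This sketch derives the confusion-matrix components and the F1 score for a small, made-up set of binary predictions, matching what a library such as scikit-learn would report:

```python
# Hand-rolled confusion-matrix components and F1 for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
print(tp, tn, fp, fn)   # → 3 3 1 1
print(f1)               # → 0.75
```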




While a machine learning engineer may not have to apply complex concepts from calculus and probability explicitly, built-in libraries (irrespective of the platform or programming language being used) help simplify things. Libraries abound, whether for data cleansing and wrangling, model building, or model evaluation. Knowing every one of them on any platform is almost impossible and more often than not unnecessary.

However, there is a set of libraries that gets used day in and day out for tasks related to machine learning, natural language processing, or deep learning. Getting familiar with that set always leads to an advantage and faster development time as well. Machine learning libraries associated with the following techniques are useful:


  1. Exposure to packages and APIs such as scikit-learn, Theano, Spark MLlib, H2O, TensorFlow
  2. Expertise in models such as decision trees, nearest neighbor, neural nets, support vector machines, and a knack for deciding which one fits best
  3. Deciding and choosing hyperparameters that affect the learning model and the outcome
  4. Familiarity and understanding of concepts such as gradient descent, convex optimization, quadratic programming, partial differential equations
  5. Understanding the working principles of techniques like random forests, support vector machines (SVMs), and Naive Bayes classifiers helps drive the model-building process faster
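To ground the list above, here is a minimal scikit-learn workflow: fit a decision tree on the bundled iris dataset, set one hyperparameter, and evaluate on a held-out split. It is a sketch of the typical fit/predict/score loop, not a tuned model.

```python
# Minimal scikit-learn workflow: split, fit, evaluate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# max_depth is a hyperparameter: it trades bias against variance.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(acc)
```

Swapping `DecisionTreeClassifier` for another estimator (nearest neighbor, SVM, and so on) leaves the rest of the loop unchanged, which is exactly why familiarity with the API pays off.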







Understanding the working principles of neural networks takes time, as they are a different terrain in the field of AI, even if one considers neural nets an extension of machine learning techniques. Having said that, it is quite possible to build a very good understanding after spending some time with them, learning the underlying principles, and working with them. The architecture of neural nets takes a lot of inspiration from the human brain, and many of the architectural terms are accordingly derived from biology. Neural nets form the very essence of deep learning. Depending on the architecture, a model will be shallow or deep, and the computational cost grows with the depth. But neural nets evidently have an edge when it comes to solving complex problems or problems with high input dimensionality, and they can have an almost magical effect on model performance compared to traditional machine learning algorithms. It is therefore always better to build some initial understanding so that learners can transition smoothly over time.


  1. The architecture of neural nets
  2. Single-layer perceptron
  3. Backpropagation and forward propagation
  4. Activation functions
  5. Supervised deep learning techniques
  6. Unsupervised deep learning techniques
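A single forward pass through a tiny network makes several of these terms concrete: layers, weights, and where activation functions fit. The weights below are fixed by hand purely for illustration; in practice they would be learned via backpropagation.

```python
import numpy as np

# One forward pass through a one-hidden-layer network (illustrative weights).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.2])                  # 2 input features
W1 = np.array([[0.1, 0.4], [-0.3, 0.2]])  # input -> hidden weights
b1 = np.array([0.0, 0.1])                 # hidden biases
W2 = np.array([0.7, -0.5])                # hidden -> output weights
b2 = 0.2                                  # output bias

hidden = np.maximum(0, W1 @ x + b1)       # ReLU activation in the hidden layer
output = sigmoid(W2 @ hidden + b2)        # sigmoid squashes the output to (0, 1)
print(0 < output < 1)                     # → True
```

Stacking more hidden layers (a "deeper" architecture) is exactly what increases the computational cost mentioned above.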


Neural networks are an ever-growing field of study. Like machine learning techniques, they are primarily divided into supervised and unsupervised techniques. In the area of deep learning (which is based on neural networks), supervised techniques are the most widely studied.






The complainant requested details from Tamworth Borough Council (“the Council”) regarding the meetings and people involved in a previous internal review. The Council stated that it did not hold all of the information within the scope of the complainant’s request, but it provided some of the information that it did hold. It also withheld part of the held information, relying on section 40 of the FOIA to do so. The Commissioner’s decision is that she is satisfied that the Council only holds some of the information within the scope of the request and that the Council correctly relied upon section 40(2) to withhold the information that it did. The Commissioner therefore does not require any further steps to be taken.

Thursday News, October 29

The MLOps Stack

What is MLOps (briefly)

MLOps is a set of best practices that revolve around making machine learning in production more seamless. The purpose is to bridge the gap between experimentation and production with key principles to make machine learning reproducible, collaborative, and continuous.

MLOps is not dependent on a single technology or platform. However, technologies play a significant role in practical implementations, similarly to how adopting Scrum often culminates in setting up and onboarding the whole team to e.g. JIRA.

What is the MLOps Stack?

To make it easier to consider what tools your organization could use to adopt MLOps, we’ve made a simple template that breaks down a machine learning workflow into components. This template allows you to consider where you need tooling.

The MLOps Stack Template

Download the MLOps Stack here: Download PDF

As a machine learning practitioner, you’ll have a lot of choices in technologies. First, some technologies cover multiple components, while others are more singularly focused. Second, some tools are open-source and can be implemented freely, while others are proprietary but save you the implementation effort.

No single stack works for everyone, and you need to consider your use-case carefully. For example, your requirements for model monitoring might be much more complex if you are working in the financial or medical industry.

The MLOps Stack template is loosely based on Google’s article on MLOps and continuous delivery, but we tried to simplify the workflow to a more manageable abstraction level. There are nine components in the stack which have varying requirements depending on your specific use case.


Component        | Requirements                                 | Selected tooling
-----------------|----------------------------------------------|-----------------
Data analysis    | E.g. must support Python/R, can run locally  |
Feature store    |                                              |
Code repository  |                                              |
ML pipeline      |                                              |
Metadata store   |                                              |
Model registry   |                                              |
Model serving    |                                              |
Model monitoring |                                              |
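One hypothetical way to work through the template is to record each component's requirements and chosen tool in code, so gaps stay visible. The component names mirror the stack above; the requirements and tool names are placeholders, not recommendations.

```python
# Sketch of a stack worksheet: None marks a component with no tool chosen yet.
mlops_stack = {
    "data_analysis":    {"requirements": "Must support Python/R, can run locally",
                         "tooling": "Jupyter Notebook"},
    "feature_store":    {"requirements": "", "tooling": None},
    "code_repository":  {"requirements": "", "tooling": "GitLab"},
    "ml_pipeline":      {"requirements": "", "tooling": None},
    "metadata_store":   {"requirements": "", "tooling": None},
    "model_registry":   {"requirements": "", "tooling": None},
    "model_serving":    {"requirements": "", "tooling": None},
    "model_monitoring": {"requirements": "", "tooling": None},
}

# Listing components that still lack a tool makes the open decisions explicit.
missing = [name for name, row in mlops_stack.items() if row["tooling"] is None]
print(missing)
```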

An Example MLOps Stack

The MLOps Stack Template with Valohai

As an example, we’ve put together a technology stack containing our MLOps platform, Valohai, and some of our favorite tools that complement it, including:

  • JupyterHub / Jupyter Notebook for data analysis and experimentation

  • Tecton for feature stores

  • GitLab for code repositories

  • Fiddler Labs for model monitoring

  • Valohai for training pipelines, model serving, and associated stores

Get Started

You might want to start by placing the tools you’re already using and work from there. The MLOps Stack template is free to download here.

If you are interested in learning more about MLOps, consider our other related content.

Originally published at

How to prepare for Big Data Internship interview

I am having a second round interview with an insurance company for a big data internship position. This is my first interview ever for a big data role.

The company collects massive amounts of data from vehicles, and they work with distributed, parallel tech like Hadoop and Kafka to analyze it. The interviewers will probably ask me how I would build a distributed framework to ingest and analyze millions of rows of data. I only know basic stuff about Hadoop and AWS.

What are the typical questions that the employers ask for an entry level position like this in big data? How can I better prepare myself? What should I review?


Security Intelligence Handbook Chapter 1: Why Security Intelligence Matters

Editor’s Note: Over the next several weeks, we’re sharing excerpts from the third edition of our popular book, “The Security Intelligence Handbook: How to Disrupt Adversaries and Reduce Risk with Security Intelligence.” Here, we’re looking at chapter one, “What Is Security Intelligence?” To read the entire section, download your free copy of the handbook.

Today, anyone with a desire to do harm — from your run-of-the-mill bad guys to nation-state attackers — has the ability to put your organization’s most sensitive data at risk simply by accessing underground marketplaces and easily purchasing off-the-shelf tools.

These adversaries assume you’re at a disadvantage, hindered by legacy vulnerabilities, a lack of secure code development processes, explosive growth of connected devices, and a dispersed workforce that’s increasingly difficult to secure.

By the time you see threat indicators on your network, it’s often too late — and you’re probably at least two steps behind your attacker. You need a way to take back control by proactively uncovering attack methods and disrupting adversaries’ efforts before they strike.

Fusing internal and external threat, security, and business insights empowers teams with the advanced warning and actionable facts needed to confidently protect your organization. Elite security intelligence makes this possible by putting actionable context at the center of every workflow and security decision.

Today’s most successful security intelligence processes share four key characteristics:

  • A collaborative process and framework
  • 360-degree visibility
  • Extensive automation and integration
  • Alignment with key business priorities

Learn why security intelligence matters in the following excerpt from “The Security Intelligence Handbook, Third Edition: How to Disrupt Adversaries and Reduce Risk With Security Intelligence.” In this excerpt, which has been edited and condensed, we’ll paint a clear picture of what security intelligence is, explore the elements that make up a successful program, and explain the key benefits.

Visibility Into Threats Before They Strike

Cyber threats come in many forms. Certainly some of them are cybercriminals who attack your network at the firewall. However, they also include threat actors operating on the open and dark web who come at you through your employees and your business partners. Some devastate your brand through social media and external websites without ever touching your network. Malicious or merely careless insiders may also wreak havoc with your data and your reputation.

By the time you see indicators of these threats on your network, it is probably too late. To prevent damage, you need advance warning of threats, accompanied by actionable facts in order to:

  • Eliminate your most serious vulnerabilities before they are exploited
  • Detect probes and attacks at the earliest possible moment and respond effectively right away
  • Understand the tactics, techniques, and procedures (TTPs) of likely attackers and put effective defenses in place
  • Identify and correct your business partners’ security weaknesses — especially those that have access to your network
  • Detect data leaks and impersonations of your corporate brand
  • Make wise investments in security to maximize return and minimize risk

Many IT organizations have created intelligence programs to obtain the advance warning and actionable facts they need to protect their data and their brands.

Actionable Facts and Insights

When people speak of security intelligence, sometimes they are referring to certain types of facts and insights, and other times to the process that produces them. Let’s look at the first case.

More than data or information

Even security professionals sometimes use the words “data,” “information” and “intelligence” interchangeably, but the distinctions are important.

Of course, the details of the data, information, and intelligence differ across political, military, economic, business, and other types of intelligence programs. For security intelligence:

  • Data is usually just indicators such as IP addresses, URLs, or hashes. Data doesn’t tell us much without analysis.
  • Information answers questions like, “How many times has my organization been mentioned on social media this month?” Although this is a far more useful output than the raw data, it still doesn’t directly inform a specific action.
  • Intelligence is factual insight based on analysis that correlates data and information from across different sources to uncover patterns and add insights. It enables people and systems to make informed decisions and take effective action to prevent breaches, remediate vulnerabilities, improve the organization’s security posture, and reduce risk.

Implicit in this definition of “intelligence” is the idea that every instance of security intelligence is actionable for a specific audience. That is, intelligence must do two things:

  1. Point toward specific decisions or actions
  2. Be tailored for easy use by a specific person, group, or system that will use it to make a decision or take an action

Data feeds that are never used and reports that are never read are not intelligence. Neither is information, no matter how accurate or insightful, if it is provided to someone who can’t interpret it correctly or isn’t in a position to act on it.
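The data-to-information-to-intelligence progression above can be sketched as a tiny enrichment step. Everything here is illustrative: the indicators, the sighting counts, and the threshold are invented, and real programs correlate far richer context.

```python
# Illustrative only: turning bare indicators (data) plus counts (information)
# into an assessment with a recommended action (intelligence).
raw_indicators = ["198.51.100.7", "203.0.113.9"]       # data: bare IPs

sightings = {"198.51.100.7": 14, "203.0.113.9": 1}     # information: mention counts

def to_intelligence(indicator, count, threshold=10):
    """Attach an assessment and a recommended action (the 'actionable' part)."""
    if count >= threshold:
        return {"indicator": indicator, "assessment": "likely malicious",
                "action": "block at firewall"}
    return {"indicator": indicator, "assessment": "low confidence",
            "action": "monitor"}

report = [to_intelligence(i, sightings[i]) for i in raw_indicators]
print(report[0]["action"])  # → block at firewall
```

The point of the sketch is the last field: without a recommended action tailored to someone who can take it, the output would remain information, not intelligence.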

Security Intelligence: The Process

Security intelligence also refers to the process by which data and information are collected, analyzed, and disseminated throughout the organization. The steps in such a process will be discussed in Chapter 3, where we describe the security intelligence lifecycle. However, it is important to note at the outset that successful security intelligence processes have four characteristics.

1. A collaborative process and framework

In many organizations, security intelligence efforts are siloed. For example, the security operations (SecOps), fraud prevention, and third-party risk teams may have their own analysts and tools for gathering and analyzing intelligence. This leads to waste, duplication, and an inability to share analysis and intelligence. Silos also make it impossible to assess risk across the organization and to direct security resources where they will have the greatest impact. Security intelligence programs need to share a common process and framework, enable broad access to insights and operational workflows, encourage a “big picture” view of risk, and account for the allocation of resources.

2. 360-degree visibility

Because cyber threats may come from anywhere, security intelligence programs need visibility everywhere, including:

  • Security events on the corporate network
  • Conventional threat data feeds
  • Open web forums where attackers exchange information and tools for exploiting vulnerabilities
  • Dark web communities where hackers and state-sponsored actors share techniques and plot attacks
  • Online marketplaces where cybercriminals buy and sell confidential information
  • Social media accounts where threat actors impersonate your employees and counterfeit your products

Today, many organizations focus on conventional threat data feeds, and are only now becoming aware of the need to scan a broader variety and greater quantity of sources on a regular basis.

3. Extensive automation and integration

Because there is so much data and information to capture, correlate, and process, a security intelligence program needs a high degree of automation to reduce manual efforts and produce meaningful results quickly. To add context to initial findings and effectively disseminate intelligence, successful security intelligence programs must also integrate with many types of security solutions, such as security dashboards, security information and event management solutions (SIEMs), vulnerability management systems, firewalls, and security orchestration, automation and response (SOAR) tools.

4. Alignment with the organization and security use cases

Organizations sometimes waste enormous resources capturing and analyzing information that isn’t relevant to them. A successful security intelligence program needs to determine and document its intelligence needs to ensure that collection and processing activities align with the organization’s actual priorities. Alignment also means tailoring the content and format of intelligence to make it easy for people and systems to use.

Who Benefits From Security Intelligence?

Security intelligence is sometimes perceived to be simply a research service for the security operations and incident response teams, or the domain of elite analysts. In reality, it adds value to every security function and to several other teams in the organization.

The middle section of this handbook examines the primary use cases:

  • Security operations and incident response teams are routinely overwhelmed by alerts. Security intelligence accelerates their alert triage, minimizes false positives, provides context for better decision-making, and empowers them to respond faster.
  • Vulnerability management teams often struggle to differentiate between relevant, critical vulnerabilities and those that are unimportant to their organization. Security intelligence delivers context and risk scoring that enables them to reduce downtime while patching the vulnerabilities that really matter first.
  • Threat analysts need to understand the motives and TTPs of threat actors and track security trends for industries, technologies, and regions. Security intelligence provides them with deeper and more expansive knowledge to generate more valuable insights.
  • Third-party risk programs need up-to-date information on the security postures of vendors, suppliers, and other third parties that access the organization’s systems. Security intelligence arms them with an ongoing flow of objective, detailed information about business partners that static vendor questionnaires and traditional procurement methods can’t offer.
  • Brand protection teams need continuous visibility into unsanctioned web and social media mentions, data leaks, employee impersonations, counterfeit products, typosquatting websites, phishing attacks, and more. Security intelligence tools monitor for these across the internet at scale, and streamline takedown and remediation processes.
  • Geopolitical risk and physical security teams rely on advance warning of attacks, protests, and other threats to assets in locations around the globe. Security intelligence programs capture data and “chatter” from multiple sources and filter it to deliver precise intelligence about what’s happening in the cities, countries, and regions of interest.
  • Security leaders use intelligence about likely threats and their potential business impact to assess security requirements, quantify risks (ideally in monetary terms), develop mitigation strategies, and justify cybersecurity investments to CEOs, CFOs, and board members.

Get ‘The Security Intelligence Handbook’

This chapter is one of many in our new book that demonstrates how to disrupt adversaries and measurably reduce risk with security intelligence at the center of your security program. Subsequent chapters explore different use cases, including the benefits of security intelligence for brand protection, vulnerability management, SecOps, third-party risk management, security leadership, and more.

Download your copy of “The Security Intelligence Handbook” now.

The post Security Intelligence Handbook Chapter 1: Why Security Intelligence Matters appeared first on Recorded Future.

Digital Twin, Virtual Manufacturing, and the Coming Diamond Age

If you have ever had a book self-published through Amazon or similar fulfillment houses, chances are good that the physical book did not exist prior to the order being placed. Instead, that book existed as a PDF file, image files for cover art and author photograph, perhaps with some additional XML-based metadata indicating production instructions, trim, paper specifications, and so forth.

When the order was placed, it was sent to a printer that likely was the length of a bowling alley, where the PDF was converted into a negative and then laser printed onto the continuous paper stock. This was then cut to a precise size that varied minutely from page to page depending upon the binding type, before being collated and glued into the binding.

At the end of the process, a newly printed book dropped onto a rolling platform and from there to a box, where it was potentially wrapped and deposited automatically before the whole box was closed, labeled, and passed to a shipping gurney. From beginning to end, the whole process likely took ten to fifteen minutes, and more than likely no human hands touched the book at any point in the process. There were no plates to change out, no prepress film being created, no specialized inking mixes prepared between runs. Such a book was not “printed” so much as “instantiated”, quite literally coming into existence only when needed.

It’s also worth noting here that the same book probably was “printed” to a Kindle or similar ebook format, but in that particular case, it remained a digital file. No trees were destroyed in the manufacture of the ebook.

Such print on demand capability has existed since the early 2000s, to the extent that most people generally do not even think much about how the physical book that they are reading came into existence. Yet this model of publishing represents a profound departure from manufacturing as it has existed for centuries, and is in the process of transforming the very nature of capitalism.


Shortly after these printing presses came online, a number of innovations in thermally molded plastic made it possible to create certain types of objects to exquisite tolerances without requiring a physical mold. Ablative printing techniques had been developed during the 1990s and involved the use of lasers to cut away at materials based upon precise computerized instructions, working in much the same way that a sculptor chips away at a block of granite to reveal the statue within.

Additive printing, on the other hand, made use of a combination of dot matrix printing and specialized lithographic gels that would be activated by two lasers acting in concert. The gels would harden at the point of intersection, then when done the whole would be flushed with reagents that removed the “ink” that hadn’t been fixed into place. Such a printing system solved one of the biggest problems of ablative printing in that it could build up an internal structure in layers, making it possible to create interconnected components with minimal physical assembly.

The primary limitation that additive printing faced was the fact that it worked well with plastics and other gels, but the physics of metals made such systems considerably more difficult to solve – and a great deal of assembly requires the use of metals for durability and strength. By 2018, however, this problem was increasingly finding solutions for various types of metals, primarily by using annealing processes that heated up the metals to sufficient temperatures to enable pliability in cutting and shaping.

What this means in practice is that we are entering the age of just in time production in which manufacturing exists primarily in the process of designing what is becoming known as a digital twin. While one can argue that this refers to the use of CAD/CAM like design files, there’s actually a much larger, more significant meaning here, one that gets right to the heart of an organization’s digital transformation. You can think of digital twins as the triumph of design over manufacturing, and data and metadata play an oversized role in this victory.


At the core of such digital twins is the notion of a model. A model, in the most basic definition of the word, is a proxy for a thing or process. A runway model, for instance, is a person who is intended to be a proxy for the viewer, showing off how a given garment looks. An artist’s model is a stand-in or proxy for the image, scene, or illustration that an artist is producing. An architectural model is a simulation of what a given building will look like when constructed, and with 3D rendering technology, such models can appear quite life-like. Models can also simulate more than appearance, though: they can simulate structural integrity, strain analysis, and even chemical interactions. We create models of stars, black holes, and neutron stars based upon our understanding of physics, and models of disease spread in the case of epidemics.

Indeed, it can be argued that the primary role of a data scientist is to create and evaluate models. It is one of the reasons that data scientists are in such increasing demand: the ability to build models is one of the most valuable capabilities an organization can have, especially as more and more of a company’s production exists in the form of digital twins.

There are several purposes for building such models: the most obvious is to reduce (or in some cases eliminate altogether) the cost of instantiation. If you create a model of a car, you can stress test the model, can get feedback from potential customers about what works and what doesn’t in its design, can determine whether there’s sufficient legroom or if the steering wheel is awkwardly placed, can test to see whether the trunk can actually hold various sized suitcases or packages, all without the cost of actually building it. You can test out gas consumption (or electricity consumption), can see what happens when it crashes, can even attempt to explode it. While such models aren’t perfect (nor are they uniform), they can often serve to significantly reduce the things that may go wrong with the car before it ever goes into production.

However, such models, such digital twins, also serve other purposes. All too often, decisions are made not on the basis of what the purchasers of the thing being represented want, but on what a designer, a marketing executive, or the CEO of a company feels the customer should get. When there was a significant production cost involved in instantiating the design, this often meant a strong bias towards what the decision-maker greenlighting the production felt should work, rather than what the stakeholders who would not only be purchasing but also using the product actually wanted. With 3D production increasingly becoming a reality, however, control is shifting from the producer to the consumer, and not just at the higher end of the market.

Consider automobile production. Currently, millions of cars are produced by automakers globally, but a significant number never get sold. They end up clogging lots, moving from dealerships to secondary markets to fleet sales, and eventually end up in the scrapyard. They don’t get sold primarily because they simply don’t represent the optimal combination of features at a given price point for the buyer.

The industry has, however, been changing its approach, pushing the consumer much closer to the design process before the car is actually built. Colors, trim, engine type, seating, communications and entertainment systems, types of brakes: all of these and more can be changed. Increasingly, these changes are even making their way into the configuration of the chassis and carriage. This becomes possible because it is far easier to change the design of the digital twin than to change the physical entity, and that physical entity can then be “instantiated” within a few days of ordering it.

What are the benefits? You end up producing product on demand, rather than in anticipation of it. This means that you need to invest in fewer materials, maintain smaller supply chains, produce less waste, and in general have a more committed customer. The downside, of course, is that you need fewer workers, have a much smaller sales infrastructure, and have to work harder to differentiate your product from your competitors’. This is also happening now: because of that digitalization process, it is easier than ever for a company such as Amazon to sell bespoke vehicles.

This is in fact one of the primary dangers facing established players. Even today, many C-suite managers see themselves as being in the automotive manufacturing space, or the aircraft production space, or the book publishing space. Yet ultimately, once you reach a stage where digital twins serve as proxies for the physical object, the actual instantiation – the manufacturing aspect – becomes very much a secondary concern.

Indeed, the central tenet of digital transformation is that everything simply becomes a publishing exercise. If I have the software to build a car, then ultimately the cost of building that car involves purchasing the raw materials and the time on a 3D printer, then performing the final assembly. There is a growing “hobbyist” segment of companies that can go from bespoke design to finished product in a few weeks. Ordinarily the volume of such production is low enough that it is tempting to ignore what’s going on, but between Covid-19 reshaping retail patterns, the diminishing spending power of Millennials and Gen Zers, and the changes increasingly required by climate change, the bespoke digital twin is likely to eat into already thin margins.

Put another way, established companies in many different sectors have managed to maintain their dominance both because they were large enough to dictate the language that described the models and because the costs involved in manufacturing and production created a major barrier to entry for new players. That’s now changing.

Consider the first part of this assertion. Names are important. One of the realizations that has emerged over the last twenty years is that before two people or organizations can communicate with one another, they need to establish (and refine) the meanings of the language used to identify entities, processes, and relationships. An API, when you get right down to it, is a language used to interact with a system. The problem with intercommunication is that it is generally far easier to establish internal languages – the way that one organization defines its terms – than it is to create a common language. For a dominant organization in a given sector, this often manifests as a desire to dominate the linguistic debate, as this puts the onus of changing the language (a time-consuming and laborious process) onto its competitors.

However, this approach has also backfired spectacularly more often than not, especially when those competitors are willing to work with one another to weaken a dominant player. Most successful industry standards are pidgins – languages that capture 80-90% of the commonality in a given domain while providing a way to communicate about the remaining 10-20% that typifies the specialty of a given organization. This is the language of the digital twin, the way that you describe it, and the more that organizations subscribe to that language, the easier it is for those organizations to interchange digital twin components.
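The pidgin idea can be sketched as a schema split into a shared core vocabulary and a vendor-specific extension bag. The sketch below is illustrative only; the class, field names, and tolerance are invented, not drawn from any actual industry standard:

```python
from dataclasses import dataclass, field

@dataclass
class BatteryDescriptor:
    """Hypothetical 'pidgin' description of a battery component.

    The core fields model the 80-90% every vendor agrees on; the
    `extensions` dict carries the 10-20% that is vendor-specific.
    """
    capacity_kwh: float      # shared vocabulary
    nominal_voltage: float   # shared vocabulary
    mass_kg: float           # shared vocabulary
    extensions: dict = field(default_factory=dict)  # vendor-specific

def interchangeable(a: BatteryDescriptor, b: BatteryDescriptor,
                    voltage_tol: float = 0.05) -> bool:
    """Two components interchange if their *core* vocabulary agrees
    within tolerance; extensions are deliberately ignored, which is
    what makes the pidgin work across competitors."""
    return abs(a.nominal_voltage - b.nominal_voltage) / a.nominal_voltage <= voltage_tol

vendor_a = BatteryDescriptor(75.0, 400.0, 450.0, {"cooling": "liquid"})
vendor_b = BatteryDescriptor(80.0, 396.0, 470.0, {"cell_format": "4680"})
print(interchangeable(vendor_a, vendor_b))  # True: cores agree within tolerance
```

The design point is that interchange only ever inspects the shared core, so each vendor remains free to specialize inside `extensions` without breaking the common language.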

To put this into perspective, consider the growth of bespoke automobiles. One form of linguistic harmonization is the standardization of containment – the dimensions of a particular component, the location of ports for physical processes (pipes for fluids, air and wires) and electronic ones (the use of USB or similar communication ports), agreements on tolerances and so forth. With such ontologies in place, construction of a car’s digital twin becomes far easier. Moreover, by adhering to these standards, linguistic as well as dimensional, you still get specialization at a functional level (for instance, the performance of a battery) while at the same time being able to facilitate containment variations, especially with digital printing technology.
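A toy version of that containment check might look like the following. The fit test covers only outer dimensions and named ports; a real ontology would also capture tolerances, materials, thermal limits and much more, and all names here are invented:

```python
from dataclasses import dataclass

@dataclass
class Envelope:
    """Standardized containment: outer dimensions in millimetres."""
    length: float
    width: float
    height: float

@dataclass
class Bay:
    envelope: Envelope
    ports: frozenset  # named ports the bay exposes (e.g. "usb-c")

@dataclass
class Component:
    envelope: Envelope
    required_ports: frozenset

def fits(component: Component, bay: Bay, clearance: float = 2.0) -> bool:
    """A component fits if it leaves the agreed clearance on every axis
    and every port it needs is exposed by the bay."""
    c, b = component.envelope, bay.envelope
    dims_ok = all(dim + clearance <= limit for dim, limit in
                  [(c.length, b.length), (c.width, b.width), (c.height, b.height)])
    return dims_ok and component.required_ports <= bay.ports

bay = Bay(Envelope(500, 300, 200), frozenset({"usb-c", "coolant-in", "coolant-out"}))
pump = Component(Envelope(480, 290, 190), frozenset({"usb-c", "coolant-in"}))
print(fits(pump, bay))  # True
```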

As an ontology emerges for automobile manufacturing, it facilitates “plug-and-play” at a macro level. The barrier to entry for creating a vehicle drops dramatically, though likely not quite to the individual level (except for well-heeled enthusiasts). Ironically, this makes it possible for a designer to create a particular design that meets their criteria, and also makes it possible for that designer to sell or license that IP to others for reuse. Now, if history is any indication, this will likely initially lead to a lot of very badly designed cars, but over time, the bad designers will get winnowed out by long-tail market pressures.

Moreover, because it becomes possible to test digital twins in virtual environments, the market for digital wind tunnels, simulators, stress analyzers and so forth will also grow. That is to say, just as programming has developed agile methodologies for testing, so too will manufacturing develop a data agility that serves to validate designs. Lest this be seen as a pipe dream, consider that most contemporary game platforms can, with very little tweaking, be reconfigured for exactly this kind of simulation work, especially as GPUs increase in performance and available memory.

The same type of interoperability applies not just to the construction of components, but also to all aspects of resource metadata, especially with datasets. Ontologies provide ways to identify, locate and discover the schemas of datasets for everything from usage statistics to simulation parameters for training models. The design of that car (or airplane, or boat, or refrigerator) is simply one more digital file, transmissible in the same way that a movie or audio file is, and containing metadata that puts those resources into the broader context of the organization.
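As a rough sketch of that "design as one more file" idea, a design's metadata record could borrow vocabulary from schema.org's Dataset type; the property names below exist in schema.org, but every value is invented for illustration:

```python
import json

# Hypothetical metadata for a car-chassis design file, loosely in the
# spirit of schema.org/Dataset. Values are illustrative only.
design_record = {
    "@type": "Dataset",
    "name": "sedan-chassis-v3",
    "encodingFormat": "model/step",   # format of the CAD payload itself
    "contentSize": "184 MB",
    "isBasedOn": "sedan-chassis-v2",  # provenance, like any other file
    "license": "https://example.org/licenses/reuse-with-attribution",
    "variableMeasured": ["drag coefficient", "torsional stiffness"],
}

# Because the twin is "just a file", it serializes and transmits
# exactly like a movie or audio file would.
payload = json.dumps(design_record)
restored = json.loads(payload)
print(restored["name"])  # sedan-chassis-v3
```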

The long-term impact on business is simple. Everything becomes a publishing company. Some companies will publish aircraft or automobiles. Others will publish enzymes or microbes, and still others will publish movies and video games. You still need subject matter expertise in the area that you are publishing into – a manufacturer of pastries will be ill-equipped to handle the publishing of engines, for instance – but overall you will see a convergence in the process, regardless of the end product.

How long will this process take to play out? In some cases, it’s playing out now. Book publishing is almost completely virtual at this stage, and the distinction between the physical object and the digital twin comes down to whether instantiation takes place or not. The automotive industry is moving in this direction, and drone tech (especially for military drones) has been shifting this way for years.

On the other hand, entrenched companies with extensive supply chains will likely adopt such digital twins approaches relatively slowly, and more than likely only at a point where competitors make serious inroads into their core businesses (or the industries themselves are going through a significant economic shock). Automobiles are going through this now, as the combination of the pandemic, the shift towards electric vehicles, and changing demographics are all creating a massive glut in automobile production that will likely result in the collapse of internal combustion engine vehicle sales altogether over the next decade along with a rethinking of the ownership relationship with respect to vehicles.

Similarly, the aerospace industry faces an existential crisis as demand for new aircraft has dropped significantly in the wake of the pandemic. While aircraft production is still a very high-cost business, the ability to create digital twins – along with an emergence of programming ontologies that make interchange between companies much more feasible – has opened up the market to smaller, more agile competitors who can create bespoke aircraft much more quickly by distributing the overall workload and specializing in configurable subcomponents, many of which are produced via 3D printing techniques.

Construction, likewise, is dealing with both the fallout from the pandemic and the increasing abstraction that comes with digital twins. The days when architects worked out details on paper blueprints are long gone, and digital twins of construction projects are increasingly designed with earthquake and weather testing, stress analysis, airflow and energy-consumption modeling and so forth. Combine this with the growing capability to 3D print both full structures and custom components in concrete, carbon fiber and even (increasingly) metallic structures. There are still limitations: as with other large structural projects, the lack of specialized talent in this space remains an issue, and fabrication units are typically not yet built at a scale that makes them particularly useful for onsite construction.

Nonetheless, the benefits make achieving that scale worthwhile. A 3D-printed house can be designed, approved, tested, and “built” within three to four weeks, as opposed to six months to two years for traditional processes. Designs, similarly, can be bought, traded, and modified, making it possible to create neighborhoods with significant variation between houses, as opposed to the two or three prefab designs that tend to predominate, in the US especially. Such constructs can also move significantly away from the traditional boxy structures of most houses, both internally and externally, as materials can be shaped to best fit the design aesthetic rather than the rectangular slabs that typify most building construction.

Such constructs can also be set up to be self-aware, to the extent that sensors can be built into the infrastructure and viewscreens (themselves increasingly moving away from flatland shapes) can replace or augment the views of the outside world. In this sense, the digital twin of the instantiated house or building is able to interact with its physical counterpart, maintaining history (memory) while increasingly able to adapt to new requirements.
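A minimal sketch of that interaction might look like the following, with hypothetical sensor names and a toy decision rule; the point is only that the twin mirrors the building's state, keeps a history, and feeds decisions back:

```python
class HouseTwin:
    """Toy digital twin that mirrors sensor readings from its physical
    counterpart and keeps a history (the building's 'memory')."""

    def __init__(self):
        self.state = {}
        self.history = []

    def ingest(self, reading: dict) -> None:
        """Accept a snapshot of sensor data from the physical building."""
        self.history.append(dict(reading))
        self.state.update(reading)

    def recommend(self) -> str:
        """Feed a decision back to the physical counterpart."""
        if self.state.get("indoor_temp_c", 21.0) > 26.0:
            return "increase ventilation"
        return "hold"

twin = HouseTwin()
twin.ingest({"indoor_temp_c": 27.5, "occupancy": 12})
print(twin.recommend())  # increase ventilation
```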

This feedback loop – the ability of the physical twin to affect the model – provides a look at where this technology is going. Print publishing, once upon a time, was something where the preparation of the medium – the book, magazine, or newspaper – occurred in only one direction: from digital to print. Today, print resides primarily on phones, screens, and tablets, and authors often live-blog chapters that evolve in agile ways. We are also seeing the emergence of processors such as FPGAs that configure themselves programmatically, literally changing the nature of the processor itself in response to software code.

It’s not that hard, with the right forethought, to envision real-world objects that can reconfigure themselves in the same way: buildings reconfiguring themselves for different uses or to adapt to environmental conditions, cars that can change their styling or even body shape, clothing that can change color or thermal profile, aircraft that can be reconfigured for different uses within minutes, and so forth. This is already reality in some places, though still piecemeal and in one-offs, but the malleability of the digital twin – whether of an office suite or a jet engine – is the future of manufacturing.

The end state, likely still a few decades away, will be an economy built upon just-in-time replication and the primacy of the virtual twin, where you are charged not for the finished product but for the license to use a model, for the material components – the “inks” – and for the processing to go from the former to the latter (and back), quite possibly with some form of remuneration for recycled source material. Moreover, as this process continues, more and more of the burden of existence shifts to the digital twin (tools that “learn” a new configuration are able to adapt to that configuration at any time). The physical and the virtual become one.
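The pricing model described above reduces to simple arithmetic: license fee, plus materials, plus processing time, minus a credit for recycled feedstock. Every fee and rate in this worked example is invented for illustration:

```python
def replication_cost(license_fee: float, material_kg: float,
                     price_per_kg: float, printer_hours: float,
                     rate_per_hour: float, recycled_kg: float = 0.0,
                     recycle_credit_per_kg: float = 0.0) -> float:
    """Illustrative cost of instantiating a digital twin: the model
    license, the 'inks', and the processing time, minus any
    remuneration for recycled source material."""
    materials = material_kg * price_per_kg
    processing = printer_hours * rate_per_hour
    credit = recycled_kg * recycle_credit_per_kg
    return license_fee + materials + processing - credit

# Made-up figures: a 500 license, 120 kg of feedstock at 4/kg,
# 30 printer-hours at 25/h, 40 kg recycled and credited at 1.5/kg.
print(replication_cost(500, 120, 4.0, 30, 25.0, 40, 1.5))  # 1670.0
```

Note that under this model the finished object itself carries no price at all; only the license, the feedstock, and the machine time do.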

Some may see the resulting society as utopian, others as dystopian, but what is increasingly unavoidable is that this is the logical conclusion of the trends currently at work. (For some inkling of what such a society might be like, I’d recommend reading The Diamond Age by Neal Stephenson, which I believe is very prescient in this regard.)
