
Using Artificial Intelligence in Drug Discovery

Discovering new medicines is fundamentally complex, time-consuming, and expensive.[1] Although thousands of compounds are continually studied as potential drug candidates of therapeutic value, it is estimated that developing an innovative medicine takes an average of 10 or more years, and only 12 percent of the drugs entering clinical trials will result in an approved medicine.[2] Considering the failure rate and inherent complexities, it is not surprising that the cost of developing novel drugs has risen sharply over time. The average cost of developing a single new drug rose by 145 percent between 2003 and 2013, and it is estimated that the average out-of-pocket costs alone to develop a new drug easily exceed $1 billion.[3]

According to Dr. Scott Gottlieb, the U.S. Commissioner of Food and Drugs, the costs of early-stage drug development have grown proportionally faster than the costs of late-stage drug development.[4]

As scientific and technological advancements redefine the drug discovery research and development (R&D) process, biopharmaceutical researchers are increasingly outsourcing[5] and partnering with other stakeholders in new collaborative models,[6] seeking to address scientific and technological challenges and create greater efficiencies in their drug discovery programs. The number of early-stage R&D discovery, basic research, and pre-clinical partnerships among biopharmaceutical stakeholders more than doubled between 2005 and 2014.[7]

The emergence of Artificial Intelligence (AI) solutions in early drug discovery promises to accelerate the discovery cycle time for lead candidates, improve success rates, and reduce costs. A Morgan Stanley report suggests that by digitizing drug discovery, companies could realize more than 20 percent in potential annual R&D savings by 2030.[8]

The modern multidimensional research approaches required for understanding mechanisms of complex diseases and biological systems generate enormous amounts of data. Advances in genomics, biology, high-throughput screening, in silico techniques, and combinatorial chemistry have dramatically increased the volume, diversity, and availability of biological macromolecule and small molecule data.[9] Thus, “[m]odern biology has entered the era of big data, wherein datasets are too large, high-dimensional, and complex for classical computational biology methods.”[10]

Data that was previously too difficult to access and analyze can now be leveraged through AI. AI algorithms and machine learning provide a framework for processing massive discovery datasets and probing biological systems. The newest “deep learning” methods model high-level representations of data using neural networks composed of multiple processing layers,[11] and have the capacity to find hidden unintuitive patterns and “learn” from existing data by modifying processing on the basis of newly acquired information. It is expected that these emerging AI solutions will have a major impact in bioinformatics, genomics, genetics and personalized medicine. Drug discovery scientists are already using AI platforms in a variety of complex applications, such as:

  • to predict bioactivity of small molecules for drug discovery applications;[12]
  • to analyze vast quantities of unstructured bioscience information, such as genomic data and information contained in patents, clinical trial data, scientific papers and other publications across biomedical journals and databases, to find connections and identify promising drug candidates using predictive and generative biochemistry;[13]
  • to predict therapeutic indications and side effects from various drug information sources;[14]
  • to analyze biological activity and make-up of diseased and healthy patient samples, combined with patient clinical data to reveal biomarkers for disease that may be used to develop improved diagnostics and/or lead to development of therapeutics;[15] and
  • to design bispecific compounds that independently bind to more than one disease target.[16]
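As a rough illustration of what "multiple processing layers" and "learning" from existing data mean in practice, the toy sketch below stacks two layers and repeatedly adjusts their weights to shrink prediction error on a handful of examples. It is not drawn from any platform cited in this article: the data, layer sizes, learning rate, and all function names are made-up assumptions chosen only to keep the example small.

```python
import math
import random

def forward(x, p):
    """Pass one input through two processing layers: tanh hidden units, then a linear output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(p["w1"], p["b1"])]
    output = sum(w * h for w, h in zip(p["w2"], hidden)) + p["b2"]
    return hidden, output

def mean_squared_error(data, p):
    """Average squared gap between predictions and known labels."""
    return sum((forward(x, p)[1] - y) ** 2 for x, y in data) / len(data)

def train_epoch(data, p, lr=0.05):
    """Nudge every weight against its error gradient, one example at a time."""
    for x, y in data:
        hidden, output = forward(x, p)
        err = output - y
        for j, h in enumerate(hidden):
            grad_z = err * p["w2"][j] * (1.0 - h * h)  # computed before w2[j] changes
            p["w2"][j] -= lr * err * h
            p["b1"][j] -= lr * grad_z
            for k, xk in enumerate(x):
                p["w1"][j][k] -= lr * grad_z * xk
        p["b2"] -= lr * err

# Made-up toy data (assumption): two numeric features per example and a 0/1 label.
toy_data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

rng = random.Random(0)
params = {
    "w1": [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(4)],
    "b1": [0.0] * 4,
    "w2": [0.0] * 4,  # output layer starts at zero, so every initial prediction is 0
    "b2": 0.0,
}

loss_before = mean_squared_error(toy_data, params)
for _ in range(2000):
    train_epoch(toy_data, params)
loss_after = mean_squared_error(toy_data, params)
print(f"error before training: {loss_before:.3f}, after: {loss_after:.3f}")
```

Production systems differ from this sketch mainly in scale: many more layers, far larger datasets, and specialized software libraries, but the same loop of predicting, measuring error, and adjusting.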


There are several key terms to consider when negotiating AI-enabled R&D collaboration and licensing terms to ensure intended business results and allocations of assets and rights.


The potential for gaining access to intellectual property (IP) information and technology is typically what motivates research collaboration. Collaborative drug discovery and development endeavors are generally complex arrangements involving IP and technology contributions made by multiple parties and individual members of multidisciplinary teams. While AI-enabled projects give rise to all the usual issues associated with outsourcing and collaborative drug discovery and development, utilization of AI may introduce additional potential uncertainties, particularly with respect to the ownership and use of results and IP rights.

At the outset, it is important to precisely identify the pre-existing technology, data, and IP that each party will bring to the project and to address the types of assets and IP that may be developed or co-developed in the course of collaboration.

It is also crucial to clearly define and distinguish IP rights (i.e., patents, copyrights, trade secrets, as well as related registrations, applications, and priority rights) from tangible properties and technologies (e.g., software, hardware, biological materials, databases, research plans, and research tools). Combining IP rights with tangible properties or other subject matter may result in unintended consequences and future disputes. For example, if datasets, technologies, and/or other such properties are improperly included in the definition of IP, IP assignments and/or license grants will likely extend to the conflated properties and rights in unintended ways.

In some cases, it may also be beneficial to address certain IP rights separately. For example, by separating patent rights from other IP rights, a license to patent rights may be granted for the duration of the patent term, while a separate license to know-how may extend beyond the expiration of the patent term.

Since R&D works and inventions enabled by AI-implemented algorithms may be developed by a number of individual authors/inventors (e.g., biologists, chemists, mathematicians and computer scientists), it may be difficult to ascertain which aspects of collaborative works/inventions are developed individually or jointly. It isn’t difficult to foresee the potential for disputes when there are multiple potential authors and inventors at various stages, and IP ownership may not be clear under applicable law, particularly in projects enabled by sophisticated AI technology capable of designing, testing and generating results by itself. Therefore, R&D agreement terms should explicitly address IP rights.

The parties may wish to grant a present assignment of all developed IP rights to one party, perhaps with licenses granted back to the assigning party(ies). This approach may also facilitate further licensing of IP to third parties, as potential licensees will have greater assurance that there are no gaps in rights.

There are significant implications inherent to joint development and ownership of IP. Under U.S. law, when two or more inventors contribute to a patentable invention, they are considered to be co-inventors, regardless of the extent of contribution made by each.[17] It may surprise some R&D discovery partners to know that, in the absence of agreement to the contrary, each co-inventor owns an equal and undivided interest in the entire patent, and that each of the joint owners of a patent may grant non-exclusive license rights to third parties without the consent of and without accounting to other joint owners.[18] In the event that IP will be jointly owned, the joint owners may wish to include a contractual requirement that each will obtain the other’s approval prior to licensing any jointly owned IP to any third-party licensee (including approval of intended uses), and that each will share in commercial benefits of licensing.

“Work made for hire” is a narrowly defined phrase[19] under U.S. copyright law that is unlikely to apply in the context of a drug discovery or development project, and this often misunderstood doctrine doesn’t apply in any way to patent rights. Moreover, IP rights cannot be transferred by simply declaring in an agreement that something is a “work made for hire.” Assignments and exclusive license grants to IP must be expressly made in writing, and all grants should be carefully detailed to avoid potential disputes. As desirable and appropriate, exclusive and non-exclusive IP license grants may be limited in duration, territory, scope, field of use, or existence (e.g., conditioned on occurrence of specified event(s)).


AI system development is vitally dependent on access to and use of massive amounts of data, which are utilized to train and refine the AI system’s decision-making abilities. At the outset of an AI-centric project, it is important to consider who owns the data/IP rights in data used to train the AI machine learning system and whether there are any restrictions that may prevent the proposed use. As data obtained from numerous sources may be subject to various IP rights and/or protected by contractual restrictions, use without due diligence review presents a potential liability risk for all parties concerned. Moreover, when personal data will be used in AI training and/or research, there will be data protection and privacy issues and responsibilities to address.

In the context of AI-driven R&D projects utilizing data leveraged from multiple sources, project partners will not always be able to confidently discern or agree on ownership of output datasets. Rather than focusing on ownership in such cases, often a workable solution is to establish a straightforward data-sharing arrangement that identifies all permitted data users and specifies the authorized uses each user may make.
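One way to picture such an arrangement is as a simple lookup of who may do what with a given dataset. The sketch below is purely illustrative and is not contract language: the party names, dataset label, and use categories are hypothetical, and a real agreement would define each of them in far more detail.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DataSharingTerms:
    """Hypothetical record of who may use a project dataset, and for what."""
    dataset: str
    permitted_uses: dict = field(default_factory=dict)  # party -> set of authorized uses

    def is_authorized(self, party: str, use: str) -> bool:
        # A party not named in the arrangement has no authorized uses at all.
        return use in self.permitted_uses.get(party, frozenset())

# All names below are invented for illustration.
terms = DataSharingTerms(
    dataset="screening-output-v1",
    permitted_uses={
        "PharmaCo": frozenset({"internal research", "regulatory filing"}),
        "AI Vendor": frozenset({"internal research", "model training"}),
    },
)

print(terms.is_authorized("AI Vendor", "model training"))        # True
print(terms.is_authorized("AI Vendor", "regulatory filing"))     # False
print(terms.is_authorized("Unlisted CRO", "internal research"))  # False
```

Framing the arrangement this way makes the default explicit: any user or use that is not listed is not permitted.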

AI hosting partners should be required to securely retain and make resulting project datasets available in a format that can be accessed and/or transferred to the customer/partner’s system (or another partner’s/vendor’s system), in compliance with HIPAA, as applicable. It is also important to secure extended time and rights to access, copy, extract and/or transfer project data from cloud servers in certain circumstances and following completion of the collaboration and/or termination of the agreement.

In addition to standard representations and warranties, it is advisable to include the following representations in an AI-drug development collaboration/outsourcing agreement:

(1) representations from each contributing partner that it is the owner of its specified technology and related IP rights and has all necessary rights to use, assign, and license such technology and rights as provided in the agreement, and (2) representations that no claims and/or threats have been made alleging that the technology, related IP, or the use thereof, infringes any third party rights, and/or which, if adjudicated, would interfere with the intended use.


Remarkable advancements in science, data analytics, computational technologies, and AI machine learning algorithms are radically changing how new medicines are discovered. As AI solutions are employed in various aspects of drug discovery and development, public and private-sector researchers are collaborating in new ways to leverage each other’s strengths, navigate financial, scientific, and technological hurdles, and bring effective therapeutics to patients faster.

[1] Ou-Yang, S.; Lu, J.; Kong, X.; Liang, Z.; Luo, C.; and Jiang, H., Computational Drug Discovery, Acta Pharmacologica Sinica 33.9 (2012) (accessed Sept. 18, 2017).

[2] Modernizing Drug Discovery, Development & Approval, PhRMA, March 2016 (accessed Sept. 28, 2017).

[3] Scott Gottlieb, M.D., Speech to the Regulatory Affairs Professionals Society (RAPS), 2017 Regulatory Conference (Remarks as prepared for delivery), U.S. Food & Drug Administration (Sept. 11, 2017).

[4] Id.

[5] Trends in Drug Discovery Outsourcing; A Perspective. BioSpectrum. Vol. 12 Issue 6 (June 2017).

[6] Life Sciences Industry Outlook 2017, Deloitte® Life Sciences Industry Outlook (Jan. 2017).

[7] Partnering For Progress: How Collaborations are Fueling Biomedical Advances, 2017 (accessed Sept. 29, 2017).

[8] Can a Dose of Digital Cure Drug Industry of Its Ills? (April 26, 2017) (accessed Sept. 27, 2017).

[9] Katsila, T.; Spyroulias, GA; Patrinos, GP; and Matsoukas, M-T, Computational Approaches In Target Identification And Drug Discovery, Computational and Structural Biotechnology Journal 14 (2016): 177–184 (accessed Sept. 18, 2017).

[10] Aliper, A.; Plis, S.; Artemov, A.; Ulloa, A.; Mamoshina, P.; and Zhavoronkov, A., Deep Learning Applications for Predicting Pharmacological Properties of Drugs and Drug Repurposing Using Transcriptomic Data, Molecular Pharmaceutics 2016, 13 (7), 2524–2530.

[11] Id.

[12] Wallach, I.; Dzamba, M.; Heifets, A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery (Oct. 10, 2015).

[13] Benevolent AI (accessed Sept. 29, 2017).

[14] Machine Learning Models For Drug Discovery (April 10, 2017).

[15] Back to Biology – BERG interrogative Biology® Platform (accessed Sept. 14, 2017).

[16] Exscientia (accessed Sept. 14, 2017).

[17] See 35 U.S. Code § 116.

[18] “In the absence of any agreement to the contrary, each of the joint owners of a patent may make, use, offer to sell, or sell the patented invention within the United States, or import the patented invention into the United States, without the consent of and without accounting to the other owners.” 35 U.S. Code § 262.

[19] 17 U.S. Code § 101.