NCBI API Interface

This document describes the JavaScript code that handles our platform's interaction with the NCBI API.

Introduction

Our application fetches and processes biological data in FASTA, GenBank, and GenBank XML formats. It does so by communicating directly with NCBI's Entrez Programming Utilities (E-utilities) API.

Client-Side Power

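A key feature of our design is that data is fetched directly from the browser, with no backend server in between. We use the Fetch API to send HTTP requests to NCBI endpoints and process the responses as they arrive. This removes a server-side proxy from the architecture and saves one network hop per request. The trade-off is that the API key sent with each request is visible to anyone inspecting the page's network traffic.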

Core Functions

Our API interaction is built around three main asynchronous functions.

getInfo(reqString)

This function queries the NCBI ESearch API (nuccore database) for records matching a query. It constructs the API URL dynamically and returns the raw XML response, which contains the list of matching record IDs.

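// baseUrl and apiKey are expected to be defined elsewhere in the codebase.
async function getInfo(reqString) {
  // Encode the query so spaces, brackets, and "&" cannot break the URL.
  // This assumes callers pass the raw, unencoded query string.
  const term = encodeURIComponent(reqString);
  const urlEsearch = `${baseUrl}esearch.fcgi?db=nuccore&term=${term}&api_key=${apiKey}`;
  try {
    const response = await fetch(urlEsearch);
    if (response.ok) {
      return await response.text();
    } else {
      throw new Error(`ESearch error, status code: ${response.status}`);
    }
  } catch (error) {
    // Covers both network failures and the non-2xx error thrown above.
    throw new Error("Error fetching data from ESearch API: " + error.message);
  }
}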
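
getInfo relies on baseUrl and apiKey, neither of which is defined in this document. As a minimal sketch, assuming the public E-utilities base URL and a placeholder key, the same kind of URL can be assembled with URLSearchParams, which takes care of encoding the query term. The names EUTILS_BASE_URL, API_KEY_PLACEHOLDER, and buildEutilsUrl are illustrative and not part of our codebase.

```javascript
// Assumption: the public E-utilities base URL; the real baseUrl constant may differ.
const EUTILS_BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";
// Placeholder only; substitute a real NCBI API key.
const API_KEY_PLACEHOLDER = "YOUR_API_KEY";

// Hypothetical helper: builds an E-utilities URL with encoded query parameters.
function buildEutilsUrl(endpoint, params) {
  const url = new URL(endpoint, EUTILS_BASE_URL);
  url.search = new URLSearchParams({
    ...params,
    api_key: API_KEY_PLACEHOLDER,
  }).toString();
  return url.toString();
}

// Example query term (illustrative only).
const esearchUrl = buildEutilsUrl("esearch.fcgi", {
  db: "nuccore",
  term: "BRCA1[Gene] AND Homo sapiens[Organism]",
});
console.log(esearchUrl);
```

URLSearchParams encodes spaces as "+" and percent-encodes the square brackets, so the term reaches ESearch intact.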

parseEsearchXml(xmlData)

This function parses the XML string from getInfo to extract the biological record IDs. It uses the browser's built-in DOMParser API to convert the text into a navigable DOM object.

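async function parseEsearchXml(xmlData) {
  try {
    const parser = new DOMParser();
    const xmlDoc = parser.parseFromString(xmlData, "application/xml");
    // DOMParser does not throw on malformed XML; it returns a document
    // containing a <parsererror> element instead, so check for it explicitly.
    if (xmlDoc.getElementsByTagName("parsererror").length > 0) {
      throw new Error("response is not well-formed XML");
    }
    const idElements = xmlDoc.getElementsByTagName("Id");
    // Collect the text of every <Id> element as a string.
    return Array.from(idElements).map((el) => el.textContent);
  } catch (error) {
    throw new Error("Error parsing XML data: " + error.message);
  }
}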

esummary(idArray)

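This function takes an array of IDs and fetches their summaries from the ESummary API in a single batch request, joining the IDs into one comma-separated parameter. It returns an array of objects, one per record, each holding the record's UID plus whichever of Title, CreateDate, UpdateDate, Biomol, Length, and Organism are present in the response.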

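async function esummary(idArray) {
  // Nothing to look up; skip the request entirely.
  if (idArray.length === 0) {
    return [];
  }
  // ESummary accepts a comma-separated list of IDs in a single request.
  const idString = idArray.join(",");
  const urlEsummary = `${baseUrl}esummary.fcgi?db=nuccore&id=${idString}&version=2.0&api_key=${apiKey}`;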

  try {
    const response = await fetch(urlEsummary);
    if (!response.ok) {
      throw new Error(`ESummary error, error code: ${response.status}`);
    }

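    const xmlData = await response.text();
    const parser = new DOMParser();
    const xmlDoc = parser.parseFromString(xmlData, "application/xml");
    // DOMParser reports malformed XML via a <parsererror> element, not an exception.
    if (xmlDoc.getElementsByTagName("parsererror").length > 0) {
      throw new Error("response is not well-formed XML");
    }
    const docSummaries = xmlDoc.getElementsByTagName("DocumentSummary");
    const summaries = [];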

    Array.from(docSummaries).forEach((docSummary) => {
      const summary = { Uid: docSummary.getAttribute("uid") };
      const tags = ["Title", "CreateDate", "UpdateDate", "Biomol", "Length", "Organism"];
      tags.forEach((tag) => {
        const element = docSummary.getElementsByTagName(tag)[0];
        if (element) {
          summary[tag] = element.textContent;
        }
      });
      summaries.push(summary);
    });
    return summaries;
  } catch (error) {
    throw new Error("Error fetching data from ESummary API: " + error.message);
  }
}
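
Taken together, the three functions form a two-request pipeline: search, extract the IDs, then summarize. A minimal sketch of that chaining follows. searchAndSummarize is a hypothetical name, and the three functions are passed in as arguments so the sketch can run on its own with stand-ins (and made-up data) instead of live NCBI calls.

```javascript
// Hypothetical orchestration helper; the three steps are injected so the
// sketch does not depend on the network or on DOMParser.
async function searchAndSummarize(query, { getInfo, parseEsearchXml, esummary }) {
  const xml = await getInfo(query); // raw ESearch XML
  const ids = await parseEsearchXml(xml); // array of record IDs
  if (ids.length === 0) {
    return []; // nothing to summarize, so skip the second request
  }
  return esummary(ids); // one batch ESummary request
}

// Stand-ins that mimic the shapes described in this document (made-up data).
const stubs = {
  getInfo: async () =>
    "<eSearchResult><IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>",
  // Regex extraction is used here only to keep the stub free of DOMParser.
  parseEsearchXml: async (xml) =>
    Array.from(xml.matchAll(/<Id>(\d+)<\/Id>/g), (m) => m[1]),
  esummary: async (ids) => ids.map((id) => ({ Uid: id, Title: `Record ${id}` })),
};

searchAndSummarize("example query", stubs).then((summaries) => {
  console.log(summaries);
});
```

In the application itself, the real getInfo, parseEsearchXml, and esummary would be passed in place of the stubs.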

User Flow

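When a user enters a query, our application performs a search and displays up to 20 results. The limit of 20 comes from ESearch itself, which returns 20 IDs by default when the request does not set retmax, as is the case in getInfo. Each result is presented in a responsive card layout with download buttons for the FASTA, GenBank, and GenBank XML formats.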
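
The code behind the download buttons is not shown in this document. One plausible implementation, assuming the buttons call the EFetch endpoint, maps each format to EFetch's rettype and retmode parameters. The helper name efetchUrl, the FORMATS table, the base URL constant, and the sample ID are assumptions for illustration only.

```javascript
// Assumption: the public E-utilities base URL.
const EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

// EFetch rettype/retmode pairs for the nuccore database.
const FORMATS = {
  fasta: { rettype: "fasta", retmode: "text" },
  genbank: { rettype: "gb", retmode: "text" },
  genbankXml: { rettype: "gb", retmode: "xml" },
};

// Hypothetical helper: builds the EFetch URL for one record in one format.
function efetchUrl(id, format) {
  const spec = FORMATS[format];
  if (!spec) {
    throw new Error(`Unknown format: ${format}`);
  }
  const params = new URLSearchParams({ db: "nuccore", id: String(id), ...spec });
  return `${EUTILS_BASE}efetch.fcgi?${params}`;
}

// "1234567" is a placeholder ID, not a real record.
console.log(efetchUrl("1234567", "genbankXml"));
```

The sketch leaves out the api_key parameter for brevity; a real request would append it in the same way as the ESearch and ESummary URLs.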

Clicking a result card takes the user to our file viewer page, which shows the parsed data: the results of our intelligent feature search, the raw sequence, and the complete original GenBank file.