QCon AI New York 2025: Program Committee Announced

MMS Founder
MMS Artenisa Chatziou

Article originally posted on InfoQ. Visit InfoQ

QCon AI New York (December 16-17, 2025), from the team behind InfoQ and QCon, has announced its Program Committee to shape talks focused on practical AI implementation for senior software engineering teams. The committee comprises senior practitioners with extensive experience in scaling AI in enterprise environments.

The 2025 Program Committee includes:

  • Adi Polak – director of advocacy and developer experience engineering at Confluent, and author of a book on scaling machine learning. Adi brings expertise in data-intensive AI applications and the infrastructure challenges of production ML systems.
  • Randy Shoup – SVP of engineering at Thrive Market, with previous leadership roles at eBay, Google, and Stitch Fix. Shoup’s background in building scalable, resilient systems will inform content on enterprise-grade AI architecture.
  • Jake Mannix – technical fellow for AI & relevance at Walmart Global Tech, with experience at LinkedIn and Twitter. Mannix contributes expertise in applying AI at scale within large enterprise contexts.
  • Wes Reisz – QCon AI New York 2025 chair, technical principal at Equal Experts, and 16-time QCon chair.

QCon AI will focus on practical strategies for software engineers and teams looking to implement and scale artificial intelligence within enterprise environments.

Many teams struggle to move AI from promising proofs of concept (PoCs) to robust, value-driving production systems. QCon AI is specifically designed to help senior software engineers, architects, and team leaders navigate that transition.

The conference program will feature practitioner-led strategies and case studies from companies successfully scaling AI, with sessions covering:

  • Strategic AI integration using proven architectural patterns
  • Building production-grade enterprise AI systems for scale and resilience
  • Connecting Dev, MLOps, Platform, and Data practices to accelerate team velocity
  • Navigating compliance, security, cost, and technical debt constraints in AI projects
  • Using AI for design, validation, and strategic decision-making
  • Showing clear business impact and return on investment from AI initiatives

“QCon AI is specifically designed to help software engineers navigate these issues in AI adoption and integration, focusing on the applied use of AI in delivering software to production,” Reisz explained. “The goal is to help teams not only embrace these changes but ship better software.”

Early bird registration is available. Book your seat now.



GitLab 17.11 Enhances DevSecOps with Custom Compliance Frameworks and Expanded Controls

MMS Founder
MMS Craig Risi

Article originally posted on InfoQ. Visit InfoQ

On April 17, 2025, GitLab released version 17.11, introducing significant advancements in compliance management and DevSecOps integration. A standout feature of this release is the introduction of Custom Compliance Frameworks, designed to embed regulatory compliance directly into the software development lifecycle.

These frameworks allow organizations to define, implement, and enforce compliance standards within their GitLab environment. With over 50 out-of-the-box controls, teams can tailor frameworks to meet specific regulatory requirements such as HIPAA, GDPR, and SOC 2. These controls cover areas like separation of duties, security scanning, authentication protocols, and application configurations.

To create a custom compliance framework, as detailed in GitLab’s own post, users identify applicable regulations and map them to specific controls. Within GitLab’s Compliance Center, they can define new frameworks, add requirements, and select relevant controls. Once established, these frameworks can be applied to projects, ensuring consistent compliance across the organization.
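For teams that prefer to script this setup rather than click through the Compliance Center, the sketch below shows roughly what creating a framework through GitLab’s GraphQL API might look like. It is a minimal illustration, not taken from GitLab’s announcement: the mutation and input field names follow GitLab’s GraphQL conventions but should be verified against the 17.11 schema, and the instance URL, group path, and token are placeholders.

```python
# Illustrative sketch: creating a custom compliance framework via GitLab's
# GraphQL API. The mutation and field names follow GitLab's documented GraphQL
# conventions but should be verified against your instance's schema.
# GITLAB_URL, GROUP_PATH, and the GITLAB_TOKEN env var are placeholders.
import os
import requests

GITLAB_URL = "https://gitlab.example.com/api/graphql"   # placeholder instance
GROUP_PATH = "my-group"                                  # placeholder group path

mutation = """
mutation($input: CreateComplianceFrameworkInput!) {
  createComplianceFramework(input: $input) {
    framework { id name }
    errors
  }
}
"""

variables = {
    "input": {
        "namespacePath": GROUP_PATH,
        "params": {
            "name": "HIPAA Baseline",
            "description": "Controls mapped to HIPAA technical safeguards",
            "color": "#1aaa55",
        },
    }
}

resp = requests.post(
    GITLAB_URL,
    json={"query": mutation, "variables": variables},
    headers={"Authorization": f"Bearer {os.environ['GITLAB_TOKEN']}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```

Once the framework exists, it can be applied to projects from the Compliance Center as described above.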

Integrating compliance directly into the development workflow offers several key advantages. By automating compliance checks, teams can significantly reduce the manual effort typically required for tracking and documentation. This streamlining not only saves time but also ensures greater accuracy and consistency. Real-time monitoring of compliance status accelerates audit readiness, allowing organizations to respond quickly and efficiently to regulatory requirements. Furthermore, embedding compliance controls into every stage of development enhances the overall security posture, ensuring that security and regulatory standards are continuously enforced throughout the software delivery lifecycle.

Ian Khor, a product manager at GitLab, highlighted the significance of the release, stating:

Big milestone moment – Custom Compliance Frameworks is now officially released in GitLab 17.11! This feature has been a long time coming, and I’m incredibly proud of the team that brought it to life.

Khor emphasized the collaborative effort across product, engineering, UX, and security teams to ensure that organizations can define, manage, and monitor compliance requirements effectively within GitLab.

Joel Krooswyk, CTO at GitLab, also expressed enthusiasm about the new features in GitLab 17.11, particularly the compliance frameworks.  

Psst – hey – did you hear? GitLab 17.11 dropped today, and there are 3 huge things I’m excited to share. 1. Compliance frameworks. 50 of them, ready to pull into your projects.

In addition to compliance enhancements, GitLab 17.11 introduces over 60 improvements, including more AI features on GitLab Duo Self-Hosted, custom epic, issue, and task fields, CI/CD pipeline inputs, and a new service accounts UI. These updates aim to streamline development workflows and enhance overall productivity.



MongoDB Just Beat Expectations: Why Am I Still Not Buying More? – TradingView

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

MongoDB, Inc. MDB popped following its Q1 FY26 earnings release, with shares surging more than 12% post-earnings. The beat-and-raise quarter came alongside stronger profitability and a bold $1 billion share buyback announcement, reigniting investor optimism in a stock that had recently struggled with macro and execution concerns.

This article will cover MongoDB’s business model, its competitive advantages, total addressable market (TAM), and the latest Q1 results to assess whether the recent performance supports or challenges my investment thesis.

What is MongoDB?

MongoDB is a leading provider of NoSQL database technology, offering a flexible, JSON-like document model instead of rigid relational tables. Its core products include MongoDB Atlas, a fully managed cloud database service, and MongoDB Enterprise Advanced, a self-managed on-premise solution for enterprises.

The company’s mission is to simplify and accelerate application development. By using MongoDB, developers can store and query diverse data types with ease, which speeds up project timelines and adapts to changing requirements better than traditional SQL databases. MongoDB’s platform has grown into a full-fledged developer data platform that includes capabilities like full-text search, analytics, mobile data sync, and now vector search for AI applications. As of the latest quarter, over 57,100 customers utilise MongoDB’s technology, reflecting its widespread adoption across various industries. In essence, MongoDB provides the plumbing behind modern applications, from web and mobile apps to IoT and AI systems, and its products aim to be the default database infrastructure for new software projects.

Competitive Advantage and TAM

[Image: Source: Ardentisys]

MongoDB’s approach gives developers far more flexibility than traditional SQL databases. While traditional SQL databases follow a rigid model, where data is stored in structured tables with fixed schemas, MongoDB’s NoSQL model stores data in flexible, JSON-like documents. This difference defines how quickly teams can iterate, adapt, and scale.

In a SQL environment, changing the schema often requires complex migrations. MongoDB, on the other hand, allows for dynamic fields, nested arrays, and schema evolution with minimal disruption.
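As a concrete illustration of that flexibility, the short sketch below uses the official PyMongo driver to store two differently shaped documents in one collection and query a nested field. The connection string and field names are illustrative only and are not tied to any particular deployment.

```python
# Minimal sketch of MongoDB's flexible document model using PyMongo.
# The connection string and field names are illustrative only.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
products = client["shop"]["products"]

# Two documents in the same collection with different shapes: no migration
# is needed to add nested documents, arrays, or new fields.
products.insert_one({"name": "T-shirt", "price": 19.99, "sizes": ["S", "M", "L"]})
products.insert_one({
    "name": "Laptop",
    "price": 999.00,
    "specs": {"cpu": "8-core", "ram_gb": 16},   # nested document
    "reviews": [{"user": "ana", "stars": 5}],   # nested array
})

# Query on a nested field with dot notation; documents lacking the field
# simply do not match, so schema evolution is non-breaking.
for doc in products.find({"specs.ram_gb": {"$gte": 16}}, {"_id": 0, "name": 1}):
    print(doc)
```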

Scalability is another key distinction. Relational databases typically scale vertically, requiring more powerful hardware to handle growing workloads. MongoDB was designed to scale horizontally, distributing data across multiple nodes to support large-scale, real-time applications in cloud environments.

These architectural differences make MongoDB more aligned with modern development needs, particularly in cases where speed, agility, and scalability are critical. As a result, NoSQL databases like MongoDB have become the preferred choice for a growing number of use cases, from web and mobile apps to AI and IoT platforms.

With that distinction in mind, MongoDB’s competitive advantage stems from its modern architecture and developer-centric approach in a massive market. The database management system market is estimated to be over $85 billion in size, yet much of it still relies on decades-old relational database technology. The global NoSQL market, at approximately $10 billion in 2024, is still small by comparison but is expected to grow at a CAGR of 29.5% between 2025 and 2034.
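For a rough sense of what that projection implies, compounding the stated base and growth rate gives the following back-of-the-envelope figure:

```python
# Back-of-the-envelope check on the market projection cited above:
# roughly $10B in 2024, compounding at a 29.5% CAGR from 2025 through 2034.
base_2024_bn = 10.0
cagr = 0.295
implied_2034_bn = base_2024_bn * (1 + cagr) ** 10  # ten compounding years
print(f"Implied 2034 NoSQL market size: ~${implied_2034_bn:.0f}B")  # roughly $130B
```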

MongoDB’s document model natively handles both structured and unstructured data and maps more naturally to how developers think in code. This flexibility allows companies to represent the messiness of real-world data and evolve their schemas without costly migrations. Applications built on MongoDB can iterate faster and scale more easily because the database does not require rigid schemas or complex join operations. According to management, this fundamental architectural advantage translates to faster time-to-market, greater agility, and the ability to scale without re-architecting, which is why customers increasingly entrust MongoDB with mission-critical workloads.

Another pillar of MongoDB’s moat is its developer mindshare and ecosystem. The company has invested heavily in making its platform accessible, from an open-source foundation to a free-tier Atlas offering and a wide array of developer tools and integrations. This approach creates a self-reinforcing dynamic: the more developers adopt MongoDB, the richer its ecosystem becomes, through community-driven documentation, integrations, and support, which in turn attracts even more developers. Over time, these network effects deepen the platform’s defensibility.

Failing to attract developers can seal a platform’s fate. A classic example is Windows Phone, which never convinced developers to build for it. As Ben Thompson put it, “The number one reason Windows Phone failed is because it was a distant third in a winner-take-all market; this meant it had no users, which meant it had no developers, which meant it had no apps, which meant it had no users. This was the same chicken-and-egg problem that every potential smartphone competitor has faced since, and a key reason why there are still only two viable platforms.”

Each new generation of startups and IT projects choosing MongoDB adds to a virtuous cycle: those applications grow, require bigger paid deployments, and demonstrate MongoDB’s reliability at scale, attracting even more adoption. This bottom-up adoption complements MongoDB’s direct sales focus on enterprises, enabling it to grab market share in a large, under-penetrated market. Management frequently notes that MongoDB still has a relatively small fraction of the overall database market, leaving ample room for growth as organizations modernize their data infrastructure.

Crucially, MongoDB’s advantage is being reinforced as industry trends shift towards the company’s strengths. The rise of cloud-native computing, microservices, and AI-driven applications all favor flexible, distributed data stores. MongoDB’s platform was built for cloud, distributed, real-time, and AI-era applications, whereas many competitors are now scrambling to bolt on similar capabilities. In fact, some legacy database vendors have started retrofitting features like JSON document support or vector search onto their products as afterthoughts, which MongoDB’s CEO characterizes as a passive admission that MongoDB’s approach is superior.

New Technology

In keeping with its focus on staying at the forefront of modern application development, MongoDB has aggressively embraced the AI wave. A key development was the acquisition of Voyage AI, an AI startup specializing in embedding generation and re-ranking models for search. Announced in early 2025, the Voyage AI deal (approximately a $200+ million purchase) was aimed at redefining the database for the AI era by baking advanced AI capabilities directly into MongoDB’s platform. By integrating Voyage’s state-of-the-art embedding and re-ranking technology, MongoDB enables its customers to feed more precise and relevant context into AI models, significantly improving the accuracy and trustworthiness of AI-driven applications. In practical terms, this means a company using MongoDB can now do things like generate vectors (embeddings) from its application data, perform semantic searches, and retrieve context for an AI model’s queries, all within MongoDB itself. Developers no longer need a separate specialized vector database or search system.
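To make that concrete, the sketch below shows what a semantic lookup inside MongoDB can look like using the $vectorSearch aggregation stage. It assumes an Atlas cluster with a vector index named vector_index on an embedding field, and the embed() helper is a stand-in for whatever embedding model (for example, a Voyage model) produces the vectors; none of this is taken from MongoDB’s announcements.

```python
# Minimal sketch of semantic retrieval inside MongoDB using the $vectorSearch
# aggregation stage (Atlas Vector Search). Assumes an Atlas cluster with a
# vector index named "vector_index" on the "embedding" field; embed() stands
# in for whatever embedding model produces the query vector.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")  # placeholder

docs = client["kb"]["articles"]

def embed(text: str) -> list[float]:
    """Hypothetical helper: call your embedding model here."""
    return [0.0] * 1024  # replace with a real model call; dimension must match the index

query_vector = embed("How do I rotate API keys?")

results = docs.aggregate([
    {
        "$vectorSearch": {
            "index": "vector_index",      # name of the Atlas vector index
            "path": "embedding",          # field holding the stored vectors
            "queryVector": query_vector,
            "numCandidates": 200,         # candidates considered before ranking
            "limit": 5,                   # top matches returned
        }
    },
    {"$project": {"_id": 0, "title": 1, "score": {"$meta": "vectorSearchScore"}}},
])
for doc in results:
    print(doc)
```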

MongoDB is already showing progress from this integration. The company released Voyage 3.5, an updated set of AI models, which reportedly outperform other leading embedding models while reducing storage requirements by over 80%. This is a significant improvement in efficiency and accuracy, making AI features more cost-effective at scale for MongoDB users. It also helps solve the AI hallucination problem by grounding LLMs in a trusted database, thereby increasing output accuracy.

Beyond Voyage, MongoDB launched broader AI initiatives such as the MongoDB AI Innovators Program (in partnership with major cloud providers and AI firms) to help customers design and deploy AI-powered applications. Early pilot programs using MongoDB’s AI features have yielded promising results, dramatically cutting the time and cost needed to modernize legacy applications with AI assistance.

Financial Results (Q1 FY2026)

MongoDB’s Q1 FY2026 delivered strong results above expectations, restoring the company’s momentum. Revenue for Q1 came in at $549.0 million, a 22% increase year-over-year (YoY), comfortably ahead of Wall Street’s $528 million consensus estimate. Atlas revenue grew 26% YoY and made up 72% of total revenue in Q1, reflecting strong usage trends. While other segments like Enterprise Advanced and services also posted growth, Atlas remains the primary driver of MongoDB’s momentum. The company added approximately 2,600 net new customers in the quarter, bringing the total customer count to over 57,100. This was the highest quarterly addition in six years, suggesting MongoDB’s strategy of focusing on higher-value clients and strong self-service adoption is paying off. In the words of CEO Dev Ittycheria, “We got off to a strong start in fiscal 2026 as MongoDB executed well against its large opportunity.”

MongoDB has shown significant improvements in profitability and efficiency, although it’s still unprofitable on a GAAP basis. The company recorded non-GAAP operating income of $87.4 million in Q1, a 16% non-GAAP operating margin, up from 7% a year ago. Operating expenses grew more slowly than planned, particularly due to more measured hiring, which contributed to the margin outperformance. Non-GAAP net income was $86.3 million, or $1.00 per diluted share, double the non-GAAP EPS of the prior-year period.

On a GAAP basis, MongoDB reported a net loss of $37.6 million ($0.46 per share) for the quarter, an improvement from the $80.6 million loss ($1.10 per share) a year earlier. The GAAP loss was much narrower than expected, with analysts forecasting a loss of around $0.85 per share. Gross margins remain healthy but are shrinking: Q1 gross margin was 71.2%, down from 72.8% a year earlier due to the revenue mix.

Despite MongoDB’s strong top-line performance, stock-based compensation (SBC) remains elevated, consuming 24% of total revenue in Q1. For a company whose revenue growth is decelerating toward the low double digits and that is still unprofitable on a GAAP basis, this level of dilution is concerning.

MongoDB’s cash flow generation and balance sheet also underscore its improving efficiency. Operating cash flow in Q1 rose to approximately $110 million, up from $64 million a year ago, while free cash flow nearly doubled to $106 million. The improvement was driven by higher operating profits and solid collections, leading MongoDB to end the quarter with $2.5 billion in cash and short-term investments and no debt. In fact, boosted by the quarter’s results, MongoDB’s Board of Directors authorized an additional $800 million in buybacks, on top of the $200 million authorized last quarter, bringing the total program to $1.0 billion. This is a strong vote of confidence by management in the company’s future. It’s also a shareholder-friendly move to offset dilution from stock-based compensation and the Voyage deal. Due to a blackout period linked to the CFO transition, no shares were repurchased in Q1, but the company indicated buybacks would begin shortly. To put it in perspective, this share buyback is roughly 5% of MongoDB’s market capitalization.

Guidance

Looking ahead, MongoDB management struck an optimistic tone and raised their outlook for the full fiscal year. Citing a strong start to the year, the company increased its FY2026 revenue guidance by $10 million to a range of $2.25 billion to $2.29 billion. This implies roughly 13% YoY growth at the midpoint, and it incorporates some conservatism for potential macro headwinds in the second half (including an expected $50 million headwind from lower multi-year license revenue in FY26). Management also boosted its profitability outlook, raising the full-year non-GAAP operating income guidance by 200 basis points in margin. The updated guidance calls for FY26 non-GAAP operating income of $267 million to $287 million and non-GAAP EPS of $2.94 to $3.12. Previously, the company had expected $210 million to $230 million in non-GAAP operating income (EPS of $2.44 to $2.62) for the year, so this upward revision is substantial. MongoDB appointed Mike Berry as its new Chief Financial Officer in late May. Berry, a seasoned executive with over 30 years of experience and prior CFO roles at NetApp, McAfee, and FireEye, replaces Michael Gordon, who stepped down earlier this year after nearly a decade with the company. Berry’s track record in scaling enterprise software businesses and driving operational discipline aligns well with MongoDB’s current phase of improving margins and shareholder returns.

Valuation

MongoDB’s stock price has rallied on the back of its strong Q1 report, reflecting renewed investor enthusiasm. Even after this jump, the stock still looks undervalued on some multiples compared to its peers.

[Image: Source: Author]

Valuation multiples paint a mixed picture. MongoDB trades at the lowest price-to-sales and price-to-gross-profit ratios among its software peers, which could reflect its strong gross margins. However, it may also signal growing investor skepticism about the company’s long-term growth trajectory and ability to convert usage into durable profitability.

Meanwhile, MongoDB’s price-to-earnings-growth ratio is the highest in the group, primarily because its revenue growth is expected to decelerate into the low double digits. In contrast, peers like Snowflake SNOW and Datadog DDOG continue to command premium valuations, backed by faster top-line expansion and stronger free cash flow margins.

Another way to assess the opportunity is through a discounted cash flow analysis that blends multiple-based and perpetuity growth assumptions. I estimate a fair value of around $210 per share, suggesting the stock is fairly valued after the post-earnings rally.
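The article does not disclose the underlying model, but a stripped-down blended DCF of the kind described looks like the sketch below. Every input is an illustrative placeholder rather than the author’s assumption, so the output should not be read as reproducing the $210 estimate.

```python
# Stripped-down DCF sketch blending a perpetuity-growth and an exit-multiple
# terminal value. Every input below is an illustrative placeholder, not the
# author's actual model.
fcf = 420.0          # next-twelve-months free cash flow, $M (placeholder)
growth = 0.20        # near-term FCF growth rate (placeholder)
years = 5            # explicit forecast period
discount = 0.10      # discount rate (placeholder)
terminal_g = 0.03    # perpetuity growth rate (placeholder)
exit_multiple = 25   # terminal EV/FCF multiple (placeholder)
shares = 82.0        # diluted shares outstanding, millions (placeholder)
net_cash = 2500.0    # cash minus debt, $M (from the balance-sheet figures above)

# Project and discount the explicit-period free cash flows.
pv_fcf = 0.0
f = fcf
for t in range(1, years + 1):
    f *= (1 + growth)
    pv_fcf += f / (1 + discount) ** t

# Blend two terminal values: Gordon-growth perpetuity and an exit multiple.
tv_perpetuity = f * (1 + terminal_g) / (discount - terminal_g)
tv_multiple = f * exit_multiple
terminal = 0.5 * tv_perpetuity + 0.5 * tv_multiple
pv_terminal = terminal / (1 + discount) ** years

equity_value = pv_fcf + pv_terminal + net_cash      # $M
print(f"Implied value per share: ${equity_value / shares:,.0f}")
```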

[Image: Source: Author]

Finally, in the first calendar quarter, MongoDB saw roughly the same number of guru sellers as buyers. Baillie Gifford (Trades, Portfolio), Jefferies Group (Trades, Portfolio), and Paul Tudor Jones (Trades, Portfolio) reduced their positions in the stock, with some trimming up to 98%. On the other hand, Lee Ainslie (Trades, Portfolio) created a new position, while Steven Cohen (Trades, Portfolio) and PRIMECAP Management (Trades, Portfolio) added to their positions significantly.

Risks

MongoDB’s long-term opportunity remains compelling, but several risks warrant attention. The most immediate is the sensitivity of Atlas revenue to macroeconomic conditions. Because Atlas follows a usage-based pricing model, any slowdown in customer consumption, whether from tighter IT budgets or reduced application traffic, can quickly translate into revenue deceleration. In fact, management acknowledged some softness in April before usage rebounded in May, prompting them to maintain a cautious full-year outlook.

A second area of concern lies in the company’s non-Atlas license revenue, which is expected to decline at a high single-digit rate this year. This includes a roughly $50 million headwind from multiyear license renewals that took place in the prior year. As customers continue shifting to cloud-based solutions, these traditional license revenues may remain volatile and difficult to predict, creating a drag on MongoDB’s overall subscription growth in the short term.

Lastly, competition from major cloud providers and open-source alternatives remains persistent. AWS DocumentDB, Google Firestore, and Postgres-based document stores represent credible threats. These platforms are often bundled with broader cloud services or offered at lower price points, creating pricing pressure. MongoDB’s advantage lies in its developer-friendly architecture and integrated tooling, but maintaining that lead will require ongoing innovation and execution.

Final Take

MongoDB delivered a strong quarter, regaining some favour with Wall Street following two disappointing earnings periods. The company added its highest number of net new customers in six years, demonstrating continued developer interest and adoption. However, management is forecasting a revenue slowdown, with low double-digit growth expected. SBC also remains elevated, a concern given MongoDB’s ongoing GAAP unprofitability. I want to see continued progress on profitability, margin expansion, and more disciplined equity compensation practices, particularly as the company matures. This quarter was a solid step in the right direction, but I need to see more. I’ll hold my position for another quarter or two before deciding whether MongoDB can sustainably deliver on its potential.



AWS Introduces Open Source Model Context Protocol Servers for ECS, EKS, and Serverless

MMS Founder
MMS Steef-Jan Wiggers

Article originally posted on InfoQ. Visit InfoQ

AWS has released a set of open-source Model Context Protocol (MCP) servers on GitHub for Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and AWS Serverless. These are specialized servers that enhance the capabilities of AI development assistants, such as Amazon Q Developer, by providing them with real-time, contextual information specific to these AWS services.

While Large Language Models (LLMs) within AI assistants typically rely on general public documentation, these MCP servers offer current context and service-specific guidance. Hence, developers can receive more accurate assistance and proactively prevent common deployment errors when building and deploying applications on AWS.

Hariharan Eswaran concluded in a Medium blog post:

The launch of MCP servers is about empowering developers with tools that keep up with the complexity of modern cloud-native apps. Whether you’re deploying containers, managing Kubernetes, or going serverless, MCP servers let your AI assistant manage infrastructure like a team member — not just a chatbot.

Furthermore, according to the company, leveraging these open-source solutions allows developers to accelerate their application development process by utilizing up-to-date knowledge of AWS capabilities and configurations directly within their integrated development environment (IDE) or command-line interface (CLI). The key features and benefits include:

  • Amazon ECS MCP Server: Simplifies containerized application deployment to Amazon ECS by configuring necessary AWS resources like load balancers, networking, auto-scaling, and task definitions using natural language. It also aids in cluster operations and real-time troubleshooting.
  • Amazon EKS MCP Server: Provides AI assistants with up-to-date, contextual information about specific EKS environments, including the latest features, knowledge base, and cluster state. This enables more tailored guidance throughout the Kubernetes application lifecycle.
  • AWS Serverless MCP Server: Enhances the serverless development experience by offering comprehensive knowledge of serverless patterns, best practices, and AWS services. Integration with the AWS Serverless Application Model Command Line Interface (AWS SAM CLI) streamlines function lifecycles and infrastructure deployment. It also provides contextual guidance for Infrastructure as Code decisions and best practices for AWS Lambda.

The announcement details practical examples of using the MCP servers with Amazon Q CLI to build and deploy applications for media analysis (serverless and containerized on ECS) and a web application on EKS, all through natural language commands. The examples showcase the AI assistant’s ability to identify necessary tools, generate configurations, troubleshoot errors, and even review code based on the contextual information provided by the MCP servers.
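For readers who want to see roughly what wiring these servers into an MCP-aware client involves, the sketch below writes a client configuration file in the common MCP format. The config path, package names, and uvx launcher are assumptions based on typical MCP setups rather than details from the announcement, so verify them against the AWS Labs repository’s installation guides.

```python
# Hedged sketch: writing an MCP client configuration that registers the AWS
# MCP servers for an MCP-aware assistant (e.g. Amazon Q CLI). The config path,
# package names, and "uvx" launcher below are assumptions taken from common
# MCP setups -- verify them against the AWS Labs installation guides.
import json
from pathlib import Path

config = {
    "mcpServers": {
        "awslabs.ecs-mcp-server": {
            "command": "uvx",
            "args": ["awslabs.ecs-mcp-server@latest"],
            "env": {"AWS_REGION": "us-east-1"},
        },
        "awslabs.eks-mcp-server": {
            "command": "uvx",
            "args": ["awslabs.eks-mcp-server@latest"],
        },
        "awslabs.aws-serverless-mcp-server": {
            "command": "uvx",
            "args": ["awslabs.aws-serverless-mcp-server@latest"],
        },
    }
}

config_path = Path.home() / ".aws" / "amazonq" / "mcp.json"  # assumed location
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(config, indent=2))
print(f"Wrote MCP server config to {config_path}")
```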

The announcement has already garnered positive attention from the developer community. Maniganda, commenting on a LinkedIn post, expressed enthusiasm:

The ability for AI to interact with AWS compute services in real-time will undoubtedly streamline operations and enhance efficiency. I’m looking forward to seeing how the open-source framework evolves and the impact it will have on Kubernetes management.

Users can get started by visiting the AWS Labs GitHub repository for installation guides and configurations. The repository also includes MCP servers for transforming existing AWS Lambda functions into AI-accessible tools and for accessing Amazon Bedrock Knowledge Bases. Deep-dive blogs are available for those wanting to learn more about the individual MCP servers for AWS Serverless, Amazon ECS, and Amazon EKS.



Podcast: Using AI Code Generation to Migrate 20000 Tests

MMS Founder
MMS Sergii Gorbachov

Article originally posted on InfoQ. Visit InfoQ

Transcript

Shane Hastie: Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture Podcast. Today I’m sitting down with Sergii Gorbachov. Sergii, welcome. Thank you for taking the time to talk to us today.

Sergii Gorbachov: Thank you. Thank you for inviting me.

Introductions [01:02]

Shane Hastie: We met because you recently gave a really interesting talk at QCon San Francisco. Do you want to just give us the high-level picture of that talk?

Sergii Gorbachov: Sounds good. The project that I presented was around code migration. It was a migration project that took about 10 months.

We moved from Enzyme to React Testing Library. Those are the libraries that help you create tests to test React.

The interesting piece there was that I combined a traditional approach, using an AST, or Abstract Syntax Tree, with an LLM. By combining the traditional and new approaches, I was able to save a lot of engineering hours and finish this project a lot faster than initially planned.

Shane Hastie: Real-world, hands-on implementation of the large language models in production code bases. Before we go any further, who is Sergii?

Sergii Gorbachov: Sure, yes, I can talk more about myself. My name is Sergii Gorbachov, and I am a staff engineer at Slack.

I’m part of the Developer Experience Organization, and I’m a member of the front-end test frameworks team, so I deal with anything front-end testing related.

Shane Hastie: What got you interested in using the LLMs?

What Got You Into Using AI/LLMs? [02:30]

Sergii Gorbachov: Well, first of all, of course hype. You go on LinkedIn, or you go to any talks or conferences, you see that AI is very prominent. I think, everyone should probably learn about new technologies, so that was the initial push for me.

Also, before Slack, I worked at a fintech company, it was called Finn AI, where I built a testing framework for their chatbot.

I was already acquainted with some of the conversational systems that use artificial intelligence, not large language models, but regular more typical models.

Then of course, there’s the reality of working as a front-end engineer or software development engineer in test, working with front-end technologies; that changes quite often.

I think JavaScript is notorious for all of these changes, and the libraries change so drastically that there are so many breaking changes that you need to, in our case, rewrite 20,000 tests.

Doing it manually would take too long. We calculated it would be about 10,000 to 15,000 engineering hours if we went with manual conversion using developer time.

At that time, Anthropic came out with one of their LLMs, large language models, and one of the use cases people were talking about was code generation and conversion, so that’s why we decided to try it.

Of course, at that time they were not very popular. There were not too many use cases that were successful, so we had to just try it out.

To be honest, we were desperate, because it was that much work, and we had to do it, so we had to help ourselves.

Shane Hastie: What were the big learnings?

Key Learnings from AI Implementation [04:18]

Sergii Gorbachov: I’d say, one of the biggest learnings is that AI by itself was not a very successful tool.

I saw that it was an over-hyped technology. We used AI by itself to convert code A to code B, in our case from one framework to another framework in the same language, so the scope was large but not wildly large.

It was still not performing well, and we had to control the flow. We had to collect all the context ourselves, and we also had to use the conventional, traditional approaches together with AI.

I think that’s the biggest learning: the traditional approaches that we have been using for decades are still relevant, and AI does not displace them completely, it only complements them. It’s just another tool.

It’s useful, but it cannot completely replace what we’ve been doing before.
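A very rough sketch of the convert-and-validate loop described here might look like the following; every helper in it is hypothetical rather than Slack’s actual tooling.

```python
# Rough sketch of a convert-and-validate migration loop of the kind described
# above. Every helper is hypothetical: extract_context() stands in for the
# AST-based analysis, call_llm() for an LLM API call, and run_jest() for
# executing the converted test.
import subprocess
from pathlib import Path

def extract_context(source: str) -> str:
    """Hypothetical AST step: pull imports, component under test, selectors."""
    return source  # placeholder: pass the raw file through

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call returning the converted test file."""
    raise NotImplementedError("wire up your model provider here")

def run_jest(path: Path) -> bool:
    """Run the converted test file and report whether it passes."""
    return subprocess.run(["npx", "jest", str(path)], capture_output=True).returncode == 0

def convert(test_file: Path, attempts: int = 3) -> bool:
    source = test_file.read_text()
    context = extract_context(source)
    feedback = ""
    for _ in range(attempts):
        prompt = (
            "Convert this Enzyme test to React Testing Library. "
            "Keep the same assertions.\n" + context + feedback
        )
        converted = call_llm(prompt)
        test_file.write_text(converted)
        if run_jest(test_file):
            return True
        feedback = "\nThe previous attempt failed; fix the failing assertions."
    return False  # fall back to manual conversion
```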

Shane Hastie: What does it change? Particularly in how people interact and how our roles as developers change?

How Developer Roles Are Changing [05:29]

Sergii Gorbachov: Sure. In this specific example for Enzyme to RTL conversion, I’d say, the biggest part of AI was the generation part and large language models. Those are generative models, and that’s where we saw that there were no other tools available in the industry.

Usually that role is done by actual developers, where you, for example, convert something, you use those tools manually, and then you have to write the code.

That piece is now more automated, and in our case, for our project, it was done by the LLMs. The developers’ role shifted more to reviewing, fixing, and verifying, so it was more the validation part that was done by the developers.

Shane Hastie: One of the consistent things that I’m hearing in the industry, and in almost any role adopting generative AI, is that shift from author to editor.

That is a mindset shift. If you’re used to being the author, what does it take to become the editor, to be that reviewer rather than the creator?

Sergii Gorbachov: Well, that’s more of an extreme use case. Maybe it exists in real life, but I would say, realistically, you are a co-author.

Yes, you’re a co-author with the AI system, and you still have to write code, but you definitely spend a lot more time reviewing the code and guiding the system rather than writing everything yourself.

In that respect, I personally don’t miss that part of my development experience: writing some of the scaffolding or code that is very straightforward, that I could just copy and paste from other sources.

Now, I can just ask the LLM to do that easy bit for me and produce that code, and write the more complex code myself, where more thinking is necessary.

Yes, definitely the role of a developer, at least in my experience, has shifted to more of a co-author and a person who drives this process.

To a certain extent, it’s still also very empowering, because I’m in control of everything, but I’m not the person who does all the work.

Shane Hastie: We’ve been abstracting you further and further away from the underlying bare metal, so to speak. Is this just another abstraction layer, or is there something different?

AI as the Next Abstraction Layer [08:15]

Sergii Gorbachov: I think it’s definitely another abstraction layer on top of our regular work, or another layer, how we can interact with various coding languages or systems, but it could be the final one.

What is going above just natural language? Because that’s how we interact with the models. You use natural language to build tools, so the next step is just implanting it in our brains, and then controlling them with our minds, which I don’t think will happen.

The difference I think, with previous technologies that would change how developers work, is that this is the final level and we are able to use tools that we use in everyday life, like language or natural language, I guess.

The key is that it’s so easy, and it may be democratizing the process of writing software, because you don’t really need to know some of those hardcore algorithms, or maybe sometimes even learn how a specific programming language works.

Rather, you need to understand the concepts, the systems, as something more abstract, and I guess intellectually challenging.

I see how many people changed in terms of how they write code, let’s say, managers, who typically do not write code, but they possess all of this very interesting information.

For example, what qualities of a system are important, so they can codify that knowledge and then an AI system will just code it for them.

Of course, there are some limitations of what an AI system can do, but the key here is that it enables some people, especially those who possess some of the knowledge, but they don’t know the programming language.

Shane Hastie: Thinking of developer education, what would you say to a junior developer today? What should they learn?

Advice for Junior Developers [10:16]

Sergii Gorbachov: My background is in social sciences and humanities, so maybe this whole shift fits me very well, because I can operate at a higher level where you take a system and break it down. In the humanities, you always look at very non-deterministic systems, let’s say languages or things that are more related to humans, and it’s very hard to pinpoint what’s going to happen next, or how the system behaves.

I would suggest focusing more on system analysis or understanding, I guess, some of the … Or taking maybe more humanities classes that help you analyze the very complex things that we deal with every day, like talking to other people or relationships, or psychology courses.

Then, that would give the ability, or this apparatus, to handle something so non-deterministic, like dealing with an LLM, and being able to create the right prompts that would generate good code.

Shane Hastie: If we think of the typical CI pipeline series of steps, how does that change? What do we hand over to the tools, and what do we still control?

Impact on CI/CD and Development Workflows [11:47]

Sergii Gorbachov: I think, we’re still controlling the final product, and we still I think, have the ownership of the code that we produce, regardless of what tools we use, AI or no AI tools.

We still have to be responsible. We cannot just generate a lot of code, and today generation is the easiest part. The most difficult part is that you have generated so much code that you don’t know what it does, you don’t know how to validate it.

Long-term, it could be problematic, because there is a lot of duplication, some of the abstractions are not created. Long-term, I think we should still think about code quality, and control what we generate and what code is produced.

As for the whole experience, human experience, does it change? With CI systems in particular, they have served very well for us to automate certain tasks.

Let’s say you create a PR, you write your code, and all of those tests and linters run there, and one of those, let’s say, a linter on steroids, could be an AI system that would just check our code.

It would probably not change too much of our work today, but it will be able to provide us extra feedback that we should act on. That’s I think, the reality right now.

Long-term, for example, another initiative that I’ve been working on, is test generation, and test generation is a part that no other tools can do.

Let’s say, a developer or you or me, create a piece of code and then you do not cover it with tests.

In that case, you can hook up an AI system that would generate the tests, suggest it for you, and maybe create a background job that you would be able to come back later, in one day or two days, and fix those tests or add them.

Especially for features that are not production facing, let’s say a prototype, it changes how we might be doing our job, especially for testing, where it could switch how we view some of the tasks, if they can be outsourced to an AI system, and usually an AI system takes longer than 10 to 20 minutes to produce some artifact.

Then, we would just change our way of working. Rather than doing everything right away, we would just create the bare bones and then ask all of those systems to do something for us in the cloud, in CI systems, and come back the next day or in two days and continue, or just validate and verify that what has been produced is of good quality or not.

Shane Hastie: One of the things that we found in the engineering culture trends discussion recently, was pull requests seem to be getting bigger, more code. That’s antithetical to the advice that’s been the core for the last, well, certainly since DevOps was a thing. How is that impacting us?

Challenges with Larger Pull Requests [15:19]

Sergii Gorbachov: I guess, one of the metrics that I sometimes look at is PR throughput, and definitely, if for example, your PR has a lot more code, then it would take a longer time for other people to review it, or for a developer to add the code for that feature to be working.

AI systems definitely make it more possible for you to generate and create more code.

I’m not sure exactly what the future of this is, but the idea that everyone had was, for example, that AI systems would increase PR throughput and we would merge a lot more code.

I mean, creating maybe more PRs is not the reality, because there is just too much code, and we are maybe using the old metrics here; there is a mismatch between what has been before, versus the speed that AI allows us to produce that code with.

There is more code, but I think that maybe the final metrics should be slightly different, rather than what has been before, or what has been popular in DevOps such as PR size or PR throughput.

Shane Hastie: Again, a fairly significant shift. What are some of the important metrics that you look at?

Important Metrics and Tool Selection [16:51]

Sergii Gorbachov: I work at a more local level, and I come from a testing background, and I deal with front-end test frameworks.

Some of the important metrics for me specifically are how long developers wait for our tests to run, so they can go to the next step, or before they get feedback.

Things like test suite runtime, test file or test case runtime, and those are our primary metrics for developers to be happy with … Creating the PR and getting some feedback.

I think that’s the main one for us. Wouldn’t look for … There’s probably a lot more other metrics, that people at my company are looking at, but for our team, for our level, that’s the main one.

Shane Hastie: There’s an abundance of tools, how do we choose?

Sergii Gorbachov: I think that at this time, especially in my work, I see ourselves as more of a reactive team. We sometimes cannot foresee a problem, but when we see a problem, we should probably use whatever tools are available to us at that time, rather than spending time and investing too much into building our own things to predict whatever is going to come next.

For example, for my previous work, for all these tools for test generation or Enzyme to RTL conversion, of course, we looked at all of the available tools, but there was nothing, because some of those tools are not created with an AI component in mind.

I think right now, a lot of my work, or people in the DevOps role, they spend a lot more time on building the infrastructure for all of these AI tools to consume that information.

Let’s say, in end-to-end testing, which is at the highest level of the testing pyramid, you sometimes get an error or a failed test, and it’s very difficult to understand what’s the root cause of this, because your product, your application might be so distributed and there are no clear associations between some of those errors.

Let’s say, you have three services and one service fails, but in your test you only see that an element, for example, did not show up.

It’s a great use case for an AI system that can intelligently identify the root cause. Without the ability to get all of the information from all of those services while the test is running, it would be impossible.

Those systems, those pipelines, they at this point do not exist. I see that, at my company, that sometimes we’re used to more traditional, typical use of, let’s say, logs, where you have very strict permission rules, so that you cannot get access to some of the information from your development environment, where tests are running.

Those are the things that I think it makes sense to invest in and build, in order to beef up these AI systems. The more context you provide for them, the better they are.

I think, therefore, some of the tools that we’ve been building, that’s how our work has changed. We are building these infrastructure and context-collection tools, rather than using AI for all of that.

Calling an AI endpoint or using an AI tool is probably the easiest part, but collecting all of that other stuff is usually 80% of the success in the project.

Shane Hastie: What other advice, what else would you tell your peers about your experience using the generative AI tools, in anger, in the real world?

Advice for Others on Using AI Code [20:58]

Sergii Gorbachov: I would say, one of the things that I saw among my peers and in other companies, or on LinkedIn, is that people are trying to build these very massive systems that can solve any problem.

I think, the best result that we got from our tools, was when the scope is so minimal, the smaller the better. Try to analyze the problem that you are trying to solve and break it down as much as you can, and then, try to solve and automate that one little piece.

In that case, it’s very easy to understand if it works or it doesn’t work. For example, if you can quantify the output; let’s say, in our case we usually count things like saved engineering hours. How long would it take me to do that one little thing? Let’s say we save 30 seconds, but we run our tests 30,000 times a month.

There you go, that’s some metrics that you can go and tell to your managers or present to the company, so that you can get more time and more resources to build the system, and show that there is an actual impact.

Go very, very, small, and then try to show impact. I think, those are two things. I guess, the third one is just, try it out. If it doesn’t work, then it’s all right, and if it’s a small project, then you wouldn’t waste too much time, and always there’s the learning part in it.

Shane Hastie: Sergii, lots of good advice and interesting points here. If people want to continue the conversation, where do they find you?

Sergii Gorbachov: They can find me on LinkedIn. My name is Sergii Gorbachov.

Shane Hastie: We’ll include that link in the show notes.

Sergii Gorbachov: Perfect, yes.

Shane Hastie: Thanks so much for taking the time to talk to us today.

Sergii Gorbachov: Thank you.



Presentation: What to Pack for Your GenAI Adventure

MMS Founder
MMS Soledad Alborno

Article originally posted on InfoQ. Visit InfoQ

Transcript

Alborno: My name is Soledad Alborno. I’m a product manager at Google. I’m here to talk about what to pack for your GenAI adventure. We’re going to talk about what skills you can reuse when building GenAI products, and what new skills and new tools we have to learn to build successful products. I’m an information system engineer. I was an engineer for 10 years and then moved to product. I’m an advocate for equality and diversity. I’ve been building GenAI products for the last two years.

In fact, my first product in 2023 was Summarize. Summarize was the first GenAI product or feature for Google Assistant. We had a vision of building something that helped the user get the gist of long-form text. Have you ever received a link from a friend or someone you know, saying, you have to read this, and you open it and it’s so long and you don’t have time? That’s the moment when you pull up Google Assistant, and now Gemini, and you ask, give me a summary of this. It’s going to create a quick summary of the information you have on your screen. This is how it looks. You tap the Summarize button and it generates a summary that you can see, in three bullet points. We will talk about when I started this product and what I learned at the end.

My GenAI Creation Journey

When I started building Summarize, I thought I was a very seasoned product manager, 10 years of experience. I can do any kind of product. Let’s start. That was good. I had a lot of tools in my backpack. On the way, I learned that I had to interact with this big monster, the non-deterministic monster that is LLMs and GenAI; I call it Moby-Dick here. In my journey of building this product, I learned to treat and build datasets. I learned to create ratings and evals. I will share with you some tips on how we build those evals and datasets so that they make a successful product. I had to learn to deal with hallucinations, very long latencies, and how to make this thing faster. Because when you ask for a summary, you don’t want to wait 10 seconds to get the result. Then trust and safety: a lot of things related to GenAI are related to trust and safety, and we had to work with that.

Traditional Product Management Tools are Still Useful

I’m going to start with a few questions. Have you ever used any kind of LLM? Have you used prompt engineering to build a product? Just prompt engineering: you set up some roles for the model, you made it work. Are you right now building a GenAI product in your role? Have you ever fine-tuned a model? Do you design evals or analyze rating results for a product? Do you believe your current tools and skills are useful to create GenAI products? Whatever you know is very useful to create AI products.

My role is to be a product manager. In order to be a product manager, I need to know my users. I’m the voice of the user in my products. I need to know technology because I build technical solutions for my users. I need to know business because whatever product I build needs to be good for my business and be aligned to the strategy. That’s my role.

In my role, the tools and skills that I use are the ones anyone who works in a startup, or leads any project, knows. The first one is business and market fit. I need to know who my competitors are, what’s happening in this market, how to align to the business goal, how do we make money with this product? It’s still very relevant in the GenAI world. No changes there. I need to know who my target users are. Are these professional users? Are they students, are they doctors? What is the age I’m targeting? Very important as well for generative AI products. You need to know your users because they will interact with the product. I need to know, what are the users’ pain points? What are the problems? What are the things that will help me understand how to build a solution for them and prove that my solution helped them solve a real problem and bring real value to them? I then need to work with my teams and everyone to build a solution for those problems. To build a solution, I’m going to write the requirements.

These are still very relevant terms for any product, any software product. There are little differences there that we will talk about. The last two things are metrics and go-to-market. We need to know how to measure whether this product is successful. Go-to-market: how are we going to push this product in the market, our marketing strategies, and so on? Everything is relevant; all your skills, everything that you have in your backpack so far, everything is useful for GenAI. The difference is in the solution and requirements. There are very small differences there. This is where we need to help our engineering teams pick the right GenAI stack. They will start asking what type of model: do I need a small model, a big model, a model on device? Does it need to be multimodal or text only? What is this model and how do we pick it?

Activity – Selecting the Right GenAI Stack

Next, I’m going to work through a little exercise that will help us understand the difference between a small model and a big model, and why we care. The first thing I want to tell you is that we care because small models are cheaper to run than bigger models. Let’s see what their quality is like. Who loves traveling here? Any traveler? Who can help me with this question? Lee, your task is super simple. Using only those words, adventure, new, lost, food, luggage, and beach, you have to answer three questions, only those words. Why do you love traveling?

Lee: Got leaner luggage. Food, and beach, and adventure, and new.

Alborno: Describe a problem you had when traveling.

Lee: Luggage, lost, food, adventure.

Alborno: What’s your dream vacation?

Lee: Adventure.

Alborno: As you can see, it’s a little hard to answer these very easy questions with restricted vocabulary. This represents a model with few parameters. It’s fast to run. It’s only a few choices, but it’s cheaper. The response is not that good. It doesn’t feel like human. What happens when we add a little more words to the same exercise? It’s the same questions. Now we have extra words: adventure, new, lost, food, luggage, beach, in, paradise, the, we, our, airport, discover, relax, and sun. Why do you love traveling?

Lee: Adventure, discover, relax, currently paradise.

Alborno: Describe a problem you had when traveling.

Lee: Lost, new, the sun sometimes.

Alborno: What is your dream vacation?

Lee: Not in the airport. Beach, adventure, paradise, sun.

Alborno: We are getting a little better, getting a little longer responses. Still some hallucination in the middle. Hallucination means that the model is inventing stuff that is not in the intent of the response because it doesn’t have more information than what we have here in the vocabulary. The last one is, using all the words in the British dictionary and at least three words, we are introducing some prompt here, why do you love traveling?

Lee: I like the travel to be good. I like the adventure of going to new places and discovering the history and the culture of the people.

Alborno: Describe a problem you had when traveling.

Lee: The airport’s usually a free work zone. I lost money, being lost.

Alborno: What is your dream vacation?

Lee: Going somewhere new, that the people are friendly and inviting.

Alborno: You can see when the model has more information, more parameters, then it’s easier for the answers to be better quality and to represent human in a better way. You can borrow this exercise to present this to anyone, any customer that doesn’t know about how this works. It works very well. I did it a couple of times.

Requirements to Select the Right GenAI Stack

Let’s go back to the requirements. How do we make sure that we help the engineering team define the right model or the right stack to use in my product? There are many things we need to define. Four of the things that are very important are the input and the output, but not in the general way. We need to think about, what are the modalities? Is my system text only, or text, images, voice, video, attachments, documents? What is the input of my process and my product? What is the output? Is it text? Is it images? Are we generating images? Are we updating images? What is the output that we need from the model? Accuracy versus creativity. Is this a product to help people create new things? Is it a GenAI product for canvas, or to create images for a social network, for TikTok, for Instagram? What is it? If it’s creative, then it’s ok. It’s very easy. LLMs are very creative.

If we need more accuracy, like a GenAI product to help doctors to diagnose illnesses, then we need accuracy. That’s a different topic. It’s a different model, different techniques. We will use RAG, or fine-tune it in a different way. The last one is, how much domain specific knowledge? Is this just a chatbot that talks about whatever the user will ask or is it a customer support agent for a specific business? In which case, you need to upload all the domain specific knowledge from that business. For instance, if it is doing recommendations to buy a product, you need to upload all the information for the product.

For Summarize, I have the following requirements. User input was text. We started thinking, we are going to use the text and the images from the article. Then, the responses were not improving. The model was slower, so we decided to use text to start, and the responses were good enough for our product. In the output, we said, what is the output? Here it was a little complicated because we all know what a summary is. We all have different definitions of a summary. I had to use user experience research to understand, what is a summary for my users? Is it like two paragraphs, one paragraph, two sentences, three bullet points? Is it 50 words or 300 words?

All of these matter when we decide what is this product going to do. We had to go a little into talking with a lot of people and understanding, at the end of the day, what we decided is three bullet points was something that everyone would agree is a good summary. We went with that definition for my output. We wanted no hallucinations. Not in the typical term of, do not add flowers in an article that is talking about the stock. It was no hallucination in the sense, do not add any information that is not in the original article, because LLMs were trained with all of this massive amount of information. They tend to fill in the gaps and the summaries will have information from other places. We needed to make sure the summary accurately represented the article. For that, we had to create metrics and numbers.

Actually, metrics, automated metrics: a number that represents how well the summary represents the original article. We did that to improve the quality of our summaries, too. Last, we didn’t have any domain-specific knowledge. We just had the input from the article we were trying to summarize.
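
The talk does not spell out that metric, but as a minimal sketch, an automated faithfulness check could be as simple as a unigram-precision proxy: what fraction of the summary’s words also appear in the source article. The function below is an illustrative assumption, not the metric the Summarize team used.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase and split text into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def faithfulness_score(article: str, summary: str) -> float:
    """Fraction of summary tokens that also appear in the article.

    A crude proxy for "no information that is not in the original article":
    a low score suggests the summary introduces content the article never
    mentions. Production systems would use entailment or QA-based checks.
    """
    article_vocab = set(tokenize(article))
    summary_tokens = tokenize(summary)
    if not summary_tokens:
        return 0.0
    supported = sum(1 for tok in summary_tokens if tok in article_vocab)
    return supported / len(summary_tokens)

# Example: a score near 1.0 means most of the summary is grounded in the article.
article = "The central bank raised interest rates by 25 basis points on Tuesday."
summary = "The central bank raised rates by 25 basis points."
print(f"faithfulness: {faithfulness_score(article, summary):.2f}")
```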

The Data-Driven Development Lifecycle

Once we have all the requirements, we ask: how do we start developing this thing? The first thing was to create a good dataset. When I work with my product teams, we work with a small, a medium, and a large dataset. The small dataset is about 100 inputs that we can get from previous studies, from logs of our previous system, or wherever. Sometimes the team writes the inputs. We create 100 prompts and we test. That’s a small dataset. It’s nice because with a small dataset, you can have different prompts and different results from different models for the same prompt. You can compare them with your eyes. You can get a feel for what is better and where the missing point is for each of them. We have a medium dataset that I usually use with the raters and in the evals. That’s about 300 examples.

A large dataset is around 3,000 examples or more, depending on the context and on the product. That big dataset is more for training, validation, and fine-tuning. First, we create our datasets. Second, once we had the 100-example dataset, we went to prompt engineering: just take the foundation model, do some prompting, “generate a summary for this”, and get the results. Maybe do another one with “generate a summary in three bullet points”, or “generate a summary that is shorter than 250 words”. We try different prompts so we can evaluate this thing.
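
As a sketch of that prompt-engineering pass, the loop below runs a few prompt variants over the small dataset and writes the results side by side for review. The generate() helper, the prompt wording, and the CSV output are assumptions standing in for whatever model client and tooling a team actually uses.

```python
import csv

# Illustrative prompt variants of the kind described in the talk.
PROMPT_VARIANTS = {
    "plain": "Generate a summary for this article:\n\n{article}",
    "bullets": "Generate a summary in three bullet points for this article:\n\n{article}",
    "short": "Generate a summary shorter than 250 words for this article:\n\n{article}",
}

def generate(prompt: str) -> str:
    """Placeholder for a call to the chosen foundation model."""
    raise NotImplementedError("Wire this up to your model provider.")

def run_small_dataset(articles: list[str], out_path: str = "prompt_comparison.csv") -> None:
    """Produce one row per (article, prompt variant) so reviewers can eyeball results."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["article_id", "variant", "summary"])
        for i, article in enumerate(articles):
            for name, template in PROMPT_VARIANTS.items():
                summary = generate(template.format(article=article))
                writer.writerow([i, name, summary])
```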

Then, in some of the cycles, we also have model customization. This is when we use RAG or fine-tuning to improve the model. Then comes the evaluation, which uses evaluation criteria. We send these examples to raters, people who will say whether this is good or not, or more or less. Then we can find patterns. We use these patterns to add more data to the dataset, and go again. This is a cycle. We keep going until we feel that the quality is good and the result of the evaluation is good.

What’s next is: how do we make sure the evaluation is good? What is a good summary for me may be a bad summary for Alfredo or for someone else. We have what we call evaluation criteria. For that, let’s try it out. What you see on top is the abstract of this talk; it’s on the QCon page. It basically describes what I’m going to talk about in this talk. I asked LLMs to generate three summaries: 1, 2, and 3. The first summary was in a paragraph format. It actually talked about the presentation: we have the essential tools and skills to develop GenAI products. It talked about covering core product manager principles, and considerations for building LLMs, and so on. The second one is very informal. It’s like, GenAI stuff, AI products, or whatever. Yes, it’s a summary, but it’s in a very different tone.

The third one is just three bullet points, very short. It doesn’t have a lot of details. Who here thinks that the first one is the best summary for this text? Who here thinks that the second one is the best summary? Who thinks the third one is the best summary here? As you can see, we have different perceptions. This is subjective. We need to create an evaluation criteria. This is an example, not the one that we created.

In this evaluation criteria, we have four different things. We have format quality: is the response clear? Is it like human language? Does it have the right number of words or the right format? Completeness: does the summary have all the key points from the original text? Conciseness: does it have only the necessary content without being repetitive or too detailed? Accuracy: does the summary have information only from the original text? Once we set up the evaluation criteria, we send them with a sample of about 300 prompts or texts to raters and ask them: can you evaluate this? Usually, they use tools like this, where they can see two or three summaries next to each other.

They have to rate, from 1 to 5, format quality: which of them has the best format? You can see number 3 is a little more structured, has more points than number 2. Which of them is more complete? Number 1 has more details than number 3. Which of them is more accurate? You can see, for instance, that number 3 maybe is not that accurate, because it says LLM considerations, and I don’t know if that was part of the original text. Conciseness: number 3 was more concise. Usually we use raters: two or three raters will rate each of the data points, and we can compute an average. This is a 300-example dataset, so that will actually give me an indication, because some summaries and some texts are going to be tricky, but others not. In general, we will have a sense of how good the summaries are based on these evaluation criteria.
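
A minimal sketch of that aggregation step, assuming a simple list-of-tuples layout for the ratings rather than whatever rater tooling was actually used:

```python
from collections import defaultdict
from statistics import mean

CRITERIA = ("format_quality", "completeness", "conciseness", "accuracy")

# Each rating: (example_id, rater_id, criterion, score 1-5).
ratings = [
    (0, "rater_a", "format_quality", 4),
    (0, "rater_b", "format_quality", 5),
    (0, "rater_a", "accuracy", 3),
    (0, "rater_b", "accuracy", 4),
    # ... roughly 300 examples x 2-3 raters x 4 criteria in practice
]

def aggregate(ratings):
    """Average the raters per example, then average across the dataset."""
    per_example = defaultdict(list)   # (example_id, criterion) -> scores
    for example_id, _rater, criterion, score in ratings:
        per_example[(example_id, criterion)].append(score)

    per_criterion = defaultdict(list)  # criterion -> per-example means
    for (example_id, criterion), scores in per_example.items():
        per_criterion[criterion].append(mean(scores))

    return {c: round(mean(v), 2) for c, v in per_criterion.items() if v}

print(aggregate(ratings))  # e.g. {'format_quality': 4.5, 'accuracy': 3.5}
```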

Risk Assessment for GenAI

The last point is our risk assessment. Once we have all our summaries and the full product, we need to think about three topics: trust and safety, legal, and privacy. This will also be part of the dataset generation later. For instance, can we summarize articles that talk about which mushrooms we can use to make a meal? If so, how do we check that the generated summary is actually safe and will not do any damage to any of our users? We have seen examples of this, where AI told users to eat poisonous mushrooms. We don’t want that happening. We really need to think about our output. Is it safe, always safe? What are the risk patterns? Like in traditional software, where we were always thinking about attack vectors: what are the risk patterns we have in this product? How do we add data to the dataset so we can prove that the model is reacting well? We also have legal concerns.

One of the things I always care about when we talk about legal is, where is the data coming from, the data that we are using in our datasets? Do we have access to that data? Is it generated or not? Also, legal comes into play when we have things like, for instance, disclaimers. What are we going to tell our users about the use of LLMs? How do we tell them that we may have hallucinations? Privacy, which is, what do we do with the personal information we feed this machine? In some cases, there is no personal information, which is great. When we have personal information, how do we create datasets to prove that the personal information is not put at risk?

Conclusion

All of our tools that we learned with product creation are useful for GenAI products. There are a few new skills and things that come into play, but these things will keep changing. New tools will be added. New processes will be created. In GenAI, the only constant is change.

Questions and Answers

Participant 1: You talked about evaluation criteria. Could they have been generated by AI as well?

Alborno: Yes. The evaluation criteria are not only the definition of the areas that we want to evaluate; we also need to provide a description of those criteria to our raters, and of what it means to have one star versus three stars versus five stars. Usually, all of that goes into a description, because when you have a team of 20 raters, we need them to use the same criteria to evaluate.

Participant 2: You mentioned that the definition of a summary is different for different people. You gathered some data through user experience research. Google has those resources, but not every other company does. Is there any product intuition you can share with us: if you don’t have those resources, how can we approach these problems?

Alborno: You start with product intuition, yes, very important. Then, I think teamfood and dogfood are very important, and having the instrumentation in teamfood and dogfood to measure success. I think that’s important. Sometimes, when we develop products, and this is not just GenAI products, products in general, we leave the metrics to the end. We didn’t have time to implement these metrics, and we release anyway. Sometimes we do that. Then, we risk not gathering this information that is so important. Start there, with good metrics. Then, try it in teamfood and dogfood, with small groups of users. Then, test different things until you have something that works better. Then, you go to production.

Participant 3: You mentioned at the beginning that for the same prompt you’re going to get different results. How do you then manage to make your development lifecycle reproducible? Do you have any tips for that?

Alborno: In my experience building GenAI products over the last two years, the best way I’ve seen to control the diversity of results is with fine-tuning and providing examples to the machine of what I want it to generate. In the beginning of Summarize, and this was in the early days of GenAI, it was very different from what it became just a few months later. In the beginning, we would say, generate a summary under 150 words, and the thing would generate a 400-word summary. We were like, “No way, we cannot ship something like that”. We had to provide a lot of examples first, so the machine understood what a 150-word summary is. If you use Gemini 1.5 now, for instance, it knows what 150 words is. It’s better at following instructions. You may not need that fine-tuning.

Participant 3: For the evaluation study, did you use human raters, or did you also use GenAI?

Alborno: We used both, because we have the resources. You can hire a company to do the ratings; we used that, because it’s important to have human input. We also built automatic rating systems. That was the first version of those systems; now they are much more complex than that. What I’m doing with my friends as well is asking an LLM, in a second instance, to rate the previous result. You feed it the rules and the evaluation criteria, and it’s going to generate good ratings, too.
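
A minimal sketch of that LLM-as-judge pattern is shown below, assuming a hypothetical call_llm() client and an illustrative rubric; it is not the rating system Alborno’s team built.

```python
# The judge model is given the evaluation criteria and asked to score a summary.
JUDGE_PROMPT = """You are rating a summary of an article on a 1-5 scale for each criterion:
- format_quality: is the response clear, human-sounding, and well formatted?
- completeness: does it contain the key points of the article?
- conciseness: does it avoid repetition and unnecessary detail?
- accuracy: does it contain only information present in the article?

Article:
{article}

Summary:
{summary}

Reply with four lines, one per criterion, in the form "criterion: score".
"""

def call_llm(prompt: str) -> str:
    """Placeholder for a call to the judge model."""
    raise NotImplementedError("Wire this up to your model provider.")

def judge(article: str, summary: str) -> dict[str, int]:
    """Parse the judge model's reply into per-criterion scores."""
    reply = call_llm(JUDGE_PROMPT.format(article=article, summary=summary))
    scores = {}
    for line in reply.splitlines():
        if ":" in line:
            criterion, value = line.split(":", 1)
            scores[criterion.strip()] = int(value.strip())
    return scores
```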

Participant 4: I’m curious how you protect against, or think about, bias in your evaluation criteria. Because even as I was thinking about summarizing: women, on average, use a lot more words than men, and people probably have different ideas of what they want a summary to be. I’m just curious how you approach that in general, and, with the evaluation criteria being specific, how do you define what complete is?

Alborno: In that sense, because this summary was a summary of the original text, the original text may have bias, and the summary will have bias. That’s by design, because what we tested is that the summary reflects what is in the original article. If you are reading an article where you don’t agree with the content, and you know it has flaws, and you generate a summary, you should be able to recognize those flaws in the summary. In this case, for a summary, it’s not a problem, because it’s going to be based on the original text. You don’t want to insert any additional bias, and you measure that by comparing those two things. In other products, you may have bias, and that’s a different thing.

Participant 5: This is a good description of product development, of how you launch in the first place. How do you then think about product management? The core tech stack continues to evolve. There’s a knowledge cutoff. There are new capabilities. When do you go back and revisit?

Alborno: LLMs keep evolving. Every version is better than the previous one. I think what is important in my lifecycle is that the dataset, the evaluation criteria, and the process we use to make sure that the results are good do not change. You can change the machine, you can fine-tune it, make it smaller or bigger, or whatever. Then, you still have the framework to test and produce the same quality, or actually check whether the new model has better quality or not. It’s very important to invest this time in: what are my requirements, what are my evaluation criteria, and what is the dataset I will use?

Participant 6: How do you deal with different languages potentially having different constraints on that? For example, we were discussing Spanish. You usually need more words to express the same thing. In 300 words of Spanish, you may be able to convey less content than in 300 words of English.

Alborno: First, you want your dataset to be diverse and representative of your user base. If your user base spans multiple languages, then you have multiple languages in your dataset and your criteria. Second, we set the number based on the time it would take to read those words. Because Assistant is a system you can also use by voice, you can ask, “give me a summary”, and listen to a summary. We measured that the summary should be readable in less than two minutes, or something like that. You create a metric that’s not based on the language, but on the amount of attention you need to process the output.
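
As a sketch of that kind of language-independent check, the snippet below caps the summary by estimated listening time rather than raw word count. The words-per-minute rates are illustrative assumptions, not figures from the talk.

```python
# Rough speaking rates per language; placeholders for illustration only.
SPEAKING_WPM = {"en": 150, "es": 160, "default": 150}

def within_listening_budget(summary: str, lang: str = "en", max_minutes: float = 2.0) -> bool:
    """Return True if the summary can be read aloud within the time budget."""
    words = len(summary.split())
    wpm = SPEAKING_WPM.get(lang, SPEAKING_WPM["default"])
    return (words / wpm) <= max_minutes
```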

Participant 7: I was reading through your data lifecycle diagram, and it presents the system. It’s been a long-running physical system and probably has momentum to it. What happens if one of the components fails? For example, you’ve got a key criterion, or you have raters and it turns out the rating company that you use is not trustworthy [inaudible 00:35:24]. How do you stabilize the system if there’s a sudden crash?

Alborno: Between the evaluation phase and the dataset augmentation phase, you need to do some analysis. Even if everything looks good, you still have your small dataset that you can visualize and check one by one. If there are big errors, you will find them, because there are so many raters and so much data in the dataset that you should be able to highlight those errors. Sometimes what we do is remove some of these outliers from the dataset, or pay special attention to them and grow the dataset using more examples like those. It’s all about analyzing the result of the evaluation phase.



How to Develop Your Skills to Become a Principal Engineer

MMS Founder
MMS Ben Linders

Article originally posted on InfoQ. Visit InfoQ

Becoming a principal engineer requires more than technical skill; it’s about influence, communication, and strategy. Success means enabling teams by shaping culture, Sophie Weston said. In her talk at QCon London, she suggested developing deep skills in multiple domains alongside general, collaborative skills. Skills from life outside work, like sports, volunteering, or gaming, can add valuable perspective and build leadership potential.

Principal engineer is often the highest level on the individual contributor path. Getting there requires more than deep technical expertise, as Weston explained:

It’s about influence, communication, and strategy. It’s about understanding that your career isn’t just about climbing; it’s about navigating, adapting, and growing in ways that matter to you.

Principal engineers focus less on what teams build and more on how they build, Weston said. It’s helping teams do great work, creating an environment where people thrive. By reducing friction, they help teams and the wider organisation move faster, avoid unnecessary obstacles, and stay on track.

According to Weston, engineering leadership consists of:

  • Setting technical direction. This is about guiding teams and helping them to make smart architectural and technical choices.
  • Driving good engineering practices. Helping teams build software efficiently and deliver real value – both to the business and its customers.
  • Shaping culture. Mentoring, coaching, and making the organisation a place where engineers thrive.

Weston mentioned that as a principal engineer, she’s passionate about creating a good engineering culture that is good both for the organisation to deliver value effectively and efficiently and for the people working in it:

We want to have impressive DORA metrics, but not at the expense of burning people out.

Engineers need to broaden their skill set if they want to advance towards engineering leadership. Technical knowledge, no matter how deep, no matter how impressive, is not going to be enough, Weston said. She suggested developing deep skills in multiple domains, alongside general, collaborative skills:

Being Pi-shaped is valuable, but leadership demands more breadth. You need to become a “broken comb”—someone with expertise in multiple areas and the ability to connect insights across domains.

As a “broken comb”, you don’t just have expertise in multiple areas; you have varying depths of knowledge to bridge gaps, connect ideas, and solve problems creatively, Weston said.

Weston argued that people should bring their whole selves to work. She mentioned things that people do outside of their job, where they can learn and practice useful leadership skills:

If you’re part of a sports team, you will be building skills in team working and resilience. If you do volunteering, say in a youth group for example, you’ll develop skills in coaching and problem-solving. Maybe gaming is your thing, in which case you are likely to have strong skills in strategic thinking and adaptability.

Getting involved in the tech community and helping to organise events is a fantastic way to learn and practice new skills, Weston mentioned.

Don’t undervalue skills that you learn in other parts of your life and how they can help you in your career journey. We often talk about the importance of psychological safety and the need for people to be able to bring their whole selves to work and be themselves at work; this applies in a wider career context too, Weston said.

The skills you learn outside of work are just as valuable as the ones you learn through actually doing your job, Weston said. Sometimes they are more valuable, because they represent more teeth on your “broken comb”. It’s not just the skills themselves that are valuable, but the additional perspective you gain from having acquired them in a different setting, Weston concluded.

About the Author



Vitest Introduces Browser Mode as Alternative to JSDOM

MMS Founder
MMS Daniel Curtis

Article originally posted on InfoQ. Visit InfoQ

Vitest, the modern Vite-native test runner, has introduced Vitest Browser Mode, offering developers an alternative to traditional DOM simulation libraries like JSDOM. The addition of browser mode to Vitest allows tests to run in an actual browser context, offering more realistic and reliable testing behavior for UI applications built with React, Vue, or Svelte.

Vitest Browser Mode is currently experimental.

Vitest Browser Mode was introduced to improve testing with more accurate and reliable results; it does this by running tests in a real browser context using Playwright or WebDriverIO. This mode allows for realistic browser rendering and interaction.

Historically, JSDOM has been the default simulated environment for running front-end tests in Node.js. It simulates a browser DOM inside Node, making it a convenient and fast option for unit testing. However, because JSDOM isn’t a real browser, its implementation can fall short for advanced use cases such as layout calculations, CSS behavior, or APIs it does not yet support. Vitest aims to replace JSDOM environments with an easy migration path.

React Testing Library, a lightweight library for testing React components, is built on top of the DOM Testing Library, which provides utilities to interact with the DOM. It has long relied on JSDOM for simulating DOM interaction. With the introduction of Vitest Browser Mode, it is possible to migrate away from React Testing Library, as a number of its APIs have been rewritten natively in the same familiar pattern. Kent C. Dodds, the author of React Testing Library, says he has never been so happy to see people uninstalling React Testing Library in favor of the native implementation.

Vitest also provides support for other frameworks, such as Vue and Svelte, and there is a community package available for Lit. It supports multiple browser environments depending on which provider you use: WebDriverIO supports testing in four browsers (Firefox, Chrome, Edge, and Safari), while Playwright supports Firefox, WebKit, and Chromium.

There are some drawbacks to using Vitest Browser Mode, as outlined in the documentation: it is experimental and therefore still early in its development, and it can have longer initialization times than other testing approaches.

Vite is an open-source, platform-agnostic build tool named after the French word for ‘quick’. It was written by Evan You, the creator of VueJS. Vitest is a next-generation, Vite-native testing framework that reuses Vite’s config and plugins; it supports ESM, TypeScript, and JSX out of the box.

Full documentation for Browser Mode is available on the Vitest website including setup guides and examples.

About the Author



Perplexity Introduces Labs for Project-Based AI Workflows

MMS Founder
MMS Robert Krzaczynski

Article originally posted on InfoQ. Visit InfoQ

Perplexity has released Labs, a new feature for Pro subscribers designed to support more complex tasks beyond question answering. The update marks a shift from search-based interactions toward structured, multi-step workflows powered by generative AI.

Perplexity Labs enables users to perform a wide range of tasks, including generating reports, analyzing data, writing and executing code, and building lightweight web applications, all within a single interface. Users can access Labs via a new mode selector available on web and mobile platforms, with desktop support coming soon.

While Perplexity Search focuses on concise answers and Research (formerly Deep Research) offers more in-depth synthesis, Labs is designed for users who need finished outputs. These can include formatted spreadsheets, visualizations, interactive dashboards, and basic web tools.

Each Lab includes an Assets tab, where users can view or download all generated materials, including charts, images, CSVs, and code files. Some Labs also support an App tab that can render basic web applications directly within the project environment.

According to Aravind Srinivas, CEO and co-founder of Perplexity:

Introducing Perplexity Labs: a new mode of doing your searches on Perplexity for much more complex tasks like building trading strategies, dashboards, headless browsing tasks for real estate research, building mini-web apps, storyboards, and a directory of generated assets.

In practical terms, Labs automates and combines tasks that would otherwise require multiple software tools and considerable manual input. This is particularly relevant for tasks involving structured research, data processing, or prototyping.

Initial feedback has highlighted the speed and contextual accuracy of the platform. Sundararajan Anandan shared:

I recently tried Perplexity Labs, and it is a game-changer. Tasks that once took hours of manual research and formatting were distilled into crisp, actionable insights in under 10 minutes. While it is still early and the platform will need time to mature, the initial experience is genuinely impressive.

However, some early users have pointed out areas for improvement. In particular, follow-up interactions and code revisions after the initial generation are currently limited. As one Reddit user commented:

The biggest problem with Labs is that it doesn’t handle follow-ups very well. It basically requires you to be a one-shotting ninja.

The company has also announced that it is standardizing terminology, renaming “Deep Research” to simply “Research” to clarify the distinctions between the three modes: Search, Research, and Labs.

Perplexity Labs is now live and available to all Pro users. Additional examples and use cases are available via the platform’s Projects Gallery, designed to help users get started with practical tasks.

About the Author



Pinterest Tackles AWS EC2 Network Throttling to Enhance Service Reliability

MMS Founder
MMS Craig Risi

Article originally posted on InfoQ. Visit InfoQ

In a recent blog post, Pinterest Engineering detailed its approach to addressing network throttling challenges encountered while operating on Amazon EC2 instances. As a platform serving over 550 million monthly active users, ensuring consistent performance is paramount, especially for critical services like their machine learning feature store, KVStore.

Pinterest observed increased latency and occasional service disruptions in KVStore, particularly during periods of high traffic. These issues often led to application timeouts and cascading failures, adversely affecting user engagement on features like the Homefeed. The root cause was traced to network performance limitations inherent in certain EC2 instance types, which offer “up to” a specified bandwidth. For example, an instance labeled with “up to 12.5 Gbps” might have a significantly lower baseline bandwidth and rely on burst capabilities that are not guaranteed. When network usage exceeded these baselines, packet delays and losses ensued, impacting application performance.

In 2024, Pinterest initiated a migration to AWS’s Nitro-based instance families, such as transitioning from i3 to i4i instances, aiming for improved performance. However, this shift introduced new challenges. During bulk data uploads from Amazon S3 to their wide-column databases, they observed significant performance degradation, particularly in read latencies, resulting in application timeouts. These findings prompted a temporary halt to the migration of over 20,000 instances.

With improved visibility into their network performance, Pinterest implemented several key strategies to mitigate EC2 network throttling. One of the primary approaches was selecting EC2 instances with higher baseline network bandwidth to better support their workloads, moving away from instances that only promised burstable performance. They also introduced traffic shaping techniques to regulate data flow and ensure network usage stayed within optimal thresholds.

In addition, Pinterest distributed workloads more evenly across multiple instances, reducing the risk of overloading any single resource. These combined efforts significantly enhanced the reliability and stability of their systems, effectively minimizing latency spikes and preventing the kind of service disruptions that had previously impacted user experience.
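
The post does not share implementation details, but on Nitro-based instances one common way to get the kind of network visibility described above is to watch the ENA driver’s allowance-exceeded counters exposed through ethtool. The sketch below is an illustrative assumption rather than Pinterest’s actual tooling; the interface name, polling interval, and alerting logic are placeholders.

```python
# Poll the ENA driver's allowance-exceeded counters via `ethtool -S`,
# which increment when an instance exceeds its network allowances.
import subprocess
import time

COUNTERS = (
    "bw_in_allowance_exceeded",
    "bw_out_allowance_exceeded",
    "pps_allowance_exceeded",
    "conntrack_allowance_exceeded",
)

def read_ena_counters(interface: str = "eth0") -> dict[str, int]:
    """Parse `ethtool -S` output and return the throttling-related counters."""
    output = subprocess.check_output(["ethtool", "-S", interface], text=True)
    counters = {}
    for line in output.splitlines():
        name, _, value = line.strip().partition(":")
        if name.strip() in COUNTERS:
            counters[name.strip()] = int(value.strip())
    return counters

def watch(interface: str = "eth0", interval_s: int = 60) -> None:
    """Log counter deltas so sustained growth (i.e., active throttling) stands out."""
    previous = read_ena_counters(interface)
    while True:
        time.sleep(interval_s)
        current = read_ena_counters(interface)
        deltas = {k: current[k] - previous.get(k, 0) for k in current}
        if any(deltas.values()):
            print(f"throttling detected on {interface}: {deltas}")
        previous = current
```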

Pinterest’s experience underscores the importance of understanding the nuances of cloud infrastructure, particularly the implications of network bandwidth limitations on EC2 instances. By proactively monitoring and adjusting their infrastructure, they successfully navigated the challenges of network throttling, ensuring a smoother experience for their vast user base.

About the Author
