QCon Plus (Nov 29): Level-Up On The Engineering Trends You Might Need To Embrace

MMS Founder
MMS Ian Robins

Article originally posted on InfoQ. Visit InfoQ

At the QCon Plus online software development conference this November 29 to December 9, over 1,500 senior software engineers, architects, and team leads will learn about important trends our Program Committee believes will have the most impact on software development.

Join a global community of senior software engineers as they explore use cases and learn about emerging best practices from 80+ real-world practitioners. Attendees will develop their technical and non-technical skills and get valuable insights they can take back to their team to implement right away. 

QCon Plus Track Spotlights

We brought together industry leaders to identify and share 15 major software topics every software developer and technical leader should pay attention to. Stand-out tracks include:

MLOps

MLOps is an emerging engineering discipline that combines ML, DevOps, and data engineering to provide the automation and infrastructure needed to speed up the AI/ML development lifecycle and bring models to production faster. It is one of the most widely discussed topics in the ML practitioner community.

This track will explore the best practices and innovations the ML community is developing. Key areas of focus include declarative ML systems, distributed model training, scalable and low-latency model inference, and ML observability to protect against downside risks and safeguard ROI. – Track host: Hien Luu, Sr. Engineering Manager @DoorDash

“Before & After”: Hybrid Work Strategies

Many of us have dreamt about escaping the daily commute, working remotely and flexibly, and having an entire world of career options available to us. However, we didn’t realize that this ideal would become a reality so quickly. We’re currently unlearning the practices of decades of colocated office work as we adapt to our newfound flexibility. We need to understand how to work efficiently and effectively, how to structure our work and our teams, and how to grow our careers in this remote and hybrid world.

This track will explore the great opportunities and challenges that remote work brings to our companies and our industry, while deep-diving into the tools and strategies we need to embrace as we shape the future. – Track host: James Stanier, Director of Engineering @Shopify

Architecting for Change at Scale

Regardless of industry, programming language, or company size, change is a necessity in technology. We can’t effectively anticipate all future evolutions, but we can learn from past experiences to inform how to make our systems easier to change without over-engineering. The ability to safely and effectively deploy change at scale can be the difference in beating competitors to market, mitigating zero-day vulnerabilities, keeping developers happy, and ensuring customers have a reliable product.

Change is present every day in how we evolve our systems and release features. It is there when we decide to adopt a new technology or migrate systems from one solution to another. It’s also there when we need to rapidly address vulnerabilities at scale, as we saw last year with Log4j.

In this track, attendees will learn patterns and practices to help them architect systems and tooling with agility top of mind – enabling technology to keep up with the needs of the business while minimizing risk and technical debt. – Track host: Haley Tucker, Senior Software Engineer for Productivity Engineering @Netflix

Architectures You’ve Always Wondered About

How do the internet-scale tech giants deliver exceptional user experiences while supporting millions of users and billions of operations?

In QCon’s marquee Architectures You’ve Always Wondered About track, attendees will learn what it takes to operate at a massive scale from some of the best-known names in our industry. Everyone will take away architectural patterns and anti-patterns, challenges and hard-earned lessons, and some very exciting war stories. – Track host: Randy Shoup, VP Engineering and Chief Architect @eBay

Explore more tracks here. 

“I love the variety of topics. It is my annual recap of the technology industry and helps me keep up with what the industry considers modern and state of the art. And mostly I like the fact that people are open about sharing both their successes and failures.” – Nikhil Mohan, Senior MTS/Engineering @Salesforce

Make the right decisions by uncovering how senior software developers at early adopter companies are adopting emerging software trends at QCon Plus this Nov 29-Dec 9. Book your seat now and save with our Limited Early Bird Tickets.



Article: How Open Source is Contributing to Your Team’s Development: What Leaders Should Know

MMS Founder
MMS Al Sene

Article originally posted on InfoQ. Visit InfoQ

Key Takeaways

  • Today’s open source contributions often straddle the line between “hobby” and “volunteer work,” despite the wide usage of open source software across the industry.
  • Companies should consider rewarding and incentivizing open source contributions, particularly given the benefits they’ve experienced from open source software. 
  • Open source communities uniquely help developers build up their skills and networks outside of direct working environments and domains. 
  • Technical debt and time constraints are holding back developer progress on both open source and non-open source projects. 
  • Incorporating the collaboration, community-building, and giving-back characteristics of open source groups can help engineering leaders build stronger teams.

Open source usage and contribution have significantly increased in recent years and continue to be a formative source for developers across both their personal and professional projects. Contributing to open source projects has become a rite of passage of sorts for many new entrants to the field, and open source communities often provide great learning and networking opportunities while helping early developers build a portfolio of their technical work. 

In many ways, contributing to open source projects has also become easier than ever. Advancements in software collaboration and development platforms such as GitHub have democratized the opportunity to contribute to open source, and industry events such as Hacktoberfest or community forums have become great places for developers to find their first project to contribute to.

There’s little to no debate over the fact that open source software is critical to the tech industry and the developers within it. And yet, the open source community still finds itself hindered by a variety of challenges. The communities built and the benefits offered by open source are incredibly impactful — and engineering leaders would do well to encourage more participation within it, both for their teams and across the industry at large. 

The Face of Participation 

According to a recent study by DigitalOcean on the state of the open source community, approximately 50% of developers surveyed said they participated in open source over the past year. Of those who participated, nearly all (93%) said their level of participation either grew or remained the same since the start of the pandemic, showing that despite the wide-ranging work-life disruptions of the past few years, developers who are committed to open source have found ways to integrate the practice into their new routines and their own “new normal.” 

The Challenges of Contributing to Open Source

However, even the most committed developers will admit that time constraints are one of the biggest barriers to continued work on open source projects. DigitalOcean’s study found that most developers who do participate in open source spend about 1-5 hours a week on contributions and cite “lack of resources/time” as well as “technical debt” as two of the top challenges they face today. 

In addition to these time constraints, the open source world can sometimes be unwelcoming to those who strive to participate. Communication between contributors on open source projects can devolve into “contextual, entitled, subtle and passive-aggressive” comments, according to one Carnegie Mellon study on open source dynamics, and contributors may find themselves faced with contribution policies that are rigid and difficult to navigate. Communication also breaks down quickly when projects carry deep documentation debt. When there are time and resource constraints on a project (which is the case for the majority of open source projects), proper documentation is the first thing to suffer. Without thorough documentation, newcomers face a very steep learning curve that makes it difficult to contribute unless they’re already very familiar with the project.

Projects often also suffer from the same diversity and inclusion shortcomings as the rest of the tech industry. DigitalOcean’s own research has found that while most developers do feel that inclusivity in open source has improved over the past few years, there’s a disparity in this sentiment among members of minority groups – 26% of those who identify as part of a minority group disagreed that open source is inclusive, compared to just 12% of non-minority respondents. Contributors who help manage open source projects have turned to a variety of solutions to try to mitigate toxic behavior, such as enforcing codes of conduct through bans and aggressive moderation, but even these solutions rely heavily on moderators’ time commitment to these projects.

In its current state, contributing to open source seems to straddle a line between “hobby” on one end and “volunteer work” on the other. Developers who carve out time for open source projects are doing important and innovative work, but this work often goes unacknowledged by the parties, particularly companies, that benefit from it. How open source software happens (i.e., how software is built, developed, and revised by mostly un- or under-resourced strangers online) is increasingly incongruous with the role that open source technologies play in companies’ development today.

What Open Source Brings to the Table 

Time after time, engineering leaders and developers have attested to the outsized impact that open source software has within their companies. 64% of developers have stated that their company uses open source code for 50% or more of their software, and heavy reliance is more common among startups and small businesses: 35% of startups and SMBs have used open source code in 50% or more of their software, compared to 28% of enterprises.

When larger companies do speak out about open source, it’s often from a security standpoint. Companies such as Amazon, Google, and Microsoft have joined foundations and organizations such as the Open Source Security Foundation (OpenSSF), which seeks to improve cybersecurity practices within the development and implementation of open source and to secure the open source “supply chain.” These groups and organizations are important to the long-term success and sustainability of open source software, but they are less focused on addressing the day-to-day barriers that developers face in keeping open source projects maintained and growing. When developers are asked about security considerations, a plurality (43%) believe that employing dedicated security experts to help oversee projects would improve security, or that increased compensation and training for contributors themselves would help.

On a smaller scale, developers of all levels turn to open source repositories to solve a problem, expand their skills, or tackle new scenarios — all with resulting personal and professional benefits. 35% of developers who contribute to open source said they’ve gained enhanced skills from their contributions, 19% said they’ve encountered networking opportunities, and 11% even found job opportunities. A strong open source community is also key to keeping developers coming back to contribute: 32% of developers said open source contributions help them feel “purposeful or part of a wide community,” and 20% have even stepped into mentorship roles, helping other community members develop their skills. 

Despite open source being largely unpaid volunteer work, mentorship and community play a key role in why it still sees such high activity and engagement from developers, even as they cite massive time and technical-debt challenges in their professional roles. GitHub’s 2021 State of the Octoverse shows that a commitment to mentorship in open source communities results in a 46% improvement in productivity on these projects. This effect is also seen in the workplace, where “mentorship almost doubles the likelihood of a strong culture.” In the right environment, open source is better because of strong developer communities, and developers are better because of strong open source communities.

The Road Forward

When the question of how companies and organizations can give back to open source communities arises, payment is often one of the first topics to emerge. Payment for open source contributions is a highly debated topic. On the one hand, a majority of developers (53% in DigitalOcean’s study) agree or strongly agree that individuals should be paid for their open source contributions; on the other, some developers fear that creating monetization or funding structures for open source may result in a more closed ecosystem of development, instead of a more open one.

If companies balk at the insinuation of paying for open source software, or paying for contributions, there are other alternatives that some industry leaders have been exploring as a way to engage more deeply with communities and “give back.” For example, last year Cisco hired its first Head of Open Source to act “as connective tissue” between open source initiatives, Cisco customers, and different business groups, with the hope of supporting the developers and maintainers who do “what can often be invisible work.” However, the creation of these types of roles or initiatives depends largely on having someone to advocate for the open source community internally, and to illustrate the ROI of building up open source communities. More recently, these types of efforts are falling to developer relations (DevRel) and developer advocacy teams, which are growing and expanding at some of the largest tech companies.

Developers also see companies allocating time for open source contributions during work hours as a potential solution to the challenges of both time and deprioritization. 79% of developers surveyed agreed or strongly agreed that companies should give time during work hours to contribute to open source. In the future, open source contributions could be included in developer job descriptions, or companies could begin folding open source time into the kind of employee volunteer and social-good programs typically seen in larger enterprises. Particularly in the post-pandemic workplace, employees are now more likely than ever to want to contribute their time and skills to efforts they see as worthwhile or as giving back to society. By not encouraging open source contributions through dedicated work time or volunteer programs, companies may be missing out on a very neat solution to issues of open source reliance and broader employee engagement.

The question of incentivizing participation is perhaps a generational one as well – less experienced developers (with a year or less of experience) are more likely to have participated in open source in the past year than more experienced developers. We can speculate that more experienced developers are running into larger problems of time constraints and bandwidth as they grow into more demanding roles, and it will be up to companies to figure out how to level the playing field so that all types of developers are given the opportunity to get involved in open source initiatives run at the company level. This may look like baking time for open source contributions and reviews into production schedules, or making open source responsibilities a critical part of certain roles or titles.

Ultimately, engineering leaders, companies, and individual developers will have to work in tandem to effectively scale the innovation and benefits that open source offers to the industry without losing the overall values of community and collaboration that are so key to open source projects. As it stands, open source projects represent some of our best, most collaborative, and most impassioned uses of technology, where developers find mentorship and new opportunities to develop their skills in ways most aligned to their interests. As most engineering professionals and team leads can recognize, workplace communities could use a little more of this spirit. 



Developing and Evolving SaaS Infrastructures for Enterprises

MMS Founder
MMS Ben Linders

Article originally posted on InfoQ. Visit InfoQ

SaaS companies that are focused on the enterprise market need to evolve their infrastructure to meet the security, reliability, and other IT requirements of their customers. IT admins and large customers are two important sources of requirements to drive development.

Prashant Pandey, head of engineering at Asana, spoke about establishing a SaaS infrastructure that supports and grows with large customers at DevOpsCon London 2022.

The exact needs of IT admins vary depending on the industries the product serves, the capabilities of the product, and the type of data that the product stores or accesses. It’s important to understand the domain and conduct research on the experience of IT admins with your product, as Pandey explained:

Treat IT admins as primary customers, and think about features for IT admins in the same way as you think about other features, with an eye towards usability, flexibility, and efficiency at scale.

Adoption by enterprise customers leads to more requirements around security, reliability, and scalable management. Pandey suggested monitoring the requests coming from your current largest customers. Those are likely to become more prominent as you succeed in getting more customers of that size, he said.

InfoQ interviewed Prashant Pandey about reliability and security in enterprise SaaS solutions.

InfoQ: How do administrators use controls provided by SaaS products to ensure reliability and data security?

Prashant Pandey: An example of using controls is SCIM integration, where an admin can ensure that users’ access to a software product is automatically removed when the account is centrally deprovisioned, removing the risk of ex-employees retaining access to data. Admins also use these features to ensure availability of the right set of SaaS products to individuals and teams that need them. IT admins can use integration controls provided by a SaaS product to ensure that employees only use approved document sharing systems, reducing the risk of data exfiltration.
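For illustration, here is a minimal sketch, in Python, of the kind of SCIM 2.0 (RFC 7644) deprovisioning call Pandey describes. The patch structure follows the SCIM spec, but the base URL and token are hypothetical placeholders; real SaaS products document their own SCIM endpoints and authentication schemes.

    # Minimal sketch of deactivating a user through a SCIM 2.0 endpoint (RFC 7644).
    # The base URL and bearer token are hypothetical placeholders.
    import requests

    SCIM_BASE = "https://scim.example-saas.com/scim/v2"  # hypothetical tenant endpoint
    HEADERS = {
        "Authorization": "Bearer <provisioning-token>",
        "Content-Type": "application/scim+json",
    }

    def deprovision_user(user_id: str) -> None:
        """Set the SCIM 'active' attribute to false so the user loses access."""
        patch = {
            "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
            "Operations": [{"op": "replace", "value": {"active": False}}],
        }
        response = requests.patch(f"{SCIM_BASE}/Users/{user_id}", json=patch, headers=HEADERS)
        response.raise_for_status()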

IT admins can enforce security controls by requiring 2FA or single sign-on for all software with access to critical data. Features like data export and Security Information and Event Management (SIEM) integration can be used for forensic analysis, for example to ascertain whether a leaked credential was used to access a software product or update data. The ability to send custom messages and announcements in-product also lets admins share timely updates like scheduled maintenance announcements.

InfoQ: How can a SaaS provider build infrastructure that meets enterprise needs?

Pandey: Sequencing infrastructure work can allow an evolution towards meeting more enterprise needs. Backups should be an early part of your reliability strategy. Regular end-to-end testing of business continuity using backups requires more investment and becomes more important as your system complexity increases. The ability to measure availability and understand the reasons for downtime is worth investing in early when building a SaaS product. Those systems can be extended to provide reportable per-customer metrics when customers ask for that visibility.
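As a hedged illustration of the per-customer availability reporting Pandey mentions, the sketch below aggregates request outcomes in Python; the log fields are hypothetical, and a real system would derive this from monitoring pipelines rather than raw dicts.

    # Minimal sketch of per-customer availability computed from request logs.
    # Field names are hypothetical.
    from collections import defaultdict

    def availability_by_customer(requests):
        """requests: iterable of dicts like {"customer": "acme", "ok": True}."""
        total = defaultdict(int)
        succeeded = defaultdict(int)
        for r in requests:
            total[r["customer"]] += 1
            succeeded[r["customer"]] += int(r["ok"])
        return {c: succeeded[c] / total[c] for c in total}

    # Example: one failure out of four requests -> 0.75 availability for "acme".
    print(availability_by_customer(
        [{"customer": "acme", "ok": True}] * 3 + [{"customer": "acme", "ok": False}]
    ))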

Security certifications are an important way to decrease friction in the sales process, so any SaaS product team should also invest in understanding which certifications (like SOC 2, SOC 3, ISO 27001, FERPA, HIPAA, etc.) are valued by their potential customers, and what development and operational cost is involved in achieving and maintaining them. There should be a roadmap for pursuing the right certifications based on their return on investment. Risks related to data access grow as your team size grows, and as the amount of customer data your product processes and stores grows. To manage these risks, it’s prudent to follow the principle of least privilege, and to invest more in internal controls to keep up with the growth.

InfoQ: What are the benefits that data isolation can bring?

Pandey: An important technique for increasing scalability, performance, and security is to isolate customer data and services in “compartments” to reduce noisy-neighbor performance effects, the blast radius of availability events, and the number of customers impacted by certain types of security incidents.

At Asana, we started with isolation provided largely by the app layers. Then we separated customer data into database shards, followed by isolating enterprise customers to individual databases and search clusters. Now, we’re considering separate accounts hosted by our cloud provider for all infrastructure that touches a particular customer’s data. Building in isolation helps us meet key enterprise customer needs – data residency and enterprise key management.
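To make the isolation idea concrete, here is a minimal sketch of customer-to-shard routing; the routing table and shard names are hypothetical, and Asana’s actual implementation is not described in the talk.

    # Minimal sketch of routing customers to database shards, with dedicated
    # shards for isolated enterprise customers. All names are hypothetical.
    import zlib

    DEDICATED_SHARDS = {"enterprise-a": "db-ent-07", "enterprise-b": "db-ent-12"}
    SHARED_SHARDS = ["db-shared-01", "db-shared-02", "db-shared-03"]

    def shard_for(customer_id: str) -> str:
        # Isolated customers get their own shard; others hash to a shared pool.
        if customer_id in DEDICATED_SHARDS:
            return DEDICATED_SHARDS[customer_id]
        return SHARED_SHARDS[zlib.crc32(customer_id.encode()) % len(SHARED_SHARDS)]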



Visual Studio Code Server Now Available in Private Preview

MMS Founder
MMS Sergio De Simone

Article originally posted on InfoQ. Visit InfoQ

Microsoft has announced a private preview of the backend service that powers its Visual Studio Code editor, along with a dedicated CLI to manage it. Visual Studio Code Server can be installed anywhere and easily used through VS Code for the Web running in a browser.

Visual Studio Code Server is another move in Microsoft’s journey to enable remote development based on its popular editor. The journey started in 2019 with the introduction of the VS Code Remote Development extensions and led later to the introduction of GitHub Codespaces, which quickly became GitHub’s default development platform.

That is quite a lot of flexibility for a code editor, made possible by Visual Studio Code’s architecture:

We can do this because VS Code is, by design, a multi-process application. Conceptually, the front end (where you type your code) runs in one process and a backend service (which hosts extensions, the terminal, debugging, etc.) runs in a separate process.

While the Remote Development extensions already made it possible to code “remotely” using a local VS Code frontend, this requires dealing with SSH or HTTPS configuration, which admittedly is not entirely desirable. With Visual Studio Code Server, Microsoft aims to simplify the overall process of installing, managing, and connecting to your “remote” instance.

VS Code Server […] is a service built off the same underlying server used by the remote extensions, plus some additional functionality, like an interactive CLI and facilitating secure connections to vscode.dev.

The Visual Studio Code Server CLI, named code-server, differs from the standard code CLI that you normally use on your desktop machine. The new CLI can establish a secure tunnel between VS Code for the Web, also known as vscode.dev, and your remote machine, so you can use vscode.dev as a frontend to your own VS Code Server running on premises or in the cloud.

The CLI also supports running the VS Code Web UI on your own and then using the code-server serve-local command to connect it to your server instance. This will require you to properly set up an HTTPS connection from your Web UI to the server, though.

You install Visual Studio Code Server by running wget -O- https://aka.ms/install-vscode-server/setup.sh | sh on a Linux box, Mac, or Windows machine running WSL. When you start the server by running code-server, it will communicate with vscode.dev through a secure tunnel and provide a login token and an authentication URL. After you authenticate, the CLI spins up a server instance and generates a vscode.dev URL that you can use in any browser.

To get access to the VS Code Server preview, you will need to request access using this registration form.



Meta Open-Sources 200 Language Translation AI NLLB-200

MMS Founder
MMS Anthony Alford

Article originally posted on InfoQ. Visit InfoQ

Meta AI recently open-sourced NLLB-200, an AI model that can translate between any of over 200 languages. NLLB-200 is a 54.5B-parameter Mixture of Experts (MoE) model that was trained on a dataset containing more than 18 billion sentence pairs. On benchmark evaluations, NLLB-200 outperforms other state-of-the-art models by up to 44%.

The model was developed as part of Meta’s No Language Left Behind (NLLB) project. This project is focused on providing machine translation (MT) support for low-resource languages: those with fewer than one million publicly available translated sentences. To develop NLLB-200, the researchers collected several multilingual training datasets by hiring professional human translators as well as by mining data from the web. The team also created and open-sourced an expanded benchmark dataset, FLORES-200, that can evaluate MT models in over 40k translation directions. According to Meta,

Translation is one of the most exciting areas in AI because of its impact on people’s everyday lives. NLLB is about much more than just giving people better access to content on the web. It will make it easier for people to contribute and share information across languages. We have more work ahead, but we are energized by our recent progress….

Meta AI researchers have been working on the problems of neural machine translation (NMT) and low-resource languages for many years. In 2018, Meta released Language-Agnostic SEntence Representations (LASER), a library for converting text to an embedding space that preserves sentence meaning across 50 languages. 2019 saw the release of the first iteration of the FLORES evaluation dataset, which was expanded to 100 languages in 2021. In 2020, InfoQ covered the release of Meta’s M2M-100, the first single model that could translate between any pair from 100 languages.

As part of the latest release, the FLORES benchmark was updated to cover 200 languages. The researchers hired professional translators to translate the FLORES sentences into each new language, with an independent set of translators reviewing the work. Overall, the benchmark contains translations of 3k sentences sampled from the English version of Wikipedia.

For training the NLLB-200 model, Meta created several multilingual training datasets. NLLB-MD, a dataset to evaluate the generalization of the model, contains 3k sentences from four non-Wikipedia sources, also professionally translated into six low-resource languages. NLLB-Seed contains 6k sentences from Wikipedia professionally translated into 39 low-resource languages and is used for “bootstrapping” model training. Finally, the researchers built a data-mining pipeline to generate a multilingual training dataset containing over 1B sentence pairs in 148 languages.

The final NLLB-200 model is based on the Transformer encoder-decoder architecture; however, every 4th Transformer block has its feed-forward layer replaced with a Sparsely Gated Mixture of Experts layer. To compare the model with existing state-of-the-art performance, the team evaluated it on the older FLORES-101 benchmark. NLLB-200 outperformed other models by 7.3 sentence-piece BLEU points on average: a 44% improvement.
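For readers who want to try the released models, the sketch below shows one plausible way to run a distilled NLLB checkpoint through the Hugging Face Transformers API. The checkpoint name refers to the publicly distributed distilled variant rather than the full 54.5B MoE model, and exact usage may differ across library versions.

    # Hedged sketch: translating English to French with a distilled NLLB
    # checkpoint via Hugging Face Transformers (the full MoE model is served
    # through the fairseq code in the NLLB GitHub repository).
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = "facebook/nllb-200-distilled-600M"
    tokenizer = AutoTokenizer.from_pretrained(name, src_lang="eng_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    inputs = tokenizer("Translation is one of the most exciting areas in AI.",
                       return_tensors="pt")
    # Force the decoder to start with the FLORES-200 code for the target language.
    output_ids = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
        max_length=64,
    )
    print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])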

Several members of the NLLB team joined a Reddit “Ask Me Anything” session to answer questions about the work. When one user asked about challenges posed by low-resource languages, research scientist Philipp Koehn replied:

Our main push was towards languages that were not served by machine translation before. We tend to have less pre-existing translated texts or even any texts for them – which is a problem for our data-driven machine learning methods. Different scripts are a problem, especially for translating names. But there are also languages that express less information explicitly (such as tense or gender), so translating from those languages requires inference over a broader context.

The NLLB-200 models and training code, as well as the FLORES-200 benchmark, are available on GitHub.



Java News Roundup: Helidon 3.0, GraalVM 22.2, IntelliJ IDEA 2022.2, Vert.x Virtual Threads

MMS Founder
MMS Michael Redlich

Article originally posted on InfoQ. Visit InfoQ

This week’s Java roundup for July 25th, 2022, features news from OpenJDK, JDK 19, JDK 20, Spring project updates, Helidon 3.0, GraalVM 22.2, Quarkus 2.11.1 and 2.10.4, Micronaut 3.5.4, Eclipse Vert.x virtual threads incubator, Jakarta EE 10 updates, IntelliJ IDEA 2022.2, JUnit 5.9.0, Apache Software Foundation project updates and Multik 0.2.0.

OpenJDK

JEP Draft 8285724, Deprecate JMX M-Lets (Management Applets) for Removal, a feature JEP, proposes to deprecate the Java Management Extensions (JMX) “M-Let” feature for removal in a future release, as it has become obsolete and irrelevant for modern application development. The removal of M-Lets, originally inspired by applets, and the javax.management.loading API will have no effect on JMX and its related technologies.

JDK 19

Build 33 of the JDK 19 early-access builds was made available this past week, featuring updates from Build 32 that include fixes to various issues. More details may be found in the release notes.

JDK 20

Build 8 of the JDK 20 early-access builds was also made available this past week, featuring updates from Build 7 that include fixes to various issues. Release notes are not yet available.

For JDK 19 and JDK 20, developers are encouraged to report bugs via the Java Bug Database.

Spring Framework

Spring Shell 2.1.0 has been released featuring: a new interface, CommandRegistration, as a new way to programmatically define commands; an overhaul of the Spring Shell internals as initial support for the upcoming GA releases of Spring Framework 6.0 and Spring Boot 3.0; and a reevaluation of the @ShellMethod and @ShellOption annotations, such that new annotations may be necessary to better align with the CommandRegistration interface. Further details on this release may be found in the release notes.

Spring Cloud OpenFeign 3.0.8 has been made available as a bugfix and documentation release that backports fixes from the 3.1.x release train. The most notable backport is related to the cascading deserialisation in the Page interface from Spring Data.

The Spring Authorization Server team has announced that, following the project’s debut in April 2020, it is preparing for a version 1.0 release scheduled for November 2022. Version 1.0 will be based on Spring Security 6.0 and Spring Framework 6.0 and will require JDK 17, Tomcat 10, and Jetty 11 as minimal versions. The team will also release a version 0.4.0 to support the Spring Security 5.x release train and JDK 8. More details on what developers can expect between now and November 2022 may be found in the release schedule and list of features. InfoQ will follow up with a more detailed news story.

Helidon

Two years after the release of Helidon 2.0, Oracle has released Helidon 3.0, which ships with: JDK 17 as a minimal version; an implementation of MicroProfile 5.0 and selected Jakarta EE 9.1 specifications; support for JEP 290, Filter Incoming Serialization Data, such that deserialization is disabled by default; an updated Helidon SE routing API; a new project starter; and an updated CLI. Further details on this release may be found in the release notes. InfoQ will follow up with a more detailed news story.

GraalVM

Oracle Labs has released GraalVM 22.2 featuring: a smaller GraalVM JDK distribution that is more modular and no longer includes the JavaScript runtime, the LLVM runtime, or VisualVM; improvements in the use of Native Image with third party libraries, smaller memory footprint and heap dumps; faster startup and extended library support for GraalPython; and improved interoperability in GraalJS. GraalVM 22.2 ships with JDK 11 and JDK 17 builds. More details on this release may be found in this YouTube video. InfoQ will follow up with a more detailed news story.

Quarkus

Red Hat has released Quarkus 2.11.1.Final and 2.10.4.Final. Both versions address CVE-2022-2466, a vulnerability discovered in the SmallRye GraphQL server extension in which server requests were not properly terminated. As explained in the blog post:

Unfortunately, the previous fix introduced in 2.10.3.Final and the non-released 2.11.0.Final was incomplete and the issue was still present in another form.

Other new features include: a new Redis Client API; dependency upgrades to Vert.x 4.3.2 and Netty 4.1.78; a change making GraphQL endpoints Singleton-scoped by default; and a default to JDK 17-based builder images for native executable generation. Further details on these releases may be found in the release notes for version 2.11.1 and version 2.10.4.

Micronaut

Micronaut 3.5.4 has been made available by the Micronaut Foundation as a bug fix and patch release of several Micronaut modules that include: Micronaut Security 3.6.3, Micronaut AWS 3.5.3, Micronaut RxJava 2 1.2.2, Micronaut GCP 4.2.1, and Micronaut Reactor 2.2.3. More details on this release may be found in the release notes.

Eclipse Vert.x

With the upcoming release of JDK 19, which will support virtual threads, the Vert.x team has created a virtual threads incubator project for developers to experiment with virtual threads and provide feedback as necessary. The incubator project currently contains an implementation of async/await based on a proof of concept by August Nagro, software engineer at Axoni. The project is meant to provide a central place for community experiments with virtual threads and can therefore host other virtual-thread-based projects.

Jakarta EE 10

On the road to Jakarta EE 10, the Jakarta EE Specification Committee posted a ballot this past week to ratify the Jakarta EE 10 Platform profile. The ballot is scheduled to close on August 9, 2022, and separate ballots will be scheduled for the Web and Core profiles.

JetBrains

JetBrains has released IntelliJ IDEA 2022.2 with new features such as: a migration from JetBrains Runtime (JBR) 11 to JBR17; improvements in remote development; support for Spring Framework 6.0 and Spring Boot 3.0; an experimental GraalVM Native Debugger for Java; and clickable URLs in JSON, YAML, and .properties string values.

JetBrains has also released version 0.2 of Multik, a multidimensional array library for Kotlin. In this first release since version 0.1.1 in November 2021, new features include: a new multiplatform structure; support for Android and Apple Silicon processors; and improvements to operations such as random numbers, matrix norms, and complex numbers. Further details on this release may be found in the release notes.

JUnit

JUnit 5.9.0 has been released with new features such as: support for the Open Test Reporting format; a new keySet() method for the ConfigurationParameters interface that allows for the retrieval of all configuration parameter keys; and a new failIfNoTests attribute added to the @Suite annotation interface that will fail a suite if no tests are discovered. Further details on this release may be found in the release notes. InfoQ will follow up with a more detailed news story.

Apache Software Foundation

The Apache Software Foundation has provided point releases for Camel Quarkus, Tomcat and Groovy.

Maintaining alignment with Quarkus, Camel Quarkus 2.11.0, containing Camel 3.18.0 and Quarkus 2.11.1.Final, features: extensions to support Camel Hashicorp Vault and DataSet; increased JAXB extension test coverage; and a fix for bean introspection, which did not work on @Singleton scoped beans. More details on this release may be found in the list of issues.

Tomcat 10.0.23 features: a fix for CVE-2022-34305, a low severity XSS vulnerability in the Form authentication example; support for repeatable builds; and an update of the packaged version of the Tomcat Native Library to 1.2.35 that includes Windows binaries built with OpenSSL 1.1.1q. Further details on this release may be found in the changelog.

Apache Groovy versions 4.0.4, 3.0.12 and 2.5.18 feature bug fixes, improvements and dependency upgrades such as: SpotBugs 4.7.1, Log4j 2.18.0 and Ant 1.9.16. More details may be found in the release notes for versions 4.0.4, 3.0.12 and 2.5.18.



Podcast: InfoQ AI, ML and Data Engineering Trends Report 2022

MMS Founder
MMS Dr Einat Orr, Rags Srinivas, Roland Meertens, Anthony Alford, Daniel Dominguez

Article originally posted on InfoQ. Visit InfoQ

Transcript

Introductions [00:05]

Srini Penchikala: Hi, everyone. Welcome to the InfoQ podcast annual trends report on AI, ML and data engineering topics. I am Srini Penchikala. I am joined today by the InfoQ editorial team, and also an external panelist. There have been a lot of innovations and developments happening in the AI and ML space. I’m looking forward to discussing these innovations and trends with our expert panel group. Before we jump into the main part of this podcast, let’s start with the introductions of our panelists. First, Rags Srinivas. Rags, can you please introduce yourself?

Rags Srinivas: Glad to be here. I was here for the previous podcast last year as well. So, things have changed quite a bit, but I focus mainly on big data infrastructure and the confluence of that. So quite a few developments happening there that I’d love to talk about when we get there. Myself, I work for DataStax as a developer advocate, and essentially, again, it’s all about data, AI, infrastructure and how to manage your costs and how to do it efficiently. And hopefully, we’ll cover all that.

Srini Penchikala: Next up, Roland Meertens. Roland, please go ahead.

Roland Meertens: Yes. I’m Roland, I’m a machine learning engineer, and I hope to talk a lot about transformer models and large-scale foundational models.

Srini Penchikala: And Anthony Alford. Anthony, please introduce yourself.

Anthony Alford: Hi. I’m Anthony Alford, I’m the director of development at Genesys, a contact-center software company. For InfoQ, I like to write about some of the latest innovations in deep learning, and definitely want to talk about NLP and some of the multi-modal text and image models.

Srini Penchikala: Next is Daniel Dominguez. Daniel, please introduce yourself.

Daniel Dominguez: Hello. Thank you for the invitation. I’m Daniel. For InfoQ, I write about Meta AI. I like to write about the metaverse, new technologies, deep learning. In my work, I’m an AWS Community Builder in machine learning, and I also like to write about many of the things happening in AWS around machine learning.

Srini Penchikala: We also have a panelist from outside the InfoQ editorial team, and she was a speaker at the recent QCon London conference. Dr. Einat Orr. Einat, please introduce yourself.

Dr. Einat Orr: Thank you for having me. I’m a co-creator of the open source project lakeFS, and a co-founder and CEO at Treeverse. And I would love to talk about the trends that we see concerning data transiency, either in data-centric AI or with data engineering tools.

Srini Penchikala: It’s great to have you all join this discussion. For our readers, let me quickly go through the scope of this podcast and what to expect. The focus of this podcast is to report on innovative technologies and trends in the artificial intelligence, machine learning and data engineering areas that our readers may find interesting to learn about and apply in their own organizations when these trends become mainstream technologies. The InfoQ team also publishes trend reports on other topics like architecture, cloud and DevOps, and culture and methods, so please check them out on the InfoQ website.

Srini Penchikala: So there are two major components to these trend reports. The first part is this podcast, which is an opportunity for you to listen to a panel of expert practitioners on how these innovative technologies are disrupting the industry. The second part of the trend report is a written article that will be available on the InfoQ website. It’ll contain the trends graph that shows different phases of technology adoption and provides more details on individual technologies that have been added or updated since last year’s trends report. So I recommend you all check out the article as well when it’s published later this month.

There are a lot of excellent topics to discuss in this podcast. But in order to organize the discussion a little bit better, we have decided to break the overall trends report into two episodes. In today’s podcast, we will focus on the core technologies underlying AI and ML solutions. In a separate podcast in the future, we will discuss the tools and frameworks that enable the AI, ML initiatives in your organizations. So one of the main areas that has been going through a lot of innovations in the AI space is Natural Language Processing or NLP. Companies like Google, Amazon, and Meta recently announced several different AI language models and training data sets and how these models perform against different industry benchmarks.

Natural Language Processing (NLP) [04:13]

Srini Penchikala: Anthony, you’ve been exploring this area more and you’ve been writing about NLP topics. Can you talk about the recent developments in NLP and related areas like NLU and NLG, as well as some research happening in institutions like Stanford University and other organizations?

Anthony Alford: So one trend that has stayed steady is that the transformer is still the architecture of choice. It’s basically taken over the space, as opposed to the previous generation of models, which used recurrent neural networks such as the LSTM or the GRU. So the transformer as the architecture of choice seems to be holding steady. One thing that we do see is that the models continue to get bigger. Of course, GPT-3, when it came out, was the biggest. But it seems like every few months there’s a larger model. And one of the nice things is we’re seeing more of these models being open sourced. So, for example, there’s a research organization called EleutherAI. They’re releasing their version of GPT. They call it GPT-J, and I think GPT-Neo. Those are completely open source.

And similarly, Meta recently released their OPT model, which is open source, I think. What’s the parameter count? 175 billion, with a B, parameters. So it’s really nice for those of us who are not working for these big companies to be able to get our hands on these open source language models. And the models are very good. We saw recently Google has a model called PaLM. It can explain jokes. If you have to explain a joke, maybe it wasn’t that funny.

Rags Srinivas: It’s not a joke anymore.

Anthony Alford: Exactly. But the models are so good now that some people think the models are actually sentient. They’re aware, they’ve achieved intelligence. We probably saw stories about Google’s LaMDA model, where an engineer had a discussion with it and thinks, “Wow, this thing’s basically alive.” Obviously there’s still a little skepticism there, but who knows. The other thing that’s nice is we’re starting to see tools for debugging and controlling the output from these models. Because these models are so big and sort of black boxes, a lot of times it’s hard to figure out why the model gives you the output it does, or how we can keep it from outputting something that’s factually wrong or offensive. So there’s research now.

So for example, Stanford University, they’re working on making these language models controllable. And what’s interesting is typically these language models like GPT-3, they’re autoregressive. So their output gets fed back as the input, because the model just predicts the next word in the sentence. And so then you’re building the sentence word by word and feeding that back in. That’s autoregressive, while Stanford is looking at using a new type of generation technique called Diffusion. We’ll talk a little bit more about Diffusion models when we get to the multimodal image. But it looks like this has some promising new capabilities.
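For illustration, here is a minimal sketch of the autoregressive loop Anthony describes, using GPT-2 via Hugging Face Transformers; the model choice and greedy argmax decoding are illustrative simplifications.

    # Minimal sketch of autoregressive generation: each predicted token is
    # appended to the input before the next prediction.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tokenizer.encode("The transformer is", return_tensors="pt")
    for _ in range(10):
        with torch.no_grad():
            logits = model(ids).logits          # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax().reshape(1, 1)
        ids = torch.cat([ids, next_id], dim=1)  # feed the output back in
    print(tokenizer.decode(ids[0]))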

Srini Penchikala: Thanks, Anthony. Roland, I know you have also been looking into this area. So do you have anything to add about that?

Roland Meertens: What I can comment on is your mention of these models becoming sentient. So, yes, we indeed had this whole thing where someone said, “Oh, this model is sentient.” But it seems very similar to, I don’t know if you guys know Clever Hans, which was a horse which could do all kinds of arithmetic-

Anthony Alford: Exactly.

Roland Meertens: … but it turned out that the person interviewing the horse and giving the questions just, without knowing it, gave some cues. So I think this explains the sentient part. But what I am very excited about is the understanding of the world these models seem to exhibit. So in the last couple of weeks, I played a lot with DALL·E Mini. And if I give a prompt such as a pigeon wearing a suit or an otter wearing a face mask, it knows that the suit should probably go on the chest of the pigeon. There’s no explicit encoding of where a suit should go for a pigeon. There’s no place you learn this. That’s probably not in your training data, but you see that the AI manages to map concepts from one thing, being pigeons, and the other thing, being suits, and how to wear them, and combine them, or that the face mask should go on your face, and then also on the face of an otter.

I think that is at least something which I think is really exciting. And I’m actually thinking that there should maybe be some new kind of AI psychology field where you now take one of these big models and try to find out what did they learn, what can they do, how do they function inside. I think that’s very exciting.

Anthony Alford: That brings up a good point. With these models, what the people building them are chasing is some kind of metric, an accuracy metric or something like that. And it may be that it’s time for some new way to measure or to study these models instead of just in terms of accuracy or similar.

Roland Meertens: But maybe something to move onto is that I also just listened back to the podcast we recorded last year, and we were already talking about transformer models. For me, last year was really the year of the transformer, and of really using the so-called foundational models. So what I see people do in their work life, though I’m not seeing it enough yet and more people should do this, is that you have a task, one of those downstream tasks, and you take an existing model, such as GPT-3 or CLIP, and just see what you can do with it, see if you can use it, see how useful it is for your task. And I’m noticing that I’m less and less training models, and more using these existing models and finding a good model which fits my use case, and seeing how I can adapt it to my own downstream tasks. Do you see this as well?

Anthony Alford: Well, one thing that I do see in regards to that, plus the idea of metrics: researchers are starting to use BERT as a way to measure the goodness of their models. They use it when these models generate output text; instead of comparing that generated text with some reference text using n-grams or something like that, they take both the reference text and the generated text and run them through BERT to get the embeddings, and they look at the similarity of the embeddings. So foundation models like that, they’re becoming a utility, for sure.
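As a rough sketch of the embedding-based scoring Anthony mentions (a simplified cousin of metrics like BERTScore), assuming the sentence-transformers library and an illustrative model choice:

    # Hedged sketch: score generated text against a reference by comparing
    # BERT-family sentence embeddings with cosine similarity.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    reference = "The service was restored within five minutes."
    generated = "Service came back online after about five minutes."
    embeddings = model.encode([reference, generated], convert_to_tensor=True)
    print(float(util.cos_sim(embeddings[0], embeddings[1])))  # closer to 1.0 = more similar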

Roland Meertens: And one thing I really noticed: last year in the podcast, we mentioned that we wanted to get access to GitHub Copilot. So last year, this finally went online as a general public thing. I think you have to pay $100 a year for it. I know that I am absolutely going to do this. Did any one of you use this?

GitHub Copilot [11:07]

Srini Penchikala: No. Actually, Roland, I was going to ask you about that because last year, we talked about GPT-3 and Copilot and other topics. Yes, I was going to see if you have seen any major developments in the Copilot project.

Roland Meertens: Well, I have now had access to it for half a year. You should actually listen to the interview I did with Cassie Breviu on the InfoQ podcast. We discuss it a bit. But my productivity has gone up 100%. I don’t have to find anything on Stack Overflow anymore. Whenever I need to write any basic functions, I don’t have to think about them anymore. Copilot generates them for me. So the $100 a year price is absolutely worth it, in my opinion.

Srini Penchikala: $100 is probably nothing for the productivity we’re gaining out of it.

Roland Meertens: Yes, indeed. But I still see an increase in usage of these models, and people are discovering how to use the models. Maybe also a shout out to Hugging Face, which has a really nice collection of data sets and models and tooling to easily host them online. It’s something I have been using a lot over the last year, and I still hope that more people start using it. It’s so incredibly easy to take a simple task and shape it in such a form that CLIP or DALL·E or GPT-3 can work with it. You can try your task before actually setting up your data collection pipeline and your labeling pipeline.
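As one hedged example of the try-before-you-build workflow Roland describes, a Hugging Face zero-shot pipeline can prototype a classifier with no training data; the model and labels below are illustrative choices.

    # Minimal sketch: prototyping a classifier with a zero-shot pipeline before
    # building any data collection or labeling pipeline.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    result = classifier(
        "Congratulations!!! You have won a free cruise, click here to claim.",
        candidate_labels=["spam", "not spam"],
    )
    print(result["labels"][0], round(result["scores"][0], 3))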

AI/ML Training Datasets [12:37]

Srini Penchikala: Yes, definitely a lot of great information and innovation happening there. So how about the data sets? I know recently I saw a few different organizations have been releasing and open sourcing their data sets. Anthony, do you have any thoughts on that? That’s another great area where machine learning efforts become easier, because you don’t have to create your own data, synthetically or otherwise; you can just use these pre-existing data sets.

Anthony Alford: And in fact, you could very much make the case that the availability of large, high-quality data sets is one of the key ingredients of the deep learning revolution. If you think about ImageNet, we didn’t really get deep learning for vision until after ImageNet was available. So I think it’s definitely key for us to have these datasets. Now, no dataset is perfect. Now, they’re looking at ImageNet and saying, “Well, there’s a lot of mislabeled things,” or this and that. But yes, you’re right. We definitely do see this. Amazon is releasing multilingual data sets, essentially for training voice recognition. We see open source image-text data sets for training things like CLIP or DALL·E. And they’re huge. They’re billions, again, billions with a B, of image-text pairs. So I think this is a positive development. In terms of the trend, the trend is toward more, call it, democratization: open source models, open datasets to give people the power to do their own research or to take advantage of these models and datasets.

Rags Srinivas: And I think companies are figuring out how to monetize. So just giving away the data, but still monetizing it somewhere else. Right?

Anthony Alford: Only if you’re a cloud provider.

Rags Srinivas: Exactly. Yes.

Anthony Alford: Exactly.

Rags Srinivas: That’s what I meant.

Roland Meertens: But I think that the monetization part is actually very interesting, because I have the feeling that at the moment, with these models, all you need to start a new startup is creativity in how to use them. I think most of the downstream tasks can be solved relatively easily with models like this. And there is such a huge space for exploration in how to use them for smaller applications, from maybe sorting fruits, to classifying whether language is foul language or not, or maybe a spam filter. All these things you can now do with these foundational models, and at least get started before you are collecting your data, before you are annotating your data. I think you can get a huge speed up: setting up maybe a machine learning company in a weekend.

Srini Penchikala: Yes, definitely.

Dr. Einat Orr: I think it also correlates very well with another trend that we see, of tools that really focus on the data within ML, rather than focusing on the models themselves. As you said, when you get this dataset shared, you can see the mislabels. And we know that data engineers and machine learning specialists spend about 60% to 80% of their time on data preparation. And as you said, it really saves that time and allows them to focus on their models. But in other situations, where you need to obtain the dataset, that part of the process is time consuming and extremely important for the accuracy of your results. And this is why there is tooling coming up that really focuses on that part of the process, and on the approach of focusing on the data itself. This is what is called data-centric AI.

So it starts with tools that provide version control for data, and those focus on the data. We had Pachyderm in 2014 already, and Data Version Control, DVC, in 2016. But in the last year, we have seen an additional three or four companies, named Activeloop and Graviti, with an I, that are really focused on unstructured data, as you mentioned. And their main mission is to help you manage the data throughout the life cycle of modeling, rather than the model itself. And we all know: garbage in, garbage out. To prevent just that, there’s a lot of tooling that gives you an excellent visualization of the quality of your labeling, and optimization algorithms that allow you to prioritize the labeling in a way that more efficiently improves your models, because they cover the right parts within the data sets that you would like the model to improve on, and so on.

So I think it’s beautiful that the tooling is coming together with this democratization of the data sets themselves. So we would have the democratized data sets with a very high quality of preparation and labeling, in a way that will really allow us to get excellent results from those. And of course, commercial companies who don’t share their data sets would be able to enjoy those tools, in order to improve that very frustrating part of the machine learning life cycle, data preparation.
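For readers unfamiliar with the data version control tools Einat lists, here is a minimal sketch using DVC’s Python API; the repository URL, file path, and revision are hypothetical placeholders.

    # Hedged sketch: reading a specific version of a dataset tracked with DVC.
    # Repo URL, file path, and revision tag are hypothetical.
    import dvc.api

    with dvc.api.open(
        "data/train.csv",
        repo="https://github.com/example-org/example-project",  # hypothetical
        rev="v1.2",  # a Git tag or commit pinning the dataset version
    ) as f:
        print(f.readline())  # e.g., inspect the header row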

Roland Meertens: I think that what you mentioned is so true, that especially if you are just starting with a new problem, the quality of your data is so important. You’re always training, always telling a neural network whether it got an answer right or wrong, and this has to be perfect all the time, especially if you are just starting a new project. You just have to have perfect, high-quality data, and indeed have to have a good overview of all the data which is going into your system. I totally agree with that.

Dr. Einat Orr: Also in the later stages, once you have already deployed your model into production, following its accuracy and making sure that it’s still relevant includes adding additional data points that you have just collected from production, putting those back into your training sets in order to improve the model as fast as possible and adjust it to the changes that you see in the qualities of your data set as it goes along. So again, the tools that focus on data try to cover all parts of the data life cycle management of ML. It’s really fascinating and a very useful trend.

Roland Merteens: Do you have any tips on data selection? If you had to select data, are there any tools you would specifically use?

Dr. Einat Orr: Well, I'm afraid I'm a theoretician here. I read about the companies, but although I practiced for many years, I am not currently working as a data engineer, and I have not tried any of them myself. I'm just impressed by the approach, and I really believe in it. As we all said from our own experience, having high-quality data is critical, and making sure it stays high quality throughout the life cycle is just as critical.

Roland Merteens: And also realizing the quality of the data you're working with, because your model is never going to perform better than the quality you put in. Realizing exactly where the weaknesses in your data are will tell you where the weaknesses in your performance will be. So yes, data quality is massively important.

Dr. Einat Orr: So Galileo (https://www.rungalileo.io/), if you search on Google, has a beautiful offering around that as well.

Srini Penchikala: Yes, definitely, data is where it all starts. I want to share a probably silly example of Google Mail's AI/ML innovation. The other day, I was composing a new email message. I typed everything up, and when I went to find a subject for the email, Google Mail had already automatically parsed the content of the message I had just typed and come up with a recommended subject. And it was so accurate it was kind of scary, scary accurate. I thought, "Wow." It parsed through two paragraphs of content, found exactly what the focus of the email was, and suggested the subject. I thought that was very interesting, and also a little bit scary. Right?

Rags Srinivas: I think most of my email, I compose in two characters. Everything else is a tab.

Srini Penchikala: Yes, that’s true.

Rags Srinivas: But I don't want to minimize data; obviously, the entire life cycle is important. And Srini, I don't know if we're going to talk about MLOps in general, but we talked about it last year for sure. How does that factor into the bigger discussion? Because part of it is obviously making sure your data is accurate and up to date. But beyond that, you then move on to ModelOps, where you're making sure your model is correct, and so on.

Srini Penchikala: Yes, definitely. If anybody has any thoughts on the whole operationalizing side, bringing CI/CD and DevOps practices to machine learning, please go ahead and share those with us.

Rags Srinivas: I'll start off, and probably go with the things I mentioned on the last podcast. I think we've gotten to a point where we realize it's really about all phases of the life cycle, if you will. So not just the data, but also tuning your model, keeping it consistent, tweaking those parameters. Going back to my developer world, I want to be able to store everything somewhere; I want to be able to snapshot it somewhere. That's where GitHub and GitHub Copilot and tools like those come into the picture: they can help me not only snapshot the model, but also make it easier to tweak it and push it along the chain. It's nothing revolutionary that I'm saying here, but definitely, we are trying to map the DevOps model onto an MLOps model. And MLOps now is really more about DataOps, ModelOps, and whatever else you want to put in front of "Ops," right?
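Experiment-tracking tools are one common way to get that "snapshot everything" workflow. As a hedged sketch, here is what it might look like with MLflow, a popular open-source tracker named here as an example rather than anything endorsed on the podcast; the parameter values and file names are hypothetical.

```python
import mlflow

# Record one training run: its parameters, its metrics, and a
# snapshot of the resulting model file, so runs can be compared
# and reproduced later. Values below are placeholders.
with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 20)
    mlflow.log_metric("val_accuracy", 0.93)
    mlflow.log_artifact("model.pkl")  # the serialized model to snapshot
```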

Dr. Einat Orr: Yes. There are so many tools out there…

Rags Srinivas: Exactly.

Dr. Einat Orr: … with so many different missions that partially overlap. And I think what we have seen in the last year is that they are offering more and more of the life cycle, in the hope, I guess, of helping their users get everything they need. But it also means that their ability to cater to more complex use cases drops, because their offerings are so wide.

Rags Srinivas: Exactly.

Dr. Einat Orr: So this is the catch right now: is there an end-to-end solution here, or are we going to see the tools that focus deeply on one mission survive this, while the end-to-end ones find themselves only with the beginners? Of course, it's an open question; hard to know.

Rags Srinivas: Absolutely. Is there like an opinionated implementation?

Anthony Alford: If we want to put in a plug for some InfoQ content, Francesca Lazzeri gave a talk about MLOps at a recent QCon event, and we have that content on the site.

Robotics and Virtual Reality [23:03]

Srini Penchikala: The other area that's getting a lot of attention is the robotics space, with augmented reality, virtual reality, and mixed reality as part of it. Daniel, you mentioned that you have done some work on this. Would you like to lead the discussion on the robotics space, what's happening in that area, and what our readers should be aware of?

Daniel Dominguez: Okay. Basically, I think some of the most important things happening in the near future are going to be related to robotics and virtual reality, mainly through the metaverse. As you know, a year ago the metaverse was not something we were thinking about or seeing. But right now, with everything happening around these new trends and technologies, there is going to be a lot to catch up on in this area, mainly in artificial intelligence. For example, Meta, with its Meta AI lab, is doing amazing work on e-commerce, on deepfake detection, and on many other things, mainly in augmented and mixed reality. This year Apple is also going to be showing advances in its augmented reality glasses and everything happening along that path.

I think a lot is going to happen in the interaction of artificial intelligence and machine learning with the metaverse, and also with blockchain technology, which relates to artificial intelligence and to the metaverse through tokenization and the other things coming to this space. There is going to be a lot of noise and a lot of activity around this. So I think it's definitely something our readers should start looking at among the new technologies coming in the next year.

Anthony Alford: Daniel, I was wondering about your thoughts on this. In robotics research we see this concept of embodied AI, and a lot of researchers are essentially doing simulations, 3D-world challenges for example. What are your thoughts on that? Do you feel that's a good path toward building real-world robots?

Daniel Dominguez: Yes, definitely. For example, AWS now has tooling for robotics simulation. There are a lot of simulations that companies or people interested in robotics research can work with: they can simulate the environments and all the aspects they want to test, and this is based on virtual reality. So there is a lot of important work that can be done before the actual robot is built. Also in medicine, there are a lot of tools now. For example, I read some time ago about a first surgery performed through virtual reality, with a doctor in one place operating on a patient in another place; everything was done in virtual reality, and that virtual reality was controlling the real robot in the real environment. That was a pretty cool thing to see, how robotics and virtual reality interact across the virtual world and the real world.

Roland Merteens: Last week I was actually at a meetup where the CEO of the company couldn't be there, but he had his virtual reality glasses with him, and the company was building a remote-presence robot. So he was virtually present, wearing his virtual reality goggles on the train and controlling the robot at a distance. These things are just amazing to see: you can be at any place at any time, as long as there is something there for you to control. And especially during the pandemic, I noticed that hanging out with people in virtual reality is actually a great alternative when you cannot meet each other in person. Did anyone else in this group try this?

Rags Srinivas: I prefer meeting people in person, but I'm old-fashioned, let's put it that way. I think the tools help a lot, though. I also saw one demo that blew me away, a pretty famous, much-talked-about one, where the conductor was virtual: part of the orchestra was in different cities, and they were able to hold a concert, seamlessly. This was at the peak of COVID, I guess. I still think those are very powerful examples of where AI has made inroads like never before. Right?

Roland Merteens: Going back to your first comment, "I like to meet people in person": for me, these virtual reality meetings are a great alternative to having a Zoom call. We are recording this podcast, and I've seen Srini many times virtually, but I would love to see him in person. Going to him would take a lot of time and effort, though, so that's why I like talking to him like this, and virtual reality just adds an extra dimension to it. I can recommend playing mini-golf together with people; you get a bit of real bonding going on without everyone actually flying and driving to a mini-golf place.

Srini Penchikala: Yes. You can do a lot more than just screen sharing. In manufacturing and industrial use cases we hear the term digital twins; that's pretty much what it is, the virtual self of a particular entity, whether it's a car or a robot or anything else. A lot of innovation is happening there. Anyone else have any thoughts?

Roland Merteens: Maybe going back to that: this is, I think, the second time I'm saying in this podcast that we need more psychologists. There needs to be more research and more reasoning around embodiment. When do you feel present somewhere? When do other people appreciate you being present? A couple of weeks ago I was at a robotics conference with a friend, Kimberly, who joined the podcast last year. She couldn't be there, so she was in a robot body, while I was physically present in Philadelphia. And you did see a difference in how people react to people in person versus people in a robot body. This is all something we have to figure out: how do we best represent everybody? How do people react to you? How can you really feel present at a conference if you're not physically there?

Dr. Einat Orr: Well, I'm still not on social networks. So for me, it's just… I'm a generation behind from the…

Daniel Dominguez: That is funny.

Dr. Einat Orr: … virtual presence perspective.

Roland Merteens: And this is why it’s so important to come to the InfoQ conferences.

Daniel Dominguez: It's funny, because in social media, for example, one thing you have to do is have a clear username and an established online presence, so people know who you are and search engines recognize your personal brand. With the metaverse and virtual reality, the same thing is going to happen, but with your avatar. There are going to be a lot of avatars and a lot of platforms, so you have to start thinking about how you are going to be recognized on those platforms, the same way you have a username that is probably the same across all your social networks. Now your avatar, your physical appearance in the virtual world, is how you will be recognizable in those virtual worlds. Personal branding is going to move into the virtual world as well, so there is going to be a lot happening around that space too.

Dr. Einat Orr: There’s also the concern that, as we know from social networks, when people don’t really identify themselves, they allow themselves to behave way more radically. What happens when that has a physical aspect to it, even if it is a virtual physical aspect? So I think there’s a lot of moderating to be done here.

Srini Penchikala: Yes. And I think that’s where the psychology comes into the picture again. Roland has a great point. As we bring the virtual world as close as possible to the real world, we have to treat it with the same expectations, same behavior in both worlds. Right?

Dr. Einat Orr: Maybe we also need philosophers.

Roland Merteens: Yes. And we just have to discover what makes you feel real and what acceptable behavior is. This is something we see happening right now: virtual reality feels a lot more real when you have robot hands in virtual reality, for instance. It's the same when you are a robot in a physical place: do you need to have hands, and what are the acceptable behaviors? When cell phones were first introduced, we all had funny ringtones; now everybody acknowledges that you don't have a ringtone, your phone is silent. These things change over time, and we have to figure them out as a society. I think that is definitely an interesting emerging trend.

AI/ML Infrastructure, Cloud Computing and Kubernetes [31:39]

Srini Penchikala: Yes, they evolve over time. Okay. I want to make sure we also have time for the other part of this overall AI/ML discussion, which is infrastructure. Infrastructure is the foundation; it's where basically everything starts. Without good infrastructure, we would not have any successful AI/ML initiatives. So let's talk a little bit about that. Rags, I know you've been focusing on this area for a while, especially on how technologies like Kubernetes can help with developing and deploying software applications, including machine learning apps. Can you discuss how you see AI/ML infrastructure shaping up since last year, and what's new in this area?

Rags Srinivas: I used to joke that Kubernetes is the silver bullet for all your problems, which is not true. We know that.

Anthony Alford: Now you have two problems.

Rags Srinivas: Now you have two problems. Right, exactly. But I think the point is that compute now has multiple dimensions. The first thing to recognize is that you really don't have the power backing your on-prem systems anymore, so most of the computing is happening in the cloud. I don't have statistics for this, but I have a feeling that quite a bit of the data might still remain on-prem, and that is another thing you need to consider when it comes to infrastructure. There are bigger and bigger pipes being built, even between cloud providers.

Multi-cloud has become a big thing in the Kubernetes world. There is a lot of talk about multi-cloud, yet not many are actually doing it. And if you think about AI, there is definitely a case to be made for cloud-agnostic computing. You want to be cloud-agnostic, and you want to be able to, for example, use your GPUs on Azure, your message passing on GKE, and whatever your favorite CPU is on EKS. I need to be able to use those different combinations. I don't think that has really been unlocked yet, primarily because even though Kubernetes is solving a lot of problems for the world, especially on the infrastructure side, it's still freaking complicated. It is very complicated from a user perspective, and if you expect users to set it up and tune it themselves, it becomes really hard.

So unfortunately, I don't think there has been any major development that has made this a lot easier from an infrastructure perspective. I think Amazon recently announced a packaged offering for HPC, very similar to what Azure had before. The idea is: can we bundle a few of these things up into the kind of opinionated implementation I was talking about earlier? And AI stretches most of the compute to its limits. Being able to autoscale to thousands and thousands of nodes really stretches the cloud, and Kubernetes as well, quite a bit. But I'm a big fan of multi-cloud and cloud-agnostic computing, and I think we are moving there, actually quicker than I thought we would, because none of the clouds really want to do it unless they are forced to, kicking and screaming. That's where we are right now.
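To illustrate the cloud-agnostic idea at its simplest: the Kubernetes API itself is identical across providers, so a single script can inspect clusters on several clouds just by switching kubeconfig contexts. A minimal sketch with the official Python client follows; the context names below are hypothetical.

```python
from kubernetes import client, config

# Hypothetical kubeconfig contexts, one per cloud provider.
CONTEXTS = ["gke-training", "eks-inference", "aks-gpu"]

for ctx in CONTEXTS:
    # Build an API client bound to this cluster's context.
    api = client.CoreV1Api(
        api_client=config.new_client_from_config(context=ctx)
    )
    nodes = api.list_node().items
    print(f"{ctx}: {len(nodes)} nodes")
```

The hard part Rags describes sits above this layer: scheduling work to the right cluster, moving data between clouds, and managing cost, none of which the uniform API solves by itself.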

Closing Remarks [34:59]

Srini Penchikala: Thanks, Rags. I think that's the last topic we wanted to discuss in this podcast, so let's go ahead and wrap it up. If we can go around the room and briefly share closing remarks, then we can end the podcast. Einat, we'll start with you.

Dr. Einat Orr: Thank you very much. I think it was a fascinating discussion. My take is of course around the importance of the data: its quality and its manageability, from the inputs through the intermediate artifacts. The model itself is also data, and so are the deployment, tracking, and production artifacts. We are dealing with all kinds of data artifacts that require our attention, and the better their quality, the better the quality of our work. Focusing on tooling and best practices around that would improve the work not only of ML engineers, but of any data practitioner using data for any other purpose as well.

Srini Penchikala: How about you, Rags?

Rags Srinivas: It was a great discussion; I learned a lot being part of the panel. Essentially, I'm hoping that infrastructure will provide for more cloud-agnostic multi-cloud, and make it easier from a cost perspective to solve all these different dimensions. It's not an easy problem. I'm sure it's a multi-year effort, but I don't think it's technologically very hard; it's just not seen, politically, as a big thing to do right now. Hopefully something is going to change, something is going to trigger it, and I think AI might be the trigger that makes it happen.

Srini Penchikala: How about you, Daniel?

Daniel Dominguez: I think it was a very interesting conversation. A lot of things are going to change in terms of community trends and technologies, and I think it's very cool to see everything that's going to happen in this space over the next few years.

Srini Penchikala: Roland?

Roland Merteens: Maybe one plug: if people want to learn more about robotics, subscribe to the Weekly Robotics newsletter by Mat Sadowski. I think he started it a few years ago, and it's really good; it gives you all the latest and greatest in robotics. So yes, I can recommend his newsletter for robotics-minded people.

Srini Penchikala: And Anthony?

Anthony Alford: I can just say what an exciting time it is to be involved with this. There are so many developments going on; it feels like the boundaries are being pushed constantly. I've lived through a couple of AI winters myself by now, and I sometimes wonder: is a little bit of the shine wearing off? But as long as we keep seeing new developments, like these image-text models and the democratization around them, I think we'll continue to see progress, and that's very exciting.

Srini Penchikala: Thank you all; I agree with all of you. We used to hear the phrase "software is eating the world." Now I want to say AI and ML are eating the world. Thank you all for your participation. That's a wrap for this podcast. Thank you.




Slate 0.82 and Plate 15 Releases Improve Rich-Text Editing Experience

MMS Founder
MMS Dylan Schiemann

Article originally posted on InfoQ. Visit InfoQ

Slate.js, the completely customizable framework for building rich-text editors, and Plate, a large collection of components and plugins for Slate, have both shipped new releases. Both are authored in TypeScript.

Slate 0.82.0 makes significant updates to its support for editing on Android devices, while slate-react 0.82.0 additionally introduces useSlateSelection, a React hook that triggers whenever the selection changes.

Slate provides a vanilla JavaScript version of its editor alongside slate-react, which is focused on React developers. Third-party teams have also created versions of Slate optimized for Angular and Vue.js users.

Plate 15 improves its link toolbar and the paste handling of hyperlinks into an editor page, and provides greater control over rendering within plugins.

Real-time collaborative editing is possible with Slate and Plate, most commonly via slate-yjs. Other popular utilities include remark-slate and remark-slate-transformer. Developers looking to create a math and science editor can leverage the Slate-based CoCalc.

Slate and Plate release regularly, typically the same day as significant updates land in their main branches. Work is actively underway to fully support React 18.

Developers can explore available features in the Slate examples and Plate playground.

Slate and Plate are the editors of choice for Living Spec, while Slate is also used by several editor products including Slite, Coda, KiteMaker, and many more.

Both Slate and Plate are open-source software available under the MIT license. Contributions and feedback are encouraged via the Slate GitHub project and Plate GitHub project, and should follow the Slate contribution guidelines and Plate contribution guidelines respectively. Slate and Plate share an active discussion community via the Slate Slack group.

Disclosure: The author of this news entry is a member of the core team for Slate and Plate and works on the Living Spec product mentioned in this news entry.
