
PyTorch Lightning in '24: What's new, benefits & key features

Today, almost everyone is aware of the power of data and how useful it can be in solving various problems. However, not everyone can use data efficiently to derive beneficial insights. Big data statistics show that companies fail to analyze 60% to 70% of all the data they hold, for several reasons:

  • Lack of awareness: Not realizing that the data on hand could be analyzed more efficiently
  • Lack of computing resources
  • Lack of know-how: Not having enough people on the team that can process and analyze data

Emerging data science technologies such as PyTorch Lightning can make a data scientist’s life easier, helping them focus on research instead of struggling with computational problems.

What is PyTorch?

PyTorch is an open-source machine learning library based on the Torch library. It is mostly used for machine learning tasks such as computer vision and natural language processing, and it was initially developed by Facebook’s AI Research (FAIR) team. The most common interface to the library is Python, but it is also available in C++.

Plenty of prominent deep learning software has been built on top of PyTorch, including Uber’s Pyro, Tesla’s Autopilot, Hugging Face’s Transformers and PyTorch Lightning. The high-level features that PyTorch provides include:

  • Strong GPU acceleration for tensor computing (similar to NumPy)
  • Deep neural networks built on an automatic differentiation system (both are illustrated in the short sketch below)
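
As a rough sketch of these two features in plain PyTorch (the tensor sizes are arbitrary; the GPU branch is skipped automatically on CPU-only machines):

import torch

# Tensor computing (NumPy-like), accelerated on a GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(3, 3, device=device)

# Automatic differentiation: autograd tracks every operation on w
w = torch.randn(3, 3, device=device, requires_grad=True)
loss = (x @ w).sum()
loss.backward()      # populates w.grad via the autograd system
print(w.grad.shape)  # torch.Size([3, 3])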

What is PyTorch Lightning?

PyTorch Lightning is not a newer version of PyTorch but an open-source framework built on top of it, adding features that allow users to build and deploy complex models with far less engineering code.

As the complexity and scale of deep learning models evolved, some software and hardware started to become inadequate. PyTorch Lightning was developed to keep up with these emerging requirements and give users a better experience while building deep learning models. PyTorch was built in an era when AI research was mostly about network architectures, and plenty of complex models for research and production were built with it. However, as models started to interact with each other, as in Generative Adversarial Networks (GANs) or Bidirectional Encoder Representations from Transformers (BERT), adopting new tooling became inevitable.

What are the benefits of using PyTorch Lightning?

Interacting models

PyTorch Lightning encapsulates deep learning systems: collections of models that interact with each other. This means Lightning is built for the more complicated research and production cases of today’s world, where many models interact using complex rules. For example, in a GAN, two models interact to yield more realistic results, and PyTorch Lightning makes this interaction simpler to implement than it used to be. Websites that generate a new artificial human face every time you refresh the page are a real-life application of GAN models.

Overcome hardware limitations

PyTorch Lightning aims to let users focus on science and research instead of worrying about how they will deploy the complex models they are building. Models are sometimes simplified just so that they can run on the computers a company has available. By leveraging cloud technologies, however, PyTorch Lightning allows users to debug on a laptop CPU a model that normally requires 512 GPUs, without changing any part of the code. Grid.ai provides a cloud-based service built around this workflow, and you can join its waitlist.

Community-driven framework

PyTorch claims that Lightning has a growing community of 300+ talented deep learning contributors around the world, including researchers and academics who understand the needs that emerging technologies bring. This community provides pertinent solutions that make PyTorch Lightning a more convenient library for many machine learning tasks.

What are the key features of PyTorch Lightning?

  • Scaling ML/DL models to run on any hardware (CPU, GPUs, TPUs) without changing the model (see the sketch after this list)
  • Making code more readable by decoupling the research code from the engineering
  • Making models easier to reproduce
  • Automating most of the training loop
  • Removing boilerplate (sections of code that have to be included in many places with little or no alteration)
  • Out-of-the-box integration with popular logging/visualization frameworks such as TensorBoard, MLflow, Neptune.ai, Comet.ml and Weights & Biases (Wandb)
  • Testing of every combination of supported PyTorch and Python versions, operating systems, and multi-GPU and TPU setups
  • Minimal running speed overhead (about 300 ms per epoch compared with plain PyTorch)
  • Computing metrics such as accuracy, precision and recall across multiple GPUs
  • Automating the optimization process of training models
  • Logging
  • Checkpointing
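
As a minimal sketch of the hardware-scaling point above (model and train_loader are hypothetical stand-ins for a LightningModule and a DataLoader; the accelerator/devices flags assume Lightning 1.6 or newer):

from pytorch_lightning import Trainer

# The model code stays identical; only the Trainer flags change.
trainer = Trainer(accelerator="cpu")               # debug locally
# trainer = Trainer(accelerator="gpu", devices=4)  # scale to 4 GPUs
# trainer = Trainer(accelerator="tpu", devices=8)  # or a TPU slice
# trainer.fit(model, train_loader)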

What’s new in PyTorch Lightning?

Here, we take a deeper dive into some of the new features.

Research & Production

Lightning’s main goal is to allow professional researchers to try the hardest ideas on the largest compute resources without losing any flexibility. With PyTorch Lightning, data scientists and researchers can also be the people who put models into production, since large teams of machine learning engineers are no longer needed. This helps businesses cut production times without losing the flexibility needed for research.

Metrics

A metrics API was also created for easy metric development and usage in PyTorch Lightning. The API provides a built-in way to compute metrics across multiple GPUs while storing the statistics needed to compute a metric at the end of an epoch, without users having to worry about any of the complexities of the distributed backend. Common metrics such as accuracy, precision and recall are provided out of the box.
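
As a minimal sketch of this idea, using torchmetrics, the standalone package that Lightning’s built-in metrics grew into (the task argument assumes a recent torchmetrics version):

import torch
import torchmetrics

# A metric object accumulates statistics across batches (and, when
# training is distributed, across GPUs) until compute() is called.
accuracy = torchmetrics.Accuracy(task="multiclass", num_classes=10)

for _ in range(3):  # stand-in for the batches of one epoch
    preds = torch.randn(8, 10).softmax(dim=-1)
    target = torch.randint(0, 10, (8,))
    accuracy.update(preds, target)

print(accuracy.compute())  # the epoch-level accuracy
accuracy.reset()           # start fresh for the next epoch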

Manual vs automatic optimization

Users no longer need to worry about enabling/disabling gradients, doing backward passes, or updating optimizers, as long as they return a loss with an attached graph from training_step:

def training_step(self, batch, batch_idx):
    loss = self.encoder(batch[0])
    return loss

The optimization is automated by Lightning. However, some research, such as GANs or reinforcement learning, involves multiple optimizers or an inner loop and may require turning automatic optimization off. In that case, users can take full control of the training loop themselves by passing automatic_optimization=False when defining the Trainer (in recent Lightning releases this flag has moved onto the LightningModule as the attribute self.automatic_optimization):

trainer = Trainer(automatic_optimization=False)
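
A minimal sketch of a manual optimization loop under the newer attribute-based API (the generator/discriminator submodules and the two loss helpers are hypothetical placeholders, not Lightning APIs):

import pytorch_lightning as pl

class GAN(pl.LightningModule):
    def __init__(self, generator, discriminator):
        super().__init__()
        self.automatic_optimization = False  # take over the training loop
        self.generator = generator
        self.discriminator = discriminator

    def training_step(self, batch, batch_idx):
        # Both optimizers, as returned by configure_optimizers (omitted here)
        g_opt, d_opt = self.optimizers()

        d_loss = self.compute_d_loss(batch)  # hypothetical helper
        d_opt.zero_grad()
        self.manual_backward(d_loss)         # replaces loss.backward()
        d_opt.step()

        g_loss = self.compute_g_loss(batch)  # hypothetical helper
        g_opt.zero_grad()
        self.manual_backward(g_loss)
        g_opt.step()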

Logging

By calling the log() method anywhere on a LightningModule, users can send the logged quantity to the logger of their choice. Depending on where log() is called from, Lightning auto-determines when the logging should take place (on every step or every epoch), but users can override the default behavior with the on_step and on_epoch parameters:

def training_step(self, batch, batch_idx):
    loss = self.encoder(batch[0])
    self.log("my_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
    return loss

Setting on_epoch=True accumulates logged values over the full training epoch.

Checkpointing

PyTorch Lightning automatically saves a checkpoint for the user in the current working directory, with the state of the last training epoch. This ensures that the user can resume training in case it is interrupted.

Users can customize the checkpointing behavior to monitor any quantity from the training or validation steps. For example, to checkpoint based on validation loss, the user can follow these steps:

  1. Calculate the desired metric or other quantity to be monitored (e.g. validation loss).
  2. Log the quantity using the log() method, with a key such as val_loss.
  3. Initialize the ModelCheckpoint callback, and set monitor to the key of that quantity.
  4. Pass the callback to the Trainer's callbacks flag (older releases used a checkpoint_callback flag instead).
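
Those four steps in a minimal sketch (the validation loss computation is a hypothetical placeholder):

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Steps 1-2: inside your LightningModule, compute and log the quantity.
def validation_step(self, batch, batch_idx):
    val_loss = self.encoder(batch[0])  # placeholder computation
    self.log("val_loss", val_loss)

# Step 3: monitor the logged key; keep the lowest-loss checkpoints.
checkpoint_callback = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=3)

# Step 4: hand the callback to the Trainer.
trainer = Trainer(callbacks=[checkpoint_callback])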

How to turn your PyTorch code into PyTorch Lightning?

As the library has new features, some modifications to existing code will be necessary if you want to move a project built with PyTorch to PyTorch Lightning. Broadly, the model and its training logic move into a LightningModule, while the hand-written training loop is replaced by the Trainer, as sketched below.
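
A minimal sketch of that conversion (the network, data shapes and hyperparameters are illustrative assumptions, not from the original article):

import pytorch_lightning as pl
import torch
from torch import nn
from torch.nn import functional as F

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # The plain PyTorch model moves in unchanged.
        self.encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))

    def training_step(self, batch, batch_idx):
        # The body of the old training loop becomes training_step.
        x, y = batch
        loss = F.cross_entropy(self.encoder(x.view(x.size(0), -1)), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# The hand-written epoch/backward/step loop is replaced by the Trainer:
# trainer = pl.Trainer(max_epochs=5)
# trainer.fit(LitClassifier(), train_loader)  # train_loader: your DataLoader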

Which parts of ML/DL research can be automated with PyTorch Lightning?

As the key features above show, Lightning automates most of the engineering around a model: the training loop, optimization, logging, checkpointing and hardware scaling, leaving the model definition and the data as the researcher's focus.


This article was originally written by former AIMultiple industry analyst Izgi Arda Ozsubasi and reviewed by Cem Dilmegani
