The rapid development of autonomous driving technology has established it as a fundamental
component of contemporary transportation networks. Using the Carla simulator, this project
aims to build and implement a simple autonomous driving system in a simulated setting. The
system performs basic functions such as lane following, obstacle avoidance, and traffic sign
recognition by integrating several deep learning models and control mechanisms. A UNet model
was employed for semantic segmentation to classify the road, sidewalk, vehicles, and pedestrians,
achieving an Intersection over Union (IoU) of 0.96 with a loss of 0.016. A VoxelCNN model
was developed to process LiDAR data for detecting vehicles, pedestrians, and drivable areas,
reaching an IoU of 0.92 and a loss of 0.096.
Additionally, a Faster R-CNN was utilized for real-time traffic signal and sign detection.
For navigation, Carla's built-in waypoint generator, combined with a PID controller for lateral
and longitudinal control, ensured smooth and efficient movement from one location to another.
In addition, a logic controller can be added for decision making to ensure smooth vehicle motion.
The results demonstrate the system’s ability to autonomously navigate a simulated environment
while accurately interpreting and responding to its surroundings. This project underscores
the potential of combining deep learning, control systems, and simulation tools in advancing
autonomous driving technologies.
Autonomous vehicles are intelligent robotic vehicles that can navigate through traffic
without the need for human intervention [1], handling all driving situations and adhering to
traffic laws. In order to comprehend the outside world, make decisions, and act in ways that
are comparable to or superior to those of human drivers, they integrate a variety of sensors and
software elements to produce a rich representation of a scene. Thus, among other advantages,
they are anticipated to change urban traffic, enhancing accessibility, safety, and mobility while
lowering pollution emissions. Every step of an autonomous system's workflow is carried out by
different software components and algorithms that draw on ideas from machine learning, computer
vision, decision theory, probability theory, control theory, and other research domains [12].
This diversity of components makes the development and assessment of such systems more difficult.
In this project, we aim to design and implement a basic autonomous driving system that
can navigate a simulated environment in the Carla Simulator. The system should be capable of
lane following, obstacle avoidance, and traffic sign recognition. The main aim of the project
is to simulate a vehicle that can safely navigate its surroundings based on real-time sensor
data. We adopted several strategies to achieve these results. First, we applied semantic
segmentation to classify the road, sidewalk, vehicles, and pedestrians; from the segmented road
we extract the lane markings and estimate the curvature of the road in order to adjust
the steering angle (see the sketch after this paragraph). Second, we attempted to create a
bird's-eye view from four cameras attached to the four sides of the vehicle using a CNN model.
Third, a model detects the vehicles, road, sidewalk, and pedestrians using the LiDAR mounted on
top of the ego vehicle; this CNN takes a 3D tensor of a defined voxel size (volumetric data) and
produces a 2D output containing the categorical representation of each class. Traffic signal and
sign detection are also incorporated to control the vehicle accordingly. The project is not
limited to machine learning algorithms; it also requires image processing, control system design,
and basic decision theory to achieve these results.
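As a concrete illustration of the lane-following step, the sketch below fits a second-order polynomial to lane-marking pixels taken from the segmentation mask and derives a curvature-based steering correction. This is a minimal sketch assuming NumPy; the function names and the gain parameter are hypothetical and not taken from the project code.

```python
import numpy as np

def fit_lane_polynomial(lane_pixels):
    """Fit x = a*y^2 + b*y + c to lane-marking pixel coordinates.

    lane_pixels: (N, 2) array of (y, x) image coordinates taken from
    the lane-marking class of the segmentation mask (hypothetical input).
    """
    y, x = lane_pixels[:, 0], lane_pixels[:, 1]
    return np.polyfit(y, x, 2)  # coefficients [a, b, c]

def curvature_radius(coeffs, y_eval):
    """Radius of curvature of the fitted lane at image row y_eval (pixels)."""
    a, b, _ = coeffs
    return (1 + (2 * a * y_eval + b) ** 2) ** 1.5 / abs(2 * a)

def steering_from_curvature(coeffs, y_eval, gain=1.0):
    """Map curvature to a steering correction; sign follows the bend direction."""
    a, _, _ = coeffs
    return gain * np.sign(a) / curvature_radius(coeffs, y_eval)

# Example: synthetic lane pixels along a gentle right-hand curve
ys = np.linspace(0, 480, 50)
xs = 1e-4 * ys**2 + 0.1 * ys + 300
coeffs = fit_lane_polynomial(np.stack([ys, xs], axis=1))
print(steering_from_curvature(coeffs, y_eval=480))
```

In practice such a correction would feed the lateral controller; sharper bends (smaller radius) produce larger steering adjustments.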
The development of autonomous driving technologies is extremely important in the modern
world since it has the potential to revolutionize transportation and solve important social issues.
The majority of traffic accidents are caused by human error, making road safety a major global
concern. With their sophisticated sensors and decision-making algorithms, autonomous cars can
drastically lower these kinds of accidents by guaranteeing constant compliance with traffic laws
and making prompt, data-driven decisions in real time. By streamlining routes, easing traffic and facilitating more seamless navigation through adaptive driving strategies, these systems can
also improve traffic efficiency. In addition to being safer and more efficient, autonomous cars
give people who are unable to drive, such as the elderly or people with disabilities, more freedom
and mobility. When combined with electric vehicle technology, autonomous systems optimize driving
behaviour to reduce emissions and fuel consumption, which helps create a more sustainable
future.
Additionally, this issue is relevant in both the technological and economic spheres. Creating
autonomous systems pushes the limits of innovation in control systems, computer vision,
and machine learning, advancing not only transportation but also robotics and smart cities.
Safe development is ensured via simulation environments, like the Carla Simulator employed in this
research, which allow for economical, ethical testing without endangering human life. Economically
speaking, autonomous cars lower operating costs in sectors like logistics while opening up
new markets for sensor and software development. Lastly, by reducing the demand for parking
spots and facilitating shared mobility options, autonomous systems have the ability to transform
urban landscapes and create more sustainable and effective city plans. The development
of autonomous vehicle technology is a crucial and urgent undertaking because of its diverse
effects.
We used the U-Net architecture [3], a Convolutional Neural Network (CNN) built especially
for pixel-wise segmentation, to accomplish the semantic segmentation tasks in this study. U-Net
is very effective for tasks requiring exact boundary predictions, including identifying roads,
vehicles, and pedestrians in an autonomous driving system, because its encoder-decoder
structure and skip connections allow it to capture both local and global contextual information.
The U-Net model is composed of four main parts, sketched in code after the list:
Encoder
• Uses a sequence of convolutional layers and ReLU activations to extract high-level spatial
information.
• Max-pooling layers are used to accomplish downsampling, which gradually lowers the
spatial resolution while boosting feature richness.
• Consists of four contraction blocks, with the number of feature channels doubling in each.
Decoder
• Uses transposed convolutions, also known as upconvolutions, to reconstruct the feature
maps’ spatial resolution.
• To preserve fine-grained information, encoder features are concatenated with upsampled
features using skip connections.
• Consists of four expansion blocks that gradually cut the feature channels in half.
Bottleneck
• Captures the most abstract feature representations by serving as a link between the encoder and the decoder.
Output layer
• Pixel-wise classification is provided by a final convolution layer that reduces the number
of channels to the required number of output classes.
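The following sketch illustrates the encoder-decoder structure just described, assuming PyTorch: four contraction blocks that double the channels, a bottleneck, four expansion blocks that halve the channels with skip connections, and a final 1x1 convolution down to the class logits. The base channel width and the input size in the example are illustrative assumptions, not the project's exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, as used in each U-Net stage."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=6, base=64):
        super().__init__()
        # Encoder: four contraction blocks, doubling the channels each time
        self.enc = nn.ModuleList([
            conv_block(in_ch, base),
            conv_block(base, base * 2),
            conv_block(base * 2, base * 4),
            conv_block(base * 4, base * 8),
        ])
        self.pool = nn.MaxPool2d(2)
        # Bottleneck: link between encoder and decoder
        self.bottleneck = conv_block(base * 8, base * 16)
        # Decoder: four expansion blocks, halving the channels each time
        self.up = nn.ModuleList([
            nn.ConvTranspose2d(base * 16, base * 8, 2, stride=2),
            nn.ConvTranspose2d(base * 8, base * 4, 2, stride=2),
            nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2),
            nn.ConvTranspose2d(base * 2, base, 2, stride=2),
        ])
        self.dec = nn.ModuleList([
            conv_block(base * 16, base * 8),
            conv_block(base * 8, base * 4),
            conv_block(base * 4, base * 2),
            conv_block(base * 2, base),
        ])
        # Output layer: 1x1 convolution down to the number of classes
        self.head = nn.Conv2d(base, num_classes, 1)

    def forward(self, x):
        skips = []
        for enc in self.enc:
            x = enc(x)
            skips.append(x)       # saved for the skip connections
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([x, skip], dim=1))  # concatenate encoder features
        return self.head(x)       # per-pixel class logits

# Example: a 256x256 RGB image produces a 6-channel logit map
logits = UNet()(torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 6, 256, 256])
```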
Volumetric voxelized LiDAR data is processed by the second Convolutional Neural Network
(CNN) used in this research, VoxelCNN, which classifies each voxel into the following categories:
road, sidewalk, cars, pedestrians, and background. This model uses a CNN-based architecture
that works on the voxelized 3D space to effectively manage the 3D structure of LiDAR point
clouds. After processing the LiDAR data, the VoxelCNN produces a tensor of shape
[batch size, channels, depth, height, width]. A code sketch of this architecture follows the component list below.
Input Layer
• Takes 20 channels of voxel data with the shape [batch size, depth, height, width].
• Each channel could encode features such as occupancy, intensity, or reflectivity.
Intermediate Convolutional Blocks
• While maintaining spatial dimensions, several convolutional layers gradually expand the
feature representation from 32 to 128 channels.
Final Output
• A convolution reduces the feature maps to the number of output classes (5 in this case:
road, sidewalk, vehicles, pedestrians, and background).
SoftMax Activation
• Assigns class probabilities for each voxel, ensuring the model outputs valid classifications.
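A minimal sketch of this architecture, assuming PyTorch. The description above can be read as a 2D network over a bird's-eye-view grid whose 20 channels encode the voxel column at each cell (matching the 2D categorical output mentioned earlier); that reading is assumed here, and the layer count and kernel sizes are also assumptions.

```python
import torch
import torch.nn as nn

class VoxelCNN(nn.Module):
    """Per-cell classification of a voxelized LiDAR grid.

    Input:  [batch, 20, H, W] -- 20 voxel channels (channel semantics,
            e.g. occupancy per height slice, are an assumption).
    Output: [batch, 5, H, W]  -- logits for road, sidewalk, vehicles,
            pedestrians, and background.
    """
    def __init__(self, in_ch=20, num_classes=5):
        super().__init__()
        # Intermediate blocks grow the features from 32 to 128 channels;
        # padding keeps the spatial dimensions unchanged throughout.
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Final 1x1 convolution reduces the maps to the 5 output classes.
        self.head = nn.Conv2d(128, num_classes, 1)

    def forward(self, x):
        # Raw logits are returned so that CrossEntropyLoss can consume
        # them during training; softmax is applied at inference (below).
        return self.head(self.features(x))

# Example: a 200x200 grid with 20 voxel channels
logits = VoxelCNN()(torch.randn(1, 20, 200, 200))
probs = torch.softmax(logits, dim=1)  # per-cell class probabilities
print(probs.shape)  # torch.Size([1, 5, 200, 200])
```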
Training Details
• Dataset Size: 20,000 images.
• Epochs: 15.
• Loss Function: Binary Cross-Entropy with Logits Loss (BCEWithLogitsLoss); the sketch after this list shows how it is applied.
• Number of Classes: 6.
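As referenced above, the sketch below shows how BCEWithLogitsLoss can be applied to the segmentation logits: since this loss expects float targets with the same shape as the predictions, the integer class mask is one-hot encoded first. It reuses the hypothetical UNet sketch from earlier and dummy data; it is not the project's actual training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed setup: the UNet sketch shown earlier, with 6 output classes.
model = UNet(in_ch=3, num_classes=6)
criterion = nn.BCEWithLogitsLoss()  # loss function reported above

# BCEWithLogitsLoss expects float targets with the same shape as the
# logits, so the integer class mask is one-hot encoded per pixel.
images = torch.randn(2, 3, 256, 256)              # dummy image batch
class_mask = torch.randint(0, 6, (2, 256, 256))   # integer label per pixel
targets = F.one_hot(class_mask, num_classes=6)    # [B, H, W, 6]
targets = targets.permute(0, 3, 1, 2).float()     # [B, 6, H, W]

logits = model(images)                            # [B, 6, H, W]
loss = criterion(logits, targets)
loss.backward()
print(loss.item())
```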
Performance Metrics
• Loss: The model achieved a minimal training loss of 0.016, indicating that the model
effectively minimized the error during training.
• Intersection over Union (IoU): Achieved an IoU score of 0.96, demonstrating excellent segmentation performance across different classes (road, sidewalk, vehicles, and pedestrians); see the metric sketch below.
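For reference, the reported per-class IoU can be computed with the standard definition, as in this minimal NumPy sketch (the project's own evaluation code is not shown, so this is an assumed implementation):

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Intersection over Union for each class of a segmentation mask.

    pred, target: integer arrays of the same shape holding class indices.
    Returns one IoU per class (NaN if the class is absent from both masks).
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# Example with two tiny 3x3 masks and 3 classes
pred = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 2]])
target = np.array([[0, 0, 1], [1, 2, 2], [2, 2, 2]])
print(per_class_iou(pred, target, num_classes=3))
```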
Visual Results
• The UNet model produced highly accurate semantic maps, correctly segmenting the road,
sidewalk, and objects in most cases.
• The high IoU score highlights the model’s robustness and capability to generalize across
diverse scenarios in the simulated environment.
The image below shows the input image, the predicted mask, and the ground-truth mask.
The VoxelCNN model was designed and trained to distinguish four main categories: vehicle,
road, sidewalk, and pedestrian. The main outcomes of the training process are listed below:
Training Details
• Dataset Size: 5,000 LiDAR data samples.
• Number of Epochs: 30.
• Loss Function: CrossEntropyLoss (a training-loop sketch follows this list).
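A minimal PyTorch training loop matching these settings is sketched below; the optimizer choice and learning rate are assumptions, `train_loader` stands in for the project's LiDAR data loader, and `VoxelCNN` refers to the earlier sketch.

```python
import torch
import torch.nn as nn

model = VoxelCNN()                      # the sketch defined earlier (assumed)
criterion = nn.CrossEntropyLoss()       # loss function reported above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed settings

for epoch in range(30):                 # 30 epochs, as reported
    running_loss = 0.0
    for voxels, labels in train_loader: # hypothetical LiDAR data loader
        optimizer.zero_grad()
        logits = model(voxels)          # raw logits; softmax only at inference
        loss = criterion(logits, labels)  # labels: [B, H, W] class indices
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}: mean loss {running_loss / len(train_loader):.3f}")
```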
Performance Metrics
• Loss: Achieved a training loss of 0.096, indicating effective convergence and minimization
of classification error.
• Intersection over Union (IoU): Reached an IoU score of 0.92, demonstrating reliable segmentation and classification of LiDAR data into the specified categories.
Visual Results
• The IoU of 0.92 highlights the model's capability to generalize across the simulated scenarios, making it suitable for real-world applications with minor adjustments.
The goal of this project is to use the Carla simulator to design and simulate a simple autonomous
driving system: a vehicle that can safely navigate a simulated environment while carrying out
essential functions including lane following, obstacle avoidance, and traffic sign recognition.
To accomplish these features, the system combines a number of control algorithms and machine
learning models.
The project consisted of three main components: