
YOLOv5 Environment Setup and Exporting to ONNX for Deployment

Setting up a yolov5 development environment on Windows 10, training a simple model, and deploying the model without a deep learning environment

A beginner's notes on setting up a yolo development environment

Software Installation

Install the graphics driver

Install the graphics driver that matches your hardware; if you don't have a dedicated GPU, you can skip this step.

Install Anaconda

Because deep learning environments have particularly messy and complicated dependencies, it is much more convenient to manage them with Anaconda. It effectively carves out a space on your machine dedicated to deep learning: no matter how much you tinker with it, your existing development environment is untouched, which avoids a lot of unnecessary trouble.
Open the Anaconda website at https://www.anaconda.com/products/distribution#Downloads, download the appropriate installer, and install it.
Note: the development environment will take up a lot of disk space later on. Avoid installing on the C: drive and reserve at least 50 GB.

Install PyCharm

The best IDE for Python. I started with VSCode, but it was a hassle to configure; PyCharm is much simpler. Nothing special here, just download it from the official site and install.

Environment Configuration

Configure PyTorch

Install PyTorch

After installing Anaconda, open Anaconda Prompt or Anaconda Powershell Prompt. The Powershell variant supports some extra Linux-style commands on top of Anaconda Prompt and is a bit more convenient, but either works.
List the existing environments with:

conda env list
(base) PS C:\Users\Kepler> conda env list
# conda environments:
#
base                  *  D:\Anaconda3

(base) PS C:\Users\Kepler>

By default there is only the base environment. Enter the command below to create a new environment named pytorch; when asked whether to install, enter y.

conda create -n pytorch python=3.8

After installation there should be two environments:

(base) PS C:\Users\Kepler> conda env list
# conda environments:
#
base                  *  D:\Anaconda3
pytorch                  D:\Anaconda3\envs\pytorch

(base) PS C:\Users\Kepler>

Switch to the new environment with:

conda activate pytorch

Switch to mirrors to speed up installation

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --set show_channel_urls yes
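You can verify that the mirrors were registered with:

conda config --show channels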

With the preparation done, open the PyTorch website and copy the install command as shown in the figure below. Don't copy the official command in full: leave off the trailing -c pytorch channel option, otherwise conda downloads from the official channel instead of the mirrors and it is very slow.

conda install pytorch torchvision torchaudio cudatoolkit=11.3

Wait for the installation to finish. It may fail here; retrying a few times usually works.

Verify CUDA and cuDNN

Open PyCharm, create a new project, and run the following code:

import torch
print(torch.cuda.is_available())
print(torch.backends.cudnn.is_available())
print(torch.version.cuda)
print(torch.backends.cudnn.version())

If the following is printed, CUDA and cuDNN were installed successfully:

True
True
11.3
8200

Configure YOLOv5

Clone YOLOv5

Download or clone a copy of the code from the yolov5 repository and open the project in PyCharm:

data:

This mainly holds hyperparameter configuration files (these yaml files configure the paths of the training, test, and validation sets, plus the number of detection classes and their names), along with some officially provided test images. To train on your own dataset you need to modify the relevant yaml file, but don't put the dataset itself under this path; place it in a directory at the same level as the yolov5 project instead.

models:

This mainly contains the network-construction configuration files and functions, covering the project's four variants: s, m, l, and x. As the names suggest, they differ in size; across the variants detection speed goes from fast to slow while accuracy goes from low to high, so speed and accuracy trade off against each other. To train your own dataset, modify the corresponding yaml file here.

utils:

Utility functions, including the loss functions, the metrics functions, the plotting functions, and so on.

weights:

Where the trained weight files are placed.

detect.py:

Runs object detection with trained weights; supports images, videos, and webcams.

train.py:

The script for training on your own dataset.

test.py:

The script for evaluating training results.

requirements.txt:

A text file listing the versions of the dependency packages the yolov5 project needs; you can use it to install the matching versions.

Set the Python interpreter

Set the project's Python interpreter to the pytorch environment created earlier in Anaconda; this is done from the selector in PyCharm's bottom-right corner.

Install the dependencies

Open a terminal in the yolov5 root directory and install with:

pip install -r requirements.txt

Prepare the Dataset

Scrape images

Use the following code to scrape images from Baidu Images, adjusting keyword and max_download_images as needed.

import os
import re
from typing import List, Tuple
from urllib.parse import quote

import requests

# Search keyword; change it to whatever you want, just like searching in Baidu Images
keyword = '人'

# Maximum number of images to download
max_download_images = 30

url_init_first = 'https://image.baidu.com/search/flip?tn=baiduimage&word='
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/88.0.4324.192 Safari/537.36'
}


def get_page_urls(page_url: str, headers: dict) -> Tuple[List[str], str]:
    """Collect the image URLs on one result page plus the URL of the next page."""
    if not page_url:
        return [], ''
    try:
        html = requests.get(page_url, headers=headers)
        html.encoding = 'utf-8'
        html = html.text
    except IOError as e:
        print(e)
        return [], ''
    pic_urls = re.findall('"objURL":"(.*?)",', html, re.S)
    # "下一页" is the "next page" link in the Baidu result page
    next_page_url = re.findall(re.compile(r'<a href="(.*)" class="n">下一页</a>'), html, flags=0)
    next_page_url = 'http://image.baidu.com' + next_page_url[0] if next_page_url else ''
    return pic_urls, next_page_url


def down_pic(pic_urls: List[str], max_download_images: int) -> None:
    pic_urls = pic_urls[:max_download_images]
    for i, pic_url in enumerate(pic_urls):
        try:
            pic = requests.get(pic_url, timeout=15)
            image_output_path = './images/' + keyword + str(i + 1) + '.jpg'
            with open(image_output_path, 'wb') as f:
                f.write(pic.content)
            print('Downloaded image %s: %s' % (str(i + 1), str(pic_url)))
        except IOError as e:
            print('Failed to download image %s: %s' % (str(i + 1), str(pic_url)))
            print(e)
            continue


if __name__ == '__main__':
    url_init = url_init_first + quote(keyword, safe='/')
    all_pic_urls = []
    page_urls, next_page_url = get_page_urls(url_init, headers)
    all_pic_urls.extend(page_urls)

    page_count = 0
    if not os.path.exists('./images'):
        os.mkdir('./images')

    while 1:
        page_urls, next_page_url = get_page_urls(next_page_url, headers)
        page_count += 1
        print('Collecting image links on result page %s' % str(page_count))
        if next_page_url == '' and page_urls == []:
            print('Reached the last page; %s pages in total' % page_count)
            break
        all_pic_urls.extend(page_urls)
        if len(all_pic_urls) >= max_download_images:
            print('Reached the configured maximum of %s downloads' % max_download_images)
            break

    down_pic(list(set(all_pic_urls)), max_download_images)

Label the dataset

Use labelimg to annotate the dataset.

Install labelimg

pip install labelimg -i https://pypi.tuna.tsinghua.edu.cn/simple

Prepare the raw data

Create the following directory structure:

  1. Create the directory structure and files as shown above
  2. Copy the data prepared in the previous step into the source_image/source_image/VOC/Images directory
  3. Write the training classes into predefined_classes.txt, one class per line, e.g. person and dog (see the example after this list)
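For this person/dog project, predefined_classes.txt would contain:

person
dog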

  1. Copy the following conversion code into converter.py
import xml.etree.ElementTree as ET
import os
import random
from shutil import copyfile

classes = ["person", "dog"]
# classes = ["ball"]

TRAIN_RATIO = 80  # percentage of images assigned to the training set


def clear_hidden_files(path):
    dir_list = os.listdir(path)
    for i in dir_list:
        abspath = os.path.join(os.path.abspath(path), i)
        if os.path.isfile(abspath):
            if i.startswith("._"):
                os.remove(abspath)
        else:
            clear_hidden_files(abspath)


def convert(size, box):
    # VOC box (xmin, xmax, ymin, ymax) -> YOLO normalized (x_center, y_center, w, h)
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


def convert_annotation(image_id):
    in_file = open('./source_image/VOC/Annotations/%s.xml' % image_id)
    out_file = open('./source_image/VOC/YOLOLabels/%s.txt' % image_id, 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    in_file.close()
    out_file.close()


wd = os.getcwd()
data_base_dir = os.path.join(wd, "source_image/")
if not os.path.isdir(data_base_dir):
    os.mkdir(data_base_dir)
work_space_dir = os.path.join(data_base_dir, "VOC/")
if not os.path.isdir(work_space_dir):
    os.mkdir(work_space_dir)
annotation_dir = os.path.join(work_space_dir, "Annotations/")
if not os.path.isdir(annotation_dir):
    os.mkdir(annotation_dir)
clear_hidden_files(annotation_dir)
image_dir = os.path.join(work_space_dir, "Images/")
if not os.path.isdir(image_dir):
    os.mkdir(image_dir)
clear_hidden_files(image_dir)
yolo_labels_dir = os.path.join(work_space_dir, "YOLOLabels/")
if not os.path.isdir(yolo_labels_dir):
    os.mkdir(yolo_labels_dir)
clear_hidden_files(yolo_labels_dir)
yolov5_images_dir = os.path.join(data_base_dir, "images/")
if not os.path.isdir(yolov5_images_dir):
    os.mkdir(yolov5_images_dir)
clear_hidden_files(yolov5_images_dir)
yolov5_labels_dir = os.path.join(data_base_dir, "labels/")
if not os.path.isdir(yolov5_labels_dir):
    os.mkdir(yolov5_labels_dir)
clear_hidden_files(yolov5_labels_dir)
yolov5_images_train_dir = os.path.join(yolov5_images_dir, "train/")
if not os.path.isdir(yolov5_images_train_dir):
    os.mkdir(yolov5_images_train_dir)
clear_hidden_files(yolov5_images_train_dir)
yolov5_images_test_dir = os.path.join(yolov5_images_dir, "val/")
if not os.path.isdir(yolov5_images_test_dir):
    os.mkdir(yolov5_images_test_dir)
clear_hidden_files(yolov5_images_test_dir)
yolov5_labels_train_dir = os.path.join(yolov5_labels_dir, "train/")
if not os.path.isdir(yolov5_labels_train_dir):
    os.mkdir(yolov5_labels_train_dir)
clear_hidden_files(yolov5_labels_train_dir)
yolov5_labels_test_dir = os.path.join(yolov5_labels_dir, "val/")
if not os.path.isdir(yolov5_labels_test_dir):
    os.mkdir(yolov5_labels_test_dir)
clear_hidden_files(yolov5_labels_test_dir)

train_file = open(os.path.join(wd, "yolov5_train.txt"), 'w')
test_file = open(os.path.join(wd, "yolov5_val.txt"), 'w')
train_file.close()
test_file.close()
train_file = open(os.path.join(wd, "yolov5_train.txt"), 'a')
test_file = open(os.path.join(wd, "yolov5_val.txt"), 'a')
list_imgs = os.listdir(image_dir)  # list image files
for i in range(0, len(list_imgs)):
    path = os.path.join(image_dir, list_imgs[i])
    if os.path.isfile(path):
        image_path = image_dir + list_imgs[i]
        voc_path = list_imgs[i]
        (nameWithoutExtention, extention) = os.path.splitext(os.path.basename(image_path))
        annotation_name = nameWithoutExtention + '.xml'
        annotation_path = os.path.join(annotation_dir, annotation_name)
        label_name = nameWithoutExtention + '.txt'
        label_path = os.path.join(yolo_labels_dir, label_name)
        prob = random.randint(1, 100)  # random draw decides the train/val split
        print("Probability: %d" % prob)
        if prob < TRAIN_RATIO:  # training set
            if os.path.exists(annotation_path):
                train_file.write(image_path + '\n')
                convert_annotation(nameWithoutExtention)  # convert label
                copyfile(image_path, yolov5_images_train_dir + voc_path)
                copyfile(label_path, yolov5_labels_train_dir + label_name)
        else:  # validation set
            if os.path.exists(annotation_path):
                test_file.write(image_path + '\n')
                convert_annotation(nameWithoutExtention)  # convert label
                copyfile(image_path, yolov5_images_test_dir + voc_path)
                copyfile(label_path, yolov5_labels_test_dir + label_name)
train_file.close()
test_file.close()

Label with labelimg

Open a terminal in the source_image/source_image/VOC/ directory and start labelimg with:

labelimg Images predefined_classes.txt

Make sure to set the annotation format correctly (PascalVOC here, since the converter expects xml files).

Convert the label format

Run converter.py in the source_image/source_image/ directory to convert the VOC-format xml labels into YOLO-format txt labels and to split the data into training and validation sets.
After conversion, the output files will be populated with data.
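Each line of a generated txt label describes one object: the class index followed by the box center and size, all normalized to the image dimensions, which is exactly what the convert() function above computes. For example, a dog whose box is centered at (0.5, 0.4) and spans 20% x 30% of the image would be:

1 0.5 0.4 0.2 0.3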

Train the Model

Get the pretrained weights

To shorten training time and reach better accuracy, we usually initialize the network from pretrained weights. The 5.0 release of yolov5 provides several sets of pretrained weights, and you can pick the one that fits your needs. The figure below lists their names and sizes; as you would expect, the larger the pretrained weights, the higher the accuracy tends to be, but the slower the detection. The weights can be downloaded from the yolov5 repository's releases page; this walkthrough uses yolov5s.pt.

Modify the configuration files

Copy the dataset

Copy the dataset prepared earlier into the yolov5 root directory.

Modify voc.yaml

Find voc.yaml in the data directory, make a copy, and rename the copy, preferably after your project so later steps are easier. I renamed mine person-dog.yaml; this project detects people and dogs.

Edit person-dog.yaml as shown in the figure (adapt it to your own setup); a sketch follows.
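For reference, a minimal person-dog.yaml might look like the following. This is a sketch: the exact keys should match the voc.yaml template of the yolov5 version you cloned, and the paths assume the dataset layout produced by converter.py was copied into the yolov5 root.

# train and val data paths (relative to the yolov5 root)
train: ./source_image/images/train/
val: ./source_image/images/val/

# number of classes
nc: 2

# class names
names: ['person', 'dog']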

Modify the model configuration file

Find yolov5s.yaml in the models directory, make a copy, and rename the copy, again preferably after your project. I renamed mine person-dog.yaml as well.

Edit it as shown in the figure; only the nc value needs to change (set it to your own class count), for example:
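For this two-class project, only the class count at the top of the copied yolov5s.yaml changes; everything else (depth_multiple, width_multiple, anchors, backbone, head) stays as generated:

nc: 2  # number of classes (80 in the original yolov5s.yaml)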

Modify the training parameters

Open train.py and modify the following parameters:

parser.add_argument('--weights', type=str, default='weights/yolov5s.pt', help='initial weights path')  # pretrained weights
parser.add_argument('--cfg', type=str, default='models/person-dog.yaml', help='model.yaml path')  # model config
parser.add_argument('--data', type=str, default='data/person-dog.yaml', help='data.yaml path')  # dataset config
parser.add_argument('--epochs', type=int, default=300)  # number of training epochs
parser.add_argument('--batch-size', type=int, default=8, help='total batch size for all GPUs')
parser.add_argument('--workers', type=int, default=8, help='maximum number of dataloader workers')

After the changes, run train.py to start training.
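Alternatively, leave the defaults untouched and pass the same settings as command-line flags; these are exactly the argparse options shown above:

python train.py --weights weights/yolov5s.pt --cfg models/person-dog.yaml --data data/person-dog.yaml --epochs 300 --batch-size 8 --workers 8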

If virtual memory blows up, change num_workers to 0 in datasets.py under the utils directory (or pass --workers 0, since train.py exposes it as an argument).

TensorBoard

Run:

tensorboard --logdir=runs/train

Click the URL it generates to open the training metrics in a browser.

Inference and Validation

After training ends, a runs folder is created in the yolo root directory. The weight files are under runs/train/exp/weights/: best.pt is the best-performing checkpoint and last.pt is the checkpoint from the final epoch.
Open detect.py in the root directory and adjust its settings to run an inference test.

Pass in the path of the weight file, i.e. the result you just trained:

parser.add_argument('--weights', nargs='+', type=str, default='runs/train/exp/weights/best.pt', help='model.pt path(s)') 

Pass in the path of the image to test; change it to '0' to open the webcam instead:

parser.add_argument('--source', type=str, default='person.jpg', help='source') 
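As with training, both settings can also be given on the command line instead of editing the defaults:

python detect.py --weights runs/train/exp/weights/best.pt --source person.jpg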

Embedded Deployment Basics

For embedded deployment, to simplify the dependencies the pt file is converted to an onnx file for inference; then installing opencv alone is enough to run the model.

Converting between pt, pth, and onnx

Code to convert pt to pth:

import torch
import pickle
import argparse
from collections import OrderedDict

if __name__ == '__main__':
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    parser = argparse.ArgumentParser()
    parser.add_argument('--source', default='best')
    args = parser.parse_args()

    # Load the training checkpoint and pull out the bare model
    modelfile = args.source + '.pt'
    utl_model = torch.load(modelfile, map_location=device)
    utl_param = utl_model['model'].model
    torch.save(utl_param.state_dict(), args.source + '.pth')
    own_state = utl_param.state_dict()
    print(len(own_state))

    # Also dump the parameters as numpy arrays for torch-free consumers
    numpy_param = OrderedDict()
    for name in own_state:
        numpy_param[name] = own_state[name].data.cpu().numpy()
    print(len(numpy_param))
    with open(args.source + '_numpy_param.pkl', 'wb') as fw:
        pickle.dump(numpy_param, fw)
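Assuming the script above is saved as pt2pth.py (the file name is arbitrary) next to the weight file, running it produces best.pth plus a best_numpy_param.pkl containing the weights as numpy arrays:

python pt2pth.py --source best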

Code to convert pth to onnx:

See the project directory.
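Note that the yolov5 repository itself also ships an ONNX export script (models/export.py in the 5.0 release, export.py at the repository root in later versions), which is usually the simpler route; roughly:

pip install onnx
python models/export.py --weights runs/train/exp/weights/best.pt --img 640 --batch 1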

Run ONNX inference with OpenCV

import cv2
import argparse
import numpy as np


class yolov5:
    def __init__(self, yolo_type, confThreshold=0.5, nmsThreshold=0.5, objThreshold=0.5):
        self.classes = ['person', 'dog']
        self.colors = [np.random.randint(0, 255, size=3).tolist() for _ in range(len(self.classes))]
        num_classes = len(self.classes)
        anchors = [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]]
        self.nl = len(anchors)
        self.na = len(anchors[0]) // 2  # anchors are flat (w, h) pairs, 3 per detection layer
        self.no = num_classes + 5
        self.grid = [np.zeros(1)] * self.nl
        self.stride = np.array([8., 16., 32.])
        self.anchor_grid = np.asarray(anchors, dtype=np.float32).reshape(self.nl, 1, -1, 1, 1, 2)

        self.net = cv2.dnn.readNet(yolo_type + '.onnx')
        self.confThreshold = confThreshold
        self.nmsThreshold = nmsThreshold
        self.objThreshold = objThreshold

    def _make_grid(self, nx=20, ny=20):
        xv, yv = np.meshgrid(np.arange(ny), np.arange(nx))
        return np.stack((xv, yv), 2).reshape((1, 1, ny, nx, 2)).astype(np.float32)

    def postprocess(self, frame, outs):
        frameHeight = frame.shape[0]
        frameWidth = frame.shape[1]
        ratioh, ratiow = frameHeight / 640, frameWidth / 640
        # Scan through all the bounding boxes output from the network and keep only the
        # ones with high confidence scores. Assign the box's class label as the class with the highest score.
        classIds = []
        confidences = []
        boxes = []
        for out in outs:
            for detection in out:
                scores = detection[5:]
                classId = np.argmax(scores)
                confidence = scores[classId]
                if confidence > self.confThreshold and detection[4] > self.objThreshold:
                    center_x = int(detection[0] * ratiow)
                    center_y = int(detection[1] * ratioh)
                    width = int(detection[2] * ratiow)
                    height = int(detection[3] * ratioh)
                    left = int(center_x - width / 2)
                    top = int(center_y - height / 2)
                    classIds.append(classId)
                    confidences.append(float(confidence))
                    boxes.append([left, top, width, height])

        # Perform non maximum suppression to eliminate redundant overlapping boxes with
        # lower confidences.
        indices = cv2.dnn.NMSBoxes(boxes, confidences, self.confThreshold, self.nmsThreshold)
        for i in indices:
            j = int(i) if np.ndim(i) == 0 else int(i[0])  # OpenCV < 4.5.4 returns nested indices
            box = boxes[j]
            left = box[0]
            top = box[1]
            width = box[2]
            height = box[3]
            frame = self.drawPred(frame, classIds[j], confidences[j], left, top, left + width, top + height)
        return frame

    def drawPred(self, frame, classId, conf, left, top, right, bottom):
        # Draw a bounding box.
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), thickness=4)

        label = '%.2f' % conf
        label = '%s:%s' % (self.classes[classId], label)

        # Display the label at the top of the bounding box
        labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
        top = max(top, labelSize[1])
        cv2.putText(frame, label, (left, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), thickness=2)
        return frame

    def detect(self, srcimg):
        blob = cv2.dnn.blobFromImage(srcimg, 1 / 255.0, (640, 640), [0, 0, 0], swapRB=True, crop=False)
        # Sets the input to the network
        self.net.setInput(blob)

        # Runs the forward pass to get output of the output layers
        outs = self.net.forward(self.net.getUnconnectedOutLayersNames())

        z = []  # inference output
        for i in range(self.nl):
            bs, ns, ny, nx = outs[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            out = outs[i].reshape(bs, self.na, self.no, ny, nx).transpose(0, 1, 3, 4, 2)
            if self.grid[i].shape[2:4] != out.shape[2:4]:
                self.grid[i] = self._make_grid(nx, ny)

            y = 1 / (1 + np.exp(-out))  # sigmoid
            # Strictly only x, y, w, h need the sigmoid, but applying it to everything barely
            # changes the result: sigmoid is monotonically increasing, so it preserves the
            # ordering of the class confidences and therefore does not affect the NMS afterwards.
            # Inspecting the raw class confidences with a breakpoint shows they are negative
            # logits, so the sigmoid is needed to pull the probabilities back into [0, 1].
            y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * int(self.stride[i])  # xy
            y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
            z.append(y.reshape(bs, -1, self.no))
        z = np.concatenate(z, axis=1)
        return z


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--source", default='0', type=str, help="image path or camera index")
    parser.add_argument('--net_type', default='best', choices=['best', 'yolov5s', 'yolov5l', 'yolov5m', 'yolov5x'])
    parser.add_argument('--confThreshold', default=0.5, type=float, help='class confidence')
    parser.add_argument('--nmsThreshold', default=0.5, type=float, help='nms iou thresh')
    parser.add_argument('--objThreshold', default=0.5, type=float, help='object confidence')
    args = parser.parse_args()

    yolonet = yolov5(args.net_type, confThreshold=args.confThreshold,
                     nmsThreshold=args.nmsThreshold, objThreshold=args.objThreshold)
    if not args.source.isdigit():
        # Single image
        frame = cv2.imread(args.source)
        dets = yolonet.detect(frame)
        frame = yolonet.postprocess(frame, dets)
        cv2.imshow('result', frame)
        cv2.waitKey(0)
        cv2.destroyAllWindows()
    else:
        # Camera stream (the network is loaded once, not once per frame)
        cap = cv2.VideoCapture(int(args.source))
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            dets = yolonet.detect(frame)
            frame = yolonet.postprocess(frame, dets)
            cv2.imshow('result', frame)
            c = cv2.waitKey(1) & 0xFF
            if c == 27 or c == ord('q'):
                break
        cap.release()
        cv2.destroyAllWindows()
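Assuming the script is saved as opencv_onnx.py (a hypothetical name) next to best.onnx, run it on an image or on camera 0 like this:

python opencv_onnx.py --net_type best --source person.jpg
python opencv_onnx.py --net_type best --source 0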

References:

  1. https://blog.csdn.net/didiaopao?type=blog (main reference; an extremely detailed step-by-step yolov5 guide)
  2. https://blog.csdn.net/nihate/article/details/112731327 (OpenCV inference)
  3. https://github.com/ultralytics/yolov5