Automatic acute ischemic stroke lesion segmentation using semi-supervised learning

Ischemic stroke is a common disease in the elderly population, which can cause long-term disability and even death. However, the time window for treatment of ischemic stroke in its acute stage is very short. To fast localize and quantitively evaluate the acute ischemic stroke (AIS) lesions, many deep-learning-based lesion segmentation methods have been proposed in the literature, where a deep convolutional neural network (CNN) was trained on hundreds of fully labeled subjects with accurate annotations of AIS lesions. Despite that high segmentation accuracy can be achieved, the accurate labels should be annotated by experienced clinicians, and it is therefore very time-consuming to obtain a large number of fully labeled subjects. In this paper, we propose a semi-supervised method to automatically segment AIS lesions in diffusion weighted images and apparent diffusion coefficient maps. By using a large number of weakly labeled subjects and a small number of fully labeled subjects, our proposed method is able to accurately detect and segment the AIS lesions. In particular, our proposed method consists of three parts: 1) a double-path classification net (DPC-Net) trained in a weakly-supervised way is used to detect the suspicious regions of AIS lesions; 2) a pixel-level K-Means clustering algorithm is used to identify the hyperintensive regions on the DWIs; and 3) a region-growing algorithm combines the outputs of the DPC-Net and the K-Means to obtain the final precise lesion segmentation. In our experiment, we use 460 weakly labeled subjects and 15 fully labeled subjects to train and fine-tune the proposed method. By evaluating on a clinical dataset with 150 fully labeled subjects, our proposed method achieves a mean dice coefficient of 0.639, and a lesion-wise F1 score of 0.799.


INTRODUCTION
Stroke has been one of the most common causes of death and longterm disability worldwide [1], which brings tremendous pain and financial burden to patients. In general, stroke can be categorized as ischemia and hemorrhage according to the types of cerebrovascular accidents, where ischemic stroke accounts for 87% [2]. As the ischemic stroke may lead to invertible damage on brain tissues, in clinical practice, it is of paramount importance to quickly diagnose and quantitively evaluate in the acute stage to improve the treatment outcome.
In diagnosing of ischemic strokes, magnetic resonance imaging (MRI) serves as the modality of choice for clinical evaluation. The diffusion-weighted images (DWIs) and the apparent diffusion coefficient (ADC) maps derived from multiple DWIs with different bvalues have been shown to be sensitive in diagnosing acute ischemic stroke (AIS). In particular, the AIS lesions appear as hyperintense on the DWIs and hypointense on the ADC maps [3]. Figure 1 presents some examples of AIS lesions. The regions identified by the red arrows are AIS lesions. The regions identified by the yellow * Corresponding author. Email: liuzhiyang@nankai.edu.cn arrows, although also shown as hyperintense on the DWIs, are nonlesion regions. In fact, such hyperintensive regions are the magnetic susceptibility artifacts. That is to say, despite that they appear as hyperintense on the DWIs, there is no abnormality on the ADC maps. Therefore, to correctly identify the lesions, it is important to jointly consider both DWIs and ADC maps to extract the semantic information.
Recently, convolutional neural network-(CNN) based methods have presented tremendous ability in image classification and semantic segmentation on medical image processing [4,5]. Different from conventional image processing methods using handycrafted image features, the CNN-based methods extract features from manually labeled data by itself. By training CNNs on a massive amount of labeled images, the CNN-based methods have achieved promising results on various organ and lesion segmentation tasks [6][7][8][9]. In order to make use of contextual information in volumetric data, in Hu et al. [10], a 3D CNN was trained to automatically localize and delineate the abdominal organs of interest with a probability prediction map, and then a time-implicit multi-phase level-set algorithm was utilized for the refinement of the multiorgan segmentation. Another 3D CNN-based method using fully Figure 1 Challenge examples in acute ischemic stroke (AIS) segmentation. The first row show apparent diffusion coefficient ( ADC) slices and the second row shows their corresponding diffusion-weighted image (DWI) slices. The yellow arrows identify the hyperintense due to magnetic susceptibility artifacts, and the red arrows identify the hyperintense that are true AIS lesions. Best viewed in color.
convolutional DenseNet [11] was proposed for automatic segmentation of AIS [12], which achieved high performance on their test set.
Typically in a CNN, millions of parameters have to be tuned, and therefore a massive amount of images with accurate annotations are required. For instance, Sharma et al. [7] used 165 fully-labeled subjects for organ segmentation, and Zhang et al. [12] collected 152 fully-labeled subjects for lesion segmentation. Different from the images in the ImageNet [13] and COCO [14] dataset which can be easily obtained from the Internet and labeled by the ordinary people, the medical images have to be acquired by special equipment, and many well-trained clinicians are further required to precisely annotate the labels. More importantly, to perform image segmentation, most methods require pixel-by-pixel annotations to train the CNN [5,10,[15][16][17][18][19][20][21][22]. For instance, in the AIS lesion segmentation task, most methods [12,23,24] required subjects with pixelwise labels as shown in Figure 2, where each pixel was annotated as normal or lesion. Obviously, annotating the pixel-level labels are labor-intensive and time-consuming, making it even difficult to establish a large dataset. This motivates us to develop a segmentation method by using much simpler annotations. One of the simpler annotations is just annotating whether each slice incorporates lesions or not, as shown in Figure 3. Hereafter, we term this simpler annotation as weak annotation, and the data samples with weak annotations as weakly-labeled. Since the clinicians only require annotating whether each slice includes AIS lesions or not, the annotation cost can be significantly reduced, making it easier to collect a large amount of labeled data samples.
With weakly-labeled data subjects, the CNN can generate a class activation map (CAM) for the AIS lesions [25]. The CAMs, however, were far from accurate segmentation, as they were in fact obtained from the weighted sum of low-resolution feature maps of a convolution layer. As shown in [25], despite that the lesion-wise detection rate is high, the CAMs only cover the most prominent part of the lesion. In fact, merely using weakly-supervised learning would inevitably lead to an underestimated segmentation for a large lesion, and an overestimated one for a small lesion. As we will show in this paper, by fusing multi-resolution feature maps, the proposed double-path classification network (DPC-Net) is able to output a suitable estimation for AIS lesion segmentation.
To generate a precise segmentation, we further propose to use a small amount of fully-labeled subjects to facilitate the segmentation. Despite that such semi-supervised learning methods have been adopted in image segmentation tasks by using a mixed dataset

Figure 3
Examples of the weakly-labeled subjects. The first two rows show apparent diffusion coefficient ( ADC) slices and their corresponding diffusion-weighted image (DWI) slices. The third row shows the annotations, "yes" indicates that the slice has lesion and "no" indicates the opposite. with many unlabeled samples and a few fully-labeled samples [26][27][28], most methods propose to adopt a self-training or cotraining strategy, where the unlabeled samples were automatically annotated and added to the labeled set. Note that the generated labels will include errors, and some errors will be even strengthened during the iterative training process, leading to a risk of performing worse than a supervised approach. To solve this problem, we propose to utilize the a priori knowledge, and use the fully-labeled subjects to fine-tune a machine learning method with much fewer parameters.
In particular, we propose to use the K-means algorithm on the DWIs due to the a priori knowledge that the AIS lesions appear as hyperintense on the DWI, so that a number of candidate AIS lesion regions can be obtained. Then a region growing algorithm is adopted to combine the semantic information from the DPC-Net and the textural and spatial information from the K-means. Different from the conventional region-growing method, the algorithm adopted in this paper can automatically find an initial point from the DPC-Net rather than manually assign one. By training on 460 weakly-labeled and 5 fully-labeled subjects, the proposed method is able to achieves a mean Dice coefficient (DC) of 0.642, and a lesionwise F1 score of 0.822 on a clinical dataset with 150 subjects.

RELATED WORK
There have been extensive efforts on automatic segmentation of ischemic stroke lesion recently. We roughly divided these methods into two categories, conventional methods and CNN-based methods, according to whether handy-crafted features are required. In the conventional methods, by defining some specific features on the MRIs, the ischemic stroke lesions can be identified by conventional image processing techniques or using machine-learning algorithms such as the random forest or the support vector machine (SVM). For instance, a gravitational histogram optimization-based method was proposed for ischemic stroke lesion segmentation (ISLES) using DWI [29]. To reduce the false positive (FP) rate, Mitra et al. [30] further proposed to use multimodal MRIs to extract features and identify the lesions using random forest. In Maier et al. [31], a stroke lesion segmentation method based on local features extracted from multimodal MRI, and a SVM classifier was further trained to segment the lesions. A random forest-based method was further used to identify the sub-AIS lesions [32], and achieves a topranking result in ISLES challenge in 2015 [33]. However, the performances of conventional methods were still not good enough as they are heavily dependent on handy-crafted features, which is understandable by the results of the recent challenge in ISLES 2015 [33].
At the same time, CNN-based methods have emerged as a powerful alternative for automatic segmentation of ischemic stroke lesion. For instance, most top-ranking methods in ISLES 2015 were based on CNNs, and the winner, DeepMedic [34], was a 3D CNN-based method, which achieved a DC of 0.59 on test set. Some other successful methods [12,18] based on CNNs in ISLES 2015 were also derived from generic CNNs architecture. Nevertheless, sub-AIS lesions have different imaging modality and lesion features from AIS lesions, so that the methods cannot be straightforwardly generalized to be used for AIS lesions. It is thus necessary to explore the methods for segmentation of AIS lesion.
Due to the lack of public dataset on the AIS lesion segmentation task, the AIS lesion segmentation methods were evaluated on institutional data were reported in the literature. A framework [23] with the combination of EDD Net and multiscale convolutional label evaluation net (MUSCLE Net) achieved a DC of 0.67 on test set by using DWIs. In order to fully utilize the MRI information, Res-Unet [24] carried out segmenting AIS lesion in multi-modality MRI. To take advantage of contextual information in volumetric data, a method based on 3D fully CNNs was proposed for AIS lesion segmentation [12], which is shown to be superior in terms of precision rate. However, these methods required hundreds of high quality fully-labeled subjects, which are very time-costly to obtain.
To reduce the burden in annotation, some semi-supervised and weakly-supervised segmentation methods were proposed. In weakly-supervised learning methods, the clinicians only need to annotate whether an image contains the target tissue or not. For instance, in Jia et al. [35] and Xu et al. [36], the cancer cells were segmented by using the labels that indicates whether a histopathology image contains cancer cells or not. When adopted in AIS segmentation, the weakly-supervised method presented high lesion-wise detection rate [25]. However, due to the lack of information on lesion positions, the CNN tends to only identify the most prominent feature that helps in classification, leading to an underestimated segmentation size when the lesion is large. For small lesions, as the segmentation results were in fact upsampled from the low-resolution feature maps in the final convolution layer, it tends to overestimate the lesion size. It implies that to accurately segment the lesions, some full annotations should be used.
The semi-supervised learning was conventionally referred to the case where the training data samples were mixed by both fullylabeled and unlabeled subjects. In medical image segmentation tasks, the network usually first learnt from the fully-labeled subjects to generate fake labels for those without full labels, and then update the network parameters based on the whole training set with both real and fake labels [26][27][28]. Such methods, however, risk in error propagation, as all knowledge of the target distribution came from the limited number of fully-labeled subjects.
To avoid this problem, in this paper, we propose to use a semisupervised learning method with a few fully-labeled and many weakly-labeled subjects, where the weakly-labeled ones were used to train a CNN to extract semantic features and the fully-labeled ones were adopted to fine tune the segmentation results.

METHODS
In this section, we present a semi-supervised deep learning method for AIS lesion segmentation on multi-modal MR images, and the whole pipeline is presented in Figure 4. In particular, our proposed method includes two pathways. In the first pathway, we propose a DPC-Net to extract semantic information. It is able to generate probability maps (PMs) for AIS lesions with higher spatial accuracy than the CAMs in the conventional weakly-supervised learning method. In the other pathway, a K-means clustering algorithm is incorporated to obtain the lesion segmentation results, thanks to the a priori knowledge that the AIS lesions appear as hyperintense on the DWIs. Then, a region-growing algorithm is adopted to combine the results of the two pathways and generate the final segmentation result. Compared to other semi-supervised learning based methods, the proposed method does not generate any fake labels for the weakly-labeled subjects, and therefore prevent propagating the errors from a network that overfits on such a small number of fully-labeled subjects.
In the following subsections, we will introduce the proposed semisupervised AIS lesion segmentation method in detail.

Double-Path Classification Network
The DPC-Net is proposed based on the VGG-16 network [37] truncated before the third maxpooling layer, and the network architecture is depicted in Figure 5(b). As we can see, we added a global average pooling (GAP) layer followed by a fully connected (FC) layer at the top of the network. Intuitively, the feature maps of the convolution block 7 is much lower than the original input images, leading to an inaccurate output PM from the view of segmentation task. To this end, we further added a side-branch with a GAP layer and a FC layer on convolution block 4 to improve the spatial resolution. As we will see in Section 4, despite that the side-branch output may contain many FPs due to the lack of semantic information, the segmentation results can be significantly improved by combining the outputs of the main branch and the side branch.
At the training stage, as only the slice-level label is available, a classifier is trained by using the DPC-Net shown in Figure 5(b), which can be regarded as a weakly-supervised leaning in the viewpoint of the AIS lesion segmentation. At the inference stage, as our objective is to locate the AIS lesions, the CAMs [38] is computed for both the main and the side branches to generate the segmentation seeds with adequate semantic information to highlight suspicious lesion region.
In particular, as Figure 5(c) shows, we generate the CAMs from conv-block 7 and conv-block 4, and then bilinearly upsample them to the same size as the original input, denoted as M 1 and M 2 , respectively, where the weights are copied from the corresponding FC layers. To ensure that the maximum activation equals one, the CAMs are further normalized as for i = 1, 2, where ⌢ y denotes the classifier output on the main branch.
Note that the main branch CAM output M 1 has a low spatial resolution with, however, more semantic information. We propose to fuse both CAMs to achieve a more accurate segmentation of the lesion regions. We first compute a binary segmentation map M 1b from the main branch CAM output M 1 by a threshold of 0.5, and then fuse M 1b and M 2 to generate PM as where • denotes the Hadamard product.

Segmentation Map Generation
By making use of the a priori knowledge that the AIS lesions appear as hyperintense on the DWIs [3], we propose to adopt K-means algorithm and a region-growing algorithm to identify the lesion regions. In particular, the initial growing points of region-growing algorithm are given by the outputs of DPC-Net rather than manual points. Our proposed algorithm is summarized as Algorithm 1.
Algorithm 1 integrates both the pixel-level information of the original DWIs and the semantic information extracted by the DPC-Net. To achieve a better performance, the key problem is how to determine the number of clusters K and the threshold . In this paper, we propose to use some fully-labeled subjects to fine-tune these two parameters by using grid search method. As we will discuss in the next section, the segmentation results can be significantly improved by using a very small amount of fully-labeled images.

Evaluation Metrics
In this paper, we propose to use DC, Matthews correlation coefficient (MCC) and intersection over union (IoU) to evaluate the pixel-level segmentation performance, which are defined as MCC = tp * tn − fp * fn √ (tp + fp)(tp + fn)(tn + fp)(tn + fn) and IoU = tp tp + fp + fn (5) respectively where tp, fp, fn and tn denote the pixel counts of true positives (TPs), FPs, false negatives (FNs) and true negatives, respectively.
Cosine similarity index (CSI) is used to measure similarity between the ground truth G and the predicted segmentation P, which is defined as where vec(A) denotes the vectorization of A. ⋅ and ‖‖ denote dot product and l 2 norm, respectively.
In clinical diagnosis, the segmentation results on both large and small lesions are of equal importance. It is therefore necessary to use the lesion-wise metrics to evaluate the performance. In particular, we perform 3D connected component analysis on both the ground truth and the predicted segmentation. A region is said to be a TP if it appears on both ground truth and the prediction. A FP is counted if a region on the prediction has no overlapping area with any region on the ground truth; while a FN is counted if a region appears on the ground truth has no overlapping area with any region on the prediction. The mean number of TPs (m#TP), the mean number of FPs (m#FP) and the mean number of FNs (m#FN) can be then calculated by averaging over the total number of subjects.
We propose to use the lesion-wise precision rate (P L ), the lesionwise recall rate (R L ) and the lesion-wise F1 score as lesion-wise metrics, which are given as R L = m#TP m#TP + m#FN (8) and respectively.

Data and Preprocessing
The experimental data used in this study were collected from Tianjin Huanhu Hospital (Tianjin, China), which includes 615 patients with AIS lesions. All clinical images were collected from a retrospective database and anonymized prior to use. Ethical approval was granted by Tianjin Huanhu Hospital Medical Ethics Committee. MR images were acquired from three MR scanners, with two 3T MR scanners (Skyra, Siemens and Trio, Simens) and one 1.5T MR scanner (Avanto, Siemens). DWI images were acquired using a spin echo-type echo planar imaging (SE-EPI) sequence with b values of 0 and 1000 s/mm 2 . The parameters are summarized in Table 1. ADC maps were calculated from the scan raw data in a pixel-by-pixel manner as where b characterizes the diffusion-sensitizing gradient pulses, with b 1 = 1000 s∕mm 2 and b 0 = 0 s∕mm 2 in our data. s 1 is the diffusion-weighted signal intensity with b = 1000 s∕mm 2 . s 0 is the signal with no diffusion gradient applied, i.e., with b = 0 s∕mm 2 .
The ischemic lesions were manually annotated by two experienced experts (Dr. Song Jin and Dr. Chen Cao) from Tianjin Huanhu Hospital.
We split the whole dataset into two subsets: training set and test set. The training set includes 460 subjects with weak labels for training and validating the DPC-Net, and 5 subjects with precise pixel-level labels to fine-tune clustering number K in the K-means algorithm and the threshold value in the region growing algorithm. The test set includes 150 subjects with full annotations to evaluate the segmentation performance.
As the MR images were acquired on the three different MR scanners, the matrix size varies. Therefore, we resample all the MR images to the same size of 192 × 192 using linear interpolation. The pixel intensity of each MR image slice is normalized into that of zero mean and unit variance, and the DWI and ADC slices are concatenated into a dual-channel images and fed into the DPC-Net. During training, data augmentation technique is adopted to avoid over-fitting. In particular, each input image is randomly rotated by a degree ranging from 1 to 360 degree, flipped vertically and horizontally on the fly, so as to augment the dataset and reduce memory footprint.

Setup and Implementation
The hyperparameters of the proposed DPC-Net are shown in Figure 5. We initialize these networks by Xavier's method [39] and use the Adam method [40] with 1 = 0.9, 2 = 0.999 and initial learning rate of 0.001 as our optimizer. The learning rate is scaled down by a factor of 0.1 if no progress is made for 15 epochs on validation loss. Early-stopping technique is adopted after 30 epochs with no progress on the validation loss.
The experiments are performed on a computer with an Intel Core i7-6800K CPU, 64GB RAM and Nvidia GeForce 1080Ti GPU with 11GB memory. The computer operates on Windows 10 with CUDA 9.0. The network is implemented in PyTorch 0.4 (https://pytorch.or g/) and K-means is implemented on scikit-learn (https://scikit-lear n.org/stable). The MR image files are stored as Neuroimaging Informatics Technology Initiative (NIfTI) format, and processed using Simple Insight ToolKit (SimpleITK) [43]. We use ITK-SNAP [44] for the visualization of results.

Results
In our experiment, 460 weakly-labeled subjects and 5 fully-labeled subjects are used to train and fine-tune the parameters. The 5 fullylabeled subjects are abbreviated as fine-tuning set. Figure 6 presents the segmentation results of our proposed method. The CAM results obtained from a naïve VGG-16 based CAM method [38], denoted as CAM-baseline, and the output PM of our proposed DPC-Net, denoted as PM-DPC-Net, are also presented for comparison. As shown in Figure 6, the CAM-based methods can successfully identify and localize the AIS lesions. Despite that the magnetic artifacts have similar appearances as the AIS lesions on the DWIs, the deep-learning-based methods can distinguish the lesions from the artifacts thanks to the semantic information extracted from the weakly-labeled subjects. From the aspect of segmentation, however, the CAM-baseline tends to segment much larger area than the actual AIS lesion, due to the fact that the output PMs in conventional CAM method is obtained from features maps with much lower resolution than the original images. Our proposed DPC-Net significantly reduces the areas of the suspicious regions thanks to the side branch output, but underestimates the lesion areas in some cases. By integrating both pixel-level clustering and the PM-DPC-Net output, the proposed method is sensitive to both large and small lesions, and presents much better segmentation results over the CAM-baseline and PM-DPC-Net.
For the sake of comparison with other methods of semantic segmentation, we also train and evaluate a U-net [41], a FCN-8s [19], a Res-Unet [24] and the method proposed in [42] on our dataset. The encoder parts of these methods are also pretrained as a classifier on our weakly-labeled data for fairness, then these methods are retrained on the 5 fully-labeled data. As the Figure 6 shows, the U-net, FCN-8s, Res-Unet and the proposed in [42] overlook some lesions in segmentation process, as merely training on 5 subjects are far from adequate to train a CNN with so much parameters. Besides, FCN-8s uses three-scale feature fusion and the outputs of its last convolutional layer resampled to the size of input image require interpolation of 32 times, which will overestimate some lesion regions.
The numerical evaluation results on the test set with 150 fullylabeled subjects are summarized in Table 2. For CAM-baseline and PM-DPC-Net, the thresholds to generate the binary segmentation are selected by using the fine-tuning set. For our proposed method, these fully-labeled images are used to tune both the threshold and the number of clusters K. , FCN-8s [19], Res-Unet [24], the method proposed in [42] and our proposed method. The CAMs and PMs are depicted on the DWI, and the redder with the color map, the more likely it is to represent the lesion region. The segmentations are also depicted on the DWI, and highlighted in red. Best viewed in color. From the aspect of the pixel-level metrics, our proposed method achieves a mean DC of 0.642, a mean CSI of 0.680, a mean MCC of 0.680 and a mean IoU (mIoU) of 0.512, which are higher than the results obtained by the competitors. In fact, such performance is very close to the method trained on fully-labeled images [23]. From the aspect of lesion-wise metrics, our proposed method achieves the precision rate of 0.880, which is higher than other methods except for FCN-8s. In fact, FCN-8s has 19 missed diagnosis subjects while our method only has 2 missed diagnosis, which leads the less FPs of FCN-8s. The recall rate, however, is worse than the Res-Unet due to the fact that the Res-Unet has the residual unit which can promote gradient propagation so as to reduce the FNs. The K-means clustering algorithm may also fail to group all lesions to the same group if the intensities of different lesions differ significantly from each other. Figure 7 further plots the scatter map between the volumes of the ground truth and the predicted segmentation, where the purple line indicates a perfect match between the predicted volumes and the ground truth volumes. As the Figure 7 shows, the predicted volumes of our proposed method are closer to the true volumes than the competitors.

Effect of Clustering Numbers
As the AIS lesions appear as hyperintense on the DWI, we adopt Kmeans clustering algorithm to identify the hyperintensive regions. Note that the artifacts on the DWIs are also the hyperintensive regions, making it crucial to fine-tune the value of K. As we can see from Figure 8, when K = 3, the lesion regions and most of the normal regions are divided into a class. When K = 4, there is a little improvement, however, artifacts will be in the same cluster as the AIS lesions including the artifacts around the lesions, thus, the clustering lesion regions will be larger than the true lesions. As the K increases, the clustering lesion regions will gradually decrease from greater than the true lesion regions to the near true lesion. The first row and the third row in Figure 8 show that some clustering lesion regions have disappeared when K = 7, and more clustering lesion regions will disappear when K = 10.
In our work, we propose to use a small amount of fully-labeled subjects to search the optimal parameters in a grid search manner, and use DC as the metric. The results with K = 3, 4, 5, 6, 7, 10 are summarized in Table 3. As we can see from Table 3, the best threshold decreases as K when K = 3, 4, 5, 6 and the performance is improved. However, after K = 7, the performance begins to decline.

Impact of Lesion Size
As pointed out in [45], a AIS lesion is classified as a lacunar infarction (LI) lesion if its diameter is smaller than 1.5 cm. Clinically, the LI stroke accounts for 85% of all AIS patients. However, it is much difficult to be diagnosed in clinical practice, especially when it is too small to be noticed. It is, therefore, very necessary to evaluate the performance on subjects with small lesions.
In this subsection, we further divide our test set to a large lesion set and a small lesion set. A subject is categorized into the small lesion set only if all of the lesions are LI lesions. Otherwise, it will be included in the large lesion set. In our test set, the large lesion and the small lesion sets include 60 subjects and 90 subjects respectively. As Table 4 shows, our proposed method achieves a DC of 0.708 on the small lesion set, while a DC of 0.543 on the large lesion set. Meanwhile, the R L of small lesions is higher than that of large lesions. The reason is that the distribution of hyperintense in large lesion regions are uneven, which makes the lesion regions predicted by DPC-Net to be smaller than the true lesion regions. Thus, how to solve the influence of hyperintense distribution imbalance on lesion segmentation is a problem to be considered in the future.
In clinical diagnosis, large lesions are more easily diagnosed, while small lesions are not. Our proposed method achieves high performance on small lesions, which might be of a good inspiration for other methods.

Discussions of the DPC-Net
In Section 3.1, we propose the DPC-Net to generate the PMs. The DPC-Net consists of two branches that are based on the convolutional block 4 and convolutional block 7 of the VGG-16. In this section, we combine different convolutional block of VGG-16 to generate many new DPC-Nets for comparison. In particular, we only focus on the convolutional blocks next to the maxpooling layer for reason that they extract the largest semantic information at the current feature map scale. Meanwhile, as a shallow convolution layer, the convolutional block 2 cannot extract semantic information well. Hence the combination of the new DPC-Nets includes convolutional block (4, 10), convolutional block (7, 10), convolutional block (4,13), convolutional block (7, 13) and convolutional block (10,13), which are referred to as the "variants" for simplicity. The Figure 9 plots the PMs generated by the "variants" and our proposed DPC-Net. We propose to use convolutional block 4 and convolutional block 7 to generate the PMs for reason that the feature maps of convolutional block 7 are much higher spatial resolution than that of deeper convolutional blocks, which can remain the approximate suspicious lesion regions to the true lesion regions when its feature maps are interpolated into the original input size. Besides, the convolutional block 4 is the closest to the convolutional block 7, which can help each other to extract semantic information well. In contrast, the "variants" use deeper convolutional blocks to generate PMs. Their feature maps have lower spatial resolution that would lead to larger suspicious lesion regions than the true lesion regions when their feature maps are interpolated into the original input size, which further provide larger growing seed points for the cluster map on the DWI so that the segmented lesions are larger than the true lesions. Table 5 summarizes the evaluation results under the "variants" and our proposed DPC-Net. As we can see, the  The convolutional block is abbreviated to CB for simplicity. CB (4, 7) denotes our proposed DPC-Net. DPC-Net, double-path classification network; DC, Dice coefficient; MCC, Matthews correlation coefficient; CSI, Cosine similarity index.

Figure 9
Examples of probability maps (PMs). The PMs are depicted on the diffusion-weighted image (DWI), and the redder with the color map, the more likely it is to represent the lesion region. The convolutional block is abbreviated to CB for simplicity. CB (4, 7) denotes our proposed double-path classification network ( DPC-Net). Best viewed in color.
proposed combination strategy achieves the best results in terms of both lesion-wise metrics ( P L , R L , F1 ) and similarity indices (DC, MCC, CSI, mIoU).

CONCLUSION
In this paper, we present a semi-supervised method for AIS lesion segmentation, where 460 weakly-labeled subjects are used to train the DPC-Net and then 5 fully-labeled subjects are used to fine-tune the parameters in a supervised way. The proposed semi-supervised method presents a high segmentation accuracy on the clinical MR images with a DC of 0.642. More importantly, it presents very high precision of 0.880, which is of paramount importance in avoiding misdiagnosis in clinical scenario. Meanwhile, the proposed method largely reduces the expense of obtaining a large number of fullylabeled subjects in a supervised setting, which is more meaningful in terms of engineering maneuverability.
In fact, lesion segmentation using unsupervised learning is an interesting topic to study, and has attracted the attentions among researchers [46]. However, the magnetic artifacts have similar appearances as the AIS lesions on the DWIs and it is difficult to distinguish the AIS lesions from the artifacts in a fully-unsupervised manner. In K-means, for instance, as we have shown in Figure 8, the unsupervised learning method cannot extract the necessary semantic information for segmentation, leading to many FPs. Such results indicate that to detect the abnormality in an unsupervised way, the methods should be designed more carefully. We would very much like to investigate the unsupervised learning methods for AIS segmentation in our future work.

CONFLICTS OF INTEREST
All authors of this manuscript do not have financial and personal relationships with other people or organizations that could inappropriately influence their work.