PyEvolve: Automating Frequent Code Changes in Python ML Systems

Here we present three supplementary materials for the paper published at ICSE 2023:

  1. We provide the link to our tool PyEvolve.
  2. We applied PyEvolve to open source projects and submitted pull requests to assess its usefulness. We provide a list of all the pull requests.
  3. We used cross-validation to assess PyEvolve's effectiveness. We provide the dataset that was used for cross-validation.

The supplementary materials for our paper, R-CPATMiner, can be found at https://mlcodepatterns.github.io .


1. Tool - PyEvolve

PyEvolve automates frequently repeated code changes in Python systems. The tool provides a complete pipeline for mining and automating best code evolution practices, so that your project does not fall behind. This link (PyEvolve) provides open access to the executables and source code for the tool. It includes steps for building the tool, examples of how to use it, and a list of the current APIs.


2. Summary of pull requests

We submitted pull requests that cover a wide range of changes, including dissolving FOR loops, migrating to context managers, using advanced language features, and updating APIs. The table below contains examples of transformations from each group.

Dissolving FOR loops | Migrating to context managers | Using advanced language features | Updating APIs
[example image] | [example image] | [example image] | [example image]
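
To make the four groups concrete, the sketch below shows one small before/after pair per group. These pairs are illustrative examples written for this page using only the Python standard library; they are not excerpts from the submitted pull requests, and the function names are hypothetical. The helper `check_equivalence` is likewise an assumption added here to show that each rewrite preserves behavior on a sample input.

```python
import os
import pathlib
import tempfile


# 1. Dissolving a FOR loop: an accumulation loop becomes a built-in call.
def total_before(values):
    total = 0
    for v in values:
        total += v
    return total


def total_after(values):
    return sum(values)


# 2. Migrating to a context manager: the file is closed even if read() fails.
def read_before(path):
    f = open(path)
    text = f.read()
    f.close()
    return text


def read_after(path):
    with open(path) as f:
        return f.read()


# 3. Using a newer language feature: %-formatting becomes an f-string.
def label_before(name, count):
    return "%s has %d items" % (name, count)


def label_after(name, count):
    return f"{name} has {count} items"


# 4. Updating an API: an os.path helper becomes its pathlib counterpart.
def extension_before(path):
    return os.path.splitext(path)[1]


def extension_after(path):
    return pathlib.Path(path).suffix


def check_equivalence():
    """Return True if every before/after pair agrees on a sample input."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "w") as f:
            f.write("hello")
        same_read = read_before(sample) == read_after(sample)
    return (
        same_read
        and total_before([1, 2, 3]) == total_after([1, 2, 3])
        and label_before("batch", 4) == label_after("batch", 4)
        and extension_before("data.csv") == extension_after("data.csv")
    )
```

Each "after" version is shorter and leans on a built-in or standard-library facility, which is the kind of change that tends to be repeated across many projects.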

The table below lists all of the pull requests we submitted to popular open source projects.

# URL State Merged
1 microsoft/nni/pull/4982 closed True
2 dipy/dipy/pull/2618 closed True
3 facebookresearch/ParlAI/pull/4718 closed True
4 HazyResearch/pdftotree/pull/122 closed True
5 brightmart/text_classification/pull/149 closed True
6 tensorflow/lattice/pull/73 closed True
7 quadrismegistus/prosodic/pull/37 closed True
8 idaholab/raven/pull/1877 closed True
9 erikbern/ann-benchmarks/pull/303 closed True
10 david-abel/simple_rl/pull/61 closed True
11 ray-project/ray/pull/26284 closed True
12 jindongwang/transferlearning/pull/341 closed True
13 pgmpy/pgmpy/pull/1551 closed True
14 reframe-hpc/reframe/pull/2565 closed True
15 DeepLabCut/DeepLabCut/pull/1905 closed True
16 pytorch/pytorch/pull/82929 closed True
17 ray-project/ray/pull/27600 closed True
18 keras-team/keras/pull/16874 closed True
19 GoogleCloudDataproc/cloud-dataproc/pull/152 closed True
20 idaholab/raven/pull/1930 closed True
21 BindsNET/bindsnet/pull/570 closed True
22 CellProfiler/CellProfiler/pull/4610 closed True
23 daniellerch/aletheia/pull/21 closed True
24 deepinsight/insightface/pull/2070 closed True
25 scikit-image/scikit-image/pull/6458 open True
26 danforthcenter/plantcv/pull/932 open True
27 aws/sagemaker-python-sdk/pull/3286 closed True
28 biolab/orange3/pull/610 closed True
29 Pinafore/qb/pull/107 open False
30 tensorflow/transform/pull/280 open False
31 tensorflow/ranking/pull/325 open False
32 google-research/google-research/pull/1189 open False
33 LCAV/pyroomacoustics/pull/271 open False
34 bnpy/bnpy/pull/42 open False
35 brainiak/brainiak/pull/516 open False
36 LxMLS/lxmls-toolkit/pull/176 open False
37 cornellius-gp/gpytorch/pull/2049 closed False
38 lmcinnes/pynndescent/pull/192 closed False
39 pyRiemann/pyRiemann/pull/185 closed False
40 calico/basenji/pull/125 closed False

3. Cross-validation dataset

We evaluated PyEvolve over 40,000 transformation trials. The dataset is available at this link.