Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(format): respect max line width #242

Merged
merged 47 commits into from
Jan 15, 2025
Merged

feat(format): respect max line width #242

merged 47 commits into from
Jan 15, 2025

Conversation

a-frantz
Copy link
Member

@a-frantz a-frantz commented Oct 30, 2024

Describe the problem or feature in addition to a link to the issues.

Took me a while to crack this problem, but with some help from @adthrasher I think we've finally got the feature working!

Still to do in a follow up PR: we want certain line breaks to be "matched", like if we break on one half of a parenthesis pair we should break on the other half as well. Until that gets implemented, some of the line breaks will look a little awkward.

Before submitting this PR, please make sure:

  • You have added a few sentences describing the PR here.
  • You have added yourself or the appropriate individual as the assignee.
  • You have added at least one relevant code reviewer to the PR.
  • Your code builds clean without any errors or warnings.
  • You have added tests (when appropriate).
  • You have updated the README or other documentation to account for these
    changes (when appropriate).
  • You have added an entry to the relevant CHANGELOG.md (see
    "keep a changelog" for more information).
  • Your commit messages follow the conventional commit style.

@a-frantz a-frantz self-assigned this Jan 15, 2025
@a-frantz a-frantz marked this pull request as ready for review January 15, 2025 12:18
Copy link
Member

@adthrasher adthrasher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review before the interview starts


/// An indentation level.
#[derive(Clone, Copy, Debug)]
pub enum Indent {
/// Tabs.
Tabs(NonZeroUsize),

Tabs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if a user indents with tab, they can only use one tab. I'm fine with that, but just flagging that to make sure we're OK with that choice.

Comment on lines +20 to +25
/// The default maximum line length.
pub const DEFAULT_MAX_LINE_LENGTH: usize = 90;
/// The minimum maximum line length.
pub const MIN_MAX_LINE_LENGTH: usize = 60;
/// The maximum maximum line length.
pub const MAX_MAX_LINE_LENGTH: usize = 240;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested any non-default line lengths?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out these 3 commits on workflows - stjudecloud/workflows@main...tmp/format-examples

wdl-format/src/token.rs Outdated Show resolved Hide resolved
wdl-format/src/token.rs Outdated Show resolved Hide resolved
Comment on lines +37 to +40
/// A temporary indent.
///
/// This is added after a [`PostToken::Indent`] during the formatting of
/// command sections.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me from these comments why this is needed. Maybe there is additional documentation below where the command block is formatted, but it may be useful to expand here.

msg = "Cannot use automatic control subsampling (\"chip.ctl_depth_limit\">0 and \"chip.exp_ctl_depth_limit\">0) for " + "multiple controls with mixed endedness (e.g. SE ctl-rep1 and PE ctl-rep2). " + "Automatic control subsampling is enabled by default. " + "Disable automatic control subsampling by explicitly defining the above two parameters as 0 in your input JSON file. " + "You can still use manual control subsamping (\"chip.ctl_subsample_reads\">0) since it is done " + "for individual control's TAG-ALIGN output according to each control's endedness. ",
msg = "Cannot use automatic control subsampling (\"chip.ctl_depth_limit\">0 and \"chip.exp_ctl_depth_limit\">0) for "
+ "multiple controls with mixed endedness (e.g. SE ctl-rep1 and PE ctl-rep2). "
+ "Automatic control subsampling is enabled by default. " + "Disable automatic control subsampling by explicitly defining the above two parameters as 0 in your input JSON file. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it should bring the second + down to the next line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require backtracking, which is certainly possible but I'd say is OOS for this PR. I can open an issue for tracking that feature request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I missed the + in the middle of the line initially when trying to interpret Clay's comment.

Copy link
Collaborator

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good and I like the formatting changes, especially for if expressions.

Just a few comments.

wdl-format/src/config/builder.rs Outdated Show resolved Hide resolved
wdl-format/src/config/max_line_length.rs Outdated Show resolved Hide resolved
wdl-format/src/config/max_line_length.rs Outdated Show resolved Hide resolved
wdl-format/src/lib.rs Outdated Show resolved Hide resolved
wdl-format/src/token/post.rs Outdated Show resolved Hide resolved
])
)
Boolean paired_end_ = if !defined(paired_end) && i < length(paired_ends) then paired_ends[
i] else select_first([
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence whether I like this idea that ( and [ can split things down to the next line. For instance, here, I think it looks quite strange. I'm thinking that, for these lines that are being broken based on long line length, perhaps that should not be permitted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning for a follow up PR that will line break "matching" tokens. So this example might become:

        Boolean paired_end_ = if !defined(paired_end) && i < length(paired_ends) then paired_ends[
            i
        ] else select_first([

or something similar. I haven't spent time thinking about the specifics yet, but I agree that the current implementation leaves something to be desired. I can push up a change the removes the open/close tokens from can_be_line_broken so we can see how it looks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there should be some sort of priority list. Then if a line break is needed, you'd backtrack and find the highest "priority" line break location and break there. Here that's probably before then.

@@ -1627,7 +1472,8 @@ workflow chip {
}
}

Boolean has_input_of_count_signal_track = has_output_of_bam2ta || defined(bam2ta.ta)
Boolean has_input_of_count_signal_track = has_output_of_bam2ta || defined(bam2ta.ta
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here, this is quite strange from a formatting perspective.

])

Boolean has_input_of_align_ctl = i < length(ctl_fastqs_R1) && length(ctl_fastqs_R1[
i]) > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.


Boolean has_input_of_filter_ctl = has_output_of_align_ctl || defined(align_ctl.bam)
Boolean has_input_of_filter_ctl = has_output_of_align_ctl || defined(align_ctl.bam
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here. Sorry for so many comments, just want to call out places that require further examination.

wdl-format/src/token/pre.rs Outdated Show resolved Hide resolved
Comment on lines +1079 to +1080
File? blacklist_ = if length(blacklists) > 1 then pool_blacklist.ta_pooled else if length(
blacklists) > 0 then blacklists[0] else blacklist2_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is one of the outstanding TODO items, but if not, we should revisit this case.

else "p.value"
)
String idr_rank_ = if peak_caller_ == "spp" then "signal.value" else if peak_caller_
== "macs2" then "p.value" else "p.value"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too

Comment on lines +1298 to +1301
Int num_rep_nodup_bam = if length(nodup_bams) < num_rep_bam then num_rep_bam else length(
nodup_bams)
Int num_rep_ta = if length(tas) < num_rep_nodup_bam then num_rep_nodup_bam else length(
tas)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are also cases to revisit.

Comment on lines +1307 to +1312
Int num_ctl_bam = if length(ctl_bams) < num_ctl_fastq then num_ctl_fastq else length(
ctl_bams)
Int num_ctl_nodup_bam = if length(ctl_nodup_bams) < num_ctl_bam then num_ctl_bam else length(
ctl_nodup_bams)
Int num_ctl_ta = if length(ctl_tas) < num_ctl_nodup_bam then num_ctl_nodup_bam else length(
ctl_tas)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Comment on lines +1556 to +1557
Boolean has_input_of_bam2ta_no_dedup = (has_output_of_align || defined(align.bam))
&& !defined(bam2ta_no_dedup_R1.ta)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example (in my opinion) of a good break. It broke on the && between clauses which would be my preference.

Comment on lines +1756 to +1757
if (has_input_of_count_signal_track_pooled && enable_count_signal_track_ && num_rep > 1
) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another case where it should probably backtrack and break on &&.

Comment on lines +1808 to +1811
Int chosen_ctl_ta_id = if has_all_input_of_choose_ctl && !align_only_ then select_first(
[
choose_ctl.chosen_ctl_ta_ids,
])[i] else -2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this is a really awkward break.

Comment on lines +1993 to +2000
Array[File?] chosen_ctl_ta_pooled = if !has_all_input_of_choose_ctl || align_only_
then [] else if chosen_ctl_ta_pooled_subsample > 0 then [
subsample_ctl_pooled.ta_subsampled,
] else if num_ctl < 2 then [
ctl_ta_[0],
] else [
pool_ta_ctl.ta_pooled,
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this indentation correct? It looks like an indent is dropped for the last 5 lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the unformatted source:

    Array[File?] chosen_ctl_ta_pooled = if !has_all_input_of_choose_ctl || align_only_ then []
        else if chosen_ctl_ta_pooled_subsample > 0 then [ subsample_ctl_pooled.ta_subsampled ]
        else if num_ctl < 2 then [ ctl_ta_[0] ]
        else [ pool_ta_ctl.ta_pooled ]

@@ -3344,7 +2954,7 @@ task call_peak {
~{"--chrsz " + chrsz} \
~{"--fraglen " + fraglen} \
~{"--peak-type " + peak_type} \
~{"--blacklist " + blacklist}
~{"--blacklist " + blacklist}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this have trailing whitespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's present in the unformatted file. I'm not sure why the prior implementation was stripping it 🤔 but I think it's supposed to be there.

Copy link
Collaborator

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but we should probably create a tracking issue for any follow-up tweaking needed to the line breaking output.

@a-frantz a-frantz merged commit dbe79da into main Jan 15, 2025
16 checks passed
@a-frantz a-frantz deleted the feat/line-width branch January 15, 2025 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants